Comment: [Original edit by JustinMason]

...

However, the stop-list arguably assumes that everyone speaks English. In some non-English-speaking countries, the nonspam corpus may contain no English terms at all while the spam corpus is mostly English. In that case those stop-words would actually make good spam indicators.

Hence, I don't think it's a good idea to extend the stop-list with additional "common" English words.
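To make the argument concrete, here is a toy illustration with made-up corpora: German nonspam and mostly-English spam. It assumes a simple Graham-style per-token probability with no smoothing, which is not necessarily the formula SpamAssassin's Bayes code actually uses. The helper names `doc_freq` and `spam_prob` are hypothetical.

```python
from collections import Counter

# Hypothetical toy corpora: nonspam is German, spam is mostly English.
ham = [
    "wir sehen uns morgen im buero",
    "die rechnung ist im anhang",
]
spam = [
    "get the best deal on the market",
    "click the link to claim the prize",
]


def doc_freq(messages):
    """Count how many messages each token appears in (at most once per message)."""
    counts = Counter()
    for msg in messages:
        counts.update(set(msg.split()))
    return counts


def spam_prob(token, ham_msgs, spam_msgs):
    """Simplified per-token spam probability (assumption, no smoothing):
    fraction of spam messages containing the token, divided by the sum of
    the spam and nonspam fractions. Returns None for unseen tokens."""
    s = doc_freq(spam_msgs)[token] / len(spam_msgs)
    h = doc_freq(ham_msgs)[token] / len(ham_msgs)
    if s + h == 0:
        return None
    return s / (s + h)


print(spam_prob("the", ham, spam))  # 1.0
```

Here the English stop-word "the" never occurs in the German nonspam, so it scores as a perfect spam indicator. Dropping it via a stop-list would throw that signal away.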