Comment: [Original edit by JustinMason] more post-mass-check corpus sanitation tips

...

You can also remove the offending files or messages directly from the source mailboxes. (This is advisable, as you'll probably wind up mass-checking them again at some point.) How you do this depends on the format you use to store messages: Maildirs, mboxes, and so on. (Maildirs are easiest, since you can just delete the files named in the 'id.fps' file.)
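For the Maildir case, the deletion could be scripted roughly as follows. This is only a sketch: `remove_listed` is a hypothetical helper, not part of SpamAssassin, and it assumes that 'id.fps' holds one message file path per line, which the text above does not spell out. Check your own file before relying on it.

```shell
#!/bin/sh
# Sketch: delete the message files named in a list file such as 'id.fps'.
# ASSUMPTION: the list holds one file path per line; adjust the parsing
# if your 'id.fps' is laid out differently.

# remove_listed LISTFILE [--dry-run]   (hypothetical helper)
remove_listed() {
    list=$1
    mode=${2:-}
    while IFS= read -r msg || [ -n "$msg" ]; do
        # Skip blank lines and comment lines.
        case $msg in
            ''|'#'*) continue ;;
        esac
        # Never touch anything that is not an existing regular file.
        if [ ! -f "$msg" ]; then
            echo "skipping (not a regular file): $msg" >&2
            continue
        fi
        if [ "$mode" = "--dry-run" ]; then
            echo "would remove: $msg"
        else
            rm -- "$msg" && echo "removed: $msg"
        fi
    done < "$list"
}
```

Running `remove_listed id.fps --dry-run` first prints what would be deleted without touching the corpus, which is probably worth doing before the real run.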

...

  • ALL_TRUSTED: once a mass-check completes, it's worth grepping the spam.log for ALL_TRUSTED and checking what mails it hits.
  • NO_RELAYS: a message in the spam corpus which has no relaying Received headers did not traverse the internet, and is therefore probably not spam.

See also 'Corrupt Messages' below for other stuff to clear out.

Here's a command line to grep a log for a rule name, and generate an mbox of the results, then open it in "mutt":

No Format

grep 'ALL_TRUSTED' spam.log > grepped.log
./mboxget < grepped.log > mbox
mutt -f mbox
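A plain `grep` for a rule name also matches longer names that contain it. A slightly stricter variant might look like the sketch below. `hits_for_rule` and `count_hits` are hypothetical helper names, the `-w` option is supported by GNU and BSD grep but is not in POSIX, and treating lines that start with `#` as comments is an assumption about the log.

```shell
#!/bin/sh
# hits_for_rule RULE LOGFILE   (hypothetical helper)
# Print the log lines in which RULE appears as a whole word.
# 'grep -w' treats '_' as a word character, so ALL_TRUSTED does not
# match a longer name such as T_ALL_TRUSTED.
# ASSUMPTION: lines beginning with '#' are comments and can be skipped.
hits_for_rule() {
    grep -v '^#' "$2" | grep -w -- "$1"
}

# count_hits RULE LOGFILE   (hypothetical helper)
# Print the number of matching lines (0 if there are none).
count_hits() {
    hits_for_rule "$1" "$2" | wc -l | tr -d ' '
}
```

For example, `count_hits ALL_TRUSTED spam.log` gives a quick idea of how many messages need reviewing, and `hits_for_rule ALL_TRUSTED spam.log > grepped.log` produces input for the `./mboxget` step.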

Cleaning Out False Negatives

...

  • BAYES_99: once a mass-check completes, it's worth grepping the ham.log for BAYES_99 and checking what mails it hits.
  • any of the other top-listed rules in the HitFrequencies report, especially network tests such as the SURBL rules
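To get just the message paths for a given rule, something like the following sketch may help. `paths_hitting` is a hypothetical helper, and the field layout it relies on (result flag, score, path, comma-separated rule names, with no spaces in the path) is an assumption about the log format that you should verify against your own ham.log.

```shell
#!/bin/sh
# paths_hitting RULE LOGFILE   (hypothetical helper)
# ASSUMPTION: each result line looks like
#   <flag> <score> <path> <RULE,RULE,...> <extras>
# with whitespace-separated fields and no spaces in the path.
paths_hitting() {
    awk -v rule="$1" '
        /^#/ { next }                    # skip comment lines
        {
            n = split($4, tests, ",")    # field 4: comma-separated rule names
            for (i = 1; i <= n; i++)
                if (tests[i] == rule) { print $3; break }
        }
    ' "$2"
}
```

`paths_hitting BAYES_99 ham.log` would then list the ham messages worth a second look, matching the rule name exactly rather than as a substring.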

See also 'Corrupt Messages' below for other stuff to clear out.

Corrupt Messages

Corrupt messages crop up occasionally: some MUAs have a tendency to mess up mail messages or folders, making them unsuitable for use with MassCheck. SpamAssassin includes a few rules that can help identify corrupt messages.

...