Comment: [Original edit by JustinMason] add whitelist_bounce_relays; ignore if you've already started mass-checks

...

Note that it's essential that you mass-check both ham and spam for this run, as otherwise the Bayes rules will be affected.

What To Run

Then run these commands:

  wget http://people.apache.org/~jm/mcsnapshot.tgz
  tar xvfz mcsnapshot.tgz
  cd mcsnapshot
  perl Makefile.PL < /dev/null
  make

  cd masses
  mkdir -p spamassassin
  rm -f spamassassin/*
  echo "bayes_auto_learn 0" > spamassassin/user_prefs
  echo "lock_method flock" >> spamassassin/user_prefs
  echo "bayes_store_module Mail::SpamAssassin::BayesStore::SDBM" >> spamassassin/user_prefs
  echo "use_auto_whitelist 0" >> spamassassin/user_prefs
  echo "whitelist_bounce_relays example.com" >> spamassassin/user_prefs

  nohup ./mass-check --progress --bayes --net -j 4 --restart=400 --learn=35 --reuse \
        --after=1072933200 <targets>
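
The --after value in the command above is a Unix timestamp (seconds since 1970-01-01 UTC). As a minimal sketch, assuming GNU date is available (BSD and macOS date use -r instead of -d @), it can be converted to a readable date, or recomputed for a different cutoff, like this:

```shell
# The --after value used in the mass-check command above.
after=1072933200

# Render it as a UTC date. Assumes GNU date; BSD/macOS date takes -r "$after" instead.
after_utc=$(date -u -d "@$after" '+%Y-%m-%d %H:%M:%S')
echo "--after=$after is $after_utc UTC"

# Going the other way: compute the epoch value for a cutoff of your choosing.
new_after=$(date -u -d '2004-01-01 05:00:00' +%s)
echo "$new_after"
```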

Explanation

<targets> is the list of corpus locations to scan (directories, mboxes, etc.), each written as class:format:location, like
spam:dir:~/Mail/spam. See the comments at the top of "mass-check" for details.
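
To make the shape of a target concrete, here is a small sketch that splits a few targets into their three colon-separated fields. The parse_target helper and the corpus paths are hypothetical, made up for illustration; the comments at the top of "mass-check" remain the authoritative list of accepted formats.

```shell
# Hypothetical helper: split one target into class, format and location.
parse_target() {
  class=${1%%:*}        # text before the first colon, e.g. "ham" or "spam"
  rest=${1#*:}          # everything after the first colon
  format=${rest%%:*}    # e.g. "dir" or "mbox"
  location=${rest#*:}   # the remaining path (may itself contain colons)
  printf '%s %s %s\n' "$class" "$format" "$location"
}

# Hypothetical corpus locations; substitute your own.
for target in \
  "ham:dir:$HOME/Mail/ham" \
  "spam:dir:$HOME/Mail/spam" \
  "spam:mbox:$HOME/Mail/spam.mbox"
do
  parse_target "$target"
done
```

Because both ham and spam must be mass-checked in this run, a target list like the one above would include at least one entry of each class.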

...

SA should be able to infer your network layout in most cases, but if the layout is unusual you may need to specify
trusted_networks and/or internal_networks in the
spamassassin/user_prefs file. A good sign that you need to do so is a complete absence of SPF_PASS results: SPF will not be used if the message passes through one or more trusted relays.
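
As a rough sketch of both halves of that advice, the following appends network settings to the prefs file and counts SPF_PASS hits in a result log. The address ranges are placeholders, the count_rule_hits helper is hypothetical, and the ham.log file name is an assumption about where your mass-check results were written; adjust all three to your setup.

```shell
prefs=spamassassin/user_prefs
mkdir -p spamassassin

# Placeholder ranges: list the relays you trust, then the subset that is internal to your site.
echo "trusted_networks 192.0.2.0/24 10.0.0.0/8" >> "$prefs"
echo "internal_networks 10.0.0.0/8" >> "$prefs"

# Hypothetical helper: count the lines of a result log that mention a rule as a whole word.
count_rule_hits() {
  grep -cw -- "$1" "$2"
}

# Assumption: results are in ham.log in the current directory.
if [ -f ham.log ]; then
  echo "SPF_PASS hits in ham.log: $(count_rule_hits SPF_PASS ham.log)"
fi
```

A count of zero across a large ham corpus would suggest the trusted network settings need another look.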

whitelist_bounce_relays example.com is an optional piece of configuration that highlights bounce messages in your corpora. You probably don't want these in your corpora unless you're certain they're good, non-spam bounces, generated in response to mail you really did send. If you like, insert the name(s) of your genuine MSA relays here to whitelist the "good" bounces in your corpus.
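
For instance, one way to name your own relays is to append a further whitelist_bounce_relays line in the same style as the commands above. The host names below are invented placeholders, not real relays.

```shell
prefs=spamassassin/user_prefs
mkdir -p spamassassin

# Placeholder host names: use the relays your own outgoing mail really passes through.
echo "whitelist_bounce_relays smtp-out.example.org mail.example.net" >> "$prefs"

# Show every bounce-relay line now present in the prefs file.
grep '^whitelist_bounce_relays' "$prefs"
```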

Once That's Done

Once it finishes, check that the results are sane. See CorpusCleaning to remove any result lines that deal with misclassified or corrupt messages. (This step is very important.)

...