...
It is essential that you mass-check both ham and spam for this run; otherwise the Bayes rules will be affected.
What To Run
Then run these commands:
No Format |
---|
wget http://people.apache.org/~jm/mcsnapshot.tgz
tar xvfz mcsnapshot.tgz
cd mcsnapshot
perl Makefile.PL < /dev/null
make
cd masses
mkdir -p spamassassin
rm -f spamassassin/*
echo "bayes_auto_learn 0" > spamassassin/user_prefs
echo "lock_method flock" >> spamassassin/user_prefs
echo "bayes_store_module Mail::SpamAssassin::BayesStore::SDBM" >> spamassassin/user_prefs
echo "use_auto_whitelist 0" >> spamassassin/user_prefs
echo "whitelist_bounce_relays example.com" >> spamassassin/user_prefs
nohup ./mass-check --progress --bayes --net -j 4 --restart=400 --learn=35 --reuse \
--after=1072933200 <targets>
|
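The value passed to --after in the command above is a Unix timestamp (seconds since the epoch); as far as mass-check's usage text goes, it limits the run to mail received after that time. A quick way to check what a given value means is sketched below. It assumes GNU date (the -d option is a GNU extension), and the variable name is only for illustration.

```shell
# Decode the epoch value passed to --after (assumes GNU date).
after=1072933200
date -u -d "@$after" '+%Y-%m-%d %H:%M:%S UTC'    # prints 2004-01-01 05:00:00 UTC

# Going the other way: compute a cutoff of your own, here midnight UTC.
date -u -d '2004-01-01 00:00:00' '+%s'           # prints 1072915200
```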
Explanation
<targets> is the list of directories, mboxes, etc., like spam:dir:~/Mail/spam. See the comments at the top of "mass-check" for details.
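As an illustration, a target list might be assembled as follows. The class:format:location shape is taken from the spam:dir:~/Mail/spam example; the ham class, the mbox format, and every path here are assumptions, so check them against the comments in mass-check before relying on them.

```shell
# Hypothetical corpus locations; adjust to your own layout.
# $HOME is used because ~ is not expanded inside double quotes.
targets="ham:dir:$HOME/Mail/ham spam:dir:$HOME/Mail/spam spam:mbox:$HOME/Mail/old-spam.mbox"

# Print one target per line to eyeball the list before starting the run.
for t in $targets; do
    printf '%s\n' "$t"
done
```

The unquoted $targets can then be passed in place of <targets> on the mass-check command line.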
...
If you have an unusual network layout, you may need to specify trusted_networks and/or internal_networks in the spamassassin/user_prefs file, though SA should be able to infer them in most cases. A good sign that the inference has gone wrong is a complete absence of SPF_PASS results: SPF will not be used if the message passes through one or more trusted relays.
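A minimal sketch of both steps follows. The address ranges are placeholders (reserved documentation ranges standing in for your own relays), count_hits is a hypothetical helper, and the log file names ham.log and spam.log are an assumption about where the run wrote its results.

```shell
# Placeholder ranges; substitute the networks your own relays live on.
prefs=spamassassin/user_prefs
mkdir -p "$(dirname "$prefs")"
echo "trusted_networks 192.0.2.0/24 198.51.100.0/24" >> "$prefs"
echo "internal_networks 192.0.2.0/24" >> "$prefs"

# After the run: count result lines that hit a given rule.
count_hits() {    # usage: count_hits RULE LOGFILE...
    rule=$1; shift
    cat "$@" | grep -c -w -- "$rule"
}
# Example (assumed log names): count_hits SPF_PASS ham.log spam.log
```

A count of 0 across a large corpus would suggest the network settings deserve a second look.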
whitelist_bounce_relays example.com is an (optional) bit of configuration, which will highlight bounce messages in your corpora. You probably don't want these in your corpora, unless you're certain they're good, non-spam bounces generated in response to mail you really did send. Feel free to put the name(s) of your genuine MSA relays in place of example.com, if you like, to whitelist the "good" bounces in your corpus.
Once That's Done
Once it finishes, check that the results are sane. See CorpusCleaning to remove any result lines that deal with misclassified or corrupt messages. (This step is very important.)
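A first-pass sanity check could look like the sketch below. It assumes the results are in ham.log and spam.log, and that each result line holds a flag, the score, the message path, and the rule list as whitespace-separated fields; confirm both against your own logs. The function name and the threshold are illustrative. It only surfaces candidates for review and is no substitute for the CorpusCleaning procedure.

```shell
# Print score and path for result lines scoring at or above a threshold.
# High-scoring lines in the ham log are candidates for misfiled spam.
suspects() {    # usage: suspects THRESHOLD LOGFILE
    awk -v t="$1" '!/^#/ && NF >= 3 && $2 + 0 >= t + 0 { print $2, $3 }' "$2"
}
# Example (assumed log name): suspects 5 ham.log | sort -rn | head -20
```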
...