Comment: [Original edit by JustinMason] update for 3.2.0

Rescore Mass-Check Instructions

(See RescoreDetails310 for the historical 3.1.0 mass-check documentation; see RescoreMassCheck for an overview of the general process in toto.)

Here's the procedure to follow if you wish to submit data for the 3.2.0 rescoring run using MassCheck:

Clean up the corpus of mail you intend to MassCheck (see CorpusCleaning). The 'checking for false positives and false negatives' stage of corpus cleaning can be done after the mass-checks complete, if you like.

Get an rsync account (see RsyncAccounts). If you already submit nightly mass-check results, the account you use for that will work. Otherwise, you can get an account while mass-check is running, since it isn't needed until the end.

It's helpful, but not required, to have some or all of the helper applications installed:

  • the Mail::SPF::Query module
  • the Net::DNS module
  • Razor
  • Pyzor

If you're running nightly mass-checks, please feel free to disable them when running the rescore mass-check runs. Also, please note that the nightly submission accounts will work for rescore submissions as well.

Note that it's essential that you mass-check both ham and spam for this run, as otherwise the Bayes rules will be affected.

...

  wget http://people.apache.org/~jm/devel/Mail-SpamAssassin-3.2.0-pre??.tar.gz
  tar xvfz Mail-SpamAssassin-3.2.0-pre??.tar.gz
  cd Mail-SpamAssassin-3.2.0
  perl Makefile.PL < /dev/null
  make

  cd masses
  mkdir -p spamassassin
  rm -f spamassassin/*
  echo "bayes_auto_learn 0" > spamassassin/user_prefs
  echo "lock_method flock" >> spamassassin/user_prefs
  echo "bayes_store_module Mail::SpamAssassin::BayesStore::SDBM" >> spamassassin/user_prefs
  echo "use_auto_whitelist 0" >> spamassassin/user_prefs

  nohup ./mass-check --progress --bayes --net -j 4 --restart=400 --learn=35 --reuse \
        --after=1072933200 <targets>

...

Once it finishes, check that the results are sane. See CorpusCleaning to remove any result lines that deal with misclassified or corrupt messages. (This step is very important.)
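As a rough first sanity check, the sketch below counts the result lines in ham.log and spam.log. It reports how many ham messages scored as spam and how many spam messages scored as ham, which are the lines to examine when removing misclassified messages per CorpusCleaning. It also fails if either log has no results, since both ham and spam are required for this run. The sketch assumes each result line begins with Y (scored as spam) or . (scored as ham) and ignores everything else, such as # comment lines; verify this against your own logs. The function names count_marker and check_logs are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-submission sanity check for mass-check logs.
# Assumption: each result line starts with "Y" (message scored as spam)
# or "." (scored as ham); any other line (e.g. "#" comments) is ignored.

# count_marker FILE CHAR: print how many lines of FILE begin with CHAR
# (awk compares the first character literally, so "." is not a regex here)
count_marker() {
  awk -v m="$2" 'substr($0, 1, 1) == m { n++ } END { print n + 0 }' "$1"
}

# check_logs HAMLOG SPAMLOG: print counts; return 1 if either log is empty
check_logs() {
  ham_fp=$(count_marker "$1" Y)    # ham scored as spam
  ham_ok=$(count_marker "$1" .)    # ham scored as ham
  spam_ok=$(count_marker "$2" Y)   # spam scored as spam
  spam_fn=$(count_marker "$2" .)   # spam scored as ham
  ham_total=$((ham_fp + ham_ok))
  spam_total=$((spam_ok + spam_fn))
  echo "ham:  $ham_total messages, $ham_fp scored as spam"
  echo "spam: $spam_total messages, $spam_fn scored as ham"
  # Both ham and spam results are needed for the rescoring run.
  if [ "$ham_total" -eq 0 ] || [ "$spam_total" -eq 0 ]; then
    echo "error: ham and spam logs must both contain results" >&2
    return 1
  fi
}

if [ -f ham.log ] && [ -f spam.log ]; then
  check_logs ham.log spam.log
else
  echo "error: ham.log and/or spam.log not found in $(pwd)" >&2
fi
```

Run it from the masses directory after mass-check finishes. It prints an error if either log is missing or contains no results. A high count in either "scored as" figure is a prompt to look closer, not proof that a message is misclassified.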

Then submit your results!

  USER="[whatever your username is]"
  RSYNC_PASSWORD="[whatever your password is]"
  export RSYNC_PASSWORD

  rsync -Pcvuzb ham.log $USER@rsync.spamassassin.org::submit/ham-bayes-net-$USER.log
  rsync -Pcvuzb spam.log $USER@rsync.spamassassin.org::submit/spam-bayes-net-$USER.log

(Note: previously we used -C on those rsync commands. Leave it out: the current host seems to run a version of rsync that cannot handle it, and fails with this error: 'filter rules are too modern for remote rsync. rsync error: syntax or usage error (code 1) at exclude.c(1119)'.)

That's it!

The results for this run need to be in by Tuesday Feb 6th (tentatively). If you're still running then, submit what you have so far and beg for more time. We may push the deadline out a little further anyway, depending on how things go.