Differences between revisions 60 and 61
Revision 60 as of 2012-09-09 13:53:05
Size: 4607
Editor: Darxus
Comment: Internally link trusted_networks stuff, make it sound more important
Revision 61 as of 2012-10-26 17:10:02
Size: 4725
Editor: Darxus
Comment: Update from externally hosted auto-mass-check script to automasscheck-minimal in trunk
Line 16: Line 16:
 1. Download the auto-mass-check script ([[http://git.fedorahosted.org/git/?p=auto-mass-check.git;a=summary|browse repo]]):
  . {{{
  git clone git://git.fedorahosted.org/auto-mass-check.git
}}}
 1. Copy `auto-mass-check/auto-mass-check.sh` to `~/bin/`
 1. Copy `auto-mass-check/auto-mass-check.cf` to `~/.auto-mass-check.cf`
 1. Modify `~/.auto-mass-check.cf` to point at your ham and spam folders. Be sure to configure properly for mbox (mbox) or Maildir (dir) folder formats. Leave the RSYNC options unchanged for now, because you will be running auto-mass-check in test mode at first.
 1. The masschecker is set to run 8 threads by default. Depending on your box's resources this could be too much or too little. Modify your auto-mass-check.sh and change JOBS=8 as needed. NOTE: there are several JOBS=X entries for each masscheck type, and each one needs to be changed.
 1. Set [[TrustPath|TRUSTED_NETWORKS and INTERNAL_NETWORKS]] in ~/.auto-mass-check.cf if necessary. Without this your network tests use the wrong relay.
 1. Download [[http://svn.apache.org/viewvc/spamassassin/trunk/masses/contrib/automasscheck-minimal/automasscheck-minimal.sh?view=co|automasscheck-minimal.sh]] and [[http://svn.apache.org/viewvc/spamassassin/trunk/masses/contrib/automasscheck-minimal/automasscheck-minimal.cf.dist?view=co|automasscheck-minimal.cf.dist]]
 1. Copy `automasscheck-minimal.sh` to `~/bin/`
 1. Copy `automasscheck-minimal.cf.dist` to `~/.automasscheck-minimal.cf`
 1. Modify `~/.automasscheck-minimal.cf` to point at your ham and spam folders. Be sure to configure properly for mbox (mbox) or Maildir (dir) folder formats. Leave the RSYNC options unchanged for now, because you will be running automasscheck in test mode at first.
 1. The masschecker is set to run 8 threads by default. Depending on your box's resources this could be too much or too little. Modify your automasscheck-minimal.sh and change JOBS=8 as needed. NOTE: there are several JOBS=X entries for each masscheck type, and each one needs to be changed.
 1. Set [[TrustPath|TRUSTED_NETWORKS and INTERNAL_NETWORKS]] in ~/.automasscheck-minimal.cf if necessary. Without this your network tests use the wrong relay.
Line 26: Line 23:
 1. Run `auto-mass-check.sh`.
 1. Run `automasscheck-minimal.sh`.
Line 31: Line 28:
  * Repeat auto-mass-check until you are certain both folders are cleaned.
 1. Edit `~/.auto-mass-check.cf` and set RSYNC_USERNAME and RSYNC_PASSWORD with values from step 1.
 1. Run `auto-mass-check.sh`, which will upload your results.
  * Repeat automasscheck until you are certain both folders are cleaned.
 1. Edit `~/.automasscheck-minimal.cf` and set RSYNC_USERNAME and RSYNC_PASSWORD with values from step 1.
 1. Run `automasscheck-minimal.sh`, which will upload your results.
Line 35: Line 32:
 1. If your upload looks good, then you're probably ready to automate nightly checks. Configure auto-mass-check to run as a cron job as your non-root user at or after 9AM UTC. (After weekly-versions.txt / nightly-versions.txt gets updated in rsync.spamassassin.org::corpus . If you run it earlier it will [[https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6753|break things]]. )

([[https://fedorahosted.org/auto-mass-check/|External documentation for auto-mass-check script.]])
 1. If your upload looks good, then you're probably ready to automate nightly checks. Configure automasscheck to run as a cron job as your non-root user at or after 9AM UTC. (After weekly-versions.txt / nightly-versions.txt gets updated in rsync.spamassassin.org::corpus . If you run it earlier it will [[https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6753|break things]]. )
Line 45: Line 40:

[[https://fedorahosted.org/auto-mass-check/|Old auto-mass-check script]]

Nightly Mass-Check Runs

What?

Nightly MassCheck runs are the way people submit data on the effectiveness of current rules on their recent spam and ham. It is used to generate the very rule scores that determine the effectiveness of SpamAssassin (distributed via sa-update), and to evaluate rules via the RuleQaApp. The accuracy of SpamAssassin is directly related to the number of people contributing to nightly MassChecks.

This does not require sending us your email, only logs of which rules hit your emails.

Usually a script is run from cron which automatically downloads the latest development version of SpamAssassin, runs it against your spam and ham, and then uploads a log of the results. The log contains one line per email, listing the SpamAssassin rules that email hit. Your actual email is not uploaded with this method.
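
To make the "one line per email" log idea concrete, here is a minimal sketch that tallies how often each rule appears in such a log. The column layout of the sample lines (flag, score, message path, comma-separated rule names) is an assumption made for illustration, not a specification of the real mass-check format, and `count_rule_hits` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: count how many messages hit each rule.
# ASSUMPTION: the fourth whitespace-separated field of each non-comment line
# is a comma-separated list of rule names. Check your own logs before relying on this.
count_rule_hits() {
    awk '!/^#/ && NF >= 4 {
             n = split($4, rules, ",")
             for (i = 1; i <= n; i++) hits[rules[i]]++
         }
         END { for (r in hits) print hits[r], r }' "$1" |
        sort -k1,1nr -k2,2
}

# Made-up sample data; no real email paths or results.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# sample log, not real data
Y 7 /home/user/spam/1 URIBL_BLACK,HTML_MESSAGE
Y 5 /home/user/spam/2 URIBL_BLACK
. 0 /home/user/spam/3 HTML_MESSAGE,URIBL_BLACK
EOF

count_rule_hits "$sample"
# prints:
# 3 URIBL_BLACK
# 2 HTML_MESSAGE
rm -f "$sample"
```

Only rule names and counts come out of this; the message bodies are never read, which mirrors the point that the uploaded logs do not contain your mail.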

An advantage to participating is that it makes SpamAssassin more accurate on your email.

How?

  1. Send an email to private@spamassassin.apache.org requesting an rsync account for nightly mass-checks. It is helpful if you include a few sentences about your background and expertise for vetting purposes. NOTE: New masscheck contributors have been accepted since about 2012-08-09.

  2. When your request is processed, you'll be notified and added to the RuleQA Mailing List for SpamAssassin.

  3. Ensure SpamAssassin and its plugins are fully installed.

  4. Download automasscheck-minimal.sh and automasscheck-minimal.cf.dist

  5. Copy automasscheck-minimal.sh to ~/bin/

  6. Copy automasscheck-minimal.cf.dist to ~/.automasscheck-minimal.cf

  7. Modify ~/.automasscheck-minimal.cf to point at your ham and spam folders. Be sure to configure properly for mbox (mbox) or Maildir (dir) folder formats. Leave the RSYNC options unchanged for now, because you will be running automasscheck in test mode at first.

  8. The masschecker is set to run 8 threads by default. Depending on your box's resources this could be too much or too little. Modify your automasscheck-minimal.sh and change JOBS=8 as needed. NOTE: there are several JOBS=X entries for each masscheck type, and each one needs to be changed.
  9. Set TRUSTED_NETWORKS and INTERNAL_NETWORKS in ~/.automasscheck-minimal.cf if necessary. Without this your network tests use the wrong relay.

  10. Ensure there is no router/firewall blocking port 873 (used by rsync protocol)
  11. Run automasscheck-minimal.sh.

    • Look in ~/masscheckwork/nightly_mass_check/masses/ for ham-*.log and spam-*.log files. (Or weekly_mass_check on Saturday.)

    • Are the filenames good? They should be named something like ham-username.log or ham-net-username.log.

    • Read CorpusCleaning and HandClassifiedCorpora for guidelines on how to identify ham in your spam folder and spam in your ham folder, and on which messages should simply be deleted.

    • If you move/delete messages, do not forget to "Compact Folder" to be sure they are actually gone.
    • Repeat automasscheck until you are certain both folders are cleaned.
  12. Edit ~/.automasscheck-minimal.cf and set RSYNC_USERNAME and RSYNC_PASSWORD with values from step 1.

  13. Run automasscheck-minimal.sh, which will upload your results.

  14. Ask a more experienced participant (probably the person who recruited you) to check your results on the server. They can see the uploaded log files by running a command like rsync --old-d username@rsync.spamassassin.org::corpus/. You can also verify that your corpora show up on http://ruleqa.spamassassin.org/ - the green box near the top shows all included usernames.

  15. If your upload looks good, then you're probably ready to automate nightly checks. Configure automasscheck to run as a cron job as your non-root user at or after 9AM UTC. (After weekly-versions.txt / nightly-versions.txt gets updated in rsync.spamassassin.org::corpus . If you run it earlier it will break things. )
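
As a sketch of the scheduling advice in the last step, the wrapper below refuses to start a run before 09:00 UTC. It is not part of automasscheck-minimal; the function names and the wrapper file name mentioned in the comments are made up for illustration, and it assumes the script was copied to ~/bin/ as described above.

```shell
#!/bin/sh
# Hypothetical wrapper, not shipped with SpamAssassin.

# Succeeds if the given two-digit UTC hour (e.g. "09") is 9 or later.
utc_hour_ok() {
    hour=${1#0}          # strip one leading zero so "08" becomes "8"
    [ "$hour" -ge 9 ]
}

# Starts the mass-check script only when the current UTC hour passes the check.
run_if_late_enough() {
    if utc_hour_ok "$(date -u +%H)"; then
        "$HOME/bin/automasscheck-minimal.sh"
    else
        echo "Before 09:00 UTC, skipping mass-check run" >&2
        return 1
    fi
}

# Example crontab entry for your non-root user. ASSUMPTIONS: the system clock
# is UTC, and this file is saved as ~/bin/masscheck-cron.sh (a made-up name)
# with a final line that calls run_if_late_enough.
# 15 9 * * * $HOME/bin/masscheck-cron.sh
```

If the machine's clock is not UTC, the cron time has to be shifted by hand; the guard then acts as a safety net against a miscalculated offset.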

Alternative Methods

The easiest of all methods is to upload your corpora and let us process them for you: UploadedCorpora

The corpus-nightly script is a less maintained alternative to the auto-mass-check script: CorpusNightlyScript

Or you can do it manually: ManualNightlyMassCheck

Old auto-mass-check script

NightlyMassCheck (last edited 2012-10-26 17:10:02 by Darxus)