Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: missing edit-log entry for this revision

...

A good rule has a very extreme S/O (near as possible to 1.0 or 0.0) and a high percentage of hits in the correct category. In other words,
RCVD_IN_OPM_HTTP is a very good rule, because it hits 5.2028% of all spam mails without hitting any ham at all (no false positives).

"freqs" is the best way to determine a rule's usefulness, since it immediately shows up any false-positive issues. The development team run a nightly mass-check and freqs report from the rules in CVS to test them, with several people scanning their corpora, and the results are put up at: http://www.pathname.com/~corpus/ .

Measuring Rule Overlap

There's one more tool to determine how much 2 rules overlap with each other – "overlap". This is occasionally useful if you suspect that two rules are redundant, checking the same data or hitting exactly the same messages as each other. Take a look at the comments at the top of the "masses/overlap" script for details on how to run this against one or more "mass-check" output log files.