Like a Bayesian learning system, SpamAssassin's GeneticAlgorithm requires a corpus of hand-classified mail, both spam and ham. Our guidelines are (quoting and expanding on "masses/CORPUS_POLICY"):

After you run MassCheck, follow the instructions in CorpusCleaning to verify that the top-scoring messages in your ham corpus are not spam that slipped in by accident.
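As a starting point for that check, it can help to list the highest-scoring entries in your ham log and inspect each one by hand. The sketch below is hypothetical and not part of SpamAssassin: the function name `top_scorers` is made up, and it assumes each mass-check log line has the form `<Y or .> <score> <path> <rules> ...`, with `#` comment lines. Check the format against your own log before relying on it.

```python
def top_scorers(log_lines, n=10):
    """Return the n highest-scoring (score, path) pairs from mass-check log lines.

    Assumption (hypothetical format): each non-comment line looks like
    '<Y or .> <score> <path> <comma-separated rules> ...'.
    """
    entries = []
    for line in log_lines:
        line = line.strip()
        # Skip blank lines and '#' comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            score = float(fields[1])
        except ValueError:
            # Not a result line in the assumed format; ignore it.
            continue
        entries.append((score, fields[2]))
    # Highest scores first: these hams are the likeliest misfiled spam.
    entries.sort(key=lambda e: e[0], reverse=True)
    return entries[:n]
```

For example, `top_scorers(open("ham.log"), n=20)` returns the 20 ham messages that scored highest, assuming your ham results are in a file named `ham.log`. Each listed path can then be opened and re-classified if it turns out to be spam.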

(Aside: yes, the plural is "corpora". See PluralOfCorpus.)

Minor things that are nice to have

HandClassifiedCorpora (last edited 2009-09-20 23:16:25 by localhost)