Like a Bayesian learning system, SpamAssassin's GeneticAlgorithm requires a corpus of hand-classified mail. Our guidelines are (quoting and expanding on "masses/CORPUS_POLICY"):

After running "mass-check" on a corpus (see MassCheck), follow the instructions in "CORPUS_SUBMIT" to verify that the top-scoring messages are not spam that accidentally got through.
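As a rough illustration of that check, the sketch below lists the highest-scoring entries from a mass-check log so they can be reviewed by hand. It is not part of SpamAssassin. The helper name "top_scorers" is hypothetical, and the log layout is an assumption: each result line is taken to be whitespace-separated with the score in the second field and the message path in the third, and lines starting with "#" are treated as comments. Check your own log before relying on it, and follow "CORPUS_SUBMIT" for the actual procedure.

```python
def top_scorers(log_lines, limit=10):
    """Return the `limit` highest-scoring (score, path) pairs.

    ASSUMPTION: each result line looks like "<flag> <score> <path> <tests...>".
    Lines that do not fit that shape are skipped rather than guessed at.
    """
    scored = []
    for line in log_lines:
        line = line.strip()
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            score = float(fields[1])
        except ValueError:
            continue
        scored.append((score, fields[2]))
    # Highest scores first; these are the messages worth re-reading by hand.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    # Hypothetical sample lines, not real mass-check output.
    sample = [
        "# example header",
        ". 1 /ham/a RULE_X",
        "Y 12 /ham/b RULE_Y,RULE_Z",
        ". 0 /ham/c",
    ]
    for score, path in top_scorers(sample, limit=2):
        print(score, path)
```

Note that this simple split would mishandle paths containing spaces; it is only meant to show the idea of sorting by score and reviewing the top of the list.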

(Aside: yes, it's "corpora". See PluralOfCorpus)