Like a Bayesian learning system, SpamAssassin's GeneticAlgorithm requires a corpus of hand-classified mail. Our guidelines are (quoting "masses/CORPUS_POLICY"):

Once you have run "mass-check" on a corpus (MassCheck), see "CORPUS_SUBMIT" for instructions on verifying that the top scorers are not spam that accidentally got through.
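
As a rough illustration of what "checking the top scorers" can look like, the sketch below lists the highest-scoring entries from a mass-check log so they can be reviewed by hand. It assumes each log line has the form "flag score path rules" separated by whitespace, with "#" marking comment lines; that layout, the function name top_scorers, and the sample data are assumptions made for this example, so check them against your own logs and against "CORPUS_SUBMIT" before relying on it.

```python
def top_scorers(log_lines, n=10):
    """Return the n highest-scoring (score, path) pairs from log lines.

    Assumed line layout (not taken from the SpamAssassin docs):
        <flag> <score> <path> <comma-separated rules>
    Lines starting with '#' and lines that do not parse are skipped.
    Paths containing whitespace are not handled by this sketch.
    """
    scored = []
    for line in log_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue
        try:
            score = float(fields[1])
        except ValueError:
            continue
        scored.append((score, fields[2]))
    # Highest scores first: these are the messages to inspect by hand.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


# Hypothetical sample lines standing in for a ham log.
sample_ham_log = [
    "# hypothetical sample data",
    ". 0 /corpus/ham/1 RULE_A",
    "Y 12 /corpus/ham/2 RULE_B,RULE_C",
    ". 3 /corpus/ham/3 RULE_A,RULE_B",
]

for score, path in top_scorers(sample_ham_log, n=2):
    print(score, path)
```

Running this against a ham log puts the messages most likely to be misfiled spam at the top of the list; each one still needs to be opened and checked by a person.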