SoughtRules

The "sought" ruleset

Our spamtrap network collects hundreds of megabytes of spam per day. Wouldn't it be great if there were a way to feed that directly into a script to automatically extract rules?

This is now possible, and the result is the "sought.cf" ruleset -- an automatically generated ruleset which seeks good rules directly from the SpamAssassin spamtraps, updated every 4 hours.

Here are instructions on how to use it.

Gory Details

If you're curious, here is a technical explanation of the algorithm used, and here is an examination of the rules' efficiency against our test corpora.

last edited 2008-01-29 10:17:18 by JustinMason