Differences between revisions 2 and 3
Revision 2 as of 2009-09-20 23:16:57
Size: 764
Editor: localhost
Comment: converted to 1.6 markup
Revision 3 as of 2016-12-14 14:12:47
Size: 825
Editor: JustinMason
Comment:
Line 11: Line 11:
[[http://taint.org/2007/08/15/004348a.html|Here are instructions on how to use it]]. Update: this is no longer active, and should not be used.
Line 16: Line 16:
[[http://taint.org/2007/08/15/004348a.html|Here are instructions on how it was used]].

The "sought" ruleset

Our spamtrap network collects several hundred megabytes of spam per day. Wouldn't it be great if there were a way to feed that directly into a script that automatically extracts rules?

This is now possible, and the result is the "sought.cf" ruleset: an automatically generated ruleset that seeks good rules directly from the SpamAssassin spamtraps and is updated every 4 hours.

Update: this is no longer active, and should not be used.

Gory Details

If you're curious, here is a technical explanation of the algorithm used, and here is an examination of the rules' efficiency against our test corpora. Here are instructions on how it was used.

SoughtRules (last edited 2016-12-14 14:12:47 by JustinMason)