Revision 2 as of 2009-09-20 23:16:57 (converted to 1.6 markup); earlier revision 1 as of 2008-01-29 10:17:18 (page describing the sought.cf ruleset).
The "sought" ruleset
Our spamtrap network collects several hundred megabytes of spam per day. Wouldn't it be great if we could feed that directly into a script that automatically extracts rules?
This is now possible, and the result is the "sought.cf" ruleset: an automatically generated ruleset that seeks good rules directly in the SpamAssassin spamtraps and is updated every 4 hours.
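To make the idea of "extracting rules from spam" concrete, here is a minimal Python sketch under deliberately simple assumptions: split each message into word trigrams, keep the trigrams that recur in several spamtrap messages but never appear in known ham, and print them as SpamAssassin-style body rules. This is a hypothetical illustration only, not the algorithm sought.cf actually uses (the technical explanation under Gory Details covers that). The function names, the min_hits threshold and the __SKETCH_ rule-name prefix are all invented for this example.

```python
import re
from collections import Counter


def ngrams(text, n=3):
    """Return the set of lower-cased word n-grams in text."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def seek_rules(spam, ham, n=3, min_hits=2):
    """Hypothetical rule seeker: keep n-grams found in at least min_hits
    spam messages and in no ham message at all."""
    spam_counts = Counter()
    for msg in spam:
        # ngrams() returns a set, so each message counts once per n-gram
        spam_counts.update(ngrams(msg, n))
    ham_grams = set()
    for msg in ham:
        ham_grams |= ngrams(msg, n)
    return sorted(
        g for g, c in spam_counts.items() if c >= min_hits and g not in ham_grams
    )


def to_cf(phrases, prefix="__SKETCH_"):
    """Render phrases as SpamAssassin-style body rules (hypothetical naming).
    Words are joined with \\s+ so differing whitespace still matches."""
    lines = []
    for i, phrase in enumerate(phrases):
        pattern = r"\s+".join(re.escape(w) for w in phrase.split())
        lines.append(f"body {prefix}{i} /{pattern}/i")
    return "\n".join(lines)


if __name__ == "__main__":
    spam = ["cheap meds online now", "buy cheap meds online today"]
    ham = ["meeting notes online now"]
    print(to_cf(seek_rules(spam, ham)))
```

Requiring zero ham hits is the crudest possible safety check; a real system would also need a large, representative ham corpus and some way of retiring rules as spam campaigns change.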
Instructions on how to use it are here: http://taint.org/2007/08/15/004348a.html
Gory Details
If you're curious, here is a technical explanation of the algorithm used (http://taint.org/2007/03/05/134447a.html), and here is an examination of the rules' efficiency against our test corpora (http://taint.org/2007/08/04/200125a.html).
