Writing basic rules

Body rules

For our first rule, let's start with the simplest type of rule, the basic "body" rule. These rules search the body of the message with a regular expression and, if it matches, the corresponding score is assigned.

Body rules also include the Subject as the first line of the body content. See DumpTextPlugin.

Let's look at a really basic fictitious rule:

No Format
body LOCAL_DEMONSTRATION_RULE /test/
score LOCAL_DEMONSTRATION_RULE 0.1
describe LOCAL_DEMONSTRATION_RULE This is a simple test rule

This rule does a simple case-sensitive search of the body of the email for the string "test" and adds 0.1 to the score of the email if it finds it. Now, this rule is pretty simple as rules go. It will match "test" but also "testing" and "attest". The describe statement contains the text which will be placed into the verbose report, if verbose reports are used (this is the default setting for the body in SpamAssassin version 2.5x and upwards).
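To see why a bare /test/ pattern is so broad, here is a minimal sketch in Python. It is not SpamAssassin itself: Python's re module stands in for the Perl regular expression engine, which is an assumption that holds for a plain literal like this one. The sample strings are made up for illustration.

```python
import re

# Stand-in for the /test/ pattern in the rule above (case-sensitive, unanchored).
pattern = re.compile(r"test")

# Hypothetical body lines, chosen only to illustrate the matching behaviour.
samples = [
    "this is a test",
    "testing 1 2 3",
    "I can attest to that",
    "TEST in capitals",
]

# Map each sample to whether the pattern is found anywhere inside it.
matches = {text: bool(pattern.search(text)) for text in samples}

for text, hit in matches.items():
    print(f"{hit!s:5}  {text}")
```

The first three samples hit because the search is a plain substring match; the last one does not because the search is case-sensitive.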

In regular expressions, \b marks a word boundary: a position where a word character (alphanumeric or underscore) meets anything that isn't one, or the start or end of the text. Our rule above can be made to not match "testing" or "attest" like so:

...
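The word-boundary behaviour described above can be checked outside SpamAssassin with a short Python sketch. Python's \b is assumed here to behave like Perl's for these ASCII samples, and the sample strings are hypothetical.

```python
import re

# "test" surrounded by word boundaries on both sides.
bounded = re.compile(r"\btest\b")

# Hypothetical samples; the hyphen and underscore cases show what counts as a boundary.
samples = ["this is a test", "testing", "attest", "test-drive", "test_case"]

results = {text: bool(bounded.search(text)) for text in samples}

for text, hit in results.items():
    print(f"{hit!s:5}  {text}")
```

Note that "test-drive" still matches, because a hyphen is not a word character, while "test_case" does not, because an underscore is.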

http://www.english.uga.edu/humcomp/perl/regex2a.html
http://www.perldoc.com/perl5.6/pod/perlre.html
http://www.troubleshooters.com/codecorn/littperl/perlreg.htm
http://directory.google.com/Top/Computers/Programming/Languages/Regular_Expressions/Perl/

You could use your Linux box and its Perl documentation. If you have perl-doc installed, you can type at a Linux shell prompt:

perldoc perlretut

for a good "from scratch" tutorial, or

perldoc perlre

for users who already have some regexp experience.

Recommended book to learn the Perl programming language:

Learning Perl, 4th Edition

By brian d foy, Tom Phoenix, Randal L. Schwartz

Publisher: O'Reilly

Header rules

Now let's move on to header rules. Header rules let you check a message header for a string. Most commonly these rules check the Subject, From, or To, but they can be written to check any message header, including non-standard ones. Let's pick up our "test" rule and change it into one that checks the subject line.
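As a rough illustration of what "checking a header for a string" means, the following Python sketch parses a message and searches a single named header. It is only an approximation under stated assumptions: header_matches and the sample message are hypothetical, and SpamAssassin's own header handling (decoding, repeated headers, special pseudo-headers) is not modelled.

```python
import re
from email import message_from_string

# Hypothetical message used only for this illustration.
raw = (
    "From: someone@example.com\n"
    "To: you@example.org\n"
    "Subject: This is a test message\n"
    "\n"
    "Body text that is not inspected here.\n"
)


def header_matches(raw_message, header_name, pattern):
    """Return True if the named header exists and the regex is found in it."""
    msg = message_from_string(raw_message)
    value = msg.get(header_name, "")  # missing headers are treated as empty
    return re.search(pattern, value) is not None


print(header_matches(raw, "Subject", r"test"))   # searches only the Subject line
print(header_matches(raw, "From", r"test"))      # a different header, same pattern
print(header_matches(raw, "X-Custom", r"test"))  # non-standard header, absent here
```

Only the first call finds a match, since the word "test" appears in the Subject but in neither the From header nor the absent X-Custom header.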

...

1. Rules with a score set to 0 are not evaluated at all.
2. Rules with no score statement will be scored at 1.0, unless 3 or 4 is true.
3. Rules starting with a double underscore are evaluated with no score, and are intended for use in meta rules where you don't want the sub-rules to have a score.
4. Although intended for the SpamAssassin development effort, any rule starting with T_ will be treated as a "test" rule and will be run with a score of 0.01 (nearly 0). This can be handy when testing rules so you don't have to create score lines for them if you think you're not going to keep them.
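The four scoring rules above can be summarised as a small decision function. This is a sketch of the rules as stated here, not SpamAssassin's actual code. The function name effective_score is hypothetical, and the order in which overlapping cases are resolved (for example an explicit score on a T_ rule) is an assumption the text does not settle.

```python
def effective_score(rule_name, explicit_score=None):
    """Return (evaluated, score) for a rule, following the four points above.

    score is None when the rule runs without contributing a score.
    The precedence between overlapping cases is an assumption of this sketch.
    """
    if explicit_score == 0:
        return (False, None)            # a score of 0 disables the rule
    if rule_name.startswith("__"):
        return (True, None)             # sub-rule for meta rules: runs unscored
    if explicit_score is not None:
        return (True, explicit_score)   # an explicit score line wins (assumed)
    if rule_name.startswith("T_"):
        return (True, 0.01)             # "test" rule default
    return (True, 1.0)                  # default when no score line exists


for name, score in [
    ("LOCAL_DEMONSTRATION_RULE", 0.1),
    ("LOCAL_DEMONSTRATION_RULE", None),
    ("LOCAL_DEMONSTRATION_RULE", 0),
    ("__LOCAL_SUBRULE", None),
    ("T_LOCAL_TRIAL", None),
]:
    print(name, score, "->", effective_score(name, score))
```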

Last in this section, I'll leave you with a word about choosing scores. I'd suggest starting off with a very low score that won't impact messages very much, like 0.1. Watch your rule and make sure it fires when you want and isn't firing when you don't want. Then start increasing the score to make it have more effect, but try not to go overboard. You should be very reluctant to have a custom rule with a score over 1.0 unless you're sure it's not going to hit any nonspam messages. Also keep in mind that you can write rules to only match on nonspam messages and give them negative scores to try to correct false-positive problems. Strong negative scores should also be treated with a bit of caution, but aren't quite as serious. Generally, false positives can cause problems as valuable mail might get skipped over, but false negatives are a minor nuisance, so you can be a bit more liberal with negative scores.

...

The first parameter applies when the Bayesian classifier and network tests are not in use
The second parameter applies when the Bayesian classifier is not in use, but the network tests are
The third parameter applies when the Bayesian classifier is in use, but network tests are not
The fourth parameter applies when the Bayesian classifier and network tests are both in use

Example:

No Format
score LOCAL_DEMONSTRATION_ALL 0.1 0.3 0.3 0.1
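The way the four parameters map onto the Bayes and network-test settings can be sketched as a simple lookup. The function pick_score and the sample values are hypothetical; the sketch only encodes the ordering listed above, plus the assumption that a single-value score line applies in every configuration.

```python
def pick_score(scores, bayes_enabled, net_enabled):
    """Select the applicable value from a one- or four-value score line."""
    if len(scores) == 1:
        return scores[0]  # a single value is assumed to apply in every case
    if len(scores) != 4:
        raise ValueError("expected one or four score values")
    # Order: (no Bayes, no net), (no Bayes, net), (Bayes, no net), (Bayes, net)
    index = 2 * int(bayes_enabled) + int(net_enabled)
    return scores[index]


# Hypothetical values, distinct so that the selection is visible.
demo_scores = (0.1, 0.2, 0.3, 0.4)

for bayes in (False, True):
    for net in (False, True):
        print(f"bayes={bayes!s:5} net={net!s:5} -> {pick_score(demo_scores, bayes, net)}")
```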

...

No Format
uri LOCAL_URI_EXAMPLE /www\.example\.com\/OrderViagra\//
score LOCAL_URI_EXAMPLE 0.1
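To get a feel for what the pattern in this uri rule accepts, here is a small Python sketch. It assumes the URIs have already been extracted from the message (which SpamAssassin does itself) and uses Python's re module as a stand-in for Perl; the list of URLs is invented for illustration.

```python
import re

# Same pattern as the uri rule above; "/" needs no escaping in a Python string.
uri_pattern = re.compile(r"www\.example\.com/OrderViagra/")

# Hypothetical URIs, as if already pulled out of a message.
uris = [
    "http://www.example.com/OrderViagra/",
    "http://www.example.com/OrderViagra/index.html",
    "http://www.example.com/orderviagra/",
    "http://wwwxexample.com/OrderViagra/",
]

uri_hits = {uri: bool(uri_pattern.search(uri)) for uri in uris}

for uri, hit in uri_hits.items():
    print(f"{hit!s:5}  {uri}")
```

The lowercase variant is missed because the match is case-sensitive, and the last URI is missed because the escaped dots match only literal dots.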

...

Finding more information and examples

See more advanced description and examples: WritingRulesAdvanced

Perhaps the rule you want has already been written: check out the collection of custom rules at SpamAssassinRules.

...

You can also do some thorough testing of your rules by building your own corpus of spam and nonspam, or downloading a public one, and testing your rules against that corpus using the tools in the masses/ subdirectory of the SpamAssassin distribution. I'm not going to go into great detail here as it's a bit on the advanced side, but the basic process is to first use mass-check to generate spam and nonspam logs, then use hit-frequencies to generate statistics on a per-rule basis.

The public corpus is at http://spamassassin.org/publiccorpus/

Automatic rule generation

A script apparently associated with the SoughtRules automatic score generation can be downloaded from trunk:

No Format
svn checkout http://svn.apache.org/repos/asf/spamassassin/trunk
cd trunk/masses/rule-dev
./seek-phrases-in-corpus ham:dir:~/Maildir/ spam:dir:~/Maildir/.bad.spam-missed/

The script contains instructions. You tell it a directory or file containing spam, and a directory or file containing non-spam, and it'll come up with rules that match the spam but not the non-spam.

If you have run MassCheck, you can generate rules for the set of spams that were below the default threshold by doing something like (you'll probably need to change the paths to corpora):

No Format


cd trunk/masses
# keep only log lines whose second field (the score) is below 5
awk '$2 < 5' spam-*.log > missed_spam.ids
./mboxget < missed_spam.ids > missed_spam.mbox
cd rule-dev
./seek-phrases-in-corpus ham:dir:$HOME/masscheckwork/ham/ spam:detect:$HOME/masscheckwork/nightly_mass_check/masses/missed_spam.mbox > seek.out
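The awk step in the commands above keeps every log line whose second whitespace-separated field is below 5. A minimal Python equivalent is sketched here for readers who prefer to script that step. The function below_threshold and the sample lines are hypothetical, and the only thing assumed about the log format is that the second field holds the message score. Unlike awk, this sketch skips lines whose second field is not a number.

```python
def below_threshold(lines, threshold=5.0):
    """Keep lines whose second whitespace-separated field is a number below threshold."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # too short to carry a score
        try:
            score = float(fields[1])
        except ValueError:
            continue  # non-numeric second field, e.g. a comment line
        if score < threshold:
            kept.append(line)
    return kept


# Hypothetical log lines; only the position of the score matters here.
sample_log = [
    "Y 12 /corpus/spam/1 RULE_A",
    ". 3 /corpus/spam/2 RULE_B",
    "# comment line",
    ". 4.9 /corpus/spam/3 RULE_C",
]

missed = below_threshold(sample_log)
for line in missed:
    print(line)
```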

Thanks

Thanks to Daniel Quinlan, Michael Moncur, Randy Diffenderfer, and many other members of the SpamAssassin development team and SpamAssassin users for their patient feedback and suggestions.

...

Once you've written and tested a few rules, consider sharing them with us at SpamAssassinRules.
