Comment: link edit

The Auto-

...

Introduction

The auto-whitelist, or AWL, tracks scores for your regular correspondents in a small on-disk database. In 3.0, it is enabled by default and turned off with the configuration setting use_auto_whitelist 0. In earlier versions, it is enabled using the -a flag to spamassassin or spamd.

The AWL is actually a very simple system. In short, the AWL is a score averaging system. It keeps track of the historical average of a sender, and pushes any subsequent mail towards that average.

So if someone that never sent you mail before sends you a mail that scores 20, and then sends you a second mail that would score 2.0 without the AWL, the AWL will push the score up to 11 on the second mail. This is auto blacklisting, based on their past history of spam.

If that same person sent you a mail that scored 0, and then later sent one that scored 7, the AWL would push the score down to 3.5. This is auto-whitelisting based on past history of nonspam.

WhiteList

In March 2014 (rev 1579980), SpamAssassin introduced the TxRep plugin. The new plugin enhances the functionality of AWL, and works around some of its shortcomings.

Introduction

Tracks scores from messages previously received and adjusts the message score, either by boosting messages from senders who send ham or penalizing senders who have sent spam previously. This not only treats some senders as if they were whitelisted but also treats spammers as if they were blacklisted. Each message from a particular sender adjusts the historical total score, so a sender treated as a spammer can lose that status by sending non-spam messages. Conversely, senders who are considered non-spammers can come to be treated as spammers if they send messages which appear to be spam.

A sender is identified using both the address they sent with and the two most significant octets of their IP address (ex: mailer-daemon@club-internet.fr 194.158), so spam claiming to be From you with forged headers will fail to get through.
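The sender identification described above can be sketched in a few lines. This is only an illustration: the function name `sender_key`, the space-separated key layout (copied from the example in the text), and the sample IP address are assumptions, not the plugin's actual storage format, and only IPv4 addresses are handled.

```python
def sender_key(from_address: str, ipv4: str) -> str:
    """Build a hypothetical AWL-style key from an address and an IPv4 address.

    Only the two most significant octets of the IP are kept, so mail from
    the same address sent from nearby hosts shares one history entry.
    """
    octets = ipv4.split(".")
    if len(octets) != 4:
        raise ValueError(f"not a dotted-quad IPv4 address: {ipv4!r}")
    # Keep the first two octets, e.g. "194.158.0.1" -> "194.158"
    prefix = ".".join(octets[:2])
    return f"{from_address} {prefix}"


# The full IP below is made up for illustration.
print(sender_key("mailer-daemon@club-internet.fr", "194.158.0.1"))
```

Because the key includes part of the IP address, a forged message using your address from an unrelated network would map to a different entry and would not inherit your history.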

But the "auto whitelist" isn't really a whitelist per se. It does however have a "learning white/blacklist" type behavior as a result of its averaging.

Sometimes it may appear that the AWL is assigning scores the wrong way; read AwlWrongWay for details.

At its core, it is a simple score averaging system. It keeps track of the historical average of a sender, and pushes any subsequent mail towards that average.

How Does It Work?

The algorithm works using a local database of entries. Each entry has a key formed by the From: address of the mail and the IP address it originated at. It contains a TOTAL score of messages and a COUNT of messages. The MEAN score is TOTAL/COUNT. The current algorithm works as follows:

  1. Compute the SCORE of the message without AWL (auto-whitelist) from RULES
  2. Compute AWL DELTA as (MEAN-SCORE)*auto_whitelist_factor
  3. Increment historical TOTAL by SCORE
  4. Increment historical COUNT by one
  5. Set the final score of the message to SCORE+DELTA
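The five steps above can be sketched as a small in-memory model. This is not the plugin's code: the class names `Entry` and `AwlSketch`, the dictionary standing in for the on-disk database, and the key string are hypothetical. The factor of 0.5 is the default mentioned in the discussion on this page. Treating a sender with no history as having a DELTA of zero is an assumption, since the steps do not say what happens when COUNT is 0.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    total: float = 0.0  # TOTAL: sum of pre-AWL scores seen so far
    count: int = 0      # COUNT: number of messages seen so far


@dataclass
class AwlSketch:
    factor: float = 0.5  # stands in for auto_whitelist_factor
    entries: dict = field(default_factory=dict)  # stands in for the database

    def adjust(self, key: str, score: float) -> float:
        """Return the final score for a message whose pre-AWL SCORE is `score`."""
        entry = self.entries.setdefault(key, Entry())
        delta = 0.0  # assumption: no history means no adjustment
        if entry.count > 0:
            mean = entry.total / entry.count      # MEAN = TOTAL/COUNT
            delta = (mean - score) * self.factor  # step 2
        entry.total += score                      # step 3
        entry.count += 1                          # step 4
        return score + delta                      # step 5


awl = AwlSketch()
key = "sender@example.com 194.158"  # hypothetical key
print(awl.adjust(key, 20.0))  # first mail, no history: 20.0
print(awl.adjust(key, 2.0))   # pulled towards the mean of 20: 11.0
```

Note that the history is updated with the pre-AWL SCORE, not the adjusted score, so the adjustment does not feed back into the sender's average.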

auto_whitelist_factor can be adjusted in the configuration to vary the impact of the MEAN, and you may find this useful when starting off. The contents of the database can be examined using the program 'tools/check_whitelist'.

If you get the following error:


 $ /usr/share/doc/spamassassin-tools-3.0.0/tools/check_whitelist
Cannot open file /home/username/.spamassassin/auto-whitelist:

It seems that the check_whitelist program may be trying to open an older-format whitelist file. The check_whitelist program needs to be updated to use the new format (Berkeley DB) whitelist and not the older format (pag/dir database).

Here is the solution:

 TODO: ADD THE SOLUTION 


For example:
The first mail received from a sender scores 20 based on RULES. When they send you a second message that scores 2.0 based on RULES, the MEAN of 20 (20/1) pulls the score up to 11 on the second mail: 2.0 + (20 - 2.0) * 0.5, using the default auto_whitelist_factor of 0.5. This is auto blacklisting, based on their past history of spam.

If that same person had instead sent a mail that scored 0, and then later sent one that scored 7, the AWL would push the score down to 3.5. This is auto-whitelisting based on past history of nonspam.

How do I whitelist/blacklist someone?

Not with this! This is an automatic heuristic system. You want to use the manual whitelist/blacklist. See ManualWhitelist.

Previous version implementation

  • In 3.3, the plugin is not loaded by default.
  • In 3.1, it is a plugin, loaded by default in the file v310.pre.
  • In 3.0, it is enabled by default and turned off with the configuration setting use_auto_whitelist 0.
  • In earlier versions, it is enabled using the -a flag to spamassassin or spamd.

Utility

The auto-whitelist database (when using the DB Format backend) can be examined and pruned using the program 'sa-awl'. NOTE: This was incorrectly written as sa-awlUtil in previous documentation.

...

An enhanced version of that utility is available at

http://www.real-world-systems.com/mail/sa-heatu.html http://www.Real-World-Systems.com/mail/sa-heatu.tar

This tar file contains only the perl source and a small readme file with some usage examples.

Spam Assassin - *H*euristic *E*mail *A*ddress *T*racker Utility
Includes the ability to remove entries for a single email address.
This is useful when a spammer sends one or more ham messages before sending spam.

A newer version is available at http://truxoft.com/resources/sa-heatu.v4.02.tar.gz Besides other enhancements, it offers the possibility to dump, expire, or delete records by multiple criteria, including regexes. There is no online document describing all the new features, but the command line help and the integrated changelog give detailed information about them (sa-heatu -h and sa-heatu --changelog, respectively).

...

Discussion

DGerman says: " Notice that I have avoided the use of the string autowhiteList where possible. Even if "we were there first", this is more than whitelisting."

TODO: verify example calculation, verify auto_whitelist_factor and show default

Most other products use an AutoWhitelist as automation of a manual whitelist, which is created and/or maintained by monitoring outgoing email to add entries to a whitelist without a need for human intervention. In essence, rather than adjusting a sender's average score, as in SpamAssassin, most other products literally automate the addition of names to the manual whitelist. Because autowhitelist-ing is an industry standard term for automation of a whitelist, this feature in SpamAssassin should be re-named to better align with what it actually does, such as maybe AutoReScore? In essence, be aware the term AutoWhitelist means something else on most (if not all) other anti-spam products. – JDR

SpamAssassin has been using the term "autowhitelist" for this feature since about 2001 – I think we were there first (wink) – jmason

From http://wiki.apache.org/spamassassin/ManualWhitelist "the name is a misnomer. The AutoWhitelist is designed as an automatic score averaging system." Maybe score averaging system? -Roger

It seems to me that this would be the place to document the default value of auto_whitelist_factor. I gather from other sources that it's 0.5. – DanielNorton

Yes, the default auto_whitelist_factor is indeed 0.5, as described in the documentation at Mail::SpamAssassin::Plugin::AWL auto_whitelist_factor. – Adam Katz

As to how it faces users, the description can be overridden in the configuration. My (Adam Katz) production systems redescribe AWL so as to remove the confusion:

describe AWL Adjust score towards average for this sender