Setting up Site-Wide Bayesian Filtering

1) Using DB_File / BerkeleyDB database (slow performance)

Slow, but easy to set up. Suitable only if you have very little mail traffic.

In local.cf, tell SpamAssassin where to find the Bayesian database files:

...

This tells the system that the Bayesian filter database files will be /var/spamassassin/bayes_db/bayes_msgcount, bayes_seen and bayes_toks. Feel free to move the database wherever you want.

Note that this directory needs read, write and execute (rwx) permission for every user SpamAssassin will run as. If autolearning and automatic expiry are disabled, read and execute (r-x) is enough. Many people make the directory world-writable to simplify this, but that is insecure and not recommended.

The directory also shouldn't contain any files other than your Bayes database. Any other file whose name starts with "bayes" (or whatever filename prefix you specified) can break the locking mechanisms SpamAssassin uses.
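The permission and naming rules above can be checked with a small script. This is a minimal sketch under a few assumptions. The helper name check_bayes_dir is hypothetical and not part of SpamAssassin. It only tests the user who runs it, so you would repeat it as each user SpamAssassin runs as. You also have to pass in the set of filenames you consider legitimate database files, because SpamAssassin may create its own lock or journal files there.

```python
import os
import stat


def check_bayes_dir(path, prefix="bayes", expected=(), learning=True):
    """Return a list of problems found in a DB_File/BerkeleyDB Bayes directory.

    Hypothetical helper, not part of SpamAssassin.
    path     -- directory holding the Bayes database files
    prefix   -- database filename prefix (assumed to be "bayes")
    expected -- filenames the caller treats as legitimate database files
    learning -- True if autolearning/automatic expiry is on (write access needed)
    """
    problems = []

    # rwx is needed when SpamAssassin writes to the DB, r-x otherwise.
    # os.access only checks the user running this script.
    needed = os.R_OK | os.X_OK | (os.W_OK if learning else 0)
    if not os.access(path, needed):
        problems.append("current user lacks %s access to %s"
                        % ("rwx" if learning else "r-x", path))

    # A world-writable directory works, but is insecure.
    if os.stat(path).st_mode & stat.S_IWOTH:
        problems.append("%s is world-writable (insecure)" % path)

    # Extra files sharing the prefix can confuse SpamAssassin's locking.
    for name in sorted(os.listdir(path)):
        if name.startswith(prefix) and name not in expected:
            problems.append("unexpected file with prefix %r: %s" % (prefix, name))

    return problems
```

For example, check_bayes_dir("/var/spamassassin/bayes_db", expected={"bayes_seen", "bayes_toks"}) returns an empty list when nothing looks wrong. Otherwise it returns one line per problem it found.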

2) Using SQL database (good performance)

Decent performance, but requires a database server. See sql/README.bayes in the release:

http://svn.apache.org/repos/asf/spamassassin/trunk/sql/README.bayes

3) Using the new Redis database (extremely fast performance)

Use the new Redis backend whenever possible. It is by far the fastest backend and the easiest to maintain. It has very few moving parts and handles expiry by itself.

See: http://svn.apache.org/repos/asf/spamassassin/trunk/contrib/HOWTO.Bayes-Redis/



Now start feeding the Bayesian filter spam and ham messages.
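Training is normally done with the sa-learn command, using its --spam and --ham options, plus --mbox when the source is a single mbox file. The sketch below is a minimal Python wrapper for scripting that step. The function names and example paths are hypothetical. It prints the commands by default and only runs them when dry_run=False. Run it as the same user SpamAssassin runs as, so the database files get suitable ownership.

```python
import subprocess


def sa_learn_argv(kind, path, mbox=False):
    """Build an sa-learn command line for a spam or ham corpus (hypothetical helper)."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    argv = ["sa-learn", "--" + kind]
    if mbox:
        argv.append("--mbox")  # path is one mbox file rather than a directory
    argv.append(path)
    return argv


def train(corpora, dry_run=True):
    """Feed each (kind, path, mbox) corpus to sa-learn; only print when dry_run is set."""
    for kind, path, mbox in corpora:
        argv = sa_learn_argv(kind, path, mbox)
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)  # raises if sa-learn fails
```

For example, train([("spam", "/home/user/Mail/spam", False), ("ham", "/home/user/Mail/inbox.mbox", True)]) prints the two sa-learn commands it would run. It does not touch the database.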

...