How do I get SpamAssassin to run faster?

General Advice

Examine the custom rule set files you use:

Examine the custom rules you have written or downloaded from third parties. Poorly written regular expressions can backtrack exponentially and consume a great deal of CPU time. In body, rawbody, and full rules, avoid unbounded + or * quantifiers; a bounded quantifier such as {1,20} is usually safer.
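To make the cost of a badly written expression concrete, here is a small sketch. It uses Python's re module rather than Perl (an assumption made for illustration; both are backtracking engines), and the two patterns are invented examples, not real SpamAssassin rules. The nested pattern slows down sharply as the input grows, while the flat one stays fast.

```python
import re
import time

# Nested quantifiers: a run of "a" can be split between the inner and
# outer "+" in exponentially many ways, and the engine tries them all
# before it gives up.
RISKY = re.compile(r"(a+)+$")
# Matches the same strings with a single quantifier, so there is only
# one way to consume the run.
FLAT = re.compile(r"a+$")


def time_search(pattern, text):
    """Return (match_or_None, elapsed_seconds) for one search."""
    start = time.perf_counter()
    match = pattern.search(text)
    return match, time.perf_counter() - start


if __name__ == "__main__":
    for n in (10, 14, 18):
        # The trailing "b" makes every attempt fail, which forces the
        # engine to exhaust all the alternatives.
        text = "a" * n + "b"
        _, risky = time_search(RISKY, text)
        _, flat = time_search(FLAT, text)
        print(f"n={n:2d}  nested={risky:.6f}s  flat={flat:.6f}s")
```

Exact timings depend on the machine; the point is how quickly the nested column grows compared with the flat one.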

Use spamd.

Use sa-compile if you're running SA 3.2.x or later.

If you are using network tests, install a local DNS server (BIND named, for example) on the same host to cache responses, and set the /etc/resolv.conf file to use that instead of one on another machine. See CachingNameserver.
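A quick way to confirm that the host really is using a local cache is to check which resolver is listed first in /etc/resolv.conf. The sketch below is only a sanity check; the function names are made up for this example, and it assumes the usual "nameserver ADDRESS" line format.

```python
import ipaddress


def nameservers(resolv_conf_text):
    """Return the nameserver addresses in the order they are listed."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        if line.startswith(("#", ";")):
            continue  # comment line
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def first_resolver_is_local(resolv_conf_text):
    """True if the first listed nameserver is a loopback address."""
    servers = nameservers(resolv_conf_text)
    if not servers:
        return False
    try:
        # Drop an IPv6 zone suffix such as "%lo" before parsing.
        address = ipaddress.ip_address(servers[0].split("%", 1)[0])
    except ValueError:
        return False
    return address.is_loopback
```

For example, first_resolver_is_local(open("/etc/resolv.conf").read()) should return True once the local caching server is listed first.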

If you're seeing periodic load spikes, it could be because the Bayes database is re-syncing and expiring old tokens. See BayesForceExpire for a workaround.

If you are memory-bound

If the spamd processes are eating up all the RAM on your machine, then you are memory-bound.

Are you experiencing high system load or possibly swapping? Look at the number of children you have spawned, and compare that to the available memory (by default each child can use 20-30 MB of RAM). Depending on load, you may find that lowering the number of spawned children helps (see the -m option in the spamd documentation).
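The comparison between children and available memory is simple arithmetic, sketched below. The function name and the reserved_mb parameter (memory set aside for the OS, the MTA, and everything else) are assumptions made for this example; the 30 MB default is the upper end of the per-child figure quoted above, and you should measure your own processes before relying on it.

```python
def max_spamd_children(total_ram_mb, reserved_mb, per_child_mb=30):
    """Estimate how many spamd children fit in RAM without swapping.

    total_ram_mb  -- physical memory on the machine
    reserved_mb   -- memory kept free for everything that is not spamd
    per_child_mb  -- observed memory use of one spamd child
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    usable = int(total_ram_mb) - int(reserved_mb)
    # spamd needs at least one child to do any work at all.
    return max(1, usable // int(per_child_mb))


if __name__ == "__main__":
    # 1 GB machine, 256 MB kept for the rest of the system.
    print(max_spamd_children(1024, 256))
```

The result is a starting point for the value passed to -m, not a guarantee; watch swap usage after changing it.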

If you are running with network tests active, you may be able to reduce memory load by turning off some of them. See "Network related" below.

See also OutOfMemoryProblems.

If you are I/O-bound

On heavily loaded servers, you may see high iowait times, depending on how hard you are hitting the disk. Try moving the logging and Bayes database writes to a separate disk, or even disable the Bayes rules entirely with use_bayes 0.

If the auto-whitelist is in use (user config dirs contain files named "auto-whitelist"), you should turn that off; it provides a marginal gain in accuracy for quite a bit of I/O load. Set use_auto_whitelist 0.
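To see whether the auto-whitelist is actually in use, you can look for those files under the users' home directories. This is a minimal sketch that assumes per-user configuration lives in a .spamassassin directory directly under each home directory; the function name and the example root /home are illustrative.

```python
from pathlib import Path


def find_auto_whitelists(homes_root):
    """Return paths of 'auto-whitelist' files in users' config dirs.

    homes_root is the directory that contains the home directories,
    for example "/home".
    """
    root = Path(homes_root)
    return sorted(
        path
        for path in root.glob("*/.spamassassin/auto-whitelist")
        if path.is_file()
    )
```

An empty list from find_auto_whitelists("/home") suggests no user has an auto-whitelist there, although a site-wide or SQL-based setup would not show up in this check.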

If you are CPU-bound

If your server is being limited by CPU load:

Use sa-compile.

Remove custom rule sets, as detailed above. Seriously.

See OutOfMemoryProblems. Much of the advice applies to CPU-bound machines, too.

If you are UsingNetworkTests, install a local DNS server (BIND named, for example) on the same host to cache responses, and set the /etc/resolv.conf file to use that instead of one on another machine. See CachingNameserver.

External network tests often take a long time (compared to a non-network installation). See UsingNetworkTests for general hints.

Consider turning off network tests, and running with "-L", if you can afford a large drop in accuracy. This is not a very good option for most people though, and while it will reduce system memory load by reducing the number of simultaneous processes, it will increase system CPU load, so be warned! See NetworkTestsLatency for more info.

Finding slow rules

Save http://svn.apache.org/repos/asf/spamassassin/trunk/masses/plugins/HitFreqsRuleTiming.pm to ~/.spamassassin/

In ~/.spamassassin/user_prefs add a line containing: loadplugin HitFreqsRuleTiming HitFreqsRuleTiming.pm

Run spamassassin on an email.

Find timing.log in your current directory, containing the number of seconds each rule took to run, in the 2nd and 3rd columns.

Warning: Enabling this will almost certainly slow down SpamAssassin. Possibly a lot (I haven't checked).
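Once timing.log exists, sorting it by one of the timing columns shows the slowest rules first. The helper below is a sketch that assumes only that the file is whitespace-separated with a number in the chosen column; the exact layout of timing.log is not described here, so check a few lines by eye first. The function name is made up for this example.

```python
def slowest_lines(lines, column=1, top=10):
    """Return the `top` lines with the largest number in `column`.

    `column` is zero-based, so 1 means the 2nd whitespace-separated
    column and 2 means the 3rd. Lines where that column is missing or
    is not a number are skipped.
    """
    timed = []
    for line in lines:
        fields = line.split()
        try:
            timed.append((float(fields[column]), line.rstrip("\n")))
        except (IndexError, ValueError):
            continue
    timed.sort(key=lambda pair: pair[0], reverse=True)
    return [line for _, line in timed[:top]]


if __name__ == "__main__":
    try:
        with open("timing.log") as handle:
            for entry in slowest_lines(handle):
                print(entry)
    except FileNotFoundError:
        print("timing.log not found in the current directory")
```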

More general advice

SA should log per-message scan times to the system log. From that you should be able to determine a message recipient and message-ID for messages that take a long time to scan.

You may also be experiencing inbound traffic volume spikes that may be overloading your system.

So, if you can track when the processing spikes occur, there are two things to correlate with them: the number of messages you are processing at that time (you may have too many spamd children defined, or not enough memory; are you hitting swap?), and any specific messages received at that time that take an unusually long time to scan (they may be exercising weaknesses in the rules).
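Picking the slow messages out of the log can be scripted. The sketch below assumes spamd "result:" lines that carry scantime=, user= and mid= fields separated by commas; that matches typical spamd output, but verify it against your own syslog before trusting the results. The function name and the 10-second default threshold are arbitrary choices for this example.

```python
import re

# Field patterns for a spamd "result:" log line (assumed format).
SCANTIME = re.compile(r"\bscantime=(\d+(?:\.\d+)?)")
USER = re.compile(r",user=([^,\s]+)")
MID = re.compile(r",mid=([^,\s]+)")


def slow_scans(lines, threshold=10.0):
    """Return (seconds, user, message_id) for scans at or over threshold.

    The slowest scans come first. A missing user or message-ID field
    is reported as None.
    """
    hits = []
    for line in lines:
        if "spamd: result:" not in line:
            continue
        scantime = SCANTIME.search(line)
        if scantime is None:
            continue
        seconds = float(scantime.group(1))
        if seconds < threshold:
            continue
        user = USER.search(line)
        mid = MID.search(line)
        hits.append((
            seconds,
            user.group(1) if user else None,
            mid.group(1) if mid else None,
        ))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits
```

Feed it the lines of your mail log (for example, an open file handle) and compare the timestamps of the reported messages with the times of the load spikes.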

Older versions of SpamAssassin included rules that were known to perform poorly in certain situations; these have been fixed in newer versions, so upgrading may help.

FasterPerformance (last edited 2012-02-28 15:41:36 by c-75-68-127-94)