The Bayes database holds a limited number of tokens, set by bayes_expiry_max_db_size in local.cf (default: 150000 tokens).
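For example, a site that wants the database to grow larger before expiry trims it could add a line like the following to local.cf. The value 300000 is only an illustration, not a recommendation:

```
# local.cf: allow roughly twice the default number of Bayes tokens
bayes_expiry_max_db_size 300000
```

A larger limit keeps more rarely seen tokens, at the cost of a bigger database.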

Each token has an access time (atime) that records when it last contributed to a classification or appeared in a learned message. From time to time, a mixture of obsolete (often ephemeral) tokens and the least frequently seen tokens is purged, according to a schedule and algorithm explained in the sa-learn documentation.

Thus, forcing an expiry run every month does not mean you keep only a month of data. Tokens that keep turning up in mail have their access times refreshed each time, so the most important tokens never get purged.
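To make this concrete, here is a toy shell model that treats expiry as "keep only the N most recently accessed tokens". It is a sketch under simplifying assumptions, not SpamAssassin's implementation: the function name expire_tokens and its "token atime" input format are hypothetical, and the real algorithm described in the sa-learn documentation estimates an atime cutoff rather than sorting every token.

```shell
#!/bin/sh
# Simplified, hypothetical model of atime-based expiry. This is NOT
# SpamAssassin's code; it only illustrates the idea.

# expire_tokens KEEP
#   Reads "token atime" pairs (atime in epoch seconds) on stdin and prints
#   the KEEP most recently accessed tokens, newest first. Everything not
#   printed counts as expired.
expire_tokens() {
    keep=$1
    # Sort numerically on the atime column, newest first, then keep the top
    # KEEP lines. Survival depends on when a token was last seen, not on
    # when it was first learned.
    sort -k2,2nr | head -n "$keep"
}

# Demo: "common" and "frequent" stand for tokens that keep appearing in
# mail, so their atimes are recent however long ago they were first
# learned. The two ephemeral tokens have not been seen for months.
now=$(date +%s)
day=86400
printf '%s %s\n' \
    ephemeral1 $((now - 400 * day)) \
    common     $((now - 1 * day)) \
    ephemeral2 $((now - 300 * day)) \
    frequent   $((now - 2 * day)) |
    expire_tokens 2
```

Run as shown, the demo prints the common and frequent lines: in this model the cutoff depends on how many tokens are kept, not on a fixed age, so frequently seen tokens survive every run.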

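To view the access time of the oldest token in the database on BSD or macOS, where date -r accepts seconds since the epoch and cut supports -w:

    date -r $(sa-learn --dump magic | grep "oldest atime" | cut -f 3 -w)

With GNU date (most Linux systems), -r expects a file name instead, so pass the timestamp to -d with an @ prefix:

    date -d @$(sa-learn --dump magic | awk '/oldest atime/ {print $3}')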
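If you would rather know how many days ago the oldest token was last used, a small sketch like the following does the arithmetic in the shell and avoids date, whose options differ between BSD and GNU systems. The function names are hypothetical, and it assumes the timestamp is the third whitespace-separated field of the "oldest atime" line, the same field the cut command above selects.

```shell
#!/bin/sh
# oldest_atime: read `sa-learn --dump magic` output on stdin and print the
# epoch timestamp from the "oldest atime" line (third whitespace-separated
# field; this field position is an assumption based on the cut command).
oldest_atime() {
    awk '/oldest atime/ { print $3 }'
}

# oldest_token_age_days NOW
#   Reads the same dump on stdin and prints how many whole days before NOW
#   (epoch seconds) the oldest token was last accessed.
oldest_token_age_days() {
    now=$1
    oldest=$(oldest_atime)
    echo $(( (now - oldest) / 86400 ))
}

# Usage on a live system (requires sa-learn and an existing Bayes database):
#   sa-learn --dump magic | oldest_token_age_days "$(date +%s)"
```

Because the current time is passed in as an argument, the age calculation can be checked against a saved dump without touching the live database.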

[partially adapted from a post by RW to the spamassassin-users mailing list]
