
...

The benchmark code requires the bucket files to have specific names:

hambucket1.mbox	spambucket1.mbox
hamforget1.mbox	spamforget1.mbox
hambucket2.mbox	spambucket2.mbox
hambucket3.mbox	spambucket3.mbox
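Because the file names are fixed, it can help to verify them before starting a long run. The following is a minimal sketch, not part of the benchmark itself: `check_buckets` is a hypothetical helper name, and the default directory name `corpus` is an assumption based on the setup notes on this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of the benchmark code): report which of the
# required bucket files are missing from a corpus directory.
# The default directory name "corpus" is an assumption.
check_buckets() {
    dir="${1:-corpus}"
    missing=0
    for name in hambucket1 spambucket1 hamforget1 spamforget1 \
                hambucket2 spambucket2 hambucket3 spambucket3; do
        if [ ! -f "$dir/$name.mbox" ]; then
            echo "missing: $dir/$name.mbox"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every required file is present.
    [ "$missing" -eq 0 ]
}
```

Calling `check_buckets corpus` prints one line per missing file and returns a non-zero status if any are absent.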

I suggest at least 1000 messages per bucket. A bucket should certainly not hold fewer than 200, and the practical minimum may be closer to 300 depending on how much autolearning happens in phase 2. The more messages you have, the more accurate your results will be, but your benchmarks will take longer. I tend to use 2000 messages per bucket (1000 in the forget buckets), and my tests take between 40 minutes and 2 hours to finish, depending on the backend and setup.
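One rough way to check bucket sizes against these numbers is to count the "From " separator lines in each mbox file. This is only a sketch: the function names are hypothetical, and the count will be too high if a message body contains an unescaped line starting with "From ".

```shell
#!/bin/sh
# Hypothetical helpers (not part of the benchmark code).

# Count messages in an mbox file by counting "From " separator lines.
count_messages() {
    n=$(grep -c '^From ' "$1" || true)
    echo "${n:-0}"
}

# Report a bucket's size and fail if it is below a minimum (default 200,
# the floor suggested above).
check_bucket_size() {
    file="$1"
    min="${2:-200}"
    count=$(count_messages "$file")
    if [ "$count" -lt "$min" ]; then
        echo "$file: only $count messages (minimum $min)"
        return 1
    fi
    echo "$file: $count messages"
}
```

For example, `check_bucket_size corpus/hambucket1.mbox 1000` checks a bucket against the suggested size rather than the bare minimum.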

...

Phase 3:
This phase performs an sa-learn --force-expire.

Phase 4:
This is the forget phase. We use sa-learn to forget all the messages in hamforget1.mbox and then do it again for spamforget1.mbox.

Phase 5:
This is the spamassassin scan phase. Here we scan the hambucket3.mbox and then the spambucket3.mbox using the spamassassin script.
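The expire, forget, and scan steps described above could be run by hand roughly as follows. This is a sketch under assumptions, not the benchmark's actual code: the function names and the `CORPUS`, `SA_LEARN`, and `SPAMASSASSIN` variables are hypothetical, and the real benchmark probably passes extra options (such as a preferences file pointing at the database backend) that are omitted here.

```shell
#!/bin/sh
# Hypothetical wrappers; command names can be overridden via the environment.
SA_LEARN="${SA_LEARN:-sa-learn}"
SPAMASSASSIN="${SPAMASSASSIN:-spamassassin}"
CORPUS="${CORPUS:-corpus}"   # assumed location of the bucket files

# Expire step: force an expiry run on the Bayes database.
expire_phase() {
    "$SA_LEARN" --force-expire
}

# Forget step: forget the ham messages first, then the spam messages.
forget_phase() {
    for file in hamforget1.mbox spamforget1.mbox; do
        "$SA_LEARN" --forget --mbox "$CORPUS/$file" || return 1
    done
}

# Scan step: scan the ham bucket, then the spam bucket.
# The rewritten messages on stdout are discarded.
scan_phase() {
    for file in hambucket3.mbox spambucket3.mbox; do
        "$SPAMASSASSIN" --mbox "$CORPUS/$file" > /dev/null || return 1
    done
}
```

Running `expire_phase && forget_phase && scan_phase` executes the three steps in order and stops at the first failure.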

I suggest running each benchmark 3 times to make sure your test is not influenced by other system activities.
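Repeating a run and comparing wall-clock times can be scripted. The sketch below assumes the benchmark is started by a single command; `./run_benchmark.sh` is a placeholder, not the real entry point, and `date +%s` is widely available but not strictly POSIX.

```shell
#!/bin/sh
# Placeholder command; set BENCHMARK_CMD to however you start the benchmark.
BENCHMARK_CMD="${BENCHMARK_CMD:-./run_benchmark.sh}"

# Run the benchmark several times (default 3) and print the elapsed
# wall-clock time of each run in whole seconds.
repeat_benchmark() {
    runs="${1:-3}"
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        "$BENCHMARK_CMD" || return 1
        end=$(date +%s)
        echo "run $i: $((end - start)) seconds"
        i=$((i + 1))
    done
}
```

If one of the runs takes much longer than the others, other system activity probably interfered and that run is worth repeating.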

TODO: Post the code. The code is semi-ugly and not really ready for general consumption. It is close, but I'm just not ready yet. Soon, though. If you really want to get the code and run the benchmarks, just keep bugging me and I'll eventually get around to packaging it up. In fact, I'll probably end up just putting it directly in the source. (MichaelParker)

Here is the code: benchmark.tar.gz. It can also be found in the Subversion tree.

You'll need to generate your own ham/spam buckets, place them in the corpus directory, and change the usernames and passwords to match those for your database.

Of course, I'm always glad to hear feedback and suggested changes to the benchmark. It really is something that I came up with on my own, and it is possible that I've completely missed the boat. (MichaelParker)

CategoryBayes

...
