...
- some hits are recorded from 'live' data at the time the messages were received, not post-facto testing (using 'mass-check --reuse')
- there were 9 people contributing their hit data, from a variety of geographical locations and organisational types
- both \["Ham"\] and \["Spam"\] hitrates are measured, and the corpora were hand-verified in advance
- the corpora use (relatively) fresh mail, received between January 2004 and July 2005
...
No Format |
---|
3.437 4.8942 0.0396 0.992 0.80 1.38 SPF_SOFTFAIL
1.006 1.4292 0.0207 0.986 0.71 2.43 SPF_HELO_SOFTFAIL
2.550 3.5717 0.1676 0.955 0.53 1.14 SPF_FAIL
2.297 3.2090 0.1695 0.950 0.52 1.07 SPF_NEUTRAL
1.796 2.5029 0.1488 0.944 0.51 0.00 SPF_HELO_FAIL
0.935 1.2724 0.1488 0.895 0.43 0.00 SPF_HELO_NEUTRAL
5.334 2.5925 11.7252 0.181 0.21 -0.00 SPF_HELO_PASS
3.267 2.6241 4.7654 0.355 0.10 -0.00 SPF_PASS
|
RFC-ignorant, testing against the envelope sender's domain:
No Format |
---|
3.038 4.3352 0.0132 0.997 0.86 2.60 DNS_FROM_RFC_DSN
1.174 1.6715 0.0151 0.991 0.75 1.94 DNS_FROM_RFC_BOGUSMX
3.590 5.0607 0.1620 0.969 0.57 1.45 DNS_FROM_RFC_WHOIS
13.930 19.7071 0.4615 0.977 0.47 1.71 DNS_FROM_RFC_POST
12.120 16.7154 1.4051 0.922 0.34 0.20 DNS_FROM_RFC_ABUSE
|
Other network rules:
No Format |
---|
1.898 2.7081 0.0094 0.997 0.82 3.20 NO_DNS_FOR_FROM
1.449 2.0593 0.0245 0.988 0.74 1.51 DNS_FROM_SECURITYSAGE
7.200 10.0898 0.4615 0.956 0.44 0.23 DNS_FROM_AHBL_RHSBL
|
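The rows in these listings can be read programmatically. The following is a minimal sketch, assuming the columns follow the usual hit-frequencies layout (OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE, NAME); that column order is an assumption, although the figures above are consistent with S/O being SPAM% divided by the sum of SPAM% and HAM%. The names `FreqRow`, `parse_freq_line` and `recomputed_s_o` are hypothetical helpers, not part of SpamAssassin.

```python
from typing import NamedTuple


class FreqRow(NamedTuple):
    """One row of a freqs listing (column meanings are assumed)."""
    overall_pct: float
    spam_pct: float
    ham_pct: float
    s_o: float
    rank: float
    score: float
    name: str


def parse_freq_line(line: str) -> FreqRow:
    """Split a whitespace-separated freqs row into typed fields."""
    parts = line.split()
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {line!r}")
    *numbers, name = parts
    return FreqRow(*(float(n) for n in numbers), name)


def recomputed_s_o(row: FreqRow) -> float:
    """Recompute S/O as spam% / (spam% + ham%); NaN if the rule never hit."""
    total = row.spam_pct + row.ham_pct
    return row.spam_pct / total if total else float("nan")


if __name__ == "__main__":
    sample = [
        "3.437 4.8942 0.0396 0.992 0.80 1.38 SPF_SOFTFAIL",
        "5.334 2.5925 11.7252 0.181 0.21 -0.00 SPF_HELO_PASS",
    ]
    for line in sample:
        row = parse_freq_line(line)
        # Compare the listed S/O with the value derived from the two hit rates.
        print(row.name, row.s_o, round(recomputed_s_o(row), 3))
```

Comparing the listed and recomputed S/O values is a quick sanity check that a row was split into the right columns.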
More details of the source mass-check log files and test procedure can be read in SpamAssassin bug 4505. The full list of freqs can be found in the STATISTICS-set3.txt file in the 3.1.0 release. Here's a list of the data files used. Note that only a randomly-chosen one tenth of each file was used.
Use of --reuse for real-time network results: confirmed on: 4 users (bmenschel, jm, parker, cthielen); confirmed off: 1 user (duncf); unknown: 4 users (bzoetekouw, misak, quinlan, theo).
No Format |
---|
bash-3.00$ ls -l /home/corpus-rsync/corpus/submit/
total 2839184
-r--r--r-- 1 rsync rsync 7967268 Jul 16 18:18 ham-bayes-net-bzoetekouw.log
-r--r--r-- 1 rsync rsync 1987090 Jul 16 14:49 ham-bayes-net-cthielen.log
-r--r--r-- 1 rsync rsync 23284450 Jul 24 08:04 ham-bayes-net-daf.log
-r--r--r-- 1 rsync rsync 51469171 Jul 19 02:26 ham-bayes-net-jm.log
-r--r--r-- 1 rsync rsync 45026386 Jul 19 02:27 ham-bayes-net-jm2.log
-r--r--r-- 1 rsync rsync 294744 Jul 25 18:57 ham-bayes-net-misak.log
-r--r--r-- 1 rsync rsync 22130676 Jul 27 04:17 ham-bayes-net-parkerm.log
-r--r--r-- 1 rsync rsync 14056970 Jul 27 19:37 ham-bayes-net-quinlan.log
-r--r--r-- 1 rsync rsync 8603737 Jul 27 17:01 ham-bayes-net-rod.log
-r--r--r-- 1 rsync rsync 28410747 Jul 27 02:34 ham-bayes-net-theo.log
-r--r--r-- 1 rsync rsync 62685697 Jul 16 18:22 spam-bayes-net-bzoetekouw.log
-r--r--r-- 1 rsync rsync 11891366 Jul 16 14:50 spam-bayes-net-cthielen.log
-r--r--r-- 1 rsync rsync 96553037 Jul 24 08:09 spam-bayes-net-daf.log
-r--r--r-- 1 rsync rsync 28662170 Jul 19 02:28 spam-bayes-net-jm.log
-r--r--r-- 1 rsync rsync 209202453 Jul 19 02:34 spam-bayes-net-jm2.log
-r--r--r-- 1 rsync rsync 243487 Jul 25 18:57 spam-bayes-net-misak.log
-r--r--r-- 1 rsync rsync 39357821 Jul 27 04:19 spam-bayes-net-parkerm.log
-r--r--r-- 1 rsync rsync 41987897 Jul 27 19:39 spam-bayes-net-quinlan.log
-r--r--r-- 1 rsync rsync 97404262 Jul 27 17:03 spam-bayes-net-rod.log
-r--r--r-- 1 rsync rsync 358576609 Jul 27 02:34 spam-bayes-net-theo.log
|
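The text notes that only a randomly-chosen one tenth of each file was used, but does not say how the selection was made. The following is a rough sketch of one way to draw such a subset, assuming each log entry occupies one line; the function name `sample_one_in` and the seed handling are illustrative assumptions, not the procedure actually used for these statistics.

```python
import random
from typing import List, Optional


def sample_one_in(lines: List[str], one_in: int = 10,
                  seed: Optional[int] = None) -> List[str]:
    """Return a random subset holding len(lines) // one_in entries.

    The original relative order of the chosen lines is preserved.
    """
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    k = len(lines) // one_in
    chosen = sorted(rng.sample(range(len(lines)), k))
    return [lines[i] for i in chosen]


if __name__ == "__main__":
    # Stand-in data; a real run would read the lines of one log file instead.
    fake_log = [f"entry {i}" for i in range(1000)]
    subset = sample_one_in(fake_log, one_in=10, seed=4505)
    print(len(subset), "of", len(fake_log), "entries kept")
```

Sampling indices rather than filtering each line with a fixed probability gives a subset of predictable size, which keeps the ham and spam proportions of a file easier to compare across contributors.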