...

  • some hits are recorded from 'live' data at the time the messages were received, not post-facto testing (using 'mass-check --reuse')
  • there were 9 people contributing their hit data, from a variety of geographical locations and organisational types
  • both "Ham" and "Spam" hit rates are measured, and the corpora were hand-verified in advance
  • the corpora use (relatively) fresh mail, received between January 2004 and July 2005

...

  3.437   4.8942   0.0396    0.992   0.80    1.38  SPF_SOFTFAIL
  1.006   1.4292   0.0207    0.986   0.71    2.43  SPF_HELO_SOFTFAIL
  2.550   3.5717   0.1676    0.955   0.53    1.14  SPF_FAIL
  2.297   3.2090   0.1695    0.950   0.52    1.07  SPF_NEUTRAL
  1.796   2.5029   0.1488    0.944   0.51    0.00  SPF_HELO_FAIL
  0.935   1.2724   0.1488    0.895   0.43    0.00  SPF_HELO_NEUTRAL
  5.334   2.5925  11.7252    0.181   0.21   -0.00  SPF_HELO_PASS
  3.267   2.6241   4.7654    0.355   0.10   -0.00  SPF_PASS

RFC-ignorant, testing against the envelope sender's domain:

  3.038   4.3352   0.0132    0.997   0.86    2.60  DNS_FROM_RFC_DSN
  1.174   1.6715   0.0151    0.991   0.75    1.94  DNS_FROM_RFC_BOGUSMX
  3.590   5.0607   0.1620    0.969   0.57    1.45  DNS_FROM_RFC_WHOIS
 13.930  19.7071   0.4615    0.977   0.47    1.71  DNS_FROM_RFC_POST
 12.120  16.7154   1.4051    0.922   0.34    0.20  DNS_FROM_RFC_ABUSE

Other network rules:

  1.898   2.7081   0.0094    0.997   0.82    3.20  NO_DNS_FOR_FROM
  1.449   2.0593   0.0245    0.988   0.74    1.51  DNS_FROM_SECURITYSAGE
  7.200  10.0898   0.4615    0.956   0.44    0.23  DNS_FROM_AHBL_RHSBL
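
The rows above can be read programmatically. The following is a minimal sketch, assuming the seven columns are overall %, spam %, ham %, S/O, rank, score and rule name (the header row is not shown in this excerpt, so these field names are an assumption). It also assumes S/O is spam % divided by (spam % + ham %), which is consistent with the listed values. `FreqRow`, `parse_freq_row` and `spam_over_overall` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class FreqRow(NamedTuple):
    # Field names are assumed; the table header is not part of this excerpt.
    overall_pct: float
    spam_pct: float
    ham_pct: float
    s_o: float
    rank: float
    score: float
    name: str


def parse_freq_row(line: str) -> FreqRow:
    """Split one whitespace-separated table row into typed fields."""
    parts = line.split()
    if len(parts) != 7:
        raise ValueError(f"expected 7 columns, got {len(parts)}: {line!r}")
    *numbers, name = parts
    return FreqRow(*(float(n) for n in numbers), name)


def spam_over_overall(spam_pct: float, ham_pct: float) -> float:
    """Assumed S/O definition: share of a rule's hits that landed on spam."""
    total = spam_pct + ham_pct
    return spam_pct / total if total else 0.0


# Rows copied verbatim from the tables above.
rows = [
    "  3.437   4.8942   0.0396    0.992   0.80    1.38  SPF_SOFTFAIL",
    "  5.334   2.5925  11.7252    0.181   0.21   -0.00  SPF_HELO_PASS",
    "  1.898   2.7081   0.0094    0.997   0.82    3.20  NO_DNS_FOR_FROM",
]

for line in rows:
    row = parse_freq_row(line)
    recomputed = spam_over_overall(row.spam_pct, row.ham_pct)
    print(f"{row.name:20s} listed={row.s_o:.3f} recomputed={recomputed:.3f}")
```

Comparing the listed and recomputed S/O values is a quick sanity check that a row was parsed with the columns in the expected order.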

More details of the source mass-check log files and test procedure can be read in SpamAssassin bug 4505. The full list of freqs can be found in the STATISTICS-set3.txt file in the 3.1.0 release. Here's a list of the data files used. Note that only a randomly-chosen one tenth of each file was used.
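
As an illustration of the one-tenth sampling mentioned above, here is a minimal sketch of one way to pick a random tenth of a log's lines. It is not the procedure actually used (that is described in bug 4505); the function name `sample_one_tenth`, the seed, and the synthetic `log_lines` are all assumptions made for the example.

```python
import random
from typing import List, Sequence


def sample_one_tenth(lines: Sequence[str], seed: int = 0) -> List[str]:
    """Return a random tenth of the lines, keeping their original order."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    keep = len(lines) // 10
    chosen = sorted(rng.sample(range(len(lines)), keep))
    return [lines[i] for i in chosen]


# Synthetic stand-in for the lines of one mass-check log file.
log_lines = [f"message-{i}" for i in range(1000)]
subset = sample_one_tenth(log_lines, seed=4505)
print(len(subset))
```

Sampling indices rather than lines keeps the subset in file order, and a fixed seed means the same tenth can be regenerated later for comparison.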

Use of --reuse for real-time network results: confirmed on: 4 users (bmenschel, jm, parker, cthielen); confirmed off: 1 user (duncf); unknown: 4 users (bzoetekouw, misak, quinlan, theo).

bash-3.00$ ls -l /home/corpus-rsync/corpus/submit/
total 2839184
-r--r--r--   1 rsync    rsync    7967268 Jul 16 18:18 ham-bayes-net-bzoetekouw.log
-r--r--r--   1 rsync    rsync    1987090 Jul 16 14:49 ham-bayes-net-cthielen.log
-r--r--r--   1 rsync    rsync    23284450 Jul 24 08:04 ham-bayes-net-daf.log
-r--r--r--   1 rsync    rsync    51469171 Jul 19 02:26 ham-bayes-net-jm.log
-r--r--r--   1 rsync    rsync    45026386 Jul 19 02:27 ham-bayes-net-jm2.log
-r--r--r--   1 rsync    rsync     294744 Jul 25 18:57 ham-bayes-net-misak.log
-r--r--r--   1 rsync    rsync    22130676 Jul 27 04:17 ham-bayes-net-parkerm.log
-r--r--r--   1 rsync    rsync    14056970 Jul 27 19:37 ham-bayes-net-quinlan.log
-r--r--r--   1 rsync    rsync    8603737 Jul 27 17:01 ham-bayes-net-rod.log
-r--r--r--   1 rsync    rsync    28410747 Jul 27 02:34 ham-bayes-net-theo.log
-r--r--r--   1 rsync    rsync    62685697 Jul 16 18:22 spam-bayes-net-bzoetekouw.log
-r--r--r--   1 rsync    rsync    11891366 Jul 16 14:50 spam-bayes-net-cthielen.log
-r--r--r--   1 rsync    rsync    96553037 Jul 24 08:09 spam-bayes-net-daf.log
-r--r--r--   1 rsync    rsync    28662170 Jul 19 02:28 spam-bayes-net-jm.log
-r--r--r--   1 rsync    rsync    209202453 Jul 19 02:34 spam-bayes-net-jm2.log
-r--r--r--   1 rsync    rsync     243487 Jul 25 18:57 spam-bayes-net-misak.log
-r--r--r--   1 rsync    rsync    39357821 Jul 27 04:19 spam-bayes-net-parkerm.log
-r--r--r--   1 rsync    rsync    41987897 Jul 27 19:39 spam-bayes-net-quinlan.log
-r--r--r--   1 rsync    rsync    97404262 Jul 27 17:03 spam-bayes-net-rod.log
-r--r--r--   1 rsync    rsync    358576609 Jul 27 02:34 spam-bayes-net-theo.log