Differences between revisions 3 and 4
Revision 3 as of 2004-12-01 20:26:12
Size: 1099
Editor: JustinMason
Comment:
Revision 4 as of 2009-09-20 23:16:57
Size: 1103
Editor: localhost
Comment: converted to 1.6 markup
Spam Filter Batting Average

John Graham-Cumming (http://jgc.org/) proposed this uniform measure of spam-filter effectiveness in his November 16, 2004 article, 'Understanding Spam Filter Accuracy' (http://www.jgc.org/antispam/11162004-baafcd719ec31936296c1fb3d74d2cbd.pdf).

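Essentially, it restates the FalsePositive percentage and FalseNegative percentage as 'spam hit rate / ham strike rate'. The hit rate is the fraction of spam the filter catches, and the strike rate is the fraction of ham it wrongly flags as spam. The two rates are written side by side, separated by a slash; they are not divided. They can be computed from FP% and FN% as follows: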

  let fp = false positive percentage
  let fn = false negative percentage
  hitrate = 1 - (fn / 100)
  strikerate = fp / 100
  batting average = "hitrate/strikerate"

So with an FP% of 0.03 and an FN% of 2.47, the batting average is

  (1 - (2.47 / 100)) "/" (0.03 / 100) =
  .9753/.0003

That's actually the correct batting average for SpamAssassin 3.0.0's scoreset 3, measured against the validation corpus when we released it. ;)
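
The computation above can be sketched in Python as follows. This is only an illustration: the function names batting_average and format_rate are hypothetical and not part of SpamAssassin, and the four-decimal formatting with the leading zero dropped is an assumption taken from the ".9753/.0003" example.

```python
def format_rate(rate: float) -> str:
    """Format a rate with four decimals, dropping the leading zero (assumed style)."""
    text = f"{rate:.4f}"
    # "0.9753" becomes ".9753"; "1.0000" is left unchanged.
    return text.lstrip("0")


def batting_average(fp_percent: float, fn_percent: float) -> str:
    """Return the 'hitrate/strikerate' string for the given FP% and FN%."""
    for name, value in (("fp_percent", fp_percent), ("fn_percent", fn_percent)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")

    hitrate = 1 - (fn_percent / 100)   # fraction of spam caught
    strikerate = fp_percent / 100      # fraction of ham wrongly flagged

    # The slash is a separator; the two rates are not divided.
    return f"{format_rate(hitrate)}/{format_rate(strikerate)}"


if __name__ == "__main__":
    print(batting_average(fp_percent=0.03, fn_percent=2.47))
```

Run with the figures from the example (FP% 0.03, FN% 2.47), this should print the same batting average as the worked calculation.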

See also MeasuringAccuracy for other schemes used, or FpFnPercentages for the main one we use in SpamAssassin.

BattingAverage (last edited 2009-09-20 23:16:57 by localhost)