ROC Graphs

Wikipedia says:

In signal detection theory, a receiver operating characteristic (ROC) is a graphical plot of the sensitivity vs. 1-specificity for a binary classifier system as its discrimination threshold is varied. The ROC can also be represented equivalently by plotting the fraction of true positives (TP) vs. the fraction of false positives (FP). The usage receiver operator characteristic is also common.

ROC curves are used to evaluate the results of a prediction and were first employed in the study of discriminator systems for the detection of radio signals in the presence of noise in the 1940s. In the 1960s they began to be used in psychophysics, to assess human (and occasionally animal) detection of weak signals. They also proved to be useful for the evaluation of machine learning results, such as the evaluation of Internet search engines. They are also used extensively in epidemiology and medical research.

In SpamAssassin terms, a ROC graph is the FpFnPercentages values, graphed over multiple thresholds. There's a script in SVN used to measure this at masses/mk-roc-graphs; here's sample output:

http://taint.org/xfer/2005/roc_curves_with_3045.png

See also MeasuringAccuracy for other methods used to measure spamfilter accuracy.

  • No labels