Comment: [Original edit by JustinMason]

Rules Project:

...

sa-update and rules release cycle

(part of RulesProjectPlan)

...

TODO: need more detail here. Ping, Theo!

Scoring

The primary reason for the long release time for new rules is the need to score them (and rescore old rules) so that the resulting scores flag as much spam as reasonably possible while keeping false positives to a conservative minimum.

...

DanielQuinlan favored the first option, saying, "That would not be too hard and would be more accurate than any estimation technique. There is definitely a correlation between hit rates, S/O ratio, RANK, etc. to the ultimate perceptron-generated score, but the correlations are not all that high, unfortunately."

JustinMason: yes, agreed; the perceptron just does a better job, every time. Having said that, we don't need to institute a policy requiring regular perceptron runs; we can measure false positive rates across an entire corpus using the 'fp-fn-statistics' masses tool, and get an idea of whether the current scoreset as a whole is FP-prone or FN-prone (indicating that the perceptron needs to be run soon), or neither.

In other words, let's defer making this a task right now ;)
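
As a rough illustration of the corpus-wide check described above, here is a minimal Python sketch. It is not the 'fp-fn-statistics' tool and does not reproduce its input or output format; the Result record, the fp_fn_rates and needs_rescore helpers, and the fp_limit and fn_limit values are all hypothetical names and numbers chosen for the example. The only thing taken from SpamAssassin itself is the default threshold of 5.0 points, at or above which a message is flagged as spam.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One hand-classified corpus message, scored with the current scoreset."""
    is_spam: bool   # the human-verified label
    score: float    # total score the message received


def fp_fn_rates(results, threshold=5.0):
    """Return (fp_rate, fn_rate) for a scored corpus.

    fp_rate is the fraction of ham scoring at or above the threshold;
    fn_rate is the fraction of spam scoring below it.
    """
    ham = [r for r in results if not r.is_spam]
    spam = [r for r in results if r.is_spam]
    fp = sum(1 for r in ham if r.score >= threshold)
    fn = sum(1 for r in spam if r.score < threshold)
    fp_rate = fp / len(ham) if ham else 0.0
    fn_rate = fn / len(spam) if spam else 0.0
    return fp_rate, fn_rate


def needs_rescore(results, fp_limit=0.001, fn_limit=0.05, threshold=5.0):
    """Hypothetical policy: suggest a perceptron run when either rate drifts
    past a limit. The limits are illustrative assumptions, not project policy."""
    fp_rate, fn_rate = fp_fn_rates(results, threshold)
    return fp_rate > fp_limit or fn_rate > fn_limit


if __name__ == "__main__":
    corpus = [
        Result(is_spam=False, score=1.2),
        Result(is_spam=False, score=6.1),   # ham over the threshold: a false positive
        Result(is_spam=True, score=12.4),
        Result(is_spam=True, score=3.0),    # spam under the threshold: a false negative
    ]
    fp_rate, fn_rate = fp_fn_rates(corpus)
    print(f"FP rate: {fp_rate:.2%}, FN rate: {fn_rate:.2%}")
    print("rescore suggested:", needs_rescore(corpus))
```

The point of the sketch is that the decision to rerun the perceptron can be driven by a measurement rather than a calendar: if both rates stay within acceptable limits, the current scoreset can be left alone.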