Rules Project: Streamlining the rules process

(part of RulesProjectPlan)

Problem description: 'People that do write rules for their own use are not willing to go through the fairly elaborate process in order to submit them to SpamAssassin (this currently requires rules to go through bugzilla and then through 70_testing.cf and eventually into our distribution). What can we do to make this process easier and more inviting?'

First off, the sandboxes idea greatly increases the number of people who can check rules into SVN. Secondly, the barriers to entry for getting a sandboxes account are much lower.

Some bullet points from the discussion; these need expanding:

sandbox:

  • each user gets their own sandbox as discussed on RulesProjMoreInput
  • checked-in rules in the sandboxes are mass-checked in the nightly mass-checks
  • migrating a rule from the "sandbox" (dev) to the "core" (production) ruleset uses C-T-R (commit-then-review), i.e. votes are not required in advance
  • migrating a rule from the "sandbox" to the "extra" ruleset also uses C-T-R

Rules that get promoted from a "sandbox" to "core" should pass the following criteria:

  • S/O ratio of 0.95 or greater (or 0.05 or less for nice rules)
  • > 0.25% of target type hit (e.g. spam for non-nice rules)
  • < 1.00% of non-target type hit (e.g. ham for non-nice rules)
  • not too slow (wink)
  • TODO: criteria for overlap with existing rules?
    BobMenschel: The method I used for weeding out SARE rules that overlapped 3.0.0 rules was to run a full mass-check with overlap analysis, and throw away anything where the overlap is less than 50%. Manually reviewing the remaining (significantly) overlapping rules was fairly easy. The command I use is:

        perl ./overlap ../rules/tested/$testfile.ham.log ../rules/tested/$testfile.spam.log | grep -v mid= | awk ' NR == 1 { print } ; $2 + 0 == 1.000 && $3 + 0 >= 0.500 { print } ' >../rules/tested/$testfile.overlap.out
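
The numeric criteria above can be sketched as a small check. This is only an illustration, not part of any SpamAssassin tool: the `RuleStats` type and the function names are hypothetical, and it assumes S/O is computed as spam% / (spam% + ham%) from the per-rule hit percentages of a mass-check. The speed and overlap criteria are not covered, since the page does not give thresholds for them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RuleStats:
    """Hypothetical per-rule summary taken from a mass-check run."""
    name: str
    spam_pct: float     # percentage of spam messages the rule hit
    ham_pct: float      # percentage of ham messages the rule hit
    nice: bool = False  # True for "nice" rules, which target ham


def so_ratio(stats: RuleStats) -> Optional[float]:
    """S/O ratio, assumed here to be spam% / (spam% + ham%)."""
    total = stats.spam_pct + stats.ham_pct
    if total == 0:
        return None  # the rule hit nothing, so the ratio is undefined
    return stats.spam_pct / total


def promotion_failures(stats: RuleStats) -> List[str]:
    """Return the criteria the rule fails; an empty list means it passes."""
    so = so_ratio(stats)
    if so is None:
        return ["rule hit no messages"]

    if stats.nice:
        target, non_target = stats.ham_pct, stats.spam_pct
        so_ok = so <= 0.05
    else:
        target, non_target = stats.spam_pct, stats.ham_pct
        so_ok = so >= 0.95

    failures = []
    if not so_ok:
        failures.append("S/O ratio %.3f outside limit" % so)
    if not target > 0.25:
        failures.append("hits only %.2f%% of target type" % target)
    if not non_target < 1.00:
        failures.append("hits %.2f%% of non-target type" % non_target)
    return failures


if __name__ == "__main__":
    for rule in (RuleStats("EXAMPLE_GOOD", 5.0, 0.1),
                 RuleStats("EXAMPLE_RARE", 0.2, 0.0),
                 RuleStats("EXAMPLE_NICE", 0.01, 3.0, nice=True)):
        print(rule.name, promotion_failures(rule) or "passes")
```

A rule that fails only the hit-rate criterion (like `EXAMPLE_RARE` here) is the kind of rule the discussion below suggests might still belong in an "extra" set.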

A ruleset in the "extra" set would have different criteria.

  • DanielQuinlan suggested a second collection, made up of rules that do not qualify for rules/core. For example, SpamAssassin intentionally doesn't filter virus bounces (yet, at least), but there is a good virus bounce ruleset out there.
  • BobMenschel: Similarly, an "extra" rules set might include rules that positively identify spam from spamware, but hit <0.25% of spam. Or an "aggressive" rules set might include rules that hit with an S/O of only 0.89, but push a lot of spam over the 5.0 threshold without impacting significantly on ham.

We can also vote for extraordinary stuff that doesn't fit into those criteria...

private list for mass-checks:

  • archives delayed 1 month?
  • moderated signups
  • automated mass-checks of attachments in specific file format
  • rules considered suitable for use are checked into the "sandbox" area for a quick nightly mass-check before release