Rules Project: Encouraging Contributions

(part of RulesProjectPlan)

Problem description: 'The SpamAssassin committers are not spending much time writing rules. Attempts to recruit people to become committers to write rules have been somewhat unsuccessful. We could always use more committers and contributors; what can we do to encourage more contribution?'

Here are some ideas.

Sandboxes

See RulesProjectSandboxes.

Mass-checking

LorenWilton noted 'A big part (perhaps the biggest part) of rules development is the mass check. Most anyone can develop a rule on their home system and see how they *think* it works. Some few (but not many) people can do a mass-check on their home system and see how it *really* works - *for them*. As proposed, this rules project doesn't address the most important part of a rules project - some way to check the rules against a fairly large corpus.'

Nightly Mass-Checks

We currently have the NightlyMassCheck systems which do this, but turnaround time is too slow for most rule developers.

It does, however, have the following advantages:

The ruleqa CGI is now in the SpamAssassin zone, so this is still visible, even though the automc stuff is disabled. Here it is: http://buildbot.spamassassin.org/ruleqa/

'LOAFER': Suggestion: It would be good to know the % runtime figure for a sandbox rule, since adding a missing boundary can easily take a rule from a 1.5% to a 0.0n% performance hit:

 perl -d:DProf mass-check -j=1 spam:dir:some_reasonable_sample_set_including_hits_and_misses
 dprofpp -O 2000 > perf.log

JustinMason: Agreed, this would be useful.
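
To illustrate the point about boundaries, here is a minimal Python sketch. It assumes that "boundary" means a regex word boundary (\b) or a bounded quantifier; the two patterns and the sample messages are hypothetical and are not taken from any SpamAssassin rule set. Timings vary by machine, and Perl's regex engine behaves differently from Python's, so this only shows the shape of the comparison, not real SpamAssassin numbers.

```python
import re
import timeit

# Hypothetical patterns for illustration only (not real SpamAssassin rules).
UNBOUNDED = re.compile(r"free.*money.*now")
BOUNDED = re.compile(r"\bfree\b.{0,40}\bmoney\b.{0,40}\bnow\b")


def time_pattern(pattern, messages, repeat=3):
    """Return the best wall-clock time in seconds to scan all messages once."""
    def scan():
        for text in messages:
            pattern.search(text)
    return min(timeit.repeat(scan, number=1, repeat=repeat))


if __name__ == "__main__":
    # Synthetic sample containing both hits and near-misses. The near-miss
    # lines make the unbounded pattern backtrack heavily before failing.
    near_miss = "free " * 30 + "money " * 30 + "later"
    messages = [near_miss] * 10 + ["get free money now"] * 10

    slow = time_pattern(UNBOUNDED, messages)
    fast = time_pattern(BOUNDED, messages)
    print("unbounded: %.6fs  bounded: %.6fs" % (slow, fast))
```

Note that the bounded pattern is also stricter about what it hits (for example, it no longer matches "freedom"), so a change like this needs a fresh mass-check as well as a fresh profile.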

Some way of scheduling a small run during the development day would be useful, rather than waiting for the nightly run. An email to the user with the completed results would be nice too.

JustinMason: I think the more immediate, email-based, system is better done using List-Driven Mass-Checks as below; this is good for slow-but-comprehensive daily tests.

List-Driven Mass-Checks

Loren outlined the system used in SARE:

For active rule development, this is obviously quite useful! If you can't run mass-check locally for whatever reason, it offers a way to do this using other people's corpora in almost-real-time.

JustinMason: 'I'd like to see if there's a way to combine the two (that is, nightly and list-driven mass-checks) somehow, so that new SVN commits that update sandbox rules are immediately mass-checked alone. However, I can't see a way to do that reliably from SVN commits alone, because (for example) meta rules may depend on other rules that were not changed as part of the same commit. So I think the "email with attached rules file" is still a better model.'

'LOAFER': There are eval rules to consider too.

JustinMason: I think we have to do those as plugins, via the sandboxes.
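
The meta-rule dependency problem mentioned above can be sketched roughly as follows. This is a simplified Python illustration, not part of mass-check or any existing tool: the function names parse_meta_deps and rules_needed are hypothetical, and it assumes that rule names are plain identifiers and that each meta rule sits on one line as "meta NAME expression". Given the rules changed in a commit, it collects every other rule that would have to be present for those rules to be checked on their own.

```python
import re

# Assumption: rule names look like identifiers (letters, digits, underscores,
# not starting with a digit), so numeric constants in expressions are skipped.
RULE_NAME = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")


def parse_meta_deps(rules_text):
    """Map each meta rule name to the set of rule names its expression uses."""
    deps = {}
    for line in rules_text.splitlines():
        # Simplification: treat everything after '#' as a comment.
        line = line.split("#", 1)[0].strip()
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] == "meta":
            deps[parts[1]] = set(RULE_NAME.findall(parts[2]))
    return deps


def rules_needed(changed, deps):
    """Return the changed rules plus everything they depend on, transitively."""
    needed = set()
    stack = list(changed)
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        # Non-meta rules have no entry in deps and contribute nothing further.
        stack.extend(deps.get(name, ()))
    return needed
```

A sketch like this only finds the names of the dependencies; their definitions may live in files the commit did not touch, and eval rules would not be covered at all, which is consistent with the preference for an emailed, self-contained rules file.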

Here's the current proposal: