(part of RulesProjectPlan)
Problem description: SA rules development handles English-language spam best. Most of the SA rules developers who feed the distribution system speak and correspond in English, and the great majority of the testing corpora are in English. We're not as good at developing, validating, testing, or scoring rules for other languages.
Because of this, SARE has run into the "English vs. Other" problem with many of our obfuscation rules: they hit beautifully on English spam but have horrible S/O rates on German ham (to pick one example). \\ That's why we use 70_sare_name_eng.cf files: to indicate that these rules work well only on systems that expect almost 100% English ham and little to no ham in other languages. \\ I \[BobMenschel\] have begun to wonder whether it might be worthwhile to have 50_scores.cf for English email and 50_scores_de.cf for German email, and have SA pick the appropriate score based on the language of the message, just as it picks rule descriptions based on the language of the host system.
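A minimal sketch of the per-language scoring idea, in Python. It assumes the message language has already been detected upstream (for instance by a language-guessing plugin such as TextCat) and models the score files as plain dictionaries. `DEFAULT_SCORES`, `LANG_SCORES`, `score_for`, `total_score`, and the rule names are all hypothetical and do not correspond to anything SA actually loads.

```python
from typing import Dict, Iterable, Optional

# Hypothetical default table, standing in for an English-trained 50_scores.cf.
DEFAULT_SCORES: Dict[str, float] = {
    "SARE_OBFU_EXAMPLE": 2.5,   # hypothetical rule names
    "SARE_OTHER_EXAMPLE": 1.0,
}

# Hypothetical per-language overrides, standing in for 50_scores_de.cf etc.
LANG_SCORES: Dict[str, Dict[str, float]] = {
    # The obfuscation rule misfires on German ham, so damp it for 'de' mail.
    "de": {"SARE_OBFU_EXAMPLE": 0.1},
}


def score_for(rule: str, msg_lang: Optional[str]) -> float:
    """Prefer the score table for the message's language; fall back to default."""
    if msg_lang is not None:
        override = LANG_SCORES.get(msg_lang, {})
        if rule in override:
            return override[rule]
    return DEFAULT_SCORES.get(rule, 0.0)


def total_score(hits: Iterable[str], msg_lang: Optional[str]) -> float:
    """Sum the language-appropriate scores of all rules that hit."""
    return sum(score_for(rule, msg_lang) for rule in hits)


hits = ["SARE_OBFU_EXAMPLE", "SARE_OTHER_EXAMPLE"]
print(total_score(hits, "en"))  # 3.5
print(total_score(hits, "de"))
```

A language with no table of its own (say, "fr") simply falls back to the default scores, so the English-trained scores remain the baseline.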
We could invent a class of 'test rules'. They would have a zero score and would not appear in the mail's report summary when they hit, but they would show up in the report-home summary, recording whether they hit and whether the message was ham or spam.
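The 'test rule' idea can be sketched as a scoring loop in which test rules are recorded for report-home but add nothing to the score and are left out of the mail summary. `Rule`, `is_test`, `Result`, and `evaluate` are hypothetical, and so is the ham/spam label passed in: in practice it would have to come from the final verdict or from later training, not from the rules being scored.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class Rule:
    name: str
    score: float
    is_test: bool = False  # hypothetical flag: True marks a 'test rule'


@dataclass
class Result:
    score: float                         # score that counts toward the verdict
    summary: List[str]                   # rules listed in the mail's report
    report_home: List[Tuple[str, bool]]  # (rule, is_spam) for every hit


def evaluate(rules: Iterable[Rule], hit_names: Set[str], is_spam: bool) -> Result:
    score = 0.0
    summary: List[str] = []
    report_home: List[Tuple[str, bool]] = []
    for rule in rules:
        if rule.name not in hit_names:
            continue
        # Every hit, test rule or not, is recorded for report-home.
        report_home.append((rule.name, is_spam))
        if rule.is_test:
            # Test rules have a nil effective score and stay out of the summary.
            continue
        score += rule.score
        summary.append(rule.name)
    return Result(score, summary, report_home)


rules = [Rule("LIVE_RULE", 2.0), Rule("TEST_RULE", 3.0, is_test=True)]
result = evaluate(rules, {"LIVE_RULE", "TEST_RULE"}, is_spam=True)
print(result.score, result.summary)  # 2.0 ['LIVE_RULE']
```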
Then we can take rules that pass initial testing and push them out, either for what we believe is good use or purely for testing. SA systems around the world would pick up these rules with sa-update and report their hit stats home. If a rule hits well overall but does badly on 'de' mail, we move it to an English-only ruleset, or add an exclude-de option at the front of the rule or rule grouping. As long as sysadmins have set their local language correctly, this should work out.
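On the collection side, the report-home hits could be grouped by message language to spot a good hitter that does badly in one language. This sketch assumes each report is a (rule, language, is_spam) triple and that per-language message totals are known. It defines S/O as the spam hit rate divided by the sum of the spam and ham hit rates, normalizing by corpus size; that definition, the 0.9 cut-off, `so_ratios`, and `languages_to_exclude` are hypothetical choices, not an existing SA tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# One report-home record: (rule name, message language, was it spam?)
Report = Tuple[str, str, bool]


def so_ratios(reports: Iterable[Report],
              totals: Dict[Tuple[str, bool], int]) -> Dict[Tuple[str, str], float]:
    """S/O per (rule, language): spam hit rate / (spam hit rate + ham hit rate).

    `totals` maps (language, is_spam) to the number of messages of that kind
    seen, so uneven ham/spam volumes do not skew the ratio."""
    counts: Dict[Tuple[str, str, bool], int] = defaultdict(int)
    pairs: Set[Tuple[str, str]] = set()
    for rule, lang, is_spam in reports:
        counts[(rule, lang, is_spam)] += 1
        pairs.add((rule, lang))

    ratios: Dict[Tuple[str, str], float] = {}
    for rule, lang in pairs:
        n_spam = totals.get((lang, True), 0)
        n_ham = totals.get((lang, False), 0)
        spam_rate = counts[(rule, lang, True)] / n_spam if n_spam else 0.0
        ham_rate = counts[(rule, lang, False)] / n_ham if n_ham else 0.0
        if spam_rate + ham_rate > 0:
            ratios[(rule, lang)] = spam_rate / (spam_rate + ham_rate)
    return ratios


def languages_to_exclude(rule: str, ratios: Dict[Tuple[str, str], float],
                         min_so: float = 0.9) -> List[str]:
    """Languages where `rule` falls below the (arbitrary) S/O cut-off."""
    return sorted(lang for (name, lang), so in ratios.items()
                  if name == rule and so < min_so)


reports = ([("SARE_OBFU_EXAMPLE", "en", True)] * 18
           + [("SARE_OBFU_EXAMPLE", "en", False)] * 1
           + [("SARE_OBFU_EXAMPLE", "de", True)] * 3
           + [("SARE_OBFU_EXAMPLE", "de", False)] * 6)
totals = {("en", True): 100, ("en", False): 100,
          ("de", True): 100, ("de", False): 100}
ratios = so_ratios(reports, totals)
print(languages_to_exclude("SARE_OBFU_EXAMPLE", ratios))  # ['de']
```

A language returned by `languages_to_exclude` would be a candidate for the exclude-de style option, or a sign that the rule belongs in an English-only ruleset.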