Comment: [Original edit by JustinMason] take into account that scores may change outside of runGA

...

Warning: these are BIG – 4.6GB uncompressed for 3.1.0, for example, 400MB compressed.

6. upload the test logs to zone

Since stuff like the STATISTICS cannot ever be regenerated without the (randomised) test logs, these need to be saved, too. Currently, I think the best bet is to upload the rescore-logs.tgz file somewhere on spamassassin.zones.apache.org; it doesn't have to be in a public place, ASF-committer-account-required is fine.

7. upload proposed new scores

Attach the new proposed 50_scores.cf as a patch to the rescoring bug on the bugzilla, for voting and comments. There will always be comments ;)

No Format

  cd ..
  svn diff rules/50_scores.cf > ~/newscores.diff
  [upload ~/newscores.diff in your web browser]

Then wait for votes...

8. Make the stats files

Once the scores are voted on and tweaked to everyone's satisfaction, you'll need to rebuild the STATISTICS files with the new scores. First (just to make sure you're in sync!), repatch your scores file to match what's been voted on:

No Format
  svn revert rules/50_scores.cf
  wget -O newscores.diff 'http://bugzilla.spamassassin.org/....attachment?id=....'
  patch -p0 < newscores.diff

Then, a little configuration: set these variables to the paths of the correct gen-setN-* directories for the 4 score sets. The test logs that the stats are measured against will be taken from these directories.

No Format

  genset0=/home/corpus-rsync/corpus/scoregen-3.1/gen-set0-2.0-4.0-100-nobob
  genset1=/home/corpus-rsync/corpus/scoregen-3.1/gen-set1-2.0-4.0-100-nobob
  genset2=/home/corpus-rsync/corpus/scoregen-3.1/gen-set2-2.0-4.625-100-nobob
  genset3=/home/corpus-rsync/corpus/scoregen-3.1/gen-set3-2.0-5.0-100-nobob
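
Before going further, it may be worth confirming that each of those directories really contains the two test logs that the commands below link to. The following is only a sketch: `check_gensets` is a hypothetical helper, not part of the masses scripts, and it assumes the NSBASE/ham-test.log and SPBASE/spam-test.log layout used in this step.

```shell
# Hypothetical pre-flight check (not part of the masses scripts).
# Assumes genset0..genset3 are set in the current shell, as above.
check_gensets() {
  status=0
  for n in 0 1 2 3; do
    # Look up the value of genset0, genset1, ... by name.
    eval "dir=\${genset$n}"
    for f in NSBASE/ham-test.log SPBASE/spam-test.log; do
      if [ ! -r "$dir/$f" ]; then
        echo "missing: genset$n -> $dir/$f" >&2
        status=1
      fi
    done
  done
  return $status
}
```

Running `check_gensets` prints one line per missing log to stderr and returns non-zero if any are missing, so a typo in a path shows up before any STATISTICS file is overwritten.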

Once those vars are set, run these commands:

No Format

  cd masses

  rm ham*.log spam*.log ; touch ham.log spam.log
  ln -s $genset0/NSBASE/ham-test.log ham-test.log
  ln -s $genset0/SPBASE/spam-test.log spam-test.log
  bash ./mk-baseline-results 0 > ../rules/STATISTICS-set0.txt

  rm ham*.log spam*.log ; touch ham.log spam.log
  ln -s $genset1/NSBASE/ham-test.log ham-test.log
  ln -s $genset1/SPBASE/spam-test.log spam-test.log
  bash ./mk-baseline-results 1 > ../rules/STATISTICS-set1.txt

  rm ham*.log spam*.log ; touch ham.log spam.log
  ln -s $genset2/NSBASE/ham-test.log ham-test.log
  ln -s $genset2/SPBASE/spam-test.log spam-test.log
  bash ./mk-baseline-results 2 > ../rules/STATISTICS-set2.txt

  rm ham*.log spam*.log ; touch ham.log spam.log
  ln -s $genset3/NSBASE/ham-test.log ham-test.log
  ln -s $genset3/SPBASE/spam-test.log spam-test.log
  bash ./mk-baseline-results 3 > ../rules/STATISTICS-set3.txt

...


There'll be a lot of output along these lines:

No Format

ignoring 'TO_ADDRESS_EQ_REAL': immutable and score == 0

But that can be ignored. (TODO: it'd be nice to make this step a little less labour-intensive.)
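
As one possible way to address that TODO, the four near-identical command groups could be folded into a loop. This is a sketch only, not an existing masses script: `rebuild_stats` is a hypothetical name, and it assumes it is run from the masses directory with genset0..genset3 already set.

```shell
# Hypothetical helper: rebuild STATISTICS-set0.txt .. STATISTICS-set3.txt in one pass.
# Assumes the current directory is masses and genset0..genset3 are set.
rebuild_stats() {
  for n in 0 1 2 3; do
    # Look up the value of genset0, genset1, ... by name.
    eval "dir=\${genset$n}"
    if [ -z "$dir" ]; then
      echo "genset$n is not set" >&2
      return 1
    fi
    # Same steps as the manual commands: clear old logs, relink, regenerate.
    rm -f ham*.log spam*.log
    touch ham.log spam.log
    ln -s "$dir/NSBASE/ham-test.log" ham-test.log
    ln -s "$dir/SPBASE/spam-test.log" spam-test.log
    bash ./mk-baseline-results "$n" > "../rules/STATISTICS-set$n.txt" || return 1
  done
}
```

Calling `rebuild_stats` after `cd masses` should produce the same four files as the manual commands; it stops at the first score set whose variable is unset or whose mk-baseline-results run fails.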

9. upload new stats files

Attach the new proposed 50_scores.cf and STATISTICS*.txt as a patch to the rescoring bug on the bugzilla, for voting and comments:

No Format
  cd ..
  svn diff rules/50_scores.cf rules/STAT* > ~/newstats.diff
  [upload ~/newstats.diff in your web browser]

And let all and sundry vote on that, too. Once the new scores and STATS files are approved and into SVN, and the log data is in a safe archival spot on the zone, you're done.