Comment: [Original edit by JustinMason] updated to buildbot 0.7.5

...

Progress of the mass-checks is visible on [http://buildbotbbmass.spamassassin.org:8011/preflight/ the Buildbot 'waterfall']; as they complete, their results become visible in the [RuleQaApp].

...

This corpus is built from a selection of mail rsync'd up from various people; it's then "smoothed out" into several subsets. These use differing amounts of mail, starting with a small set of mail in the "mc-fast" chunk, and gradually increasing until we get to the largest block in "mc-slower". This division means that early "fast" results can arrive quickly, with less to scan, and as time goes on, more and more of the "slower" slaves complete their mass-checks and upload the results.

The "smoothing" and subset selection happen in mass-check nowadays.

What happens during the preflight buildbot process

...

Configure: a final summarisation step. First, a 'FAST FREQS REPORT' is output, giving the HitFrequencies from the mass-check. Next, the logs from the mass-check are copied to a safe location, and the 'corpus-hourly' script is run to generate various reports from them for the RuleQaApp. The URL for viewing the results in the RuleQaApp is printed prominently.

Administrivia: how the corpus is generated

The corpus is created from the UploadedCorpora. The script 'populate_cor' is run from cron periodically to rebuild the mass-checkable corpus from this. It attempts to 'smooth out' the multiple corpora into several new corpora, named "mc-fast", "mc-med", "mc-slow", "mc-slower", matching the buildbot slave names at http://buildbot.spamassassin.org/preflight/ .

It does this by:

...

It attempts to use one person's corpus for each output corpus, but since there's usually a glut of spam and a limited quantity of ham, the mapping is often nowhere near one-to-one. All the same, by looking at [http://buildbot.spamassassin.org/bbmass/corpus_makeup.txt the logs from the build process], you can see where the correlations lie.

The output looks like this on-disk:

No Format

/home/bbmass/tmpfs/cor/CORPUSNAME/TYPE/LINKNAME

Each "CORPUSNAME" directory corresponds to one of the slave names ("mc-fast", "mc-med", etc.). Under that, "TYPE" is either "ham" or "spam". Finally, "LINKNAME" is a readable filename for the symbolic link, which gives the reader an idea of where the message came from in the source corpora.
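
As a rough illustration of the layout described above, the following shell sketch counts the entries under each CORPUSNAME/TYPE directory. The function name corpus_counts is hypothetical and not part of the SpamAssassin tools; the default root is the path shown above, and it is assumed to be readable by whoever runs the function.

```shell
# Sketch only: corpus_counts is a made-up helper, not an existing script.
# Prints one "CORPUSNAME TYPE COUNT" line per ham/spam directory found.
corpus_counts() {
    # Root of the generated corpus; defaults to the path shown above.
    root=${1:-/home/bbmass/tmpfs/cor}
    for corpus_dir in "$root"/*/; do
        # Skip the unexpanded glob when the root is empty or missing.
        [ -d "$corpus_dir" ] || continue
        corpus=$(basename "$corpus_dir")
        for type in ham spam; do
            [ -d "$corpus_dir$type" ] || continue
            # Count the LINKNAME entries; the links themselves are not followed.
            n=$(ls -A "$corpus_dir$type" | wc -l | tr -d ' ')
            echo "$corpus $type $n"
        done
    done
}
```

Calling corpus_counts with no argument would inspect /home/bbmass/tmpfs/cor; passing another directory as the first argument makes it easy to try out on a scratch copy.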

Uploading corpora

See UploadedCorpora.

...

No Format
PASSWORD=[randompassword]
NAME=mc-new

sudo mkdir -p /home/bbmass/slaves/$NAME
sudo chown bbmass /home/bbmass/slaves/$NAME

cd /home/bbmass/slaves/$NAME
sudo su bbmass -c \
   "buildbot create-slave --usepty=0 \
      /home/bbmass/slaves/$NAME \
      buildbot.spamassassin.org:9988 $NAME $PASSWORD"

echo $PASSWORD > $HOME/pwd
sudo mv $HOME/pwd /home/buildbot/pwds/$NAME
sudo chown buildbot /home/buildbot/pwds/$NAME
sudo chmod 600 /home/buildbot/pwds/$NAME

sudo vi /home/buildbot/bots/bbmass/master.cfg

        [search for mc-fast and add new lines/entries for $NAME]
        [don't forget the 'scheduler' part!]

sudo vi /etc/init.d/buildbotbbmass 

        [search for mc-fast and add new lines/entries for $NAME]

...
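
To double-check the password-file step above, a small sketch like the following could confirm that a file exists and is readable and writable only by its owner (mode 600). The function name check_pwd_file is made up for illustration; it is not part of buildbot or of the SpamAssassin scripts.

```shell
# Sketch only: check_pwd_file is a hypothetical helper.
# Returns 0 if the file exists with permissions -rw------- (mode 600).
check_pwd_file() {
    f=$1
    [ -f "$f" ] || { echo "missing: $f"; return 1; }
    # The first ten characters of 'ls -ld' are the file type and permission bits.
    perms=$(ls -ld "$f" | cut -c1-10)
    if [ "$perms" = "-rw-------" ]; then
        echo "ok: $f"
    else
        echo "bad permissions ($perms): $f"
        return 1
    fi
}
```

For example, check_pwd_file /home/buildbot/pwds/$NAME could be run after the chmod step, by a user who is able to read that directory.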