Ideas for Google Summer of Code 2006

These are possibilities, and won't be considered until they're listed on the real page at http://wiki.apache.org/general/SummerOfCode2006 . Anyone planning to mentor any of these will also need to sign up there.

Task Proposals

Subject ID: spamassassin-easy-mass-check

Title: Nightly Mass Check for Normal People

ASF Project: SpamAssassin - http://SpamAssassin.apache.org/

Keywords: perl, email, corpora, distributed, community

Description:

We need a way to make the nightly mass-check easily accessible to normal users. They need easy-to-use software to run mass-checks and submit results, and they must be properly trained on the sorting rules. Our project then needs some way of tracking the level of trust of this growing number of submitters. See SocNightlyMassCheck.

Possible Mentors: Justin Mason (jm at jmason.org)

Status: -

Subject ID: spamassassin-persistent-db-conns

Title: Persistent database connections

ASF Project: SpamAssassin - http://SpamAssassin.apache.org/

Keywords: perl, databases, sql

Description:

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=2037 :
persistent database connections for SpamAssassin's Bayes subsystem. Michael:
'This exists, but is not an ASL friendly license. So a "clean room" implementation might be cool.'

Possible Mentors: -

Status: -

Subject ID: spamassassin-separate-expiry

Title: Helper process for Bayes expiry

ASF Project: SpamAssassin - http://SpamAssassin.apache.org/

Keywords: perl, bayes, spamd, processes

Description:

Theo said: 'I also suggested having things like Bayes expiry and such being passed back to the parent who can spawn a helper process to do the work. That way the children processes will be able to accept, process, return the result, notify parent for bayes work, go back to listening. Right now we do: accept, process, do bayes work, return result, go back to listening, which ends up causing timeouts and possibly eats up all processing children.'
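
A rough sketch of the flow Theo describes, in Python rather than spamd's Perl, purely for illustration. Everything named here is hypothetical: handle_request, parent_dispatch and EXPIRY_CODE are made-up names, the in-memory queue stands in for whatever child-to-parent channel spamd would really use, and the helper just prints a line instead of expiring a Bayes database.

```python
import queue
import subprocess
import sys

# Hypothetical stand-in for the expensive Bayes expiry job.
EXPIRY_CODE = "print('expiry done')"


def handle_request(message, notices):
    """Child: scan, hand back the result, and only notify the parent."""
    result = "scanned:" + message   # placeholder for the real scan
    notices.put("expire")           # cheap notification, no Bayes work here
    return result


def parent_dispatch(notices):
    """Parent: drain pending notifications and spawn one helper for them.

    Returns the helper's Popen object, or None if nothing was pending.
    """
    pending = False
    while True:
        try:
            notices.get_nowait()
            pending = True
        except queue.Empty:
            break
    if not pending:
        return None
    # The helper runs as a separate process, so no scanning child is tied up.
    return subprocess.Popen(
        [sys.executable, "-c", EXPIRY_CODE],
        stdout=subprocess.PIPE,
        text=True,
    )
```

Several notifications collapse into a single helper run here; whether that is the right policy for Bayes expiry is an open design question for the project.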

Possible Mentors: -

Status: -

Subject ID: spamassassin-arf-plugin

Title: ARF plugin

ASF Project: SpamAssassin - http://SpamAssassin.apache.org/

Keywords: arf, plugins, reporting, perl

Description:

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4812 :
ARF is a spam-report format for feedback loops for ISPs; there's been some interest in SpamAssassin understanding this and being able to match metadata inside the messages being reported.

Possible Mentors: -

Status: -

Subject ID: spamassassin-httpd-spamd

Title: Finish up Apache::SpamD httpd module

ASF Project: SpamAssassin - http://SpamAssassin.apache.org/

Keywords: apache, httpd, modules, perl

Description:

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4603 :
Finish up and polish the Apache::SpamD httpd module.

Possible Mentors: -

Status: -

Subject ID: spamassassin-quarantine-config-ui

Title: Quarantine / user-configuration web UI

ASF Project: SpamAssassin - http://SpamAssassin.apache.org/

Keywords: web, ui, quarantine, user-configuration, cgi, perl

Description:

Create a web application for message quarantine or user configuration, as part of the SpamAssassin project.

Possible Mentors: -

Status: -

Subject ID: spamassassin-corpus

Title: Maintain a SpamAssassin corpus of messages

ASF Project: SpamAssassin - http://SpamAssassin.apache.org/

Keywords: corpora, mail, collection, perl

Description:

Theo said: 'I'd almost rather we shift this around and make a "SpamAssassin Corpora", have all of us focus on making that the best it can be, and use that for mass-checks, etc.'

This could be a good possibility. Contributors can upload their own mail corpora to a central web app where the mass-check occurs. The mail collections could be quickly checked for validity, and tagged based on how much privacy the user wants for their mails (therefore controlling further redistribution of those mails).

Related to 'spamassassin-easy-mass-check' above.

Possible Mentors: -

Status: -

Subject ID: spamassassin-rules-db

Title: Rules explanation database

ASF Project: SpamAssassin - http://SpamAssassin.apache.org/

Keywords: rules, wiki, web, perl

Description:

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4771 - It'd be nice for users/admins/interested parties to have an easy way to look up a human-readable description of a rule, based on the rule name; using the wiki as part of that would be the best solution.

Possible Mentors: -

Status: -

Subject ID: spamassassin-better-reload

Title: Better way to reload the spamd configuration

ASF Project: SpamAssassin - http://SpamAssassin.apache.org/

Keywords: reload, spamd, sighup, restart, perl

Description:

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4774 :
we currently have a very heavyweight configuration-rereading system in which the entire process restarts. This could be made much lighter-weight.
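
One possible shape for a lighter reload, sketched in Python for illustration only. The assumptions are loud ones: the Daemon class and its method names are invented for this sketch, and a JSON file stands in for SpamAssassin's real configuration format. The idea is that the SIGHUP handler only sets a flag, and the config is re-read in place between requests, keeping the old settings if the new file is unreadable.

```python
import json
import signal


class Daemon:
    """Hypothetical daemon that re-reads its config without restarting."""

    def __init__(self, config_path):
        self.config_path = config_path
        self.config = self._read()
        self.reload_requested = False

    def _read(self):
        with open(self.config_path) as fh:
            return json.load(fh)

    def on_sighup(self, signum, frame):
        # Keep the handler tiny: just record that a reload was asked for.
        self.reload_requested = True

    def install(self):
        # SIGHUP only exists on Unix-like systems.
        signal.signal(signal.SIGHUP, self.on_sighup)

    def maybe_reload(self):
        """Call between requests. Returns True if the config was swapped."""
        if not self.reload_requested:
            return False
        self.reload_requested = False
        try:
            new_config = self._read()
        except (OSError, ValueError):
            return False  # unreadable or malformed: keep the old config
        self.config = new_config
        return True
```

A real implementation would also have to get the new configuration to already-forked children, which this sketch does not attempt.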

Possible Mentors: -

Status: -

Subject ID: spamassassin-message-test-suite

Title: A message-parser test suite

ASF Project: SpamAssassin - http://SpamAssassin.apache.org/

Keywords: testing, testsuite, parsing, mail, perl

Description:

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4559 : Every now and again, we come up against bugs in our message parser (MIME, HTML, headers, base64/qp decoding, etc.). We fix them, but occasionally there are regressions. I envisage it as using a vast collection of message files, something like a mass-check corpus, and a set of tests to ensure the parser sees what it should be seeing.
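
To give a feel for the kind of suite meant here, a small data-driven sketch follows. It is only an illustration: Python's standard email package stands in for SpamAssassin's own parser, the two inline messages stand in for a corpus of files, and parse_summary, run_cases and CASES are names made up for this example.

```python
import base64
from email import message_from_bytes
from email.policy import default


def parse_summary(raw):
    """Parse raw message bytes and report what the parser saw."""
    msg = message_from_bytes(raw, policy=default)
    subject = msg["Subject"]
    body = msg.get_body(preferencelist=("plain",))
    return {
        "subject": str(subject) if subject is not None else None,
        "content_type": msg.get_content_type(),
        # Stripped so the cases do not depend on line-ending handling.
        "text": body.get_content().strip() if body is not None else None,
    }


# Each case: (name, raw message, fields the parser must report).
CASES = [
    ("base64-body",
     b"Subject: hello\r\n"
     b"MIME-Version: 1.0\r\n"
     b"Content-Type: text/plain; charset=utf-8\r\n"
     b"Content-Transfer-Encoding: base64\r\n"
     b"\r\n" + base64.b64encode(b"Hello, parser") + b"\r\n",
     {"subject": "hello", "content_type": "text/plain",
      "text": "Hello, parser"}),
    ("qp-body-and-encoded-subject",
     b"Subject: =?utf-8?q?caf=C3=A9?=\r\n"
     b"MIME-Version: 1.0\r\n"
     b"Content-Type: text/plain; charset=utf-8\r\n"
     b"Content-Transfer-Encoding: quoted-printable\r\n"
     b"\r\n"
     b"caf=C3=A9\r\n",
     {"subject": "caf\u00e9", "text": "caf\u00e9"}),
]


def run_cases(cases):
    """Return a list of (case, field, expected, got) for every mismatch."""
    failures = []
    for name, raw, expected in cases:
        got = parse_summary(raw)
        for key, want in expected.items():
            if got.get(key) != want:
                failures.append((name, key, want, got.get(key)))
    return failures
```

Each parser bug that gets fixed would add a message and its expected fields to the case list, so a regression shows up as a non-empty failure list.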

Possible Mentors: -

Status: -

Subject ID: spamassassin-reduce-memory-usage

Title: Reduce memory footprint of spamd

ASF Project: SpamAssassin - http://SpamAssassin.apache.org/

Keywords: ram, memory, spamd, perl

Description:

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3839 : it may be possible to reduce spamd's memory footprint through internal changes to SpamAssassin's engine. A risky project; major internal changes may never get applied, especially if they break other things.

Possible Mentors: -

Status: -

Subject ID: spamassassin-improved-chi

Title: Implement 'Improved Chi' in the BAYES rules

ASF Project: SpamAssassin - http://SpamAssassin.apache.org/

Keywords: bayes, chi, robinson, perl

Description:

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3460 :
"Handling Redundancy in Email Token Probabilities" by Gary Robinson ( http://www.garyrobinson.net/2004/04/improved_chi.html ) has shown good results. Implement it in SpamAssassin and benchmark.
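
For orientation, here is a sketch of plain chi-squared combining of token probabilities, the technique from Robinson's earlier work. It is assumed here to be the baseline that this task would modify and benchmark against; it is not the "Improved Chi" variant itself, and the function names are invented for this example. Token probabilities must lie strictly between 0 and 1.

```python
import math


def chi2q(x2, dof):
    """Probability that a chi-squared value with even `dof` exceeds x2."""
    if dof % 2 != 0:
        raise ValueError("dof must be even")
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi_combine(probs):
    """Combine per-token spam probabilities into one score in [0, 1]."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    ln_not_spam = sum(math.log(1.0 - p) for p in probs)
    ln_spam = sum(math.log(p) for p in probs)
    s = 1.0 - chi2q(-2.0 * ln_not_spam, 2 * n)  # evidence for spam
    h = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)      # evidence for ham
    return (s - h + 1.0) / 2.0
```

A benchmark for this task could run the existing combiner and the new one over the same corpus and compare false positive and false negative rates.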

Possible Mentors: -

Status: -

Subject ID: spamassassin-spamd-unix-and-tcp

Title: spamd should support both UNIX domain and TCP sockets

ASF Project: SpamAssassin - http://SpamAssassin.apache.org/

Keywords: spamd, unix-domain, sockets, networking, perl

Description:

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3991 : currently spamd supports either UNIX-domain or TCP/IP sockets for incoming scan requests. It should support both simultaneously, in the one set of daemon processes.
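
The usual way to do this is to open both listening sockets and wait on them together. A minimal Python sketch of that idea follows; it is not spamd code, open_listeners and serve_once are hypothetical names, and the one-line reply is a placeholder for real request handling.

```python
import os
import selectors
import socket


def open_listeners(tcp_addr, unix_path):
    """Open a TCP and a UNIX-domain listener and register both."""
    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    tcp.bind(tcp_addr)
    tcp.listen()

    if os.path.exists(unix_path):
        os.unlink(unix_path)  # remove a stale socket file from an earlier run
    unx = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    unx.bind(unix_path)
    unx.listen()

    sel = selectors.DefaultSelector()
    sel.register(tcp, selectors.EVENT_READ, "tcp")
    sel.register(unx, selectors.EVENT_READ, "unix")
    return sel, tcp, unx


def serve_once(sel, timeout=5.0):
    """Accept and answer one connection on whichever listener is ready.

    Returns "tcp" or "unix", or None if nothing arrived before the timeout.
    """
    for key, _events in sel.select(timeout):
        conn, _addr = key.fileobj.accept()
        with conn:
            conn.recv(4096)  # placeholder: read the scan request
            conn.sendall(("handled via " + key.data + "\n").encode())
        return key.data
    return None
```

In a preforking daemon the parent would open both sockets before forking, so every child inherits the same pair and can accept on either.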

Possible Mentors: -

Status: -

Subject ID: spamassassin-dobly

Title: Benchmark and implement "Dobly" Noise Reduction

ASF Project: SpamAssassin - http://SpamAssassin.apache.org/

Keywords: dobly, bayes, classifiers, perl

Description:

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3078 : investigate "Dobly" noise reduction a la http://bnr.nuclearelephant.com/ , in a form that can be incorporated into SpamAssassin. Benchmark results using 10-fold cross-validation.
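
For anyone unfamiliar with the benchmarking method, 10-fold cross-validation splits the corpus into ten parts, trains on nine and tests on the remaining one, and repeats so that each part is used for testing exactly once. A generic sketch (nothing SpamAssassin-specific; train_fn and score_fn are hypothetical callbacks the benchmark would supply):

```python
import random


def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) lists; each item lands in exactly one test fold."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps runs repeatable
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def cross_validate(items, train_fn, score_fn, k=10, seed=0):
    """Average score_fn(model, test) over all k folds."""
    scores = [
        score_fn(train_fn(train), test)
        for train, test in k_fold_splits(items, k, seed)
    ]
    return sum(scores) / len(scores)
```

Running the classifier with and without noise reduction on the same folds (same seed) keeps the comparison fair.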

Possible Mentors: -

Status: -

Subject ID: spamassassin-secure-user-auth

Title: Secure user authentication in the spamd protocol

ASF Project: SpamAssassin - http://SpamAssassin.apache.org/

Keywords: spamd, protocol, tls, perl

Description:

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4550 :
a secure method to authenticate users over a spamc/spamd connection.

Possible Mentors: -

Status: -

Subject ID: spamassassin-mua-learner-plugins

Title: Thunderbird/Outlook/etc Plugins for Learning Messages via Spamd

ASF Project: SpamAssassin - http://SpamAssassin.apache.org/

Keywords: spamd, protocol, bayesian, perl, thunderbird, outlook, mua

Description:

The spamd protocol is well defined and allows for learning/reporting of messages. What is currently lacking is the ability to learn/report those messages via popular mail applications (Thunderbird, Outlook, etc). Many of these applications have a well established API for creating add-on plugins that would allow for this type of activity.

Possible Mentors: -

Status: -

Template

Please use this for further project suggestions...

Subject ID: spamassassin-xxxx

Title: xxxxx

ASF Project: SpamAssassin - http://SpamAssassin.apache.org/

Keywords: xxxx, perl

Description:

xxxx

Possible Mentors: -

Status: -
