...
To extract a unique list of e-mail addresses from your 'Sent' folder (in mbox format), you could use something like the following.
First, add this line to your ~/.spamassassin/user_prefs file:
No Format |
---|
include sent_whitelist
|
The following script will create the sent_whitelist file with 100 addresses per line:
No Format |
---|
#!/bin/sh
# Build sent_whitelist from the recipients found in the Sent folder (mbox).
SADIR=~/.spamassassin
SENTMAIL=~/mail/Sent
cat "$SENTMAIL" |
grep -Ei '^(To|cc|bcc):' |
grep -Eio '[a-z0-9_.+=/-]+@([a-z0-9-]+\.)+[a-z]{2,}' |
tr "A-Z" "a-z" |
sort -u |
xargs -n 100 echo "whitelist_from" > "$SADIR/sent_whitelist"
|
This can be adapted as necessary and executed as a cron job.
An alternative is to have the script rebuild user_prefs itself by concatenating a base file with the generated list (cat $SADIR/user_prefs.base $SADIR/sent_whitelist > $SADIR/user_prefs). If you do that, rename your user_prefs file to user_prefs.base before running the script, or you will lose your preferences.
The include directive makes this unnecessary: "include filename" includes configuration lines from "filename". Relative paths are considered relative to the current configuration file or user preferences file.
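Before installing the script as a cron job, it can help to preview what it would write. The following is a minimal sketch that assumes the same extraction pipeline as the script above; the function name sent_whitelist_preview and its optional second argument are illustrative only and are not part of SpamAssassin.

```shell
#!/bin/sh
# Hypothetical helper: print the whitelist_from lines for an mbox file
# to stdout instead of writing them to sent_whitelist.
sent_whitelist_preview() {
    # $1: path to an mbox file; $2: addresses per line (defaults to 100)
    grep -Ei '^(To|cc|bcc):' "$1" |
    grep -Eio '[a-z0-9_.+=/-]+@([a-z0-9-]+\.)+[a-z]{2,}' |
    tr 'A-Z' 'a-z' |
    sort -u |
    xargs -n "${2:-100}" echo "whitelist_from"
}

# Example: sent_whitelist_preview ~/mail/Sent | head
```

If the preview looks reasonable, the same output can be redirected to sent_whitelist in the SpamAssassin directory.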
...
The script is simple and fast, but not very accurate. It extracts strings that look like e-mail addresses from lines starting with To:, CC: or BCC:. It ignores header continuation lines, so addresses on those lines are not added to the whitelist, and it also picks up addresses from the message body whenever a body line starts with To:, CC: or BCC:, which often happens in forwarded e-mails.
To make the script more accurate, at the cost of being much slower, replace the line "grep -Ei '^(To|cc|bcc):' |" with a call to formail (part of the procmail package):
No Format |
---|
formail -s formail -czx 'To:' -x 'CC:' -x 'BCC:' |
|
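Where formail is not installed, the two inaccuracies described above (skipped continuation lines and matches in message bodies) can be approximated with awk. This is only a sketch under stated assumptions: it treats every line starting with "From " as the start of a new message, expects Unix line endings, and the function name recipient_headers is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only the To/Cc/Bcc header lines of each message
# in an mbox file, including their continuation lines.
# Assumes every line starting with "From " begins a new message.
recipient_headers() {
    awk '
        /^From /   { in_hdr = 1; keep = 0; next }   # message separator
        !in_hdr    { next }                         # skip message bodies
        /^$/       { in_hdr = 0; next }             # blank line ends the headers
        /^( |\t)/  { if (keep) print; next }        # continuation of previous header
                   { keep = (tolower($0) ~ /^(to|cc|bcc):/)
                     if (keep) print }
    ' "$1"
}

# Example: recipient_headers ~/mail/Sent | grep -Eio '[a-z0-9_.+=/-]+@([a-z0-9-]+\.)+[a-z]{2,}'
```

In the script, a call such as recipient_headers "$SENTMAIL" | would take the place of both the cat line and the "grep -Ei '^(To|cc|bcc):' |" line. Continuation lines are printed as separate lines rather than joined, which is enough for the address-matching grep that follows.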
Building an auto-whitelist from LDAP
...