Comment: [Original edit by AlexAndr]

...

To extract a unique list of e-mail addresses from your 'Sent' folder (in mbox format), you could use something like this:

In your ~/.spamassassin/user_prefs file, put this in:

include sent_whitelist

The following script will create the sent_whitelist file with 100 addresses per line:

#!/bin/sh

SADIR=~/.spamassassin
SENTMAIL=~/mail/Sent

# Pull the To:/Cc:/Bcc: header lines out of the Sent mbox, extract the
# addresses, lowercase and de-duplicate them, then write them out
# 100 per "whitelist_from" line.
cat "$SENTMAIL" |
	grep -Ei '^(To|cc|bcc):' |
	grep -Eio '[-a-z0-9_.+=/]+@([-a-z0-9]+\.)+[a-z]{2,}' |
	tr "A-Z" "a-z" |
	sort -u |
	xargs -n 100 echo "whitelist_from" > "$SADIR/sent_whitelist"

This can be adapted as necessary and executed as a cron job. Because user_prefs pulls the list in with the include directive, the script only rewrites sent_whitelist and leaves the rest of your preferences alone. You do not need to keep a separate user_prefs.base copy and concatenate it back in. From the SpamAssassin documentation: include filename - Include configuration lines from "filename". Relative paths are considered relative to the current configuration file or user preferences file.
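The sketch below shows what the generated file looks like without touching your real Sent folder. It feeds a few made-up addresses through the same last three stages of the script. The batch size of 2 and the helper name format_whitelist are hypothetical and were chosen only so that the line splitting is visible (the script uses 100). Batching works because whitelist_from accepts several space-separated addresses on one line.

```shell
#!/bin/sh
# Sketch: show the sent_whitelist format produced by the tr/sort/xargs
# stages of the script. BATCH=2 is a hypothetical value picked for
# readability; the script itself uses 100.
BATCH=2

# Hypothetical helper: lowercase, de-duplicate, then emit BATCH
# addresses per "whitelist_from" line.
format_whitelist() {
	tr "A-Z" "a-z" |
		sort -u |
		xargs -n "$BATCH" echo "whitelist_from"
}

printf '%s\n' Carol@Example.com alice@example.com bob@example.net \
	alice@example.com dave@example.org |
	format_whitelist
# Expected output:
# whitelist_from alice@example.com bob@example.net
# whitelist_from carol@example.com dave@example.org
```

To run the real script from cron, you could use a crontab entry along the lines of 30 2 * * * $HOME/bin/update_sent_whitelist.sh. The script path there is hypothetical.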

...

The script is simple and fast, but not very accurate. It extracts strings that look like e-mail addresses from lines starting with To:, CC: and BCC:. It has two weaknesses. First, it ignores header continuation lines, so addresses on continuation lines are not added to the whitelist. Second, it also picks up addresses from message bodies whenever a body line starts with To:, CC: or BCC:, which often happens when forwarding e-mails.
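The two limitations are easy to reproduce. The sketch below wraps the script's grep/tr/sort stages in a hypothetical helper, extract_sent_addresses, and runs it over a tiny made-up mbox. In that mbox, one address sits on a header continuation line and a forwarded "To:" line appears in the body.

```shell
#!/bin/sh
# Sketch: reproduce the limitations described above on a made-up mbox.
# extract_sent_addresses is a hypothetical helper wrapping the same
# grep/tr/sort stages as the script.
extract_sent_addresses() {
	grep -Ei '^(To|cc|bcc):' "$1" |
		grep -Eio '[-a-z0-9_.+=/]+@([-a-z0-9]+\.)+[a-z]{2,}' |
		tr "A-Z" "a-z" |
		sort -u
}

mbox=$(mktemp)
# bob@ is on a header continuation line; carol@ is only in the body.
cat > "$mbox" <<'EOF'
From me@example.org Mon Jan  1 00:00:00 2024
To: alice@example.com,
	bob@example.com
Subject: forwarded

---------- Forwarded message ----------
To: carol@example.com
EOF

extract_sent_addresses "$mbox"
# Expected output (bob is missed, carol is wrongly included):
# alice@example.com
# carol@example.com
rm -f "$mbox"
```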

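To make the script more accurate (but much slower), replace the line "grep -Ei '^(To|cc|bcc):' |" with a call to formail, which is part of the procmail package. formail -s splits the mbox into individual messages. The inner formail then reads only each message's header, and -c joins continued header lines, which addresses both problems above: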

	formail -s formail -czx 'To:' -x 'CC:' -x 'BCC:' |
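The sketch below shows the extraction stages with the formail substitution applied, wrapped in a hypothetical helper named extract_sent_addresses_formail. It assumes formail from procmail is on the PATH. The flag comments reflect the formail manual: -s splits the input into messages, -c concatenates continued fields, -z trims whitespace, and -x extracts a header field's contents.

```shell
#!/bin/sh
# Sketch: the address-extraction stages with formail in place of the
# first grep. extract_sent_addresses_formail is a hypothetical name.
# Requires formail (procmail package) on the PATH.
extract_sent_addresses_formail() {
	# Outer formail -s: split the mbox and run the inner formail once
	# per message. Inner formail: -c joins continued header lines,
	# -z trims whitespace, -x prints only the named header fields,
	# so body lines are never examined.
	cat "$1" |
		formail -s formail -czx 'To:' -x 'CC:' -x 'BCC:' |
		grep -Eio '[-a-z0-9_.+=/]+@([-a-z0-9]+\.)+[a-z]{2,}' |
		tr "A-Z" "a-z" |
		sort -u
}
```

Call it as extract_sent_addresses_formail ~/mail/Sent and pipe the result into xargs -n 100 echo "whitelist_from" > ~/.spamassassin/sent_whitelist, as in the main script.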

Building an auto-whitelist from LDAP

...