Comment: [Original edit by JustinMason]

...

  • discarding obvious viruses/malware
  • discarding too-large messages (over 4MB in size?)
  • removing forwarding headers, for spams forwarded from third-party addresses
  • extracting attached message/rfc822 parts (for certain forwarders)
  • discarding messages from now-unreliable addresses, identified using header regexps
  • discarding frequently-appearing ham, identified using header regexps
  • discarding bounces (scan with SpamAssassin and check for the ANY_BOUNCE_MESSAGE rule)
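
As a rough illustration of how a few of these steps might fit together, here is a minimal Python sketch using only the standard library. It covers the size check, the header-regexp discards, and extraction of an attached message/rfc822 part; virus detection and the SpamAssassin bounce scan are left out. The 4 MB limit is the tentative figure from the list above, and the header patterns and function names are hypothetical placeholders, not part of any existing tool.

```python
import re
from email import message_from_bytes

# Assumed limit: the list above only tentatively suggests "over 4MB".
MAX_BYTES = 4 * 1024 * 1024

# Hypothetical patterns, for illustration only. Real ones would name the
# now-unreliable addresses and the frequently-appearing ham sources.
DISCARD_HEADER_PATTERNS = [
    ("From", re.compile(r"unreliable@example\.com", re.I)),
    ("List-Id", re.compile(r"frequent-ham\.example\.org", re.I)),
]


def matches_discard_pattern(msg):
    """True if any configured header regexp matches this message."""
    for header, pattern in DISCARD_HEADER_PATTERNS:
        for value in msg.get_all(header, []):
            if pattern.search(str(value)):
                return True
    return False


def extract_forwarded(msg):
    """Return the first attached message/rfc822 part, or msg itself if none."""
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            payload = part.get_payload()
            if isinstance(payload, list) and payload:
                return payload[0]
    return msg


def preprocess(raw):
    """Return the message to keep, or None if it should be discarded."""
    if len(raw) > MAX_BYTES:
        return None  # too large; skip before paying the cost of parsing
    outer = message_from_bytes(raw)
    if matches_discard_pattern(outer):
        return None
    return extract_forwarded(outer)
```

Checking the size before parsing keeps oversized messages cheap to reject. In practice the extraction step would probably apply only to the forwarders known to wrap spam this way, rather than to every message.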

...

I'd suggest the simplest approach is for each uploading EC2 node to send a mail to a mailing list once its upload completes, containing:

  • the newly-uploaded object's URL
  • its size
  • possibly other metadata; number of messages, time period covered, etc.

in a machine-readable format – namely RFC 822-style "header: value" lines.
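
To make the idea concrete, here is a minimal Python sketch of what such a notification body might look like and how a subscriber could parse it. The field names (URL, Size, Message-Count, Period) and both function names are assumptions made up for this example; nothing above fixes them. Actually sending the mail to the list is left out.

```python
from email import message_from_string
from email.message import EmailMessage


def build_notification(url, size_bytes, message_count=None, period=None):
    """Build a mail whose body is a block of "header: value" lines.

    Field names here are hypothetical, chosen only for illustration.
    """
    lines = [f"URL: {url}", f"Size: {size_bytes}"]
    if message_count is not None:
        lines.append(f"Message-Count: {message_count}")
    if period is not None:
        lines.append(f"Period: {period}")

    msg = EmailMessage()
    msg["Subject"] = "upload complete"
    msg.set_content("\n".join(lines) + "\n")
    return msg


def parse_notification(body):
    """Parse the body with the stdlib mail parser, reusing RFC 822 header syntax."""
    parsed = message_from_string(body)
    return dict(parsed.items())
```

A consumer would then call `parse_notification(msg.get_content())` on each list message and read the fields it understands, ignoring any it does not. Reusing the mail header syntax means no extra parser is needed on either side.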

...