...
- discarding obvious viruses/malware
- discarding too-large messages (over 4MB in size?)
- removing forwarding headers, for spams forwarded from third-party addresses
- extracting attached message/rfc822 parts (for certain forwarders)
- discarding messages from now-unreliable addresses, identified using header regexps
- discarding frequently-appearing ham, identified using header regexps
- discarding bounces (scan with SpamAssassin for ANY_BOUNCE_MESSAGE)
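
A few of the steps above can be sketched in Python. This is only an illustration, not the actual pipeline: the 4MB limit is the tentative figure from the list, the header regexps and the `.example` domains are made-up placeholders, and the function names are hypothetical. Virus scanning and the SpamAssassin bounce check are left out, since they depend on external tools.

```python
import email
import re
from email import policy

# Assumed limit; the list above leaves the exact threshold open.
MAX_SIZE = 4 * 1024 * 1024

# Hypothetical header regexps, for illustration only.
DISCARD_HEADER_PATTERNS = [
    # a now-unreliable submitting address
    re.compile(r"^From:.*@unreliable\.example\b", re.I | re.M),
    # a frequently-appearing ham source
    re.compile(r"^List-Id:.*frequent-ham\.example", re.I | re.M),
]


def header_text(raw):
    """Return the header block: everything before the first blank line."""
    head = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)[0]
    return head.decode("latin-1")


def keep_message(raw):
    """Return False for too-large messages and for header-regexp matches."""
    if len(raw) > MAX_SIZE:
        return False
    headers = header_text(raw)
    return not any(p.search(headers) for p in DISCARD_HEADER_PATTERNS)


def extract_rfc822_parts(raw):
    """Return each attached message/rfc822 part, re-serialized as bytes."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    parts = []
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the embedded message.
            parts.append(part.get_payload(0).as_bytes())
    return parts
```

The regexps are matched against the header block only, so a body that happens to quote a discarded address does not cause the message to be dropped.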
...
I'd suggest the simplest approach is for the uploading EC2 nodes to send a mail to a mailing list once each upload completes, containing:
- the newly-uploaded object's URL
- its size
- possibly other metadata; number of messages, time period covered, etc.
in a machine-readable format, namely RFC-822-style "header: value" lines.
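
As a rough sketch of what such a notification could look like, the Python below builds the mail and parses its body back into fields. The field names ("URL", "Size", "Message-Count"), the subject line, the addresses and the function names are all assumptions made for illustration; nothing here is a settled format. Actually sending the message (e.g. with smtplib) is omitted.

```python
from email.message import EmailMessage
from email.parser import HeaderParser


def format_fields(url, size_bytes, extra=None):
    """Render the metadata as "header: value" lines, one per field."""
    fields = {"URL": url, "Size": str(size_bytes)}
    for name, value in (extra or {}).items():
        fields[name] = str(value)
    return "".join(f"{name}: {value}\n" for name, value in fields.items())


def build_notification(sender, list_addr, url, size_bytes, extra=None):
    """Build the mail an uploading node would send to the list."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = list_addr
    msg["Subject"] = "upload complete"  # hypothetical subject line
    msg.set_content(format_fields(url, size_bytes, extra))
    return msg


def parse_fields(body):
    """Parse a "header: value" body back into a dict, as a consumer might."""
    return dict(HeaderParser().parsestr(body).items())
```

A consumer subscribed to the list could then call `parse_fields(msg.get_content())` on each received notification and fetch the object named in the "URL" field.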
...