This plugin checks for specific keywords in image/gif attachments, using gocr
(an optical character recognition program).
This plugin can be used to detect spam that puts all of its real content in an attached image. The mail body itself contains only random text and random HTML, without any URLs or other identifiable information.
Note that this is my first SA plugin, so any feedback is welcome. The words checked for are specific to some spam I received a lot of recently.
TODO: The words checked for are currently hardcoded in the plugin. They should be a configuration parameter instead.
You will need giftopnm and gocr installed.
– mdeboer at iua dot upf dot edu
Code
Ocr.cf
loadplugin Ocr Ocr.pm
body OCR eval:check_ocr()
describe OCR Check if text in attached images contains spam words
score OCR 3.0
Ocr.pm
package Ocr;
use strict;
use Mail::SpamAssassin;
use Mail::SpamAssassin::Util;
use Mail::SpamAssassin::Plugin;
our @ISA = qw (Mail::SpamAssassin::Plugin);

# constructor: register the eval rule
sub new {
    my ( $class, $mailsa ) = @_;
    $class = ref($class) || $class;
    my $self = $class->SUPER::new($mailsa);
    bless( $self, $class );
    $self->register_eval_rule("check_ocr");
    return $self;
}

sub check_ocr {
    my ( $self, $pms ) = @_;
    my $cnt = 0;
    foreach my $p ( $pms->{msg}->find_parts("image") ) {
        my ( $ctype, $boundary, $charset, $name ) =
          Mail::SpamAssassin::Util::parse_content_type(
            $p->get_header('content-type') );
        if ( $ctype eq "image/gif" ) {
            open OCR,
              "|/usr/bin/giftopnm|/usr/bin/gocr -i - > /tmp/spamassassin.ocr.$$";
            foreach $p ( $p->decode() ) {
                print OCR $p;
            }
            close OCR;
            open OCR, "/tmp/spamassassin.ocr.$$";
            my @words =
              ( 'company', 'money', 'stock', 'million', 'thousand', 'buy' );
            while (<OCR>) {
                my $w;
                foreach $w (@words) {
                    if (m/$w/i) {
                        $cnt++;
                    }
                }
            }
            unlink "/tmp/spamassassin.ocr.$$";
        }
    }
    return ( $cnt > 2 );
}

1;
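The keyword-counting step of check_ocr can be tried on its own, without SpamAssassin, giftopnm or gocr. The following is only a rough sketch: count_keyword_hits is a hypothetical helper name that is not part of the plugin, and the sample text is made up to stand in for gocr output. Unlike the plugin, it takes the word list as an argument (the direction the TODO points in) and quotes each word with \Q...\E so that a word containing regex metacharacters would still match literally.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the plugin): counts how many (line, word)
# pairs match, mirroring the nested loops in check_ocr. Each line of the
# text is tested against every word, case-insensitively.
sub count_keyword_hits {
    my ( $text, @words ) = @_;
    my $cnt = 0;
    foreach my $line ( split /\n/, $text ) {
        foreach my $w (@words) {
            # \Q...\E quotes regex metacharacters in the word
            $cnt++ if $line =~ /\Q$w\E/i;
        }
    }
    return $cnt;
}

# The same word list the plugin hardcodes
my @words = ( 'company', 'money', 'stock', 'million', 'thousand', 'buy' );

# Made-up sample standing in for gocr output
my $sample = "Buy this STOCK now\nThe company will make a million\n";

my $hits = count_keyword_hits( $sample, @words );

# Same threshold as the plugin: more than two hits counts as spam
print "hits=$hits spam=", ( $hits > 2 ? "yes" : "no" ), "\n";
```

With this sample, the first line matches 'stock' and 'buy' and the second matches 'company' and 'million', so the script prints hits=4 spam=yes.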