
This plugin checks image/gif attachments for specific keywords, using gocr (an optical character recognition program) to extract the text from the images.

This plugin can be used to detect spam that puts all of its real content in an attached image, while the mail itself contains only random text and random HTML, without any URLs or other identifiable information.

Note that this is my first SA plugin, so any feedback is welcome. The words checked for are specific to a type of spam I have recently received a lot of.

TODO: The words checked for are currently hardcoded in the plugin; they should be a configuration parameter instead.
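One possible direction for this TODO is sketched below as standalone Perl: read the word list from a config value instead of hardcoding it, and apply the same counting rule as check_ocr() (one hit per matching word per line, rule fires above two hits). The `ocr_words` directive and the helpers `parse_ocr_words`, `count_word_hits` and `ocr_is_spam` are hypothetical. In a real plugin the value would presumably be captured in the `parse_config` callback of Mail::SpamAssassin::Plugin and read back in check_ocr() through `$pms->{conf}`; this sketch does not wire that up.

```perl
use strict;
use warnings;

# Parse the value of a hypothetical "ocr_words" config line, e.g.
#   ocr_words company money stock million thousand buy
# into a list of lowercase, whitespace-separated words.
sub parse_ocr_words {
    my ($value) = @_;
    return map { lc } grep { length } split /\s+/, ( $value // '' );
}

# Count hits the way Ocr.pm does: one hit for every (line, word) pair
# where the word occurs anywhere in the line, case-insensitively.
sub count_word_hits {
    my ( $text, $words ) = @_;
    my $cnt = 0;
    foreach my $line ( split /\n/, $text ) {
        foreach my $w (@$words) {
            $cnt++ if $line =~ /\Q$w\E/i;
        }
    }
    return $cnt;
}

# Same threshold as check_ocr(): the rule fires on more than two hits.
sub ocr_is_spam {
    my ( $text, $words ) = @_;
    return count_word_hits( $text, $words ) > 2 ? 1 : 0;
}

# Demo on a made-up two-line OCR result.
my @words  = parse_ocr_words('company money stock million thousand buy');
my $sample = "BUY this STOCK now\nOur company will make you money";
print count_word_hits( $sample, \@words ), "\n";    # prints 4
print ocr_is_spam( $sample, \@words ), "\n";        # prints 1
```

The sample gets two hits per line, so the rule would fire. As in Ocr.pm, matching is by substring, so "buy" also matches "buyer"; adding `\b` anchors would make it stricter.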

You will need giftopnm (part of netpbm) and gocr installed. The plugin calls them as /usr/bin/giftopnm and /usr/bin/gocr.
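Because the plugin uses absolute paths, a quick way to see whether the tools are installed at all, and where, is to search $PATH for them. A minimal sketch using the core File::Spec module; the helper name `find_in_path` is hypothetical:

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first executable file named $prog found
# in the directories listed in $PATH, or undef if there is none.
sub find_in_path {
    my ($prog) = @_;
    foreach my $dir ( File::Spec->path() ) {
        my $candidate = File::Spec->catfile( $dir, $prog );
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Report the location of each external tool the plugin needs.
foreach my $tool (qw(giftopnm gocr)) {
    my $path = find_in_path($tool);
    printf "%-8s %s\n", $tool, defined $path ? $path : 'NOT FOUND';
}
```

If a tool turns up somewhere other than /usr/bin, the paths in check_ocr() need to be changed to match.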

– mdeboer at iua dot upf dot edu

Code

Ocr.cf

loadplugin Ocr Ocr.pm
body OCR eval:check_ocr()
describe OCR Check if text in attached images contains spam words
score OCR 3.0

Ocr.pm

package Ocr;

use strict;
use warnings;
use Mail::SpamAssassin;
use Mail::SpamAssassin::Util;
use Mail::SpamAssassin::Plugin;
use File::Temp qw(tempfile);

our @ISA = qw (Mail::SpamAssassin::Plugin);

# constructor: register the eval rule
sub new {
   my ( $class, $mailsa ) = @_;
   $class = ref($class) || $class;
   my $self = $class->SUPER::new($mailsa);
   bless( $self, $class );
   $self->register_eval_rule("check_ocr");
   return $self;
}

# eval rule: true if the OCR text of the GIF attachments yields more than
# two hits (one hit per matching word per line of gocr output)
sub check_ocr {
   my ( $self, $pms ) = @_;
   my $cnt = 0;
   # words to look for (hardcoded for now)
   my @words = ( 'company', 'money', 'stock', 'million', 'thousand', 'buy' );
   foreach my $p ( $pms->{msg}->find_parts("image") ) {
      my ( $ctype, $boundary, $charset, $name ) =
        Mail::SpamAssassin::Util::parse_content_type(
         $p->get_header('content-type') );
      next unless $ctype eq "image/gif";

      # Save the decoded image to a private temp file; a predictable name
      # in /tmp could be hijacked by another local user via a symlink.
      my ( $tmpfh, $tmpname ) = tempfile();
      binmode $tmpfh;
      print $tmpfh $p->decode();
      close $tmpfh;

      # GIF -> PNM -> text; read gocr's output directly from the pipe.
      if ( open( my $ocr, '-|', "/usr/bin/giftopnm $tmpname | /usr/bin/gocr -i -" ) ) {
         while ( my $line = <$ocr> ) {
            foreach my $w (@words) {
               $cnt++ if $line =~ m/\Q$w\E/i;
            }
         }
         close $ocr;
      }
      unlink $tmpname;
   }
   return ( $cnt > 2 );
}

1;