You are viewing an old version of this page. View the current version.

Compare with Current View Page History

Version 1 Next »

This plugin checks for specific keywords in image/gif attachments, using gocr. This can be used to detect spam that puts all the real contect in an attached image, accompanied with random text and html (no URL's, etc).

Note that this is my first SA plugin, so any feedback is welcome. The words checked for are specific for some spam I received a lot of recently. (Hardcoded in the plugin)

You will need giftopnm and gocr installed.

– mdeboer at iua dot upf dot edu

Code

Ocr.cf

loadplugin Ocr Ocr.pm
body OCR eval:check_ocr()
describe OCR Check if text in attached images contains spam words
score OCR 3.0

Ocr.pm

package Ocr;

use strict;
use Mail::SpamAssassin;
use Mail::SpamAssassin::Util;
use Mail::SpamAssassin::Plugin;

our @ISA = qw (Mail::SpamAssassin::Plugin);

# constructor: register the eval rule
sub new {
   my ( $class, $mailsa ) = @_;
   $class = ref($class) || $class;
   my $self = $class->SUPER::new($mailsa);
   bless( $self, $class );
   $self->register_eval_rule("check_ocr");
   return $self;
}

sub check_ocr {
   my ( $self, $pms ) = @_;
   my $cnt = 0;
   foreach my $p ( $pms->{msg}->find_parts("image") ) {
      my ( $ctype, $boundary, $charset, $name ) =
        Mail::SpamAssassin::Util::parse_content_type(
         $p->get_header('content-type') );
      if ( $ctype eq "image/gif" ) {
         open OCR, "|/usr/bin/giftopnm|/usr/bin/gocr -i - > /tmp/spamassassin.ocr.$$";
         foreach $p ( $p->decode() ) {
            print OCR $p;
         }
         close OCR;
         open OCR, "/tmp/spamassassin.ocr.$$";
         my @words =
           ( 'company', 'money', 'stock', 'million', 'thousand', 'buy' );
         while (<OCR>) {
            my $w;
            foreach $w (@words) {
               if (m/$w/i) {
                  $cnt++;
               }
            }
         }
         unlink "/tmp/spamassassin.ocr.$$";
      }
   }
   return ( $cnt > 2 );
}

1;
  • No labels