Available since Solr 1.4.

Where it is used

The MailEntityProcessor indexes mail from a mailbox. Currently only the IMAP protocol (including IMAPS) is supported. Because it is built on the JavaMail API, support for other protocols could be added in the future.

Fields emitted by MailEntityProcessor

Each mail is indexed as one document. The MailEntityProcessor emits the following fields for each mail. Consumers can pick the fields they need and apply transformers to them.

Single-valued fields:

Multi-valued fields:

How to configure

At a minimum, data-config.xml needs the following configuration:

<dataConfig>
  <document>
    <!-- One Solr document is created for each mail in the listed folders -->
    <entity processor="MailEntityProcessor"
            user="somebody@gmail.com"
            password="something"
            host="imap.gmail.com"
            protocol="imaps"
            folders="x,y,z"/>
  </document>
</dataConfig>

Additional configuration

The following attributes fine-tune the indexing. All of them are optional.

Folders to index
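The folders attribute in the minimal configuration takes a comma-separated list of folder names. As an illustration only, the sketch below parses such a list and optionally narrows it with include and exclude regular expressions. FolderSelection and its include/exclude parameters are hypothetical and not part of the processor's API; consult the MailEntityProcessor source for the attributes it actually supports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of MailEntityProcessor): models how a
// comma-separated "folders" value could become the list of folders to index.
final class FolderSelection {
    private FolderSelection() {}

    /**
     * @param folders comma-separated folder names, e.g. "x,y,z"
     * @param include optional regex; when non-null, a folder must fully match it
     * @param exclude optional regex; when non-null, fully matching folders are dropped
     */
    static List<String> select(String folders, String include, String exclude) {
        Pattern inc = include == null ? null : Pattern.compile(include);
        Pattern exc = exclude == null ? null : Pattern.compile(exclude);
        List<String> result = new ArrayList<>();
        for (String raw : folders.split(",")) {
            String name = raw.trim();        // tolerate "x, y, z"
            if (name.isEmpty()) continue;    // skip empty entries such as "x,,y"
            if (inc != null && !inc.matcher(name).matches()) continue;
            if (exc != null && exc.matcher(name).matches()) continue;
            result.add(name);
        }
        return result;
    }
}
```

For example, FolderSelection.select("INBOX, Sent, Spam", null, "Spam") keeps INBOX and Sent.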

Timeouts
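JavaMail controls socket timeouts through session properties named mail.<protocol>.connectiontimeout and mail.<protocol>.timeout, both in milliseconds. The sketch below builds those properties for a given protocol such as imaps. The MailTimeouts helper is hypothetical, and this page does not say how the MailEntityProcessor's own timeout attributes are mapped onto these properties.

```java
import java.util.Properties;

// Hypothetical helper: builds the JavaMail session properties that control
// socket timeouts for a store protocol such as "imap" or "imaps".
final class MailTimeouts {
    private MailTimeouts() {}

    static Properties forProtocol(String protocol, long connectTimeoutMs, long readTimeoutMs) {
        Properties props = new Properties();
        // Timeout for establishing the socket connection, in milliseconds
        props.setProperty("mail." + protocol + ".connectiontimeout", Long.toString(connectTimeoutMs));
        // Socket read timeout, in milliseconds
        props.setProperty("mail." + protocol + ".timeout", Long.toString(readTimeoutMs));
        return props;
    }
}
```

The resulting Properties object would be passed to javax.mail.Session.getInstance(props) before connecting to the store.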

Handling attachments

The MailEntityProcessor uses Apache Tika to extract text from attachments.

Fetching mails since a certain date
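Fetching mails since a certain date comes down to comparing each message's sent date against a cutoff. The sketch below parses a cutoff string and keeps messages sent at or after it. The yyyy-MM-dd HH:mm:ss format and the SinceFilter helper are assumptions for illustration; check the processor's source for the attribute name and date format it expects.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: keeps only messages sent at or after a cutoff.
final class SinceFilter {
    private SinceFilter() {}

    // Assumed cutoff format, e.g. "2009-01-01 00:00:00"
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static LocalDateTime parseCutoff(String since) {
        return LocalDateTime.parse(since, FORMAT);
    }

    static List<LocalDateTime> sentSince(List<LocalDateTime> sentDates, String since) {
        LocalDateTime cutoff = parseCutoff(since);
        return sentDates.stream()
                .filter(d -> !d.isBefore(cutoff))   // inclusive of the cutoff itself
                .collect(Collectors.toList());
    }
}
```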

Custom filter for fetching mails

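  // Declared as a nested interface inside MailEntityProcessor
  public static interface CustomFilter {
    // Returns the SearchTerm used to select which messages in the folder are fetched
    public javax.mail.search.SearchTerm getCustomSearch(javax.mail.Folder folder);
  }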

Other attributes for tuning mail fetching from the mail server