MIT Information Extraction (MITIE) with Tika

MIT Information Extraction provides free state-of-the-art information extraction tools. The current release includes tools for performing named entity extraction and binary relation detection as well as tools for training custom extractors and relation detectors.

Support for MITIE is provided in Tika as a runtime binding through the org.apache.tika.parser.ner.mitie.MITIENERecogniser class.

Installation

  1. Download mitie-resources and run its install script, using the commands below.

    Mac OS prerequisite: download and install Homebrew.

    Linux/Windows: no prerequisites.

     git clone https://github.com/manalishah/mitie-resources
     cd mitie-resources
     # NER_RES must hold the absolute path to the mitie-resources folder
     export NER_RES=$PWD
     chmod a+x install.sh
     ./install.sh
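
    Before starting Tika it can help to confirm that the install step produced the files the later commands refer to. The following is a minimal sketch, not part of mitie-resources: the function name check_mitie_resources is hypothetical, and the three paths are simply the ones used by the Tika-Server commands on this page.

```shell
# Hypothetical helper: verify that the files referenced by the Tika-Server
# commands on this page exist under the mitie-resources folder.
# Argument 1 (optional): path to mitie-resources; defaults to $NER_RES.
check_mitie_resources() {
  res_dir=${1:-${NER_RES:-}}
  status=0
  for rel in \
    MITIE/mitielib/javamitie.jar \
    MITIE/MITIE-models/english/ner_model.dat \
    tika-config.xml
  do
    if [ -f "$res_dir/$rel" ]; then
      echo "found:   $res_dir/$rel"
    else
      # Report missing files on stderr and remember the failure.
      echo "missing: $res_dir/$rel" >&2
      status=1
    fi
  done
  return $status
}
```

    Calling check_mitie_resources with no argument checks the folder named by NER_RES; a non-zero exit status means at least one file is missing.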

Running MITIE with Tika-App

Running MITIE with Tika-Server

  1. For Mac OS

     export TIKA_SERVER={your/path/to/tika-server}/target/tika-server-1.13-SNAPSHOT.jar
    
     java -Djava.library.path=$NER_RES/MITIE/mitielib -Dner.mitie.model=$NER_RES/MITIE/MITIE-models/english/ner_model.dat -Dner.impl.class=org.apache.tika.parser.ner.mitie.MITIENERecogniser -classpath $NER_RES/MITIE/mitielib/javamitie.jar:$TIKA_SERVER org.apache.tika.server.TikaServerCli --config=$NER_RES/tika-config.xml -p 9998

  2. For Linux/Windows

     export TIKA_SERVER={your/path/to/tika-server}/target/tika-server-1.13-SNAPSHOT.jar
    
     java -Dner.mitie.model=$NER_RES/MITIE/MITIE-models/english/ner_model.dat -Dner.impl.class=org.apache.tika.parser.ner.mitie.MITIENERecogniser -classpath $NER_RES/MITIE/mitielib/javamitie.jar:$TIKA_SERVER org.apache.tika.server.TikaServerCli --config=$NER_RES/tika-config.xml -p 9998
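
    The two commands above differ only in the -Djava.library.path option, which appears in the Mac OS variant alone. As a sketch of how that choice could be scripted, the hypothetical function below (mitie_server_cmd is not part of Tika or mitie-resources) prints the matching command line without running it. It assumes a POSIX shell with uname available, that NER_RES and TIKA_SERVER are already exported, and that neither path contains spaces.

```shell
# Hypothetical helper: print (do not run) the Tika-Server launch command.
# Argument 1 (optional): OS name as reported by `uname -s`;
# defaults to the current machine. Needs NER_RES and TIKA_SERVER set.
mitie_server_cmd() {
  os_name=${1:-$(uname -s)}
  lib_opt=""
  # Only the Mac OS command on this page passes -Djava.library.path;
  # `uname -s` reports "Darwin" on Mac OS.
  if [ "$os_name" = "Darwin" ]; then
    lib_opt="-Djava.library.path=$NER_RES/MITIE/mitielib "
  fi
  printf '%s\n' "java ${lib_opt}-Dner.mitie.model=$NER_RES/MITIE/MITIE-models/english/ner_model.dat -Dner.impl.class=org.apache.tika.parser.ner.mitie.MITIENERecogniser -classpath $NER_RES/MITIE/mitielib/javamitie.jar:$TIKA_SERVER org.apache.tika.server.TikaServerCli --config=$NER_RES/tika-config.xml -p 9998"
}
```

    Printing the command first makes it easy to inspect the expanded paths before launching the server.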

    Either command starts Tika-Server with the MITIE named entity parser enabled at http://localhost:9998. To test the server, send it the sample.txt file provided in the mitie-resources folder:

     curl -T $NER_RES/sample.txt http://localhost:9998/meta -H "Accept: application/json"

    This should return the document metadata, including the recognised entities, in JSON format:

     {
      "Content-Type":"text/plain",
      "NER_LOCATION":["Los Angeles","California"],
      "X-Parsed-By":["org.apache.tika.parser.CompositeParser","org.apache.tika.parser.ner.NamedEntityParser"],
      "language":"sl"
     }
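
    To look only at the recognised entities, the NER_ keys can be filtered out of that response. The sketch below is an illustration rather than a Tika feature: extract_ner_fields is a hypothetical name, and the pattern assumes that entity keys start with NER_ and hold a flat JSON array on one line, as NER_LOCATION does in the output above. A real JSON parser would be more robust.

```shell
# Hypothetical helper: read a /meta JSON response on stdin and print each
# NER_* key together with its array value, one per line.
# Assumes keys look like "NER_SOMETHING" and values are flat arrays.
extract_ner_fields() {
  grep -o '"NER_[A-Z_]*" *: *\[[^]]*\]'
}

# Example use against a running server (not executed here):
#   curl -s -T "$NER_RES/sample.txt" http://localhost:9998/meta \
#        -H "Accept: application/json" | extract_ner_fields
```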

TikaAndMITIE (last edited 2016-04-24 00:10:22 by manalishah)