
...

Named Entity Recognition (NER) is supported in tika-parsers starting with v1.12 (TIKA-1787). This page describes the steps required to configure and activate the NamedEntityParser.

Contents

  1. Named Entity Recognition (NER) with Tika
    1. Activate Named Entity Parser
    2. Using Apache OpenNLP NER
      1. Tika App + OpenNLP NER in action
    3. Using Stanford CoreNLP NER
      1. Tika + CoreNLP in action
    4. Using Regular Expressions
      1. Tika + RegexNER in action
    5. Creating a custom NER
    6. Chaining all the above at once

Activate Named Entity Parser

...

The following table lists each supported entity type, the path at which to place its model file, and the URL from which to download it.

Entity Type   Path for model                                           URL to get
-----------   --------------                                           ----------
PERSON        org/apache/tika/parser/ner/opennlp/ner-person.bin        http://opennlp.sourceforge.net/models-1.5/en-ner-person.bin
LOCATION      org/apache/tika/parser/ner/opennlp/ner-location.bin      http://opennlp.sourceforge.net/models-1.5/en-ner-location.bin
ORGANIZATION  org/apache/tika/parser/ner/opennlp/ner-organization.bin  http://opennlp.sourceforge.net/models-1.5/en-ner-organization.bin
DATE          org/apache/tika/parser/ner/opennlp/ner-date.bin          http://opennlp.sourceforge.net/models-1.5/en-ner-date.bin
TIME          org/apache/tika/parser/ner/opennlp/ner-time.bin          http://opennlp.sourceforge.net/models-1.5/en-ner-time.bin
PERCENT       org/apache/tika/parser/ner/opennlp/ner-percentage.bin    http://opennlp.sourceforge.net/models-1.5/en-ner-percentage.bin
MONEY         org/apache/tika/parser/ner/opennlp/ner-money.bin         http://opennlp.sourceforge.net/models-1.5/en-ner-money.bin

Notes:

  1. You can use any combination of these models. For example, if you only need LOCATION names, download only the LOCATION model and skip the others.
  2. NER models for other languages are also available at http://opennlp.sourceforge.net/models-1.5/ . If you choose a different language, use those URLs in the script below.
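As a rough sketch of the download step, the Python helper below fetches a chosen subset of the OpenNLP 1.5 models and saves each one under the path listed in the table. It is not part of Tika: the function names, the default "models" target directory, and the "{lang}-ner-{type}.bin" naming assumed for non-English models are all illustrative assumptions, so check the models-1.5 directory listing for which models actually exist in your language.

```python
"""Download OpenNLP 1.5 NER models to the paths listed in the table.

Hypothetical helper for illustration only; not shipped with Tika.
"""
import os
import sys
import urllib.request

BASE_URL = "http://opennlp.sourceforge.net/models-1.5/"
TARGET_PREFIX = "org/apache/tika/parser/ner/opennlp/"

# Entity type -> model name suffix, as listed in the table.
MODEL_NAMES = {
    "PERSON": "person",
    "LOCATION": "location",
    "ORGANIZATION": "organization",
    "DATE": "date",
    "TIME": "time",
    "PERCENT": "percentage",
    "MONEY": "money",
}


def download_plan(entity_types, lang="en"):
    """Return (url, relative_path) pairs for the requested entity types.

    Assumption: non-English models follow the "{lang}-ner-{name}.bin"
    naming; the local path stays the same regardless of language.
    """
    plan = []
    for entity in entity_types:
        name = MODEL_NAMES[entity]
        url = f"{BASE_URL}{lang}-ner-{name}.bin"
        path = f"{TARGET_PREFIX}ner-{name}.bin"
        plan.append((url, path))
    return plan


def download(entity_types, target_dir="models", lang="en"):
    """Fetch each model and store it under target_dir at its table path."""
    for url, rel_path in download_plan(entity_types, lang):
        dest = os.path.join(target_dir, *rel_path.split("/"))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        print(f"{url} -> {dest}")
        urllib.request.urlretrieve(url, dest)


if __name__ == "__main__":
    # Example: python get_ner_models.py LOCATION  (no arguments = all types)
    download(sys.argv[1:] or list(MODEL_NAMES))
```

Running it with only LOCATION mirrors note 1 above. Afterwards, the "models" directory (an assumed name) would need to be on the Java classpath so the parser can resolve the files at the paths shown in the table.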

...