Tika now has the ability to leverage Apache cTAKES for use in parsing biomedical information from text. This page documents how to get Tika working with cTAKES.

Installing cTAKES

The first step to getting the parser up and running is installing Apache cTAKES. The following steps should work well on *nix systems; Windows directions are TODO. It is very important to install cTAKES version 3.2.2 or later.

  1. mkdir -p $HOME/src && cd $HOME/src
  2. curl -O http://mirrors.ibiblio.org/apache//ctakes/ctakes-3.2.2/apache-ctakes-3.2.2-bin.tar.gz
  3. tar xvzf *.tar.gz
  4. export CTAKES_HOME=$HOME/src/apache-ctakes-3.2.2
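
Before moving on, it can help to confirm that CTAKES_HOME really points at the unpacked directory. The following is a minimal sketch, assuming a POSIX shell; the function name check_ctakes_home is hypothetical and is not part of cTAKES or Tika.

```shell
# Hypothetical helper (not shipped with cTAKES or Tika): succeeds only if
# CTAKES_HOME is set and names an existing directory.
check_ctakes_home() {
  if [ -z "${CTAKES_HOME:-}" ]; then
    echo "CTAKES_HOME is not set" >&2
    return 1
  fi
  if [ ! -d "$CTAKES_HOME" ]; then
    echo "CTAKES_HOME does not point to a directory: $CTAKES_HOME" >&2
    return 1
  fi
  echo "CTAKES_HOME looks OK: $CTAKES_HOME"
}
```

Run check_ctakes_home after the export step. Note that export only affects the current shell session, so the variable has to be set again (or added to a shell profile) in any new terminal.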

Now you have to download a separate resources package for cTAKES:

  1. cd $HOME/src
  2. curl -Lo ctakes-resources-3.2.1.1-bin.zip "http://downloads.sourceforge.net/project/ctakesresources/ctakes-resources-3.2.1.1-bin.zip?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fctakesresources%2F%3Fsource%3Dtyp_redirect&ts=1433609725&use_mirror=softlayer-dal"
  3. mv *.zip apache-ctakes-3.2.2
  4. cd apache-ctakes-3.2.2
  5. unzip ctakes-resources-3.2.1.1-bin.zip

After the above is done, cTAKES is installed.

...

  1. mkdir -p $HOME/src/ctakes-config/org/apache/tika/parser/ctakes && cd $HOME/src/ctakes-config/org/apache/tika/parser/ctakes
  2. curl -kO "https://raw.githubusercontent.com/chrismattmann/ctakesparser-utils/master/config/org/apache/tika/parser/ctakes/CTAKESConfig.properties"
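
The directory created in step 1 mirrors the package-style path org/apache/tika/parser/ctakes, so the properties file has to end up at exactly that location under the config root. A minimal sketch of a check for this, assuming a POSIX shell and the $HOME/src/ctakes-config root used above; the function name check_ctakes_config is hypothetical.

```shell
# Hypothetical helper: checks that CTAKESConfig.properties exists and is
# non-empty at the package-style path under the given config root
# (default: $HOME/src/ctakes-config, as used in the steps above).
check_ctakes_config() {
  root="${1:-$HOME/src/ctakes-config}"
  file="$root/org/apache/tika/parser/ctakes/CTAKESConfig.properties"
  if [ -s "$file" ]; then
    echo "found $file"
  else
    echo "missing or empty: $file" >&2
    return 1
  fi
}
```

Calling check_ctakes_config with no argument checks the default location. This only confirms that the file is present and non-empty; it does not validate the properties themselves.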

Setting up the Tika Config file

...

  1. cd $HOME/src/ctakes-config
  2. curl -kO "https://raw.githubusercontent.com/chrismattmann/ctakesparser-utils/master/config/tika-config.xml"
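
Because curl without the -f option saves whatever the server returns, a failed download can leave an error page on disk under the expected file name. A minimal sketch of a sanity check, assuming a POSIX shell; it relies on the fact that Tika configuration files use a <properties> root element, and the function name looks_like_tika_config is hypothetical.

```shell
# Hypothetical helper: succeeds if the file exists and contains a
# <properties> element, the root element of a Tika configuration file.
# This is a rough plausibility check, not XML validation.
looks_like_tika_config() {
  file="${1:-$HOME/src/ctakes-config/tika-config.xml}"
  [ -f "$file" ] && grep -q '<properties' "$file"
}
```

For example, looks_like_tika_config || echo "tika-config.xml looks wrong" reports a problem if the download did not produce a Tika config file.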

Putting it all together: Tika-App

...