Tika can now use Apache cTAKES to parse biomedical information from text. This page documents how to get Tika working with cTAKES.
Installing cTAKES
The first step to getting the parser up and running is installing Apache cTAKES. The following steps should work well on *nix systems; Windows directions are TODO. It's very important to install cTAKES version 3.2.2 or later.
1. mkdir -p $HOME/src && cd $HOME/src
2. curl -O http://mirrors.ibiblio.org/apache//ctakes/ctakes-3.2.2/apache-ctakes-3.2.2-bin.tar.gz
3. tar xvzf *.tar.gz
4. export CTAKES_HOME=$HOME/src/apache-ctakes-3.2.2
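Before moving on, it can help to confirm that CTAKES_HOME really points at the unpacked directory. The following is only a minimal sketch: `check_ctakes_home` is a hypothetical helper name, not something shipped with cTAKES or Tika, and it checks nothing beyond the variable being set and the directory existing.

```shell
# Hypothetical helper (not part of cTAKES or Tika): verifies that the
# CTAKES_HOME variable exported in the last step is set and is a directory.
check_ctakes_home() {
    if [ -z "${CTAKES_HOME:-}" ]; then
        echo "CTAKES_HOME is not set" >&2
        return 1
    fi
    if [ ! -d "$CTAKES_HOME" ]; then
        echo "CTAKES_HOME is not a directory: $CTAKES_HOME" >&2
        return 1
    fi
    echo "CTAKES_HOME looks OK: $CTAKES_HOME"
}
```

Calling `check_ctakes_home` in the same shell session after the export step prints a confirmation, or an error on stderr with a non-zero exit status. Note that the export only lasts for the current shell session unless it is also added to a shell startup file.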
Now you have to download a separate resources package for cTAKES:
1. cd $HOME/src
2. curl -Lo ctakes-resources-3.2.1.1-bin.zip "http://downloads.sourceforge.net/project/ctakesresources/ctakes-resources-3.2.1.1-bin.zip?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fctakesresources%2F%3Fsource%3Dtyp_redirect&ts=1433609725&use_mirror=softlayer-dal"
3. mv *.zip apache-ctakes-3.2.2
4. cd apache-ctakes-3.2.2
5. unzip ctakes-resources-3.2.1.1-bin.zip
After the above is done, cTAKES is installed.
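If the unzip step fails, one possible cause is that the redirected download saved something other than a ZIP archive under the .zip name. The sketch below is one way to check for that, assuming only that a ZIP file begins with the two signature bytes "PK"; `looks_like_zip` is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper: succeeds if FILE exists and starts with the
# ZIP signature bytes "PK". It does not validate the rest of the archive.
looks_like_zip() {
    [ -f "$1" ] || return 1
    # Read only the first two bytes of the file.
    magic=$(dd if="$1" bs=2 count=1 2>/dev/null)
    [ "$magic" = "PK" ]
}
```

For example, `looks_like_zip $HOME/src/apache-ctakes-3.2.2/ctakes-resources-3.2.1.1-bin.zip || echo "download again"` flags a bad download. This is a cheap first check only; it does not detect a truncated or corrupted archive.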
...
1. mkdir -p $HOME/src/ctakes-config/org/apache/tika/parser/ctakes && cd $HOME/src/ctakes-config/org/apache/tika/parser/ctakes
2. curl -kO "https://raw.githubusercontent.com/chrismattmann/ctakesparser-utils/master/config/org/apache/tika/parser/ctakes/CTAKESConfig.properties"
Setting up the Tika Config file
...
1. cd $HOME/src/ctakes-config
2. curl -kO "https://raw.githubusercontent.com/chrismattmann/ctakesparser-utils/master/config/tika-config.xml"
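Once both files are downloaded, the config directory should contain tika-config.xml at its top level and CTAKESConfig.properties under org/apache/tika/parser/ctakes. The following sketch checks that layout; `check_tika_ctakes_config` is a hypothetical helper name, and it only confirms that the two files exist and are non-empty, not that their contents are valid.

```shell
# Hypothetical helper: checks that the two config files fetched above
# exist and are non-empty under the given directory
# (defaults to $HOME/src/ctakes-config, the path used on this page).
check_tika_ctakes_config() {
    dir=${1:-$HOME/src/ctakes-config}
    missing=0
    for f in tika-config.xml \
             org/apache/tika/parser/ctakes/CTAKESConfig.properties; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_tika_ctakes_config` with no arguments checks the default location and lists any file that is missing or empty on stderr.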
Putting it all together: Tika-App
...