Grobid Quantities is a Grobid module specialised in recognising expressions of measurement (e.g. pressure, temperature, etc.) in textual documents such as PDF publications.
Measurements are parsed, normalised and converted into SI units.
To use these capabilities with Tika, you must install and run the Grobid Quantities server, whose endpoint extracts measurements from the text sent to it.

...

Two resource files need to be created, tika-config.xml and GrobidServer.properties. Both are supplied to Tika later, when it is run.

A predefined set of configuration files is available here:

Code Block
git clone https://github.com/lfoppiano/grobid-quantities-tika-parser-resources.git grobidquantities-parser-resources

Alternatively, it is possible to create the files manually, as described below.

Manual configuration

Create tika-config.xml

In order to use any of the NamedEntityParser implementations in Tika, the parser responsible for handling the name recognition task needs to be enabled.
This can be done by creating the tika-config.xml file, as follows:

No Format
<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <parsers>
        <parser class="org.apache.tika.parser.ner.NamedEntityParser">
            <mime>text/plain</mime>
            <mime>text/html</mime>
            <mime>application/xhtml+xml</mime>
        </parser>
    </parsers>
</properties>
 

...
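If you prefer to script this step, the following sketch writes the same tika-config.xml with a heredoc. It assumes the resources directory is $GROBID_QUANTITIES_RES. When that variable is unset, it falls back to $HOME/GrobidQuantitiesRest-resources, the directory used later on this page. The quoted 'EOF' delimiter stops the shell from expanding anything inside the XML. The XML declaration is written on the very first line, which is where XML parsers require it.

```shell
#!/bin/sh
# Sketch: write tika-config.xml so that NamedEntityParser is enabled.
# Assumption: GROBID_QUANTITIES_RES is the resources directory; if it is
# unset, fall back to the directory used elsewhere on this page.
GROBID_QUANTITIES_RES="${GROBID_QUANTITIES_RES:-$HOME/GrobidQuantitiesRest-resources}"
mkdir -p "$GROBID_QUANTITIES_RES"

# Quoted 'EOF': the XML is written verbatim, with no variable expansion.
cat > "$GROBID_QUANTITIES_RES/tika-config.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <parsers>
        <parser class="org.apache.tika.parser.ner.NamedEntityParser">
            <mime>text/plain</mime>
            <mime>text/html</mime>
            <mime>application/xhtml+xml</mime>
        </parser>
    </parsers>
</properties>
EOF

echo "Wrote $GROBID_QUANTITIES_RES/tika-config.xml"
```

If you cloned the predefined resources instead, this file already exists there and this step can be skipped.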

Create GrobidServer.properties

Tika must know the host and port on which the grobid-quantities server is running. By default, Tika assumes that the server runs on port 8060.
To use a different host or port, you must supply a GrobidServer.properties file. A sample GrobidServer.properties file looks like the following:

No Format
grobid.server.url=http://localhost:8060
grobid.endpoint.text=/processQuantityText
 

...
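The two properties in the sample file are presumably combined into the URL that the recogniser posts text to: the server URL followed by the endpoint path. This is an assumption about the recogniser, not something stated on this page. The helper below is not part of Tika. It reads a GrobidServer.properties file and prints that combined URL, which is a quick way to check which host and port Tika would contact. The function name `grobid_endpoint_url` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tika): print the endpoint URL that is
# assumed to be built as grobid.server.url followed by grobid.endpoint.text.
grobid_endpoint_url() {
    props="$1"
    # Value after "key=", first occurrence only, trailing whitespace (incl. CR) trimmed.
    url=$(sed -n 's/^grobid\.server\.url=//p' "$props" | head -n 1 | sed 's/[[:space:]]*$//')
    path=$(sed -n 's/^grobid\.endpoint\.text=//p' "$props" | head -n 1 | sed 's/[[:space:]]*$//')
    if [ -z "$url" ] || [ -z "$path" ]; then
        echo "grobid.server.url or grobid.endpoint.text missing in $props" >&2
        return 1
    fi
    # Drop one trailing slash from the server URL to avoid "//" in the result.
    printf '%s%s\n' "${url%/}" "$path"
}
```

For example, `grobid_endpoint_url "$PATH_PREFIX/GrobidServer.properties"` prints `http://localhost:8060/processQuantityText` for the sample file shown above.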

No Format
#Create a directory for keeping the config and properties files
export GROBID_QUANTITIES_RES=$HOME/GrobidQuantitiesRest-resources
mkdir -p "$GROBID_QUANTITIES_RES"
cd "$GROBID_QUANTITIES_RES"
#tika-config.xml must be stored in this directory
pwd

#GrobidServer.properties goes under the GrobidNERecogniser package path so that it can be found on the classpath
export PATH_PREFIX="$GROBID_QUANTITIES_RES/org/apache/tika/parser/ner/grobid"
mkdir -p "$PATH_PREFIX"
#create and edit the properties file
vim "$PATH_PREFIX/GrobidServer.properties"
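The vim step above can also be done non-interactively. This is a minimal sketch that assumes the directory layout used on this page. GROBID_HOST and GROBID_PORT are hypothetical variables introduced here; they default to the host and port of the sample GrobidServer.properties. The properties file appears to be loaded as a classpath resource relative to the GrobidNERecogniser package, which is why it is placed under org/apache/tika/parser/ner/grobid. The sketch also reports whether both resource files are where the java command expects them.

```shell
#!/bin/sh
# Sketch: create GrobidServer.properties without an interactive editor.
# GROBID_HOST and GROBID_PORT are hypothetical variables; defaults match
# the sample file on this page.
GROBID_QUANTITIES_RES="${GROBID_QUANTITIES_RES:-$HOME/GrobidQuantitiesRest-resources}"
GROBID_HOST="${GROBID_HOST:-localhost}"
GROBID_PORT="${GROBID_PORT:-8060}"

# Package-relative location, so the file is on the classpath once
# $GROBID_QUANTITIES_RES itself is added to the classpath.
PATH_PREFIX="$GROBID_QUANTITIES_RES/org/apache/tika/parser/ner/grobid"
mkdir -p "$PATH_PREFIX"

# Unquoted EOF: $GROBID_HOST and $GROBID_PORT are expanded here.
cat > "$PATH_PREFIX/GrobidServer.properties" <<EOF
grobid.server.url=http://$GROBID_HOST:$GROBID_PORT
grobid.endpoint.text=/processQuantityText
EOF

# Report whether both resource files are in place.
for f in "$GROBID_QUANTITIES_RES/tika-config.xml" "$PATH_PREFIX/GrobidServer.properties"; do
    if [ -f "$f" ]; then
        echo "found:   $f"
    else
        echo "missing: $f" >&2
    fi
done
```

Running it with, for instance, `GROBID_PORT=8070` would point Tika at a server on that port instead.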
 

Running Grobid Quantities with Tika


No Format
#adjust the jar path and version to your Tika build (e.g. tika-app-2.8.0.jar)
export TIKA_APP={your/path/to/tika-app}/target/tika-app-1.13-SNAPSHOT.jar

#set the system property to use the GrobidNERecogniser class;
#GROBID_QUANTITIES_RES may instead point to the cloned grobidquantities-parser-resources directory
java -Dner.impl.class=org.apache.tika.parser.ner.grobid.GrobidNERecogniser \
     -classpath "$GROBID_QUANTITIES_RES:$TIKA_APP" org.apache.tika.cli.TikaCLI \
     --config="$GROBID_QUANTITIES_RES/tika-config.xml" -m https://en.wikipedia.org/wiki/Time
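With `-m`, TikaCLI prints only the metadata of the parsed document. The helpers below assume two things: each entry is printed as a `key: value` line, and NamedEntityParser stores recognised entities under keys prefixed with `NER_`. Under those assumptions, the measurements can be separated from the rest of the metadata with a plain filter. The names `ner_only` and `ner_key_counts` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for post-processing TikaCLI -m output.
# Assumptions: one "key: value" entry per line, and NamedEntityParser
# keys start with "NER_".

# Keep only the NER_* lines. grep exits 1 when nothing matches; that is
# treated as "no entities found" rather than as an error.
ner_only() {
    grep '^NER_' || true
}

# Count how many values were recorded for each NER_* key.
ner_key_counts() {
    ner_only | cut -d: -f1 | sort | uniq -c
}
```

For example, appending `| ner_only` to the java command above keeps only the entity lines. Appending `| ner_key_counts` instead summarises how many values each entity type received.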

...