Versions Compared

Comment: It was not clear that the -bc parameter is not the one used to pass in a custom tika-config, so add an example to make that explicit.

...

*Use the RecursiveParserWrapper and store text for each document:
java -jar tika-app.X.Y.jar -J -t -i <inputDirectory> -o <outputDirectory>

*Customize the behavior of Tika through the tika-config.xml configuration file:
java -jar tika-app.X.Y.jar -c my-custom-tika-config.xml -J -t -i <inputDirectory> -o <outputDirectory>
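
Because -c and -bc are easy to mix up, it may help to wrap the command above in a small script that checks the tika-config path before launching. The following is only a sketch: the function name run_tika_batch and its argument order are hypothetical and not part of Tika; the java command line itself is the one shown above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Tika): check that the custom tika-config
# file exists, then run the batch command shown above with -c.
run_tika_batch() {
    jar=$1          # e.g. tika-app.X.Y.jar
    config=$2       # custom tika-config file, passed with -c (not -bc)
    input_dir=$3
    output_dir=$4

    if [ ! -f "$config" ]; then
        echo "tika-config file not found: $config" >&2
        return 1
    fi

    java -jar "$jar" -c "$config" -J -t -i "$input_dir" -o "$output_dir"
}

# Example call (paths are placeholders):
# run_tika_batch tika-app.X.Y.jar my-custom-tika-config.xml input output
```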

*Specify JVM args to be used by the child process (prepend a "J" to the regular args, e.g. -JXmx5g instead of -Xmx5g):

...

*Commandline to generate output files for tika-eval, processing only those files listed in pdfs_random_50000.csv:

java -Dlog4j.configuration=file:log4j_driver.xml -cp "bin/*" org.apache.tika.cli.TikaCLI -JXX:-OmitStackTraceInFastThrow -JXmx5g -JDlog4j.configuration=file:log4j.xml -bc tika-batch-config-basic-test.xml -i <input_directory> -o <output_directory> -fileList pdfs_random_50000.csv
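
The -J arguments in the command above are regular JVM args with a "J" inserted after the leading dash. As a rough illustration, the rewrite can be sketched as a shell helper; to_child_jvm_args is a hypothetical name for this example and is not something Tika provides.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tika): rewrite regular JVM args into the
# "J"-prefixed form that is passed through to the child process.
to_child_jvm_args() {
    out=""
    for arg in "$@"; do
        case "$arg" in
            -*) converted="-J${arg#-}" ;;   # -Xmx5g becomes -JXmx5g
            *)  converted="$arg" ;;         # leave anything else untouched
        esac
        out="${out:+$out }$converted"
    done
    printf '%s\n' "$out"
}

# Prints the three -J arguments used in the command above.
to_child_jvm_args -XX:-OmitStackTraceInFastThrow -Xmx5g -Dlog4j.configuration=file:log4j.xml
```

Note that -Dlog4j.configuration=file:log4j_driver.xml in that command is left without the "J" because it applies to the driver JVM rather than the child process.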

...