...
*Use the RecursiveParserWrapper and store text for each document:
java -jar tika-app.X.Y.jar -J -t -i <inputDirectory> -o <outputDirectory>
*Customize the behavior of Tika through the tika-config.xml configuration file:
java -jar tika-app.X.Y.jar -c my-custom-tika-config.xml -J -t -i <inputDirectory> -o <outputDirectory>
*Specify JVM args to be used by the child process (prepend a "J" to the regular args):
...
*Command line to generate output files for tika-eval, processing only the files listed in pdfs_random_50000.csv:
java -Dlog4j.configuration=file:log4j_driver.xml -cp "bin/*" org.apache.tika.cli.TikaCLI -JXX:-OmitStackTraceInFastThrow -JXmx5g -JDlog4j.configuration=file:log4j.xml -bc tika-batch-config-basic-test.xml -i <input_directory> -o <output_directory> -fileList pdfs_random_50000.csv
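The child-process arguments in this command (-JXmx5g, -JXX:-OmitStackTraceInFastThrow, -JDlog4j.configuration=file:log4j.xml) follow the "prepend a J" rule: drop the leading dash of the regular JVM arg and put -J in front. A minimal sketch of that rewrite is below; to_child_jvm_args is a hypothetical helper written for illustration and is not part of Tika.

```shell
# Hypothetical helper (not part of Tika): rewrite regular JVM args
# into the -J-prefixed form passed through to the child process.
to_child_jvm_args() {
  for arg in "$@"; do
    # Strip the single leading dash, then prepend -J.
    printf '%s\n' "-J${arg#-}"
  done
}

# Print the child-process form of three regular JVM args, one per line.
to_child_jvm_args -Xmx5g -XX:-OmitStackTraceInFastThrow -Dlog4j.configuration=file:log4j.xml
```

Args without the J prefix, such as -Dlog4j.configuration=file:log4j_driver.xml in the command above, are ordinary options to the java process that is launched directly.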
...