...

  1. The ForkParser – this forks a child process to do the parsing, which protects the parent process against out-of-memory errors (OOM) and infinite loops.
  2. tika-batch – if you are processing files at desktop/VM scale (not cloud scale), you can run tika-batch via tika-app:
       java -jar tika-app.jar -i <input_dir> -o <output_dir>
  3. tika-server – if you are using tika-server 1.x, start the server with the `-spawnChild` option, and it will fork a child process to do the actual parsing. In Tika >= 2.x, the parsing is done in a forked process by default. Clients need to be able to handle tika-server going offline when the forked parsing process has to restart.

  4. tika-pipes – in Tika 2.x, use tika-pipes programmatically, in tika-app with the -a option, or in tika-server with the /async or /pipes endpoints.
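
As noted for tika-server, clients need to tolerate the server being briefly unreachable while the forked parsing process restarts. One way a client might do that is to retry failed calls with a growing delay. The following is a minimal sketch using only the Java standard library; it is not part of Tika, and the class name `RetryingCall`, the method `callWithRetry`, and the attempt and delay values are all hypothetical choices for illustration. The simulated failure in `main` stands in for a real request to the server.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not a Tika class: retries a call that may fail
// while the server's forked parsing process is restarting.
class RetryingCall {

    /**
     * Runs the call and retries on IOException (for example, a refused
     * connection), doubling the delay between attempts. Rethrows the
     * last IOException once maxAttempts is exhausted.
     */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // wait before the next attempt
                    delay *= 2;          // simple exponential backoff
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger attempts = new AtomicInteger();
        // Simulated request: fails twice, then succeeds.
        String result = callWithRetry(() -> {
            if (attempts.incrementAndGet() < 3) {
                throw new ConnectException("simulated: server restarting");
            }
            return "parsed text";
        }, 5, 10);
        System.out.println(result + " after " + attempts.get() + " attempts");
    }
}
```

In a real client, the lambda passed to `callWithRetry` would issue the HTTP request to tika-server. Only I/O failures are retried here; whether to also retry on particular HTTP status codes is a design choice left to the client.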

The Tika project has taken the following steps to identify and fix catastrophic problems:

...