...

Background

So, you've integrated Apache Tika into your framework, tried it on a couple of thousand files and all works well. Problem solved!

...

  1. Regular catchable exceptions
  2. OutOfMemory errors, which can put the JVM in an unreliable state
  3. Permanent hangs (Tika can chew up massive amounts of resources and go forever)
  4. Security vulnerabilities (e.g. CVE-2016-6809 and CVE-2016-4434)
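As a rough illustration of why items 1 and 2 need different handling, here is a minimal sketch that sorts the outcome of a task into "fine", "catchable exception" and "OutOfMemoryError". The class name FailureClassifier, the Outcome enum and the Callable standing in for a parse call are all hypothetical and not part of Tika.

```java
import java.util.concurrent.Callable;

public class FailureClassifier {

    /** Hypothetical outcome categories; not a Tika type. */
    public enum Outcome { OK, EXCEPTION, OOM }

    /**
     * Runs the task (a stand-in for a parse call) and reports how it ended.
     * A catchable exception is usually safe to log and move past.
     * An OutOfMemoryError is an Error, not an Exception, so it needs its own
     * catch clause, and the JVM should be treated as suspect afterwards.
     */
    public static Outcome classify(Callable<?> task) {
        try {
            task.call();
            return Outcome.OK;
        } catch (Exception e) {
            return Outcome.EXCEPTION;
        } catch (OutOfMemoryError e) {
            // Do as little as possible here; the caller should arrange a restart.
            return Outcome.OOM;
        }
    }

    public static void main(String[] args) {
        // Simulated failures only; a real OOM may not leave room to run this code.
        System.out.println(classify(() -> "parsed"));
        System.out.println(classify(() -> { throw new java.io.IOException("bad file"); }));
        System.out.println(classify(() -> { throw new OutOfMemoryError("simulated"); }));
    }
}
```

Catching the error only tells you that it happened. Restarting the process is the safer response, since other threads may already have failed allocations of their own.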

Please note that for 3., permanent hangs, you cannot terminate the thread. Thread's stop(), suspend() and destroy() methods sound like they'll do the trick, but they won't. You need to kill the entire process. See TIKA-456.
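One way to make "kill the entire process" practical is to run the risky work in a child process and enforce a timeout from the parent. The sketch below uses only the standard java.lang.Process API. The class name ForkedRunner and the method runWithTimeout are hypothetical, and the command passed in is whatever launches your own parsing code; "java -version" is used in main only as a harmless placeholder.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ForkedRunner {

    /**
     * Runs the command in a child process.
     * Returns true if the child exited on its own within the timeout,
     * false if it was still running and had to be killed.
     */
    public static boolean runWithTimeout(List<String> command, long timeoutMillis)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Share the parent's stdout/stderr so the child cannot block on a full pipe.
        builder.inheritIO();
        Process child = builder.start();
        if (child.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return true;
        }
        // The child is presumed hung: kill the whole process, not a thread.
        child.destroyForcibly();
        child.waitFor();
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder command: the JVM that is running this class, asked for its version.
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        boolean finished = runWithTimeout(List.of(java, "-version"), 30_000);
        System.out.println(finished ? "child finished" : "child killed after timeout");
    }
}
```

Because the parent never runs the parser itself, a hang or an OutOfMemoryError in the child leaves the parent in a known state, at the cost of starting a new JVM for each batch of work.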

As of Tika 1.15, we added a MockParser in the tika-core-tests.jar that will allow you to test your framework against items 1-3. Simply add that jar to your classpath and then include a <mock> XML file in your set of test documents, and crash, crash away.

...

<throw class="my.evil.DeserializationAttack">bwahahaha</throw>

Usage

Below are several options for adding the dependency.

Including the tika-core-tests dependency in your project

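<!-- Outside the Tika build, replace ${project.groupId} with org.apache.tika
     and ${project.version} with the Tika version you depend on. -->
<dependency>
  <groupId>${project.groupId}</groupId>
  <artifactId>tika-core</artifactId>
  <version>${project.version}</version>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>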

Tika-app

Place the tika-app.jar and the tika-core-tests.jar in a "bin" directory.

...