
List of 3rd party parser plugins

These are 3rd party parser plugins which cannot be included in Tika due to licensing incompatibility. To install a plugin, download it according to the instructions below and drop the jar(s) on your classpath. Tika will auto-detect the plugin.
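As a rough illustration of "dropping the jar(s) on your classpath", the sketch below assembles a classpath from one base jar plus every jar found in a plugin directory. The function name `build_classpath`, the `PLUGIN_DIR` variable and the `tika-app.jar` path are hypothetical names chosen for this example; the directory simply plays the role of `$SOMEWHERE_ON_CLASS_PATH` in the install steps below. The commented invocation assumes the tika-app main class `org.apache.tika.cli.TikaCLI` and its `--list-parsers` option, which may differ between Tika versions.

```shell
# Sketch only: build_classpath, PLUGIN_DIR and the jar paths are hypothetical names.
# Prints a classpath made of a base jar followed by every *.jar in a plugin directory.
build_classpath() {
  base_jar=$1
  plugin_dir=$2
  classpath=$base_jar
  for jar in "$plugin_dir"/*.jar; do
    # Skip the unexpanded pattern when the directory holds no jars
    [ -e "$jar" ] || continue
    # ':' is the Unix classpath separator; Windows uses ';'
    classpath="$classpath:$jar"
  done
  printf '%s\n' "$classpath"
}

# Example invocation (commented out; assumes a tika-app jar in the current directory):
# PLUGIN_DIR=$HOME/tika-plugins
# java -cp "$(build_classpath tika-app.jar "$PLUGIN_DIR")" org.apache.tika.cli.TikaCLI --list-parsers
```

The example uses `java -cp` with an explicit main class because `java -jar` ignores any additional classpath entries. If the plugin was picked up, its parser should appear in the listed parsers.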

Microsoft TNEF / LZFU

This is a Microsoft compression format used for compressed RTF, email attachments (such as WINMAIL.DAT) and more. The parser is available from a GitHub fork of the JTNEF project.

(Tika 0.10 and later include a TNEF parser as standard, which may be sufficient.)

Install instructions:

  • git clone http://github.com/jukka/jtnef.git jtnef
  • cd jtnef
  • mvn package
  • cp target/jtnef-*.jar $SOMEWHERE_ON_CLASS_PATH

Microsoft Project

This parser extracts metadata and content from Microsoft Project (MPP and MPX) files.

It builds on top of MPXJ (http://mpxj.sourceforge.net/), which is available under the LGPL.

Install instructions:

  • git clone git://git.code.sf.net/p/mpxj/mpxj
  • cd mpxj
  • mvn package
  • cp target/mpxj-*SNAPSHOT.jar $SOMEWHERE_ON_CLASS_PATH
  • git clone http://github.com/Gagravarr/MPXJ-Tika
  • cd MPXJ-Tika
  • mvn package
  • cp target/mpxj-tika-*SNAPSHOT.jar $SOMEWHERE_ON_CLASS_PATH

Ogg Vorbis and FLAC

This parser extracts metadata from Ogg Vorbis and FLAC audio files.

The library and parser are available under the Apache License, so this is now included as part of Tika.

Your plugin

<Your description here>

Install instructions:

  • <Your instructions here>

3rd party parser plugins (last edited 2013-10-23 18:01:34 by NickBurch)