Using ParseContext to Control Parsing

The ParseContext is used to configure parsing for a given file.

The general use is to set an object on the context under its class, then pass the context to the parser:

parseContext.set(MyClass.class, new MyClass());
parser.parse(inputStream, contentHandler, metadata, parseContext);
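
The first argument to set is a class because the context behaves like a lookup table from a class to one instance of that class; a parser later asks for the entry by class. The sketch below illustrates that idea with the JDK only. ContextSketch is a hypothetical, simplified stand-in written for this page, not Tika's ParseContext, and it assumes lookups match the exact key class.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a parse context: one stored instance per key class. */
final class ContextSketch {
    private final Map<Class<?>, Object> entries = new HashMap<>();

    /** Registers (or replaces) the instance stored under the given key class. */
    <T> void set(Class<T> key, T value) {
        entries.put(key, value);
    }

    /** Returns the instance stored under the key class, or null if none was set. */
    <T> T get(Class<T> key) {
        return key.cast(entries.get(key));
    }

    /** Returns the stored instance, or the supplied default if none was set. */
    <T> T get(Class<T> key, T defaultValue) {
        T value = get(key);
        return value != null ? value : defaultValue;
    }
}
```

In this sketch, an object registered under CharSequence.class is not found when looking up String.class, so the key passed to set should be the class the consumer is expected to ask for.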


The following uses apply to several parsers:

1. Handling embedded files

2. XML parsing -- Users can pass in their own XMLReader (SAX), SAXParser (SAX), SAXParserFactory (SAX) or DocumentBuilder (DOM). Parsers that process XML will use these resources.

3. PasswordProvider -- If you know the password for password-protected files, you can supply it through a PasswordProvider set on the ParseContext.

4. ExecutorService -- For parsers that use an ExecutorService, users can pass in their own ExecutorService.
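
For items 2 and 4, the objects handed to the ParseContext are ordinary JDK resources that the caller configures first. The following is a minimal sketch of preparing them, under stated assumptions: SharedResources and its method names are hypothetical, rejecting DOCTYPE declarations is just one possible hardening policy (it will also reject legitimate documents that carry a DOCTYPE), and the feature URI is the Xerces-style one understood by the JDK's built-in parser.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

/** Hypothetical helper that prepares resources a caller might place in a ParseContext. */
final class SharedResources {

    /** Builds a namespace-aware SAX factory that refuses DOCTYPE declarations. */
    static SAXParserFactory hardenedSaxFactory()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Assumption: the underlying parser recognizes this Xerces-style feature.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        return factory;
    }

    /** Builds a fixed-size pool so parsing work cannot spawn unbounded threads. */
    static ExecutorService boundedExecutor(int threads) {
        return Executors.newFixedThreadPool(threads);
    }

    // Registration would then follow the general-use pattern shown earlier, e.g.:
    //   parseContext.set(SAXParserFactory.class, SharedResources.hardenedSaxFactory());
    //   parseContext.set(ExecutorService.class, SharedResources.boundedExecutor(4));
}
```

An ExecutorService created by the caller is also the caller's to shut down once parsing is finished.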

Parser-Specific Uses

1. HtmlParser

2. TesseractOcrParser

3. PDFParser

4. Microsoft Parser (as of Tika 1.15)

TikaParseContext (last edited 2017-03-13 16:16:08 by TimothyAllison)