Differences between revisions 4 and 5
Revision 4 as of 2013-04-05 09:55:10
Size: 2603
Comment:
Revision 5 as of 2016-01-22 15:46:35
Size: 874
Editor: SteveRowe
Comment: Moved content to Confluence, added redirect info
Line 1: Line 1:
= Using PreAnalyzedField type for integration with external document processing pipelines. =
{{{#!wiki important
This page exists for the Solr Community to share Tips, Tricks, and Advice about
[[https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes#WorkingwithExternalFilesandProcesses-ThePreAnalyzedFieldType|PreAnalyzedField]].
Line 3: Line 5:
''This field type is available since Solr 4.0. See also SOLR-1535, SOLR-4619.''
Reference material previously located on this page has been migrated to the
[[https://cwiki.apache.org/solr/|Official Solr Ref Guide]].
If you need help, please consult the ref guide for the version of Solr you are using
for the specific details about using [[https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes#WorkingwithExternalFilesandProcesses-ThePreAnalyzedFieldType|this feature]].
Line 5: Line 10:
The PreAnalyzedField type provides a way to send serialized token streams to Solr, optionally with an independent stored value for the field, and have this information stored and indexed without any additional text processing in Solr. This is useful when a user wants to submit field content that has already been processed by an existing external text processing pipeline (e.g. tokenization, annotation, stemming, synonym insertion), while still using all the per-token attributes that Lucene's TokenStream provides.

== Pluggable serialization ==
The serialization format is pluggable through implementations of the PreAnalyzedParser interface. There are two out-of-the-box implementations:

 * JsonPreAnalyzedParser - as the name suggests, it parses content that uses JSON to represent a field's content. This is the default parser if the field type is not configured otherwise.
 * SimplePreAnalyzedParser - uses a simple, strict plain-text format, which in some situations may be easier to create than JSON.

== Configuration options ==
There is only one configuration parameter, `parserImpl`. Its value should be the fully qualified name of a class that implements the PreAnalyzedParser interface (since Solr 4.3 you can also use `json` or `simple` as shortcuts for the two included implementations). The default value is `org.apache.solr.schema.JsonPreAnalyzedParser` (or `json`).

Here's an example of how to define the type and a field that uses this type in `schema.xml`:
{{{
<types>
  ...
  <fieldType name="preanalyzed" class="solr.PreAnalyzedField" parserImpl="org.apache.solr.schema.JsonPreAnalyzedParser"/>
  ...
</types>
<fields>
  ...
  <field name="pre" type="preanalyzed" indexed="true" stored="true"/>
  ...
</fields>
If you'd like to share information about how you use this feature, please [[FrontPage#How_to_edit_this_Wiki|add it to this page]].
/* cwikimigrated */
Line 29: Line 13:

And here's an example XML update message that adds documents with fields of this type:

{{{
<?xml version="1.0" encoding="UTF-8"?>
<add>
<doc>
<field name="id">1</field>
<field name="pre">{"v":"1","str":"document one","tokens":[{"t":"one"},{"t":"two"},{"t":"three","i":100}]}</field>
</doc>
<doc>
<field name="id">2</field>
<field name="pre">{"v":"1","str":"document two","tokens":[{"t":"four"},{"t":"five"},{"t":"six","i":100}]}</field>
</doc>
<doc>
<field name="id">3</field>
<field name="pre">{"v":"1","str":"document three","tokens":[{"t":"seven"},{"t":"eight"},{"t":"nine","i":100}]}</field>
</doc>
</add>
}}}
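
An external pipeline would typically generate these field values rather than write them by hand. The following is a minimal Python sketch of that step, not part of Solr itself. It reuses only the JSON keys that appear in the example above (`v`, `str`, `tokens`, `t`, `i`). The helper names `pre_analyzed_json` and `add_xml` are hypothetical. This page does not explain what `i` means, so the sketch simply passes the value through (it is assumed here to be a per-token attribute such as a position increment).

```python
import json
import xml.etree.ElementTree as ET


def pre_analyzed_json(stored, tokens):
    """Build one field value in the JSON layout used in the example above.

    `tokens` is a list of (term, i) pairs; pass i=None to omit the "i" key.
    The meaning of "i" is an assumption (see the lead-in); it is copied as-is.
    """
    token_objs = []
    for term, i in tokens:
        obj = {"t": term}
        if i is not None:
            obj["i"] = i
        token_objs.append(obj)
    payload = {"v": "1", "str": stored, "tokens": token_objs}
    # Compact separators reproduce the whitespace-free style of the example.
    return json.dumps(payload, separators=(",", ":"))


def add_xml(docs, field_name="pre"):
    """Wrap (id, stored, tokens) triples in an <add> update message."""
    add = ET.Element("add")
    for doc_id, stored, tokens in docs:
        doc = ET.SubElement(add, "doc")
        ET.SubElement(doc, "field", name="id").text = str(doc_id)
        ET.SubElement(doc, "field", name=field_name).text = pre_analyzed_json(stored, tokens)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    docs = [(1, "document one", [("one", None), ("two", None), ("three", 100)])]
    print(add_xml(docs))
```

Building the message with an XML library rather than string concatenation means that any `<` or `&` in the stored text is escaped automatically.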


PreAnalyzedField (last edited 2016-01-22 15:46:35 by SteveRowe)