SchemaXml

The schema.xml file contains all of the details about which fields your documents can contain, and how those fields should be dealt with when adding documents to the index, or when querying those fields.

A sample Solr schema.xml with detailed comments can be found in the Source Repository.

TODO: we should try to make a DTD for the schema.

  1. Data Types
  2. Fields
    1. Common field options
    2. Dynamic fields
    3. Indexing same data in multiple fields
    4. Expert field options
  3. Miscellaneous Settings
    1. The Unique Key Field
    2. The Default Search Field
    3. Default query parser operator
    4. Copy Fields
    5. Similarity

Data Types

The <types> section allows you to define a list of <fieldtype> declarations you wish to use in your schema, along with the underlying Solr class that should be used for each type and the default options you want for fields that use that type.

Any subclass of FieldType may be used as a field type class, using either its full package name or the "solr" alias if it is in the default Solr package. For common numeric types (integer, float, etc.), multiple implementations are provided to suit different needs. Please see SolrPlugins for information on how to ensure that your own custom Field Types can be loaded into Solr.
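As a rough sketch of what such a section might look like (the type names are arbitrary, and com.example.MyFieldType is a hypothetical custom class used only for illustration):

    <types>
      <!-- "solr." is the alias for the default Solr package -->
      <fieldtype name="string"  class="solr.StrField" omitNorms="true"/>
      <fieldtype name="integer" class="solr.IntField" omitNorms="true"/>
      <!-- an alternative integer implementation whose indexed form sorts in numeric order -->
      <fieldtype name="sint"    class="solr.SortableIntField" omitNorms="true"/>
      <!-- a custom type referenced by its full package name (hypothetical class) -->
      <fieldtype name="custom"  class="com.example.MyFieldType"/>
    </types>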

TextFields can also support Analyzers with highly configurable Tokenizers and Token Filters.
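A minimal sketch of a TextField type with an analyzer, assuming a simple whitespace tokenizer followed by a lowercasing filter is sufficient (the type name "text" and the positionIncrementGap value are illustrative choices, not requirements):

    <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- split the input on whitespace -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- lowercase each token so that matching is case-insensitive -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldtype>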

TODO: do omitNorms and positionIncrementGap have any meaning for non-TextFields?

Field types that store text (TextField, StrField) support compression of stored contents, controlled by the compressed and compressThreshold options.

compressThreshold is the minimum length required for text compression to be invoked. It applies only if compressed=true. A common pattern is to set compressThreshold on the field type definition and turn compression on or off in the individual field definitions.
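The pattern described above might look roughly like this. The type and field names and the threshold value of 1000 are assumptions made for illustration only:

    <!-- inside <types>: set the threshold once, on the field type -->
    <fieldtype name="text_stored" class="solr.TextField" compressThreshold="1000"/>

    <!-- inside <fields>: switch compression on or off per field -->
    <field name="body"    type="text_stored" indexed="true" stored="true" compressed="true"/>
    <field name="summary" type="text_stored" indexed="true" stored="true" compressed="false"/>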

Fields

The <fields> section is where you list the individual <field> declarations you wish to use in your documents. Each <field> has a name that you will use to reference it when adding documents or executing searches, and an associated type which identifies the name of the fieldtype you wish to use for this field. There are various field options that apply to a field. These can be set in the field type declarations, and can also be overridden at an individual field's declaration.
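For illustration, a small <fields> section might look like the following sketch. It assumes that field types named "string", "text", and "integer" have already been declared, and the field names are hypothetical:

    <fields>
      <!-- each field has a name and refers to a declared field type -->
      <field name="id"    type="string"  indexed="true" stored="true"/>
      <field name="name"  type="text"    indexed="true" stored="true"/>
      <!-- an option given here overrides whatever default the "integer" type sets -->
      <field name="stock" type="integer" indexed="true" stored="true" omitNorms="false"/>
    </fields>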

Common field options

Common options that fields can have are...

See also FieldOptionsByUseCase, which discusses how these options should be set in various circumstances. See SolrPerformanceFactors for how different options can affect Solr performance.
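The sketch below shows a few of these options in combination. It is not a complete list, and the field names are hypothetical:

    <!-- searchable and returned in results -->
    <field name="title"    type="text"   indexed="true"  stored="true"/>
    <!-- searchable, but the original value is not kept for retrieval -->
    <field name="body"     type="text"   indexed="true"  stored="false"/>
    <!-- returned in results, but not searchable -->
    <field name="thumburl" type="string" indexed="false" stored="true"/>
    <!-- a document may carry several values for this field -->
    <field name="category" type="string" indexed="true"  stored="true" multiValued="true"/>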

Dynamic fields

One of the powerful features of Lucene is that you don't have to pre-define every field when you first create your index. Even though Solr provides strong datatyping for fields, it still preserves that flexibility using "Dynamic Fields". Using <dynamicField> declarations, you can create field rules that Solr will use to understand what datatype should be used whenever it is given a field name that is not explicitly defined, but matches a prefix or suffix used in a dynamicField.

For example, the following dynamic field declaration tells Solr that whenever it sees a field name ending in "_i" that is not an explicitly defined field, it should dynamically create an integer field with that name:

    <dynamicField name="*_i"  type="integer"  indexed="true"  stored="true"/>
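With that declaration in place, a document posted to Solr can carry a field that was never declared individually. The update message below is a sketch; the field name "quantity_i" and the values are made up for the example, and it assumes an "id" field is defined explicitly:

    <add>
      <doc>
        <field name="id">SKU-1</field>
        <!-- not declared in <fields>; matched by the "*_i" rule and treated as an integer -->
        <field name="quantity_i">5</field>
      </doc>
    </add>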

Indexing same data in multiple fields

Note that, with textual data, it often makes sense to take what is logically a single field (e.g. product name) and index it into several different Solr fields, each with different field options and/or analyzers.

As an example, if I had a field with a list of authors, such as:

I might want to index the same data differently in three different fields (perhaps using the Solr copyField directive):

(See also SolrFacetingOverview.)
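One possible arrangement using copyField is sketched here. The field names and the choice of types are assumptions for illustration, not the only reasonable design:

    <!-- inside <fields> -->
    <!-- analyzed text, for searching -->
    <field name="author"       type="text"   indexed="true" stored="true"/>
    <!-- untokenized, for sorting -->
    <field name="author_sort"  type="string" indexed="true" stored="false"/>
    <!-- untokenized, for faceting -->
    <field name="author_facet" type="string" indexed="true" stored="false"/>

    <!-- outside <fields>: feed the same input into the other two fields -->
    <copyField source="author" dest="author_sort"/>
    <copyField source="author" dest="author_facet"/>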

Expert field options

The storage of Lucene term vectors can be triggered using the following field options:

These options can be used to accelerate highlighting and other ancillary functionality, but impose a substantial cost in terms of index size. They are not necessary for typical uses of Solr (phrase queries, etc., do not require these settings to be present).
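As a sketch, a field that stores term vectors together with position and offset information could be declared as follows (the field name "body" is hypothetical):

    <field name="body" type="text" indexed="true" stored="true"
           termVectors="true" termPositions="true" termOffsets="true"/>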

Miscellaneous Settings

In addition to the <types> and <fields> sections of the schema, there are several other declarations that can appear in your schema:

The Unique Key Field

The <uniqueKey> declaration can be used to inform Solr that there is a field in your index which should be unique for all documents. If a document is added that contains the same value for this field as an existing document, the old document will be deleted.

It is not mandatory for a schema to have a uniqueKey field.
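For example, assuming the schema defines a field named "id":

    <uniqueKey>id</uniqueKey>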

The Default Search Field

The <defaultSearchField> declaration is used by Solr when parsing queries to identify which field should be searched when a query does not name a field explicitly.
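For example, assuming the schema defines a field named "text":

    <defaultSearchField>text</defaultSearchField>

With this setting, a query such as solr would be searched against the "text" field, while title:solr would still search the (hypothetical) "title" field.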

Default query parser operator

The default operator used by Solr's query parser (SolrQueryParser) can be configured with <solrQueryParser defaultOperator="AND|OR"/>. The default operator is "OR" if unspecified.
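For example, to require all terms by default:

    <solrQueryParser defaultOperator="AND"/>

With this setting, a query such as solr schema is treated like solr AND schema. With "OR", a document matching either term would be returned.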

Copy Fields

Any number of <copyField> declarations can be included in your schema, to instruct Solr to duplicate any data it sees in the "source" field of documents that are added to the index into the "dest" field of that document. You are responsible for ensuring that the datatypes of the fields are compatible, but Solr will process the information in the "dest" field using the appropriate field type (and Analyzer if it's a TextField).

This is provided as a convenient way to ensure that data is put into several fields, without needing to include the data in the update command multiple times.
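As a sketch, the following declarations collect two fields into a single catch-all field. The field names are hypothetical, and the example assumes "text" is declared as a multiValued field so that it can receive both values:

    <copyField source="name"     dest="text"/>
    <copyField source="category" dest="text"/>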

Similarity

A <similarity> declaration can be used to specify the subclass of Similarity that you want Solr to use when dealing with your index. If no Similarity class is specified, the Lucene DefaultSimilarity is used. Please see SolrPlugins for information on how to ensure that your own custom Similarity can be loaded into Solr.
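For example, the following declaration names the Lucene default explicitly. A custom subclass would be referenced the same way by its full class name (com.example.MySimilarity in the comment is a hypothetical class):

    <similarity class="org.apache.lucene.search.DefaultSimilarity"/>
    <!-- <similarity class="com.example.MySimilarity"/> -->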

last edited 2007-10-24 23:08:30 by ChrisHarris