Comment: SOLR-9345

...

A sample Solr schema.xml with detailed comments can be found in the Source Repository.

Data Types

The <types> section allows you to define a list of <fieldtype> declarations you wish to use in your schema, along with the underlying Solr class that should be used for that type, as well as the default options you want for fields that use that type.

...

  • Common options that field types can have are...
    • sortMissingLast=true|false
    • sortMissingFirst=true|false
    • indexed=true|false
    • stored=true|false
    • multiValued=true|false
    • omitNorms=true|false
    • omitTermFreqAndPositions=true|false (warning) Solr1.4
    • omitPositions=true|false (warning) Solr3.4
    • positionIncrementGap=N
    • autoGeneratePhraseQueries=true|false (in schema version 1.4 and later this now defaults to false)
    • postingsFormat=<name of a postings format> (warning) only works if you use a codec factory (http://wiki.apache.org/solr/SolrConfigXml#codecFactory) that is schema-aware, such as SchemaCodecFactory. Please note that the postings formats used in a fieldType definition must be present in one of Solr's lib directories. (For example, some useful but unsupported postings formats are available in the lucene-codecs JAR.) For detailed instructions on how to configure SimpleTextCodec, see: SimpleTextCodec Example

TextFields can also support Analyzers with highly configurable Tokenizers and Token Filters.
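
As an illustration of how these options combine, a pair of field type declarations might look like the following sketch. The type names text_basic and string_sorted, and the particular tokenizer and filter chosen, are assumptions made for this example and are not taken from the sample schema.

No Format

 <types>
   <!-- Hypothetical full-text type: tokenized and lower-cased at index and query time -->
   <fieldType name="text_basic" class="solr.TextField"
              positionIncrementGap="100" autoGeneratePhraseQueries="false">
     <analyzer>
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>
   <!-- Hypothetical string type: untokenized; documents missing a value sort last -->
   <fieldType name="string_sorted" class="solr.StrField"
              sortMissingLast="true" omitNorms="true"/>
 </types>

Options set on the field type act as defaults, so individual fields that use the type can still override them.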

...

  • compressed=true|false
  • compressThreshold=<integer>

(warning) Compression support was removed in 1.4.1. There are (untested) patches for 3.x in https://issues.apache.org/jira/browse/SOLR-752.

...

Common field options

Common options that fields can have are...

  • default
    • The default value for this field if none is provided while adding documents
  • indexed=true|false
    • True if this field should be "indexed". If (and only if) a field is indexed, then it is searchable, sortable, and facetable.
  • stored=true|false
    • True if the value of the field should be retrievable during a search, or if you're using highlighting or MoreLikeThis.
  • compressed=true|false
    • True if this field should be stored using gzip compression. (This will only apply if the field type is compressible; among the standard field types, only TextField and StrField are.)
  • compressThreshold=<integer>
  • multiValued=true|false
    • True if this field may contain multiple values per document, i.e. if it can appear multiple times in a document
  • omitNorms=true|false
    • This is arguably an advanced option.
    • Set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory). Only full-text fields or fields that need an index-time boost need norms.
  • termVectors=false|true (warning) Solr1.1
    • If set, include full term vector info.
    • If enabled, often also used with termPositions="true" and termOffsets="true".
    • To use interactively, requires TermVectorComponent
    • Corresponds to TV button in Luke, and V field attribute.
  • omitTermFreqAndPositions=true|false (warning) Solr1.4
    • If set, omit term frequencies, positions, and payloads from postings for this field. This can improve performance for fields that don't require that information, and it reduces the storage space required for the index. Position-dependent queries issued against a field with this option fail with an exception. (warning) Prior to Solr4.0, such queries would silently fail to find documents.
  • omitPositions=true|false (warning) Solr3.4
    • If set, omits positions, but keeps term frequencies

See also FieldOptionsByUseCase, which discusses how these options should be set in various circumstances. See SolrPerformanceFactors for how different options can affect Solr performance.
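
As a rough sketch of how these options appear on individual fields, consider the declarations below. The field names and the "string" and "text" type names are hypothetical and are assumed to be defined in the <types> section.

No Format

 <fields>
   <!-- Unique key style field: searchable and retrievable -->
   <field name="id" type="string" indexed="true" stored="true"/>
   <!-- May appear several times per document; falls back to a default value if absent -->
   <field name="category" type="string" indexed="true" stored="true"
          multiValued="true" default="uncategorized"/>
   <!-- Searchable full text that is not returned in results, so it cannot be highlighted -->
   <field name="body" type="text" indexed="true" stored="false"/>
   <!-- Exact-match field: no length normalization, no term frequencies or positions -->
   <field name="sku" type="string" indexed="true" stored="true"
          omitNorms="true" omitTermFreqAndPositions="true"/>
 </fields>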

...

  • Schildt, Herbert; Wolpert, Lewis; Davies, P.

I might want to index the same data differently in three different fields (perhaps using the Solr copyField directive):

  • For searching: Tokenized, case-folded, punctuation-stripped:
    • schildt / herbert / wolpert / lewis / davies / p
  • For sorting: Untokenized, case-folded, punctuation-stripped:
    • schildt herbert wolpert lewis davies p
  • For faceting: Primary author only, using a solr.StrField:
    • Schildt, Herbert

(See also SolrFacetingOverview.)
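
One possible way to express the three treatments above in a schema is sketched below. All field and type names are hypothetical, and the analyzer chains are only one choice among several. Because copyField copies the original text unchanged, this sketch assumes the indexing client supplies the primary author as its own field.

No Format

 <types>
   <fieldType name="string" class="solr.StrField"/>
   <!-- Searching: tokenized, case-folded; the tokenizer drops punctuation -->
   <fieldType name="text_search" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="solr.StandardTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>
   <!-- Sorting: whole value kept as one token, case-folded, punctuation removed -->
   <fieldType name="text_sort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
     <analyzer>
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.PatternReplaceFilterFactory"
               pattern="[^a-z0-9 ]" replacement="" replace="all"/>
     </analyzer>
   </fieldType>
 </types>
 <fields>
   <field name="authors" type="text_search" indexed="true" stored="true"/>
   <field name="authors_sort" type="text_sort" indexed="true" stored="false"/>
   <!-- Faceting: assumed to be sent by the client, e.g. "Schildt, Herbert" -->
   <field name="primary_author" type="string" indexed="true" stored="false"/>
 </fields>
 <copyField source="authors" dest="authors_sort"/>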

...

  • termVectors=true|false
  • termPositions=true|false
  • termOffsets=true|false

These options can be used to accelerate highlighting and other ancillary functionality, but impose a substantial cost in terms of index size. They are not necessary for typical uses of Solr (phrase queries, etc., do not require these settings to be present).
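
For illustration, a field that enables all three options might be declared as in the following sketch. The field name content is hypothetical, and the "text" type is assumed to be a solr.TextField defined elsewhere in the schema.

No Format

 <!-- Stores term vectors with positions and offsets; expect a larger index -->
 <field name="content" type="text" indexed="true" stored="true"
        termVectors="true" termPositions="true" termOffsets="true"/>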

...

Copy Fields

Any number of <copyField> declarations can be included in your schema to instruct Solr to duplicate any data it sees in the "source" field of documents added to the index into the "dest" field of the same document. You are responsible for ensuring that the datatypes of the fields are compatible. The original text is sent from the "source" field to the "dest" field before any configured analyzers for the originating or destination field are invoked.

This is provided as a convenient way to ensure that data is put into several fields, without needing to include the data in the update command multiple times. The copy is done at the stream source level and no copy feeds into another copy. The maxChars property may be used in a copyField declaration. This simply limits the number of characters copied. For example:

No Format

 <copyField source="body" dest="teaser" maxChars="300"/>

A common requirement is to copy or merge all input fields into a single Solr field. This can be done as follows:

No Format

 <copyField source="*" dest="text"/>

You can also automatically generate new field names by including an asterisk in both the source and destination fields. For example, if you have the following copyField directive:

No Format

 <copyField source="*_t" dest="*_t_facet" />

and then submit a field called author_t, the field's value will also be copied to another field called author_t_facet. The text matched by the asterisk in the source attribute (here, "author") is substituted for the asterisk in the destination attribute *_t_facet, which serves as a field name template.
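
To make the wildcard behavior concrete, the sketch below pairs the directive with dynamic field declarations and a sample update message. The dynamicField declarations, the "text" and "string" type names, and the id field are assumptions for this example; the generated destination name is assumed to need a matching field or dynamicField definition.

No Format

 <!-- schema.xml: source and generated destination names both resolve to a dynamic field -->
 <dynamicField name="*_t" type="text" indexed="true" stored="true"/>
 <dynamicField name="*_t_facet" type="string" indexed="true" stored="false"/>
 <copyField source="*_t" dest="*_t_facet"/>

 <!-- Update message sent to Solr: only author_t is supplied by the client -->
 <add>
   <doc>
     <field name="id">book-1</field>
     <field name="author_t">Schildt, Herbert</field>
   </doc>
 </add>

With this configuration, the indexed document would carry the same value in both author_t and author_t_facet, even though the client sent it once.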

See also Copying Fields in the new Apache Solr Reference Guide.

...