Introduction

Analysis is a very important factor in spell checking. Stemming and other techniques that change tokens are not recommended, since the spell checker would then offer stems as suggestions. Instead, use a minimal tokenization/analysis process such as the StandardAnalyzer, or even the WhitespaceTokenizer plus a simple lowercasing filter and a filter that removes apostrophes and the like. As with most things in search, there are always tradeoffs, and you should evaluate the results in your application.

That being said, a common configuration for spell checking is:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
  </analyzer>
</fieldType>
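
For comparison, the leaner alternative mentioned in the introduction (whitespace tokenization, lowercasing, and apostrophe removal) might look roughly like the sketch below. The type name textSpellMinimal is hypothetical, and the apostrophe pattern is an assumption about which characters you want stripped; adjust both to your data.

```xml
<!-- Sketch only: "textSpellMinimal" is a made-up name, not part of any stock schema. -->
<fieldType name="textSpellMinimal" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer>
    <!-- Split on whitespace only, so tokens stay close to what users typed. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Assumed pattern: drop straight apostrophes inside tokens. -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="'" replacement="" replace="all"/>
  </analyzer>
</fieldType>
```

A single analyzer element applies to both indexing and querying, which keeps the suggestions and the query terms in the same form.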

Furthermore, on the field that uses the spell-checking type, set omitTermFreqAndPositions="true" to save a little space and time during indexing.

Use a <copyField> to copy your main text fields into a dedicated spell field (named "spell" here), and then configure your spell checker to build its spelling index from that field.
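
As a rough illustration of that wiring, the schema.xml fragment below declares the spell field and copies two source fields into it. The source field names title and body are placeholders for whatever your main text fields are called.

```xml
<!-- schema.xml sketch: "title" and "body" are hypothetical source fields. -->
<field name="spell" type="textSpell" indexed="true" stored="false"
       multiValued="true" omitTermFreqAndPositions="true"/>

<copyField source="title" dest="spell"/>
<copyField source="body" dest="spell"/>
```

The spell checker can then be pointed at that field in solrconfig.xml. The component name, checker name, and index directory below are assumptions chosen for the example, not required values.

```xml
<!-- solrconfig.xml sketch: names and paths are illustrative. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Analyze the incoming query with the same type used for the spell field. -->
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```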
