Differences between revisions 3 and 4
Revision 3 as of 2007-07-09 03:48:35
Size: 2651
Editor: dns3
Comment: Minor edits and added simple explanation about suggestion ordering.
Revision 4 as of 2007-07-09 12:48:23
Size: 4160
Editor: syd-pow-pr6
Comment: Updated and expanded section on 'termSourceField'
Line 15 (revision 3):
The field in your schema that you want to be able to build your spell index on. This should be a field that uses a very simple FieldType without a lot of Analysis (ie: string). The default field is 'word' and can be configured in SolrConfigXml.

Line 15 (revision 4):
The field in your schema that you want to be able to build your spell index on. This should be a field that uses a very simple FieldType without a lot of Analysis (e.g. string):

{{{
<add>
  <doc>
    <field name="word">Accountant</field>
  </doc>
  <doc>
    <field name="word">Auditor</field>
  </doc>
  <doc>
    <field name="word">Solicitor</field>
  </doc>
</add>
}}}

In order to extract dictionary words from a field containing more than a single word (i.e. a text field), you should use the StandardTokenizer and StandardFilter, which do not perform a great deal of processing on the field yet should provide acceptable results when used with the spell checker:

{{{
<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
}}}

To automatically populate this field with the contents of another field when a document is added to the index, simply use a copyField:

{{{
<copyField source="content" dest="spell"/>
}}}

The default field is 'word' and can be configured in SolrConfigXml.

The SpellCheckerRequestHandler is designed to process a word (or several words) as the value of the "q" parameter and return a list of alternative spelling suggestions. The spellchecker used by this handler is the Lucene contrib [http://wiki.apache.org/jakarta-lucene/SpellChecker SpellChecker], and more background information on the Solr implementation can be found [https://issues.apache.org/jira/browse/SOLR-81 here].

Parameters

q

The word (or words) to be spell checked.

qt

This must be set to 'spellchecker' in order to invoke the SpellCheckerRequestHandler.

termSourceField

The field in your schema that you want to be able to build your spell index on. This should be a field that uses a very simple FieldType without a lot of Analysis (e.g. string):

<add>
  <doc>
    <field name="word">Accountant</field>
  </doc>
  <doc>
    <field name="word">Auditor</field>
  </doc>
  <doc>
    <field name="word">Solicitor</field>
  </doc>
</add>
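
When the dictionary words come from an existing word list, the update message above can be generated rather than written by hand. The following is a minimal sketch, not part of Solr: the helper name `build_add_xml` is hypothetical, and it assumes the dictionary field is called 'word' as in the example.

```python
import xml.etree.ElementTree as ET


def build_add_xml(words, field_name="word"):
    """Build an <add> update message with one <doc> per dictionary word.

    `build_add_xml` and its `field_name` default are illustrative
    assumptions; the field name must match the field in your schema.
    """
    add = ET.Element("add")
    for word in words:
        doc = ET.SubElement(add, "doc")
        field = ET.SubElement(doc, "field", name=field_name)
        field.text = word  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_xml(["Accountant", "Auditor", "Solicitor"]))
```

The resulting string is the body you would post to your Solr update handler; how you send it depends on your setup.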

In order to extract dictionary words from a field containing more than a single word (i.e. a text field), you should use the StandardTokenizer and StandardFilter, which do not perform a great deal of processing on the field yet should provide acceptable results when used with the spell checker:

<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

To automatically populate this field with the contents of another field when a document is added to the index, simply use a copyField:

<copyField source="content" dest="spell"/> 

The default field is 'word' and can be configured in SolrConfigXml.

spellcheckerIndexDir

The directory where your spell checker index should live. It defaults to 'spell' in SolrConfigXml and may be absolute or relative to the Solr "dataDir" directory. If this option is not specified, a RAM directory will be used.

suggestionCount

Determines how many spelling suggestions are returned. The default value is 1 but can be configured in SolrConfigXml. The order of the returned results is determined by both the [http://en.wikipedia.org/wiki/Levenshtein_distance Levenshtein distance] (or accuracy) of the suggestion and the popularity (the frequency) of the suggested word in the termSourceField.

accuracy

A float value between 0.0 and 1.0 that sets how closely a suggested word must match the original word being checked (calculated using the [http://en.wikipedia.org/wiki/Levenshtein_distance Levenshtein distance] algorithm); higher values demand a closer match. The default value is 0.5 but can be configured in SolrConfigXml.
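
To make the interplay of suggestionCount and accuracy more concrete, here is a rough sketch of the idea in Python. It is not the Lucene SpellChecker code: the function names, the normalisation of the edit distance by the longer word's length, and the choice to use frequency only as a tie-breaker are all assumptions made for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two strings (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Map the edit distance onto 0.0..1.0 (assumed normalisation)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def suggest(word, term_freqs, suggestion_count=1, accuracy=0.5):
    """Return up to suggestion_count terms at least `accuracy` similar.

    term_freqs is a hypothetical stand-in for the spell index: it maps
    each term of the termSourceField to its frequency. Candidates are
    ordered by similarity first and by frequency second (an assumption).
    """
    candidates = []
    for term, freq in term_freqs.items():
        if term == word:
            continue
        score = similarity(word, term)
        if score >= accuracy:
            candidates.append((score, freq, term))
    candidates.sort(key=lambda c: (-c[0], -c[1], c[2]))
    return [term for _, _, term in candidates[:suggestion_count]]


if __name__ == "__main__":
    freqs = {"linux": 40, "lynix": 3, "line": 100, "windows": 50}
    print(suggest("linix", freqs, suggestion_count=5, accuracy=0.7))
```

In this sketch, raising accuracy shrinks the candidate set, while suggestionCount only truncates the ordered list.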

cmd

There are currently two supported values for cmd: 'rebuild' and 'reopen':

In order to use SpellCheckerRequestHandler for the first time, you need to explicitly build the spelling index with '&cmd=rebuild' (see examples below).

If an external process is responsible for building the spell checker index, you must issue '&cmd=reopen' to force the spell checker index directory to be re-opened.

Examples

Build the spelling index for the first time:
  http://localhost:8983/solr/select/?q=macrosoft&qt=spellchecker&cmd=rebuild

A simple call to the spell check handler:
  http://localhost:8983/solr/select/?q=windaws&qt=spellchecker

Return 5 suggestions with an accuracy value of 0.7:
  http://localhost:8983/solr/select/?q=linix&qt=spellchecker&suggestionCount=5&accuracy=0.7
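
If you call the handler from a script, the example URLs above can be assembled programmatically. This is only a sketch: the helper name `spellcheck_url` is hypothetical, and the base URL assumes the local example server used throughout this page.

```python
from urllib.parse import urlencode


def spellcheck_url(q, base="http://localhost:8983/solr/select/", **params):
    """Build a request URL for the spellchecker handler.

    `spellcheck_url` is an illustrative helper, not part of Solr. Extra
    keyword arguments (suggestionCount, accuracy, cmd) are appended as
    query parameters in the order given.
    """
    query = {"q": q, "qt": "spellchecker"}
    query.update(params)
    return base + "?" + urlencode(query)


if __name__ == "__main__":
    print(spellcheck_url("macrosoft", cmd="rebuild"))
    print(spellcheck_url("windaws"))
    print(spellcheck_url("linix", suggestionCount=5, accuracy=0.7))
```

Using urlencode also takes care of escaping when "q" contains several words or special characters.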

SpellCheckerRequestHandler (last edited 2012-07-19 05:53:57 by cust)