Differences between revisions 5 and 6
Revision 5 as of 2007-07-09 13:03:01
Size: 4589
Editor: syd-pow-pr6
Comment: Added info about the "onlyMorePopular" parameter
Revision 6 as of 2007-09-18 00:29:40
Size: 6537
Editor: HossMan
Comment:
Line 1: Line 1:
The SpellCheckerRequestHandler is designed to process a word (or several words) as the value of the "q" parameter and return a list of alternative spelling suggestions. The spellchecker used by this handler is the Lucene contrib [http://wiki.apache.org/jakarta-lucene/SpellChecker SpellChecker] and more background information on the Solr implementation can be found [https://issues.apache.org/jira/browse/SOLR-81 here]. The [http://lucene.apache.org/solr/api/org/apache/solr/handler/SpellCheckerRequestHandler.html SpellCheckerRequestHandler] is designed to process a word (or several words) as the value of the "q" parameter and return a list of alternative spelling suggestions. The spellchecker used by this handler is the Lucene contrib [http://wiki.apache.org/jakarta-lucene/SpellChecker SpellChecker].

<!> ["Solr1.3"]

[[TableOfContents(3)]]

== Term Source Configuration ==

When configuring the !SpellCheckerRequestHandler in your SolrConfigXml, use the `termSourceField` config option to specify the field in your schema from which the spell index is built. This should be a field that uses a very simple !FieldType without much analysis (e.g. string):

{{{
<add>
  <doc>
    <field name="word">Accountant</field>
  </doc>
  <doc>
    <field name="word">Auditor</field>
  </doc>
  <doc>
    <field name="word">Solicitor</field>
  </doc>
</add>
}}}

To extract dictionary words from a field containing more than a single word (e.g. a text field), use the !StandardTokenizer and !StandardFilter, which don't perform a great deal of processing on the field yet should provide acceptable results when used with the spell checker:

{{{
<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
}}}

To automatically populate this field with the contents of another field when a document is added to the index, simply use a copyField:

{{{
<copyField source="content" dest="spell"/>
}}}

The default termSourceField is 'word'.
Line 99: Line 149:
----
CategorySolrRequestHandler

The [http://lucene.apache.org/solr/api/org/apache/solr/handler/SpellCheckerRequestHandler.html SpellCheckerRequestHandler] is designed to process a word (or several words) as the value of the "q" parameter and return a list of alternative spelling suggestions. The spellchecker used by this handler is the Lucene contrib [http://wiki.apache.org/jakarta-lucene/SpellChecker SpellChecker].

<!> ["Solr1.3"]


Term Source Configuration

When configuring the SpellCheckerRequestHandler in your SolrConfigXml, use the termSourceField config option to specify the field in your schema from which the spell index is built. This should be a field that uses a very simple FieldType without much analysis (e.g. string):

<add>
  <doc>
    <field name="word">Accountant</field>
  </doc>
  <doc>
    <field name="word">Auditor</field>
  </doc>
  <doc>
    <field name="word">Solicitor</field>
  </doc>
</add>
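
The 'word' field used in the documents above also has to be declared in the schema. A minimal sketch, assuming a plain 'string' field type is defined in your schema (the attribute values here are illustrative and not taken from this page):

{{{
<!-- Assumed schema.xml declaration: an untokenized string field holding one dictionary word per document -->
<field name="word" type="string" indexed="true" stored="true"/>
}}}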

To extract dictionary words from a field containing more than a single word (e.g. a text field), use the StandardTokenizer and StandardFilter, which don't perform a great deal of processing on the field yet should provide acceptable results when used with the spell checker:

<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

To automatically populate this field with the contents of another field when a document is added to the index, simply use a copyField:

<copyField source="content" dest="spell"/> 
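
For the copyField to have something to copy between, both fields need a declaration in the schema. The following is only a sketch: the 'content' field and its 'text' type are assumptions made for illustration, and the destination field is left unstored on the assumption that the spell index is built from indexed terms rather than stored values.

{{{
<!-- Assumed source field; the name "content" and type "text" are illustrative -->
<field name="content" type="text" indexed="true" stored="true"/>
<!-- Destination field using the "spell" fieldType defined above; indexed so its terms can feed the spell index -->
<field name="spell" type="spell" indexed="true" stored="false"/>
<!-- Copies the raw value of "content" into "spell" whenever a document is added -->
<copyField source="content" dest="spell"/>
}}}

With declarations like these, termSourceField would be set to 'spell' instead of the default.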

The default termSourceField is 'word'.
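
As a rough sketch of how these options fit together, a handler registration in SolrConfigXml might look like the following. The handler name 'spellchecker' is chosen to match the qt value described under Parameters, and the values shown are the defaults mentioned on this page; the exact layout is an assumption, so check it against the solrconfig.xml shipped with your Solr version.

{{{
<!-- Sketch of a handler registration in solrconfig.xml; layout and values are illustrative -->
<requestHandler name="spellchecker" class="solr.SpellCheckerRequestHandler">
  <!-- Default values for query parameters; each can be overridden per request -->
  <lst name="defaults">
    <int name="suggestionCount">1</int>
    <float name="accuracy">0.5</float>
  </lst>
  <!-- Where the spell checker index lives; relative paths resolve against the Solr dataDir -->
  <str name="spellcheckerIndexDir">spell</str>
  <!-- Field the spell index is built from; 'word' is the default -->
  <str name="termSourceField">word</str>
</requestHandler>
}}}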

Parameters

q

The word (or words) to be spell checked.

qt

This must be set to 'spellchecker' in order to invoke the SpellCheckerRequestHandler.

termSourceField

The field in your schema from which the spell index is built. This should be a field that uses a very simple FieldType without much analysis (e.g. string):

<add>
  <doc>
    <field name="word">Accountant</field>
  </doc>
  <doc>
    <field name="word">Auditor</field>
  </doc>
  <doc>
    <field name="word">Solicitor</field>
  </doc>
</add>

To extract dictionary words from a field containing more than a single word (e.g. a text field), use the StandardTokenizer and StandardFilter, which don't perform a great deal of processing on the field yet should provide acceptable results when used with the spell checker:

<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

To automatically populate this field with the contents of another field when a document is added to the index, simply use a copyField:

<copyField source="content" dest="spell"/> 

The default field is 'word' and can be configured in SolrConfigXml.

spellcheckerIndexDir

The directory where your spell checker index should live. It is set to 'spell' by default in SolrConfigXml and may be absolute or relative to the Solr "dataDir" directory. If this option is not specified, a RAM directory will be used.

suggestionCount

Determines how many spelling suggestions are returned. The default value is 1 but can be configured in SolrConfigXml. The order of the returned results is determined by both the [http://en.wikipedia.org/wiki/Levenshtein_distance Levenshtein distance] (or accuracy) of the suggestion and the popularity (the frequency) of the suggested word in the termSourceField.

accuracy

A float value between 0.0 and 1.0 that specifies how closely the suggested words should match the original word being checked (calculated using the [http://en.wikipedia.org/wiki/Levenshtein_distance Levenshtein distance] algorithm). The default value is 0.5 but can be configured in SolrConfigXml.

onlyMorePopular

When "onlyMorePopular" is set to true and the misspelled word exists in the user field, only words that occur more frequently in the termSourceField than the one given will be returned. The default value is false.

cmd

There are currently two supported values for cmd: 'rebuild' and 'reopen'.

Before you use SpellCheckerRequestHandler for the first time, you need to explicitly build the spelling index with '&cmd=rebuild' (see the examples below).

If an external process is responsible for building the spell checker index, you must issue '&cmd=reopen' to force the spell checker index directory to be re-opened.

Examples

Build the spelling index for the first time:
  http://localhost:8983/solr/select/?q=macrosoft&qt=spellchecker&cmd=rebuild

A simple call to the spell check handler:
  http://localhost:8983/solr/select/?q=windaws&qt=spellchecker

Return a list of suggestions that appear more frequently in the termSourceField than the word 'aft':
  http://localhost:8983/solr/select/?q=aft&qt=spellchecker&onlyMorePopular=true

Return 5 suggestions with an accuracy value of 0.7:
  http://localhost:8983/solr/select/?q=linix&qt=spellchecker&suggestionCount=5&accuracy=0.7


CategorySolrRequestHandler

SpellCheckerRequestHandler (last edited 2012-07-19 05:53:57 by cust)