Differences between revisions 6 and 7
Revision 6 as of 2007-09-18 00:29:40
Size: 6537
Editor: HossMan
Comment:
Revision 7 as of 2007-11-05 20:01:04
Size: 7503
Editor: S01060016b64931f7
Comment: updated spell checker page with new parameters

The [http://lucene.apache.org/solr/api/org/apache/solr/handler/SpellCheckerRequestHandler.html SpellCheckerRequestHandler] is designed to process a word (or several words) passed as the value of the "q" parameter and to return a list of alternative spelling suggestions. The spellchecker used by this handler is the Lucene contrib [http://wiki.apache.org/jakarta-lucene/SpellChecker SpellChecker].

<!> ["Solr1.3"]

Term Source Configuration

When configuring the SpellCheckerRequestHandler in your SolrConfigXml, use the termSourceField config option to specify the schema field from which the spell index is built. This should be a field that uses a very simple FieldType without a lot of Analysis (e.g. string). For example, documents that populate a simple 'word' field:

<add>
  <doc>
    <field name="word">Accountant</field>
  </doc>
  <doc>
    <field name="word">Auditor</field>
  </doc>
  <doc>
    <field name="word">Solicitor</field>
  </doc>
</add>

To extract dictionary words from a field containing more than a single word (e.g. a text field), use the StandardTokenizer and StandardFilter, which don't perform a great deal of processing on the field yet should provide acceptable results when used with the spell checker:

<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

To automatically populate this field with the contents of another field when a document is added to the index, simply use a copyField:

<copyField source="content" dest="spell"/> 

The default termSourceField is 'word'.

Parameters

q

The word (or words) to be spell checked.

qt

This must be set to 'spellchecker' in order to invoke the SpellCheckerRequestHandler.

termSourceField

(sp.dictionary.termSourceField in <!> ["Solr1.3"])

The field in your schema from which the spell index is built. See Term Source Configuration above for the recommended field type, an analyzer for multi-word fields, and a copyField setup.

The default field is 'word' and can be configured in SolrConfigXml.

spellcheckerIndexDir

(sp.dictionary.indexDir in <!> ["Solr1.3"])

The directory where the spell checker index is stored; it is set to 'spell' in SolrConfigXml. The path may be absolute or relative to the Solr "dataDir" directory. If this option is not specified at all, a RAM directory will be used.

sp.dictionary.threshold

Determines what terms will be used for creating the dictionary from the source field. The threshold is in terms of document frequency, i.e., what fraction of documents contain this term (not term frequency). This can be used to create a smaller, more accurate dictionary.

The default value is '0'. <!> ["Solr1.3"]
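
The effect of the threshold can be illustrated with a minimal Python sketch. It is not the handler's code: the function name, the input shape (one token list per document), and the inclusive comparison are assumptions made for illustration only.

```python
def build_dictionary(docs, threshold=0.0):
    """Return the sorted terms whose document frequency meets the threshold.

    docs: list of token lists, one per document (hypothetical input shape).
    threshold: fraction of documents (0.0 to 1.0) that must contain a term.
    """
    if not docs:
        return []
    doc_counts = {}
    for tokens in docs:
        # set() counts a term once per document: document frequency,
        # not term frequency.
        for term in set(tokens):
            doc_counts[term] = doc_counts.get(term, 0) + 1
    total = len(docs)
    # Assumption: the comparison is inclusive (>=).
    return sorted(t for t, n in doc_counts.items() if n / total >= threshold)


docs = [
    ["linux", "kernel", "linux"],
    ["linux", "windows"],
    ["windows", "kernal"],
    ["linux", "server"],
]
print(build_dictionary(docs))       # threshold 0 keeps every term
print(build_dictionary(docs, 0.5))  # ['linux', 'windows']
```

With a threshold of 0.5 the rare misspelling 'kernal' (present in one document out of four) is left out of the dictionary, which is the "smaller, more accurate dictionary" effect described above.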

cmd

There are currently two supported values for cmd: 'rebuild' and 'reopen'.

Before using the SpellCheckerRequestHandler for the first time, you need to explicitly build the spelling index with '&cmd=rebuild' (see examples below).

If an external process is responsible for building the spell checker index, you must issue '&cmd=reopen' to force the spell checker index directory to be re-opened.

suggestionCount

(sp.query.suggestionCount in <!> ["Solr1.3"])

Determines how many spelling suggestions are returned. The default value is 1 but can be configured in SolrConfigXml. The order of the returned results is determined by both the [http://en.wikipedia.org/wiki/Levenshtein_distance Levenshtein distance] (or accuracy) of the suggestion and the popularity (the frequency) of the suggested word in the termSourceField.
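
As a rough illustration of this ordering, the Python sketch below ranks candidates by similarity first and uses frequency as a tie-breaker, then keeps the first suggestionCount entries. How the handler actually combines the two criteria is not spelled out here, so the tie-breaking rule, the function name, and the tuple layout are all assumptions.

```python
def top_suggestions(candidates, suggestion_count=1):
    """Pick the best suggestions from (word, similarity, frequency) tuples.

    The tuple layout is hypothetical. Assumed ordering: higher similarity
    first, then higher frequency, then alphabetical for stable output.
    """
    ranked = sorted(candidates, key=lambda c: (-c[1], -c[2], c[0]))
    return [word for word, _, _ in ranked[:suggestion_count]]


candidates = [
    ("linus", 0.8, 3),
    ("linux", 0.8, 120),
    ("lenox", 0.6, 40),
]
print(top_suggestions(candidates))     # ['linux']
print(top_suggestions(candidates, 2))  # ['linux', 'linus']
```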

accuracy

(sp.query.accuracy in <!> ["Solr1.3"])

A float value between 0.0 and 1.0 that specifies how closely the suggested words must match the original word being checked (calculated using the [http://en.wikipedia.org/wiki/Levenshtein_distance Levenshtein distance] algorithm). The default value is 0.5 but can be configured in SolrConfigXml.
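
To make the idea concrete, here is a small Python sketch of an edit-distance check. It is an illustration, not the Lucene implementation: normalizing the distance by the length of the longer word and treating the accuracy cut-off as inclusive are assumptions, and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Map the distance to a 0.0-1.0 score (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def passes_accuracy(word, candidate, accuracy=0.5):
    """True if the candidate is close enough (inclusive cut-off is an assumption)."""
    return similarity(word, candidate) >= accuracy


print(levenshtein("linix", "linux"))           # 1
print(passes_accuracy("linix", "linux", 0.7))  # True
print(passes_accuracy("linix", "unix", 0.7))   # False
```

Under these assumptions, raising accuracy from the default 0.5 to 0.7 drops 'unix' (two edits from 'linix') while keeping 'linux' (one edit).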

onlyMorePopular

(sp.query.onlyMorePopular in <!> ["Solr1.3"])

When "onlyMorePopular" is set to true and the misspelled word exists in the user field, only words that occur more frequently in the termSourceField than the one given will be returned. The default value is false.
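
The filtering rule can be sketched in a few lines of Python. The frequency table and function name are hypothetical, and the sketch assumes that when the checked word is absent from the source field no filtering takes place.

```python
def filter_more_popular(word, candidates, frequencies, only_more_popular=False):
    """Keep only candidates that are more frequent than the checked word.

    frequencies: hypothetical mapping of term -> frequency in the source field.
    """
    # Assumption: no filtering if the option is off or the word is unknown.
    if not only_more_popular or word not in frequencies:
        return list(candidates)
    baseline = frequencies[word]
    return [c for c in candidates if frequencies.get(c, 0) > baseline]


frequencies = {"aft": 4, "after": 250, "art": 90, "aftr": 1}
candidates = ["after", "art", "aftr"]
print(filter_more_popular("aft", candidates, frequencies, True))  # ['after', 'art']
print(filter_more_popular("aft", candidates, frequencies))        # ['after', 'art', 'aftr']
```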

sp.query.extendedResults

Whether to use the extended response format, which is more complicated but richer. Returns the document frequency for each suggestion and returns one suggestion block for each term in the query string.

The default value is 'false'. <!> ["Solr1.3"]

Examples

Build the spelling index for the first time:
  http://localhost:8983/solr/select/?q=macrosoft&qt=spellchecker&cmd=rebuild

A simple call to the spell check handler:
  http://localhost:8983/solr/select/?q=windaws&qt=spellchecker

Return a list of suggestions that appear more frequently in the termSourceField than the word 'aft':
  http://localhost:8983/solr/select/?q=aft&qt=spellchecker&onlyMorePopular=true

Return 5 suggestions with an accuracy value of 0.7:
  http://localhost:8983/solr/select/?q=linix&qt=spellchecker&suggestionCount=5&accuracy=0.7
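
The same requests can be assembled programmatically. The following Python sketch only builds the URLs shown above; the helper name is hypothetical, and the base URL assumes the default example server on localhost:8983.

```python
from urllib.parse import urlencode

# Assumed location of the example Solr server used in the URLs above.
BASE_URL = "http://localhost:8983/solr/select/"


def spellcheck_url(word, **params):
    """Build a SpellCheckerRequestHandler URL for the given word.

    Extra keyword arguments (e.g. cmd, suggestionCount, accuracy,
    onlyMorePopular) are appended as query parameters.
    """
    query = {"q": word, "qt": "spellchecker"}
    query.update(params)
    return BASE_URL + "?" + urlencode(query)


print(spellcheck_url("macrosoft", cmd="rebuild"))
print(spellcheck_url("aft", onlyMorePopular="true"))
print(spellcheck_url("linix", suggestionCount=5, accuracy=0.7))
```

Using urlencode also takes care of escaping when the value of "q" contains several words or special characters.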


CategorySolrRequestHandler

SpellCheckerRequestHandler (last edited 2012-07-19 05:53:57 by cust)