SpellChecker

A spell checker suggests a list of words similar to a misspelled word. This implementation is based on David Spencer's code and uses the n-gram method together with the Levenshtein distance.

Structure of a dictionary index

A dictionary index containing all the possible words must be created first; the dictionary is itself a Lucene index. For 3-grams and 4-grams, the index has the following structure:

Index Structure    Example
word               kings
gram3              kin, ing, ngs
gram4              king, ings
start3             kin
start4             king
end3               ngs
end4               ings
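The field values in the "kings" example above can be derived mechanically from the word. The following is a minimal sketch in plain Java, not the code used by the SpellChecker itself; the class name NGramFields and its methods are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: computes the field values
// shown in the "kings" example for a single word.
final class NGramFields {

    // Returns all substrings of length n, in order of occurrence.
    static List<String> grams(String word, int n) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            result.add(word.substring(i, i + n));
        }
        return result;
    }

    // Builds a field-name -> value map for gram sizes min..max (inclusive).
    static Map<String, String> fieldsFor(String word, int min, int max) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("word", word);
        for (int n = min; n <= max; n++) {
            List<String> g = grams(word, n);
            if (g.isEmpty()) {
                continue; // the word is shorter than n, so there is no n-gram
            }
            fields.put("gram" + n, String.join(", ", g));
            fields.put("start" + n, g.get(0));
            fields.put("end" + n, g.get(g.size() - 1));
        }
        return fields;
    }
}
```

Calling NGramFields.fieldsFor("kings", 3, 4) yields the same seven entries as the example: word, gram3, start3, end3, gram4, start4 and end4.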

Import: Adding Words to the Dictionary

Words can be added from a Lucene index (more precisely, from a set of Lucene fields) and from a text file containing a list of words.

Getting a List of Suggested Words

The suggestSimilar method returns a list of suggested words sorted by:

  1. the Levenshtein distance (the most similar word to the misspelled word is the first in the list).
  2. (optionally) the popularity of the word in a given Lucene Field.

Furthermore, the list can be restricted to only the words present in a given Lucene field.
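The ordering described above can be illustrated without Lucene. The sketch below is an assumption about how such a ranking could be written, not the body of suggestSimilar: the class SuggestionRanker, the frequency map standing in for the popularity of a word in a field, and the allowed set standing in for the words present in a field are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not part of Lucene.
final class SuggestionRanker {

    // Classic dynamic-programming Levenshtein distance using two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Sorts candidates by distance to the misspelled word, then by
    // descending frequency. A null 'allowed' set means no restriction;
    // a null 'frequency' map means popularity is ignored.
    static List<String> rank(String misspelled, List<String> candidates,
                             Map<String, Integer> frequency, Set<String> allowed) {
        final Map<String, Integer> freq =
            frequency == null ? Collections.<String, Integer>emptyMap() : frequency;
        List<String> result = new ArrayList<>();
        for (String c : candidates) {
            if (allowed == null || allowed.contains(c)) {
                result.add(c);
            }
        }
        result.sort((x, y) -> {
            int byDistance = Integer.compare(levenshtein(misspelled, x),
                                             levenshtein(misspelled, y));
            if (byDistance != 0) {
                return byDistance;
            }
            // Higher frequency first.
            return Integer.compare(freq.getOrDefault(y, 0), freq.getOrDefault(x, 0));
        });
        return result;
    }
}
```

In this sketch the distance is always the primary key and popularity only breaks ties between equally distant candidates, which matches the order in which the two criteria are listed above.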

Changes

Version 1.1 :

Credits