Lucene Concepts and Definitions
This page contains concepts and definitions related to Lucene. It is not a substitute for knowledge of InformationRetrieval.
Please keep in alphabetical order when editing.
Payloads - A payload is an array of bytes stored at one or more term positions.
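As a rough illustration of the idea, the sketch below attaches a byte array to individual term positions. It is a plain-Java model, not Lucene's API: the class `PayloadSketch`, the `Occurrence` type and the choice of a 4-byte float weight as the payload content are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: one term occurrence = a position plus optional bytes.
class PayloadSketch {
    static final class Occurrence {
        final int position;    // token position within the field
        final byte[] payload;  // arbitrary bytes, or null if none was stored

        Occurrence(int position, byte[] payload) {
            this.position = position;
            this.payload = payload;
        }
    }

    // Assumed encoding for this example: a float weight packed into 4 bytes.
    static byte[] encodeWeight(float weight) {
        return ByteBuffer.allocate(4).putFloat(weight).array();
    }

    static float decodeWeight(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    public static void main(String[] args) {
        List<Occurrence> occurrences = new ArrayList<>();
        occurrences.add(new Occurrence(0, encodeWeight(2.5f))); // weighted occurrence
        occurrences.add(new Occurrence(7, null));               // no payload here
        for (Occurrence o : occurrences) {
            String shown = o.payload == null ? "none" : String.valueOf(decodeWeight(o.payload));
            System.out.println("position " + o.position + " payload " + shown);
        }
    }
}
```

The point of the model is only that the bytes belong to a position of a term, not to the document or the field as a whole.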
Stemmer - From the Wikipedia article Stemmer: "A stemming algorithm, or stemmer, is a computer program or algorithm for reducing inflected (or sometimes derived) words to their stem, base or root form — generally a written word form." Stemmers are often used to reduce the search space and index size. Often, a user searching for "widgets" is also interested in documents that contain the term "widget".
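To make the "widgets" and "widget" example concrete, here is a deliberately naive suffix-stripping sketch. It is not one of the stemmers shipped with Lucene and not a real stemming algorithm such as Porter's; the class `ToyStemmerSketch` and its two rules are assumptions chosen only to show the effect.

```java
import java.util.Locale;

// Hypothetical toy stemmer: handles a couple of English plural endings only.
class ToyStemmerSketch {
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 4 && w.endsWith("ies")) {
            // "queries" -> "query"
            return w.substring(0, w.length() - 3) + "y";
        }
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            // "widgets" -> "widget"; words ending in "ss" are left alone
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    public static void main(String[] args) {
        // Both forms map to the same stem, so they would match the same indexed term.
        System.out.println(stem("widgets"));
        System.out.println(stem("widget"));
    }
}
```

If the same function is applied both when indexing and when parsing the query, the two word forms collapse into a single term, which is where the smaller index and the broader matching come from.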
A Lucene Document is a record in the index. A Document has a list of fields; each field has a name and a textual value.
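The shape described here, a record holding a list of named textual fields, can be modelled in a few lines. The sketch below is a plain-Java stand-in and not Lucene's own `Document` class; `DocumentSketch`, its `Field` type and the method names are assumptions for illustration. It allows the same field name to appear more than once, which is why the fields are kept in a list rather than a map.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a document: an ordered list of (name, value) fields.
class DocumentSketch {
    static final class Field {
        final String name;
        final String value;

        Field(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    private final List<Field> fields = new ArrayList<>();

    DocumentSketch add(String name, String value) {
        fields.add(new Field(name, value));
        return this; // allow chained calls
    }

    // Value of the first field with this name, or null if there is none.
    String get(String name) {
        for (Field f : fields) {
            if (f.name.equals(name)) {
                return f.value;
            }
        }
        return null;
    }

    // All values stored under this name, in insertion order.
    List<String> getValues(String name) {
        List<String> values = new ArrayList<>();
        for (Field f : fields) {
            if (f.name.equals(name)) {
                values.add(f.value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        DocumentSketch doc = new DocumentSketch()
                .add("title", "Widget manual")
                .add("tag", "hardware")
                .add("tag", "manual");
        System.out.println(doc.get("title"));
        System.out.println(doc.getValues("tag"));
    }
}
```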
A Term is Lucene's unit of indexing. In Western languages, a Term is often a word.
TermEnum is used to enumerate all terms in the index for a given field, regardless of which documents the terms occur in (or where they occur).
See the LuceneFAQ entry "How do I retrieve all the values of a particular field that exists within an index, across all documents?", which also includes sample code.
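The FAQ entry covers the actual Lucene calls. The sketch below only models the concept: a per-field sorted set of distinct terms that can be walked without looking at any document. `TermEnumerationSketch`, `addField` and the whitespace tokenization are assumptions for the example, not Lucene's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical model: for each field, the sorted set of distinct terms in the index.
class TermEnumerationSketch {
    private final Map<String, SortedSet<String>> termsByField = new HashMap<>();

    // Index one field value; tokenization here is a naive lowercase whitespace split.
    void addField(String field, String text) {
        SortedSet<String> terms = termsByField.computeIfAbsent(field, k -> new TreeSet<>());
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
    }

    // Enumerate every distinct term of a field, in sorted order, across all documents.
    List<String> terms(String field) {
        List<String> result = new ArrayList<>();
        SortedSet<String> terms = termsByField.get(field);
        if (terms != null) {
            result.addAll(terms);
        }
        return result;
    }

    public static void main(String[] args) {
        TermEnumerationSketch index = new TermEnumerationSketch();
        index.addField("title", "Blue widget"); // from one document
        index.addField("title", "Red Widget");  // from another document
        index.addField("body", "gears");
        System.out.println(index.terms("title"));
    }
}
```

Each term appears once in the enumeration no matter how many documents contain it, which matches the "regardless of which documents the terms occur in" part of the definition.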
A TermFreqVector (also known as a Term Frequency Vector or simply a Term Vector) is a data structure containing a given Document's terms and their frequencies. It can be retrieved from the IndexReader only if Term Vectors were stored during indexing.
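As a sketch of what such a structure holds, the code below builds the terms of one field of one document together with their counts, stored as two parallel arrays. It is a plain-Java model under stated assumptions: `TermVectorSketch`, the whitespace tokenizer and the array layout are invented for the example and are not taken from Lucene's implementation.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical model of a term vector for a single field of a single document.
class TermVectorSketch {
    final String[] terms; // distinct terms, sorted
    final int[] freqs;    // freqs[i] is the number of occurrences of terms[i]

    TermVectorSketch(String fieldText) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String token : fieldText.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        terms = counts.keySet().toArray(new String[0]);
        freqs = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            freqs[i] = counts.get(terms[i]);
        }
    }

    // Frequency of a term in this document's field, or 0 if it does not occur.
    int freq(String term) {
        int i = Arrays.binarySearch(terms, term);
        return i >= 0 ? freqs[i] : 0;
    }

    public static void main(String[] args) {
        TermVectorSketch tv = new TermVectorSketch("the widget and the gear");
        System.out.println(Arrays.toString(tv.terms));
        System.out.println(Arrays.toString(tv.freqs));
    }
}
```

Unlike the index-wide term enumeration, this view is per document, which is why it has to be stored at indexing time if it is to be retrieved later.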