...

The Pulsing codec

An optimized codec for fields that have lots of rare terms.

DocumentsWriterPerThread

Improved concurrency of index updates.

Query execution

Terms dictionary

...

NumericRangeQuery

Lucene has an optimized range query implementation for numeric types.

BKD trees

BKD trees have been implemented to support geo capabilities in Lucene and have superseded NumericRangeQuery for one-dimensional data.

Automaton-based fuzzy query

...

Scoring models

In addition to its default TF-IDF scoring algorithm, Lucene supports other scoring models, such as Okapi BM25 and approaches based on language models.
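To make the BM25 option concrete, here is a minimal sketch of the textbook Okapi BM25 score for a single term in a single document. It is written in plain Java without any Lucene dependency, and the class name `Bm25Sketch`, its method names, and the sample statistics are assumptions made for illustration. Lucene's own implementation may differ in details such as constant factors and how document length is encoded.

```java
// Illustrative only: textbook Okapi BM25 for one term in one document.
// Names and sample numbers are hypothetical, not taken from Lucene.
final class Bm25Sketch {

    // Inverse document frequency: rare terms (small docFreq) get a larger weight.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // freq:      occurrences of the term in the document
    // docLen:    length of the document (in terms)
    // avgDocLen: average document length across the collection
    // k1:        term-frequency saturation parameter (1.2 is a common choice)
    // b:         length-normalization strength in [0, 1] (0.75 is a common choice)
    static double score(double freq, double docLen, double avgDocLen,
                        long docCount, long docFreq, double k1, double b) {
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf(docCount, docFreq) * freq * (k1 + 1.0) / (freq + norm);
    }

    public static void main(String[] args) {
        double k1 = 1.2, b = 0.75;
        // Same term and collection, different term frequencies and lengths.
        System.out.println(score(1, 100, 100, 1000, 10, k1, b));
        System.out.println(score(5, 100, 100, 1000, 10, k1, b));
        System.out.println(score(5, 400, 100, 1000, 10, k1, b));
    }
}
```

Unlike a raw TF-IDF product, the term-frequency contribution saturates: however often the term occurs, the score stays below `idf * (k1 + 1)`, and longer-than-average documents are penalized according to `b`.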

Incorporating non-textual signals into the final score

...

Block Max WAND

Block Max WAND is a refinement of WAND that helps efficiently skip the scoring of non-relevant documents.
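The skipping idea can be illustrated with a deliberately simplified sketch. It handles only a single postings list split into blocks, each storing its maximum score, and skips any block that cannot beat the current k-th best hit. It is not the full algorithm: the pivot selection across several query terms that WAND performs is left out, and all names (`BlockMaxSkipSketch`, `Block`, `Hit`, `Result`, `topK`) are hypothetical rather than Lucene APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative only: block-level max-score skipping over one postings list.
final class BlockMaxSkipSketch {

    // A block of postings: parallel doc IDs and precomputed scores, plus the block maximum.
    static final class Block {
        final int[] docs;
        final float[] scores;
        final float maxScore;

        Block(int[] docs, float[] scores) {
            this.docs = docs;
            this.scores = scores;
            float max = 0f;
            for (float s : scores) max = Math.max(max, s);
            this.maxScore = max;
        }
    }

    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    static final class Result {
        final List<Hit> hits;   // top-k hits, best first
        final int scoredDocs;   // how many documents were actually scored
        Result(List<Hit> hits, int scoredDocs) { this.hits = hits; this.scoredDocs = scoredDocs; }
    }

    static Result topK(List<Block> blocks, int k) {
        if (k <= 0) return new Result(new ArrayList<>(), 0);
        Comparator<Hit> byScore = Comparator.comparingDouble((Hit h) -> h.score);
        PriorityQueue<Hit> heap = new PriorityQueue<>(byScore); // min-heap: worst kept hit on top
        int scored = 0;
        for (Block block : blocks) {
            // Once k hits are collected, a block whose best possible score does not
            // exceed the current k-th best score cannot contribute: skip it entirely.
            if (heap.size() == k && block.maxScore <= heap.peek().score) continue;
            for (int i = 0; i < block.docs.length; i++) {
                scored++;
                float s = block.scores[i];
                if (heap.size() < k) {
                    heap.add(new Hit(block.docs[i], s));
                } else if (s > heap.peek().score) {
                    heap.poll();
                    heap.add(new Hit(block.docs[i], s));
                }
            }
        }
        List<Hit> hits = new ArrayList<>(heap);
        hits.sort(byScore.reversed());
        return new Result(hits, scored);
    }
}
```

The minimum competitive score rises as better hits are found, so later blocks with low maxima are passed over without scoring any of their documents, while the returned top-k stays the same as an exhaustive scan would produce.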

Misc

FST compression

Lucene relies heavily on finite-state transducers (FSTs), so their in-memory size is important.

Twitter Earlybird

Modifications that Twitter made to Lucene to support lock-free updates and efficient early query termination for time-based relevance.

...