Searching Numerical Fields

NumericRangeQuery (in Lucene Core since version 2.9)

Because Apache Lucene is a full-text search engine and not a conventional database, it cannot natively handle numerical ranges (for example, testing whether a field value lies inside user-defined bounds; dates are numerical values too). We have developed an extension to Apache Lucene that stores numerical values in a special string-encoded format with variable precision, called a trie: all numerical values (doubles, longs, Dates, floats, and ints) are converted to lexicographically sortable string representations and indexed at several precisions. For searching, a range is then divided recursively into multiple intervals: the center of the range is matched only at the lowest possible precision in the trie, while the boundaries are matched more exactly. This dramatically reduces the number of terms that have to be matched. See: http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/core/org/apache/lucene/search/NumericRangeQuery.html
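The encoding idea can be illustrated with a minimal sketch in plain Java. This is not Lucene's actual term format, which differs in detail. The class and method names (SortableLongDemo, toSortableString, toPrefixTerm) are hypothetical and exist only for this illustration. It shows two things: flipping the sign bit makes string order agree with numeric order, and dropping low bits yields a lower-precision term shared by a whole block of neighbouring values.

```java
final class SortableLongDemo {

    // Flip the sign bit so that negative values sort before positive ones,
    // then print 16 fixed-width hex digits so that string order equals numeric order.
    static String toSortableString(long value) {
        return String.format("%016x", value ^ Long.MIN_VALUE);
    }

    // Lower-precision term: drop the lowest `shift` bits. The shift is kept as a
    // prefix so that terms of different precisions cannot be confused with each other.
    static String toPrefixTerm(long value, int shift) {
        long reduced = (value ^ Long.MIN_VALUE) >>> shift;
        return String.format("%02d:%016x", shift, reduced);
    }

    public static void main(String[] args) {
        long[] samples = {-5L, 0L, 3L, 1000L, 1023L, 1024L};
        for (long v : samples) {
            System.out.println(v + " -> " + toSortableString(v)
                    + " / " + toPrefixTerm(v, 8));
        }
    }
}
```

With a shift of 8, the values 1000 and 1023 share one lower-precision term while 1024 falls into the next block, which is what lets the middle of a range be matched with few terms.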

This dramatically improves range query performance in Apache Lucene: query cost no longer depends on the index size or the number of distinct values, because the number of terms to match has an upper limit that is unrelated to either property.
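The following sketch shows how a range can be split so that only the edges use full precision. It is a simplified illustration under stated assumptions, not Lucene's implementation: values are non-negative and smaller than 2^valueBits (with valueBits at most 32 here), and the names TrieRangeSplitDemo, split, and termCount are hypothetical. Each result is {min, max, shift}, meaning the interval is matched with terms that each stand for a block of 2^shift values.

```java
import java.util.ArrayList;
import java.util.List;

final class TrieRangeSplitDemo {

    // Split [min, max] into sub-ranges, each matched at one precision level.
    static List<long[]> split(long min, long max, int precisionStep, int valueBits) {
        List<long[]> out = new ArrayList<>();
        if (min > max) {
            return out;
        }
        for (int shift = 0; ; shift += precisionStep) {
            long diff = 1L << (shift + precisionStep);
            long mask = ((1L << precisionStep) - 1L) << shift;
            // Does the lower/upper bound cut into a block of the next coarser level?
            boolean hasLower = (min & mask) != 0L;
            boolean hasUpper = (max & mask) != mask;
            long nextMin = (hasLower ? min + diff : min) & ~mask;
            long nextMax = (hasUpper ? max - diff : max) & ~mask;
            if (shift + precisionStep >= valueBits || nextMin > nextMax) {
                // Nothing left for a coarser level: emit the remainder and stop.
                add(out, min, max, shift);
                break;
            }
            if (hasLower) add(out, min, min | mask, shift);
            if (hasUpper) add(out, max & ~mask, max, shift);
            min = nextMin;
            max = nextMax;
        }
        return out;
    }

    // At precision `shift`, one term covers 2^shift values, so widen max accordingly.
    private static void add(List<long[]> out, long min, long max, int shift) {
        out.add(new long[] {min, max | ((1L << shift) - 1L), shift});
    }

    // Number of index terms needed to match all sub-ranges.
    static long termCount(List<long[]> parts) {
        long total = 0;
        for (long[] p : parts) {
            int shift = (int) p[2];
            total += (p[1] >>> shift) - (p[0] >>> shift) + 1;
        }
        return total;
    }

    public static void main(String[] args) {
        List<long[]> parts = split(3L, 1000L, 4, 32);
        for (long[] p : parts) {
            System.out.println("[" + p[0] + ", " + p[1] + "] at shift " + p[2]);
        }
        System.out.println("terms: " + termCount(parts));
    }
}
```

The number of sub-ranges is bounded by the number of precision levels rather than by the width of the range, which is where the upper limit on matched terms comes from.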

NumericRangeQuery (formerly TrieRangeQuery) can be used for date/time searches (if you need variable precision of date and time down to milliseconds), double searches (e.g. spatial searches on latitudes or longitudes), and prices (encoded as long cent values; doubles are a poor fit for prices because of rounding problems). The document fields containing the trie-encoded values are generated by a special NumericTokenStream or, more simply, by the new field implementation NumericField (see http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/core/org/apache/lucene/document/NumericField.html). Numeric fields can be sorted on (a special parser is included in FieldCache) and used in function queries (through FieldCache).
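The rounding problem with prices can be seen in a short plain-Java sketch. The helper name toCents and the choice of HALF_UP rounding are assumptions made for this illustration. The resulting long is the kind of value one would then index as a numeric field.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class PriceCentsDemo {

    // Convert a decimal price string such as "19.99" into whole cents.
    static long toCents(String price) {
        return new BigDecimal(price)
                .setScale(2, RoundingMode.HALF_UP)
                .movePointRight(2)
                .longValueExact();
    }

    public static void main(String[] args) {
        // Binary doubles cannot represent most decimal fractions exactly.
        System.out.println("double sum equals 0.3: " + (0.1 + 0.2 == 0.3));
        // Cent values are exact integers, so sums and range bounds behave as expected.
        System.out.println("cents sum equals 30: " + (toCents("0.10") + toCents("0.20") == toCents("0.30")));
    }
}
```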

Other possibilities: storing numerical values in a more readable form in the index

Utility to pad the numbers
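One possible padding helper is sketched below in plain Java. It is an assumption about what such a utility might look like, not the code originally shown on this page; the name pad and the fixed-width policy are hypothetical. Zero-padding to a fixed width makes plain string comparison agree with numeric order for non-negative numbers.

```java
final class PadDemo {

    // Left-pad a non-negative number with zeros to a fixed width.
    static String pad(long n, int width) {
        if (n < 0) {
            throw new IllegalArgumentException("negative numbers need a separate encoding: " + n);
        }
        String padded = String.format("%0" + width + "d", n);
        if (padded.length() > width) {
            throw new IllegalArgumentException(n + " does not fit in " + width + " digits");
        }
        return padded;
    }

    public static void main(String[] args) {
        // Unpadded strings sort "10" before "9"; padded strings sort numerically.
        System.out.println("\"9\" vs \"10\": " + "9".compareTo("10"));
        System.out.println(pad(9, 3) + " vs " + pad(10, 3) + ": " + pad(9, 3).compareTo(pad(10, 3)));
    }
}
```

The width has to be chosen once, large enough for the biggest expected value, and used for both indexing and querying.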

Index the relevant fields using the pad function

Use a Custom RangeFilter

If you have a size field indexed using NumberTools, build a chained RangeFilter to include a subset such as 1-1500.

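// Restrict an existing query to documents whose "size" lies between 1 and 1500.
FilteredQuery fq = new FilteredQuery(query, cstm_range("size", 1L, 1500L));

// Builds a range filter for a field whose values were indexed with NumberTools.longToString.
private static Filter cstm_range(String sfld, long lmin, long lmax)
   {
   // Encode both bounds the same way as the indexed terms.
   Filter lessthn_f = RangeFilter.Less(sfld, NumberTools.longToString(lmax));
   Filter morethn_f = RangeFilter.More(sfld, NumberTools.longToString(lmin));
   Filter[] fa = new Filter[]{lessthn_f, morethn_f};

   // A document passes only if both bound filters accept it.
   Filter rf = new ChainedFilter(fa, ChainedFilter.AND);
   return rf;
   }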

Consider Using a Filter

Create a custom QueryParser subclass:

Use the custom QueryParser

For decimals

Handling positive and negative numbers.

Handling larger numbers

SearchNumericalFields (last edited 2009-10-07 06:03:16 by UweSchindler)