FunctionQuery allows one to use the actual value of a numeric field and functions of those fields in a relevancy score.

Using FunctionQuery

There are a few ways to use FunctionQuery from Solr's HTTP interface:

  1. Embed a FunctionQuery in a regular query expressed in SolrQuerySyntax via the _val_ hook

  2. Use the FunctionQParserPlugin, i.e. q={!func}log(foo)

  3. Use a parameter that has an explicit type of FunctionQuery, such as DisMaxRequestHandler's bf (boost function) parameter.

    • NOTE: the bf parameter actually takes a list of function queries separated by whitespace, each with an optional boost. When using bf, make sure no single function query contains internal whitespace.

    • Example: q=foo&bf="ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3"
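The whitespace rule for bf can be illustrated with a small Python sketch. This is a hypothetical splitter written for this page, not Solr's actual bf parser. It only mimics the documented behavior (whitespace separates function queries, and an optional ^boost suffix follows each one) to show why a space inside a single function query breaks it apart.

```python
# Hypothetical sketch, NOT Solr's bf parser: it mimics the documented rule that
# bf is a whitespace-separated list of function queries, each with an optional ^boost.
def split_bf(bf: str) -> list[tuple[str, float]]:
    """Split a bf value into (function query, boost) pairs; boost defaults to 1.0."""
    pairs = []
    for token in bf.split():  # any whitespace starts a new function query
        func, sep, boost = token.rpartition("^")
        if sep and func:  # a trailing ^boost was present
            pairs.append((func, float(boost)))
        else:
            pairs.append((token, 1.0))
    return pairs


print(split_bf("ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3"))
# A stray space splits one function query into two broken pieces:
print(split_bf("recip(rord(price), 1,1000,1000)^0.3"))
```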

See SolrPlugins#ValueSourceParser for information on how to hook in your own FunctionQuery.

Note that a FunctionQuery matches all non-deleted documents.

Function Query Syntax

There is currently no infix parser; functions must be expressed as function calls (e.g. sum(a,b) instead of a+b).
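To show how prefix-call expressions nest, here is a hypothetical mini-evaluator in Python. It is not Solr's function parser and supports only a few functions from this page (sum, sub, product, div, log). Following the fieldvalue description, bare names are treated as per-document field values, with 0 for a missing field. Numbers are treated as constants.

```python
import math
import re

# Hypothetical illustration only; Solr's real parser supports far more.
FUNCS = {
    "sum": lambda *args: sum(args),
    "sub": lambda x, y: x - y,
    "product": lambda *args: math.prod(args),
    "div": lambda x, y: x / y,
    "log": math.log10,  # log(x) is base 10, as documented below
}
# Tokens: identifiers, numeric constants, parentheses and commas.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|-?\d+(?:\.\d+)?(?:[eE]-?\d+)?|[(),])")


def evaluate(expr: str, doc: dict) -> float:
    """Evaluate a prefix-call expression such as 'sum(a,product(b,2))' for one doc."""
    tokens = TOKEN.findall(expr)
    pos = 0

    def expect(tok: str) -> None:
        nonlocal pos
        if tokens[pos] != tok:
            raise ValueError(f"expected {tok!r}, got {tokens[pos]!r}")
        pos += 1

    def parse() -> float:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok in FUNCS:  # function call: name(arg, arg, ...)
            expect("(")
            args = [parse()]
            while tokens[pos] == ",":
                pos += 1
                args.append(parse())
            expect(")")
            return FUNCS[tok](*args)
        try:
            return float(tok)  # numeric constant
        except ValueError:
            return float(doc.get(tok, 0))  # field value; 0 when missing

    return parse()


print(evaluate("div(weight,product(x,y,z))", {"weight": 4, "x": 1, "y": 2, "z": 0.5}))
```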

Available Functions

constant

<!> Solr1.3 Floating-point constants, e.g. 1.5.

literal

<!> Solr1.5 String literals. Warning: not all functions can work with string literals.

fieldvalue

This function returns the numeric field value of an indexed field with a maximum of one value per document (not multiValued). The syntax is simply the field name by itself. 0 is returned for documents without a value in the field.

ord

ord(myfield) returns the ordinal of the indexed field value within the indexed list of terms for that field in Lucene index order (lexicographically ordered by unicode value), starting at 1. In other words, for a given field, all values are ordered lexicographically; this function then returns the position of a particular value in that ordering. The field must have a maximum of one value per document (not multiValued). 0 is returned for documents without a value in the field.

WARNING: as of Solr 1.4, ord() and rord() can cause excess memory use, since they must use a FieldCache entry at the top-level reader, while sorting and function queries now use entries at the segment level. Hence, sorting or using a different function query in addition to ord()/rord() will double memory use.

WARNING: ord() depends on the position in an index and can thus change when other documents are inserted or deleted, or if a MultiSearcher is used.
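The following Python sketch models what ord() computes and why it is sensitive to index contents. It is a simplified model, not Lucene's implementation. Each document is assumed to hold at most one string term, and Python's code-point string ordering stands in for Lucene's term order.

```python
def ord_values(values: dict) -> dict:
    """Map doc id -> 1-based position of its term in sorted term order, or 0 if missing."""
    terms = sorted({v for v in values.values() if v is not None})
    rank = {term: i + 1 for i, term in enumerate(terms)}
    return {doc: rank.get(v, 0) for doc, v in values.items()}


docs = {"d1": "pear", "d2": "apple", "d3": None, "d4": "pear"}
print(ord_values(docs))  # d1 and d4 share an ordinal; d3 has no value

# Adding a document with a new term shifts the ordinals of existing documents.
docs["d5"] = "banana"
print(ord_values(docs)["d1"])
```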

rord

The reverse ordering of what ord provides.

WARNING: as of Solr 1.4, ord() and rord() can cause excess memory use, since they must use a FieldCache entry at the top-level reader, while sorting and function queries now use entries at the segment level. Hence, sorting or using a different function query in addition to ord()/rord() will double memory use.

sum

<!> Solr1.3 sum(x,y,...) returns the sum of multiple functions.

sub

<!> Solr1.4 sub(x,y) returns x-y.

product

<!> Solr1.3 product(x,y,...) returns the product of multiple functions.

div

<!> Solr1.3 div(x,y) divides the function x by the function y.

pow

<!> Solr1.3 pow(x,y) raises the base x to the power y.

abs

<!> Solr1.3 abs(x) returns the absolute value of a function.

log

<!> Solr1.3 log(x) returns log base 10 of the function x.

sqrt

<!> Solr1.3 sqrt(x) returns the square root of the function x.

map

<!> Solr1.3 map(x,min,max,target) maps any values of the function x that fall within min and max inclusive to target. min, max and target are constants. Values of x that do not fall between min and max are returned unchanged.
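The per-document behavior of map() can be sketched in Python as follows. This models the semantics described above and is not Solr code; the name map_fn is hypothetical and chosen to avoid shadowing Python's built-in map.

```python
def map_fn(x: float, lo: float, hi: float, target: float) -> float:
    """Return target when lo <= x <= hi (inclusive), otherwise x unchanged."""
    return target if lo <= x <= hi else x


print(map_fn(5, 0, 10, 1))   # inside the range -> target
print(map_fn(11, 0, 10, 1))  # outside the range -> x itself
```

One use of this pattern is map(x,0,0,1), which replaces the 0 returned for documents without a value with 1 and leaves other values alone.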

scale

<!> Solr1.3 scale(x,minTarget,maxTarget) scales values of the function x such that they fall between minTarget and maxTarget inclusive.
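A Python sketch of scale() over a list of per-document values follows. It applies a linear rescaling from the observed [min, max] of x onto [minTarget, maxTarget]. Mapping every value to minTarget when all values are equal is an assumption made here to avoid dividing by zero.

```python
def scale(values: list[float], min_target: float, max_target: float) -> list[float]:
    """Linearly rescale values so the smallest maps to min_target and the largest to max_target."""
    lo, hi = min(values), max(values)
    if hi == lo:  # assumption: a degenerate range maps everything to min_target
        return [min_target for _ in values]
    factor = (max_target - min_target) / (hi - lo)
    return [(v - lo) * factor + min_target for v in values]


print(scale([2, 4, 6], 0, 1))
```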

query

<!> Solr1.4 query(subquery, default) returns the score for the given subquery, or the default value for documents not matching the query. Any type of subquery is supported through either parameter dereferencing $otherparam or direct specification of the query string in the LocalParams via "v".

linear

linear(x,m,c) implements m*x+c where m and c are constants and x is an arbitrary function. This is equivalent to sum(product(m,x),c), but slightly more efficient as it is implemented as a single function.

recip

A reciprocal function with recip(x,m,a,b) implementing a/(m*x+b). m,a,b are constants, x is any numeric field or arbitrarily complex function.

When a and b are equal, and x>=0, this function has a maximum value of 1 that drops as x increases. Increasing the value of a and b together results in a movement of the entire function to a flatter part of the curve. These properties can make this an ideal function for boosting more recent documents when x is rord(datefield).

<!> Solr1.4 In Solr 1.4 and later, best practice is to avoid ord() and rord() and derive the boost directly from the value of the date field. See ms() for more details.
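The shape of recip() can be checked numerically with a small Python model of a/(m*x+b). It shows the maximum of 1 at x=0 when a equals b, and the flatter curve obtained by increasing a and b together.

```python
def recip(x: float, m: float, a: float, b: float) -> float:
    """a / (m*x + b), as described for recip(x,m,a,b)."""
    return a / (m * x + b)


print(recip(0, 1, 1000, 1000))     # maximum value of 1 when a == b and x == 0
print(recip(1000, 1, 1000, 1000))  # drops to 1/2 once m*x equals b
# Larger a and b flatten the curve: the same x loses far less.
print(recip(10, 1, 1, 1), recip(10, 1, 1000, 1000))
```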

max

max(x,c) returns the max of another function and a constant. Useful for "bottoming out" another function at some constant.

ms

<!> Solr1.4

Returns the difference in milliseconds between its arguments.

Dates are relative to the Unix or POSIX time epoch, midnight, January 1, 1970 UTC.

Arguments may be numerically indexed date fields such as TrieDate (the default in 1.4), or date math (examples in SolrQuerySyntax) based on a constant date or NOW.

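ms() - equivalent to ms(NOW), the number of milliseconds since the epoch.

ms(a) - the number of milliseconds since the epoch that the argument a represents.

ms(a,b) - the number of milliseconds by which b occurs before a, i.e. a - b.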
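Here is a Python model of ms() using datetime arithmetic, for illustration only. It assumes that ms() means ms(NOW), that ms(a) gives a in milliseconds since the Unix epoch, and that ms(a,b) gives a - b. These assumptions are consistent with the ms(NOW,mydatefield) date-boosting example, which yields positive ages for past documents.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MS = timedelta(milliseconds=1)


def to_ms(d: datetime) -> int:
    """Milliseconds since midnight, January 1, 1970 UTC."""
    return (d - EPOCH) // ONE_MS


def ms(a: Optional[datetime] = None, b: Optional[datetime] = None) -> int:
    # Model only: ms() acts like ms(NOW); ms(a) is a in epoch millis; ms(a,b) is a - b.
    if a is None:
        a = datetime.now(timezone.utc)
    if b is None:
        return to_ms(a)
    return to_ms(a) - to_ms(b)


day_later = datetime(2009, 11, 20, tzinfo=timezone.utc)
day_earlier = datetime(2009, 11, 19, tzinfo=timezone.utc)
print(ms(day_later, day_earlier))  # one day in milliseconds
```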

Date Boosting

Boosting more recent content is a common use case. One way is to use a recip function in conjunction with ms.

There are approximately 3.16e10 milliseconds in a year, so one can scale dates to fractions of a year with the inverse, approximately 3.16e-11. Thus the function recip(ms(NOW,mydatefield),3.16e-11,1,1) will yield values near 1 for very recent documents, 1/2 for documents a year old, 1/3 for documents two years old, etc. Be careful not to use this function for dates more than one year in the future, or the values will be negative.
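The arithmetic can be checked with a short Python model of recip(ms(NOW,mydatefield),3.16e-11,1,1), where age_ms stands for ms(NOW,mydatefield). Since 3.16e-11 is only approximately the inverse of 3.16e10, the results are close to 1/2 and 1/3 rather than exactly equal to them.

```python
MS_PER_YEAR = 3.16e10  # approximate, as in the text


def recency_boost(age_ms: float) -> float:
    """1 / (3.16e-11 * age_ms + 1), i.e. recip(age_ms, 3.16e-11, 1, 1)."""
    return 1.0 / (3.16e-11 * age_ms + 1.0)


for years in (0, 1, 2, -2):
    print(years, round(recency_boost(years * MS_PER_YEAR), 4))
# -2 years (a future date) produces a negative boost.
```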

The most effective way to use such a boost is to multiply it with the relevancy score, rather than add it in. One way to do this is with the boost query parser.
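As a sketch of the multiplicative approach, the Python snippet below builds request parameters that wrap a user query (here "foo") in the boost query parser, {!boost b=...}. With this parser, the wrapped query's score is multiplied by b. The query text, the field mydatefield and the host URL are placeholders, and urlencode only handles the URL encoding of the local-params syntax.

```python
from urllib.parse import parse_qs, urlencode

# Placeholder values: "foo" is the user query, mydatefield an assumed date field.
params = {
    "q": "{!boost b=recip(ms(NOW,mydatefield),3.16e-11,1,1)}foo",
    "fl": "*,score",
}
query_string = urlencode(params)
print("http://localhost:8983/solr/select?" + query_string)
```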

Also see http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents

Math.*

<!> Solr1.5

Most Java Math functions are now supported, including:

  1. rad - convert degrees to radians
  2. deg - convert radians to degrees
  3. sqrt - square root
  4. cbrt - cube root
  5. log - base-10 log
  6. ln - natural log
  7. exp - Euler's number raised to a power
  8. sin - trigonometric sine of an angle in radians
  9. cos - cosine of an angle
  10. tan - tangent of an angle
  11. asin - arc sine
  12. acos - arc cosine
  13. atan - arc tangent
  14. sinh - hyperbolic sine
  15. cosh - hyperbolic cosine
  16. tanh - hyperbolic tangent
  17. ceil - ceiling, rounds up to an integer
  18. floor - rounds down to an integer
  19. rint - rounds to the nearest integer
  20. pow(x,y) - raises x to the power of y (x**y)
  21. hypot(x,y) - returns sqrt(x**2+y**2) without intermediate overflow or underflow
  22. atan2(x,y) - returns the angle resulting from the conversion of the rectangular coordinates x,y to polar coordinates
  23. pi() - returns the closest approximation of pi, the ratio of the circumference of a circle to its diameter
  24. e() - returns the closest approximation to e, the base of the natural logarithm

See the java.lang.Math javadocs for more details.

dist

Solr1.5 See: https://issues.apache.org/jira/browse/SOLR-1302

Returns the distance between two vectors (points) in an n-dimensional space. See http://en.wikipedia.org/wiki/Lp_space for more information. Takes the power plus two or more ValueSource instances and calculates the distance between the two vectors. Each ValueSource must be a number. There must be an even number of ValueSource instances; the first half represents the first vector and the second half represents the second vector.

Signature: dist(power, pointA, pointB, ...)

Common cases: assume each document has 4 numeric fields, x, y, z and w. Then we can calculate various distances:

  1. dist(2, x, y, 0, 0) - calculates the Euclidean distance between (0,0) and (x,y) for each document
  2. dist(1, x, y, 0, 0) - calculates the Manhattan distance between (0,0) and (x,y) for each document
  3. dist(2, x,y,z,0,0,0) - Euclidean distance between (0,0,0) and (x,y,z) for each document.
  4. dist(1,x,y,z,e,f,g) - Manhattan distance between (x,y,z) and (e,f,g) where each letter is a field name
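Here is a Python model of the Lp distance that dist() is described as computing, with the first half of the coordinates forming point A and the second half point B. It is a sketch that handles finite positive powers only. Any special handling Solr may apply to other power values is not modeled.

```python
def dist(power: float, *coords: float) -> float:
    """Lp distance between the first and second halves of coords."""
    if len(coords) % 2:
        raise ValueError("dist() needs an even number of coordinates")
    half = len(coords) // 2
    a, b = coords[:half], coords[half:]
    return sum(abs(x - y) ** power for x, y in zip(a, b)) ** (1.0 / power)


print(dist(2, 3, 4, 0, 0))  # Euclidean distance from (0,0) to (3,4)
print(dist(1, 3, 4, 0, 0))  # Manhattan distance from (0,0) to (3,4)
```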

sqedist - Squared Euclidean Distance

Solr1.5 See: https://issues.apache.org/jira/browse/SOLR-1302

Similar to the dist() function, the squared Euclidean distance is based on the 2-norm (Euclidean distance) but does not take the square root, saving a fairly expensive operation. Applications that care about Euclidean distance often do not need the actual distance and can use its square instead. See also dist(). There must be an even number of ValueSource instances; the first half represents the first vector and the second half represents the second vector.

Signature: sqedist(pointA, pointB, ...)

Example: sqedist(x_td, y_td, 0, 0)
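The following Python model computes the squared Euclidean distance with the same first-half and second-half convention. Because the square root is monotonic, ranking by this value orders points the same way as ranking by the Euclidean distance.

```python
def sqedist(*coords: float) -> float:
    """Squared Euclidean distance between the first and second halves of coords."""
    if len(coords) % 2:
        raise ValueError("sqedist() needs an even number of coordinates")
    half = len(coords) // 2
    return sum((x - y) ** 2 for x, y in zip(coords[:half], coords[half:]))


print(sqedist(3, 4, 0, 0))  # 25, the square of the Euclidean distance 5
```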

hsin, ghhsin - Haversine Formula

Solr1.5 See: https://issues.apache.org/jira/browse/SOLR-1302

The Haversine distance is the distance between two points on a sphere when traveling along the surface of the sphere, i.e. the great-circle distance. See http://en.wikipedia.org/wiki/Great-circle_distance. For ghhsin (the geohash variant), the point arguments must be geohash values.

Values must be in radians. Note that the Haversine formula has known numerical problems with antipodal points; see the Wikipedia page.

Signature: hsin(x1,y1,x2,y2, radius), ghhsin(hash1, hash2, radius)

Example: hsin(x, y, 0, 0, 1) - calculates the Haversine distance between lat/lon (0, 0) and the values in fields x and y, on a sphere of radius 1.
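Below is a Python sketch of the standard haversine formula, with arguments ordered as in the hsin(x1,y1,x2,y2,radius) signature and each point assumed to be (latitude, longitude) in radians. It illustrates the formula and is not Solr's implementation. The clamp guards against rounding pushing the intermediate value above 1 near antipodal points.

```python
import math


def hsin(lat1: float, lon1: float, lat2: float, lon2: float, radius: float) -> float:
    """Great-circle distance by the haversine formula; all angles in radians."""
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Rounding can push h slightly above 1 for (near-)antipodal points; clamp it.
    return 2 * radius * math.asin(math.sqrt(min(1.0, h)))


print(hsin(0, 0, 0, math.pi / 2, 1))  # a quarter of a great circle on a unit sphere
```

With radius 1 the result is an angle in radians. Passing a planetary radius, for example roughly 6371 for kilometers on Earth, gives a distance in those units.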

geohash - Convert to geohash

Given a latitude and longitude (in degrees, not radians), convert them to a Geohash. See http://en.wikipedia.org/wiki/Geohash.

Signature: geohash(ValueSource, ValueSource)

Example: geohash(lat, lon)
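To show what a geohash encodes, here is the standard encoding algorithm in Python. Starting with longitude, it alternately bisects the longitude and latitude ranges, and every 5 bits become one base-32 character. The precision parameter is a hypothetical addition. This sketch says nothing about how many characters Solr's geohash() emits.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash(lat: float, lon: float, precision: int = 12) -> str:
    """Encode lat/lon in degrees as a geohash of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    even = True  # even bits refine longitude, odd bits refine latitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:  # upper half -> bit 1
            bits = (bits << 1) | 1
            rng[0] = mid
        else:  # lower half -> bit 0
            bits <<= 1
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # 5 bits per base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash(42.6, -5.6, 5))
```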

top

<!> Solr1.4 Causes its function query argument to derive its values from the top-level IndexReader containing all parts of an index. For example, the ordinal of a value in a single segment will be different from the ordinal of that same value in the complete index. The ord() and rord() functions implicitly use top(), hence ord(foo) is equivalent to top(ord(foo)).
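The segment-versus-index difference that top() addresses is illustrated by the toy Python example below. It computes the ordinal of the same term ("pear") once within a single hypothetical segment and once across all segments combined, using the 1-based ordinal definition from ord().

```python
def ords(terms: list[str]) -> dict[str, int]:
    """1-based ordinal of each distinct term in sorted order."""
    return {t: i + 1 for i, t in enumerate(sorted(set(terms)))}


segment_1 = ["apple", "pear"]   # hypothetical segment contents
segment_2 = ["banana", "cherry"]

per_segment = ords(segment_1)["pear"]             # ordinal within segment 1 only
top_level = ords(segment_1 + segment_2)["pear"]   # ordinal across the whole index
print(per_segment, top_level)
```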

General Example

To illustrate the use of function queries, suppose the index stores the dimensions x, y and z (in meters) of some hypothetical boxes, with arbitrary names stored in the field boxname. To search for boxes matching the name findbox but rank them by volume, the query parameters would be:

  q=boxname:findbox+_val_:"product(product(x,y),z)"

Although this ranks the results by volume, to see the computed value you need to add the parameter...

  &fl=*,score

...where score reflects the computed volume (combined with the score of the boxname:findbox clause, so it is not exactly the raw volume).

If you also have a field weight containing the weight of each box, then to rank by the density of the box and return the density in score, your query should be...

http://localhost:8983/solr/select/?q=boxname:findbox+_val_:"div(weight,product(product(x,y),z))"&fl=boxname,x,y,z,weight,score
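For reference, here is what div(weight,product(product(x,y),z)) computes per document, written in plain Python over two hypothetical boxes. The field values are invented for illustration. The ranking uses the function value alone and ignores the contribution of the boxname:findbox clause to the real score.

```python
# Invented example documents; field names follow the query above.
boxes = [
    {"id": "small", "x": 1.0, "y": 2.0, "z": 0.5, "weight": 4.0},
    {"id": "large", "x": 2.0, "y": 2.0, "z": 2.0, "weight": 4.0},
]


def density(box: dict) -> float:
    # div(weight, product(product(x,y),z))
    return box["weight"] / (box["x"] * box["y"] * box["z"])


ranked = sorted(boxes, key=density, reverse=True)
for box in ranked:
    print(box["id"], density(box))
```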

FunctionQuery (last edited 2009-11-20 20:05:33 by YonikSeeley)