FunctionQuery allows one to use the actual value of a field and functions of those fields in a relevancy score.

Contents

- What is a Function?
- Using FunctionQuery
- Function Query Syntax
- Sort By Function
- Available Functions
- General Example
- Vector Functions

# What is a Function?

A function can be:

- A Constant (numeric or string literal) - See "constant" and "literal" below.
- A Field - See "fieldvalue" below. Also see notes later on multivalued/vector fields.
- Another Function
- Solr4.0 A parameter substitution: $arg, where arg is another function in the request.

Example request:

`q={!func}add($v1,$v2)&v1=sqrt(popularity)&v2=100.0`

Unless otherwise indicated, the term "function" or "function query" refers to normal functions that produce at most one value per document (i.e. not vector functions).

# Using FunctionQuery

There are a few ways to use FunctionQuery from Solr's HTTP interface:

- Invoke the FunctionQParserPlugin via LocalParams syntax, e.g. `q={!func}log(foo)`
  - Alternatively: Set **func** as the default query type, e.g. `defType=func&q=log(foo)`
- Invoke the FunctionRangeQParserPlugin via LocalParams syntax in a filter query, e.g. `fq={!frange l=0}sub(field1,field2)`
- Use a parameter that has an explicit type of FunctionQuery, such as DisMaxQParserPlugin's **bf** (boost function) parameter, or extended dismax **boost** parameter (multiplicative boost).
  - NOTE: the **bf** parameter actually takes a list of function queries separated by whitespace and each with an optional boost. Make sure to eliminate any internal whitespace in single function queries when using **bf**.
  - Example: `q=foo&bf="ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3"`
- Embed a FunctionQuery in a regular query expressed in SolrQuerySyntax via the _val_ hook

If you use a function query as 'just another term' (for example, by using the special field name **_val_**), the value of the function becomes a component of the overall query score, just as with any other part of the query. This includes the standard normalization, which by default amounts to multiplying scores by a constant: the inverse of the square root of the sum of squared weights.

See SolrPlugins#ValueSourceParser for information on how to hook in your own FunctionQuery.

Note that a FunctionQuery currently matches all non-deleted documents.
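The whitespace rule for **bf** noted above is easy to trip over when function queries are assembled programmatically. The following is a minimal Python sketch, assuming a hypothetical helper named `build_bf` (it is not part of Solr), that removes internal whitespace from each function and joins the boosted functions with single spaces:

```python
def build_bf(boosted_functions):
    """Join (function, boost) pairs into a single bf value.

    Hypothetical helper: internal whitespace is removed from each function,
    because bf uses whitespace to separate one function query from the next.
    """
    parts = []
    for func, boost in boosted_functions:
        compact = "".join(func.split())  # drop spaces, tabs and newlines
        parts.append(compact if boost is None else f"{compact}^{boost}")
    return " ".join(parts)


bf = build_bf([
    ("ord(popularity)", 0.5),
    ("recip(rord(price), 1, 1000, 1000)", 0.3),
])
print(bf)
```

Note that this naive stripping would also alter string literals that contain spaces, so it only suits purely numeric functions like the ones shown.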

# Function Query Syntax

There is currently no infix parser; functions must be expressed as function calls (e.g. sum(a,b) instead of a+b).

# Sort By Function

It is now possible to sort the results by the output of a function. For instance, if an application wanted to sort by distance, it could do:

`http://localhost:8983/solr/select?q=*:*&sort=dist(2, point1, point2) desc`
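The sort value above contains spaces, commas and parentheses, so a client normally has to URL-encode it before sending the request. A minimal Python sketch, using only the standard library and the placeholder names `point1` and `point2` from the example above:

```python
from urllib.parse import urlencode

# Base URL taken from the example above; adjust host and path for a real setup.
base = "http://localhost:8983/solr/select"
params = {
    "q": "*:*",
    # point1 and point2 are the placeholder names used in the example above
    "sort": "dist(2, point1, point2) desc",
}
url = base + "?" + urlencode(params)
print(url)
```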

# Available Functions

## constant

Solr1.3 Floating point constants.

- Example Syntax: **1.5**
- SolrQuerySyntax Example: **_val_:1.5**

## literal

Solr3.1 String literals. Warning: not all functions can work with string literals.

- Signature: literal(value), "value", 'value'
- Example: 'foo', literal("this is a literal")

## field

This function returns the numeric field value of an indexed field with a maximum of one value per document (not multiValued). The syntax is simply the field name by itself. 0 is returned for documents without a value in the field.

- Example Syntax: **myFloatField** or **field(myFloatField)**
- SolrQuerySyntax Example: **_val_:myFloatField**

For non-numeric fields, fields are still evaluated to a numeric value. How this works depends on the field type:

For new date fields (i.e. those based on the TrieDateField class), the fieldvalue behavior is as follows: If the document has a value in that field, then the numeric field value is the number of milliseconds between that date and the epoch. (This number can be negative for documents dated before the epoch.) If the document has no value in that field, then the numeric field value is 0. (That is, the default==0 convention still holds for dates. This has the side-effect of making undated documents act as if they were dated 1970-01-01T00:00:00Z.)

For "classic" date fields (i.e. those based on DateField or LegacyDateField), the fieldvalue value is not a function of the date field itself; rather, it gives the "ord" ordinal ranking of that field's value. (See "ord".) The default==0 convention still holds. (This has the side-effect of making undated documents act as if they were the oldest documents in the index.)

For boolean fields (e.g. solr.BoolField), the fieldvalue behavior may not be what you expect. In particular, don't assume that true maps to 1.0 and false to 0.0. At least on a pre-1.4 build, Chris Harris noted that true was mapping to 2.0. The justification (from debugQuery=on) is as follows: " 2.0 = ord(inStock)=2".

## ord

ord(myfield) returns the ordinal of the indexed field value within the indexed list of terms for that field in lucene index order (lexicographically ordered by unicode value), starting at 1. In other words, for a given field, all values are ordered lexicographically; this function then returns the offset of a particular value in that ordering. The field must have a maximum of one value per document (not multiValued). 0 is returned for documents without a value in the field.

- Example: If there were only three values for a particular field: "apple","banana","pear", then ord("apple")=1, ord("banana")=2, ord("pear")=3
- Example Syntax: **ord(myIndexedField)**
- Example SolrQuerySyntax: **_val_:"ord(myIndexedField)"**

WARNING: as of Solr 1.4, ord() and rord() can cause excess memory use since they must use a FieldCache entry at the top level reader, while sorting and function queries now use entries at the segment level. Hence sorting or using a different function query in addition to ord()/rord() will double memory use.

WARNING: ord() depends on the position in an index and can thus change when other documents are inserted or deleted, or if a MultiSearcher is used.

## rord

The reverse ordering of what ord provides.

- Example Syntax: **rord(myIndexedField)**
- Example: **rord(myDateField)** is a metric for how old a document is: the youngest document will return 1, the oldest document will return the total number of documents.
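The relationship between ord and rord can be sketched in a few lines of Python. This is an illustration only, with a hypothetical helper `build_ordinals`; it assumes that Python's default string ordering (by code point) is close enough to the index order described above:

```python
def build_ordinals(values):
    """Map each distinct value to its 1-based ord and rord.

    Sketch only: Python's default string sort compares by code point, which
    approximates the index order described above.
    """
    terms = sorted(set(values))
    ord_of = {term: i + 1 for i, term in enumerate(terms)}
    rord_of = {term: len(terms) - i for i, term in enumerate(terms)}
    return ord_of, rord_of


ord_of, rord_of = build_ordinals(["banana", "apple", "pear", "apple"])
for term in sorted(ord_of):
    print(term, ord_of[term], rord_of[term])

# Documents without a value get 0, per the description above
print(ord_of.get(None, 0))
```

Adding a new distinct value to the input shifts the ordinals of the values that sort after it, which is why ordinals change as documents are added or deleted.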

WARNING: as of Solr 1.4, ord() and rord() can cause excess memory use since they must use a FieldCache entry at the top level reader, while sorting and function queries now use entries at the segment level. Hence sorting or using a different function query in addition to ord()/rord() will double memory use.

## sum

Solr1.3 sum(x,y,...) returns the sum of multiple functions.

- Example Syntax: **sum(x,1)**
- Example Syntax: **sum(x,y)**
- Example Syntax: **sum(sqrt(x),log(y),z,0.5)**

## sub

Solr1.4 sub(x,y) returns x-y

- Example: **sub(myfield,myfield2)**
- Example: **sub(100,sqrt(myfield))**

## product

Solr1.3 product(x,y,...) returns the product of multiple functions.

- Example Syntax: **product(x,2)**
- Example Syntax: **product(x,y)**

## div

Solr1.3 div(x,y) divides the function x by the function y.

- Example Syntax: **div(1,x)**
- Example Syntax: **div(sum(x,100),max(y,1))**

## mod

Solr4.0 mod(x,y) computes the modulus of the function x by the function y.

- Example Syntax: **mod(1,x)**
- Example Syntax: **mod(sum(x,100),max(y,1))**

## pow

Solr1.3 pow(x,y) raises the base x to the power y.

- Example Syntax: **pow(x,0.5)** - same as sqrt
- Example Syntax: **pow(x,log(y))**

## abs

Solr1.3 abs(x) returns the absolute value of a function.

- Example Syntax: **abs(-5)**
- Example Syntax: **abs(x)**

## log

Solr1.3 log(x) returns log base 10 of the function x.

- Example Syntax: **log(x)**
- Example Syntax: **log(sum(x,100))**

## sqrt

Solr1.3 sqrt(x) returns the square root of the function x.

- Example Syntax: **sqrt(2)**
- Example Syntax: **sqrt(sum(x,100))**

## map

Solr1.3 map(x,min,max,target) with Solr1.4 map(x,min,max,target,value) maps any values of the function x that fall within min and max inclusive to target. min,max,target,value are constants. It outputs the field's value (or "value") if it does not fall between min and max.

- Example Syntax 1: **map(x,0,0,1)** changes any values of 0 to 1, which is useful in handling default 0 values.
- Example Syntax 2 (Solr1.4): **map(x,0,0,1,0)** changes any values of 0 to 1; if the value is not zero, it is set to the value of the 5th argument instead of defaulting to the field's value.
- Example Syntax 3 (Solr1.3): **map(price,0,100,0)** returns 0 if price is between 0 and 100, otherwise returns price.

Also, if x=NULL, it will also match min of 0.
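The behavior of map for a single document can be sketched as follows. This is a minimal illustration with a hypothetical Python function `map_value`; treating a missing value as 0 follows the note above and is otherwise an assumption of the sketch:

```python
def map_value(x, lo, hi, target, value=None):
    """Sketch of map(x,min,max,target[,value]) for one document.

    x is None when the document has no value; it is treated as 0 here,
    following the note above about missing values matching a min of 0.
    """
    if x is None:
        x = 0.0
    if lo <= x <= hi:          # range is inclusive at both ends
        return target
    return x if value is None else value


print(map_value(0, 0, 0, 1))        # 0 is replaced by 1
print(map_value(7, 0, 0, 1))        # outside the range: x is kept
print(map_value(7, 0, 0, 1, 0))     # outside the range: 5th argument is used
print(map_value(50, 0, 100, 0))     # price between 0 and 100 maps to 0
print(map_value(None, 0, 0, 1))     # missing value behaves like 0
```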

## scale

Solr1.3 scale(x,minTarget,maxTarget) scales values of the function x such that they fall between minTarget and maxTarget inclusive.

- Example Syntax: **scale(x,1,2)** all values will be between 1 and 2 inclusive.

**NOTE**: The current implementation traverses all of the function values to obtain the min and max so it can pick the correct scale.

**NOTE**: This implementation currently cannot distinguish when documents have been deleted or documents that have no value, and 0.0 values will be used for these cases. This means that if values are normally all greater than 0.0, one can still end up with 0.0 as the min value to map from. In these cases, an appropriate map() function could be used as a workaround to change 0.0 to a value in the real range. Example: **scale(map(x,0,0,5),1,2)**
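The effect of a stray 0.0 on the scaling, and the map() workaround, can be seen in a small Python sketch. The function name `scale` mirrors the Solr function but the code is only an illustration over an in-memory list; the guard for identical min and max is an assumption of the sketch:

```python
def scale(values, min_target, max_target):
    """Sketch of scale(x,minTarget,maxTarget) over all documents.

    values holds one function value per document; documents that are deleted
    or have no value are represented by 0.0, as in the note above.
    """
    lo, hi = min(values), max(values)
    if hi == lo:                      # assumption: avoid dividing by zero
        return [float(min_target)] * len(values)
    factor = (max_target - min_target) / (hi - lo)
    return [(v - lo) * factor + min_target for v in values]


# Real values lie in 5..10, but one document has no value and contributes 0.0
raw = [5.0, 7.5, 10.0, 0.0]
print(scale(raw, 1, 2))

# Workaround from the text: map 0 to 5 first, i.e. scale(map(x,0,0,5),1,2)
mapped = [5.0 if v == 0.0 else v for v in raw]
print(scale(mapped, 1, 2))
```

In the first call the real values only occupy the upper half of the target range, because the 0.0 is taken as the minimum; after mapping, they span the full range from 1 to 2.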

## query

Solr1.4 query(subquery, default) returns the score for the given subquery, or the default value for documents not matching the query. Any type of subquery is supported through either parameter dereferencing `$otherparam` or direct specification of the query string in the LocalParams via "v".

These examples assume '&defType=func', since they just have a function in the Q parameter.

- Example Syntax: **q=product(popularity, query({!dismax qf=text v='solr rocks'}))** returns the product of the popularity and the score of the dismax query.
- Example Syntax: **q=product(popularity, query($qq))&qq={!dismax qf=text}solr rocks** is equivalent to the previous query, using param dereferencing.
- Example Syntax: **q=product(popularity, query($qq,0.1))&qq={!dismax qf=text}solr rocks** specifies a default score of 0.1 for documents that don't match the dismax query.

## currency

Solr4.2 `currency(field_name,[CODE])` is a function that can operate on any `field_name` that is implemented as CurrencyField. Unlike the raw values generated when using a currency field directly as input to another function, the `currency()` function can be used to return human readable values in either the default currency configured for the field type, or an arbitrary currency specified in the function.

- Example Syntax: **currency(price_c)** returns the price_c field values in the default currency
- Example Syntax: **currency(price_c,EUR)** returns the price_c field values converted to Euros, regardless of the default currency

See CurrencyField for more details.

## linear

linear(x,m,c) implements m*x+c where m and c are constants and x is an arbitrary function. This is equivalent to **sum(product(m,x),c)**, but slightly more efficient as it is implemented as a single function.

Example Syntax:

**linear(x,2,4)** returns 2*x+4

## recip

A reciprocal function with **recip(x,m,a,b)** implementing a/(m*x+b). m,a,b are constants, x is any numeric field or arbitrarily complex function.

When a and b are equal, and x>=0, this function has a maximum value of 1 that drops as x increases. Increasing the value of a and b together results in a movement of the entire function to a flatter part of the curve. These properties can make this an ideal function for boosting more recent documents when x is rord(datefield).

Example Syntax:

**recip(rord(creationDate),1,1000,1000)**

Solr1.4 In Solr 1.4 and later, best practice is to avoid ord() and rord() and derive the boost directly from the value of the date field. See ms() for more details.

## max

max(x,c) returns the max of another function and a constant. Useful for "bottoming out" another function at some constant.

Example Syntax:

**max(myfield,0)**

As of Solr3.2, max no longer requires a constant; arguments can be arbitrary functions.

## min

min(x,y) returns the minimum of two functions.

- Example Syntax: **min(fieldA,fieldB)**
- Example Syntax: **min(fieldA, 37)**

## ms

Returns milliseconds of difference between its arguments.

Dates are relative to the Unix or POSIX time epoch, midnight, January 1, 1970 UTC.

Arguments may be numerically indexed date fields such as TrieDate (recommended field type for dates since Solr 1.4), or date math (examples in SolrQuerySyntax) based on a constant date or **NOW**.

Things other than these will _not_ work as arguments. For example, you cannot currently use:

- "classic", non-numerically indexed date fields (i.e. fields backed by DateField or LegacyDateField classes)
- other functions (e.g. map, sum, etc.)

**ms()**

Equivalent to **ms(NOW)**, number of milliseconds since the epoch.

**ms(a)**

Returns the number of milliseconds since the epoch that the argument represents.

- Example: **ms(NOW/DAY)**
- Example: **ms(2000-01-01T00:00:00Z)**
- Example: **ms(mydatefield)**

Note that this number can be negative for dates from before the epoch.

**ms(a,b)**

Returns the number of milliseconds that `b` occurs before `a` (i.e. `a - b`). Note that this offers higher precision than **sub(a,b)** because the arguments are not converted to floating point numbers before subtraction.

- Example: **ms(NOW,mydatefield)**
- Example: **ms(mydatefield,2000-01-01T00:00:00Z)**
- Example: **ms(datefield1,datefield2)**
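The precision difference between ms(a,b) and sub(a,b) can be illustrated with a small Python sketch. It assumes that the floating point conversion mentioned above is to single precision (32-bit) floats; `to_float32` is a hypothetical helper that emulates that rounding:

```python
import struct


def to_float32(x):
    """Round x to the nearest single-precision float."""
    return struct.unpack("f", struct.pack("f", x))[0]


a = 1271376000000          # 2010-04-16T00:00:00Z in ms since the epoch
b = a - 1000               # one second earlier

exact = a - b              # integer arithmetic, as with ms(a,b)
lossy = to_float32(a) - to_float32(b)   # assumed float conversion, as with sub(a,b)

print(exact)   # 1000
print(lossy)   # 0.0: both timestamps round to the same float
```

At this magnitude, neighboring single-precision values are roughly two minutes apart, so differences smaller than that can vanish entirely.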

ms(foofield) currently (Chris Harris, 4/16/2010) returns the value 0 for docs with nonexistent foofield. Should this behavior be relied on?

### Date Boosting

Boosting more recent content is a common use case. One way is to use a `recip` function in conjunction with `ms`.

There are approximately 3.16e10 milliseconds in a year, so one can scale dates to fractions of a year with the inverse, or 3.16e-11. Thus the function **recip(ms(NOW,mydatefield),3.16e-11,1,1)** will yield values near 1 for very recent documents, 1/2 for documents a year old, 1/3 for documents two years old, etc. Be careful to not use this function for dates more than one year in the future or the values will be negative.

Consider using reduced precision to prevent excessive memory consumption. You would instead use **recip(ms(NOW/HOUR,mydatefield),3.16e-11,1,1)**. See this thread for more information.
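The arithmetic behind the date boost can be checked with a short Python sketch. The helper names `recip` and `date_boost` are illustrative only; `recip` follows the a/(m*x+b) definition given in the recip section:

```python
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000   # about 3.16e10


def recip(x, m, a, b):
    """recip(x,m,a,b) = a / (m*x + b), as defined in the recip section."""
    return a / (m * x + b)


def date_boost(age_ms):
    """Value of recip(ms(NOW,mydatefield),3.16e-11,1,1) for a given age."""
    return recip(age_ms, 3.16e-11, 1, 1)


for years in (0, 1, 2, 10):
    print(years, round(date_boost(years * MS_PER_YEAR), 3))

# A date far enough in the future makes the denominator, and so the boost, negative
print(date_boost(-1.5 * MS_PER_YEAR) < 0)
```

A negative age stands for a document dated in the future; once it is more than about a year ahead, m*x drops below -1 and the result changes sign.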

The most effective way to use such a boost is to multiply it with the relevancy score, rather than add it in. One way to do this is with the boost query parser.

Also see http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents

## Math.*

Most Java Math functions are now supported, including:

- rad - convert degrees to radians
- deg - convert radians to degrees
- sqrt - square root
- cbrt - cube root
- log - base10 log
- ln - natural log
- exp - Euler's number raised to a power
- sin - trigonometric sine of an angle in radians
- cos - cosine of an angle
- tan - tangent of an angle
- asin - arc sine
- acos - arc cosine
- atan - arc tangent
- sinh - hyperbolic sine
- cosh - hyperbolic cosine
- tanh - hyperbolic tangent
- ceil - ceiling, rounds up to an integer
- floor - rounds down to an integer
- rint - rounds to the nearest integer
- pow(x,y) - raises x to the power of y (x**y)
- hypot(x,y) - returns sqrt(x**2+y**2) without intermediate overflow or underflow
- atan2(x,y) - returns the angle resulting from the conversion of the rectangular coordinates x,y to polar coordinates
- pi() - returns the closest approximation of pi, the ratio of the circumference of a circle to its diameter
- e() - returns the closest approximation to e, the base of the natural logarithm

See the java.lang.Math javadocs for more details.

## Relevance Functions

Relevancy functions that take terms can use a bare term if it's a simple word

Example:

**docfreq(text,solr)**

Can optionally quote the term if it's more complex

Example:

**docfreq(text,'solr')**

And can even do parameter substitution for the term value

Example Partial Request:

**&defType=func&q=docfreq(text,$myterm)&myterm=solr**

### docfreq

**docfreq(field,term)** returns the number of documents that contain the term in the field. This is a constant (the same value for all documents in the index).

Example Syntax:

**docfreq(text,'solr')**

### termfreq

**termfreq(field,term)** returns the number of times the term appears in the field for that document.

Example Syntax:

**termfreq(text,'memory')**

### totaltermfreq

**totaltermfreq(field,term)** returns the number of times the term appears in the field in the entire index. **ttf** is an alias of **totaltermfreq**.

Example Syntax:

**ttf(text,'memory')**

### sumtotaltermfreq

**sumtotaltermfreq(field)** returns the sum of totaltermfreq values for all terms in the field in the entire index (i.e. the number of indexed tokens for that field). **sttf** is an alias of **sumtotaltermfreq**.

Example Syntax:

**sttf(text)**
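The difference between docfreq, termfreq, totaltermfreq and sumtotaltermfreq is easiest to see on a toy example. The Python sketch below uses made-up data and plain lists in place of an index, and it ignores deleted documents; the function names mirror the Solr functions but are only illustrative:

```python
from collections import Counter

# Toy "index": one list of already-analyzed tokens per document (made-up data)
docs = [
    ["solr", "uses", "memory"],
    ["memory", "memory", "cache"],
    ["lucene", "index"],
]


def docfreq(term):
    """Number of documents containing the term."""
    return sum(1 for tokens in docs if term in tokens)


def termfreq(doc_id, term):
    """Number of times the term appears in one document."""
    return Counter(docs[doc_id])[term]


def totaltermfreq(term):
    """Number of times the term appears across the whole index."""
    return sum(tokens.count(term) for tokens in docs)


def sumtotaltermfreq():
    """Total number of indexed tokens for the field."""
    return sum(len(tokens) for tokens in docs)


print(docfreq("memory"), termfreq(1, "memory"),
      totaltermfreq("memory"), sumtotaltermfreq())
```

Only termfreq depends on the document being scored; the other three are the same for every document in the index.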

### idf

**idf(field,term)** returns the inverse document frequency for the given term, using the Similarity for the field.

Example Syntax:

**idf(text,'solr')**

### tf

**tf(field,term)** returns the term frequency factor for the given term, using the Similarity for the field.

Example Syntax:

**tf(text,'solr')**

### norm

**norm(field)** returns the "norm" stored in the index, the product of the index time boost and the length normalization factor, according to the Similarity (http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/search/Similarity.html) for the field.

Example Syntax:

**norm(text)**

### maxdoc

**maxdoc()** returns the number of documents in the index, including those that are marked as deleted but have not yet been purged. This is a constant (the same value for all documents in the index).

### numdocs

**numdocs()** returns the number of documents in the index, not including those that are marked as deleted but have not yet been purged. This is a constant (the same value for all documents in the index).

## Boolean Functions

These functions help with boolean logic, acting on true/false values and conditions. Note that they do not only work for pure `true/false` values but also interpret 0 as false and a positive number as true.

### true

**true** returns true

### false

**false** returns false

### exists

**exists(field|function)** returns true if a value exists for a given document.

Example use: `exists(myField)` will return true if myField has a value, while `exists(query({!v='year:2012'}))` will return true for docs with year=2012.

### if

**if(expression,trueValue,falseValue)** emits trueValue if the expression is true, else falseValue. An expression can be any function which outputs boolean values, or even functions returning numeric values, in which case value 0 will be interpreted as false, or strings, in which case empty string is interpreted as false.

Example use: `if(exists(myField),100,0)` returns 100 if myField exists

### def

**def(field|function,defaultValue)** returns the value of field "field", or if the field does not exist, returns the defaultValue specified.

Example use: `def(rating,5)` returns the rating, or 5 if no rating is specified in the doc.

### not

**not(function|field)** returns the inverse of expression.

Examples: `not(true)` returns false, `not(inStock)` returns true (or 1.0) for docs with inStock=false, and false (or 0.0) for the others.

### and

**and(x,y)** returns true if both x and y are true

### or

**or(x,y)** returns true if either x or y is true

## dist

Return the Distance between two Vectors (points) in an n-dimensional space. See http://en.wikipedia.org/wiki/Lp_space for more information. Takes in the power, plus two or more **ValueSource** instances and calculates the distances between the two vectors. Each **ValueSource** must be a number. There must be an even number of **ValueSource** instances passed in and the method assumes that the first half represent the first vector and the second half represent the second vector.

Signature: dist(power, pointA, pointB, ...)

Common cases:

| Power | Common Name |
|---|---|
| 0 | Sparseness calculation |
| 1 | Manhattan (taxicab) Distance |
| 2 | Euclidean Distance |
| Infinite | Infinite norm - maximum value in the vector |

Example: Assume each document has 4 numeric fields: x,y,z,w. Then we can calculate various distances:

- dist(2, x, y, 0, 0) - calculates the Euclidean distance between (0,0) and (x,y) for each document
- dist(1, x, y, 0, 0) - calculates the Manhattan distance between (0,0) and (x,y) for each document
- dist(2, x,y,z,0,0,0) - Euclidean distance between (0,0,0) and (x,y,z) for each document.
- dist(1,x,y,z,e,f,g) - Manhattan distance between (x,y,z) and (e,f,g) where each letter is a field name
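The distance calculations listed above can be sketched in Python for a single document. This is an illustration rather than Solr's implementation: the handling of power 0 (counting coordinates that differ) and of an infinite power (largest coordinate difference) follows the table of common cases and is otherwise an assumption:

```python
import math


def dist(power, *values):
    """Sketch of dist(power, a1..an, b1..bn) for one document.

    The first half of values is the first vector, the second half the second.
    """
    if len(values) % 2 != 0:
        raise ValueError("dist needs an even number of values")
    half = len(values) // 2
    diffs = [abs(a - b) for a, b in zip(values[:half], values[half:])]
    if power == 0:                       # sparseness: count of non-zero differences
        return float(sum(1 for d in diffs if d != 0))
    if math.isinf(power):                # infinite norm: largest difference
        return float(max(diffs))
    return sum(d ** power for d in diffs) ** (1.0 / power)


print(dist(2, 3, 4, 0, 0))                   # Euclidean
print(dist(1, 3, 4, 0, 0))                   # Manhattan
print(dist(math.inf, 3, 4, 0, 0))            # infinite norm
print(dist(0, 3, 0, 0, 0))                   # one coordinate differs
```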

See Solr's SpatialSearch for easy to use geospatial search features.

## sqedist - Squared Euclidean Distance

Similar to the dist() function, the Sq. Euclidean distance calculates the 2-norm (Euclidean distance) but does not take the square root, thus saving a fairly expensive operation. It is often the case that applications that care about Euclidean distance do not need the actual distance, but instead can use the square of the distance. See also dist(). There must be an even number of **ValueSource** instances passed in and the method assumes that the first half represent the first vector and the second half represent the second vector.

Signature: sqedist(pointA, pointB, ...)

Example: sqedist(x_td, y_td, 0, 0)

## hsin, ghhsin - Haversine Formula

The Haversine distance calculates the distance between two points on a sphere when traveling along the sphere. In other words, it's the Great Circle distance. See http://en.wikipedia.org/wiki/Great-circle_distance. In the case of GeohashHaversine, the sources must be a geohash value.

**Values must be in Radians**. See the rad() function. Please note Haversine distance has known problems with antipodal values. See the Wikipedia page for details.

Signature: hsin(radius, true|false, x1,y1,x2,y2), ghhsin(radius, hash1, hash2)

The boolean value determines whether hsin will automatically convert the input values from degrees to radians.

Example: hsin(2, true, x, y, 0, 0) - Calculate the haversine distance between the lat/lon 0, 0 and the values in fields x,y on a sphere of radius 2, converting the values automatically to radians.
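For reference, the haversine formula itself can be sketched in Python. The argument order mirrors the hsin signature above; treating x as latitude and y as longitude, and clamping the intermediate value to 1.0 to stay inside the domain of asin, are assumptions of this sketch rather than documented Solr behavior:

```python
import math


def hsin(radius, convert, x1, y1, x2, y2):
    """Sketch of hsin(radius, true|false, x1,y1,x2,y2).

    x is taken as latitude and y as longitude (an assumption of this sketch);
    convert=True means the inputs are in degrees and must be turned into radians.
    """
    if convert:
        x1, y1, x2, y2 = (math.radians(v) for v in (x1, y1, x2, y2))
    h = (math.sin((x2 - x1) / 2) ** 2
         + math.cos(x1) * math.cos(x2) * math.sin((y2 - y1) / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin outside its domain
    return 2 * radius * math.asin(math.sqrt(min(1.0, h)))


# Quarter of a great circle on a sphere of radius 2: from the pole to the equator
print(hsin(2, True, 90, 0, 0, 0))
```

The printed value should be close to pi, since a quarter of a great circle on a sphere of radius 2 has length 2 * pi / 2.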

See Solr's SpatialSearch for easy to use geospatial search features.

## geohash - Convert to geohash

Given a latitude and longitude (in degrees, not radians), convert them to a Geohash. See http://en.wikipedia.org/wiki/Geohash.

Signature: geohash(**ValueSource**, **ValueSource**)

Example: geohash(lat, lon)

## strdist

Calculate the distance between two strings. Uses the Lucene spell checker StringDistance interface and supports all of the implementations available in that package, plus allows applications to plug in their own via Solr's resource loading capabilities.

- Signature: strdist(s1, s2, {jw|edit|ngram|FQN}[, ngram size])
- Example: strdist("SOLR",id,edit)

The third argument is the name of the distance measure to use. The abbreviations stand for:

- jw - Jaro-Winkler
- edit - Levenshtein or Edit distance
- ngram - The NGramDistance, if specified, can optionally pass in the ngram size too. Default is 2.
- FQN - Fully Qualified class Name for an implementation of the StringDistance interface. Must have a no-arg constructor.

This function returns a float between 0 and 1 based on how similar the specified strings are to one another. Returning a value of 1 means the specified strings are identical and 0 means the strings are maximally different.
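To show how an edit distance can be turned into a similarity between 0 and 1, here is a minimal Python sketch. The helper names `edit_distance` and `edit_similarity` are hypothetical, and dividing by the length of the longer string is an assumption of the sketch, not a statement about the StringDistance implementations:

```python
def edit_distance(s1, s2):
    """Classic Levenshtein distance using two rows of the DP table."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(s1, s2):
    """Normalize the edit distance to a 0..1 similarity (1 means identical).

    Assumption: the distance is divided by the longer string's length.
    """
    longest = max(len(s1), len(s2))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(s1, s2) / longest


print(edit_similarity("SOLR", "SOLR"))
print(edit_similarity("SOLR", "SOLA"))
print(edit_similarity("abc", "xyz"))
```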

## top

Solr1.4 Causes its function query argument to derive its values from the top-level IndexReader containing all parts of an index. For example, the ordinal of a value in a single segment will be different from the ordinal of that same value in the complete index. The ord() and rord() functions implicitly use top() and hence ord(foo) is equivalent to top(ord(foo)).

# General Example

To illustrate the use of function queries, suppose an index stores the dimensions in meters **x**, **y**, **z** of some hypothetical boxes, with arbitrary names stored in the field **boxname**. Suppose we want to search for boxes matching the name *findbox*, ranked according to the volume of each box. The query params would be:

q=boxname:findbox+_val_:"product(product(x,y),z)"

This will rank the results based on volume, but in order to get the computed volume back you will need to add the parameter...

&fl=*,score

...where **score** will reflect the resultant volume (as one component of the overall score, together with the score of the boxname clause).

Suppose you also have a field containing the weight of the box as 'weight'. To sort by the density of the box and return the value of the density in score, your query should be...

http://localhost:8983/solr/select/?q=boxname:findbox+_val_:"div(weight,product(product(x,y),z))"&fl=boxname,x,y,z,weight,score
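To sanity-check what the two function queries above compute, the same expressions can be evaluated in plain Python over a couple of made-up documents. The field names follow the example; the data and the helper names `volume` and `density` are illustrative only:

```python
# Made-up documents with the fields used in the example above
boxes = [
    {"boxname": "findbox", "x": 1.0, "y": 2.0, "z": 3.0, "weight": 12.0},
    {"boxname": "findbox", "x": 2.0, "y": 2.0, "z": 2.0, "weight": 4.0},
]


def volume(doc):
    """product(product(x,y),z)"""
    return doc["x"] * doc["y"] * doc["z"]


def density(doc):
    """div(weight,product(product(x,y),z))"""
    return doc["weight"] / volume(doc)


# Highest density first, mirroring a descending ranking by function value
for doc in sorted(boxes, key=density, reverse=True):
    print(doc["boxname"], volume(doc), density(doc))
```

These are the function values only; as noted under Using FunctionQuery, a function embedded via _val_ is one component of the overall query score.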

# Vector Functions

Until Solr3.1, all functions were single valued. Since Solr3.1, it is possible for functions to use multiple values as part of a function calculation. There is not yet support for fields that have more than one token, but it is possible to aggregate multiple single value fields into a vector.

## vector

Takes in one or more single sources (i.e. all the stuff above) and creates a vector source.

Signature: vector(x,y,z)

### Distances

dist(), hsin() and sqedist() can work with vector functions (implemented in Java via MultiValueSource).