Spatial Search
Introduction
Many applications wish to combine location data with text data. This is often called spatial search or geo-spatial search. Most of these applications need to do several things:
- Represent spatial data in the index
- Filter by some spatial concept such as a bounding box or other shape
- Sort by distance
- Score/boost by distance
NOTE: Unless otherwise specified, all units of distance are kilometers and points are in degrees of latitude,longitude.
QuickStart
If you haven't already, get Solr 3.4 or a recent nightly build of Solr 4.0, start the example server, and index the example data as shown in the Solr tutorial. With the Solr server running, you should be able to click on the example links and see real responses.
In the example data, certain documents have a field called "store" (with a fieldType named "location" implemented via LatLonType). Some of the points in the example data are:
<field name="store">45.17614,-93.87341</field> <!-- Buffalo store -->
<field name="store">40.7143,-74.006</field> <!-- NYC store -->
<field name="store">37.7752,-122.4232</field> <!-- San Francisco store -->
geofilt - The distance filter
Now let's assume that we are at 45.15,-93.85 (which happens to be 3.437 km from the Buffalo store). We can use a geofilt filter to find all products (documents in our index) with the field store within 5km of our position:
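A minimal sketch of how such a request could be assembled, using only the Python standard library. The base URL (localhost:8983, /solr/select) and the wt=json parameter are assumptions about a default example server, not something this page specifies; the helper name geofilt_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the example server; adjust host, port and path as needed.
SOLR_SELECT = "http://localhost:8983/solr/select"

def geofilt_url(field, lat, lon, distance_km):
    """Build a select URL that keeps documents within distance_km of (lat, lon)."""
    params = {
        "q": "*:*",            # match everything, then filter spatially
        "fq": "{!geofilt}",    # the filter reads sfield, pt and d from the request
        "sfield": field,       # spatial field to filter on
        "pt": f"{lat},{lon}",  # center point as latitude,longitude
        "d": distance_km,      # distance in kilometers
        "wt": "json",          # assumed: ask for a JSON response
    }
    return SOLR_SELECT + "?" + urlencode(params)

url = geofilt_url("store", 45.15, -93.85, 5)
print(url)
```

urlencode percent-encodes the braces, comma and exclamation mark, so the printed URL can be pasted into a browser as is.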
Sure enough, we find 8 products at the Buffalo store:
...
"response":{"numFound":8,"start":0,"docs":[
{
"name":"Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133",
"store":"45.17614,-93.87341"},
{
"name":"Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300",
"store":"45.17614,-93.87341"},
...
Spatial Query Parameters
The main spatial search queries, geofilt, bbox, and geodist, default to looking for normal request parameters, so any of pt, sfield, and d may be factored out and specified only once in a request (even if multiple spatial queries are used).
Examples:
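As a hedged illustration of the two styles, the sketch below builds one request in which pt, sfield and d are given once and shared by a bbox filter and a geodist sort, and one filter string with everything spelled out as local params. Both helper names are hypothetical.

```python
from urllib.parse import urlencode

def shared_spatial_params(field, lat, lon, distance_km):
    """Request parameters where pt, sfield and d are given once and shared."""
    return [
        ("q", "*:*"),
        ("fq", "{!bbox}"),          # bounding-box filter, no local params
        ("sort", "geodist() asc"),  # the sort reuses the same pt and sfield
        ("sfield", field),
        ("pt", f"{lat},{lon}"),
        ("d", distance_km),
    ]

def inline_spatial_filter(parser, field, lat, lon, distance_km):
    """The same kind of filter with everything written as local params."""
    return f"{{!{parser} sfield={field} pt={lat},{lon} d={distance_km}}}"

query_string = urlencode(shared_spatial_params("store", 45.15, -93.85, 5))
inline = inline_spatial_filter("geofilt", "store", 45.15, -93.85, 5)
print(query_string)
print(inline)
```

The inline form is useful when two filters in one request need different points or distances; otherwise the shared form keeps the request shorter.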
bbox - Bounding-box filter
Exact distance calculations can be somewhat expensive and it can often make sense to use a quick approximation instead. The bbox filter is guaranteed to encompass all of the points of interest, but it may also include other points that are slightly outside of the required distance. For our standard LatLonType, this is implemented as a bounding box - a box made up of a range of latitudes and longitudes that encompasses the circle of radius d (i.e. it will select the same or slightly more documents than geofilt will).
The parameters are exactly the same as geofilt, so the following request will still match everything in the Buffalo store:
Because the bounding box is less selective, if we change our distance to 3km it will still include the Buffalo store (which is actually 3.437 km away). If we used the more accurate geofilt at 3km, these documents would not match. There are many scenarios when the bounding box can make sense though - especially if you are sorting by some other criteria anyway, or sorting by distance itself.
Since the LatLonType field also supports field queries and range queries, one can manually create their own bounding box rather than using bbox:
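The sketch below computes a simplified bounding box around a point and formats it as a range query of the form field:[lat,lon TO lat,lon]. It is an approximation for illustration, not Solr's own implementation: it assumes a spherical Earth with a mean radius of 6371 km and a box that crosses neither a pole nor the 180th meridian. All function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for this sketch

def bounding_box(lat, lon, distance_km):
    """Return ((south, west), (north, east)) enclosing the circle of the given radius.

    Simplified: assumes the box crosses neither a pole nor the 180th meridian.
    """
    ang = distance_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(ang)
    # The longitude span widens as the latitude moves away from the equator.
    dlon = math.degrees(math.asin(math.sin(ang) / math.cos(math.radians(lat))))
    return (lat - dlat, lon - dlon), (lat + dlat, lon + dlon)

def range_filter(field, lower_left, upper_right):
    """Format the box as a range query: field:[lat,lon TO lat,lon]."""
    (s, w), (n, e) = lower_left, upper_right
    return f"{field}:[{s:.5f},{w:.5f} TO {n:.5f},{e:.5f}]"

def in_box(point, lower_left, upper_right):
    """True if point (lat, lon) lies inside the box, edges included."""
    (s, w), (n, e) = lower_left, upper_right
    return s <= point[0] <= n and w <= point[1] <= e

ll, ur = bounding_box(45.15, -93.85, 3)
print(range_filter("store", ll, ur))
print(in_box((45.17614, -93.87341), ll, ur))
```

With a 3 km radius, the Buffalo store still falls inside this box even though it is 3.437 km away, which is the over-inclusion described above.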
geodist - The distance function
The geodist(param1,param2,param3) function supports (optional) parameters:
- param1: the sfield
- param2: the latitude (pt)
- param3: the longitude (pt)
geodist is a function query that yields the calculated distance. This gives the flexibility to do a number of interesting things, such as sorting by the distance (Solr can sort by any function query), or combining the distance with the relevancy score, such as boosting by the inverse of the distance.
Here's an example of sorting by distance ascending:
Or you could use the distance function as the main query (or part of it) to get the distance as the document score:
The geodist function can have the points specified as function arguments, or can default to looking at the pt and sfield global request parameters.
Or you could combine geodist() with geofilt (or bbox) to limit the results and sort them by distance (50km):
This returns, as the score, the distance to the closer of 2 points that the user wants to check near (Denver and San Francisco):
Or
In order to return the number of results that match using a facet:
Returning the distance
You can use the pseudo-field feature to return the distance along with the stored fields of each document by adding fl=geodist() to the request. Use an alias like fl=_dist_:geodist() to make the distance come back in the _dist_ pseudo-field instead. Here is an example of sorting by distance ascending and returning the distance for each document in _dist_.
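A short sketch of the request parameters for that case. The name field is taken from the example response earlier on this page; the rest follows the parameters described above.

```python
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "sfield": "store",
    "pt": "45.15,-93.85",
    "sort": "geodist() asc",        # nearest documents first
    "fl": "name,_dist_:geodist()",  # return name plus the distance aliased as _dist_
}
query_string = urlencode(params)
print(query_string)
```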
As a temporary workaround for older Solr versions, it's possible to obtain distances by using geodist or geofilt as the only scoring part of the main query.
Other Use Cases
How to combine with a sub-query to expand results
It is possible to combine a spatial filter with other criteria using an OR clause. Here is an example that returns documents that are either in Jacksonville, FL or within 50 km of 45.15,-93.85:
Note: you can't try this example with the example schema since the "state" and "city" fields haven't been defined.
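One way to express this is to embed the spatial filter as a nested query with the _query_ syntax. The sketch below builds such a filter string; the helper name or_with_geofilt is hypothetical, and the state and city fields are the undefined fields mentioned in the note above.

```python
from urllib.parse import urlencode

def or_with_geofilt(clause):
    """Combine an ordinary query clause with a spatial filter using OR."""
    # _query_:"..." embeds a nested query, here one handled by the geofilt parser.
    return f'({clause}) OR _query_:"{{!geofilt}}"'

params = {
    "q": "*:*",
    "fq": or_with_geofilt('state:"FL" AND city:"Jacksonville"'),
    "sfield": "store",
    "pt": "45.15,-93.85",
    "d": 50,
}
print(params["fq"])
print(urlencode(params))
```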
How to facet by distance
Faceting by distance can be done using the frange QParser. It is currently somewhat inefficient (frange is slower than geofilt), but it will likely be fine in most situations.
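A sketch of how distance rings could be turned into facet queries, one frange per ring over geodist(). The ring edges are illustrative values, the helper name is hypothetical, and the incl/incu settings are one possible choice that makes the rings half-open so that no document is counted twice.

```python
from urllib.parse import urlencode

def distance_facets(edges_km):
    """One facet.query per consecutive pair of distance edges (in km)."""
    return [
        ("facet.query", f"{{!frange l={lo} u={hi} incl=true incu=false}}geodist()")
        for lo, hi in zip(edges_km, edges_km[1:])
    ]

params = [
    ("q", "*:*"),
    ("sfield", "store"),
    ("pt", "45.15,-93.85"),
    ("facet", "true"),
] + distance_facets([0, 5, 50, 3000])

query_string = urlencode(params)
print(query_string)
```

Passing a list of pairs to urlencode allows the facet.query key to be repeated, once per ring.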
How to boost closest results (with dismax)
It is also possible to boost the closest results by combining bf with geodist():
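A common way to turn a distance into a boost is the recip function query, where recip(x,m,a,b) evaluates to a/(m*x+b), so the boost shrinks as the distance grows. The sketch below mirrors that formula in Python to show how the boost decays. The constants 2, 200 and 20, the query text, and the qf field are illustrative assumptions.

```python
from urllib.parse import urlencode

def recip(x, m, a, b):
    """Python mirror of the recip(x,m,a,b) function query: a / (m*x + b)."""
    return a / (m * x + b)

params = {
    "defType": "dismax",
    "q": "hard drive",                  # illustrative user query
    "qf": "name",                       # illustrative query field
    "sfield": "store",
    "pt": "45.15,-93.85",
    "bf": "recip(geodist(),2,200,20)",  # boost is larger for nearby documents
}
print(urlencode(params))

# Boost contributed at a few distances (km) with the constants above.
for km in (0, 3.437, 90):
    print(km, round(recip(km, 2, 200, 20), 3))
```

Tuning m, a and b changes how quickly the boost falls off relative to the text relevancy score.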
Advanced Spatial Search
Solr also supports other spatial capabilities beyond just latitude and longitude. For example, a PointType can be used to represent a point in an n-dimensional space. This can be useful, for instance, for searching in CAD drawings or blueprints. Solr also supports other distance measures. See the FunctionQuery page for more information and look for hsin, ghhsin and others.
LatLonType
The LatLonType is the current default spatial field. Values for this type are of the form latitude,longitude, although behind the scenes, the latitude and longitude are indexed as separate numbers. Fields using LatLonType must be single valued (i.e. multiValued="false"). This field type does distance calculations based on Great Circle (haversine).
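For reference, a minimal Python sketch of the haversine great-circle formula. It assumes a spherical Earth with a mean radius of 6371 km, which may differ slightly from the constant Solr uses internally.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Distance from the QuickStart position to the Buffalo store.
d = haversine_km(45.15, -93.85, 45.17614, -93.87341)
print(round(d, 2))
```

The result is roughly 3.44 km, in line with the 3.437 km quoted in the QuickStart.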
In addition to geofilt, geodist and bbox, the LatLonType supports field queries such as field:10,20 and range queries such as field:[10,20 TO 30,40].
Filtering Caveats
For the bbox filter, when the bounding box includes a pole, the LatLonType will switch from producing a bounding box to a "bounding bowl" (i.e. a spherical cap), which includes all values that are north or south of one latitude: that of whichever corner of the would-be bounding box (the lower left or the upper right) is closer to the equator. In other words, we still calculate the coordinates of the upper right and lower left corners of the box just as in all other filtering cases, but we then take the corner that is closest to the equator (since the box goes over the pole, this may not be the lower left, despite the name) and do a latitude-only filter. This means there will be more matches than with a pure bounding box, but the query is much easier to construct and will likely be faster, too.
Spatial Options Under Development
SpatialSearchDev -- Covers things like Geohash (supports multivalue lat-lon points), other distance functions, etc.