Spatial Search (docs + features under development).

Read SpatialSearch before this page if you have not already.

Solr also supports other spatial capabilities beyond just latitude and longitude. For example, a PointType can be used to represent a point in an n-dimensional space. This can be useful, for instance, for searching in CAD drawings or blueprints. Solr also supports other distance measures. See the FunctionQuery page for more information and look for hsin, ghhsin and others.

Lucene 4's new spatial module

Lucene 4 has a new spatial module that replaces the older one. The Solr adapters for it are documented here: SolrAdaptersForLuceneSpatial4

Field Types

PointType

<fieldType name="location" class="solr.PointType" dimension="2" subFieldSuffix="_d"/>

Example

<fieldType name="latLon" class="solr.LatLonType" subFieldSuffix="_latLon"/>
...
<field name="store_lat_lon" type="latLon" indexed="true" stored="true"/>
...
<dynamicField name="*_latLon" type="double" indexed="true" stored="false" multiValued="true"/>

Geohash

A geohash is a way of encoding lat/lon into a single field as a String. For further information about the details of the encoding see http://www.geohash.org and http://en.wikipedia.org/wiki/Geohash
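To make the encoding concrete, here is a minimal Python sketch of the standard geohash algorithm as described on the pages linked above. It is an illustration only and is not taken from Solr's GeoHashField code; the function name geohash_encode and its precision parameter are assumptions made for this example.

```python
# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=12):
    """Encode a latitude/longitude pair (degrees) as a geohash string.

    Bits alternate between longitude and latitude, starting with
    longitude. Each bit halves the remaining interval; every 5 bits
    become one base-32 character.
    """
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2.0
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, 3))  # u4p
```

Nearby points tend to share a common prefix, which is the intrinsic property that a prefix-based filter can exploit.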

Solr includes the field type "solr.GeoHashField", but unfortunately it does not take advantage of any of the intrinsic properties of the geohash. You shouldn't use it. Instead, check out http://wiki.apache.org/solr/SpatialSearch#SOLR-2155. Its main feature is multi-valued field support.

Example

<fieldType name="geohash" class="solr.GeoHashField"/>
...
<field name="store_hash" type="geohash" indexed="true" stored="false"/>

Indexing

Indexing is handled by the various FieldType instances in the schema. At the most basic, the user can represent their own spatial data using ints, floats or doubles. Beyond that, the PointType, GeoHashField and LatLonType can be used to index spatial information automatically.

When indexing, the format is something like:

<field name="store_lat_lon">12.34,-123.45</field>

The format can vary based on the number of values. When using a LatLonType or a GeoHashField, it is always latitude, then longitude.
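As an illustration of the "latitude, then longitude" format, the following Python sketch builds an XML add message containing such a field. The helper name make_add_xml and the "id" field are assumptions for this example; only the store_lat_lon field and its "lat,lon" value format come from this page.

```python
import xml.etree.ElementTree as ET


def make_add_xml(doc_id, field_name, lat, lon):
    """Build an <add><doc>...</doc></add> message with one lat,lon field.

    The "id" field is a hypothetical unique key; adjust it to your schema.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be between -90 and 90")
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude must be between -180 and 180")
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    id_field = ET.SubElement(doc, "field", name="id")
    id_field.text = doc_id
    location = ET.SubElement(doc, "field", name=field_name)
    # Latitude first, then longitude, separated by a comma.
    location.text = f"{lat},{lon}"
    return ET.tostring(add, encoding="unicode")


print(make_add_xml("store-1", "store_lat_lon", 12.34, -123.45))
```

Validating the ranges before sending the document catches swapped latitude and longitude values in many (though not all) cases.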

Filtering

There are several different ways to filter in spatial search:

  1. By Range Query, as in fq=lat:[-79.5 TO -23.0] AND lon:[56.3 TO 60.3] (already implemented; note that the lower bound of each range must come first)

  2. By the Spatial Filter QParser (SpatialQParser), e.g. fq={!sfilt fl=location}&pt=49.32,-79.0&d=20

  3. Using the "frange" QParser, as in fq={!frange l=0 u=400}hsin(0.57, -1.3, lat_rad, lon_rad, 3963.205)

In practice, for those using Solr's field types above, the Spatial Filter QParser will automatically make the correct decision about how best to filter. If an application needs a specific type of filtering for performance or other reasons, the best bet is to extend the FieldType in question.
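The third option in the list above keeps only documents whose haversine distance from a fixed point falls within a range. The Python sketch below mimics that logic on the client side to show what the filter computes. It does not mirror Solr's hsin argument order; the function names, the dictionary keys and the inclusive bounds are assumptions for this example. The radius 3963.205 is the value from the example above (the Earth's radius in miles), and all coordinates are in radians.

```python
import math


def haversine(lat1_rad, lon1_rad, lat2_rad, lon2_rad, radius):
    """Great-circle distance between two points given in radians."""
    dlat = lat2_rad - lat1_rad
    dlon = lon2_rad - lon1_rad
    a = (math.sin(dlat / 2.0) ** 2
         + math.cos(lat1_rad) * math.cos(lat2_rad) * math.sin(dlon / 2.0) ** 2)
    # min() guards against rounding pushing the value slightly above 1.
    return 2.0 * radius * math.asin(min(1.0, math.sqrt(a)))


def frange_filter(docs, center_lat_rad, center_lon_rad, radius, lower, upper):
    """Keep documents whose distance from the center is in [lower, upper]."""
    kept = []
    for doc in docs:
        dist = haversine(center_lat_rad, center_lon_rad,
                         doc["lat_rad"], doc["lon_rad"], radius)
        if lower <= dist <= upper:
            kept.append(doc)
    return kept


docs = [
    {"id": "near", "lat_rad": 0.62, "lon_rad": -1.3},
    {"id": "far", "lat_rad": 0.77, "lon_rad": -1.3},
]
hits = frange_filter(docs, 0.57, -1.3, 3963.205, 0, 400)
print([d["id"] for d in hits])
```

Because the distance function is evaluated for every candidate document, this style of filter is typically more expensive than a range or bounding-box filter.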

Spatial Filter QParser

See https://issues.apache.org/jira/browse/SOLR-1568.

NOTE: Depending on the FieldType, different calculations for distance will be applied. For instance, the PointType uses a rectangular coordinate system and uses the Euclidean distance while LatLonType uses Haversine by default.
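For a PointType in n dimensions, the Euclidean distance is the p = 2 case of the general p-norm. The Python sketch below shows that family of distances for p >= 1 and for infinity. It is an illustration rather than Solr's implementation; the function name is an assumption, and the p = 0 case is deliberately left out because its exact definition is not given on this page.

```python
import math


def p_norm_distance(a, b, p):
    """Distance between two n-dimensional points under the p-norm.

    p = 1 is the Manhattan distance, p = 2 the Euclidean distance, and
    p = math.inf the maximum coordinate difference.
    """
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    if p < 1:
        raise ValueError("this sketch only handles p >= 1")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if math.isinf(p):
        return float(max(diffs))
    return float(sum(d ** p for d in diffs) ** (1.0 / p))


print(p_norm_distance((0, 0), (3, 4), 2))
print(p_norm_distance((27.3, 83.9, 10.0, 5.5), (27.3, 83.9, 10.0, 9.5), math.inf))
```

Unlike haversine, these measures treat the coordinates as a flat rectangular space, so they are not appropriate for latitude and longitude over large distances.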

See SpatialFilterTest for examples of the various points.

The following parameters are supported:

| Parameter | Description | Example |
|---|---|---|
| pt | The point to use as the center of the filter. Specified as a comma-separated list of doubles. If using the LatLonType, then it is lat,lon. | &pt=33.4,29.0 or &pt=27.3,83.9,10.0,5.5 |
| d | The distance from the point to the outer edge of whatever is being used to filter on (bounding box, pure distance, something else). Must be greater than or equal to 0. | &d=10.0 |
| sphere_radius | The radius of the sphere to be used when calculating distances on a sphere (i.e. haversine). Default is the Earth's mean radius in kilometers (see org.apache.lucene.spatial.DistanceUtils.EARTH_MEAN_RADIUS_KM), which is set to 6371.009. Most applications will not need to set this. | &sphere_radius=10.3 |
| meas | The distance measure to use when calculating distance. The default is dependent on the FieldType. Supported values are: (1) hsin, the haversine; (2) 0, 1, 2, ... INF for the appropriate p-norm (2 is the Euclidean distance). NOTE: This value is experimental and subject to removal. Most applications will not need to change the measure, since the FieldTypes usually make the proper choice given the data stored. | &meas=hsin |
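The parameters above are plain request parameters, so a client mostly needs to format and URL-encode them. The Python sketch below assembles such a query string. The helper name sfilt_params and the q=*:* main query are assumptions for this example, and the parser name sfilt is simply the one used on this page, which documents features under development.

```python
from urllib.parse import urlencode


def sfilt_params(field, point, distance, sphere_radius=None, meas=None):
    """Build a URL-encoded query string for a spatial filter request.

    point is a sequence of coordinates (lat, lon for LatLonType).
    """
    if distance < 0:
        raise ValueError("d must be greater than or equal to 0")
    params = [
        ("q", "*:*"),  # hypothetical main query matching all documents
        ("fq", "{!sfilt fl=%s}" % field),
        ("pt", ",".join(str(c) for c in point)),
        ("d", str(distance)),
    ]
    # Optional parameters are only sent when explicitly requested.
    if sphere_radius is not None:
        params.append(("sphere_radius", str(sphere_radius)))
    if meas is not None:
        params.append(("meas", str(meas)))
    return urlencode(params)


print(sfilt_params("location", (49.32, -79.0), 20))
```

URL-encoding matters here because the local-params syntax contains characters such as braces, spaces and exclamation marks.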

For LatLonType, the sfilt command calculates a bounding box by calculating the East and West Longitudes and the North and South Latitudes of a box that circumscribes the circle with radius d (using hsin). There are other ways that this can be implemented by overriding the createSpatialQuery method on LatLonType.
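One common way to compute such a circumscribing box is sketched below in Python. It is not necessarily what LatLonType does internally. The function name is an assumption, the default radius is the EARTH_MEAN_RADIUS_KM value quoted above, and the sketch does not normalize longitudes that cross the 180 degree meridian.

```python
import math

EARTH_MEAN_RADIUS_KM = 6371.009


def bounding_box(lat_deg, lon_deg, d_km, radius_km=EARTH_MEAN_RADIUS_KM):
    """Return (south, west, north, east) in degrees for a box that
    circumscribes the circle of radius d_km around the given point."""
    if d_km < 0:
        raise ValueError("d must be greater than or equal to 0")
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    ang = d_km / radius_km  # angular radius of the circle in radians
    south = lat - ang
    north = lat + ang
    if south <= -math.pi / 2.0 or north >= math.pi / 2.0:
        # The circle contains a pole, so every longitude is covered.
        south = max(south, -math.pi / 2.0)
        north = min(north, math.pi / 2.0)
        west, east = -math.pi, math.pi
    else:
        # Meridians converge away from the equator, so the longitude
        # half-width is larger than the latitude half-width.
        dlon = math.asin(math.sin(ang) / math.cos(lat))
        west = lon - dlon
        east = lon + dlon
    return tuple(math.degrees(v) for v in (south, west, north, east))


print(bounding_box(49.32, -79.0, 20))
```

A box like this can be turned into two range queries, one on latitude and one on longitude, with an exact distance check applied afterwards if the corners of the box must be excluded.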

For PointType, the bounding box is calculated by standard rectangular coordinate system measures.

SOLR-2155 GeoHash Plugin

See http://wiki.apache.org/solr/SpatialSearch#SOLR-2155

Sorting

https://issues.apache.org/jira/browse/SOLR-1297 added the ability to sort by function, so sorting by distance is now simply a matter of sorting using the appropriate distance function, just like boosting.

Scoring

Scoring by distance works just like any other FunctionQuery. See the distance methods on the FunctionQuery page for examples and method signatures.

Query Parsing

TODO

See https://issues.apache.org/jira/browse/SOLR-1578 and https://issues.apache.org/jira/browse/SOLR-1568.

Other Caveats

Unless otherwise specified, all distances are in kilometers.

Known Issues

See https://issues.apache.org/jira/browse/SOLR-773 for tracking

Useful References

  1. http://www.movable-type.co.uk/scripts/latlong.html

  2. http://www.ibm.com/developerworks/opensource/library/j-spatial/index.html

  3. http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html

SpatialSearchDev (last edited 2012-06-28 17:41:42 by DavidSmiley)