Differences between revisions 5 and 6
Revision 5 as of 2013-03-11 20:15:15
Size: 3058
Editor: JonChambers
Comment: Minor typo fix.
Revision 6 as of 2014-03-03 18:09:10
Size: 3365
Editor: DavidSmiley
Comment: Old rect syntax is deprecated

If you need to index multi-valued time durations (or any numeric ranges), you may be struggling with how to do this in Solr and search on them properly. It basically wasn't possible until Solr 4's new spatial support came along, which supports multi-valued spatial fields. Usually, "spatial" is nearly synonymous with "geospatial", but it can be used for other purposes like this too.

First, read Chris Hostetter (aka Hossman)'s illustrated slides from a Solr meetup: Spatial Search Tricks for People Who Don't Have Spatial Data.

Configuration

However, don't use the field configuration given in that presentation. The queries also need a few tweaks to avoid edge cases.

Here is an example Solr fieldType configuration that may only require some small changes for your data:

<fieldType name="days_of_year"
           class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="false"
           worldBounds="0 0 365 365"
           distErrPct="0"
           maxDistErr="1"
           units="degrees"
        />

Some explanation:

  • worldBounds: The left two numbers are the starting (epoch) values, usually 0, and the right two are the maximum values, in "minX minY maxX maxY" order. The minimum should be the same in the X and Y dimensions, and so should the maximum. I'm tempted to suggest using a power of two large enough for your max value, particularly if you experience any small precision problems. But I'm unaware of problems, so you don't have to do this. This space-separated syntax is deprecated and will eventually be unsupported in Solr 5, where you'll have to use the ENVELOPE WKT-style syntax (supported in Solr 4.7 and beyond).

  • distErrPct: This use of spatial generally implies precise results, not approximate results. Setting distErrPct to 0 means no approximation.
  • maxDistErr: This is basically the smallest indivisible value. This use of spatial should generally use the integer range to keep things simpler, so use "1".
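
The power-of-two suggestion for worldBounds comes down to a little arithmetic, sketched here in Python. The helper name power_of_two_bounds is made up for illustration and is not part of Solr; it assumes a minimum of 0 and emits the older space-separated "minX minY maxX maxY" form described above.

```python
def power_of_two_bounds(max_value: int) -> str:
    """Return a worldBounds string whose max is the smallest power of two >= max_value.

    Hypothetical helper for illustration only; assumes the minimum is 0.
    """
    if max_value < 1:
        raise ValueError("max_value must be at least 1")
    # (n - 1).bit_length() gives the exponent of the smallest power of two >= n,
    # and leaves exact powers of two unchanged.
    upper = 1 << (max_value - 1).bit_length()
    # Same min and same max in both dimensions, "minX minY maxX maxY" order.
    return f"0 0 {upper} {upper}"


print(power_of_two_bounds(365))  # 0 0 512 512
```

For a days-of-year field this would give 512 instead of 365 as the upper bound. As noted above, there is no known problem with using 365 directly.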

Indexing

Use "x y" (x space y) order for the points:

<doc>
  ...
  <field name="shift">1 3</field>
  ...
</doc>
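
As a rough sketch of how such field values might be generated, the Python below turns (start, end) durations into "x y" field elements. The function name duration_fields is hypothetical, and it assumes that x is the start and y is the end of each duration and that both are integers (in line with maxDistErr="1").

```python
def duration_fields(durations, field_name="shift"):
    """Render (start, end) pairs as Solr <field> elements in "x y" order.

    Illustrative only: assumes x = start, y = end, with integer values.
    """
    lines = []
    for start, end in durations:
        if start > end:
            raise ValueError(f"start {start} is after end {end}")
        # One <field> element per duration; the field is multi-valued.
        lines.append(f'<field name="{field_name}">{start} {end}</field>')
    return lines


for line in duration_fields([(1, 3), (7, 10)]):
    print(line)
```

Each duration becomes its own field value, so a document with several durations simply repeats the field.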

Now for queries, look at the examples on Hossman's slides. However, to avoid edge cases, you should slightly buffer the query shapes: every edge other than those at the minimum or maximum. In addition, the syntax used there is deprecated; use the rectangle range query style instead. One example query given was Intersects(0 9 8 365). In rectangle range query format, this is ["0 9" TO "8 365"]. But we need to buffer it: ["0 8.5" TO "8.5 365"]. The math: the lower Y edge becomes 9 - 0.5 = 8.5 and the upper X edge becomes 8 + 0.5 = 8.5. The 0 and 365 edges already sit at the minimum and maximum, so they stay as they are.
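
The buffering step can be sketched in Python as follows. This is a minimal illustration, not anything Solr provides: the function names are made up, and it assumes that buffering always means growing the rectangle outward by 0.5 on every edge that is not at the world minimum or maximum, as in the single example above.

```python
def _fmt(value):
    """Format a number without a trailing ".0" (8.0 -> "8", 8.5 -> "8.5")."""
    return str(int(value)) if float(value).is_integer() else str(value)


def buffered_range_query(min_x, min_y, max_x, max_y,
                         world_min=0, world_max=365, buffer=0.5):
    """Build a rectangle range query with buffered edges.

    Hypothetical helper: edges equal to world_min or world_max are left
    alone; lower edges move down by `buffer`, upper edges move up by it.
    """
    def adjust(value, delta):
        if value in (world_min, world_max):
            return value
        return value + delta

    low_x, low_y = adjust(min_x, -buffer), adjust(min_y, -buffer)
    high_x, high_y = adjust(max_x, buffer), adjust(max_y, buffer)
    return (f'["{_fmt(low_x)} {_fmt(low_y)}" TO '
            f'"{_fmt(high_x)} {_fmt(high_y)}"]')


print(buffered_range_query(0, 9, 8, 365))  # ["0 8.5" TO "8.5 365"]
```

Because the indexed points are integers, moving an edge by 0.5 never pulls in a new point; it only keeps points from landing exactly on a query edge.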

Limitations

It's not realistic to configure the max values in worldBounds to be a huge number (say Long.MAX_VALUE, which is about 2^63). Maybe a value as high as 2^50 would work?

Credit

The idea of modeling durations as coordinates originated in a solr-user@lucene thread. See David's initial response to Geert-Jan's question and the subsequent follow-up about using different rectangle intersections.

SpatialForTimeDurations (last edited 2014-03-03 18:09:10 by DavidSmiley)