Differences between revisions 14 and 15
Revision 14 as of 2013-06-06 21:38:33
Size: 3821
Editor: HossMan
Comment: add section on limitations
Revision 15 as of 2013-12-10 14:41:29
Size: 4028
Editor: AndyLester
Comment: Fix some little typos

<!> Solr4.0

Introduction

In many cases, documents have relationships between them and it is too expensive to denormalize them. Thus, a join operation is needed. Preserving the document relationship allows documents to be updated independently without having to reindex large numbers of denormalized documents.

Input Parameters

Joins are processed using Solr's LocalParams syntax. The query typically looks like: q={!join from=manu_id_s to=id}ipod

Thus, you need the join QueryParser(Plugin), which is specified by the {!join} syntax. Then, you specify the foreign key relationship by giving the from and to fields to join on.

Examples

In the example data, all documents have a unique "id" field, but documents modeling products also have a "manu_id_s" which is essentially a "foreign key" to the "id" of the associated manufacturer doc.

  • Find all product docs matching "ipod", then join them against (manufacturer) docs and return the list of manufacturers that make those products
      • http://localhost:8983/solr/select?q={!join+from=manu_id_s+to=id}ipod
  • Find all manufacturer docs matching compName_s:Belkin, then join them against (product) docs and return the list of products that reference that manufacturer
      • http://localhost:8983/solr/select?q={!join+from=id+to=manu_id_s}compName_s:Belkin
  • The same join, with a filter query restricting the results to price:[* TO 12]
      • http://localhost:8983/solr/select?q={!join+from=id+to=manu_id_s}compName_s:Belkin&fq=price:%5B%2A+TO+12%5D
  • Query for "ipod", sorted by score, using the join as a filter query so that only products whose manufacturer matches compName_s:(Belkin Apple) are returned
      • http://localhost:8983/solr/select?q=ipod&fl=*,score&sort=score+desc&fq={!join+from=id+to=manu_id_s}compName_s:%28Belkin%20Apple%29
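
A request such as q={!join from=manu_id_s to=id}ipod can also be assembled programmatically, which avoids escaping the braces and spaces by hand. The following is a minimal Python sketch, assuming a local Solr instance at http://localhost:8983/solr/select; the helper names join_query and select_url are hypothetical and not part of Solr.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Assumption: a local Solr instance loaded with the tutorial example data.
SOLR_SELECT = "http://localhost:8983/solr/select"


def join_query(from_field, to_field, inner_query):
    """Return a LocalParams join query: {!join from=... to=...}inner_query."""
    return "{!join from=%s to=%s}%s" % (from_field, to_field, inner_query)


def select_url(q, **extra):
    """Build a /select URL; urlencode escapes braces, spaces and brackets."""
    params = {"q": q}
    params.update(extra)  # e.g. fq, fl, sort
    return SOLR_SELECT + "?" + urlencode(params)


# Manufacturers of products matching "ipod".
url = select_url(join_query("manu_id_s", "id", "ipod"))
print(url)

# Decoding the URL again recovers the original join query unchanged.
decoded_q = parse_qs(urlsplit(url).query)["q"][0]
print(decoded_q)

# Products whose manufacturer matches compName_s:Belkin, filtered by price.
filtered_url = select_url(
    join_query("id", "manu_id_s", "compName_s:Belkin"),
    fq="price:[* TO 12]",
)
print(filtered_url)
```

The resulting URLs are escaped more aggressively than the hand-written ones on this page, but they decode to the same parameters.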

Compared To SQL

For people who are used to SQL, it's important to note that Joins in Solr are not really equivalent to SQL Joins because no information about the table being joined "from" is carried forward into the final result. A more appropriate SQL analogy would be an "inner query".

This Solr request...

/solr/collection1/select ? fl=xxx,yyy & q={!join from=inner_id to=outer_id}zzz:vvv

Is comparable to this SQL statement...

SELECT xxx, yyy
FROM collection1
WHERE outer_id IN (SELECT inner_id FROM collection1 WHERE zzz = 'vvv')
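
To try the inner-query analogy without a Solr instance, the SQL side can be run with Python's sqlite3 module. This is only a sketch: the rows and the "name" column are made up for illustration, and LIKE is a rough stand-in for Solr's term matching. It mirrors q={!join from=manu_id_s to=id}ipod, with manu_id_s as the inner field and id as the outer field.

```python
import sqlite3

# Hypothetical data: three manufacturer docs and three product docs in one table,
# mirroring how both kinds of documents live in a single Solr collection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collection1 (id TEXT, name TEXT, manu_id_s TEXT)")
conn.executemany(
    "INSERT INTO collection1 VALUES (?, ?, ?)",
    [
        ("apple", "Apple Computer Inc.", None),
        ("belkin", "Belkin International", None),
        ("maxtor", "Maxtor Corp.", None),
        ("p1", "ipod video", "apple"),
        ("p2", "ipod car charger", "belkin"),
        ("p3", "hard drive", "maxtor"),
    ],
)

# Outer rows are kept when their id appears among the manu_id_s values
# of the rows matched by the inner query.
rows = conn.execute(
    """
    SELECT id, name
    FROM collection1
    WHERE id IN (SELECT manu_id_s FROM collection1 WHERE name LIKE '%ipod%')
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Only the two manufacturer rows referenced by the "ipod" products come back; nothing from the product rows themselves appears in the result, which is the point of the analogy.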

Limitations

  • Fields or other properties of the documents being joined "from" are not available for use in processing of the resulting set of "to" documents (i.e., you cannot return fields in the "from" documents as if they were a multivalued field on the "to" documents)
  • The Join query produces constant scores for all documents that match -- scores computed by the nested query for the "from" documents are not available to use in scoring the "to" documents
  • In a DistributedSearch environment, you cannot join across cores on multiple nodes. If, however, you have a custom sharding approach, you could join across cores on the same node.
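
The first two limitations can be pictured with a toy in-memory model of the join. This is not how Solr implements joins; the function toy_join, the sample documents, and the score value 1.0 are all assumptions chosen for illustration.

```python
def toy_join(docs, from_field, to_field, matches):
    """Toy model: return the "to" docs whose to_field equals the from_field
    value of any doc accepted by matches()."""
    # Only the join keys survive from the "from" side; their other fields
    # and any scores they may have had are discarded at this point.
    keys = {
        d[from_field]
        for d in docs
        if matches(d) and d.get(from_field) is not None
    }
    # Every matching "to" doc gets the same placeholder score.
    return [{"doc": d, "score": 1.0} for d in docs if d.get(to_field) in keys]


# Hypothetical documents: two manufacturers and three products.
docs = [
    {"id": "apple", "compName_s": "Apple"},
    {"id": "belkin", "compName_s": "Belkin"},
    {"id": "p1", "name": "ipod video", "manu_id_s": "apple"},
    {"id": "p2", "name": "ipod car charger", "manu_id_s": "belkin"},
    {"id": "p3", "name": "usb cable", "manu_id_s": "belkin"},
]

results = toy_join(
    docs, "manu_id_s", "id", lambda d: "ipod" in d.get("name", "")
)
for r in results:
    print(r["doc"]["id"], r["score"])
```

The result holds only manufacturer documents, each with the same score, and carries no trace of which products matched.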

Quick Start

/!\ NOTE: The described additions to the "browse" screen are currently dependent on SOLR-2502

  • Follow the Tutorial at http://lucene.apache.org/solr/tutorial.html to get set up

  • Point your browser at http://localhost:8983/solr/browse?&queryOpts=join

  • Fill in your query and the names of two fields to join on, for example From: manu_id and To: id (join between the manu_id on the products and the id on the manufacturers)
  • Submit -- Notice the results are of the manufacturers who make those items and not of the products themselves even though the match is on the products

Join (last edited 2013-12-10 14:41:29 by AndyLester)