Differences between revisions 1 and 2
Revision 1 as of 2006-07-11 05:07:55
Size: 1734
Editor: YonikSeeley
Comment: a start on design for federated search
Revision 2 as of 2006-07-11 17:26:20
Size: 2175
Editor: YonikSeeley
Comment:

Federated Search Design

Follow the basic Lucene design for MultiSearcher/RemoteSearchable as a template.

Areas that will need change:

  • Solr's caches don't contain enough info to merge search results from subsearchers
    • could subclass DocList and add sort info, and cache that
    • could dynamically add the sort info if requested via the FieldCache... this would keep Solr's result cache smaller than caching sort info with every DocList.
    • probably want to re-use FieldDocSortedHitQueue, which means returning TopFieldDocs, or creating them on the fly from a DocList w/ field info
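
A minimal sketch of why the sort info is needed: the top-level searcher can only interleave hits from subsearchers if each hit carries its sort value. ShardHit, MergeSketch and mergeTopN are hypothetical names, not Solr or Lucene classes, and a single ascending String sort field is assumed for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical types for illustration only; not part of Solr or Lucene.
final class ShardHit {
    final int shard;        // which subsearcher produced the hit
    final int docId;        // doc id local to that subsearcher
    final String sortValue; // sort field value, which a plain DocList does not carry

    ShardHit(int shard, int docId, String sortValue) {
        this.shard = shard;
        this.docId = docId;
        this.sortValue = sortValue;
    }
}

final class MergeSketch {
    /** Merges per-shard lists (each already sorted ascending by sortValue) into the global top n. */
    static List<ShardHit> mergeTopN(List<List<ShardHit>> perShard, int n) {
        // Ties on the sort value are broken by shard, then by local doc id.
        Comparator<ShardHit> order = Comparator
                .comparing((ShardHit h) -> h.sortValue)
                .thenComparingInt(h -> h.shard)
                .thenComparingInt(h -> h.docId);

        // Each queue entry is a cursor: {index into perShard, position in that list}.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
                (a, b) -> order.compare(perShard.get(a[0]).get(a[1]),
                                        perShard.get(b[0]).get(b[1])));
        for (int s = 0; s < perShard.size(); s++) {
            if (!perShard.get(s).isEmpty()) {
                heads.add(new int[] {s, 0});
            }
        }

        List<ShardHit> merged = new ArrayList<>();
        while (merged.size() < n && !heads.isEmpty()) {
            int[] cur = heads.poll();
            List<ShardHit> list = perShard.get(cur[0]);
            merged.add(list.get(cur[1]));
            // Advance the cursor for the shard we just consumed from.
            if (cur[1] + 1 < list.size()) {
                heads.add(new int[] {cur[0], cur[1] + 1});
            }
        }
        return merged;
    }
}
```

Without sortValue on each hit, the merge step would have nothing to compare across subsearchers, which is the gap in the current caches noted above.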

Network Transports

  • RMI
  • XML

Should this be more of a public API, or a private one? For RMI, it should definitely be private...

Misc:

  • optional global idf calculations
  • new style APIs geared toward faceted browsing (avoid instantiating DocSets... pass around symbolic sets)
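
For the global idf item, a rough sketch of what the top-level searcher might compute: sum the per-subsearcher document frequencies and document counts, then apply an idf formula once. ShardTermStats and GlobalIdfSketch are hypothetical names, and the log(numDocs / (docFreq + 1)) + 1 formula is an assumption chosen to resemble Lucene's classic default similarity, not something these notes specify.

```java
import java.util.List;

// Hypothetical holder for per-subsearcher term statistics; not a Solr or Lucene class.
final class ShardTermStats {
    final int docFreq; // documents in this sub-index that contain the term
    final int numDocs; // total documents in this sub-index

    ShardTermStats(int docFreq, int numDocs) {
        this.docFreq = docFreq;
        this.numDocs = numDocs;
    }
}

final class GlobalIdfSketch {
    /** Computes one idf value from statistics summed over all subsearchers. */
    static double globalIdf(List<ShardTermStats> shards) {
        long docFreq = 0;
        long numDocs = 0;
        for (ShardTermStats s : shards) {
            docFreq += s.docFreq;
            numDocs += s.numDocs;
        }
        // Assumed formula: log(numDocs / (docFreq + 1)) + 1.
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }
}
```

A term that is rare in one sub-index but common in another then scores the same everywhere, at the cost of an extra round of statistics gathering, which is presumably why the notes call it optional.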

High Availability

How can High Availability be obtained on the query side?

  • sub-searchers could be identified by VIPs (top-level-searcher would go through a load-balancer to access sub-searchers).
  • could do it in code via HASolrMultiSearcher that takes a list of sub-servers for each
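
A sketch of how the in-code option might behave: given a list of interchangeable sub-servers, try each in turn until one answers. HASolrMultiSearcher is only a name in these notes; FailoverSketch and queryWithFailover are hypothetical, and the remote call is passed in as a function so the sketch does not assume any particular transport.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical helper for illustration; not an existing Solr class.
final class FailoverSketch {
    /** Tries each replica in order and returns the first successful response. */
    static <R> R queryWithFailover(List<String> replicas, Function<String, R> call) {
        RuntimeException last = null;
        for (String replica : replicas) {
            try {
                return call.apply(replica);
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next replica
            }
        }
        // Every replica failed (or the list was empty).
        throw new IllegalStateException("no replica answered", last);
    }
}
```

Compared with the VIP approach, this keeps failover logic in the top-level searcher, so it needs no load balancer but must be told about every sub-server.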

Master

How should the collection be updated? It would be complex for clients to partition the data themselves, since they would have to ensure that a particular document always goes to the same server. Although user partitioning should be possible, there should be an easier default.

Single Master

A single master could partition the data into multiple local indices... each subsearcher would pull only the local index it is configured to have.
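
One possible default, sketched under the assumption that each document has a String unique key: the master hashes the key to pick a local index, so a given document always lands in the same partition. PartitionSketch and partitionFor are hypothetical names; the notes do not specify a partitioning function.

```java
// Hypothetical helper for illustration; not an existing Solr class.
final class PartitionSketch {
    /** Maps a document's unique key to a partition in the range [0, numPartitions). */
    static int partitionFor(String uniqueKey, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(uniqueKey.hashCode(), numPartitions);
    }
}
```

Because String.hashCode is specified by the Java API, the mapping is stable across JVMs. Changing the number of partitions would remap most documents, so that number would have to be fixed up front or the indices rebuilt.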

Commits

How to synchronize commits across subsearchers and top-level-searchers?

FederatedSearch (last edited 2009-09-20 22:05:24 by localhost)