Federated Search Design
Follow the basic Lucene design for MultiSearcher/RemoteSearcher as a template.
SolrMultiSearcher would implement SolrSearchable by delegating to multiple SolrSearchers, and would contain the logic for combining search results from those subsearchers. The implementation should be network friendly (no HitCollectors, avoid passing around DocSets/BitSets, etc.).
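As a rough illustration of the combining step, the sketch below merges per-subsearcher hit lists into a global top-n by score. It assumes each subsearcher returns only a small, already sorted list of (unique key, score) pairs, which keeps the exchange network friendly. The names ShardHit and TopHitsMerger are hypothetical and are not part of Solr or Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical types for illustration only; they do not exist in Solr or Lucene.
final class ShardHit {
    final int shard;    // index of the subsearcher that produced this hit
    final String id;    // unique key of the document (not a per-index Lucene docid)
    final float score;

    ShardHit(int shard, String id, float score) {
        this.shard = shard;
        this.id = id;
        this.score = score;
    }
}

final class TopHitsMerger {
    /**
     * Merges hit lists from several subsearchers into the global top n.
     * Each input list is assumed to be sorted by descending score already.
     */
    static List<ShardHit> merge(List<List<ShardHit>> perShard, int n) {
        // Highest score first; ties are broken by shard index.
        PriorityQueue<ShardHit> queue = new PriorityQueue<>((a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.shard, b.shard);
        });
        for (List<ShardHit> hits : perShard) {
            // Only the first n hits of any one subsearcher can reach the global top n.
            queue.addAll(hits.subList(0, Math.min(n, hits.size())));
        }
        List<ShardHit> merged = new ArrayList<>();
        while (merged.size() < n && !queue.isEmpty()) {
            merged.add(queue.poll());
        }
        return merged;
    }
}
```

Scores from different subsearchers are only directly comparable if they were computed with the same term statistics, which is one motivation for the optional global idf item in the list that follows.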
Areas that will need change:
- Solr's caches don't contain enough info to merge search results from subsearchers
- optional global idf calculations
- new-style APIs geared toward faceted browsing (avoid instantiating DocSets; pass around symbolic sets instead)
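One possible shape for the optional global idf calculation: each subsearcher reports its document frequency for a term and its document count, and the top-level searcher sums them before computing idf. The sketch below uses the classic Lucene DefaultSimilarity formula, idf = ln(numDocs / (docFreq + 1)) + 1. The class GlobalIdf and its method are hypothetical names, not an existing API.

```java
// Hypothetical helper for illustration only; not part of Solr or Lucene.
final class GlobalIdf {
    /**
     * Computes idf for one term from per-subsearcher statistics.
     *
     * @param docFreqs docFreqs[i] = number of documents containing the term in subsearcher i
     * @param maxDocs  maxDocs[i]  = total number of documents in subsearcher i
     */
    static float idf(int[] docFreqs, int[] maxDocs) {
        if (docFreqs.length != maxDocs.length) {
            throw new IllegalArgumentException("one docFreq and one maxDoc per subsearcher required");
        }
        long globalDocFreq = 0;
        long globalNumDocs = 0;
        for (int i = 0; i < docFreqs.length; i++) {
            globalDocFreq += docFreqs[i];
            globalNumDocs += maxDocs[i];
        }
        // Same formula as Lucene's classic DefaultSimilarity, applied to the summed statistics.
        return (float) (Math.log(globalNumDocs / (double) (globalDocFreq + 1)) + 1.0);
    }
}
```

This costs an extra round trip per query to collect the statistics, which is presumably why the calculation is listed as optional.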
How can High Availability be obtained on the query side?
- sub-searchers could be identified by VIPs (top-level-searcher would go through a load-balancer to access sub-searchers).
- failover could instead be done in code, via an HASolrMultiSearcher that takes a list of sub-servers for each sub-searcher
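The in-code option could look roughly like the sketch below: for a given sub-searcher, try each sub-server in its list in order and return the first successful response. FailoverCaller is a hypothetical name, the request is modelled as a plain function, and treating any RuntimeException as "server unavailable" is a simplifying assumption.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical helper for illustration only; not part of Solr.
final class FailoverCaller {
    /**
     * Sends the request to each replica in order and returns the first successful result.
     *
     * @param replicas the equivalent sub-servers that hold the same part of the index
     * @param request  the call to make against one replica; assumed to throw on failure
     */
    static <S, R> R callFirstAvailable(List<S> replicas, Function<S, R> request) {
        RuntimeException last = null;
        for (S replica : replicas) {
            try {
                return request.apply(replica);
            } catch (RuntimeException e) {
                last = e; // treat this replica as unavailable and try the next one
            }
        }
        throw new IllegalStateException("no replica available", last);
    }
}
```

Compared with VIPs and a load balancer, this keeps the replica list in the searcher's own configuration, at the cost of reimplementing health checking and load distribution.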
How should the collection be updated? It would be complex for clients to partition the data themselves, since they would have to ensure that a particular document always goes to the same server. Although user partitioning should be possible, there should be an easier default.
A single master could partition the data into multiple local indexes; each subsearcher would pull only the local index it is configured to have.
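A simple default the master could use is to hash each document's unique key and take the result modulo the number of local indexes, so that a given document always lands in the same partition. This is a minimal sketch under that assumption; HashPartitioner is a hypothetical name, and a real design would also need a plan for changing the number of partitions.

```java
// Hypothetical helper for illustration only; not part of Solr.
final class HashPartitioner {
    /**
     * Maps a document's unique key to a partition number in [0, numPartitions).
     * String.hashCode() is specified by the Java language, so the mapping is
     * stable across JVMs and restarts.
     */
    static int partitionFor(String uniqueKey, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(uniqueKey.hashCode(), numPartitions);
    }
}
```

Because updates and deletes for a key hash to the same partition as the original add, the master never has to look up where a document currently lives.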
How to synchronize commits across subsearchers and top-level-searchers?