Overview

This page describes a proposal for building, out of Apache Blur, a distributed search/indexing platform on which Blur "classic" could be implemented.

Modern open source search platforms typically put Lucene at the very core and implement a monolithic application stack on top of it that handles distributed indexing, searching, failures, features, etc. We suppose here that it would be helpful to introduce an intermediate abstraction that provides the primitives for a distributed Lucene server, on which specific search applications could be built. This document describes an approach for separating those concerns in Blur and re-implementing Blur classic on top of this new platform.

Motivation

We have a nice, incredibly scalable search system, so why such a big change? It's a fair question; here are some thoughts:

Approach

The key to the approach is building a command execution framework and then transitioning the implementation of the Thrift server classes to use that framework. For example, the IndexServer might transition to running generic IndexCommands across its shards. This framework may provide:

This would allow someone to implement new features on top of the platform by implementing some sort of Command class, something like (should be read as pseudo-code really):

The point is that, as a command implementor, you process a BlurIndex (which gives you full access to the IndexReader, IndexSearcher, and IndexWriter), define how the results should be merged together, and define how they should finally be returned.
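
The contract described above can be sketched in plain Java. This is only an illustration under stated assumptions: the shape of BlurCommand (a constructor taking arguments, getArgs, execute, and merge) is inferred from the client example on this page, and ShardIndex, CommandSketch, and the in-memory "documents" are hypothetical stand-ins invented here so the sketch runs without Blur or Lucene. A real command would receive a BlurIndex instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one shard's index. In Blur this role would be
// played by BlurIndex, which exposes Lucene readers, searchers, and writers.
interface ShardIndex {
  String shardName();
  List<String> documents();
}

// Assumed command contract: T is the per-shard result, R the merged result.
abstract class BlurCommand<T, R> {
  private final Object[] args;

  protected BlurCommand(Object... args) {
    this.args = args.clone();
  }

  // Returns the i-th constructor argument, cast to the type the caller expects.
  @SuppressWarnings("unchecked")
  protected <A> A getArgs(int i) {
    return (A) args[i];
  }

  // Runs once against each shard.
  public abstract T execute(ShardIndex index) throws IOException;

  // Combines the per-shard results, keyed by shard name.
  public abstract R merge(Map<String, T> results) throws IOException;
}

final class CommandSketch {
  private CommandSketch() {}

  // Sequential, in-process runner; a real server would fan out across nodes.
  static <T, R> R run(BlurCommand<T, R> command, List<ShardIndex> shards) throws IOException {
    Map<String, T> results = new LinkedHashMap<>();
    for (ShardIndex shard : shards) {
      results.put(shard.shardName(), command.execute(shard));
    }
    return command.merge(results);
  }

  // Builds an in-memory shard whose "documents" are plain strings.
  static ShardIndex shard(String name, String... docs) {
    List<String> list = Arrays.asList(docs);
    return new ShardIndex() {
      @Override public String shardName() { return name; }
      @Override public List<String> documents() { return list; }
    };
  }

  // Counts the documents containing "blur" across two in-memory shards.
  static long demo() {
    List<ShardIndex> shards = Arrays.asList(
        shard("shard-0", "blur search", "lucene index"),
        shard("shard-1", "blur platform", "apache blur", "thrift server"));
    try {
      return run(new BlurCommand<Long, Long>("blur") {
        @Override
        public Long execute(ShardIndex index) {
          String term = getArgs(0);
          long matches = 0;
          for (String doc : index.documents()) {
            if (doc.contains(term)) {
              matches++;
            }
          }
          return matches;
        }

        @Override
        public Long merge(Map<String, Long> results) {
          long total = 0;
          for (long count : results.values()) {
            total += count;
          }
          return total;
        }
      }, shards);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Keeping execute and merge on the same class means the per-shard logic and the way its results combine travel together, which is what lets a new feature be added without touching the server itself.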

Here's a potential client example:

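  import java.io.IOException;
  import java.util.Map;
  import java.util.Map.Entry;
  import java.util.concurrent.atomic.AtomicLong;

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.AtomicReaderContext;
  import org.apache.lucene.queryparser.classic.ParseException;
  import org.apache.lucene.queryparser.classic.QueryParser;
  import org.apache.lucene.search.Collector;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.Scorer;
  import org.apache.lucene.util.Version;

  // BlurCommandServerInterface, BlurCommand, and getClient are part of this
  // proposal and do not exist yet. Imports for the Blur types (BlurIndex,
  // IndexSearcherClosable) are omitted.
  public class CountClientExample {

    public static void main(String[] args) throws IOException {
      BlurCommandServerInterface client = getClient("host:port");
      String query = "field:term";
      // The query string is passed as argument 0 of the command.
      Long result = client.execute(new BlurCommand<Long, Long>(query) {
        // Counts the documents in one index that match the query.
        @Override
        public Long execute(BlurIndex index) throws IOException {
          IndexSearcherClosable searcher = index.getIndexSearcher();
          String queryStr = getArgs(0);
          QueryParser parser = new QueryParser(Version.LUCENE_43, "", new StandardAnalyzer(Version.LUCENE_43));
          Query parsedQuery;
          try {
            parsedQuery = parser.parse(queryStr);
          } catch (ParseException e) {
            throw new IOException(e);
          }
          final AtomicLong count = new AtomicLong();
          searcher.search(parsedQuery, new Collector() {

            // Called once per matching document.
            @Override
            public void collect(int doc) throws IOException {
              count.incrementAndGet();
            }

            @Override
            public void setScorer(Scorer scorer) throws IOException {
              // Scores are not needed for counting.
            }

            @Override
            public void setNextReader(AtomicReaderContext context) throws IOException {
              // Segment boundaries do not affect the count.
            }

            @Override
            public boolean acceptsDocsOutOfOrder() {
              return false;
            }
          });
          return count.get();
        }

        // Sums the individual counts into a single total.
        @Override
        public Long merge(Map<String, Long> results) throws IOException {
          long total = 0;
          for (Entry<String, Long> e : results.entrySet()) {
            total += e.getValue();
          }
          return total;
        }
      });
      System.out.println(result);
    }
  }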
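
For a call like the one above to work, something on the server side has to run the command once per shard and hand the keyed results to the merge step. The following is a minimal sketch of that fan-out using a plain thread pool. ShardFanOut and its run and demo methods are hypothetical names introduced for illustration, the shards are modeled as in-memory lists, and nothing here reflects how Blur's IndexServer actually schedules work.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical fan-out helper; not part of Blur's actual API.
final class ShardFanOut {
  private ShardFanOut() {}

  // Runs perShard once per shard in parallel, then passes the results,
  // keyed by shard name, to merge.
  static <S, T, R> R run(Map<String, S> shards,
                         Function<S, T> perShard,
                         Function<Map<String, T>, R> merge)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
    try {
      // Submit every shard before waiting on any of them.
      Map<String, Future<T>> futures = new LinkedHashMap<>();
      for (Map.Entry<String, S> e : shards.entrySet()) {
        S shard = e.getValue();
        Callable<T> task = () -> perShard.apply(shard);
        futures.put(e.getKey(), pool.submit(task));
      }
      Map<String, T> results = new LinkedHashMap<>();
      for (Map.Entry<String, Future<T>> e : futures.entrySet()) {
        // get() blocks until that shard's task has finished.
        results.put(e.getKey(), e.getValue().get());
      }
      return merge.apply(results);
    } finally {
      pool.shutdown();
    }
  }

  // Totals the document counts of two in-memory shards.
  static int demo() {
    Map<String, List<String>> shards = new LinkedHashMap<>();
    shards.put("shard-0", Arrays.asList("a", "b"));
    shards.put("shard-1", Arrays.asList("c", "d", "e"));

    Function<List<String>, Integer> countDocs = List::size;
    Function<Map<String, Integer>, Integer> sum = results -> {
      int total = 0;
      for (int count : results.values()) {
        total += count;
      }
      return total;
    };

    try {
      return run(shards, countDocs, sum);
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

A failure in any one shard surfaces as an ExecutionException from get(), so the merge step in this sketch only ever sees a complete set of results. Whether partial results should be allowed instead is a design decision this sketch leaves open.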

BlurPlatform (last edited 2014-07-23 02:08:40 by AaronMcCurry)