Overview

This page describes a proposal for extracting from Apache Blur a distributed search/indexing platform on which Blur "classic" could then be re-implemented.

Modern open source search platforms have Lucene at their core, with a monolithic application stack implemented on top of it that handles distributed indexing, searching, failure handling, features, and so on. We suppose here that it would be helpful to introduce an intermediate abstraction that provides the primitives of a distributed Lucene server, on which specific search applications could then be built. This document describes an approach for separating those concerns in Blur and re-implementing Blur classic on top of the new platform.

Motivation

We already have a nice, incredibly scalable search system, so why such a big change? It's a fair question; here are some thoughts:

  • Allow indexing/searching based on other/new data models (e.g. more than just the Row/Record constructs).
  • Allow implementations to build whole new APIs, given direct access to the Lucene primitives.
  • Allow the flexibility to build totally custom applications.

Approach

The key to the approach is to build a command execution framework and then transition the implementation of the Thrift server classes to use it. For example, the IndexServer might transition to being able to run generic IndexCommands across its shards. This framework may provide (a rough interface sketch follows the list below):

  • Command preemption.
  • Command cancellation.
  • Full status information.
  • Distributed traceability.
  • Metering (time, memory, etc.).
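
To make the list above concrete, the execution service such a framework exposes might look roughly like the following. This is only a sketch: CommandExecutionService, CommandHandle, CommandStatus, and CommandMetrics are placeholder names, not existing Blur types.

{{{
// Hypothetical sketch only -- none of these types exist in Blur today.
public interface CommandExecutionService {

  // Submit a command to run across the shards of its table; returns a
  // handle that can be used to track, preempt, or cancel the command.
  <T> CommandHandle<T> submit(IndexCommand<T> command);

  // Cancel (or preempt) a running command.
  void cancel(CommandHandle<?> handle);

  // Full status information: queued/running/finished, per-shard progress.
  CommandStatus status(CommandHandle<?> handle);

  // Metering: elapsed time, memory used, documents visited, etc.
  CommandMetrics metrics(CommandHandle<?> handle);
}
}}}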

This would allow someone to implement new features on top of the platform by implementing some sort of Command class, along these lines (this should really be read as pseudo-code):

{{{
public abstract class IndexCommand<T> {
  private String _table;

  // Runs once against each shard's index and returns that shard's partial result.
  public abstract T process(BlurIndex index);

  // Folds one shard's partial result into the command's running state.
  public abstract void merge(T partial);

  // Produces the final, merged response returned to the caller.
  public abstract IndexResponse<T> terminate();

  public String getTable() {
    return _table;
  }
}
}}}

The point is that, as a command implementor, you process a BlurIndex (which gives you full access to the IndexReader/Searcher/Writer), define how partial results are merged together, and define how the final result is returned.
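
For instance, a command that counts the documents in a table might be implemented against the contract above roughly as follows. This is a sketch only: the IndexResponse constructor and the exact BlurIndex/searcher signatures are assumptions, and resource cleanup is omitted.

{{{
// Sketch of a concrete command against the pseudo-code contract above.
// IndexResponse and the exact BlurIndex/searcher signatures are assumptions.
public class DocCountCommand extends IndexCommand<Long> {

  private long _total = 0;

  @Override
  public Long process(BlurIndex index) {
    // Runs once per shard: report how many documents that shard holds.
    try {
      IndexSearcherClosable searcher = index.getIndexSearcher();
      return (long) searcher.getIndexReader().numDocs();
    } catch (IOException e) {
      // Assumes getIndexSearcher() may throw IOException, as in the client example below.
      throw new RuntimeException(e);
    }
  }

  @Override
  public void merge(Long partial) {
    // Called with each shard's partial result; keep a running total.
    _total += partial;
  }

  @Override
  public IndexResponse<Long> terminate() {
    // Return the merged answer to the caller.
    return new IndexResponse<Long>(_total);
  }
}
}}}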

Here's a potential client example:

{{{
  public static void main(String[] args) {
    // Obtain a client for the cluster (getClient is a placeholder in this example).
    BlurCommandServerInterface client = getClient("host:port");
    String query = "field:term";
    // Ship the command to the cluster: execute() runs once per shard,
    // and merge() combines the per-shard counts into one total.
    Long result = client.execute(new BlurCommand<Long, Long>(query) {
      @Override
      public Long execute(BlurIndex index) throws IOException {
        IndexSearcherClosable searcher = index.getIndexSearcher();
        // Parse the query string that was passed to the command.
        String queryStr = getArgs(0);
        QueryParser parser = new QueryParser(Version.LUCENE_43, "", new StandardAnalyzer(Version.LUCENE_43));
        Query query;
        try {
          query = parser.parse(queryStr);
        } catch (ParseException e) {
          throw new IOException(e);
        }
        // Count every matching document on this shard; scores are not needed.
        final AtomicLong count = new AtomicLong();
        searcher.search(query, new Collector() {

          @Override
          public void collect(int doc) throws IOException {
            count.incrementAndGet();
          }

          @Override
          public void setScorer(Scorer scorer) throws IOException {
            // No-op: scoring information is not needed for counting.
          }

          @Override
          public void setNextReader(AtomicReaderContext context) throws IOException {
            // No-op: no per-segment state is needed.
          }

          @Override
          public boolean acceptsDocsOutOfOrder() {
            return false;
          }
        });
        return count.get();
      }

      @Override
      public Long merge(Map<String, Long> results) throws IOException {
        // Sum the per-shard counts into a single total.
        long total = 0;
        for (Entry<String, Long> e : results.entrySet()) {
          total += e.getValue();
        }
        return total;
      }
    });
    System.out.println(result);
  }
}}}
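
For completeness, a server might drive such a command roughly as follows. This is a sketch under the assumptions that the first type parameter of BlurCommand is the per-shard result type, the second is the merged result type, and the map key identifies the shard; runCommand and the shard map are placeholders, not existing Blur APIs.

{{{
  // Hypothetical sketch of how a server might run a BlurCommand:
  // execute() is invoked once per local shard, and the per-shard
  // results (keyed by shard id) are handed to merge().
  public static <T, R> R runCommand(BlurCommand<T, R> command,
      Map<String, BlurIndex> shards) throws IOException {
    Map<String, T> resultsByShard = new HashMap<String, T>();
    for (Entry<String, BlurIndex> shard : shards.entrySet()) {
      resultsByShard.put(shard.getKey(), command.execute(shard.getValue()));
    }
    return command.merge(resultsByShard);
  }
}}}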
