Cascading is an alternative API to Hadoop MapReduce. Under the covers it executes as MapReduce jobs, but during development users don't have to think in terms of MapReduce to build applications that run on Hadoop.

Cascading now supports reading data from and writing data to an HBase cluster.

Detailed information and access to the source code can be found on the Cascading Modules page. Cascading 1.0.1 is required.

Here is a simple example showing how to "sink" data into an HBase cluster. The same "hBaseTap" instance can also be used to "source" data (as shown in the unit tests). See the GitHub repository, linked from the Modules page, for the most current API.

    import cascading.flow.Flow;
    import cascading.flow.FlowConnector;
    import cascading.hbase.HBaseScheme;
    import cascading.hbase.HBaseTap;
    import cascading.operation.regex.RegexSplitter;
    import cascading.pipe.Each;
    import cascading.pipe.Pipe;
    import cascading.scheme.TextLine;
    import cascading.tap.Hfs;
    import cascading.tap.SinkMode;
    import cascading.tap.Tap;
    import cascading.tuple.Fields;
    import cascading.tuple.TupleEntryIterator;

    // "inputFileLhs" (path to a space-delimited text file) and "properties" (the Map
    // passed to FlowConnector) are assumed to be defined by the enclosing code

    // read data from the default filesystem
    // emits two fields: "offset" and "line"
    Tap source = new Hfs( new TextLine(), inputFileLhs );

    // store data in an HBase cluster
    // accepts fields "num", "lower", and "upper"
    // will automatically scope incoming fields to their proper family name, "left" or "right"
    Fields keyFields = new Fields( "num" );
    String[] familyNames = {"left", "right"};
    Fields[] valueFields = new Fields[] {new Fields( "lower" ), new Fields( "upper" )};
    Tap hBaseTap = new HBaseTap( "multitable", new HBaseScheme( keyFields, familyNames, valueFields ), SinkMode.REPLACE );

    // a simple pipe assembly that splits each line on a space into three fields
    // a real application would likely chain multiple Pipes together for more complex processing
    Pipe parsePipe = new Each( "insert", new Fields( "line" ), new RegexSplitter( new Fields( "num", "lower", "upper" ), " " ) );

    // "plan" a cluster-executable Flow
    // this connects the source Tap and hBaseTap (the sink Tap) to the parsePipe
    Flow parseFlow = new FlowConnector( properties ).connect( source, hBaseTap, parsePipe );

    // start the flow and block until it completes
    parseFlow.complete();

    // open an iterator on the HBase table we just wrote to (may throw IOException)
    TupleEntryIterator iterator = parseFlow.openSink();

    try
      {
      while( iterator.hasNext() )
        {
        // print out each tuple read back from HBase
        System.out.println( "iterator.next() = " + iterator.next() );
        }
      }
    finally
      {
      // always release the underlying HBase scanner
      iterator.close();
      }
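The comment on the HBaseTap line says incoming fields are scoped to their family names. As a conceptual sketch only, assuming the scheme uses "num" as the row key and writes each value field to a column named `family:field` (HBase's family:qualifier convention), the plain-Java program below models what happens to a single input line such as `1 a A`. It does not touch HBase or Cascading; the class and method names (`FieldScopingSketch`, `parse`, `toColumns`) are hypothetical and are not part of the Cascading or cascading.hbase API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, stand-alone model of the example above. This is NOT the
 * cascading.hbase implementation; it assumes value fields are written to
 * "family:field" columns and that "num" is the row key.
 */
public class FieldScopingSketch
  {
  // mirrors the arguments passed to HBaseScheme in the example above
  public static final String KEY_FIELD = "num";
  public static final String[] FAMILY_NAMES = {"left", "right"};
  public static final String[][] VALUE_FIELDS = {{"lower"}, {"upper"}};

  /** Splits a line on single spaces (like the " " pattern given to RegexSplitter) into named fields. */
  public static Map<String, String> parse( String line, List<String> fieldNames )
    {
    String[] parts = line.split( " " );

    if( parts.length != fieldNames.size() )
      throw new IllegalArgumentException( "expected " + fieldNames.size() + " fields: " + line );

    Map<String, String> tuple = new LinkedHashMap<>();

    for( int i = 0; i < parts.length; i++ )
      tuple.put( fieldNames.get( i ), parts[ i ] );

    return tuple;
    }

  /** Maps each value field to a "family:field" column name, in family order. */
  public static Map<String, String> toColumns( Map<String, String> tuple )
    {
    Map<String, String> columns = new LinkedHashMap<>();

    for( int f = 0; f < FAMILY_NAMES.length; f++ )
      for( String field : VALUE_FIELDS[ f ] )
        columns.put( FAMILY_NAMES[ f ] + ":" + field, tuple.get( field ) );

    return columns;
    }

  public static void main( String[] args )
    {
    Map<String, String> tuple = parse( "1 a A", Arrays.asList( "num", "lower", "upper" ) );

    System.out.println( "row key = " + tuple.get( KEY_FIELD ) );
    System.out.println( "columns = " + toColumns( tuple ) );
    }
  }
```

Running `main` prints `row key = 1` followed by `columns = {left:lower=a, right:upper=A}`. A `LinkedHashMap` is used so the printed column order follows the order of `familyNames` in the original example.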

Because the "hBaseTap" above can act as both a sink and a source in a Flow, another Flow could be created that reads the data stored in HBase and processes it further.

Hbase/Cascading (last edited 2009-09-20 23:54:47 by localhost)