Differences between revisions 3 and 4
Revision 3 as of 2012-07-30 16:44:14
Size: 1062
Comment: update to point to BulkLoader
Revision 4 as of 2013-11-13 00:35:12
Size: 1123
Editor: 50
Comment: statcounter


Binary Memtable is the name of Cassandra's pre-1.0 bulk-load interface. It was deprecated in version 0.8 and removed entirely in 1.0; the BulkLoader tool replaces it.

It avoided several kinds of overhead associated with the normal Thrift API:

  • Converting to Thrift from the internal structures and back
  • Routing (copying) from a coordinator node to the replica nodes
  • Writing to the commitlog
  • Serializing the internal structures to on-disk format

The tradeoff was that it was considerably less convenient to use than Thrift:

  • You had to use the StorageProxy API, which was only available from Java code
  • You had to pre-serialize the rows yourself
  • The rows you sent were not live for querying until a flush occurred (either normally, because the Binary Memtable filled up, or because you requested one with nodetool)
  • You had to write an entire row at once
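
To give a rough feel for what pre-serializing an entire row on the client side involves, here is a minimal sketch in plain Java. It is not Cassandra's actual serialization format and does not call any Cassandra API: the class name RowPreSerializer, the method serializeRow, and the byte layout (length-prefixed key, column count, length-prefixed names and values) are all hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: NOT Cassandra's real row format.
public class RowPreSerializer {

    // Packs a row key and ALL of its columns into one byte array,
    // mirroring the "entire row at once" constraint described above.
    static byte[] serializeRow(String rowKey, Map<String, String> columns) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
            out.writeShort(key.length);      // 2-byte key length
            out.write(key);                  // key bytes
            out.writeInt(columns.size());    // 4-byte column count
            for (Map.Entry<String, String> column : columns.entrySet()) {
                byte[] name = column.getKey().getBytes(StandardCharsets.UTF_8);
                byte[] value = column.getValue().getBytes(StandardCharsets.UTF_8);
                out.writeShort(name.length); // 2-byte column-name length
                out.write(name);
                out.writeInt(value.length);  // 4-byte column-value length
                out.write(value);
            }
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked if they do.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("name", "alice");
        columns.put("email", "alice@example.com");
        byte[] row = serializeRow("user42", columns);
        System.out.println("Serialized row is " + row.length + " bytes");
    }
}
```

The point of the sketch is that the client, not the server, decides the exact bytes of the whole row before anything is sent, so a partial update to a single column cannot be expressed without rebuilding and resending the full row.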

An example of using Hadoop to load data through the Binary Memtable interface is available at https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.8/examples/bmt/


BinaryMemtable (last edited 2013-11-13 00:35:12 by 50)