Binary Memtable is the name of Cassandra's pre-1.0 bulk-load interface. It was deprecated in version 0.8 and removed entirely in 1.0. It was replaced by the BulkLoader tool.
It avoided several kinds of overhead associated with the normal Thrift API:
- Converting to Thrift from the internal structures and back
- Routing (copying) from a coordinator node to the replica nodes
- Writing to the commitlog
- Serializing the internal structures to on-disk format
The tradeoff you make is that it is considerably less convenient to use than Thrift:
- You must use the StorageProxy API, which is only available from Java code
- You must pre-serialize the rows yourself
- The rows you send are not live for querying until a flush occurs (either normally, because the Binary Memtable fills up, or because you request one with nodetool)
- You must write an entire row at once
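The constraints above can be pictured with a small toy model. This is only a sketch of the behaviour described in the list, not Cassandra code: `BufferedRowStore`, `writeRow`, `read`, and `flush` are hypothetical names invented for illustration, and the real interface went through StorageProxy and Cassandra's own row serialization. The model accepts whole, already-serialized rows and keeps them invisible to reads until a flush is requested.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Toy model only; none of these names are Cassandra classes. */
public class BufferedRowStore {
    // Rows that have been accepted but not flushed; reads cannot see them.
    private final Map<String, byte[]> pending = new HashMap<>();
    // Rows that became readable after a flush.
    private final Map<String, byte[]> live = new HashMap<>();

    /** Accepts one complete row that the caller has already serialized. */
    public void writeRow(String key, byte[] serializedRow) {
        // This toy simply keeps the latest copy written for a key.
        pending.put(key, serializedRow.clone());
    }

    /** Returns the row only if it has been flushed, otherwise null. */
    public byte[] read(String key) {
        byte[] row = live.get(key);
        return row == null ? null : row.clone();
    }

    /** Makes all pending rows readable and returns how many were flushed. */
    public int flush() {
        int count = pending.size();
        live.putAll(pending);
        pending.clear();
        return count;
    }

    public static void main(String[] args) {
        BufferedRowStore store = new BufferedRowStore();
        // The caller serializes the entire row up front (format is made up here).
        byte[] row = "name=alice;age=30".getBytes(StandardCharsets.UTF_8);
        store.writeRow("user1", row);
        System.out.println("visible before flush: " + (store.read("user1") != null));
        store.flush();
        System.out.println("visible after flush: " + (store.read("user1") != null));
    }
}
```

In this model the flush is an explicit call; with the Binary Memtable it happened either when the memtable filled up or when one was requested through nodetool.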
There is an example of using Hadoop to load data through the Binary Memtable interface at https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.8/examples/bmt/ .