This is an overview of Cassandra architecture aimed at Cassandra users.

Developers should probably look at the Developers links on the wiki's front page

Information is mainly based on J Ellis OSCON 09 presentation


Scaling reads to a relational database is hard Scaling writes to a relational database is virtually impossible

... and when you do, it usually isn't relational anymore

* The new face of data

Scale out, not up Online load balancing, cluster growth Flexible schema Key-oriented queries CAP-aware

Pick two of Consistency, Availability, Partition tolerance

Two famous papers

Two approaches

10,000 ft summary

Cassandra highlights

Dynamo architecture & Lookup

Architecture details

O(1) node lookup Explicit replication Eventually consistent

Architecture layers Messaging service Gossip Failure detection Cluster state Partitioner Replication Commit log Memtable SSTable Indexes Compaction Tombstones Hinted handoff Read repair Bootstrap Monitoring Admin tools


Any node Partitioner Commitlog, memtable SSTable Compaction Wait for W responses

Memtable / SSTable

Disk Commit log

SSTable format

Key / data

SSTable Indexes

Bloom filter Key Column

(Similar to Hadoop MapFile / Tfile)


Merge keys Combine columns Discard tombstones


Deletion marker (tombstone) necessary to suppress data in older SSTables, until compaction Read repair complicates things a little Eventually consistent complicates things more Solution: configurable delay before tombstone GC, after which tombstones are not repaired

Cassandra write properties

No reads No seeks Fast Atomic within ColumnFamily Always writable

Read path

Any node Partitioner Wait for R responses Wait for N ­ R responses in the background and perform read repair

Cassandra read properties

Read multiple SSTables Slower than writes (but still fast) Seeks can be mitigated with more RAM Scales to billions of rows

Consistency in a BASE world

If W + R > N, you will have consistency W=1, R=N W=N, R=1 W=Q, R=Q where Q = N / 2 + 1

vs MySQL with 50GB of data


~300ms write ~350ms read ~0.12ms write ~15ms read