This is an overview of Cassandra architecture aimed at Cassandra users.

Developers should probably look at the Developers links on the wiki's front page.

This information is mainly based on J. Ellis's OSCON 09 presentation.


Scaling reads to a relational database is hard.

Scaling writes to a relational database is virtually impossible...

... and when you do, it usually isn't relational anymore.

Why Cassandra

The new face of data

CAP theorem

The CAP theorem (Brewer00) states that you have to pick two of Consistency, Availability, and Partition tolerance. You can't have all three at the same time.

Cassandra values Availability and Partition tolerance (AP). The tradeoff between consistency and latency is tunable in Cassandra: you can get strong consistency, at the cost of increased latency. You can't get row locking, however; that is a definite win for HBase.

Note: HBase values Consistency and Partition tolerance (CP).

History and approaches

Two famous papers

Two approaches

Cassandra 10,000 ft summary

Cassandra highlights

p2p distribution model -- which drives the consistency model -- means there is no single point of failure.

Dynamo architecture & Lookup

Architecture details

Architecture layers

Core Layer: messaging service, gossip, failure detection, cluster state

Middle Layer: commit log

Top Layer: hinted handoff, read repair, admin tools


Any node
Partitioner
Commit log, memtable
SSTable
Compaction
Wait for W responses
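The commit log, memtable, and SSTable steps named above can be illustrated with a toy model. This is only a sketch under stated assumptions: `LocalStore`, `memtable_limit`, and the flush-on-size rule are hypothetical names and simplifications chosen for illustration, not Cassandra's actual classes or thresholds.

```python
class LocalStore:
    """Toy single-node write path: commit log -> memtable -> SSTable.

    All names here are hypothetical; real Cassandra is far more involved.
    """

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory view of recent writes
        self.sstables = []     # immutable, sorted snapshots of flushed memtables
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # Record the write in the log first, then apply it in memory.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable table and start a new one.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}


store = LocalStore(memtable_limit=2)
store.write("b", 1)
store.write("a", 2)   # reaches the limit and triggers a flush
```

Because flushed tables are never modified in place, merging them later (compaction) is the only point at which old data is rewritten in this model.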

Write model:

There are two write modes:

If a node is down, the write goes to another node along with a hint saying where it should have been written to. Every 15 minutes a harvester goes through, finds the hints, and moves the data to the appropriate node.
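A minimal sketch of the hinting idea described above, assuming a toy in-memory cluster. `Node`, `write`, and `deliver_hints` are hypothetical names introduced for illustration; the periodic scheduling is left out, and `deliver_hints` stands in for what the harvester would do on each pass.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)
    hints: list = field(default_factory=list)  # (target_name, key, value)


def write(key, value, target, fallback):
    """Write to target, or leave a hint on fallback if target is down."""
    if target.up:
        target.data[key] = value
    else:
        fallback.hints.append((target.name, key, value))


def deliver_hints(holder, cluster):
    """Replay hints held by `holder` to any targets that are back up."""
    remaining = []
    for target_name, key, value in holder.hints:
        target = cluster[target_name]
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target_name, key, value))  # try again later
    holder.hints = remaining


a, b = Node("a"), Node("b", up=False)
cluster = {"a": a, "b": b}
write("k", "v", target=b, fallback=a)  # b is down, so a keeps a hint
b.up = True
deliver_hints(a, cluster)              # the harvester pass moves the data to b
```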

Write path

At write time,

Write properties


A deletion marker (tombstone) is necessary to suppress data in older SSTables until compaction. Read repair complicates things a little, and eventual consistency complicates things more. The solution is a configurable delay before tombstone GC, after which tombstones are not repaired.
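The role of the tombstone can be sketched as follows, assuming each SSTable is modelled as a plain dict of key to (timestamp, value). `TOMBSTONE`, `read`, `compact`, and `gc_grace` are hypothetical names used only for this illustration.

```python
TOMBSTONE = object()  # sentinel marking a deleted value


def read(key, sstables):
    """Return the newest value for key, or None if absent or deleted."""
    newest = None
    for table in sstables:
        if key in table and (newest is None or table[key][0] > newest[0]):
            newest = table[key]
    if newest is None or newest[1] is TOMBSTONE:
        return None
    return newest[1]


def compact(sstables, now, gc_grace):
    """Merge tables, keeping the newest entry per key.

    Tombstones are dropped only once they are at least gc_grace old.
    """
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return {
        key: (ts, value)
        for key, (ts, value) in merged.items()
        if not (value is TOMBSTONE and now - ts >= gc_grace)
    }


old = {"k": (1, "v")}         # older SSTable still holds the value
new = {"k": (5, TOMBSTONE)}   # newer SSTable holds the deletion marker
```

Without the tombstone in `new`, a read would find the value in `old` and the delete would appear to be undone; the grace period gives replicas that missed the delete time to learn about it before the marker disappears.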


Read path

Cassandra read properties

Consistency in a BASE world

Let R = read replica count, W = write replica count, N = replication factor, and Q = QUORUM (Q = N / 2 + 1, using integer division).

Cassandra provides consistency when R + W > N (read replica count + write replica count > replication factor).
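As a worked example of the rule above, the following sketch computes Q and checks R + W > N for a few combinations. The function names are hypothetical and chosen only for this illustration.

```python
def quorum(n: int) -> int:
    """Q = N / 2 + 1, with integer division."""
    return n // 2 + 1


def is_consistent(r: int, w: int, n: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


N = 3
Q = quorum(N)
print(Q)                        # 2
print(is_consistent(Q, Q, N))   # True: quorum reads plus quorum writes
print(is_consistent(1, 1, N))   # False: a read may miss the latest write
print(is_consistent(1, N, N))   # True: write to all replicas, read from one
```

With N = 3, reading and writing at QUORUM gives 2 + 2 > 3, so at least one replica in every read has seen the latest write.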

Cassandra vs MySQL with 50GB of data

MySQL: ~300ms write, ~350ms read

Cassandra: ~0.12ms write, ~15ms read