Cassandra is a partitioned row store, where rows are organized into tables with a required primary key.

The first component of a table's primary key is the partition key; within a partition, rows are clustered by the remaining columns of the primary key. Other columns may be indexed independently of the primary key.
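The relationship between the partition key and the clustering columns can be sketched with a toy in-memory model. This is only an illustration, not how Cassandra stores data: the `ToyTable` class, the `posts` table, and its `author`, `posted_at`, and `title` columns are hypothetical names chosen for the example.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Toy stand-in for a table; an illustration, not Cassandra's storage engine."""

    def __init__(self, partition_key, clustering_columns):
        self.partition_key = partition_key
        self.clustering_columns = tuple(clustering_columns)
        # partition key value -> list of (clustering tuple, row), kept sorted
        self.partitions = defaultdict(list)

    def insert(self, row):
        pk = row[self.partition_key]
        ck = tuple(row[c] for c in self.clustering_columns)
        partition = self.partitions[pk]
        keys = [k for k, _ in partition]
        i = bisect.bisect_left(keys, ck)
        if i < len(partition) and partition[i][0] == ck:
            # Same full primary key: the new row replaces the old one.
            partition[i] = (ck, row)
        else:
            # New clustering value: insert at the position that keeps order.
            partition.insert(i, (ck, row))

    def read_partition(self, pk):
        """Return the rows of one partition in clustering order."""
        return [row for _, row in self.partitions.get(pk, [])]


# Hypothetical table: partition key "author", clustering column "posted_at".
posts = ToyTable("author", ["posted_at"])
posts.insert({"author": "alice", "posted_at": 2, "title": "second"})
posts.insert({"author": "alice", "posted_at": 1, "title": "first"})
posts.insert({"author": "bob", "posted_at": 1, "title": "hello"})

titles = [r["title"] for r in posts.read_partition("alice")]
print(titles)  # ['first', 'second']
```

Rows for "alice" come back ordered by `posted_at` regardless of insertion order, and reading them touches a single partition, which is why queries are usually designed around the partition key.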

This structure allows pervasive denormalization to "pre-build" result sets at update time, rather than doing expensive joins across the cluster.
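As a rough sketch of what pre-building result sets at update time means, the snippet below writes each record into one structure per query, so a read is a single lookup with no join. It uses plain Python dictionaries rather than a Cassandra driver, and `posts_by_author`, `posts_by_tag`, and `publish_post` are hypothetical names invented for the example.

```python
from collections import defaultdict

# Hypothetical query tables, each keyed by the partition key its query needs.
posts_by_author = defaultdict(list)
posts_by_tag = defaultdict(list)


def publish_post(author, title, tags):
    """Write the same post into every table that a query will read from."""
    post = {"author": author, "title": title, "tags": tuple(tags)}
    posts_by_author[author].append(post)
    for tag in tags:
        # Duplicate the post under each tag instead of joining at read time.
        posts_by_tag[tag].append(post)


publish_post("alice", "Modeling time series", ["cassandra", "modeling"])
publish_post("bob", "Tuning compaction", ["cassandra"])

# Each query is answered from one pre-built result set.
cassandra_titles = [p["title"] for p in posts_by_tag["cassandra"]]
print(cassandra_titles)  # ['Modeling time series', 'Tuning compaction']
```

The cost of this design is extra writes and duplicated data; the benefit is that each read is served from one partition.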

A blog post by committer Tyler Hobbs gives the Basic Rules of Cassandra Data Modeling.

DataStax has a free, self-paced online course, DS220: Data Modeling with Apache Cassandra.

A blog post by Sebastian Estevez, Using the Cassandra Data Modeler to Stress and Size Cassandra Instances, describes a web-based tool he created that helps visualize data models (pre-3.0) and gives a tailored cassandra-stress configuration file to test the model.

Patrick McFadin's data modeling series:

  1. The Data Model is Dead; Long live the Data Model: Video, Slides

  2. Become a Super Modeler: Video, Slides

  3. The World's Next Top Data Model: Video, Slides

  4. Apache Cassandra 2.0: Data Model on Fire: Video, Slides

  5. Real Data Models of Silicon Valley: Video, Slides


DataModel (last edited 2016-05-03 21:36:12 by jeremyhanna)