Differences between revisions 15 and 16
Revision 15 as of 2011-06-28 05:49:32
Size: 4150
Editor: TimSmith
Comment: typo
Revision 16 as of 2013-11-15 18:41:58
Size: 4211
Editor: GehrigKunz
Comment: statcounter
Deletions are marked like this. Additions are marked like this.
Line 93: Line 93:

{{https://c.statcounter.com/9397521/0/fe557aad/1/|stats}}

Cassandra Use Cases

This summary of a mailing-list survey briefly describes how several organizations (Rackspace, Cisco, OneSpot, more) are using Cassandra: more detail in this mailing-list thread.

The below gives simple use patterns and example implementations in high-level code.

If you've got more simple examples along the lines of those below, please add them.


Twissandra, a Twitter clone using Cassandra

Available at twissandra.com.

A Simple Capped Log

Please help complete

* Adapt e.g. this redis implementation to Cassandra * This mailing list thread gives an overview for building a production-grade windowed time-series store in Cassandra.

Please help complete

A distributed Priority Job Queue

Please help complete

Use Cassandra to enqueue jobs with a priority and optional delay. At each request, the broker assigns the ready job with highest priority.

Consistent Vote Counting

From a conversation on the #cassandra IRC channel, here's a way to implement Consistent Vote Counting using Cassandra that doesn't depend on vector clocks or an atomic increment operation.

Uniq a large dataset using simple key-value columns

We have to batch-process a massive dataset with frequent duplicates that we'd like to skip.

Here is ruby code using Cassandra as a simple key-value store to skip duplicates. You can find a real working version in the Wukong example code -- it's used to batch process terabyte-scale data on a 30 machine cluster using Hadoop and Cassandra.

    class CassandraConditionalOutputter
      CASSANDRA_KEYSPACE = 'Foo'

      # Batch parse a raw stream into parsed objects. The parsed objects may have
      # many duplicates which we'd like to reject
      #
      # records respond to #key (only one record for the given key will be output)
      # and #timestamp (which can be say '0' if record has no meaningful timestamp)
      def process raw_records
        raw_records.parse do |record|
          if should_emit?(record)
            track! record
            puts   record
          end
        end
      end

      # Emit if record's key isn't already in the key column
      def should_emit? record
        key_cache.exists?(key_column, record.key)
      end

      # register key in the key_cache
      def track! record
        key_cache.insert(key_column, record.key, 't' => record.timestamp)
      end

      # nuke key from the key_cache
      def remove record
        key_cache.remove(key_column, record.key)
      end

      # The Cassandra keyspace for key lookup
      def key_cache
        @key_cache ||= Cassandra.new(CASSANDRA_KEYSPACE)
      end

      # Name the key column after class
      def key_column
        self.class.to_s+'Keys'
      end
    end

Simple time-series with roll-ups

Cloudkick implements time-series down at the second-level with roll-ups.

An implementation of some DBMS rules written in python using pycassa

We have created a DBMS layer that handles references to other columnfamilys (foreign keys), Automatic reverse linking. required fields in columnfamilys and datatypes (long and datetime). It wraps the get, get_range, insert, remove functions of pycassas columnfamilys. At this time it is limited to: on delete cascade and positive long numbers but this could change if there is enough interest. It suits our project.

ThomasBoose/dbms implementation

Based on this article

ThomasBoose/EERD model components to Cassandra Column family's

stats

UseCases (last edited 2013-11-15 18:41:58 by GehrigKunz)