Introduction

Cassandra is an advanced topic, and while work is always underway to make things easier, it can still be daunting to get up and running for the first time. This document aims to provide a few easy-to-follow steps that take the first-time user from installation to an operational Cassandra cluster.

Step 1: Picking a version

At any given time, there are a number of different versions available to install:

  1. Stable releases
    • Cassandra stable releases are well tested and reasonably free of serious problems (or at least the problems are known and well documented). If you are setting up a production environment, a stable release is what you want.

      Download links for the latest stable release can always be found on the website.

  2. Betas and release candidates
    • Betas are prototype releases considered ready for user testing, and release candidates have the potential to become the next stable release. These releases represent the state of the art, so they are often the best place to start, and since APIs and on-disk storage formats can change between major versions, starting here can also save you from an upgrade. Testing and feedback are also highly appreciated.
  3. Nightly builds
    • Nightly builds represent the current state of development as of the time of the build. They contain all of the previous day's new features, fixes, and newly introduced bugs. The only guarantee they come with is that they build successfully and the unit tests pass. Nightly builds are a handy way of testing recent changes, or of accessing the latest features and fixes not found in betas or release candidates, but there is some risk of them being buggy.

      The most recent nightly build can be downloaded here.

  4. Subversion
    • Cassandra's Subversion repository is where all active development takes place. Anyone interested in contributing to the project should use a checkout of trunk. If you do run from Subversion, be sure to update frequently, and subscribe to the mailing list to stay abreast of the latest developments.

      Instructions for checking out the source code can always be found on the website.

Step 2: Running a single node

Cassandra is meant to run on a cluster of nodes, but will run equally well on a single machine. This is a handy way of getting familiar with the software while avoiding the complexities of a larger system.

Since there isn't currently an installation method per se, the easiest solution is to simply run Cassandra from an extracted archive [1] or SVN checkout (see Step 1: Picking a version). Also, unless you've downloaded a binary distribution, you'll need to compile the software by invoking ant from the top-level directory.

The distribution's sample configuration, conf/storage-conf.xml, contains reasonable defaults for single-node operation, but you will need to make sure that the paths exist for CommitLogDirectory, DataFileDirectories, CalloutLocation, BootstrapFileDirectory, and StagingFileDirectory. Additionally, take a minute now to look over the logging configuration in conf/log4j.properties and make sure that the directories exist for the configured log file(s) as well.
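The path check described above can be scripted. The following is a minimal sketch in Python, not part of the Cassandra distribution: the function names are hypothetical, and it assumes that each listed element either holds a path as its text or (as a container such as DataFileDirectories might) holds one child element per path.

```python
import os
import xml.etree.ElementTree as ET

# Element names taken from the paragraph above.
PATH_ELEMENTS = [
    "CommitLogDirectory",
    "DataFileDirectories",
    "CalloutLocation",
    "BootstrapFileDirectory",
    "StagingFileDirectory",
]


def configured_paths(xml_text):
    """Return the directory paths configured under PATH_ELEMENTS."""
    root = ET.fromstring(xml_text)
    paths = []
    for name in PATH_ELEMENTS:
        for element in root.iter(name):
            children = list(element)
            # Assumption: a container element holds one child per directory;
            # an element without children holds the path as its own text.
            candidates = children if children else [element]
            for candidate in candidates:
                text = (candidate.text or "").strip()
                if text:
                    paths.append(text)
    return paths


def ensure_directories(paths):
    """Create any missing directories and return the ones that were created."""
    created = []
    for path in paths:
        if not os.path.isdir(path):
            os.makedirs(path)
            created.append(path)
    return created
```

Reading conf/storage-conf.xml from the top-level directory, passing its contents to configured_paths, and handing the result to ensure_directories would create whatever is missing. The account running the script needs write permission on the parent directories.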

And now for the moment of truth: start up Cassandra by invoking bin/cassandra -f from the command line [2]. The service should start in the foreground and log copiously to standard output. Assuming you don't see messages with scary words like "error" or "fatal", or anything that looks like a Java stack trace, chances are you've succeeded. To be certain, though, take some time to try out the examples in CassandraCli and ThriftInterface before moving on. And if you run into problems, Don't Panic: calmly proceed to If Something Goes Wrong.
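If the startup output scrolls by too quickly to read, a small filter can pick out the lines worth a second look. This is only a sketch: the function name is hypothetical, and the patterns are rough guesses at what "scary" output looks like (the words named above, Java exception class names, and indented "at ..." stack frames), not an official list of Cassandra messages.

```python
import re

# The words called out in the paragraph above, matched case-insensitively.
SUSPICIOUS_WORDS = re.compile(r"\b(error|fatal)\b", re.IGNORECASE)
# The usual shape of a Java stack frame: leading whitespace, "at", a method.
STACK_FRAME = re.compile(r"^\s+at\s+[\w.$<>]+\(")
# A class name ending in Exception or Error, e.g. java.io.IOException.
EXCEPTION_NAME = re.compile(r"\b[\w.]+(Exception|Error)\b")


def suspicious_lines(lines):
    """Return (line_number, text) pairs that deserve a closer look."""
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if (SUSPICIOUS_WORDS.search(text)
                or STACK_FRAME.search(text)
                or EXCEPTION_NAME.search(text)):
            hits.append((number, text))
    return hits
```

The function accepts any iterable of lines, so it can be fed an open log file or captured console output. An empty result is encouraging but not proof of success; the CassandraCli and ThriftInterface examples remain the real test.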

Step 3: Running a cluster

Setting up a Cassandra cluster is almost as simple as repeating Step 2 for each node in your cluster. There are a few minor exceptions though.

Cassandra nodes exchange information about one another using a mechanism called Gossip, but to get the ball rolling a newly started node needs to know of at least one other node; this node is called a Seed. It's customary to pick a small number of relatively stable nodes to serve as your seeds, but there is no hard-and-fast rule here. Do make sure that each seed also knows of at least one other seed. Remember, the goal is to avoid a chicken-and-egg scenario and provide an avenue for all nodes in the cluster to discover one another.
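To see why the seed lists matter, it can help to treat them as a graph and check that every node is reachable from every other. The sketch below is a deliberately simplified model, not a description of how Gossip is implemented: it assumes that once a node contacts a seed, each side learns about the other, so every seed entry counts as a two-way link. The function name and the node names in the usage note are hypothetical.

```python
from collections import deque


def undiscovered_nodes(seeds_by_node):
    """Return the nodes that cannot be reached from the first node.

    seeds_by_node maps each node address to the list of seed addresses
    that node is configured with.
    """
    neighbours = {node: set() for node in seeds_by_node}
    for node, seeds in seeds_by_node.items():
        for seed in seeds:
            if seed == node:
                continue  # a node that lists only itself learns nothing new
            neighbours[node].add(seed)
            neighbours.setdefault(seed, set()).add(node)

    if not neighbours:
        return set()

    # Breadth-first search outward from an arbitrary starting node.
    start = next(iter(neighbours))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in neighbours[current] - seen:
            seen.add(other)
            queue.append(other)
    return set(neighbours) - seen
```

For example, with seeds "a" and "b" that each list only themselves and a third node "c" that lists "a", the call undiscovered_nodes({"a": ["a"], "b": ["b"], "c": ["a"]}) returns {"b"}: nothing ever introduces "b" to the others, which is exactly the chicken-and-egg scenario to avoid.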

In addition to seeds, you'll also need to configure the IP interfaces to listen on for Gossip and Thrift (ListenAddress and ThriftAddress respectively). Use a ListenAddress that will be reachable from the ListenAddress used on all other nodes, and a ThriftAddress that will be accessible to clients.

Once everything is configured and the nodes are running, use the bin/nodeprobe utility to verify a properly connected cluster. For example:

eevans@achilles:~$ bin/nodeprobe -host 98.139.220.175 cluster
98.139.220.175:7001   up
98.139.169.152:7001   up
98.139.220.176:7001   up
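When a cluster grows beyond a handful of nodes, it may be easier to check this output with a script than by eye. The sketch below assumes the output keeps the shape shown in the sample (one "host:port status" pair per line) and treats any status other than "up" as a problem; only "up" appears in the sample, so other status words are an assumption. The function names are hypothetical.

```python
def parse_cluster_output(text):
    """Map 'host:port' to its status for lines shaped like the sample above."""
    status = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only lines of the form "<host>:<port> <status>"; the shell
        # prompt and blank lines are skipped.
        if len(fields) == 2 and ":" in fields[0]:
            status[fields[0]] = fields[1]
    return status


def nodes_not_up(text):
    """Return the sorted addresses whose reported status is not 'up'."""
    statuses = parse_cluster_output(text)
    return sorted(node for node, state in statuses.items() if state != "up")
```

Run against the sample output above, nodes_not_up returns an empty list. It is also worth comparing the number of parsed entries with the number of nodes you started, since a node that never joined will simply be absent.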

Step 4: Write your application

Cassandra uses Thrift for its external client-facing API. Thrift supports a wide variety of languages, so you can code your application to use Thrift directly, or use a high-level client where available. Be sure to read the documentation on the Thrift wiki, and check out the Cassandra-specific examples in ClientExamples before getting started.

If Something Goes Wrong

If you followed the steps in this guide and failed to get up and running, we'd love to help. Here's what we need.

  1. If you are running anything other than a stable release, please upgrade first and see if you can still reproduce the problem.
  2. Make sure debug logging is enabled (hint: conf/log4j.properties) and save a copy of the output.

  3. Search the mailing list archive and see if anyone has reported a similar problem and what resolution, if any, they received.

  4. Ditto for the bug tracking system.

  5. See if you can put together a unit test, script, or application that reproduces the problem.

Finally, post a message with all relevant details to the list (subscription required), or hop onto IRC (network irc.freenode.net, channel #cassandra) and let us know.
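For step 2 above, one quick way to confirm that debug logging is on is to read the root logger level out of conf/log4j.properties. The sketch below relies on the standard log4j 1.x convention that the level is the first comma-separated value of the log4j.rootLogger property; whether your copy of the file sets the level that way is an assumption, and the function name is hypothetical.

```python
def root_logger_level(properties_text):
    """Return the level set on log4j.rootLogger, or None if it is not set."""
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines and properties-file comments.
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")
        if separator and key.strip() == "log4j.rootLogger":
            # The value has the form "LEVEL, appender1, appender2, ...".
            return value.split(",")[0].strip().upper()
    return None
```

If the function returns anything other than "DEBUG" for the contents of your conf/log4j.properties, change the level and restart before capturing output to send along.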




Footnotes:

  1. Users of Debian or Debian-based derivatives can install the latest stable release in package form; see DebianPackaging for details.

  2. To learn more about controlling the behavior of startup scripts, see RunningCassandra.

GettingStarted (last edited 2009-09-26 21:09:40 by EricEvans)