
Cassandra documentation from DataStax

DataStax's latest Cassandra documentation covers topics from installation to troubleshooting, including a Quick Start Guide. Documentation for older releases is also available.


This document provides a few easy-to-follow steps that take the first-time user from installation, to running a single-node Cassandra, to an overview of configuring a multinode cluster. Cassandra is meant to run on a cluster of nodes, but will run equally well on a single machine. This is a handy way of getting familiar with the software while avoiding the complexities of a larger system.

Step 0: Prerequisites and Connecting to the Community

Cassandra requires the most stable version of Java 1.6 you can deploy, preferably the Oracle/Sun JVM. Cassandra also runs on the IBM JVM, and should run on JRockit as well.

  • Note for OS X users:

    Some people running OS X have trouble getting Java 6 to work. If you've kept up with Apple's updates, Java 6 should already be installed (it comes in Mac OS X 10.5 Update 1). Unfortunately, Apple does not default to using it. What you have to do is change your JAVA_HOME environment setting to /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home and add /System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin to the beginning of your PATH.
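
    As a rough sketch of the change described above, the following lines could be added to a shell startup file. Which file to use (for example ~/.profile) is an assumption; the paths are the ones given in the note.

```shell
# Point JAVA_HOME at the Java 6 install named in the note above
JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
# Put its bin directory at the beginning of PATH so it wins over the default
PATH="$JAVA_HOME/bin:$PATH"
export JAVA_HOME PATH
```

    Running 'java -version' in a new shell afterwards should show which JVM is picked up.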

The best way to ensure you always have up-to-date information on the project, releases, stability, bugs, and features is to subscribe to the users mailing list (subscription required) and participate in the #cassandra channel on IRC.

Step 1: Download Cassandra

  • Download links for the latest stable release can always be found on the website.

  • Users of Debian or Debian-based derivatives can install the latest stable release in package form; see DebianPackaging for details.

  • Users of RPM-based distributions can get packages from Datastax.

  • If you are interested in building Cassandra from source, please refer to the How to Build page.

For more details about other builds, please refer to the Cassandra versions and builds page.

Step 2: Basic Configuration

The Cassandra configuration files can be found in the conf directory of binary and source distributions. If you have installed Cassandra from a deb or rpm package, the configuration files will be located in /etc/cassandra.

Step 2.1: Directories Used by Cassandra

If you've installed Cassandra with a deb or rpm package, the directories that Cassandra will use should already be created and have the correct permissions. Otherwise, you will want to check the following config settings from conf/cassandra.yaml: data_file_directories (/var/lib/cassandra/data), commitlog_directory (/var/lib/cassandra/commitlog), and saved_caches_directory (/var/lib/cassandra/saved_caches). Make sure these directories exist and can be written to.
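
One way to check this is a small shell helper like the sketch below. The function name check_writable_dirs is made up for this example; the directories in the usage comment are the defaults listed above, so adjust them if your cassandra.yaml differs.

```shell
# check_writable_dirs DIR...
# Prints one status line per directory and returns 0 only if every
# directory exists and is writable by the current user.
# (Hypothetical helper, not part of the Cassandra distribution.)
check_writable_dirs() {
    rc=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            rc=1
        elif [ ! -w "$dir" ]; then
            echo "not writable: $dir"
            rc=1
        else
            echo "ok: $dir"
        fi
    done
    return "$rc"
}

# Example usage, run as the user that will start Cassandra:
#   check_writable_dirs /var/lib/cassandra/data \
#       /var/lib/cassandra/commitlog /var/lib/cassandra/saved_caches
```

Run the check as the same user that will start Cassandra, since writability depends on who is asking.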

By default, Cassandra will write its logs in /var/log/cassandra/. Make sure this directory exists and is writable, or change this line in conf/log4j-server.properties:

log4j.appender.R.File=/var/log/cassandra/system.log

JVM-level settings such as heap size can be set in conf/cassandra-env.sh.
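
For instance, the heap can be fixed by setting the two variables below in conf/cassandra-env.sh. This is only a sketch: the values are illustrative assumptions, not recommendations, and should be sized for your machine.

```shell
# Illustrative values only (assumptions, not tuning advice).
# Set both variables together, or leave both unset so the script
# can calculate defaults.
MAX_HEAP_SIZE="1G"
HEAP_NEWSIZE="200M"
```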

Step 3: Start Cassandra

And now for the moment of truth: start up Cassandra by invoking 'bin/cassandra -f' from the command line (see note 1 at the end of this page). The service should start in the foreground and log gratuitously to the console. Assuming you don't see messages with scary words like "error" or "fatal", or anything that looks like a Java stack trace, everything should be working.

Press "Control-C" to stop Cassandra.

If you start up Cassandra without the "-f" option, it will run in the background. You can stop the process by killing it, using 'pkill -f CassandraDaemon', for example.

  • Users of recent Linux distributions and Mac OS X Snow Leopard should be able to start up Cassandra simply by untarring and invoking bin/cassandra -f with root privileges. Snow Leopard ships with Java 1.6.0 and does not require changing the JAVA_HOME environment variable or adding any directory to your PATH. On Linux just make sure you have a working Java JDK package installed such as the openjdk-6-jdk on Ubuntu Lucid Lynx.
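
If the console output scrolls by too quickly, it can be saved to a file and scanned for the warning signs mentioned in this step. The helper below is a minimal sketch: the function name and the log file name are made up for this example, and the pattern is deliberately crude, so a match means "take a closer look" rather than "something is broken".

```shell
# has_startup_problems FILE
# Returns 0 if FILE contains "error", "fatal", or "exception" (any case),
# or an indented line starting with "at ", as in a Java stack trace.
# (Hypothetical helper; the pattern is a rough heuristic.)
has_startup_problems() {
    grep -E -i -q 'error|fatal|exception|^[[:space:]]+at ' "$1"
}

# Example usage:
#   bin/cassandra -f 2>&1 | tee startup.log      # in one terminal
#   has_startup_problems startup.log && echo "check startup.log"
```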

Step 4: Using cqlsh

bin/cqlsh is an interactive command line interface for Cassandra. You can define the schema and interact with data using it. Run the following command to connect to your local Cassandra instance:

$ bin/cqlsh

You should see the following prompt, if successful:

Connected to Test Cluster at localhost:9160.
[cqlsh 2.3.0 | Cassandra 1.2.2 | CQL spec 3.0.0 | Thrift protocol 19.35.0]
Use HELP for help.

For clarity, we will omit the cqlsh prompt in the following examples.

You can access the online help with the 'help;' command. Commands are terminated with a semicolon (';') in cqlsh.

First, create a keyspace -- a namespace of tables.

CREATE KEYSPACE mykeyspace
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

Second, switch to the new keyspace:

USE mykeyspace;

Third, create a users table:

CREATE TABLE users (
  user_id int PRIMARY KEY,
  fname text,
  lname text
);

Now you can store data into users:

INSERT INTO users (user_id,  fname, lname)
  VALUES (1745, 'john', 'smith');
INSERT INTO users (user_id,  fname, lname)
  VALUES (1744, 'john', 'doe');
INSERT INTO users (user_id,  fname, lname)
  VALUES (1746, 'john', 'smith');

Now let's fetch the data you inserted:

SELECT * FROM users;

You should see output reflecting your new rows:

 user_id | fname | lname
---------+-------+-------
    1745 |  john | smith
    1744 |  john |   doe
    1746 |  john | smith

You can retrieve data about users whose last name is smith by creating an index, then querying the table as follows:

CREATE INDEX ON users (lname);

SELECT * FROM users WHERE lname = 'smith';

 user_id | fname | lname
---------+-------+-------
    1745 |  john | smith
    1746 |  john | smith

Configuring Multinode Clusters

Now you have a single working Cassandra node, which is a Cassandra cluster with only one node. By adding more nodes, you can make it a multinode cluster.

Setting up a Cassandra cluster is almost as simple as repeating the above procedures for each node in your cluster. There are a few minor exceptions though.

Cassandra nodes exchange information about one another using a mechanism called Gossip, but to get the ball rolling a newly started node needs to know of at least one other node; this is called a Seed. It's customary to pick a small number of relatively stable nodes to serve as your seeds, but there is no hard-and-fast rule here. Do make sure that each seed also knows of at least one other. Remember, the goal is to avoid a chicken-and-egg scenario and provide an avenue for all nodes in the cluster to discover one another.

In addition to seeds, you'll also need to configure the IP interfaces to listen on for Gossip and CQL (listen_address and rpc_address respectively). Use a listen_address that will be reachable from the listen_address used on all other nodes, and an rpc_address that will be accessible to clients.
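
Before starting each node, it can help to print these settings and compare them across machines. The helper below is a sketch that assumes the layout of the stock conf/cassandra.yaml, where listen_address and rpc_address start at the beginning of a line and the seed list appears as an indented "- seeds:" entry; the function name is made up for this example.

```shell
# show_cluster_settings FILE
# Prints the seeds, listen_address, and rpc_address lines from a
# cassandra.yaml-style file. Commented-out settings are not matched.
# (Hypothetical helper; assumes the stock file layout.)
show_cluster_settings() {
    grep -E '^(listen_address|rpc_address):|^[[:space:]]*-[[:space:]]*seeds:' "$1"
}

# Example usage:
#   show_cluster_settings conf/cassandra.yaml
```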

Once everything is configured and the nodes are running, use the bin/nodetool status utility to verify a properly connected cluster. For example:

eevans@achilles:~$ bin/nodetool -host -p 7199 status
Datacenter: datacenter1
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns   Host ID                               Rack
UN  30.99 KB   256     32.4%  92b20e08-9ddd-4f55-9173-8516e74d27f5  rack1
UN  31 KB      256     31.5%  b9616658-c744-48fb-b64f-83f96b007d93  rack1
UN  30.96 KB   256     36.1%  f7a08973-85bd-460f-8176-d6f9df8c23f4  rack1
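
In this output, UN at the start of a row means the node is Up and in the Normal state. For a quick scripted check, the rows can be counted with a helper like the sketch below; the function name is made up for this example, and it assumes the status column is the first field of each node row, as in the output above.

```shell
# count_up_normal
# Reads "nodetool status" output on stdin and prints how many node rows
# begin with UN (Up/Normal). Prints 0 if there are none.
# (Hypothetical helper, not part of nodetool.)
count_up_normal() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Example usage:
#   bin/nodetool status | count_up_normal
```

Comparing the printed count with the number of nodes you started is a simple way to spot a node that has not joined.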

Advanced cluster management is described in Operations.

If you don't yet have access to hardware for a real Cassandra cluster, you can manage local clusters easily with ccm (Cassandra Cluster Manager).

For more details about configuring a multinode cluster, please refer to MultinodeCluster.

Write your application

Review the resources on DataModeling. The full CQL documentation is here.

DataStax sponsors development of the CQL drivers at https://github.com/datastax. The full list of CQL drivers is on the ClientOptions page.

If Something Goes Wrong

If you followed the steps in this guide and failed to get up and running, we'd love to help. Here's what we need.

  1. If you are running anything other than a stable release, please upgrade first and see if you can still reproduce the problem.
  2. Make sure debug logging is enabled (hint: conf/log4j.properties) and save a copy of the output.

  3. Search the mailing list archive and see if anyone has reported a similar problem and what resolution, if any, they received.

  4. Ditto for the bug tracking system.

  5. See if you can put together a unit test, script, or application that reproduces the problem.

Finally, post a message with all relevant details to the list (subscription required), or hop onto IRC (network irc.freenode.net, channel #cassandra) and let us know.


  1. To learn more about controlling the behavior of startup scripts, see RunningCassandra.

GettingStarted (last edited 2016-08-10 22:57:22 by JonathanEllis)