Trying Cassandra in a VM

Try Cassandra with these ten-minute developer and admin walkthroughs.

Installing Cassandra Locally

This document aims to provide a few easy-to-follow steps that take the first-time user from installation to running a single-node Cassandra instance, followed by an overview of configuring a multinode cluster. Cassandra is meant to run on a cluster of nodes, but it runs equally well on a single machine, which is a handy way to get familiar with the software while avoiding the complexities of a larger system.

Step 0: Prerequisites and Connecting to the Community

Cassandra 3.0+ requires Java 8; deploy the most stable Java 8 release you can, preferably the Oracle/Sun JVM. Cassandra also runs on OpenJDK, Zing, and the IBM JVM. (It will NOT run on JRockit, which is only compatible with Java 6.)

The best way to ensure you always have up to date information on the project, releases, stability, bugs, and features is to subscribe to the users mailing list (subscription required) and participate in the #cassandra channel on IRC.

Step 1: Download Cassandra

For more details about other builds, please refer to the Cassandra versions and builds page.

Step 2: Basic Configuration

The Cassandra configuration files can be found in the conf directory of binary and source distributions. If you have installed Cassandra from a deb or rpm package, the configuration files will be located in /etc/cassandra.

Step 2.1: Directories Used by Cassandra

If you've installed Cassandra with a deb or rpm package, the directories that Cassandra will use should already be created and have the correct permissions. Otherwise, check the following settings in conf/cassandra.yaml: data_file_directories (/var/lib/cassandra/data), commitlog_directory (/var/lib/cassandra/commitlog), and saved_caches_directory (/var/lib/cassandra/saved_caches). Make sure these directories exist and are writable.
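If you did not install from a package, a small script can do this check for you. The sketch below is a hypothetical helper (check_dirs is not part of Cassandra) that assumes the default paths listed above; adjust them if your conf/cassandra.yaml points elsewhere.

```python
import os
from pathlib import Path

# The default paths named above; treat them as assumptions if your
# conf/cassandra.yaml overrides them.
DEFAULT_DIRS = [
    "/var/lib/cassandra/data",          # data_file_directories
    "/var/lib/cassandra/commitlog",     # commitlog_directory
    "/var/lib/cassandra/saved_caches",  # saved_caches_directory
]


def check_dirs(paths, create=False):
    """Return (path, problem) pairs for directories Cassandra could not use.

    Hypothetical helper: an empty list means every directory exists and is
    writable by the current user. With create=True, missing directories
    (and their parents) are created first.
    """
    problems = []
    for path in map(Path, paths):
        if not path.is_dir():
            if not create:
                problems.append((str(path), "missing"))
                continue
            try:
                path.mkdir(parents=True, exist_ok=True)
            except OSError as exc:
                problems.append((str(path), f"cannot create: {exc}"))
                continue
        # os.access checks the current user's permissions, so run this as
        # the user that will start Cassandra.
        if not os.access(path, os.W_OK):
            problems.append((str(path), "not writable"))
    return problems


if __name__ == "__main__":
    for path, problem in check_dirs(DEFAULT_DIRS):
        print(f"{path}: {problem}")
```

The same check applies to the log directory discussed next; just add /var/log/cassandra to the list.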

By default, Cassandra writes its logs to /var/log/cassandra/. Make sure this directory exists and is writable, or change this line in conf/log4j-server.properties:


Note that Cassandra 2.1+ uses logback instead, so change the logging directory in conf/logback.xml, for example:


JVM-level settings such as heap size can be set in conf/

Step 3: Start Cassandra

And now for the moment of truth: start up Cassandra by invoking 'bin/cassandra -f' from the command line (see note 1). The service should start in the foreground and log gratuitously to the console. Assuming you don't see messages with scary words like "error" or "fatal", or anything that looks like a Java stack trace, everything should be working.

Press "Control-C" to stop Cassandra.

If you start up Cassandra without the "-f" option, it will run in the background. You can stop the process by killing it, using 'pkill -f CassandraDaemon', for example.
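When Cassandra runs in the background you won't see the console, but the same rule of thumb can be applied to its log file. The sketch below is hypothetical (suspicious_lines and scan_log are illustrative names, and the patterns are heuristics rather than an official list of Cassandra failure messages): it flags lines containing "error" or "fatal" and lines shaped like Java stack-trace output.

```python
import re

# Heuristics mirroring the advice above; they are assumptions, not an
# exhaustive list of Cassandra failure messages.
SCARY_WORD = re.compile(r"\b(error|fatal)\b", re.IGNORECASE)
STACK_FRAME = re.compile(r"^\s+at \S+\(.*\)\s*$")  # e.g. "\tat pkg.Class.method(File.java:42)"
CAUSED_BY = re.compile(r"^Caused by: ")


def suspicious_lines(lines):
    """Return the lines that mention error/fatal or look like a Java stack trace."""
    return [
        line
        for line in lines
        if SCARY_WORD.search(line) or STACK_FRAME.match(line) or CAUSED_BY.match(line)
    ]


def scan_log(path):
    """Print suspicious lines from a log file (for example system.log in the log directory)."""
    with open(path) as log:
        for line in suspicious_lines(log):
            print(line, end="")
```

An empty result is not proof that the node is healthy; it only means none of these simple patterns matched.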

Step 4: Using cqlsh

bin/cqlsh is an interactive command line interface for Cassandra. cqlsh allows you to execute CQL (Cassandra Query Language) statements against Cassandra. Using CQL, you can define a schema, insert data, and execute queries. Run the following command to connect to your local Cassandra instance with cqlsh:

$ bin/cqlsh

If the connection succeeds, you should see a prompt like the following (the version numbers will vary with your installation):

Connected to Test Cluster at localhost:9160.
[cqlsh 2.3.0 | Cassandra 1.2.2 | CQL spec 3.0.0 | Thrift protocol 19.35.0]
Use HELP for help.

For clarity, we will omit the cqlsh prompt in the following examples.

You can access the online help with the 'help;' command. Commands are terminated with a semicolon (';') in cqlsh.

First, create a keyspace -- a namespace of tables.

CREATE KEYSPACE mykeyspace
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

Second, switch to the new keyspace:

USE mykeyspace;

Third, create a users table:

CREATE TABLE users (
  user_id int PRIMARY KEY,
  fname text,
  lname text
);

Now you can store data in users:

INSERT INTO users (user_id,  fname, lname)
  VALUES (1745, 'john', 'smith');
INSERT INTO users (user_id,  fname, lname)
  VALUES (1744, 'john', 'doe');
INSERT INTO users (user_id,  fname, lname)
  VALUES (1746, 'john', 'smith');

Now let's fetch the data you inserted:

SELECT * FROM users;

You should see output reflecting your new rows:

 user_id | fname | lname
---------+-------+-------
    1745 |  john | smith
    1744 |  john |   doe
    1746 |  john | smith

You can retrieve data about users whose last name is smith by creating an index, then querying the table as follows:

CREATE INDEX ON users (lname);

SELECT * FROM users WHERE lname = 'smith';

 user_id | fname | lname
---------+-------+-------
    1745 |  john | smith
    1746 |  john | smith

Write your Application

To connect to Cassandra, you'll need a database driver for your language of choice. A full list of CQL drivers can be found on the ClientOptions page.
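As a sketch of what application code can look like, the example below uses the DataStax Python driver (installed with pip install cassandra-driver) to repeat the last-name query from Step 4. It assumes a node listening on 127.0.0.1 with the driver's default port, and the mykeyspace.users table and lname index created above; BY_LNAME, fetch_users_by_lname, and main are illustrative names, not part of the driver.

```python
# Query from Step 4, with a bind marker in place of the literal last name.
BY_LNAME = "SELECT user_id, fname, lname FROM users WHERE lname = ?"


def fetch_users_by_lname(session, stmt, lname):
    """Return (user_id, fname, lname) tuples for users with the given last name.

    Relies on the secondary index on users (lname) created in Step 4.
    """
    return [(row.user_id, row.fname, row.lname) for row in session.execute(stmt, [lname])]


def main():
    # Imported here so the helper above can be reused without the driver installed.
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])  # assumed contact point: the local node
    try:
        session = cluster.connect("mykeyspace")  # keyspace from the cqlsh walkthrough
        stmt = session.prepare(BY_LNAME)  # prepare once, reuse for every lookup
        for lname in ("smith", "doe"):
            print(lname, fetch_users_by_lname(session, stmt, lname))
    finally:
        cluster.shutdown()

# Call main() with a Cassandra node running locally; it is not invoked automatically.
```

Preparing the statement once lets the server parse the query a single time, while each execute call only sends the bound value.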

When deciding how to design your schema and lay out your data, it will be helpful to review the DataModel resources.

You may also want to read the full CQL documentation.

Configuring Multinode Clusters

Now you have a single working Cassandra node, which is simply a cluster with only one node. By adding more nodes, you can turn it into a multinode cluster.

Setting up a Cassandra cluster is almost as simple as repeating the above procedures for each node in your cluster. There are a few minor exceptions though.

Cassandra nodes exchange information about one another using a mechanism called Gossip. To get the ball rolling, however, a newly started node needs to know of at least one other node; this node is called a Seed. It's customary to pick a small number of relatively stable nodes to serve as your seeds, but there is no hard-and-fast rule here. Do make sure that each seed also knows of at least one other node. Remember, the goal is to avoid a chicken-and-egg scenario and to provide an avenue for all nodes in the cluster to discover one another.
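The chicken-and-egg concern can be checked on paper before any node is started. The sketch below is a hypothetical planning helper, not part of Cassandra and not its actual gossip algorithm: it treats each configured seed as a two-way link (once two nodes have gossiped, each knows the other), then checks that every node lists at least one seed other than itself and that every node is reachable. The addresses in the test are made up.

```python
from collections import deque


def seed_problems(seed_lists):
    """Check a planned seed configuration.

    seed_lists maps each node address to the seeds configured on that node.
    Returns human-readable problems; an empty list means the plan looks OK.
    This is a simplified model of discovery, not Cassandra's implementation.
    """
    problems = []
    graph = {node: set() for node in seed_lists}
    for node, seeds in seed_lists.items():
        others = [s for s in seeds if s != node]
        if not others:
            problems.append(f"{node} does not list any seed other than itself")
        for seed in others:
            if seed not in graph:
                problems.append(f"{node} lists unknown seed {seed}")
                continue
            # Contact is two-way: after gossiping, both sides know each other.
            graph[node].add(seed)
            graph[seed].add(node)

    # Breadth-first search from one node: every node should be reachable.
    if graph:
        start = next(iter(graph))
        seen = {start}
        queue = deque([start])
        while queue:
            for peer in graph[queue.popleft()]:
                if peer not in seen:
                    seen.add(peer)
                    queue.append(peer)
        for node in sorted(set(graph) - seen):
            problems.append(f"{node} cannot discover the rest of the cluster")
    return problems
```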

In addition to seeds, you'll also need to configure the IP interface to listen on for Gossip and CQL (listen_address and rpc_address, respectively). Use a listen_address that is reachable from the listen_address of every other node, and an rpc_address that is accessible to clients.
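In practice every node gets the same seed list but its own addresses. The sketch below is a hypothetical helper that renders just these settings as a conf/cassandra.yaml fragment in the layout the stock file uses for seed_provider; the 10.0.0.x addresses are made up, and the output is meant to be merged into each node's existing file, not to replace it.

```python
def node_config(listen_address, rpc_address, seeds):
    """Render the seed and address settings for one node as a cassandra.yaml fragment.

    Hypothetical helper: merge the output into that node's existing
    conf/cassandra.yaml instead of replacing the whole file.
    """
    seed_list = ",".join(seeds)
    return (
        "seed_provider:\n"
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider\n"
        "      parameters:\n"
        f'          - seeds: "{seed_list}"\n'
        f"listen_address: {listen_address}\n"
        f"rpc_address: {rpc_address}\n"
    )


if __name__ == "__main__":
    # Made-up addresses: same seed list everywhere, per-node listen/rpc addresses.
    seeds = ["10.0.0.1", "10.0.0.2"]
    for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]:
        print(f"# conf/cassandra.yaml on {ip}")
        print(node_config(listen_address=ip, rpc_address=ip, seeds=seeds))
```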

Once everything is configured and the nodes are running, use the bin/nodetool status utility to verify a properly connected cluster. For example:

$ bin/nodetool -host -p 7199 status
Datacenter: datacenter1
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns   Host ID                               Rack
UN  30.99 KB   256     32.4%  92b20e08-9ddd-4f55-9173-8516e74d27f5  rack1
UN  31 KB      256     31.5%  b9616658-c744-48fb-b64f-83f96b007d93  rack1
UN  30.96 KB   256     36.1%  f7a08973-85bd-460f-8176-d6f9df8c23f4  rack1
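Each node line in this output starts with a two-letter code: U or D (Up/Down) followed by N, L, J, or M (Normal/Leaving/Joining/Moving), so "UN" means up and normal. The sketch below is a hypothetical script (parse_status, cluster_ready, and run_nodetool_status are illustrative names) that runs nodetool status, assumed to be at bin/nodetool, and checks that the expected number of nodes are all UN.

```python
import re
import subprocess

# A node line starts with the two-letter status code, then the node's address.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def parse_status(output):
    """Map each node address to its two-letter status code from nodetool status."""
    nodes = {}
    for line in output.splitlines():
        match = NODE_LINE.match(line)
        if match:
            code, address = match.groups()
            nodes[address] = code
    return nodes


def cluster_ready(output, expected_nodes):
    """True if exactly expected_nodes nodes are listed and all are Up/Normal (UN)."""
    nodes = parse_status(output)
    return len(nodes) == expected_nodes and all(code == "UN" for code in nodes.values())


def run_nodetool_status(nodetool="bin/nodetool"):
    """Run nodetool status and return its text output (raises if the command fails)."""
    result = subprocess.run([nodetool, "status"], capture_output=True, text=True, check=True)
    return result.stdout
```

For a three-node cluster like the one above, cluster_ready(run_nodetool_status(), 3) should return True once every node has joined.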

Advanced cluster management is described in Operations.

If you don't yet have access to hardware for a real Cassandra cluster, you can manage local clusters easily with ccm (Cassandra Cluster Manager).

If Something Goes Wrong

If you followed the steps in this guide and failed to get up and running, we'd love to help. Here's what we need:

  1. If you are running anything other than a stable release, please upgrade first and see if you can still reproduce the problem.

  2. Make sure debug logging is enabled (hint: conf/) and save a copy of the output.

  3. Search the mailing list archive to see if anyone has reported a similar problem and what resolution, if any, they received.

  4. Do the same in the bug tracking system.

  5. See if you can put together a unit test, script, or application that reproduces the problem.

Finally, post a message with all relevant details to the list (subscription required), or hop onto IRC (network, channel #cassandra) and let us know.



  1. To learn more about controlling the behavior of startup scripts, see RunningCassandra.

GettingStarted (last edited 2016-06-12 13:33:04 by JonathanEllis)