Differences between revisions 103 and 104
Revision 103 as of 2014-06-05 20:24:24
Size: 10626
Comment: add VM intro
Revision 104 as of 2014-10-30 02:53:11
Size: 10977
Editor: JonHaddad
Comment: updating instructions to no longer require root
Deletions are marked like this. Additions are marked like this.
Line 55: Line 55:
 . Users of recent Linux distributions and Mac OS X Snow Leopard should be able to start up Cassandra simply by untarring and invoking `bin/cassandra -f` with root privileges. Snow Leopard ships with Java 1.6.0 and does not require changing the `JAVA_HOME` environment variable or adding any directory to your `PATH`. On Linux just make sure you have a working Java JDK package installed such as the `openjdk-6-jdk` on Ubuntu Lucid Lynx.  . Cassandra Users of recent Linux distributions and Mac OS X Snow Leopard should be able to start up Cassandra simply by untarring and invoking `bin/cassandra -f`. Since Cassandra 2.1, the tar.gz download has shipped with the log and data directories defaulting to the Cassandra directory. Versions prior defaulted to `/var/log/cassandra` and `/var/lib/cassandra/`. Due to this it is necessary to either start Cassandra with root privileges or change the `conf/cassandra.yaml` to use a directory owned by the current user. Snow Leopard ships with Java 1.6.0 and does not require changing the `JAVA_HOME` environment variable or adding any directory to your `PATH`. On Linux just make sure you have a working Java JDK package installed such as the `openjdk-6-jdk` on Ubuntu Lucid Lynx.
Line 177: Line 177:
 

Cassandra documentation from DataStax

DataStax's latest Cassandra documentation covers topics from installation to troubleshooting, including a Quick Start Guide. Documentation for older releases is also available.

Trying Cassandra in a VM

Try Cassandra with these ten minute developer and admin walkthroughs.

Installing Cassandra Locally

This document aims to provide a few easy to follow steps to take the first-time user from installation, to running single node Cassandra, and overview to configure multinode cluster. Cassandra is meant to run on a cluster of nodes, but will run equally well on a single machine. This is a handy way of getting familiar with the software while avoiding the complexities of a larger system.

Step 0: Prerequisites and Connecting to the Community

Cassandra requires the most stable version of Java 7 you can deploy, preferably the Oracle/Sun JVM. Cassandra also runs on OpenJDK and the IBM JVM. Because Cassandra requires Java 7, it will NOT run on JRockit.

The best way to ensure you always have up to date information on the project, releases, stability, bugs, and features is to subscribe to the users mailing list (subscription required) and participate in the #cassandra channel on IRC.

Step 1: Download Cassandra

  • Download links for the latest stable release can always be found on the website.

  • Users of Debian or Debian-based derivatives can install the latest stable release in package form, see DebianPackaging for details.

  • Users of RPM-based distributions can get packages from Datastax.

  • If you are interested in building Cassandra from source, please refer to How to Build page.

For more details about misc builds, please refer to Cassandra versions and builds page.

Step 2: Basic Configuration

The Cassandra configuration files can be found in the conf directory of binary and source distributions. If you have installed Cassandra from a deb or rpm package, the configuration files will be located in /etc/cassandra.

Step 2.1: Directories Used by Cassandra

If you've installed Cassandra with a deb or rpm package, the directories that Cassandra will use should already be created an have the correct permissions. Otherwise, you will want to check the following config settings from conf/cassandra.yaml: data_file_directories (/var/lib/cassandra/data), commitlog_directory (/var/lib/cassandra/commitlog), and saved_caches_directory (/var/lib/cassandra/saved_caches). Make sure these directories exist and can be written to.

By default, Cassandra will write its logs in /var/log/cassandra/. Make sure this directory exists and is writeable, or change this line in conf/log4j-server.properies:

log4j.appender.R.File=/var/log/cassandra/system.log

Note that in Cassandra 2.1+, the logger in use is logback, so change this logging directory in your conf/logback.xml file such as:

<file>/var/log/cassandra/system.log</file>

JVM-level settings such as heap size can be set in conf/cassandra-env.sh.

Step 3: Start Cassandra

And now for the moment of truth, start up Cassandra by invoking 'bin/cassandra -f' from the command line1. The service should start in the foreground and log gratuitously to the console. Assuming you don't see messages with scary words like "error", or "fatal", or anything that looks like a Java stack trace, then everything should be working.

Press "Control-C" to stop Cassandra.

If you start up Cassandra without the "-f" option, it will run in the background. You can stop the process by killing it, using 'pkill -f CassandraDaemon', for example.

  • Cassandra Users of recent Linux distributions and Mac OS X Snow Leopard should be able to start up Cassandra simply by untarring and invoking bin/cassandra -f. Since Cassandra 2.1, the tar.gz download has shipped with the log and data directories defaulting to the Cassandra directory. Versions prior defaulted to /var/log/cassandra and /var/lib/cassandra/. Due to this it is necessary to either start Cassandra with root privileges or change the conf/cassandra.yaml to use a directory owned by the current user. Snow Leopard ships with Java 1.6.0 and does not require changing the JAVA_HOME environment variable or adding any directory to your PATH. On Linux just make sure you have a working Java JDK package installed such as the openjdk-6-jdk on Ubuntu Lucid Lynx.

Step 4: Using cqlsh

bin/cqlsh is an interactive command line interface for Cassandra. cqlsh allows you to execute CQL (Cassandra Query Language) statements against Cassandra. Using CQL, you can define a schema, insert data, execute queries. Run the following command to connect to your local Cassandra instance with cqlsh:

$ bin/cqlsh

You should see the following prompt, if successful:

Connected to Test Cluster at localhost:9160.
[cqlsh 2.3.0 | Cassandra 1.2.2 | CQL spec 3.0.0 | Thrift protocol 19.35.0]
Use HELP for help.

For clarity, we will omit the cqlsh prompt in the following examples.

You can access the online help with 'help;' command. Commands are terminated with a semicolon (';') in cqlsh.

First, create a keyspace -- a namespace of tables.

CREATE KEYSPACE mykeyspace
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

Second, authenticate to the new keyspace:

USE mykeyspace;

Third, create a users table:

CREATE TABLE users (
  user_id int PRIMARY KEY,
  fname text,
  lname text
);

Now you can store data into users:

INSERT INTO users (user_id,  fname, lname)
  VALUES (1745, 'john', 'smith');
INSERT INTO users (user_id,  fname, lname)
  VALUES (1744, 'john', 'doe');
INSERT INTO users (user_id,  fname, lname)
  VALUES (1746, 'john', 'smith');

Now let's fetch the data you inserted:

SELECT * FROM users;

You should see output reflecting your new rows:

 user_id | fname | lname
---------+-------+-------
    1745 |  john | smith
    1744 |  john |   doe
    1746 |  john | smith

You can retrieve data about users whose last name is smith by creating an index, then querying the table as follows:

CREATE INDEX ON users (lname);

SELECT * FROM users WHERE lname = 'smith';

 user_id | fname | lname
---------+-------+-------
    1745 |  john | smith
    1746 |  john | smith

Write your Application

To connect to Cassandra, you'll need a database driver for your language of choice. DataStax sponsors development of CQL drivers at https://github.com/datastax. A full list of CQL drivers can be found on the ClientOptions page.

When deciding how to design your schema and layout your data, it will be helpful to review the resources on how to DataModel.

You may also want to read the full CQL documentation.

Configuring Multinode Clusters

Now you have single working Cassandra node. It is a Cassandra cluster which has only one node. By adding more nodes, you can make it a multi node cluster.

Setting up a Cassandra cluster is almost as simple as repeating the above procedures for each node in your cluster. There are a few minor exceptions though.

Cassandra nodes exchange information about one another using a mechanism called Gossip, but to get the ball rolling a newly started node needs to know of at least one other, this is called a Seed. It's customary to pick a small number of relatively stable nodes to serve as your seeds, but there is no hard-and-fast rule here. Do make sure that each seed also knows of at least one other, remember, the goal is to avoid a chicken-and-egg scenario and provide an avenue for all nodes in the cluster to discover one another.

In addition to seeds, you'll also need to configure the IP interface to listen on for Gossip and CQL, (listen_address and rpc_address respectively). Use a 'listen_address that will be reachable from the listen_address used on all other nodes, and a rpc_address` that will be accessible to clients.

Once everything is configured and the nodes are running, use the bin/nodetool status utility to verify a properly connected cluster. For example:

$ bin/nodetool -host 192.168.0.10 -p 7199 status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns   Host ID                               Rack
UN  127.0.0.3  30.99 KB   256     32.4%  92b20e08-9ddd-4f55-9173-8516e74d27f5  rack1
UN  127.0.0.2  31 KB      256     31.5%  b9616658-c744-48fb-b64f-83f96b007d93  rack1
UN  127.0.0.1  30.96 KB   256     36.1%  f7a08973-85bd-460f-8176-d6f9df8c23f4  rack1

Advanced cluster management is described in Operations.

If you don't yet have access to hardware for a real Cassandra cluster, you can manage local clusters easily with ccm (Cassandra Cluster Manager).

If Something Goes Wrong

If you followed the steps in this guide and failed to get up and running, we'd love to help. Here's what we need.

  1. If you are running anything other than a stable release, please upgrade first and see if you can still reproduce the problem.
  2. Make sure debug logging is enabled (hint: conf/log4j.properties) and save a copy of the output.

  3. Search the mailing list archive and see if anyone has reported a similar problem and what, if any resolution they received.

  4. Ditto for the bug tracking system.

  5. See if you can put together a unit test, script, or application that reproduces the problem.

Finally, post a message with all relevant details to the list (subscription required), or hop onto IRC (network irc.freenode.net, channel #cassandra) and let us know.




Footnotes:

stats

  1. To learn more about controlling the behavior of startup scripts, see RunningCassandra. (1)

GettingStarted (last edited 2014-10-30 02:53:11 by JonHaddad)