This document applies to pre-1.0 versions of Cassandra.
See [[http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management|this article]] for how Cassandra 1.0 automatically manages memory for you.

Don't Touch that Dial

The settings described here should only be changed in the face of a quantifiable performance problem. They will affect the cluster quite differently for distinct use cases and workloads, and the defaults, though conservative, were well-chosen.

JVM Heap Size

By default, the Cassandra startup scripts specify a maximum JVM heap of -Xmx1G. Consider increasing this -- but gently! If Cassandra and other processes take up too much of the available RAM, they'll force out the operating system's file buffers and caches, which are as important as Cassandra's internal data structures for good performance.

It's much riskier to start tuning with this too high (a difficult-to-pinpoint malaise) than too low (easy to diagnose using JMX). Even for a high-end machine with, say, 48GB of RAM, a 4GB heap size is a reasonable initial guess -- the OS is much smarter than you think. As a rough rule of thumb, Cassandra's internal data structures will require about memtable_throughput_in_mb * 3 * (number of hot column families) + 1GB + internal caches.
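
As an illustrative calculation only: with the 64MB default for memtable_throughput_in_mb and four hot column families, the rule of thumb gives roughly 64MB * 3 * 4 = 768MB, plus 1GB, plus whatever you devote to the key and row caches -- a little under 2GB, comfortably inside a 4GB heap.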

Also know that running up against the heap limit under load is probably a symptom of other problems. Diagnose those first.

Virtual Memory and Swap

On a dedicated Cassandra machine, the best value for your swap settings is no swap at all -- it's better to have the OS kill the Java process (taking the node down but leaving your monitoring, etc. up) than to have the system go into swap death (and become entirely unreachable).

Linux users should fully understand, and only then consider adjusting, the system values for swappiness, overcommit_memory, and overcommit_ratio.
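
On most Linux distributions these live under /proc/sys/vm/ and can be made persistent in /etc/sysctl.conf. The fragment below is only a sketch -- the values are illustrative, and the overcommit settings in particular should be researched before being changed:

  # /etc/sysctl.conf fragment -- illustrative values, not a recommendation
  vm.swappiness = 0
  vm.overcommit_memory = 2
  vm.overcommit_ratio = 80
  # apply without a reboot:  sysctl -p
  # and, per the advice above, disable swap entirely with:  swapoff --all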

Memtable Thresholds

When performing write operations, Cassandra stores values in column-family-specific, in-memory data structures called Memtables. These Memtables are flushed to disk whenever one of the configurable thresholds is exceeded. The initial settings (64MB / 0.3 million operations) are purposefully conservative, and proper tuning of these thresholds is important for making the most of available system memory without bringing the node down for lack of it.

Configuring Thresholds

Larger Memtables take memory away from caches: Since Memtables store actual column values, they consume at least as much memory as the size of the data inserted. However, there is also overhead associated with the structures used to index this data. When the number of columns and rows is high compared to the size of the values, this overhead can become quite significant (possibly greater than the data itself). In other words, which threshold(s) to use and what to set them to are not just a function of how much memory you have, but also of how many column families you have, how many columns per column family, and the size of the values being stored.

Larger Memtables don't improve write performance: Increasing the memtable capacity will cause less frequent flushes but doesn't improve write performance directly: writes go directly to memory regardless. (Actually, if your commitlog and sstables share a volume they may contend for I/O, so if at all possible, put them on separate volumes.)
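
For example, in the 0.6-era storage-conf.xml the relevant locations are set with elements along these lines (the paths are illustrative; cassandra.yaml in 0.7+ has the corresponding commitlog_directory and data_file_directories settings):

  <!-- storage-conf.xml fragment: put these on separate physical volumes -->
  <CommitLogDirectory>/mnt/disk1/cassandra/commitlog</CommitLogDirectory>
  <DataFileDirectories>
      <DataFileDirectory>/mnt/disk2/cassandra/data</DataFileDirectory>
  </DataFileDirectories>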

Larger memtables do absorb more overwrites: If your write load sees some rows written more often than others (e.g. upvotes of a front-page story), a larger memtable will absorb those overwrites, creating more efficient sstables and thus better read performance. If your write load is batch-oriented, or if you have a massive row set, rows are not likely to be rewritten for a long time, so this benefit pays a smaller dividend.

Larger memtables do lead to more effective compaction: Since compaction is tiered, large sstables are preferable; turning over tons of tiny memtables is bad. Again, this improves read performance (by reducing overall I/O contention), but not writes.

Listed below are the thresholds found in storage-conf.xml (or cassandra.yaml in 0.7+), along with a description.

MemtableThroughputInMB

As the name indicates, this sets the maximum size in megabytes that the Memtable will store before triggering a threshold violation and being flushed to disk. It corresponds to the size of the values inserted (plus the size of the containing column).

If left unconfigured (missing from the config), this defaults to 128MB.

Note: This was referred to as MemtableSizeInMB in versions of Cassandra before 0.6.0. In version 0.7b2+, the value is applied on a per-column-family basis.

MemtableOperationsInMillions

This directive sets a threshold on the number of columns stored in the Memtable, expressed in millions.

Left unconfigured (missing from the config), this defaults to 0.1 (or 100,000 objects). The config file's initial setting of 0.3 (or 300,000 objects) is a conservative starting point.

Note: This was referred to as MemtableObjectCountInMillions in versions of Cassandra before 0.6.0. In version 0.7b2+, the value is applied on a per-column-family basis.
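
For illustration, a storage-conf.xml fragment setting both thresholds might look like the following. The values shown are simply the conservative starting points mentioned above, not a recommendation; in 0.7+ the equivalent per-column-family settings use names along the lines of memtable_throughput_in_mb.

  <!-- storage-conf.xml fragment: illustrative values only -->
  <MemtableThroughputInMB>64</MemtableThroughputInMB>
  <MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>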

Using JConsole To Optimize Thresholds

Cassandra's column-family MBeans have a number of attributes that can prove invaluable in determining optimal thresholds. One way to access this instrumentation is JConsole, a graphical monitoring and management application that ships with your JDK.

Launching JConsole with no arguments will display the "New Connection" dialog box. If you are running JConsole on the same machine as Cassandra, you can connect using the PID; otherwise you will need to connect remotely. The default startup scripts for Cassandra cause the VM to listen on port 8080 (7199 starting in v0.8.0-beta1) using the JVM option:

  • -Dcom.sun.management.jmxremote.port=8080

The remote JMX URL is then:

service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi

This is used internally by bin/nodetool (src/java/org/apache/cassandra/tools/nodetool.java).
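
You can also skip the connection dialog by passing the connection directly on the JConsole command line, either as the full service URL above or simply as host:port (substitute 7199 for 0.8.0-beta1 and later), for example:

  • jconsole localhost:8080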

[Image: jconsole_connect.png -- the JConsole "New Connection" dialog]

Once connected, select the MBeans tab, expand the org.apache.cassandra.db section, and finally one of your column families.

There are three interesting attributes here.

  1. MemtableColumnsCount, representing the total number of column entries currently in the Memtable. If you store 100 rows that each have 100 columns, expect to see this value increase by 10,000. This attribute is useful in setting the MemtableOperationsInMillions threshold.

  2. MemtableDataSize, which is used to determine the total size of stored data. This is the sum of all the values stored and does not account for Memtable overhead (i.e., it's not indicative of the actual memory used by the Memtable). Use this value when adjusting MemtableThroughputInMB.

  3. Finally, there is MemtableSwitchCount, which increases by one each time a column family flushes its Memtable to disk.

Note: You'll need to manually mash the Refresh button to update these values.

[Image: jconsole_attributes.png -- the Memtable attributes in JConsole's MBeans tab]
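
If you would rather poll these attributes from a script than click Refresh in JConsole, the same data is reachable through the standard javax.management API. The sketch below assumes the JMX port from above (8080) and an illustrative keyspace/column family pair (Keyspace1/Standard1); the exact ObjectName pattern for column-family MBeans varies between pre-1.0 versions, so copy the name shown in JConsole's MBeans tree for your build.

  import javax.management.MBeanServerConnection;
  import javax.management.ObjectName;
  import javax.management.remote.JMXConnector;
  import javax.management.remote.JMXConnectorFactory;
  import javax.management.remote.JMXServiceURL;

  public class MemtableStats {
      public static void main(String[] args) throws Exception {
          // Same JMX URL as above; adjust the port for your version (7199 in 0.8+).
          JMXServiceURL url = new JMXServiceURL(
              "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
          JMXConnector connector = JMXConnectorFactory.connect(url);
          try {
              MBeanServerConnection mbsc = connector.getMBeanServerConnection();
              // Illustrative keyspace/column family names -- copy the exact
              // ObjectName from JConsole's MBeans tree for your version.
              ObjectName cf = new ObjectName(
                  "org.apache.cassandra.db:type=ColumnFamilyStores,"
                  + "keyspace=Keyspace1,columnfamily=Standard1");
              System.out.println("MemtableColumnsCount: "
                  + mbsc.getAttribute(cf, "MemtableColumnsCount"));
              System.out.println("MemtableDataSize:     "
                  + mbsc.getAttribute(cf, "MemtableDataSize"));
              System.out.println("MemtableSwitchCount:  "
                  + mbsc.getAttribute(cf, "MemtableSwitchCount"));
          } finally {
              connector.close();
          }
      }
  }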

It is also possible to schedule an immediate flush using the forceFlush() operation.

[Image: jconsole_operations.png -- the forceFlush() operation in JConsole]
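
The same flush can be triggered from code as well. A minimal sketch, reusing the mbsc connection and cf ObjectName from above: since forceFlush() takes no arguments, the JMX invoke call passes empty parameter and signature arrays.

  // Reuses mbsc and cf from the sketch above; forceFlush() takes no arguments.
  mbsc.invoke(cf, "forceFlush", new Object[0], new String[0]);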
