Check the SSTable counts in nodetool cfstats. If the count
is continually growing, the cluster's I/O capacity is not enough to handle
the write load it is receiving. Reads slow down because the data
is fragmented across many SSTables and compaction is continually running
to try to reduce them. Adding more I/O capacity, either via more machines
in the cluster or faster drives such as SSDs, will be necessary to solve this.
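
One quick way to watch these counts (a sketch, assuming a local node with the default JMX port; the grep patterns may need adjusting to match your version's cfstats output):

nodetool -h localhost cfstats | grep -E 'Column Family:|SSTable count:'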

If the SSTable count is relatively low (32 or less), then the amount of file
cache available per machine compared to the amount of data per machine needs to
be considered, as well as the application's read pattern. The amount of file
cache available is approximately (TotalMemory – JVMHeapSize). If the amount of
data is greater than this and the read pattern is approximately random, only a
fraction of reads roughly equal to the cache:data ratio can be served from
cache; the rest must seek the disk. With spinning media, this is a slow
operation. You may be able to mitigate many of the seeks by using a key cache
of 100%, and a small amount of row cache (10,000-20,000 rows) if you have some
'hot' rows and they are not extremely large.
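
As a rough illustration (the numbers here are assumptions, not recommendations): a node with 32GB of RAM and an 8GB JVM heap has about 24GB of file cache. If that node holds 240GB of data, the cache:data ratio is 0.1, so under a random read pattern only about 10% of reads can be served from cache and roughly 90% must seek the disk.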

Check your system.log for messages from the GCInspector. If the GCInspector
is indicating that either the ParNew or ConcurrentMarkSweep collectors took
longer than 15 seconds, there is a very high probability that some portion of
the JVM is being swapped out by the OS. One way this might happen is if the
mmap DiskAccessMode is used without JNA support. The address space will be
exhausted by mmap, and the OS will decide to swap out some portion of the JVM
that isn't in use, but eventually the JVM will try to GC this space. Adding
the JNA libraries will solve this (they cannot be shipped with Cassandra due to
carrying a GPL license, but are freely available) or the DiskAccessMode can be
switched to mmap_index_only, which as the name implies will only mmap the
indices, using much less address space. DataStax recommends that Cassandra
nodes disable swap entirely (sudo swapoff --all), since it is better to have
the OS OutOfMemory (OOM) killer kill the Java process entirely than to have
the JVM buried in swap and responding poorly.
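
To keep swap disabled across reboots, also remove or comment out the swap entries in /etc/fstab (a sketch; the sed invocation assumes GNU sed, and device names will differ on your systems):

sudo swapoff --all
sudo sed -i.bak '/\sswap\s/s/^/#/' /etc/fstab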

If the GCInspector isn't reporting very long GC times, but is reporting moderate
times frequently (ConcurrentMarkSweep taking a few seconds very often) then it
is likely that the JVM is experiencing extreme GC pressure and will
eventually OOM. See the section below on OOM errors.
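
Both patterns can be spotted quickly by searching the log (the log path here is an assumption; adjust it to your installation):

grep GCInspector /var/log/cassandra/system.log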

If you can run nodetool commands locally but not on other nodes in the ring, you may have a common JMX connection problem that is resolved by adding an entry like the following in <install_location>/conf/cassandra-env.sh on each node:

JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=<public name>"

If you still cannot run nodetool commands remotely after making this configuration change, do a full evaluation of your firewall and network security. The nodetool utility communicates through JMX on port 7199.
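
To confirm that remote JMX access is working after the change, run a nodetool command against the remote node explicitly, substituting the node's public name:

nodetool -h <public name> -p 7199 ring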

If different nodes report different views of the ring, the ring is in a bad state. This can happen when there are token conflicts (for instance, when bootstrapping two nodes simultaneously with automatic token selection). Unfortunately, the only way to resolve this is to do a full cluster restart; a rolling restart is insufficient, since gossip from nodes with the bad state will repopulate it on newly booted nodes.

If Java reports an error saying there are too many open files, it is not being allowed to open enough file descriptors. Cassandra generally needs more than the default limit of 1024. To increase the number of file descriptors, change the security limits on your Cassandra nodes, for example using the following commands:
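
A typical way to raise the limit for all users (the value 32768 is a common setting, not an official recommendation, and a re-login or node restart is required for it to take effect):

echo "* soft nofile 32768" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 32768" | sudo tee -a /etc/security/limits.conf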

Another, much less likely, possibility is a file descriptor leak in Cassandra. Run lsof -n | grep java to check that the number of file descriptors opened by Java is reasonable, and report the error if the number is greater than a few thousand.

Cassandra logs the following warning when the user running it is not permitted to lock the JVM heap in memory:

WARN [main] 2011-06-15 09:58:56,861 CLibrary.java (line 118) Unable to lock JVM memory (ENOMEM).
This can result in part of the JVM being swapped out, especially with mmapped I/O enabled.
Increase RLIMIT_MEMLOCK or run Cassandra as root.

You can view the current limits using the ulimit -a command. Although limits can also be temporarily set using this command, DataStax recommends permanently changing the settings by adding the following entries to your /etc/security/limits.conf file:
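
Typical entries look like the following (a sketch; scope them to the account that runs Cassandra if you prefer not to apply them to all users):

* soft memlock unlimited
* hard memlock unlimited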

The native library snappy-1.0.4.1-libsnappyjava.so for Snappy compression is included in the snappy-java-1.0.4.1.jar file. When the JVM initializes the JAR, the library is extracted to the default temp directory. If the default temp directory is mounted with the noexec option, the JVM cannot load the library from there, and Snappy initialization fails with an exception.

One solution is to specify a different temp directory that has already been mounted without the noexec option, as follows:

If you start DSE or Cassandra from the command line with bin/dse cassandra or bin/cassandra, simply append the option to the command line:

DSE: bin/dse cassandra -t -Dorg.xerial.snappy.tempdir=/path/to/newtmp

Cassandra: bin/cassandra -Dorg.xerial.snappy.tempdir=/path/to/newtmp

If starting from a package using service dse start or service cassandra start, add a system environment variable JVM_OPTS with the value:

JVM_OPTS=-Dorg.xerial.snappy.tempdir=/path/to/newtmp

The default cassandra-env.sh looks for the variable and appends to it when starting the JVM.
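
Alternatively, you can append the option in <install_location>/conf/cassandra-env.sh yourself, using the same pattern the file already uses for other options:

JVM_OPTS="$JVM_OPTS -Dorg.xerial.snappy.tempdir=/path/to/newtmp"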