Cassandra 1.0 incorporates several improvements to how Cassandra's storage engine manages memory and disk space, bringing better performance and addressing some of the most common pain points for Cassandra administration.

Off-heap row cache

Cassandra provides a built-in row cache for super-fast access to frequently requested data, competitive with standalone caching products but without the cache coherence problems that come from using a separate system, i.e., data in the cache becoming temporarily or even permanently out of sync with the database.

Cassandra 1.0 adds the ability to store cached rows in native memory, outside the Java heap. This results in both a smaller per-row memory footprint and reduced JVM heap requirements, which helps keep the heap size in the sweet spot for JVM garbage collection performance.

This off-heap row cache debuted in 0.8 but it didn't become the default until 1.0. It requires the JNA library to be installed; otherwise, Cassandra will automatically fall back to the old on-heap cache provider. The Debian and RPM packages of Cassandra install JNA automatically, but if you are installing from a tarball or source, you should install JNA as well. (For licensing reasons, JNA can't be distributed as part of Cassandra itself.)

This was tolerable but clunky for small numbers of ColumnFamilies, but whenever additional ColumnFamilies were created, the settings on the existing ones needed to be adjusted to make room. Getting this wrong was the primary cause of OutOfMemoryErrors.

Cassandra 1.0 only uses one memtable setting: memtable_total_space_in_mb (found in cassandra.yaml), which defaults to 1/3 of your JVM heap. Cassandra manages this space across all your ColumnFamilies and flushes memtables to disk as needed. This has been tested to work across hundreds or even thousands of ColumnFamilies. (Do note that a minimum of 1MB per memtable is used by the per-memtable arena allocator also introduced in 1.0, which is worth keeping in mind if you are looking at going from thousands to tens of thousands of ColumnFamilies.)

memtable_total_space_in_mb was introduced in 0.8 and has proved successful enough that for 1.0 we've disabled the old per-ColumnFamily memtable_operations_in_millions and memtable_throughput_in_mb settings. For backwards compatibility, Cassandra still accepts those settings from applications, but they are ignored.

Cassandra 1.0 also introduces a global commitlog_total_space_in_mb, which replaces the old memtable_flush_after_mins per-ColumnFamily setting. The purpose here is to set a bound on how much data will need to be replayed on startup. Since there is a single commitlog per server, an infrequently-updated ColumnFamily could otherwise keep CommitLog segments around for an arbitrarily long time. commitlog_total_space_in_mb will cause any unflushed memtables in the oldest CommitLog segments to be written to disk when its threshold is exceeded, allowing those segments to be removed.

Faster disk space reclamation

In older versions, Cassandra cleaned up these data files when the JVM's garbage collection ran, which resulted in unpredictable reclaiming of space. Cassandra 1.0 provides behavior that matches what operators would naturally expect, and avoids ugly hacks like forcing a GC cycle when disk space is low, which Cassandra would do automatically as a failsafe.

Apache license policy (http://www.apache.org/legal/3party.html) says, “The LGPL v2.1 is ineligible from being a Category B license (a category that includes the MPL, CPL, EPL, and CDDL) primarily due to the restrictions it places on larger works, violating the third license criterion. Therefore, LGPL v2.1-licensed works must not be included in Apache products, although they may be listed as system requirements or distributed elsewhere as optional works.”

This isn’t a great troubleshooting forum — I’d ask on the Cassandra user mailing list. Off the top of my head, that’s an odd error to get because FreeableMemory is a Cassandra class, so it should never be missing.