I have worked with the project for some time now, and I was doing some reflection on this statement and how profound Cassandra's design is. Amazon Dynamo's data model is a key value store, while Cassandra's data model is a ColumnFamily-based one. You can 'encapsulate' the difference like this:

Key value:Map<Object,Object>

Column FamilyMap<Object, Map<Object,Object>>

There are many successful tools that are key value stores, notably Memcache. Some NoSQL products take the next step and turn key value store into a distributed key value store. The ColumnFamily model is more complex with its larger signature Map<Object,Object>, but that provides greater flexibility and allows certain applications to be modelled that would be difficult to do with a pure key/value store.

This has many drawbacks. The first is you are going to need 3-4 client server exchanges to complete this, and the second is handling the situation where a client vanishes after acquiring the lock is a problem along with all the other distributed locking problems.

Vector Clocks are another system to handle this type of problem without the multiple round-trips. Vector clock systems are cool but the explanation/proof of how it works involves a bit more math then I am used to.

Let's take a look at how the ColumnFamily based model solves this problem:

Cassandra's use of ColumnFamily are a straight forward solution when compared to Vector Clocks, locking, or transactions. The application is responsible for creating a unique key for the map. In this example, we appended a timeUUID infront of the page name. This design does not need to "read before write" to accomplish an "append", nor does it need vector clocks on the server side to reconsile the data internally.

The structured log format using Memtables and SSTable allows for fast write and a system can merge on read. This can be a huge optimization for many applications. For example, at m6d we have our mapping servers writing URL information into something like the visit table I described above. If we had to 'read before write' or 'select before insert' we would need significantly more hardware, (if I had to guess I would say 2-3 times). Likewise we can add (or remove) Segments information to or from a user without ever having to issue a read.

Also for we could just use Cassandra as a key value store if we had no need for the map. In the example below 'k' is a static char is the only column in the column family.

set Visists['bob']['k']='index.html,page2.html,page3'

This was a contrived example, but many applications can be fit well into the ColumnFamily model. Time series is one such example, lists inside a key, sets inside a key, different structure types under the same key. The key being a map rather then just a single opaque entity is very powerful and practical. The tunable consistency model, distributed design, and column family based approach are an optimal tool for many applications.