Short notes and essays about stuff that interests me (mostly technical stuff).

Tuesday, March 2, 2010

Following links to Cassandra

The world of extreme ultra-high-end scalable systems begins with Google and their well-known technologies: BigTable, GFS, MapReduce, Protocol Buffers, and so forth. But that world certainly doesn't end with Google; it's a big world, and there's a lot of fascinating work being done at places like Amazon, Flickr, Yahoo, and more.

I'm still wrapping my head around these eventually-consistent, non-relational data stores; after all, I'm a relational DBMS guy from a long time back. The Cassandra papers are quite approachable, and give a lot of fascinating insight into the behavior of these systems:

"we will focus on the core distributed systems techniques used in Cassandra: partitioning, replication, membership, failure handling, and scaling. All these modules work in synchrony to handle read/write requests. Typically a read/write request for a key gets routed to any node in the Cassandra cluster. The node then determines the replicas for this particular key. For writes, the system routes the requests to the replicas and waits for a quorum of replicas to acknowledge the completion of the writes. For reads, based on the consistency guarantees required by the client, the system either routes the requests to the closest replica or routes the requests to all replicas and waits for a quorum of responses."
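
To make that request flow concrete for myself, here is a toy sketch in Python. It is not Cassandra's code or API: the names (Replica, Coordinator, replicas_for) are invented for illustration, the MD5-based ring and the "newest timestamp wins" rule are my own simplifying assumptions, and the "closest" replica is reduced to "the first live one". It also assumes the replication factor is no larger than the number of nodes.

```python
import hashlib
from bisect import bisect_right


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is an arbitrary choice for this sketch)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Replica:
    """Hypothetical in-memory replica; `up` simulates a node failure."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, ts, value):
        if not self.up:
            return False  # no acknowledgement from a failed node
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)  # assumption: newest timestamp wins
        return True

    def read(self, key):
        if not self.up:
            return None  # no response from a failed node
        return self.store.get(key, (0, None))


class Coordinator:
    """Hypothetical stand-in for whichever node receives the client request."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        self.ring = sorted(((token(n.name), n) for n in nodes), key=lambda p: p[0])
        self.tokens = [t for t, _ in self.ring]
        self.quorum = self.rf // 2 + 1

    def replicas_for(self, key):
        # Walk clockwise from the key's ring position and take the next rf nodes.
        start = bisect_right(self.tokens, token(key))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

    def write(self, key, ts, value):
        # Send to every replica; succeed only if a quorum acknowledges.
        acks = sum(r.write(key, ts, value) for r in self.replicas_for(key))
        return acks >= self.quorum

    def read(self, key, quorum=True):
        replicas = self.replicas_for(key)
        if not quorum:
            # Weaker guarantee: ask a single replica (here, the first live one).
            for r in replicas:
                result = r.read(key)
                if result is not None:
                    return result[1]
            raise RuntimeError("no replica available")
        # Stronger guarantee: ask all replicas, require a quorum of responses,
        # and return the value carrying the newest timestamp.
        responses = [x for x in (r.read(key) for r in replicas) if x is not None]
        if len(responses) < self.quorum:
            raise RuntimeError("quorum not reached")
        return max(responses, key=lambda x: x[0])[1]
```

Playing with it shows the trade-off the paper describes: if one replica is down during a write and later comes back, a single-replica read that happens to land on it can return the old value, while a quorum read over the same replicas still sees the newer one.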