Sessions at Cassandra SF 2011 about Scaling

Monday 11th July 2011

Twitter currently runs a couple hundred Cassandra nodes across a half dozen clusters. These span a variety of workloads, from time series data to low-latency, high-throughput key/value storage. Each workload has led the team to new techniques for operating Cassandra at scale. Chris Goffinet, an engineer at Twitter and a Cassandra committer, will be sharing some of the most interesting ones. For those of you interested in Cassandra and operations, this is a must-attend talk.

Cassandra is mostly known for its I/O scalability, but its shared-nothing, highly available foundation is equally useful for applications requiring multi-datacenter distribution and constant uptime. It’s actually quite easy to build and manage HA services by punting the real HA problems to Cassandra’s battle-tested replication and sharding implementation. In this session, we’ll explore Pantheon’s edge routing layer, including:

How application servers register their presence in Cassandra with a Twisted Python-based REST API. We’ll test updating routes using cURL.
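The registration step can be sketched as a tiny HTTP service. This is an illustration only: it uses the Python stdlib’s http.server rather than Twisted, and an in-memory dict stands in for Cassandra; the endpoint path and payload shape are assumptions, not Pantheon’s actual API.

```python
# Hypothetical sketch of a route-registration REST endpoint. The stdlib
# http.server stands in for Twisted, and the ROUTES dict stands in for
# Cassandra. Endpoint and payload shapes are invented for illustration.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

ROUTES = {}  # hostname -> list of app-server addresses (Cassandra stand-in)

class RouteHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        # e.g. curl -X PUT -d '{"servers": ["10.0.0.5"]}' \
        #        http://localhost:8080/routes/app.example.com
        host = self.path.rsplit("/", 1)[-1]
        body = self.rfile.read(int(self.headers["Content-Length"]))
        ROUTES[host] = json.loads(body)["servers"]
        self.send_response(204)
        self.end_headers()

    def do_GET(self):
        host = self.path.rsplit("/", 1)[-1]
        payload = json.dumps({"servers": ROUTES.get(host, [])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet

def serve(port=8080):
    HTTPServer(("127.0.0.1", port), RouteHandler).serve_forever()
```

With the server running, registering and reading back a route works from cURL just as the session describes: a PUT to create the mapping, a GET to confirm it.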

Using Twisted Python to expose the Cassandra route mappings via DNS. This stack will provide us with a distributed, highly available UDP DNS cluster, backed by Cassandra, in about 30 lines of Python code. We’ll then read routes using the “dig” shell client.
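To make the DNS step concrete, here is a hedged sketch of just the answer-building logic: given a raw UDP query for an “A” record, produce a response carrying the addresses registered for that name. A plain dict stands in for Cassandra, and the Twisted reactor and UDP socket loop are omitted; the record data and TTL are invented for illustration.

```python
# Sketch of building a DNS "A"-record response by hand (RFC 1035 wire
# format). ROUTES stands in for Cassandra; the UDP server loop that
# Twisted would provide is omitted.
import struct

ROUTES = {"app.example.com": ["10.0.0.5", "10.0.0.6"]}  # Cassandra stand-in

def encode_name(name):
    # "app.example.com" -> b"\x03app\x07example\x03com\x00"
    return b"".join(bytes([len(p)]) + p.encode() for p in name.split(".")) + b"\x00"

def decode_name(packet, offset=12):
    # Read the dot-separated name from the question section.
    labels = []
    while packet[offset]:
        n = packet[offset]
        labels.append(packet[offset + 1:offset + 1 + n].decode())
        offset += n + 1
    return ".".join(labels), offset + 1

def build_response(query):
    txid = query[:2]
    name, end = decode_name(query)
    question = query[12:end + 4]  # name + qtype + qclass, echoed back
    addrs = ROUTES.get(name, [])
    # Header: response flag set, 1 question, len(addrs) answers.
    header = txid + struct.pack(">HHHHH", 0x8180, 1, len(addrs), 0, 0)
    answers = b""
    for ip in addrs:
        rdata = bytes(int(o) for o in ip.split("."))
        # Pointer to the name at offset 12, type A, class IN, TTL 30s.
        answers += b"\xc0\x0c" + struct.pack(">HHIH", 1, 1, 30, 4) + rdata
    return header + question + answers
```

Wrapping `build_response` in a small UDP receive/send loop is what yields the roughly 30-line server the session describes; a `dig app.example.com @<server>` would then list both registered addresses.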

Using node.js as an HTTP reverse proxy to look up, cache (in local Redis), and use (with load balancing) routes discovered by reading the “A” records found in DNS. This part tops off the stack and allows us to distribute HTTP requests among application servers registered in Cassandra.
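The proxy tier’s lookup/cache/balance flow can be sketched independently of the stack. The session’s implementation is node.js with a local Redis cache; the Python version below only illustrates the flow, with a dict standing in for Redis and the TTL and hostnames invented for illustration.

```python
# Sketch of the reverse proxy's route-selection logic: resolve the "A"
# records for a host, cache them with a TTL (dict standing in for the
# local Redis), and round-robin requests across them. The session's
# actual implementation is node.js; this Python version shows the flow.
import time
import itertools

CACHE = {}  # hostname -> (expiry_timestamp, round-robin iterator)

def resolve_a_records(host):
    # Stand-in for a real DNS query against the cluster described above.
    import socket
    return [info[4][0] for info in socket.getaddrinfo(host, 80, socket.AF_INET)]

def pick_backend(host, ttl=30, resolve=resolve_a_records):
    entry = CACHE.get(host)
    if entry is None or entry[0] < time.time():
        # Cache miss or expired: re-read the A records, start a fresh cycle.
        entry = (time.time() + ttl, itertools.cycle(resolve(host)))
        CACHE[host] = entry
    return next(entry[1])  # round-robin among the registered app servers
```

Each incoming HTTP request would call `pick_backend` and proxy to the returned address, so traffic spreads across whatever application servers are currently registered in Cassandra.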

Since this is a case study, we’ll also consider how Pantheon manages the DNS/Cassandra layer to maintain full availability (read and write) during deployments and data center interlink failures.