Kubernetes: Distributed Stateful Apps using CockroachDB

As recently as December 2017, running databases in Kubernetes was challenging, especially for mission-critical online transaction processing (OLTP) workloads whose databases require strong consistency. At that time, rescheduling a database pod (i.e., moving it to a different machine) meant that the pod lost the disk it was attached to, and with it, the state it was managing.

Naturally, teams still needed to run databases, and they largely solved the problem by managing state outside of Kubernetes. But that meant running one critical component of the stack outside of Kubernetes, and because that component is so crucial, it still required a lot of supporting infrastructure, which might include:

Process monitoring

Configuration management

In-datacenter load balancing

Service discovery

Monitoring and logging

This is especially painful because all of these functions duplicate capabilities Kubernetes already provides.

Enter StatefulSets

As of Kubernetes 1.9 (released in December 2017), Kubernetes offers stable support for databases through StatefulSets, which reached general availability in that release. This feature lets you attach a persistent disk to a pod and keep the pod connected to that disk even if it gets rescheduled to another physical machine. This way, as your database pod gets rescheduled, it keeps its state.
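To make the persistent-disk idea concrete, here is a minimal sketch in Python that builds a StatefulSet manifest as a plain dictionary and prints it as JSON (kubectl accepts JSON as well as YAML). The names cockroachdb and datadir, the 100Gi size, and the mount path are illustrative assumptions. The manifest deliberately omits the CockroachDB start command, ports, and health probes, so it shows only how storage is wired, not a deployable configuration.

```python
import json


def statefulset_manifest(name: str, replicas: int, storage: str) -> dict:
    """Build a minimal apps/v1 StatefulSet whose pods each get their own
    PersistentVolumeClaim through volumeClaimTemplates (illustrative only)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Headless Service that gives each pod a stable DNS name (assumed to exist).
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        # Assumed image; start command, ports, and probes are omitted.
                        "image": "cockroachdb/cockroach",
                        "volumeMounts": [
                            {"name": "datadir", "mountPath": "/cockroach/cockroach-data"}
                        ],
                    }],
                },
            },
            # One claim per pod is created from this template; the claim outlives
            # the pod, so a rescheduled pod re-attaches to the same volume.
            "volumeClaimTemplates": [{
                "metadata": {"name": "datadir"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


if __name__ == "__main__":
    # Save the output to a file and apply it with kubectl, e.g. statefulset.json.
    print(json.dumps(statefulset_manifest("cockroachdb", 3, "100Gi"), indent=2))
```

The key field is volumeClaimTemplates: Kubernetes creates one PersistentVolumeClaim per pod from it and, by default, does not delete that claim when the pod is deleted or rescheduled.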

StatefulSets & SQL

While StatefulSets were a huge boon for managing databases, a Kubernetes environment is still difficult for legacy SQL databases to handle. One major factor that limits their ability to integrate with Kubernetes is:

Scale: Popular solutions like MySQL and PostgreSQL weren't built to scale dynamically. Getting them to work across multiple machines requires complex sharding technology that is bolted onto the database and isn't trivial to configure, let alone in a dynamically orchestrated environment.
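To see why bolted-on sharding is awkward when machines come and go, consider the simplest scheme: route each key by its hash modulo the shard count. This is a generic illustration in Python, not how any specific MySQL or PostgreSQL sharding tool works; the key names and shard counts are made up.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Naive hash-mod-N routing: hash the key, then take it modulo the shard count."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard assignment changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards))
    return moved / len(keys)


# Hypothetical key set used only to measure data movement.
keys = [f"user-{i}" for i in range(10_000)]
print(f"{fraction_moved(keys, 4, 5):.0%} of keys move when going from 4 to 5 shards")
```

Going from 4 to 5 shards keeps a key in place only when its hash leaves the same remainder modulo 4 and modulo 5, which holds for about one hash value in five, so roughly 80% of the data has to move. Smarter placement schemes reduce that, but the sharding layer still has to coordinate the movement itself.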

A Better SQL Solution: CockroachDB

Rather than running your mission-critical database apart from the rest of your infrastructure, or using dated technology ill-suited to the environment, teams now have the option of running a cloud-native SQL database like CockroachDB within Kubernetes.

How CockroachDB Works on Kubernetes

CockroachDB’s origin story has a major parallel to Kubernetes’: both have their roots in Google’s infrastructure. While CockroachDB is modeled after Google’s scalable and consistent database, Spanner, Kubernetes is a direct descendant of Google’s orchestration system, Borg. This shared ideological DNA makes it natural that the two would work well together.

Kubernetes’ StatefulSets feature was a huge step toward simplifying support for stateful services. With it, database pods that are rescheduled to other nodes can “keep” the same remote disk and simply re-attach to it on their new Kubernetes node. For more details about this, check out our blog post: The State of Stateful Apps in Kubernetes.
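The reason a rescheduled pod finds its disk again is that StatefulSet naming is deterministic: pods are named after the StatefulSet plus an ordinal, and each claim created from a volumeClaimTemplate is named after the template plus the pod name. A small Python sketch of that naming rule, using the illustrative names cockroachdb and datadir:

```python
def pod_name(statefulset: str, ordinal: int) -> str:
    """StatefulSet pods get stable, ordinal-based names: <statefulset>-<ordinal>."""
    return f"{statefulset}-{ordinal}"


def pvc_name(claim_template: str, statefulset: str, ordinal: int) -> str:
    """Claims created from volumeClaimTemplates are named <template>-<pod name>."""
    return f"{claim_template}-{pod_name(statefulset, ordinal)}"


if __name__ == "__main__":
    for i in range(3):
        # e.g. cockroachdb-0 -> datadir-cockroachdb-0
        print(pod_name("cockroachdb", i), "->", pvc_name("datadir", "cockroachdb", i))
```

Because cockroachdb-1 is always recreated with the same name, it binds to datadir-cockroachdb-1 again on whichever Kubernetes node it lands on, as long as that node can reach the volume.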

CockroachDB was designed as a highly available, fault-tolerant database that can withstand chaotic deployments, a capability powered by its Multi-Active Availability model. This model lets it accept reads and writes on any CockroachDB node without sacrificing serializable isolation. Through multi-active availability, CockroachDB handles rescheduling gracefully: moving between Kubernetes nodes is no different from a temporary node outage, which CockroachDB is well equipped to handle.
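A rescheduled pod looks like an ordinary outage because of consensus replication. CockroachDB replicates each range of data (three replicas by default) and uses Raft, so a range keeps serving reads and writes as long as a majority of its replicas is reachable. The arithmetic is sketched below in Python, assuming each replica of a range lives on a different CockroachDB pod.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a Raft majority (quorum)."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas of a range can be unavailable while it still has a quorum."""
    return replicas - majority(replicas)


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(f"{n} replicas: quorum {majority(n)}, tolerates {tolerated_failures(n)} unavailable")
```

With the default three replicas, one pod can be down or in the middle of being rescheduled without the affected ranges losing quorum; tolerating two simultaneous losses requires five replicas.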

CockroachDB on Kubernetes Deployment Strategy

To put CockroachDB in Kubernetes you have two distinct options:

StatefulSets, which leverage remote persistent volumes for storage and are managed like the rest of your Kubernetes pods (meaning they can easily be rescheduled)

DaemonSets, which let you leverage a node’s local disk, but largely eschew letting Kubernetes manage them (they do not get rescheduled)
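For comparison, a DaemonSet runs one pod on every eligible node and, for this use case, stores data on the node's own disk through a hostPath volume. The Python sketch below builds such a manifest; the node label, host directory, and mount path are hypothetical, and the CockroachDB start command is omitted.

```python
import json


def daemonset_manifest(name: str, host_dir: str) -> dict:
    """Build a minimal apps/v1 DaemonSet that keeps data on each node's local disk."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            # No replica count: the controller runs one pod per eligible node.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Hypothetical node label; only nodes carrying it run a pod.
                    "nodeSelector": {"cockroachdb-node": "true"},
                    "containers": [{
                        "name": name,
                        # Assumed image; start command, ports, and probes are omitted.
                        "image": "cockroachdb/cockroach",
                        "volumeMounts": [
                            {"name": "datadir", "mountPath": "/cockroach/cockroach-data"}
                        ],
                    }],
                    # hostPath ties the data to this particular machine's disk,
                    # so the pod cannot move elsewhere together with its state.
                    "volumes": [{"name": "datadir", "hostPath": {"path": host_dir}}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(daemonset_manifest("cockroachdb", "/mnt/disks/cockroach"), indent=2))
```

Note the absence of a replicas field and of any claim template: the number of CockroachDB nodes follows the number of labeled machines, and the data stays on the machine it was written to.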

Choosing a Deployment Strategy

As with most infrastructure decisions, choosing between StatefulSets and DaemonSets involves trade-offs. The choice ultimately depends on your level of comfort with Kubernetes (StatefulSets are simpler to implement) and on how much you want Kubernetes to drive your application (DaemonSets don’t let Kubernetes reschedule pods).

For most users, we recommend deploying CockroachDB through StatefulSets; it’s straightforward and behaves like all of your other orchestrated services. However, if you are interested in DaemonSets, we have some guidance in our documentation.

StatefulSets Deployment Overview

So, what does it look like to run CockroachDB on Kubernetes through StatefulSets? Here’s an overview of the environment you’ll need.

A Kubernetes cluster

A Kubernetes node for each CockroachDB node you want to run, all in the same datacenter/availability zone

We recommend putting each CockroachDB node on a separate machine to optimize fault tolerance: if a machine goes down, you lose only one CockroachDB node. The Kubernetes scheduler tends to spread pods this way anyway.

We recommend a single datacenter or availability zone when using Kubernetes with CockroachDB. It’s possible to deploy CockroachDB on Kubernetes across multiple availability zones, but as of CockroachDB 2.0, we don’t recommend it for most users because of the complexity of exposing internal network names across Kubernetes clusters.

A load balancing service for your CockroachDB cluster

For StatefulSets, you’ll also have a persistent volume for each CockroachDB node (at the time of writing, StatefulSet support for local disks is still in beta)

Monitoring for your Kubernetes cluster through a tool like Prometheus
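Two items from this list translate directly into Kubernetes objects: spreading CockroachDB pods across machines (pod anti-affinity on the kubernetes.io/hostname topology key) and the load-balancing Service. The Python sketch below builds both as plain dictionaries. The cockroachdb-public Service name and the app label are assumptions; 26257 and 8080 are CockroachDB's default SQL/inter-node and HTTP ports.

```python
import json


def anti_affinity(app: str) -> dict:
    """Prefer (not require) that no two pods with this app label share a host."""
    return {
        "podAntiAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [{
                "weight": 100,
                "podAffinityTerm": {
                    "labelSelector": {
                        "matchExpressions": [
                            {"key": "app", "operator": "In", "values": [app]}
                        ]
                    },
                    # One "topology domain" per machine, so pods spread across hosts.
                    "topologyKey": "kubernetes.io/hostname",
                },
            }]
        }
    }


def load_balancer_service(app: str) -> dict:
    """A Service that spreads client connections across all pods with the app label."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{app}-public", "labels": {"app": app}},  # assumed name
        "spec": {
            "selector": {"app": app},
            "ports": [
                {"name": "grpc", "port": 26257, "targetPort": 26257},  # SQL and inter-node
                {"name": "http", "port": 8080, "targetPort": 8080},    # Admin UI and metrics
            ],
        },
    }


if __name__ == "__main__":
    # The affinity block goes under the pod template's spec in the StatefulSet.
    print(json.dumps({"affinity": anti_affinity("cockroachdb")}, indent=2))
    print(json.dumps(load_balancer_service("cockroachdb"), indent=2))
```

The anti-affinity rule is "preferred" rather than "required": the scheduler spreads pods when it can but still places them if there are fewer machines than CockroachDB nodes. Because the Service sets no type, it defaults to ClusterIP, so Kubernetes balances in-cluster client connections across all matching pods. The HTTP port is also where CockroachDB serves Prometheus-format metrics, at /_status/vars.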

Getting Everything Up & Running

For deploying CockroachDB on Kubernetes, our in-depth guide covers everything you need.

If you aren’t ready to move something into production yet, check out our more lightweight Kubernetes tutorial.