Orchestrate CockroachDB with Kubernetes

This page shows you how to orchestrate the deployment and management of a secure 3-node CockroachDB cluster with Kubernetes, using the StatefulSet feature.

If you are only testing CockroachDB, or you are not concerned with protecting network communication with TLS encryption, you can use an insecure cluster instead. Select Insecure above for instructions.

Tip:

For details about potential performance bottlenecks to be aware of when running CockroachDB in Kubernetes and guidance on how to optimize your deployment for better performance, see CockroachDB Performance on Kubernetes.

Before You Begin

Before getting started, it's helpful to review some Kubernetes-specific terminology and current limitations.

Kubernetes Terminology

instance

A physical or virtual machine. In this tutorial, you'll create GCE or AWS instances and join them into a single Kubernetes cluster from your local workstation.

pod

A pod is a group of one or more Docker containers. In this tutorial, each pod will run on a separate instance and include one Docker container running a single CockroachDB node. You'll start with 3 pods and grow to 4.

StatefulSet

A StatefulSet is a group of pods treated as stateful units, where each pod has distinguishable network identity and always binds back to the same persistent storage on restart. StatefulSets are considered stable as of Kubernetes version 1.9 after reaching beta in version 1.5.

persistent volume

A persistent volume is a piece of networked storage (Persistent Disk on GCE, Elastic Block Store on AWS) mounted into a pod. The lifetime of a persistent volume is decoupled from the lifetime of the pod that's using it, ensuring that each CockroachDB node binds back to the same storage on restart.

This tutorial assumes that dynamic volume provisioning is available. When that is not the case, persistent volume claims need to be created manually.

CSR

A CSR, or Certificate Signing Request, is a request to have a TLS certificate signed by the Kubernetes cluster's built-in CA. As each pod is created, it issues a CSR for the CockroachDB node running in the pod, which must be manually checked and approved. The same is true for clients as they connect to the cluster.

RBAC

RBAC, or Role-Based Access Control, is the system Kubernetes uses to manage permissions within the cluster. In order to take an action (e.g., get or create) on an API resource (e.g., a pod or CSR), the client must have a Role that allows it to do so. This tutorial creates the RBAC resources necessary for CockroachDB to create and access certificates.

Limitations

Kubernetes Version

Kubernetes 1.8 or higher is required to use our most up-to-date configuration files; earlier Kubernetes releases do not support some of the options they use. If you need to run on an older version of Kubernetes, configuration files that work with older releases are available in the versioned subdirectories of https://github.com/cockroachdb/cockroach/tree/master/cloud/kubernetes (e.g., v1.7).

Storage

At this time, orchestrating CockroachDB with Kubernetes uses external persistent volumes that are often replicated by the provider. Because CockroachDB already replicates data automatically, this additional layer of replication is unnecessary and can negatively impact performance. High-performance use cases on a private Kubernetes cluster may want to consider a DaemonSet deployment until StatefulSets support node-local storage.

Step 1. Choose your deployment environment

Choose whether you want to orchestrate CockroachDB with Kubernetes using the hosted Google Kubernetes Engine (GKE) service or manually on Google Compute Engine (GCE) or AWS. The instructions below will change slightly depending on your choice.

Step 2. Start Kubernetes

Setting up the hosted Kubernetes Engine service includes installing gcloud, which is used to create and delete Kubernetes Engine clusters, and kubectl, the command-line tool used to manage Kubernetes from your workstation.

Tip:

The documentation offers the choice of using Google's Cloud Shell product or using a local shell on your machine. Choose to use a local shell if you want to be able to view the CockroachDB Admin UI using the steps in this guide.

From your local workstation, start the Kubernetes cluster:


$ gcloud container clusters create cockroachdb

Creating cluster cockroachdb...done.

This creates GKE instances and joins them into a single Kubernetes cluster named cockroachdb.

The process can take a few minutes, so don't move on to the next step until you see a Creating cluster cockroachdb...done message and details about your cluster.
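Step 3. Start CockroachDB nodes

The resources listed below are typically created by applying CockroachDB's secure StatefulSet configuration. A minimal sketch, assuming the cockroachdb-statefulset-secure.yaml file lives in the cockroachdb/cockroach repository directory referenced earlier on this page (the exact URL is an assumption):

```shell
# Create the StatefulSet and its supporting resources (service account,
# RBAC roles and bindings, services, and a pod disruption budget).
$ kubectl create -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/cockroachdb-statefulset-secure.yaml
```

On success, kubectl reports each created resource, as shown below.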

serviceaccount "cockroachdb" created
role "cockroachdb" created
clusterrole "cockroachdb" created
rolebinding "cockroachdb" created
clusterrolebinding "cockroachdb" created
service "cockroachdb-public" created
service "cockroachdb" created
poddisruptionbudget "cockroachdb-budget" created
statefulset "cockroachdb" created

Step 4. Approve node certificates

As each pod is created, it issues a Certificate Signing Request, or CSR, to have the node's certificate signed by the Kubernetes CA. You must manually check and approve each node's certificates, at which point the CockroachDB node is started in the pod.
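The approval commands for the node certificates are not shown in this excerpt; a sketch, assuming the default CSR names for a 3-pod StatefulSet named cockroachdb:

```shell
# List pending CSRs to find the node certificate requests.
$ kubectl get csr

# Approve each node's certificate in turn
# (repeat for default.node.cockroachdb-1 and default.node.cockroachdb-2).
$ kubectl certificate approve default.node.cockroachdb-0
```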

Approve the CSR for the one-off pod from which cluster initialization happens:


$ kubectl certificate approve default.client.root

certificatesigningrequest "default.client.root" approved

Confirm that cluster initialization has completed successfully:


$ kubectl get job cluster-init-secure

NAME                  DESIRED   SUCCESSFUL   AGE
cluster-init-secure   1         1            19m

Tip:

The StatefulSet configuration sets all CockroachDB nodes to write to stderr, so if you ever need access to a pod/node's logs to troubleshoot, use kubectl logs <podname> rather than checking the log on the persistent volume.

Step 6. Test the cluster

To use the built-in SQL client, you need to launch a pod that runs indefinitely with the cockroach binary inside it, check and approve the CSR for the pod, get a shell into the pod, and then start the built-in SQL client.

From your local workstation, use our client-secure.yaml file to launch a pod and keep it running indefinitely:
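The commands for this step are not shown in this excerpt; a minimal sketch, assuming client-secure.yaml lives alongside the other configuration files in the cockroachdb/cockroach repository and creates a pod named cockroachdb-client-secure (file URL, pod name, and certificate directory are assumptions):

```shell
# Launch the long-running client pod.
$ kubectl create -f https://raw.githubusercontent.com/cockroachdb/cockroach/master/cloud/kubernetes/client-secure.yaml

# Approve the client's certificate so it can connect securely.
$ kubectl certificate approve default.client.root

# Get a shell into the pod and start the built-in SQL client.
$ kubectl exec -it cockroachdb-client-secure -- ./cockroach sql --certs-dir=/cockroach-certs --host=cockroachdb-public
```

Once in the SQL shell, you can create a test database (e.g., CREATE DATABASE bank;) to verify that writes work; the Databases tab check in the monitoring step assumes such a database exists.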

This pod will continue running indefinitely, so any time you need to reopen the built-in SQL client or run any other cockroach client commands, such as cockroach node or cockroach zone, repeat step 2 using the appropriate cockroach command. If you'd prefer to delete the pod and recreate it when needed, run kubectl delete pod cockroachdb-client-secure.

Step 7. Monitor the cluster
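The port-forward command referenced in this step can be sketched as follows, assuming the Admin UI listens on its default port 8080 on pod cockroachdb-0:

```shell
# Forward port 8080 on your local machine to the same port on one of the pods.
$ kubectl port-forward cockroachdb-0 8080
```

Then open the Admin UI at http://localhost:8080 in your browser (the scheme may be https depending on your version's TLS settings).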

The port-forward command must be run on the same machine as the web browser in which you want to view the Admin UI. If you have been running these commands from a cloud instance or other non-local shell, you will not be able to view the UI without configuring kubectl locally and running the above port-forward command on your local machine.

Click View nodes list on the right to ensure that all nodes successfully joined the cluster.

Click the Databases tab on the left to verify that bank is listed.

Step 8. Simulate node failure

Based on the replicas: 3 line in the StatefulSet configuration, Kubernetes ensures that three pods/nodes are running at all times. When a pod/node fails, Kubernetes automatically creates another pod/node with the same network identity and persistent storage.

To see this in action:

Kill one of the CockroachDB nodes:


$ kubectl delete pod cockroachdb-2

pod "cockroachdb-2" deleted

In the Admin UI, the Summary panel will soon show one node as Suspect. Watch as Kubernetes auto-restarts the pod and the node once again becomes healthy.

Back in the terminal, verify that the pod was automatically restarted:


$ kubectl get pod cockroachdb-2

NAME            READY     STATUS    RESTARTS   AGE
cockroachdb-2   1/1       Running   0          12s

Step 9. Scale the cluster

The Kubernetes cluster contains 4 nodes, one master and 3 workers. Pods get placed only on worker nodes, so to ensure that you don't have two pods on the same node (as recommended in our production best practices), you need to add a new worker node and then edit your StatefulSet configuration to add another pod.
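The resize and scale commands for this step can be sketched as follows, assuming a GKE cluster named cockroachdb (exact gcloud flag names may vary across versions):

```shell
# Add a worker instance to the Kubernetes cluster.
$ gcloud container clusters resize cockroachdb --num-nodes=4

# Add a fourth CockroachDB pod to the StatefulSet.
$ kubectl scale statefulset cockroachdb --replicas=4

# As with the original pods, approve the new node's CSR once it appears.
$ kubectl certificate approve default.node.cockroachdb-3
```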

Back in the Admin UI, view Node List to ensure that the fourth node successfully joined the cluster.

Step 10. Upgrade the cluster

As new versions of CockroachDB are released, it's strongly recommended to upgrade to newer versions in order to pick up bug fixes, performance improvements, and new features. The general CockroachDB upgrade documentation provides best practices for how to prepare for and execute upgrades of CockroachDB clusters, but the mechanism of actually stopping and restarting processes in Kubernetes is somewhat special.

Kubernetes knows how to carry out a safe rolling upgrade process of the CockroachDB nodes. When you tell it to change the Docker image used in the CockroachDB StatefulSet, Kubernetes will go one-by-one, stopping a node, restarting it with the new image, and waiting for it to be ready to receive client requests before moving on to the next one. For more information, see the Kubernetes documentation.

All that it takes to kick off this process is changing the desired Docker image. To do so, pick the version that you want to upgrade to, then run the following command, replacing "VERSION" with your desired new version:
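One way to change the image is with kubectl patch, sketched here (the image repository name cockroachdb/cockroach is an assumption):

```shell
# Replace VERSION with the desired release, e.g., v2.0.0. Kubernetes then
# performs the rolling restart described above, one pod at a time.
$ kubectl patch statefulset cockroachdb --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value": "cockroachdb/cockroach:VERSION"}]'
```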

If this was an upgrade between minor or major versions (e.g., between v1.0.x and v1.1.y, or between v1.1.y and v2.0.z), you'll want to finalize the upgrade once you're happy with the new version. Assuming you upgraded to v2.0, you'd run:
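The finalization command can be sketched as a SQL statement run through the secure client pod created earlier (pod name and certificate directory are assumptions carried over from that step):

```shell
# Set the cluster version setting to finalize the upgrade to v2.0.
$ kubectl exec -it cockroachdb-client-secure -- ./cockroach sql --certs-dir=/cockroach-certs --host=cockroachdb-public -e "SET CLUSTER SETTING version = '2.0';"
```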