Stardog on Kubernetes

Deploying Stardog Cluster on Kubernetes is a walk in the park. In this
post we show you how easy it is and how it works.

tl;dr

Now you can deploy Stardog Cluster on Kubernetes with a single command:

kubectl apply -f stardog-cluster.yaml

Once the pods are running, you’ll have a three-node Stardog Cluster
behind a load balancer, ready to go.

Overview

We’ll go through the specifics of what this command does (and the contents of the stardog-cluster.yaml file) a bit later in the post,
but here’s a brief overview. The YAML file first instructs Kubernetes to create a namespace so
we don’t stomp on the default one:

apiVersion: v1
kind: Namespace
metadata:
  name: stardog-k8s

Namespaces allow you to scope your objects (pods, services, etc.) within a single
physical cluster. It’s not required that we create a namespace, but it’s generally
good practice to avoid using the default one unless it’s a small development cluster.

After the namespace is created, the file adds two secrets: the credentials for
Artifactory (so Kubernetes can download the Stardog Docker image) and a Stardog license.
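As a rough sketch, the two secrets might look like the following. The angle-bracket placeholders stand in for the base64 strings described later in this post. The secret names (artifactory-credentials, stardog-license) and the license data key are assumptions made for illustration and may differ from the published file.

```yaml
# Image pull secret for the Artifactory Docker registry.
# The name "artifactory-credentials" is an assumption for illustration.
apiVersion: v1
kind: Secret
metadata:
  name: artifactory-credentials
  namespace: stardog-k8s
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: <base64 encoded string of credentials>
---
# Stardog license stored as an opaque secret.
# The name "stardog-license" and the data key are assumptions as well.
apiVersion: v1
kind: Secret
metadata:
  name: stardog-license
  namespace: stardog-k8s
type: Opaque
data:
  stardog-license-key.bin: <base64 encoded string of the license file>
```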

Once the namespace is created and the secrets are in place, the cluster is
deployed. ZooKeeper is deployed and configured first, followed by the Stardog
Cluster. Finally, it creates a service with an externally accessible load balancer
that exposes Stardog Cluster. The command works with the Kubernetes
platforms we’ve tested, including Amazon EKS, Google Kubernetes Engine, and
Azure Kubernetes Service (AKS).

Now, let’s take a step back and talk about Kubernetes: what it is, why you
may want to use it, and how we deploy Stardog on it.

A Brief Introduction to Kubernetes

Kubernetes is a container orchestration platform to help users deploy, manage, and scale containerized
applications. Kubernetes refers to a set of containers as a pod. Pods can be a confusing concept at first, as they may consist of multiple containers. If you want to read more about pods you can do so in the
Kubernetes docs.
For our purposes, it’s easiest to think of each ZooKeeper container and each
Stardog container as individual pods.

Kubernetes automatically deploys pods on hosts and monitors them to ensure that
they remain running, restarting them if necessary (possibly on other hosts
in the Kubernetes cluster). For example, if an underlying host crashes, Kubernetes
automatically starts the pods from the bad host on a working host in the
cluster. Or if a host needs to undergo maintenance, Kubernetes
can drain the host and start the pods on another host.
If a user decides they need an extra instance of their application,
they can simply increment the count and Kubernetes takes care of
the rest: it finds a host to deploy the pod, starts the pod, and
performs any required configuration for it. If the application is behind a
load balancer, Kubernetes automatically adds the pod
to the load balancer and directs traffic to it once it’s online.

For simple stateless applications, such as a web application with state
maintained in a database elsewhere, it’s easy to see how Kubernetes can
effectively manage and scale the web server deployment. Kubernetes doesn’t
have to track anything specific about the containers; it can simply launch
replacement containers on any host in the cluster when needed.

What about more complex, stateful applications, such as a database? This is a
little trickier, but Kubernetes has primitives that make it possible to run those
applications as well. Typically, stateful applications require a few
guarantees: a consistent name, ordered deployments (if multiple services), and
persistent storage. If a host goes down, Kubernetes needs to make sure the stateful
containers on the new host keep the same name and underlying storage, regardless of
where they’re restarted in the cluster. StatefulSets do exactly that,
which is what we use for both ZooKeeper and Stardog containers in our deployment.

Getting Started with Kubernetes

There are a number of getting started guides for Kubernetes, depending on which platform
you want to use:
Google Kubernetes Engine,
Azure Kubernetes Service (AKS), or
Amazon EKS, among others.
Of course, you can also deploy your own Kubernetes cluster manually. However, choosing one of
the managed platforms is typically a good starting point.

You’ll also need kubectl on your system and a kube config file for your particular
Kubernetes cluster. The kube config file is what specifies the Kubernetes cluster to use
and the credentials for it.

We’ll leave the details of setting up a Kubernetes cluster and configuring
kubectl to the getting started guide for the platform of your choice.

Stardog Cluster on Kubernetes

Once you’re set up with a Kubernetes cluster, let’s dig into the details of the Stardog
HA Cluster deployment.

The full stardog-cluster.yaml file is available in the stardog-examples GitHub repository. The file includes everything except the secrets: base64-encoded strings for
Artifactory credentials and a Stardog license.

You can encode your Stardog license with:

base64 stardog-license-key.bin

Replace <base64 encoded string of the license file> in stardog-cluster.yaml with the
base64 string.
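One detail to watch: GNU base64 wraps its output at 76 characters by default, and a wrapped value will not paste cleanly into a single YAML field. The sketch below produces one unbroken line on both Linux and macOS. The helper name encode_one_line is hypothetical.

```shell
# encode_one_line FILE: print FILE as base64 on a single line.
# Reading from stdin and stripping newlines avoids depending on
# GNU-only (-w 0) or BSD-only (-i) flags.
encode_one_line() {
  base64 < "$1" | tr -d '\n'
}

# Usage (prints the string to paste into stardog-cluster.yaml):
#   encode_one_line stardog-license-key.bin
```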

There are a number of ways to add Artifactory credentials to Kubernetes. However, if you have
any special characters in your password, the easiest way is to create an
artifactory-credentials.json file containing your username and password as JSON, and then base64-encode that file the same way as the license.

Replace <base64 encoded string of credentials> in stardog-cluster.yaml with the base64 string.
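Here is a sketch of that file and its encoding, assuming it follows Docker's registry config layout, which is the format Kubernetes reads from secrets of type kubernetes.io/dockerconfigjson. The registry host, username, and password are placeholders, not real values; substitute the ones issued to you.

```shell
# Write the credentials file. A quoted heredoc keeps special characters
# in the password from being interpreted by the shell (characters such as
# double quotes or backslashes still need JSON escaping).
# <artifactory-host>, <username>, and <password> are placeholders.
cat > artifactory-credentials.json <<'EOF'
{
  "auths": {
    "<artifactory-host>": {
      "username": "<username>",
      "password": "<password>"
    }
  }
}
EOF

# Encode the file on one line for pasting into stardog-cluster.yaml.
base64 < artifactory-credentials.json | tr -d '\n'
```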

As we mentioned earlier, we first create a namespace for our deployment,
called stardog-k8s. Everything else is configured and deployed into that namespace.
Namespaces provide a scope for objects in a deployment (services, pods, etc.) within a single
physical Kubernetes cluster. There are no hard and fast rules that you must follow for
namespaces. Kubernetes provides a default namespace that may work for small development
clusters with a few users. However, as your use of Kubernetes grows, you may choose to consider
creating separate namespaces for different teams or efforts within your organization.

At Stardog we don’t use the default namespace. Instead, developers create their own namespaces for testing,
and larger efforts are organized into their own namespaces as well. By default, kubectl uses the default namespace, so if you want to see objects created in another namespace, you must specify it, e.g.:

kubectl -n stardog-k8s get pods

This makes our lives a bit harder, but it also means that we are, as a vendor, less likely to cause name clashes with customer deployments or environments. You can also set your preferred namespace
in your kubeconfig file so you don’t have to specify it for each kubectl command.
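One way to do that, as a sketch: a reasonably recent kubectl can record the namespace on the current context. The wrapper name use_namespace is hypothetical; the underlying kubectl config set-context command is standard.

```shell
# use_namespace NAME: make NAME the default namespace for the current
# kubectl context, so later commands no longer need -n NAME.
use_namespace() {
  kubectl config set-context --current --namespace="$1"
}

# Usage:
#   use_namespace stardog-k8s
#   kubectl get pods        # now lists pods in stardog-k8s
```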

The stardog-cluster.yaml file has a number of different sections worth discussing. It creates services for both Stardog and ZooKeeper. It contains a ConfigMap object that specifies the stardog.properties necessary to enable the cluster. The two major sections are
the StatefulSets for the Stardog and ZooKeeper nodes. The ZooKeeper section largely mirrors a Kubernetes template for ZooKeeper deployments from the Kubernetes GitHub project.

We use a StatefulSet for Stardog and ZooKeeper pods, allowing the pods to maintain consistent DNS names regardless of which host they’re on. With StatefulSets, each pod is assigned an ordinal index (e.g., zk-0, zk-1, and zk-2) that is consistent across restarts. The stardog.properties configuration file takes advantage of this by specifying the 3 ZooKeeper servers ahead of time using their names, e.g., zk-0.zk-service.stardog-k8s. The StatefulSet also ensures that the underlying data volumes
for ZooKeeper and Stardog stay with their pods wherever those pods are rescheduled in the cluster.
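Because the ordinal names are predictable, the ZooKeeper connection string can be written down before any pod exists. The sketch below builds it for three replicas. The client port 2181 is ZooKeeper's default and an assumption here, as is using the result as the value of the pack.zookeeper.address property in stardog.properties. The function name zk_address is hypothetical.

```shell
# zk_address REPLICAS SERVICE NAMESPACE [PORT]
# Build "zk-0.<svc>.<ns>:<port>,zk-1...,..." from the StatefulSet naming scheme.
zk_address() {
  replicas=$1; service=$2; namespace=$3; port=${4:-2181}
  i=0; addr=""
  while [ "$i" -lt "$replicas" ]; do
    # ${addr:+,} inserts a comma only once addr is non-empty.
    addr="${addr}${addr:+,}zk-${i}.${service}.${namespace}:${port}"
    i=$((i + 1))
  done
  printf '%s\n' "$addr"
}

zk_address 3 zk-service stardog-k8s
```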

replicas: 3 instructs Kubernetes to create a 3 node Stardog Cluster and to keep those 3 nodes running. If you need a bigger cluster, just specify a bigger value for replicas. The podAntiAffinity section tells Kubernetes to deploy the 3 Stardog nodes on different hosts in the Kubernetes cluster so a single host failure will only result in a single Stardog container failing.
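A trimmed-down sketch of how those pieces fit together in a StatefulSet follows. The name stardog-cluster matches the pod names used in this post; the label, service name, image placeholder, mount path, and storage size are assumptions for illustration rather than values copied from stardog-cluster.yaml.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: stardog-cluster          # pods become stardog-cluster-0, -1, -2
  namespace: stardog-k8s
spec:
  serviceName: stardog-cluster   # assumed name of the governing service
  replicas: 3                    # raise this for a bigger cluster
  selector:
    matchLabels:
      app: stardog-cluster       # label key and value are assumptions
  template:
    metadata:
      labels:
        app: stardog-cluster
    spec:
      affinity:
        podAntiAffinity:
          # Refuse to schedule two Stardog pods on the same host.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: stardog-cluster
            topologyKey: kubernetes.io/hostname
      containers:
      - name: stardog
        image: <stardog image>   # placeholder
        ports:
        - containerPort: 5820
        volumeMounts:
        - name: data
          mountPath: /var/opt/stardog   # assumed Stardog home directory
  # Each pod gets its own claim, which follows it across restarts.
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi          # assumed size
```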

Because the stardog-cluster.yaml contains both the ZooKeeper nodes and Stardog nodes, we use an initContainer for the Stardog nodes that forces them to wait until ZooKeeper is ready before they start. Finally, the deployment also creates a load balancer and exposes
an external address for the Stardog Cluster. You can list the services in the stardog-k8s namespace:

kubectl -n stardog-k8s get svc

The external Stardog service lists an EXTERNAL-IP you can use to reach the cluster:

stardog-admin --server http://<external-ip>:5820 cluster info
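As for the initContainer that holds Stardog back until ZooKeeper is up, its job amounts to a polling loop. This minimal sketch assumes the container image ships nc and that ZooKeeper has the ruok four-letter command enabled (newer ZooKeeper releases require whitelisting it). The function name and retry budget are hypothetical, not taken from stardog-cluster.yaml.

```shell
# wait_for_zk HOST PORT [ATTEMPTS]
# Poll ZooKeeper's "ruok" command once per second until it answers "imok".
# Returns 0 when ZooKeeper is ready, 1 if the attempts run out.
wait_for_zk() {
  host=$1; port=$2; attempts=${3:-60}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(echo ruok | nc -w 2 "$host" "$port" 2>/dev/null)" = "imok" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage inside an init container:
#   wait_for_zk zk-0.zk-service.stardog-k8s 2181 || exit 1
```

An init container must exit successfully before the main containers of the pod start, so a loop like this delays Stardog without changing the Stardog image itself.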

The liveness probe for Stardog pods uses the Stardog health check to determine whether each node is a functioning pod and whether the Kubernetes load balancer should route requests to it.

The health check waits 30 seconds once the container is created and then queries the Stardog health check endpoint every 5 seconds to verify that the node is still a cluster member.
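A probe with those timings could be sketched as the following fragment of the Stardog container spec. The path /admin/healthcheck is an assumption about the health check endpoint; the port is the one used to reach Stardog elsewhere in this post.

```yaml
# Fragment of the Stardog container spec (a sketch, not the published file).
livenessProbe:
  httpGet:
    path: /admin/healthcheck   # assumed health check path
    port: 5820
  initialDelaySeconds: 30      # wait 30 seconds after the container is created
  periodSeconds: 5             # then query the endpoint every 5 seconds
```

In Kubernetes terms, a failing liveness probe restarts the container, whereas a readinessProbe (which accepts the same fields) is what removes a pod from a service's endpoints.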

A Few Helpful Commands

Here are a few additional commands to help you examine the Kubernetes objects created.

List the namespaces in your Kubernetes cluster:

kubectl get namespaces

List the pods in your namespace:

kubectl -n stardog-k8s get pods

View the logs for a specific pod:

kubectl -n stardog-k8s logs zk-0

Connect into a pod in the cluster:

kubectl -n stardog-k8s exec -it stardog-cluster-0 -- sh

List the services (and their IPs and DNS names):

kubectl -n stardog-k8s get svc

Cleaning Up

To wrap up, let’s be a good user of our Kubernetes cluster and delete the objects we created. Luckily, deleting all of this is just as easy as creating it. Instead of running kubectl apply, we can run kubectl delete:

kubectl delete -f stardog-cluster.yaml

This command will delete everything, including the persistent volumes and namespace.

However, if you delete only the Stardog and ZooKeeper StatefulSets (and not the credentials or namespace), Kubernetes keeps the persistent volumes, so the data isn’t lost if a pod dies or is otherwise removed. You can list the persistent volume claims with kubectl -n stardog-k8s get pvc. Remember to delete each of them when you’re done, since every claim is backed by a physical volume (e.g., an EBS volume on AWS that is costing you money).
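A sketch of that cleanup, listing the claims by name and deleting each one. The helper name delete_all_pvcs is hypothetical. It removes every claim in the namespace, so run it only once the data is no longer needed.

```shell
# delete_all_pvcs NAMESPACE
# List every persistent volume claim in NAMESPACE and delete it.
# "get pvc -o name" prints one "persistentvolumeclaim/<name>" per line,
# which "kubectl delete" accepts directly.
delete_all_pvcs() {
  ns=$1
  kubectl -n "$ns" get pvc -o name | while read -r claim; do
    kubectl -n "$ns" delete "$claim" </dev/null
  done
}

# Usage:
#   kubectl -n stardog-k8s get pvc      # review the claims first
#   delete_all_pvcs stardog-k8s
```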

Conclusion

With Kubernetes, Stardog Cluster has never been easier to deploy and manage. Kubernetes provides powerful mechanisms to handle scheduled infrastructure maintenance and recover automatically from infrastructure failures. In future Kubernetes posts, we’ll cover some different failure scenarios and show how Stardog Cluster on Kubernetes adapts in the face of failure. We’ll also be looking at how you can use all this elastic computing power to scale Stardog Knowledge Graph services. Stay tuned.