10/27/14

For the last couple of weeks I've been playing around with docker and kubernetes. If you are not familiar with kubernetes, let's just say for now that it's an open source container cluster management implementation, which I find really awesome.

One of the first things I wanted to try out was running an Apache ZooKeeper ensemble inside kubernetes and I thought that it would be nice to share the experience.

For my experiments I used Docker v. 1.3.0 and OpenShift V3, which I built from source and which includes Kubernetes.

ZooKeeper on Docker

Managing a ZooKeeper ensemble is definitely not a trivial task. You usually need to configure an odd number of servers and all of the servers need to be aware of each other. This is a PITA on its own, but it gets even more painful when you are working with something as static as docker images. The main difficulty could be expressed as:

"How can you create multiple containers out of the same image and have them point to each other?"

One approach would be to use docker volumes and provide the configuration externally. This would mean that you create the configuration for each container, store it somewhere on the docker host and then pass it to each container as a volume at creation time.

I've never tried that myself, so I can't tell if it's a good or bad practice. I can see some benefits, but I can also see that this is something I am not really excited about. It could look like this:
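A minimal sketch of the volume approach, assuming each server's configuration has already been written to its own directory on the docker host. The image name `example/zookeeper`, the host directory `/opt/zookeeper-conf` and the container path `/opt/zookeeper/conf` are placeholders for illustration, not names from any real image.

```python
def docker_run_with_volume(server_id, host_conf_root="/opt/zookeeper-conf",
                           image="example/zookeeper"):
    """Build a `docker run` command that mounts a per-server config directory.

    All paths and the image name are hypothetical placeholders.
    """
    host_dir = f"{host_conf_root}/server-{server_id}"
    return [
        "docker", "run", "-d",
        "--name", f"zookeeper-{server_id}",
        # -v HOST_DIR:CONTAINER_DIR makes the prepared config visible in the container.
        "-v", f"{host_dir}:/opt/zookeeper/conf",
        image,
    ]


# One container per server, each pointing at its own prepared configuration.
for server_id in (1, 2, 3):
    print(" ".join(docker_run_with_volume(server_id)))
```

The same image is reused for every server; only the mounted directory differs.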

Another approach would be to pass all the required information as environment variables to the container at creation time and then create a wrapper script which will read the environment variables, modify the configuration files accordingly and launch zookeeper.
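As a rough sketch of what passing the information at creation time could look like, the function below builds a `docker run` command with one `-e` flag per value. The variable names follow the `ZK_*` convention used later in this post; the image name `example/zookeeper` and the addresses are placeholders.

```python
def docker_run_with_env(server_id, servers, image="example/zookeeper"):
    """Build a `docker run` command that passes the ensemble layout as env vars.

    `servers` maps server id -> (host, peer_port, election_port).
    The image name is a hypothetical placeholder.
    """
    cmd = ["docker", "run", "-d", "--name", f"zookeeper-{server_id}",
           "-e", f"ZK_SERVER_ID={server_id}"]
    for sid, (host, peer_port, election_port) in sorted(servers.items()):
        cmd += [
            "-e", f"ZK_PEER_{sid}_SERVICE_HOST={host}",
            "-e", f"ZK_PEER_{sid}_SERVICE_PORT={peer_port}",
            "-e", f"ZK_ELECTION_{sid}_SERVICE_PORT={election_port}",
        ]
    cmd.append(image)
    return cmd


# Example addresses are made up for illustration.
ensemble = {
    1: ("10.0.0.1", 2888, 3888),
    2: ("10.0.0.2", 2888, 3888),
    3: ("10.0.0.3", 2888, 3888),
}
print(" ".join(docker_run_with_env(1, ensemble)))
```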

This is definitely easier to use, but it's not flexible enough to allow other types of tuning without rebuilding the image itself.

Last but not least one could combine the two approaches into one and do something like:

Make it possible to provide the base configuration externally using volumes.

Use env and scripting to just configure the ensemble.

There are plenty of images out there that take one or the other approach. I am more fond of the environment variables approach and since I needed something that would follow some of the kubernetes conventions in terms of naming, I decided to hack an image of my own using the env variables way.

Creating a custom image for ZooKeeper

I will just focus on the configuration that is required for the ensemble. In order to configure a ZooKeeper ensemble, one has to assign a numeric id to each server and then add to its configuration one entry per zookeeper server, containing the ip of the server, the peer port of the server and the election port.

The server id is added in a file called myid under the dataDir. The rest of the configuration looks like:
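ZooKeeper expects these entries in the form `server.<id>=<host>:<peer port>:<election port>`. The sketch below renders them for a three-server ensemble; the addresses are made up, and 2888 and 3888 are the ports conventionally used for peer and election traffic.

```python
def server_entries(servers):
    """Render one `server.N=host:peer:election` line per ensemble member.

    `servers` maps server id -> (host, peer_port, election_port).
    """
    return [
        f"server.{sid}={host}:{peer_port}:{election_port}"
        for sid, (host, peer_port, election_port) in sorted(servers.items())
    ]


# Example addresses are placeholders.
ensemble = {
    1: ("10.0.0.1", 2888, 3888),
    2: ("10.0.0.2", 2888, 3888),
    3: ("10.0.0.3", 2888, 3888),
}
print("\n".join(server_entries(ensemble)))
# server.1=10.0.0.1:2888:3888
# server.2=10.0.0.2:2888:3888
# server.3=10.0.0.3:2888:3888
```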

Note that if the server id is X the server.X entry needs to contain the bind ip and ports and not the connection ip and ports.

So what we actually need to pass to the container as environment variables are the following:

The server id.

For each server in the ensemble:

The hostname or ip

The peer port

The election port

If these are set, then the script that updates the configuration could look like:
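A minimal sketch of such a script, written in Python here for illustration rather than as the script shipped in the image. It assumes the values have already been read from the environment into a dictionary, writes `myid` under the data directory, and appends the `server.N` entries to the configuration file. Using `0.0.0.0` as the bind address for the server's own entry is an assumption of this sketch.

```python
import tempfile
from pathlib import Path


def configure_ensemble(server_id, servers, data_dir, config_file):
    """Write `myid` and append `server.N` entries to the ZooKeeper config.

    `servers` maps server id -> (host, peer_port, election_port).
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    # The server id lives in a file called `myid` under dataDir.
    (data_dir / "myid").write_text(f"{server_id}\n")

    lines = []
    for sid, (host, peer_port, election_port) in sorted(servers.items()):
        if sid == server_id:
            # Our own entry carries the bind address, not the connection address
            # (0.0.0.0 is an assumption: listen on all interfaces).
            host = "0.0.0.0"
        lines.append(f"server.{sid}={host}:{peer_port}:{election_port}")

    with open(config_file, "a") as cfg:
        cfg.write("\n".join(lines) + "\n")
    return lines


# Try it against a throwaway directory; addresses are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    ensemble = {1: ("10.0.0.1", 2888, 3888), 2: ("10.0.0.2", 2888, 3888)}
    configure_ensemble(1, ensemble, Path(tmp) / "data", Path(tmp) / "zoo.cfg")
    print((Path(tmp) / "zoo.cfg").read_text())
```

A real wrapper would then start the ZooKeeper server process; that step is left out here.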

For simplicity the functions that read the keys and values from the environment are omitted.

The complete image and helper scripts to launch zookeeper ensembles of variable size can be found in the fabric8io repository.

ZooKeeper on Kubernetes

The docker image above can be used directly with docker, provided that you take care of the environment variables. Now I am going to describe how this image can be used with kubernetes. But first a little rambling...

What I really like about using kubernetes with ZooKeeper is that kubernetes will recreate the container if it dies or the health check fails. For ZooKeeper this also means that if a container that hosts an ensemble server dies, it will get replaced by a new one. This helps the ensemble keep a quorum of ZooKeeper servers, as long as a majority of them is still up while the replacement starts.
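To make the quorum point concrete, here is a small worked example of the majority rule ZooKeeper relies on: an ensemble of n servers needs more than half of them up. It also shows why an odd number of servers is the usual choice, since a fourth server tolerates no more failures than three do.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
```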

I also like that you don't need to worry about the connection string that the clients will use, if containers come and go. You can use kubernetes services to load balance across all the available servers and you can even expose that outside of kubernetes.

Creating a Kubernetes config for ZooKeeper

I'll try to explain how you can create a 3-server ZooKeeper ensemble in Kubernetes.

What we need is 3 docker containers all running ZooKeeper with the right environment variables:
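A rough sketch of one such container definition, built as a Python dictionary and printed as JSON. The field names (`ports`, `containerPort`, `env`) are modeled on Kubernetes container definitions but should be checked against the API version in use; the image name `example/zookeeper` and the port names `client`, `peer` and `election` are assumptions of this sketch. The surrounding pod definition is left out.

```python
import json


def zookeeper_container(server_id, image="example/zookeeper"):
    """Describe one ZooKeeper container with named ports and its server id.

    Field names are illustrative; the image name is a placeholder.
    """
    return {
        "name": f"zookeeper-{server_id}",
        "image": image,
        "ports": [
            {"name": "client", "containerPort": 2181},
            {"name": "peer", "containerPort": 2888},
            {"name": "election", "containerPort": 3888},
        ],
        "env": [
            {"name": "ZK_SERVER_ID", "value": str(server_id)},
        ],
    }


# Three containers that differ only in name and server id.
for server_id in (1, 2, 3):
    print(json.dumps(zookeeper_container(server_id), indent=2))
```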

The env needs to specify all the parameters discussed previously.

So we need to add along with the ZK_SERVER_ID, the following:

ZK_PEER_1_SERVICE_HOST

ZK_PEER_1_SERVICE_PORT

ZK_ELECTION_1_SERVICE_PORT

ZK_PEER_2_SERVICE_HOST

ZK_PEER_2_SERVICE_PORT

ZK_ELECTION_2_SERVICE_PORT

ZK_PEER_3_SERVICE_HOST

ZK_PEER_3_SERVICE_PORT

ZK_ELECTION_3_SERVICE_PORT
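One way to turn the variables listed above into something a configuration script can use is sketched below. It scans an environment mapping for `ZK_PEER_<n>_SERVICE_HOST` keys and collects the matching ports; the function name and the sample values are made up for illustration.

```python
import os
import re

PEER_HOST_KEY = re.compile(r"^ZK_PEER_(\d+)_SERVICE_HOST$")


def read_servers(environ):
    """Collect (host, peer_port, election_port) per server id from env vars."""
    servers = {}
    for key, host in environ.items():
        match = PEER_HOST_KEY.match(key)
        if not match:
            continue
        sid = int(match.group(1))
        peer_port = int(environ[f"ZK_PEER_{sid}_SERVICE_PORT"])
        election_port = int(environ[f"ZK_ELECTION_{sid}_SERVICE_PORT"])
        servers[sid] = (host, peer_port, election_port)
    return servers


# Sample values; inside a container you would pass os.environ instead.
sample_env = {
    "ZK_SERVER_ID": "1",
    "ZK_PEER_1_SERVICE_HOST": "10.0.0.1",
    "ZK_PEER_1_SERVICE_PORT": "2888",
    "ZK_ELECTION_1_SERVICE_PORT": "3888",
    "ZK_PEER_2_SERVICE_HOST": "10.0.0.2",
    "ZK_PEER_2_SERVICE_PORT": "2888",
    "ZK_ELECTION_2_SERVICE_PORT": "3888",
}
print(read_servers(sample_env))
print(len(read_servers(os.environ)), "servers found in the real environment")
```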

An alternative approach could be, instead of adding all this manual configuration, to expose peer and election as kubernetes services. I tend to favor the latter approach as it can make things simpler when working with multiple hosts. It's also a nice exercise for learning kubernetes.

The name of the port is already defined in the previous snippet. So we just need to find out how to select the pod. For this use case, it makes sense to have a different pod for each zookeeper server container. So we just need to have a label for each pod that designates that it's a zookeeper server pod and also a label that designates the zookeeper server id.
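A small sketch of such a pair of labels. The label keys `name` and `server` and the value `zookeeper-pod` are chosen for illustration only; any consistent naming would do.

```python
def pod_labels(server_id):
    """Labels for one ZooKeeper pod: a shared role label plus its server id.

    Keys and values are hypothetical; label values are strings.
    """
    return {"name": "zookeeper-pod", "server": str(server_id)}


for server_id in (1, 2, 3):
    print(pod_labels(server_id))
```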

Something like the above could work. Now we are ready to define the service. I will just show how we can expose the peer port of server with id 1, as a service. The rest can be done in a similar fashion:
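A simplified sketch of that service, again as a Python dictionary rather than an exact Kubernetes manifest: the field names, the service name `zk-peer-1`, the port name `peer` and the label keys are assumptions. The helper at the end shows how Kubernetes derives the environment variable prefix from a service name (upper case, dashes turned into underscores), which is how a service called `zk-peer-1` lines up with `ZK_PEER_1_SERVICE_HOST`.

```python
def peer_service(server_id, port=2888):
    """Describe a service exposing the peer port of one ZooKeeper server.

    Field names are illustrative, not an exact Kubernetes schema.
    """
    return {
        "kind": "Service",
        "name": f"zk-peer-{server_id}",
        "port": port,
        # Name of the container port to route to (assumed to be called "peer").
        "targetPort": "peer",
        # Only the pod carrying this server id is selected.
        "selector": {"name": "zookeeper-pod", "server": str(server_id)},
    }


def service_env_prefix(service_name):
    """Prefix of the *_SERVICE_HOST / *_SERVICE_PORT variables for a service."""
    return service_name.upper().replace("-", "_")


service = peer_service(1)
print(service)
print(service_env_prefix(service["name"]) + "_SERVICE_HOST")
# ZK_PEER_1_SERVICE_HOST
```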

The basic idea is that in the service definition, you create a selector which can be used to query/filter pods. Then you define the name of the port to expose and this is pretty much it. Just to clarify, we need a service definition just like the one above per zookeeper server container. And of course we need to do the same for the election port.

Finally, we can define another kind of service, for the client connection port. This time we are not going to specify the server id in the selector, which means that all 3 servers will be selected. In this case kubernetes will load balance across all ZooKeeper servers. Since ZooKeeper provides a single system image (it doesn't matter which server you are connected to), this is pretty handy.
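The effect of dropping the server id from the selector can be checked with a few lines of code. The sketch below mimics equality-based label matching, where a pod is selected when every key and value in the selector appears in its labels; the pod names and label keys are hypothetical.

```python
def selects(selector, labels):
    """True when every selector key/value pair is present in the pod labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def selected_pods(selector, pods):
    """Names of the pods whose labels satisfy the selector."""
    return sorted(name for name, labels in pods.items() if selects(selector, labels))


# Three pods, one per ZooKeeper server (names and labels are placeholders).
pods = {
    f"zookeeper-{sid}": {"name": "zookeeper-pod", "server": str(sid)}
    for sid in (1, 2, 3)
}

client_selector = {"name": "zookeeper-pod"}                 # no server id
peer_1_selector = {"name": "zookeeper-pod", "server": "1"}  # one server only

print(selected_pods(client_selector, pods))
# ['zookeeper-1', 'zookeeper-2', 'zookeeper-3']
print(selected_pods(peer_1_selector, pods))
# ['zookeeper-1']
```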

I hope you found it useful. There is definitely room for improvement so feel free to leave comments.