Cluster Control with Etcd and Docker

Adding and removing hosts for live multi-server sites is a lot easier with technology such as Docker and etcd.

Docker containers make it easy to create a template for a new server that you want to spin up, such as a new web server, but you have to connect all the servers together by configuration files when you start up the containers. This is especially problematic when you have an existing running cluster and want to add say another web server instance. You don’t want to take the whole site down just to add one node.

An better approach is where whenever a new server starts up, it registers itself in a service registry such as etcd as shipped with CoreOS. Other interested servers can watch for newly created services and adjust their own configuration to point to the new service that just started. Similarly, they can watch for services that exit to remove them automatically as well.

In this post I describe a set of “Cluster Control” command line scripts written in PHP that implement this functionality on top of etcd. Each container runs several scripts to watch for new entries in etcd, send heartbeat updates to etcd, and shutdown the current server if the server’s entry is removed from etcd. The code can be found on GitHub in https://github.com/alankent/cluster-control. At present it is more of a proof of concept to see what is possible, but so far I am liking the results.

Disclaimer: The opinions in this post are my own and not that of my employer.

Scenario

Consider a example situation where there is

A single load balancer pointing to a set of web servers

Each web server points to a set of database servers (for load balancing and high availability – if any database server goes down, a different database can be used)

A set of database servers

For the sake of this post, I have implemented all three servers using Apache servers with a single “index.php” script. (The server definitions can be found under the demo directory in the repository.) A later project will be to add support for Magento with other containers such as Redis caches.

The goal is then to support the following use cases:

A container is added to an existing site simplify by starting up a new container (e.g. adding a web server will automatically be added to load balancer)

A container can be shut down and removed by deleting the server’s entry in etcd (e.g. removing a database server will automatically deregister the database server from all web servers)

A container that crashes or is is killed will automatically be removed from the cluster (after a period with no heartbeat, the server will be automatically removed)

Please note I do not tackle the harder problem of guaranteeing all operations will success even if a node crashes. My goal here is the simpler problem of having the plumbing for a cluster growing easily, shrinking easily, and recovering without human assistance if a server goes down unexpectedly (even though errors may occur for a short period of time).

The Building Blocks

There are a number of PHP scripts that run behind the scenes. The scripts are as follows. (I go into how to use them all together below.) All of the commands accept a –verbose command line option for printing out more details (such as etcd calls and responses). The commands also all accept a –conf argument to specify the filename to use, defaulting to cluster-control.conf in the current directory.

“key” under “self” is the etcd key that this container will set to indicate it is alive. (Key names look like URL paths allowing key namespacing using the directory structure.)

“ttl” is time to live in seconds for key values stored in etcd. If a container goes down, the entry in etcd will last for at most this length of time. Setting it too small may generate a lot of entries in etcd; setting too large may take a while to recover from availability problems.

“heartbeat” is the duration in seconds of how long to wait between sending update messages. The value of “heartbeat” should always be less than “ttl” or else the heartbeat will expire, which will in turn cause the web server to shut down.

The “clusters” group allows multiple groups of child servers to be defined. This include the name of the cluster, the directory under which the etcd key for this node will be created, and handler code for writing the new configuration details. (A Varnish handler in the future could write out a configuration file in the format Varnish can load directly.)

The supported commands are as follows.

$ bin/cluster-control cc:heartbeat [--once]

This command reads the configuration file and starts setting the “self key” with a time to live value. While things are all good and healthy, the node will keep generating periodic updates to the container’s key. The TTL for the key and the interval between setting of the key are read from the configuration file. The code first uses a “set key” API, then uses a “set key value but only if key already exists” etcd API. The latter means if the key is removed for any reason, the heartbeat will stop and the command will exit. This allows an administrator to shut down a server by simply removing its entry from etcd.

There is also a –once command line option that will generate a single heartbeat and exit. This is useful to guarantee a single heartbeat occurs before starting the cc:watchkey command below.

$ bin/cluster-control cc:removekey

This command removes the key for the current container. This script is typically called after the current server exits to guarantee to update etcd that the container is no longer available.

$ bin/cluster-control cc:watchkey

This command blocks until the key is deleted. If followed by a command to shut down the server, this can be used to quickly detect when someone outside the container removes the key from etcd, and thus cause the container to shutdown. Without this command, the container would wait until the next heartbeat expires until it shuts down.

$ bin/cluster-control cc:clusterprepare {cluster}

This command should be invoked once per cluster the server is interested in when the server is started up. A server may watch multiple clusters (e.g. the web server may watch a cluster of database servers and a cluster of Redis servers). A separate command would be run per cluster. This command returns the etcd “watch index”, an integer that should be later passed to the cc:clusterwatch command. Using this index ensures that no updates are missed as normally the prepare command is run before the server is started, and the cc:clusterwatch command is started after the server is up and running. See the etcd documentation for more details.

This command watches for any changes in the specified cluster (starting at the specified watch index returned from the cc:clusterprepare command). When a change is spotted, a file on disk listing the container members is written, and the supplied command run (typically to send a signal to the server so it knows to reload configuration settings from disk). For a web server, apachectl graceful might be used to do a graceful restart (let current connections finish first).

Start Up Sequence

There is an example start up script using the above commands for each of the demo servers in the GitHub repository. There are slight modifications between the scripts to deal with different clusters to be watched. The main outline of commands is as follows (this is based on the demo/webserver/startserver script).

The script runs in foreground mode to guarantee that if it exits for any reason (e.g. a core dump), the container would guarantee to exit (rather than hang around). (Another script for example could watch the number of running web servers in etcd and start a new instance up automatically if required.) So I decided not to run the web server in background. To get all the other scripts to start running after the web server was ready to receive HTTP requests, I used the while loop to wait until it could connect to the web server. After that all the auxiliary commands can be run.

The following describes each step in more detail:

The first line saves the executable away in a variable. Arguments of –conf or –verbose may be added here as well.

The cc:clusterprepare command writes a fresh database server list file based on what is currently in etcd. This is guaranteed to complete before the Apache web server is started.

The big nested parenthesis group is a series of commands that need to be run after the web server is up. At the start is a while loop that loops until it succeeds in connecting to the web server.

The cc:heartbeat –once command sends a single heartbeat, guaranteeing the key to be set before the following cc:watchkey runs (so it does not mistakenly think a request to shut down the server has come before the server has really even started).

The cc:watchkey watches to see if someone external removes the key value from etcd. If so, the current web server is immediately shut down. This is optional (it would be detected by the missing heartbeat eventually) but it helps the cluster respond quickly allowing a longer delay between heartbeats. (The heartbeats are there for emergency situations only.)

The cc:heartbeat command updates the container’s key periodically with a value that will time out. If something goes wrong (e.g. the host crashes), the key value will expire in etcd and it will thus automatically be removed from other containers. If the heartbeat stops, the server is shutdown. Because this is more of a last resort stop, it does a graceful stop, waits a little while, then does a hard stop.

The next command is a background task to watch for changes in a cluster (have any new hosts joined, or any existing hosts left?), triggering a graceful Apache server restart upon change. It uses the return value of cc:preparecluster to make sure it does not miss any events while the server was starting up.

The Apache2 web server is then started in foreground mode – it does not move on to the next command until the web server exits.

If the web server exits, it is followed by a cc:removekey command to remove the key if the web server exits for any reason. This will tell other containers that this container is existing immediately. (The timeout of values a short time later would remove it without special action being required.) Because the other servers are listening to etcd updates, they should quickly remove the server from their local configuration.

The final command is exit 0 to make sure the cc:removekey command error status is not returned as the shell scripts value.

A bit complicated I know, but the end result is a solution that can deal with a range of problems that can occur.

Installation

If you want to get the scripts going in a demo, you will first need to get a CoreOS server up and running. (It does not strictly need to be CoreOS, but it comes with Docker and etcd out of the box. I found it easier that way.)

In production you would most likely have an etcd instance on every host with a single load balancer, web server, and database server container running on that host. This avoids external network problems stopping the scripts talking to the local etcd server. (The etcd server can worry about recovering from network problems.)

The supplied sample demo script and examples create many light weight mock servers on the one host to avoid the complexities of setting up a cluster of etcd servers for the purposes of this blog. It is important however to note that no change are required to make this operate across multiple hosts other than to configure the etcd servers to talk to each other.

Once etcd is running, the following are some sample commands you can run.

Next, on your CoreOS host get a copy of the Cluster Control library from GitHub.

$ git clone https://github.com/alankent/cluster-control.git

You can go into the demo directory and run the setup-etcd shell script from that directory or follow through the rest of this blog post line by line. You will need to change the “public IP” address of the current host in the setup script. ifconfig can be used to find the IP address. The setup-etcd script can be run multiple times if desired (although there may be some error messages about things that already exist). I am going to go through the commands step by step here and explain what they are all for. (The setup-etcd may different slightly from the following.) If just want to get it up and going, running the setup-etcd script is a good option.

$ ETCD_URL=http://172.17.42.1:4001/
$ PUBLIC_IP=10.64.255.235

Note these do not need to be environment variables – they are just to make the following commands easier to adjust for different IP addresses.

On my host the PUBLIC_IP is the IP address that the server can be reached from externally. This is important if you really do have multiple hosts involved. The ETCD_URL could use the public IP, but it is only used to connect to the local etcd server. You can see where I got these numbers from using the following ifconfig command. (Look for ‘inet’ below.)

Note that none of the configuration mentioned the hosts and ports of other servers. They were only told their own IP address and port numbers. These services can be seen in etcd using the following commands.

The 3 lines above ‘Done’ are the important ones. The three web servers (8110, 8111, and 8112) each have a connection to both database servers (8120 and 8121). (On a real distributed implementation running on different servers the IP addresses would differ rather than the port numbers.)

As you can see, all 3 web servers now see the new database server. (If you are freakishly lucky you might hit it at just the right time where some web servers have been updated and others have not. That should be rare.)

So how to remove a web server? Rather than have to log onto the CoreOS host and kill the docker container, the cc:watchkey command can be used to detect when the entry is deleted an trigger a server shutdown. Thus to shut down a server you just remove its key.

It may take a little while for cc:heartbeat to notice the server is no longer up. Requests to the web server will however fail. (This is where a good load balancer helps – if it can redirect the request to a working server so the client does not notice the failure.)

Using the supplied scripts and etcd, it is fairly easy to implement a cluster where nodes can self-register themselves into the cluster, and clean up upon exit. This makes it much easier to add new hosts to an existing cluster.

The current code base can only write a JSON file – this needs to be extended to allow writing of a range of file formats such as Magento (say for read-only database replicas).

Then my next goal is get some good standard Docker definitions for MySQL, Apache 2 web server, Redis server etc containers, and have those containers use this library. That would make it trivial to add a new web server node to an existing site to ramp up capacity before a special seasonal event.

If you are interested in collaborating on the development of standard Docker container definitions (e.g. good settings for Apache2 or MySQL) specifically for Magento 1.X or Magento 2, please drop me a line or leave a comment.