Automated docker ambassadors with CoreOS + registrator + ambassadord

I’m just starting to play around with docker, and I’ve been investigating the use of CoreOS for deploying a cluster of docker containers. Though I’ve only been using it for a week, I really like what I’ve seen so far. CoreOS makes it very easy to cluster together a group of machines using etcd, and in particular, I really like their fleet software, which lets you manage systemd units (which you can use to run docker containers) across an entire CoreOS cluster. Fleet gives you things like high availability and failure recovery right out of the box, without too much extra effort. The one piece missing is how to connect the containers together. There are some documented ways to do it, but honestly, most of the approaches I’ve seen on the internet consist of a bunch of shell script glue that feels really hacky to me.

In the docker community, something called the ‘ambassador’ pattern has emerged, which is this idea of proxying connections to container A from container B via container P, and container P has enough smarts in it to transparently redirect connections to many different containers depending on parameters. However, most of the stuff I’ve found on the web is very labor intensive and full of nasty shell scripting that is easy to mess up.

Jeff Lindsay has created the first stage of what I think is a really good general solution to this problem — namely, his projects registrator and ambassadord. Registrator listens for docker containers starting up, and automatically adds them to a service registry like etcd or consul. You link your containers to ambassadord, and when your container tries to make an outgoing connection, ambassadord does a lookup to figure out where the connection needs to go, and connects you there. It’s pretty easy, with very little configuration needed for the involved containers.

CoreOS already ships with etcd built-in, so CoreOS + registrator + ambassadord seems to be a great combination to me. I’ve modified CoreOS’s sample vagrant cluster to demonstrate how to use these to connect containers together.

Set up the CoreOS cluster using Vagrant

First, use the instructions in the original README.md file to start the 3-machine cluster up — make sure you have at least 8GB of free RAM! If you already have Vagrant 1.6+ installed, it should be as easy as:
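Assuming the repo is already cloned and a fresh etcd discovery token has been pasted into the user-data file:

```shell
# Boot the 3-machine CoreOS cluster defined in the Vagrantfile
vagrant up

# Then SSH into the first machine to poke around
vagrant ssh core-01
```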

Once everything comes up, you’ll need to wait for registrator and ambassadord to download and start. You can use ‘journalctl -u registrator.service’ and ‘journalctl -u ambassadord.service’ to check on the progress. If you execute ‘docker ps’, you should see both of their containers running, something like so:
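Roughly like this (the image names and container IDs here are illustrative, not exact output):

```shell
# Watch the units pull their images and start
journalctl -u registrator.service
journalctl -u ambassadord.service

# Once both are up, the two containers appear in the running list:
docker ps
# CONTAINER ID   IMAGE                         ...   NAMES
# 1f3a9d0c2b4e   progrium/registrator:latest   ...   registrator
# 8c2d7e1a5f6b   progrium/ambassadord:latest   ...   backends
```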

If this doesn’t work, it’s probably because you didn’t set up the etcd discovery token correctly in the user-data file. Refer to the CoreOS cluster documentation for details.

Remote fleetctl operation

Using fleetctl from within the cluster is cool, but it’s even better if you install it on your host machine. Either build it with go from their github repo, or on OSX you can do ‘brew install fleetctl’. Once you’ve done that, you can do the following to get your fleet working remotely:
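On a Vagrant setup, that amounts to tunneling fleetctl over SSH to one of the VMs. A sketch, assuming Vagrant’s default forwarded SSH port of 2222 for the first machine:

```shell
# Tell fleetctl to tunnel through the first VM's forwarded SSH port
export FLEETCTL_TUNNEL=127.0.0.1:2222

# fleetctl authenticates via ssh-agent, so add Vagrant's insecure key
ssh-add ~/.vagrant.d/insecure_private_key

# Sanity check: all three machines should be listed
fleetctl list-machines
```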

Ok, now that we’ve got things working, let’s do something useful. I’ve decided to setup a small NSQ messaging cluster, which consists of a lookup daemon and some messaging daemons that all need to talk to each other. Moderately complex, but pretty simple once the config is all done.

In my repo, there’s a directory called ‘units’. cd into that, and you can launch an nsq cluster from that directory. First, do ‘fleetctl start nsqlookupd.service’. You can check the status using ‘fleetctl list-units’, and once it’s ready to go it will look like this:
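Something along these lines (the machine IDs and IPs in the sample output are illustrative):

```shell
cd units
fleetctl start nsqlookupd.service
fleetctl list-units
# UNIT                 MACHINE                     ACTIVE   SUB
# nsqlookupd.service   2f12d3f6.../172.17.8.101    active   running
```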

If it says it’s ‘activating’, that means it’s downloading the docker images still. Otherwise, piece of cake. Now, let’s launch 2 NSQ daemons to connect to the lookupd. Just do ‘fleetctl start nsqd.1.service’ and ‘fleetctl start nsqd.2.service’. Once it’s done launching, list-units should show something similar to this:
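For example (again, the IDs and IPs are illustrative):

```shell
fleetctl start nsqd.1.service
fleetctl start nsqd.2.service
fleetctl list-units
# UNIT                 MACHINE                     ACTIVE   SUB
# nsqd.1.service       651bc731.../172.17.8.102    active   running
# nsqd.2.service       a4d3f1c9.../172.17.8.103    active   running
# nsqlookupd.service   2f12d3f6.../172.17.8.101    active   running
```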

Easy! Now, we can verify that things are connected by checking the logs. SSH into the machine running a unit using ‘vagrant ssh core-0X’, and do ‘journalctl -u nsqd.1.service’ (or nsqd.2.service), and you should see log messages indicating that the nsq daemon has started and connected to the lookup daemon!

It doesn’t get much simpler than that. Let’s test out this setup, using the instructions taken from the NSQ docker page. In one terminal, watch nsqlookupd (substitute x.x.x.x with the IP address nsqlookupd lives at).

watch -n 0.5 "curl -s http://x.x.x.x:4161/topics"

In another terminal, you can send a message to one of the nsq daemons. But let’s do that through the ambassador container to whichever nsq daemon it picks, instead of explicitly specifying an IP address! SSH into one of the cluster machines, and execute this:
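The etcd key path and the curl image below are assumptions on my part (check what registrator actually wrote, e.g. with ‘etcdctl ls /services’), but the shape of it is:

```shell
# Run a throwaway container linked to the ambassador ("backends").
# The BACKEND_4151 variable tells ambassadord where in etcd to find
# candidate nsqd HTTP endpoints; it picks one and proxies us there.
docker run --rm \
  --link backends:nsqd \
  -e BACKEND_4151=etcd://172.17.42.1:4001/services/nsqd-4151 \
  fedora \
  curl -s -d 'hello world' 'http://nsqd:4151/put?topic=test'
# should print 'OK'
```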

If everything works, it should print an ‘OK’, and in the other terminal you should see the list of topics show something like:

{"status_code":200,"status_txt":"OK","data":{"topics":["test"]}}

Now, once you have this working, you can use this cluster + techniques (see below) to connect your own applications to nsqd and nsqlookupd, without any of them needing to know explicitly where the others are.

How it works

The most important part is setting up registrator and ambassadord as services to run on the cluster. I created two systemd unit files for them, and added them to the cloud-init used to initialize the cluster. These should work pretty generically for you regardless of your setup. You can even extract the relevant parts and run them on a Linux distribution that isn’t CoreOS.
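To give a feel for it, here’s a sketch of what the registrator unit can look like; the image name, etcd address, and flags are assumptions, so treat it as an outline rather than an exact unit file:

```ini
[Unit]
Description=registrator
After=docker.service
Requires=docker.service

[Service]
# Remove any stale container from a previous run (the leading '-'
# means a failure here is ignored)
ExecStartPre=-/usr/bin/docker rm -f registrator
# Mount the docker socket so registrator can watch container events,
# and point it at etcd on the docker bridge address
ExecStart=/usr/bin/docker run --rm --name registrator \
    -v /var/run/docker.sock:/tmp/docker.sock \
    progrium/registrator etcd://172.17.42.1:4001/services
ExecStop=/usr/bin/docker stop registrator
```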

Once those services are running, you need to connect your containers to the target containers via docker links. The key part of nsqd.service is right here:
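In sketch form (the image name and the exact etcd key path are assumptions on my part):

```ini
# The relevant flags from nsqd.service's docker invocation
ExecStart=/usr/bin/docker run --name nsqd \
    --link backends:nsqlookupd \
    -e BACKEND_4160=etcd://172.17.42.1:4001/services/nsqlookupd-4160 \
    -e SERVICE_NAME=nsqd \
    -p 4150:4150 -p 4151:4151 \
    your/nsqd-image --lookupd-tcp-address=nsqlookupd:4160
```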

The --link part means link to the backends container using nsqlookupd as an alias, and the -e BACKEND_4160 part tells ambassadord to look up the key at the specified location (which needs to be shorter), and to proxy any connections to port 4160 to one of the IP addresses it finds there. The SERVICE_NAME environment variable tells registrator to publish the ports for this container under that service name, instead of the default. As all of this stuff is really young and very much subject to change, I’m not going to go into it in huge detail — check out the README files for registrator and ambassadord to see how it works and how you can customize it for yourself.

Conclusion

Now, there’s still a bunch of things that are a bit awkward about this setup, and this NSQ cluster is far from production ready — but I think this is miles past what I’ve seen from other docker container connection solutions at the moment. The good part is that even though registrator/ambassadord is very new and needs some work, this setup will work today. As this software gets better, I agree with Jeff that this is going to make a huge impact on the docker community.

Let me know what you think! If you have improvements/suggestions, drop a note in the comments or do a pull request on the github repo.

This entry was posted on Monday, July 28th, 2014 at 2:39 am and is filed under docker, tips.

If anyone else is having issues starting the nsqd.*.service units, I found that I needed to double the RAM allocated by VirtualBox. Edit both the Vagrantfile and config.rb and change $vb_memory to 2048.