Docker Containers – Better and Faster Virtualization

Last post I looked at Vagrant. This post I look at Docker. I started out dubious about a Linux-only solution, but the more I looked, the more I liked. I think Docker will be big.

Docker

Docker allows you to define lightweight application containers based on Linux cgroups (control groups). For example, you might put a web server instance in one container, MySQL in another, and a Redis cache in yet another. Containers are then wired together so they can talk to each other through controlled interfaces (network connections, shared file systems, etc.). This allows multiple containers to run on a single server with a high degree of isolation from each other, just like virtualization offers.

Containers are isolated by functionality now built into the Linux kernel (cgroups for resource limits, namespaces for isolation). Each container can get its own area of disk space, its own memory, and so on. The Linux kernel limits how code running in one container can interact with code in a different container. This approach is no use if you want to run Windows on the same physical hardware as Linux – it ties you to Linux. However, you can run different versions of Linux (such as Ubuntu or Debian) in different containers built on top of the same kernel.

There are several reasons Docker seems to be getting a lot of interest. One reason is efficiency – rather than have a single physical server run multiple copies of an OS, each talking through the virtualization layer to the underlying OS to do tasks like disk I/O, you have a single Linux kernel run multiple Docker containers directly. This wastes fewer system resources. (One estimate was that Docker adds around 2% overhead, unlike some VMs where you may see 20%+ overhead for disk I/O.) Another benefit is that, since containers are lightweight and portable, you can reuse the same container definition in development, during test automation, and in production. Production may run more instances of the container, but the container definition itself is the same. This is really appealing to architects: you can define logical architectural blocks without having to worry (too much) about their overheads.

Oh, and the fact that a Docker container can start in under a second is probably not that interesting to people (it’s just like starting a few new processes on a running Linux server). That is, until you realize how many new opportunities are opened up by starting and stopping containers on demand with near-zero delay. For example, if you have a set of servers, you can have batch jobs running in some containers, but if your production site load suddenly spikes, suspend the batch containers and start up more web containers instead. Potentially within seconds!
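To make the idea concrete, here is a toy simulation of that rebalancing logic. The `Cluster` class and all the container names are made up for illustration – a real version would drive the Docker API (pause/unpause/run) instead of mutating lists.

```python
# Toy sketch: when load spikes, pause batch containers and start extra
# web containers; when load drops, resume the batch work.
# Hypothetical names throughout -- this only simulates the decision logic.

class Cluster:
    def __init__(self):
        self.web = ["web-1"]                  # running web containers
        self.batch = ["batch-1", "batch-2"]   # running batch jobs
        self.paused = []                      # suspended batch jobs

    def rebalance(self, load):
        """Shift capacity toward web serving when load is high."""
        if load > 0.8:
            # Suspend batch work and bring up another web container.
            self.paused.extend(self.batch)
            self.batch = []
            self.web.append("web-%d" % (len(self.web) + 1))
        elif load < 0.3 and self.paused:
            # Load has dropped: resume the batch jobs.
            self.batch.extend(self.paused)
            self.paused = []

cluster = Cluster()
cluster.rebalance(load=0.95)   # spike: batch paused, web scaled up
print(cluster.web, cluster.paused)
```

Because containers start in around a second, a loop like this could genuinely react to a load spike within seconds, which is the opportunity being described above.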

Docker itself runs on a single server. You can create and run multiple containers on that server and wire them together securely. Where it gets more interesting for me is all the effort now going into building distributed applications using clusters of servers. Patterns are emerging, like the Ambassador pattern, to make it easier to change the wiring between containers on the fly, but I prefer the potential of service registry approaches such as etcd. More on this later.

Decking moves up a level to defining multiple containers and the relationships between them. You can declare a web container and then specify that you want 8 copies of it. You can also define different configurations for different environments (such as dev, test, and production). There are some good ideas here, but from my (limited) reading it does not tackle the problem of distributing the containers across multiple hosts. Interesting, some good ideas, but not going to change the world by itself. Good for small applications though.
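The core Decking idea – declare a container type once, then stamp out N copies per environment – can be sketched in a few lines. The config shape below is invented for illustration; real Decking reads a `decking.json` file with its own schema.

```python
# Sketch of declaring container types once, then expanding them into N
# concrete instances per environment. Config format invented here.

CLUSTERS = {
    "dev":        {"web": 1, "db": 1},
    "production": {"web": 8, "db": 2},
}

def expand(env):
    """Turn a cluster declaration into concrete container instance names."""
    instances = []
    for image, count in sorted(CLUSTERS[env].items()):
        instances += ["%s-%s-%d" % (env, image, i) for i in range(1, count + 1)]
    return instances

print(expand("production"))   # 2 db instances + 8 web instances
```

Note that N is baked into the declaration – which is exactly the limitation discussed further down: a fixed N cannot react to a machine dying or a need for more capacity.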

Docker and Serf

There is a series of blog posts at CenturyLinkLabs.com that I found particularly interesting. It started with a two-container configuration, moving on to using Serf with Docker, and then an auto-load-balancing solution. I think these posts clearly demonstrated the benefit of being able to spin instances up and down and have them self-register (via Serf in these posts). That is, a newly added web server container gets itself added to the load balancer automatically. This makes more sense to me than the Decking approach, as it supports growing and shrinking a cluster dynamically.

Docker and libswarm

libswarm is a recently announced API dealing with common orchestration requirements across multiple technologies. It seems to be an attempt to define a degree of standardization in the basic requirements, while encouraging experimentation to continue.

Docker and CoreOS

One of the light-dawning moments for me was when I got to the coreos.com site. CoreOS is a thin Linux distribution built around the idea of supporting Docker containers and nothing else. That made sense. The really interesting part to me was the use of etcd as a service registry.

For example, Decking (above) allows you to predefine a configuration of N servers pretty easily. That is nice, but why is N fixed? What if a machine dies? What if more capacity is needed? Rather than having N as a fixed constant, using a service registry allows a container to register its presence when it comes online. Other containers can listen for such events. For example, a load balancer can listen to the service registry and automatically add any new web servers that come online. Yes, you could have made the process that launches a new web server also register it with the load balancer, but the self-organizing cluster is just “nice”. If you decide to have 2 load balancers instead of 1, no change is required to your installation process. If you change the type of load balancer, you don’t have to worry about teaching the launch process a new API. If you add a monitoring application that also watches for web servers, you don’t have to worry about that either. You just start a new web server, it registers itself in the service registry, and everything else learns and adapts. This is what has me excited about Docker.
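A toy key-value registry in the spirit of etcd shows why adding a second watcher costs nothing. Containers write their presence under a key prefix; any number of watchers react. This illustrates the pattern only – it is not the etcd API, and all keys and addresses are invented.

```python
# Toy etcd-style registry: containers register under a key prefix, and
# watchers (load balancer, monitor, ...) react independently. Adding the
# monitor required no change to how web servers start.

class Registry:
    def __init__(self):
        self.data = {}
        self.watchers = []   # (prefix, callback) pairs

    def watch(self, prefix, callback):
        self.watchers.append((prefix, callback))

    def set(self, key, value):
        self.data[key] = value
        for prefix, cb in self.watchers:
            if key.startswith(prefix):
                cb(key, value)

registry = Registry()
lb_backends, monitored = [], []

# Two independent watchers on the same prefix.
registry.watch("/services/web/", lambda k, v: lb_backends.append(v))
registry.watch("/services/web/", lambda k, v: monitored.append(k))

# A new web server container registers itself on startup; everything
# watching the prefix learns about it with no extra wiring.
registry.set("/services/web/web-1", "10.0.0.5:80")
print(lb_backends, monitored)
```

Real etcd adds the pieces a toy skips – persistence, TTLs so a dead container’s entry expires, and watches over HTTP – but the self-organizing shape is the same.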

CoreOS also includes Fleet for setting up more complex configurations. I think Fleet is an interesting start, but it has a way to go. It looks like you have to define a unit file per container instance, for example. I want to define a pattern and then have N copies of the container. It does have some nice features, however, such as saying “don’t put two web servers on the same server – I want them on different machines for resilience”. It can also say “I want this monitoring container on the same machine as each web server to collect stats locally”.
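Those two placement constraints – anti-affinity between web servers and affinity between each web server and its monitor – can be sketched as a tiny scheduler. Plain Python for illustration, not Fleet’s unit-file syntax, and the machine names are invented.

```python
# Sketch of Fleet-style placement constraints: spread web containers
# across distinct machines (anti-affinity) and co-locate a monitoring
# container with each one (affinity).

def schedule(machines, n_web):
    placement = {m: [] for m in machines}
    for i in range(n_web):
        # Anti-affinity: only machines with no web server yet are eligible.
        free = [m for m in machines
                if not any(c.startswith("web") for c in placement[m])]
        if not free:
            raise RuntimeError("not enough machines to satisfy anti-affinity")
        host = free[0]
        placement[host].append("web-%d" % (i + 1))
        # Affinity: the stats collector runs beside each web server.
        placement[host].append("monitor-%d" % (i + 1))
    return placement

plan = schedule(["m1", "m2", "m3"], n_web=2)
print(plan)
```

In Fleet these constraints are expressed declaratively (per unit) rather than computed imperatively like this, but the placement decisions it has to make are the same.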

Docker feels like it is still under active development, especially in the tooling layered above it for wiring clusters of servers together.

There are some areas I would like to see improved:

Wiring containers together is not that hard with Docker, but there is still the issue of security. When two containers run on the same physical server, the connection is guaranteed secure by the OS. When they run on different servers, security becomes a real question. How do you know whether you can trust the client of a request?
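One simple answer (my suggestion, not something Docker provides) is for trusted containers to share a secret and sign requests with an HMAC, so a backend on another host can reject traffic that did not originate inside the cluster. A sketch only – real deployments would more likely use TLS with client certificates.

```python
# Sketch: cross-host trust via HMAC request signing. Containers holding
# the shared secret can produce signatures; a receiver rejects anything
# it cannot verify. Illustrative only -- prefer TLS client certs in practice.

import hashlib
import hmac

SECRET = b"cluster-shared-secret"   # distributed to trusted containers only

def sign(body: bytes) -> str:
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

def verify(body: bytes, signature: str) -> bool:
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign(body), signature)

body = b"GET /stats"
sig = sign(body)
print(verify(body, sig))           # a signed request from inside the cluster
print(verify(b"GET /admin", sig))  # a request whose body doesn't match the signature
```

The hard part is not the signing but distributing and rotating the secret across hosts – which is itself a job a service registry like etcd could help with.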

The service registry approach looks promising to me, but it would be great if it could be standardized. CoreOS is using etcd, but that is not a standard.

Defining topologies is not well solved yet. Tools like Decking and Fleet are making forays here, but this seems to be the weakest area to me. I predict major improvements here over the next 6 months to a year.

Another area that I think will need more work is database instances (like MySQL). Web server nodes can float around the cloud – if your application is designed well, you can shut down or start up new nodes at any time on any server. But databases are not like that – the data on disk is important. The database engine and its disk (including network disks) need to stay attached! It is not clear whether Docker is going to help here, or just leave it to hosting partners to come up with their own solutions. I could imagine that dedicated database services (like RDS on Amazon), rather than Docker containers, might be the sensible option here.

Final opinion? Docker gets a big thumbs up from me. Lower performance overhead, nice modularization, lots of community activity, lots of support from multiple hosting providers – that is a lot of momentum.