In The Mythical Man-Month, Fred Brooks estimates that it takes nine times as much effort to create a complete software system as it does to create just the core software functionality. This rule of thumb applies to system architecture, which is much more complex than a simple application running in a container.

To design a container architecture, we need to gain a deeper understanding of how container infrastructure works, especially in three main areas:

Multiple containers: How does our approach change when our system consists of multiple containers?

Multiple hosts: How can we distribute our containers across multiple hosts without having to explicitly manage individual container or host instances?

Networking: How can we connect a set of containers while also providing the same isolation we expect when we run containers for separate applications?

It might be tempting to avoid this complexity by just bundling our system components into a single uber-container. Unfortunately, this solution eventually hits limits.

First, in order to scale a system efficiently, we usually need to scale different parts in different amounts. For example, we generally need many more web server instances to handle serving content to users than we need database instances for those web server instances to use.

Second, we still have to contend with sharing state across those multiple uber-containers.

Third, we miss out on some of the isolation advantages with an uber-container, such as avoiding a failure in one component rippling into other components.

So it appears that we are stuck running multiple containers. Indeed, the usual approach is the microservice architecture, where each discrete piece of our system gets its own container so that it can be scaled independently, upgraded independently, and developed independently.

And we also need multiple hosts in order to create a highly available system because servers are fleeting. So we need to run lots of instances of lots of container images, spread across multiple hosts but networked together. Here’s how that is accomplished. I’ll be using the Docker and Kubernetes ecosystems for examples, but the core issues arise no matter what technology is used.

Container Orchestration

Running multiple linked containers in Docker can be done in a shell script. Here’s the one I use for running a development instance of Atlassian JIRA:

But this approach is missing essential features. First, there is no built-in mechanism to restart any failed containers. Second, this solution is tied to a specific host and can’t easily be distributed to multiple hosts or scaled to multiple instances. Third, the script requires close reading to figure out what is going on. This is why container orchestration exists; it addresses all of these issues and more.
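To make the "restart failed containers" and "scale to multiple instances" ideas concrete, here is a minimal sketch of the comparison an orchestrator makes between the state we asked for and the state it observes. It is a toy model, not how Docker or Kubernetes is implemented; the function name `reconcile` and the service names are made up for illustration.

```python
from collections import Counter


def reconcile(desired, running):
    """Return the actions needed to make `running` match `desired`.

    desired: dict mapping a service name to the number of instances wanted.
    running: list of service names, one entry per healthy container.
    Services that are running but absent from `desired` are ignored here.
    """
    actual = Counter(running)
    actions = []
    for service, want in sorted(desired.items()):
        have = actual.get(service, 0)
        if have < want:
            actions.append(("start", service, want - have))
        elif have > want:
            actions.append(("stop", service, have - want))
    return actions


# Hypothetical state: we want three "web" containers, but one has crashed.
desired_state = {"web": 3, "db": 1}
healthy = ["web", "web", "db"]
print(reconcile(desired_state, healthy))
```

An orchestration engine would run a comparison like this over and over, which is why a crashed container comes back without anyone rerunning a script.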

A simplified Docker Compose file for the same purpose looks like this:

A Kubernetes Replication Controller definition would look much the same. Both are much more readable than our Bash script. Also, by allowing an orchestration engine to run the containers, we can restart failed containers, scale to multiple instances, and distribute our application across multiple hosts.

Container Networking

Scaling to multiple instances and distributing across multiple hosts raises important issues with networking. On a single host, it is easy to understand how links work between containers. Docker gives each container its own set of virtual network devices. These devices are all connected to some software-defined network, and Docker determines the IP address a container gets. So a link between containers is ultimately just an entry in /etc/hosts that ties a name to the correct IP address.
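Since a link boils down to a name-to-address entry, a few lines of code are enough to show the lookup. This is a sketch that parses hosts-file text; the addresses and the names "db", "postgres", and "web" are invented for the example.

```python
def parse_hosts(text):
    """Map each host name in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # in this sketch, the first entry wins
    return table


# Hypothetical file contents; the addresses and aliases are made up.
SAMPLE = """\
127.0.0.1   localhost
172.17.0.2  db postgres   # entry for the linked container
172.17.0.3  web
"""

hosts = parse_hosts(SAMPLE)
print(hosts["db"])
```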

However, once we start using a container orchestration engine, the situation gets more complicated. First, there might be multiple instances of each container, so the orchestration engine must apply a unique name to each.

Second, these instances may come and go. To allow for a more dynamic way for containers to find each other, both Docker and Kubernetes provide a Domain Name System (DNS) server that is automatically updated as instances come and go. Where there are multiple instances, the DNS server either provides the IP address for one, or sends the whole list.
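The two possible answers (one address or the whole list) can be illustrated with a toy in-memory registry. This is only a model of the behavior described above, not the DNS server that Docker or Kubernetes ships; the class name `ServiceRegistry` and the addresses are assumptions for the example.

```python
from itertools import count


class ServiceRegistry:
    """Toy name server: tracks instance addresses per service name."""

    def __init__(self):
        self._instances = {}      # service name -> list of IP addresses
        self._counter = count()   # drives the rotation in resolve_one

    def register(self, name, ip):
        self._instances.setdefault(name, []).append(ip)

    def deregister(self, name, ip):
        self._instances[name].remove(ip)

    def resolve_all(self, name):
        """Return every known address (the 'whole list' answer)."""
        return list(self._instances.get(name, []))

    def resolve_one(self, name):
        """Return a single address, rotating through the instances."""
        ips = self._instances.get(name, [])
        if not ips:
            raise LookupError(f"no instances of {name!r}")
        return ips[next(self._counter) % len(ips)]


registry = ServiceRegistry()
registry.register("web", "10.0.0.2")
registry.register("web", "10.0.0.3")
print(registry.resolve_all("web"))   # both instances
registry.deregister("web", "10.0.0.2")
print(registry.resolve_all("web"))   # only the surviving instance
```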

Crossing Hosts

Multiple software-defined networks allow for isolation if we run multiple sets of containers. DNS provides discovery of container instances on those software-defined networks. But in order to spread our containers across hosts, we need one more feature, which is connecting a software-defined network on one host with the right software-defined network on the other host.

Of course, the container orchestration engine places the software-defined networks on the same subnet and avoids duplicate IP addresses. But any switches and routers between the hosts are going to get confused by container IP addresses. So the host encapsulates the traffic inside messages that look like normal host-to-host communication. There are multiple ways to do this, but the most popular is Virtual Extensible Local Area Network (VXLAN).

VXLAN works by sending User Datagram Protocol (UDP) packets on port 4789 between hosts. The receiving host unpacks and sends the contents to the correct container. As a result, the container appears to be directly connected to another container, even when those containers are running on separate hosts.
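The pack-and-unpack step can be sketched in a few lines. The 8-byte header layout below follows the VXLAN specification (RFC 7348): a flags byte, 24 reserved bits, a 24-bit network identifier (VNI), and 8 more reserved bits. The function names, the VNI value, and the fake inner frame are made up, and the sketch only builds the bytes that would ride inside the UDP packet; it does not send anything.

```python
import struct

VXLAN_PORT = 4789             # UDP destination port used between hosts
VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag: the VNI field is valid


def vxlan_encapsulate(vni, inner_frame):
    """Prefix an inner Ethernet frame with an 8-byte VXLAN header."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + 24 reserved bits.
    # Word 2: VNI (24 bits) + 8 reserved bits.
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + inner_frame


def vxlan_decapsulate(payload):
    """Split a VXLAN payload into (vni, inner_frame)."""
    if len(payload) < 8:
        raise ValueError("payload shorter than a VXLAN header")
    word1, word2 = struct.unpack("!II", payload[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return word2 >> 8, payload[8:]


# Hypothetical inner frame: 14 placeholder header bytes plus a payload.
frame = b"\x02" * 14 + b"inner IP packet"
packet = vxlan_encapsulate(4097, frame)
print(vxlan_decapsulate(packet) == (4097, frame))
```

The VNI is what lets the receiving host pick the right software-defined network before handing the inner frame to a container.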

Here’s an example Wireshark capture from a Docker Swarm environment. One container is connecting to another on port 80.

You can see the TCP port 80 connection attempt (blue bar). This is inside a regular IP packet inside an Ethernet message. This whole Ethernet message is then hidden inside a VXLAN message that travels over UDP on the host network.

So now our container orchestration environment can deploy a set of containers spread across many hosts and give the software in these containers access to each other over what looks like a completely independent private network.

Exposing Services

We’re still missing one piece in order to construct a typical system. We need at least one thing to be accessible from outside the container network. With “docker run”, this just meant publishing a port with the -p option. Docker opens this port on the host, and all traffic to it goes to a port on the container.

Both Docker Swarm and Kubernetes support a similar approach in an orchestration environment. However, there is extra complexity because there may be multiple instances across multiple hosts. So the orchestration agent on every host must listen on all exposed ports used by any container, then route the traffic to a host running an instance of that container.
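Here is a small model of that behavior: every host accepts traffic on a published port, even if it runs no instance itself, and forwards it to a host that does. This is a simplified sketch, not the actual routing logic of Swarm or Kubernetes; the class name `RoutingMesh`, the host names, and the random choice of target are assumptions.

```python
import random


class RoutingMesh:
    """Toy model: every host accepts traffic for every published port."""

    def __init__(self, hosts):
        self.hosts = set(hosts)
        self.placements = {}   # published port -> hosts running an instance

    def publish(self, port, instance_hosts):
        unknown = set(instance_hosts) - self.hosts
        if unknown:
            raise ValueError(f"unknown hosts: {sorted(unknown)}")
        self.placements[port] = sorted(instance_hosts)

    def route(self, entry_host, port, rng=random):
        """Pick the host that handles traffic arriving at entry_host:port."""
        if entry_host not in self.hosts:
            raise ValueError(f"unknown host: {entry_host}")
        targets = self.placements.get(port)
        if not targets:
            raise LookupError(f"nothing published on port {port}")
        return rng.choice(targets)   # assumption: any instance will do


mesh = RoutingMesh(["host-a", "host-b", "host-c"])
mesh.publish(8080, ["host-b"])
# Traffic can enter through a host that runs no instance of the service.
print(mesh.route("host-a", 8080))
```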

This way of doing things has the limitation that we must either communicate the port to clients (which is complex) or find a free port (which removes one of the advantages of containers: lack of resource conflicts with other applications).

One alternative is to allocate an IP address to a service that is accessible from outside the orchestration environment. This is currently possible with Kubernetes, though it requires some integration with the external IP provider. Docker Enterprise Edition has an alternate approach specific to HTTP that adds an entry in DNS that routes to an HTTP load balancer. The HTTP load balancer selects the server based on the host identified in the HTTP request.
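Selecting a server from the host named in the HTTP request can be sketched as follows. This illustrates the general technique only and says nothing about how Docker Enterprise Edition implements it; the domain names and backend addresses are invented, and the parser does not handle IPv6 literals in the Host header.

```python
def parse_host_header(raw_request):
    """Extract the Host header value (without port) from a raw HTTP/1.1 request."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:          # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip().split(":")[0].lower()
    return None


def choose_backend(raw_request, backends):
    """Return the backend address registered for the request's host name."""
    host = parse_host_header(raw_request)
    if host not in backends:
        raise LookupError(f"no service registered for host {host!r}")
    return backends[host]


# Hypothetical routing table: host name -> address inside the container network.
BACKENDS = {
    "jira.example.com": "10.0.0.5:8080",
    "wiki.example.com": "10.0.0.6:8090",
}
request = b"GET / HTTP/1.1\r\nHost: jira.example.com:80\r\nAccept: */*\r\n\r\n"
print(choose_backend(request, BACKENDS))
```

Because the choice depends only on the host name, many services can share one external address and one port.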

Architectural Implications

A typical containerized system has lots of instances of services. These instances start and stop at any time. Some services are exposed to the outside world. So how do we locate a service? From the client side, there are two solutions.

First, we can get the address of one instance and start our conversation with that one. Second, we can get a list of instances and either pick one or load balance across multiple instances. The first approach might not be as efficient, but it does not require any special logic in the client other than the robustness to handle losing its server connection.
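The two client-side approaches can be compared in code. This is a sketch under assumptions: `resolve_one` and `send` are hypothetical callables supplied by the caller (a name lookup and a network call, respectively), and simple rotation stands in for whatever load-balancing policy a real client would use.

```python
from itertools import cycle


def talk_to_one(resolve_one, send, request, attempts=3):
    """Approach one: ask for a single address; if it fails, ask again."""
    last_error = None
    for _ in range(attempts):
        address = resolve_one()
        try:
            return send(address, request)
        except ConnectionError as exc:
            last_error = exc   # the instance went away; look up a new one
    raise ConnectionError("service unreachable") from last_error


def spread_over_all(instances, send, requests):
    """Approach two: rotate the requests across the full instance list."""
    if not instances:
        raise LookupError("no instances to talk to")
    rotation = cycle(instances)
    return [send(next(rotation), request) for request in requests]


# Hypothetical transport: the instance at 10.0.0.2 has stopped.
def fake_send(address, request):
    if address == "10.0.0.2":
        raise ConnectionError("instance stopped")
    return f"{address} handled {request}"


answers = iter(["10.0.0.2", "10.0.0.3"])
print(talk_to_one(lambda: next(answers), fake_send, "ping"))
```

The first function needs nothing beyond a retry loop, while the second has to hold the instance list and keep it current, which is the extra client logic the trade-off is about.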

In the service, we also have two main approaches. First, we can have one instance handle all of the client traffic itself, with other instances just acting as backup. Second, we can have all of the instances sharing the work as much as possible. The second approach scales better, but is of course much more complex, especially if the service is storing data that needs to be synchronized.

For a containerized microservice architecture, the main appeal is scaling across many instances. So we would like to have clients and services that understand load balancing. However, in real systems, clients such as browsers don’t know about our highly distributed, load balanced approach. So at some level, we have to use both these approaches: a single front-end (with backup) that dispatches work, and lots of load balanced services that do the heavy lifting.

Not surprisingly, both Docker Swarm and Kubernetes are designed around this pattern. Of course, this article is an introduction to a very complex topic, but hopefully, it is a useful foundation that makes it clear why container orchestration is important and why the two main implementations offer so many of the same features.