Overview of Container Orchestration Tools

I’ve been experimenting with Docker and its ecosystem for a while, and my setup has become a bit of a mess: different machines running various old versions of Docker, managed by various generations of custom scripts. It was time for an overhaul, so I set out to take a closer look at the tools out there.

It’s kind of a mess. Everyone wants to release an orchestration tool, and their places in the stack often overlap.

So let’s consider the different parts an orchestration system might cover:

A container engine. Often this is Docker, but there are alternatives, including talking to the Linux plumbing directly. Because the container engine is ultimately replaceable plumbing, and Docker Inc. is a well-funded business, Docker will keep trying to become a full-stack orchestration tool, and competitors will keep supporting other container engines.

A scheduler that runs the containers. The most bare-bones version is a CLI script that runs things imperatively; on single-host systems, it might be upstart or systemd. Or it’s a networked cluster scheduler like Swarm, which essentially needs a process running on every host. A lot of full-stack orchestrators have their own scheduler, for example Tutum (now Docker Cloud). There are also standalone schedulers, like fleet, that you can harness.

A networking solution to let containers talk to each other. On multi-node clusters this will be some sort of overlay network, which your cloud provider might provide. Even on a single host, you want to let services talk to each other while exposing only some of them to the public (say, the HTTP router). In the early days of Docker, the answer was mapping container ports to host ports.

A service discovery solution. You want your web app container to talk to the MySQL container, so you have to know its address. These days, pretty much everyone seems to use a) a cluster-internal network on which every service has an address, b) DNS-based resolution, usually built right into the networking layer, and c) ‘links’ in the sense that the DNS name to connect to is injected via environment variables. In the early days of Docker, this was messy. We used various tools to register services when they start (sdutil, registrator), and to query service discovery and link one service to another (sdutil, ambassadord); frequently, mapping containers to random host ports was involved. See also my previous post on this.

A proxy to route to the services you want to expose. In simple cases, you can just publish your webapp on port 80 directly, but if you have two apps, you need to route to them based on the respective domain. Because the router needs to know the address of the backend, this router might integrate with service discovery.

Developer tools, for example deploying an app on every push to the repo.

Cluster-wide persistent storage. Unless you’re on a cloud provider, I consider this still unsolved, despite various Docker volume plugins. It’s just very hard to set up.
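For the ‘links’ flavor of service discovery described above, here is a minimal Python sketch of how an app might find its database. It assumes the variable naming of Docker’s legacy links (for a link aliased `db` to port 3306, Docker injects `DB_PORT_3306_TCP_ADDR` and `DB_PORT_3306_TCP_PORT`); the `resolve_link` helper itself is hypothetical, and its fallback simply trusts the cluster DNS to resolve the alias.

```python
import os
from typing import Mapping


def resolve_link(alias: str, port: int, env: Mapping[str, str] = os.environ) -> tuple[str, int]:
    """Find the address of a linked service (hypothetical helper).

    Legacy Docker links inject variables like DB_PORT_3306_TCP_ADDR for a link
    aliased "db"; dashes in the alias become underscores in the variable name.
    """
    prefix = f"{alias.upper().replace('-', '_')}_PORT_{port}_TCP"
    addr = env.get(f"{prefix}_ADDR")
    if addr:
        return addr, int(env.get(f"{prefix}_PORT", port))
    # No injected variables: assume DNS-based discovery resolves the alias itself.
    return alias, port


env = {"DB_PORT_3306_TCP_ADDR": "172.17.0.5", "DB_PORT_3306_TCP_PORT": "3306"}
print(resolve_link("db", 3306, env))  # ('172.17.0.5', 3306)
print(resolve_link("db", 3306, {}))   # ('db', 3306)
```

Falling back to the bare alias is what makes the DNS-based setup pleasant: the app code stays the same whether the address comes from injected variables or from the network’s built-in resolver.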

Before I look at the full-stack tools, here are some of the implementations that focus on particular layers in that stack:

Schedulers

swarm (old)
The standalone Swarm scheduler that you ran as a Docker container; presumably deprecated as of Docker 1.12.

Docker Swarm Mode
The new swarm mode built directly into docker. It is incredibly simple to use.

fleet
A networked systemd. CoreOS-specific.
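To make a cluster scheduler’s core job concrete, here is a toy placement function. It is only a sketch under my own assumptions: each host reports its free memory, and the policy is “most free memory wins”. The `Host` type and `place` function are hypothetical; Swarm, fleet and the others each use their own strategies and constraints.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical host record: a name plus the memory (in MB) still available.
    name: str
    free_mb: int


def place(containers: dict[str, int], hosts: list[Host]) -> dict[str, str]:
    """Assign each container (name -> required MB) to the host with the most free memory."""
    placement = {}
    # Place the largest containers first so they are less likely to be left without room.
    for name, need in sorted(containers.items(), key=lambda kv: kv[1], reverse=True):
        best = max(hosts, key=lambda h: h.free_mb)
        if best.free_mb < need:
            raise RuntimeError(f"no host has {need} MB free for {name}")
        best.free_mb -= need
        placement[name] = best.name
    return placement


hosts = [Host("node-a", 2048), Host("node-b", 1024)]
print(place({"web": 512, "mysql": 1024, "router": 256}, hosts))
# {'mysql': 'node-a', 'web': 'node-a', 'router': 'node-b'}
```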

Networking solutions

The network overlay that gives every container its own private IP is clearly the winner here. A lot of orchestrators have their own solution; generic ones include Weave and flannel.

Dev experience

A lot of orchestration tools naturally target ops and don’t deal with this part. There are basically two approaches:

Heroku-like/slugbuilder
A service handles git pushes, runs the code through slugbuilder, stores the slug as a tar.gz somewhere. To run it, the blob is given to a slugrunner image. In other words, your build artifacts use the Heroku slug format, and there is a custom system to hold the version history.

Docker
You build every version of your app into a Docker image directly and use your Docker registry for version management. In the simplest case, you just set up a GitHub webhook and let Docker Hub do the builds.
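To illustrate the slug approach, here is a minimal Python sketch of a slug store: every build packs the app into a gzipped tarball and appends it to a version history, and running a version means unpacking that tarball again. The `SlugStore` class is hypothetical; it skips the actual buildpack step that slugbuilder performs and keeps blobs in memory rather than in real storage.

```python
import io
import tarfile


class SlugStore:
    """Toy version history: each release is a .tar.gz slug kept in memory."""

    def __init__(self) -> None:
        self.releases: list[bytes] = []

    def build(self, files: dict[str, bytes]) -> int:
        """Pack the app's files into a gzipped tarball and return its version number."""
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w:gz") as tar:
            for path, data in sorted(files.items()):
                info = tarfile.TarInfo(name=path)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
        self.releases.append(buf.getvalue())
        return len(self.releases)  # versions start at 1

    def unpack(self, version: int) -> dict[str, bytes]:
        """Roughly what a slugrunner does first: extract a given release's files."""
        buf = io.BytesIO(self.releases[version - 1])
        with tarfile.open(fileobj=buf, mode="r:gz") as tar:
            return {m.name: tar.extractfile(m).read() for m in tar.getmembers() if m.isfile()}
```

With the Docker approach, the registry plays the role of `releases`: each image tag is one entry in the history.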

Volumes

GlusterFS
A distributed filesystem.

Flocker
Docker volume plugin; too enterprise-y for me.

Convoy
Written as part of Rancher, and integrates nicely there. Outside of Rancher, its instructions are poor. It supports NFS and block devices.

Now, let’s look at some full-stack approaches and where they fall in the stack:

Flynn

Flynn literally implements the whole stack itself, and exposes everything through a very limited, thin Heroku-like API.

They have their own container engine, their own scheduler, their own network overlay, their own service discovery, their own router, and their own dev UI to create apps and release them via “git push”.

You wouldn’t run MySQL yourself like you would in a docker-compose file. As on Heroku, MySQL is a backing service shared by multiple apps.

But your app can be a Dockerfile, and apps can find each other via service discovery.

So you could set up your own MySQL server as an app, but structurally, you now have two apps: myblog-app and myblog-mysql.

The UX of Flynn is about giving developers Heroku-like apps, not about letting ops spin up various containers that interact.

What do I think?

I like that the whole stack is lightweight Go. Its limited API surface has a certain beauty.

I worry they have too much work for a small team.

Installation is difficult; it currently only works on Ubuntu.

Web UI is still very basic.

Tutum/Docker Cloud

Tutum was bought by Docker and rebranded.

Container engine: Docker

Scheduler: Its own.

Networking: Weave

Service Discovery: DNS, injecting env vars based on links.

Router: They offer an haproxy image that talks to their scheduler to learn about services, and reads the env vars the services define.

Dev-Tools: Docker Hub can auto-build after a GitHub push, and Docker Cloud can auto-deploy after a Docker image is published.

Essentially, Docker Cloud is:

a) a scheduler.
b) assembles some tech for you (weave).
c) a UI that is a thin interface on top of Docker itself (you still interact with containers and their config a lot), including the “stack” abstraction (a collection of multiple services).

Regarding the proxy: the idea of using a battle-tested haproxy is nice, but in practice, I continuously run into issues. It often requires a restart when services are updated or changed. It’s also limited in that it requires defining both HTTPS and HTTP URLs, and it cannot do redirects. It requires manually linking the proxy container to all services, and if a reload fails (say, because of an issue with an SSL cert), all of the sites go down.
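Why can one bad entry take every site down? Here is a hypothetical Python sketch that renders a single shared haproxy config for all services. The input format and `render_haproxy` are my own assumptions, not how Docker Cloud’s haproxy image works internally; the `acl ... hdr(host)`, `use_backend` and `bind ... ssl crt` lines are standard haproxy syntax, but the global and defaults sections are omitted.

```python
import os


def render_haproxy(services: dict[str, dict]) -> str:
    """Render one haproxy routing fragment covering every service.

    Hypothetical input: name -> {"domain": ..., "addr": "ip:port", "cert": optional .pem path}.
    """
    rules, backends, certs = [], [], []
    for name, svc in sorted(services.items()):
        cert = svc.get("cert")
        if cert is not None:
            if not os.path.exists(cert):
                # All sites share this one file, so one broken cert aborts the whole render.
                raise ValueError(f"certificate for {name} not found: {cert}")
            certs.append(cert)
        # Route on the Host header to a per-service backend.
        rules.append(f"    acl host_{name} hdr(host) -i {svc['domain']}")
        rules.append(f"    use_backend be_{name} if host_{name}")
        backends += [f"backend be_{name}", "    mode http", f"    server {name} {svc['addr']}"]
    lines = ["frontend web", "    mode http", "    bind *:80"]
    if certs:
        lines.append("    bind *:443 ssl " + " ".join(f"crt {c}" for c in certs))
    return "\n".join(lines + rules + backends) + "\n"


print(render_haproxy({"blog": {"domain": "blog.example.com", "addr": "10.0.0.5:8080"}}))
```

Splitting the config per site, or validating each entry before swapping the file in, would contain the damage; with one monolithic file, every reload is all-or-nothing.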

I also wonder what will happen to Docker Cloud now that the Docker daemon itself implements essentially everything Docker Cloud offers, but with different tech (stacks and services as an abstraction, a network overlay, a scheduler). It might end up being just a UI on top of the Docker daemon.

Rancher

Container engine and scheduler: The default is their own scheduler running Docker containers. But they have backends for Swarm, Kubernetes and Mesos.

Networking: Custom L3 IPSec tunnel. It seems this is encrypted by default and doesn’t require any user-space implementation.

Service Discovery: DNS and env vars.

Dev-Tools: None.

The idea of different backends is nice, but in practice, Rancher doesn’t paint over the differences. In other words, whatever backend you choose, the frontend you work with will be different, too. The “app catalogs” they offer are separate too. So it’s basically four different products, and not all of them have the same quality. I see a lack of focus here.

What do I think?

The UI is a little less polished than Docker Cloud’s, but I like it more in some ways, and it’s faster.

I easily ran into a bunch of bugs and issues with deeper use.

It does not support the v2 docker-compose file format.

Their external DNS service abstraction is very neat. It currently supports only a single root domain and adds exposed services as subdomains, but that is still good enough for multiple root domains if you use CNAME records.

Storage pool integration via their Convoy service worked quite well. The key here is that they wrote the Docker volume plugin and show the pool in the UI; maybe they also run some Docker plugin registration command. Nice helpers, but really independent of the rest of the system.
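The single-root-domain scheme plus CNAMEs can be sketched as a function that emits DNS records. The `<service>.<stack>.<root>` naming is an assumption for illustration, not necessarily the exact format Rancher uses, and `dns_records` is a hypothetical helper.

```python
def dns_records(
    services: list[tuple[str, str, str]], root: str, aliases: dict[str, str]
) -> list[tuple[str, str, str]]:
    """Emit (name, type, value) records for exposed services.

    services: (service, stack, public_ip) tuples, named under the single root domain.
    aliases: a name in another root domain -> "<service>.<stack>" it should point at.
    """
    records = []
    for service, stack, ip in services:
        records.append((f"{service}.{stack}.{root}", "A", ip))
    for vanity, target in aliases.items():
        # A CNAME lets a second root domain reuse the generated name.
        records.append((vanity, "CNAME", f"{target}.{root}"))
    return records


print(dns_records([("web", "blog", "203.0.113.10")], "apps.example.com", {"www.myblog.org": "web.blog"}))
```

Note that the CNAME has to sit on a subdomain like `www`, since DNS doesn’t allow a CNAME at a zone apex.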

The native docker stack

Docker used to be just the engine. Then they added Swarm as a separate scheduler, a native network overlay, and docker-compose as a dev tool. I already talked about Docker Cloud.

Now, with 1.12, Docker itself has the Swarm scheduler built in and understands a “service” abstraction. It covers just about everything.

Kubernetes

Container engine: Docker, later likely supporting others.

Scheduler: Their own.

Networking: This is up to you, which makes it so difficult to install. It’s solved for you if you use Google Container Engine.

Service discovery: Their own, based on DNS.

Storage: Various volume implementations.

Proxy: Up to you, but it provides an “Ingress resource” abstraction that you can build on.

Dev tools: Up to you.

While it uses Docker as a basic container runner, unlike other tools it doesn’t expose Docker at all. You deal with a custom CLI and custom abstractions, and there are a lot of them: Ingress resources, secrets, its own volume system. For example, a “service” in Kubernetes doesn’t actually need to run on Kubernetes; other apps can refer to the service without knowing whether it runs inside or outside of the cluster. Or consider scaling: you don’t just say replicas=3 on a container; that setting is abstracted inside a “replication controller”.
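The replicas example comes down to a control loop: you declare a desired count, and the controller starts or stops pods until reality matches. Here is a minimal Python sketch of that idea; the `ReplicationController` class is a toy model, not the Kubernetes API.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationController:
    """Toy model: you declare `replicas`; the controller converges the pod set to it."""
    name: str
    replicas: int
    pods: list[str] = field(default_factory=list)
    _next: int = 0

    def reconcile(self) -> tuple[list[str], list[str]]:
        """One pass of the control loop; returns (started, stopped) pod names."""
        started, stopped = [], []
        while len(self.pods) < self.replicas:
            self._next += 1
            pod = f"{self.name}-{self._next}"
            self.pods.append(pod)
            started.append(pod)
        while len(self.pods) > self.replicas:
            stopped.append(self.pods.pop())
        return started, stopped


rc = ReplicationController("web", replicas=3)
print(rc.reconcile())  # (['web-1', 'web-2', 'web-3'], [])
```

If a pod dies, the next pass replaces it; scaling is just changing `replicas` and letting the loop run again.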

What do I think?

Looking at Kubernetes/Helm and the config files around it, you get the impression that it has a strong backing/ecosystem/architecture.

It seems well thought out, in that it seems to have a resource type for every conceivable problem space.

But Kubernetes is so far removed from Docker, and in fact from the host itself, that it very much feels like a black box.

If you look at your host, there are so many management and sidecar containers running for each actual container that it’s not something I want to interact with; I only use the Kubernetes CLI.

The config files feel complicated.

I wish it could be installed more easily.

How does it work inside?

A random proxy port is necessary on each host (the service port is routed to a random proxy port, which is routed to the pod IP). Apparently this is because if the pause container is restarted, it gets a new network namespace, and then the user containers have to be restarted too (is this right?).

Services get a fake IP. The proxy on every node picks up traffic to that fake subnet, probably looks up the IP to map it to a pod, and then forwards to the pod IP.

The kube-proxy on each host is essentially the global load balancer; in our own architecture, load balancing would be done via DNS/service discovery. Here, DNS gives you only the (more stable) service IP, which load-balances across all the pods/containers.

Similarly, if you give a service an external IP, Kubernetes simply creates routes on each node so that a request for this IP is routed like one for an internal IP. As such, you can use as an external address either any stable minion IP, or the IP of an external load balancer that routes to any of your minions. There might still be an extra hop before the request reaches the real pod. Internal services seem to be protected by a firewall and/or the network setup (you cannot get your packets routed to one of these internal service IPs).
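The routing described above can be modeled as a per-node lookup table. This is a hypothetical Python sketch, not kube-proxy’s implementation: `NodeProxy` maps a service IP and port (internal or external, handled the same way) to the backing pods, picks one round-robin for simplicity, and reports the extra hop when the chosen pod lives on another node.

```python
import itertools
from typing import Iterator


class NodeProxy:
    """Toy model of the proxy on one node; every node gets the same table."""

    def __init__(self, node: str) -> None:
        self.node = node
        self._pods: dict[tuple[str, int], Iterator[tuple[str, str]]] = {}

    def set_endpoints(self, service_ip: str, port: int, pods: list[tuple[str, str]]) -> None:
        """pods: (pod_ip:port, node it runs on) pairs backing this service address."""
        self._pods[(service_ip, port)] = itertools.cycle(pods)

    def forward(self, dst_ip: str, dst_port: int) -> tuple[str, int]:
        """Map traffic for a service IP to a pod; return (pod address, extra hops)."""
        if (dst_ip, dst_port) not in self._pods:
            raise LookupError(f"no service at {dst_ip}:{dst_port}")
        pod_addr, pod_node = next(self._pods[(dst_ip, dst_port)])
        # A pod on another node means one more hop across the network.
        return pod_addr, 0 if pod_node == self.node else 1
```

The fake service IP never belongs to a real interface; it only exists as a key in tables like this one on every node, which is also why an external IP can be bolted on with the same mechanism.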