The problem of unpredictable interface order in multi-network Docker containers

Whether we like it or not, the era of DevOps is upon us, fellow network engineers, and with it come opportunities to approach and solve common networking problems in new, innovative ways. One such problem is automated network change validation and testing in virtual environments, something I wrote about a few years ago. The biggest problem with my original approach was that I had to create a custom REST API SDK to work with a network simulation environment (UnetLab) that was never designed to be interacted with programmatically. Technologies like Docker, on the other hand, have always been interesting in this context, since they were built around the idea of non-interactive lifecycle management and came with all the API batteries included. However, Docker was never intended to be used for network simulations, and its support for multiple network interfaces is… somewhat problematic.

Problem demonstration

The easiest way to understand the problem is to see it. Let’s start with a blank Docker host and create a few networks:
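The setup might look roughly like this (a sketch: the alpine image, the container name test and the sleep command are illustrative choices, not taken from the original lab):

```shell
# Create four user-defined bridge networks
for n in 1 2 3 4; do
  docker network create net$n
done

# Create a container attached to net1, then connect it to the remaining networks
docker create --name test --network net1 alpine sleep 3600
for n in 2 3 4; do
  docker network connect net$n test
done

# Start the container and check which interface landed on which network
docker start test
docker exec test ip -o addr show
```

Comparing each interface's IP address against the subnets allocated to net1 through net4 is what reveals the mismatch.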

Now we can see that the networks are attached in a completely different order: net1 is connected to eth2, net2 to eth1, net3 to eth4 and net4 to eth3. In fact, this issue can manifest itself even with 2 or 3 networks; however, I’ve found that the reordering doesn’t always happen in those cases.

CNM and libnetwork architecture

In order to better understand the issue, it helps to know the CNM terminology and network lifecycle events which are explained in libnetwork’s design document.

Each time we run a docker network create command, a new CNM network object is created. This object has a specific network type (bridge by default), which identifies the driver to be used for the actual network implementation.

network, err := controller.NewNetwork("bridge", "net1", "")

When a container gets attached to its networks, first at docker create time and subsequently with docker network connect commands, an endpoint object is created on each of the networks being connected. This endpoint object represents the container’s point of attachment (similar to a switch port) to a Docker network and may allocate IP settings for a future network interface.

ep, err := network.CreateEndpoint("ep1")

When a container gets attached to its first network, a sandbox object is created. This object represents the container inside the CNM object model and stores pointers to all attached network endpoints.

sbx, err := controller.NewSandbox("test")

Finally, when we start a container with the docker start command, the corresponding sandbox gets attached to all associated network endpoints via the ep.Join(sandbox) call.

Going down the rabbit hole

Looking at the join loop in sandbox.go, we can assume that the order in which networks get attached to a container depends on the order of elements inside the epList array, which gets built earlier in the function.

At this point it looks like endpoints is just an array of pointers to endpoint objects, which still doesn’t explain the issue we’re investigating. Perhaps it would make more sense if we saw how a sandbox object gets created.

Since a sandbox object gets created by calling the controller.NewSandbox() method, let’s see exactly how this is done by looking at the code inside controller.go.

Here lies the explanation of the random connection order: the endpoints array is, in fact, a heap, i.e. an ordered tree where the parent node is always smaller than (or equal to) its children (a min-heap). Heaps are used to implement priority queues, which should be familiar to every network engineer who knows QoS. One of a heap’s properties is that it re-orders its elements every time an element is added or removed, in order to maintain the heap invariant (parent <= child).

Problem solution

It turns out the problem demonstrated above is a well-known one, with multiple open issues on GitHub [1,2,3]. I was lucky enough to have discovered it right after this pull request got submitted, which is what helped me understand what the issue was in the first place. The pull request references a patch that swaps the heapified array for a normal one. Below I’ll show how to build a custom Docker daemon binary with this patch applied. We’ll start with a privileged CentOS-based Docker container.
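The bootstrap might look roughly like this (a sketch only: the image tag, package list and repository layout are assumptions and may well have changed since the time of writing):

```shell
# Run a privileged CentOS container to serve as the build environment
docker run -d --privileged --name builder centos:7 sleep 3600
docker exec -it builder bash

# Inside the container: install build prerequisites and fetch the sources
yum install -y git golang make
git clone https://github.com/moby/moby.git
cd moby
```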

I tried using vndr to update the libnetwork files inside the Docker repository; however, I ran into problems with incompatible git options on CentOS. Instead, I’ll update libnetwork manually, copying over just the files that differ from the original repo.
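With hypothetical paths (both checkout locations and the file list are illustrative; the real set of changed files comes from the patch itself), the manual update boils down to:

```shell
# Overwrite only the vendored files that the patch touches
cd ~/moby
cp ~/libnetwork/sandbox.go vendor/github.com/docker/libnetwork/sandbox.go

# Rebuild the daemon binary with the patched libnetwork
make binary
```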

Huge kudos to the original author of the libnetwork patch, who is the sole reason this blog post exists. I really hope this issue gets resolved, in this form or another (could the order in which endpoints are added to a sandbox be tracked and used as the sort criterion for the heap?), as that would make automated network testing much more approachable.