Weave is kinda slow

In our new world of Docker containers, many old problems have been
rediscovered. Thankfully, the fact that these problems were solved decades ago
has not stopped people from coming up with their own solutions, and we now all
get to witness the resulting disasters.

The particular problem I’ll talk about today is IP level overlay networks. The
basic problem these overlay networks try to solve is “I have 1 IP per machine,
but I need multiple IPs per machine”. This was originally relevant
because you might have a few networks and want to do some funky networking,
then became relevant because you might have a few VMs and want to do some funky
networking, and now, in 2015, it is relevant because you might have a few
containers and want to do some funky networking. Obviously, these use cases are
distinct enough to require their own implementations and protocols, as we will
see.

The way you solve this problem is usually via some sort of
IP encapsulation, though the specific implementation will vary wildly. The
IP encapsulation RFC
talks about a structure that looks like this:
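Roughly, IP-in-IP encapsulation (RFC 2003) leaves the original datagram untouched and makes it the payload of a new outer datagram:

```
+-----------------+--------------------------------------+
| Outer IP header | Original datagram                    |
|                 | (inner IP header + inner IP payload) |
+-----------------+--------------------------------------+
```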

There are a couple of other standards for this,
GRE and
VXLan. GRE
(generic routing encapsulation) is a network layer protocol that is most
commonly used to do things such as
IPv4 over IPv6, extending LANs over VPNs etc etc. VXLan (Virtual Extensible LAN)
is a more recent protocol that was designed specifically to enable funky
networking when working in VM-heavy environments. The encapsulation provided by
VXLan looks quite different, however: VXLan encapsulates link layer frames
(though it is itself an application layer protocol; the frames are transmitted
via UDP). It looks a bit like this:
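A rough sketch of a VXLan-encapsulated frame (per RFC 7348; the whole thing rides inside an outer UDP datagram):

```
+----------------+----------+-----------+---------------------+------------------+
| Outer Ethernet | Outer IP | Outer UDP | VXLan header        | Original (inner) |
| header         | header   | header    | (8 bytes, including | Ethernet frame   |
|                |          |           | a 24-bit VNI)       |                  |
+----------------+----------+-----------+---------------------+------------------+
```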

Weave

Weave is a company/open source project that provides an
overlay network for your Docker containers. Due to their unique use case of
providing each container with an IP, they have developed their own custom
protocol, which looks something like this (courtesy of the
weave documentation):
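Roughly (hedging: the actual Weave header carries peer names and other metadata that I'm not showing here), a Weave packet bundles several captured frames into a single UDP payload:

```
+----------------+----------+-----------+---------------------------------+
| Outer Ethernet | Outer IP | Outer UDP | Weave header, then a batch of   |
| header         | header   | header    | captured Ethernet frames:       |
|                |          |           | [frame 1][frame 2]...[frame N]  |
+----------------+----------+-----------+---------------------------------+
```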

This is quite different from the examples I talked about above. Weave captures
data at the frame level, à la VXLan, but then collates multiple frames and
transmits them together via UDP.
This means that 2 packets sent by the container are not guaranteed to cross the
network as 2 packets; if they are sent sufficiently close together, and the sum
of their size is sufficiently smaller than the MTU, they may travel as a single
packet. We’ll see how this affects the connection speed.
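The batching behaviour can be sketched in a few lines of Python. To be clear, this is not Weave's code or wire format; the `MTU` figure and the length-prefix framing are illustrative assumptions.

```python
import struct

MTU = 1410  # payload budget left after the outer headers (illustrative figure)

def collate(frames, mtu=MTU):
    """Group captured frames into UDP-payload-sized batches.

    Each frame is length-prefixed so a receiver could split the
    batch back into individual frames.
    """
    batches, current, size = [], [], 0
    for frame in frames:
        encoded = struct.pack("!H", len(frame)) + frame
        # Flush the current batch if adding this frame would exceed the MTU
        if current and size + len(encoded) > mtu:
            batches.append(b"".join(current))
            current, size = [], 0
        current.append(encoded)
        size += len(encoded)
    if current:
        batches.append(b"".join(current))
    return batches

# Two small frames sent close together travel as one packet...
assert len(collate([b"x" * 100, b"y" * 100])) == 1
# ...while two near-MTU frames still cross the network separately.
assert len(collate([b"x" * 1300, b"y" * 1300])) == 2
```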

Benchmarking Networks

I have two boxes, $IP1 and $IP2. They’re both $5 Digital Ocean boxes, so should
be representative of the standard machines used in enterprise settings today.
I’ll start off the test by running qperf, a network testing tool, on the first
machine, and then running
qperf $IP1 tcp_bw tcp_lat on the other. This will run a test on TCP
bandwidth and latency between the two IPs:
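qperf prints one section per test; with the native figures from the results tables below, the output looks like:

```
tcp_bw:
    bw  =  116 MB/sec
tcp_lat:
    latency  =  91.8 us
```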

So I guess you get roughly what you pay for. Anyway, the defining feature of the
cloud is clueless CTOs, er, poor networks, so this shouldn’t be a problem. Let’s
try running the test under two Weave-connected containers.

Weave

So running things under Weave is a little more complicated. I’ve annotated the
commands below (this requires a Weave network to have been set up that includes
the two machines).
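A sketch of the sort of commands involved, assuming Weave's `weave run` wrapper; the 10.2.1.x addresses and the ubuntu image are placeholders, and qperf has to be available inside the container:

```shell
# On $IP1: attach a container to the Weave network and run the
# qperf server inside it (weave run prints the container ID)
C=$(weave run 10.2.1.1/24 -ti ubuntu)
docker exec -ti $C qperf

# On $IP2: same again, pointing qperf at the first container's Weave IP
C=$(weave run 10.2.1.2/24 -ti ubuntu)
docker exec -ti $C qperf 10.2.1.1 tcp_bw tcp_lat
```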

Boy, for all that work, that’s pretty damn slow. Let’s tabulate that data:

| Name   | TCP BW (MB/s) | TCP Lat (µs) | BW % | Lat % |
|--------|---------------|--------------|------|-------|
| Native | 116           | 91.8         | 100  | 100   |
| Weave  | 6.91          | 372          | 5.96 | 405   |

So two Weave-networked containers provide about 6% of the throughput that two
native services might, at 4x the latency. Not great. I would guess that a lot
of the time is spent simply getting each packet out of the kernel and into the
Weave process.

Flannel

Weave’s main competitor in the giving-each-container-an-ip space is
flannel, by CoreOS. Flannel offers a range
of encapsulation protocols, all working at the IP level. By default it uses the
UDP-based encapsulation I described above, but also supports
VXLan encapsulation, a recent encapsulation
standard that has in-kernel support. I don’t know about you, but I view every packet
that avoids userspace as another step towards salvation.

Flannel uses etcd as its control plane, so I
dumped it on $IP1, and then loaded up the first configuration I wanted to test,
the default UDP encapsulation:

$ etcdctl mk /coreos.com/network/config '{"Network":"10.0.0.0/16"}'

We then fire up flannel on each node, and tell Docker to use the flannel bridge:
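On each node that looks something like the following sketch (flags abbreviated): flanneld acquires a subnet for the host, records it in `/run/flannel/subnet.env`, and Docker's bridge is pointed at it.

```shell
# flanneld reads the network config from etcd and carves out a
# subnet for this host, writing it to /run/flannel/subnet.env
flanneld -etcd-endpoints=http://$IP1:2379 &
source /run/flannel/subnet.env

# Tell Docker to allocate container IPs from flannel's subnet,
# using flannel's (encapsulation-adjusted) MTU
docker daemon --bip=$FLANNEL_SUBNET --mtu=$FLANNEL_MTU &
```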

So, not exactly great, though a fair bit better than Weave. This is likely due
to the fact that Weave captures data via packet capture, while flanneld uses
ipmasq, a lesser-known library that allows userspace to make decisions on the
destiny of packets coming out of iptables chains. However, as mentioned before,
in-kernel routing is what we would like, and neither of these solutions provides
it. Let’s turn on flannel’s VXLan backend:
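That's just a change to the network config in etcd (backend selection is part of flannel's documented config format), after which flanneld is restarted on each node:

```shell
etcdctl set /coreos.com/network/config \
  '{"Network":"10.0.0.0/16","Backend":{"Type":"vxlan"}}'
```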

Results

| Name          | TCP BW (MB/s) | TCP Lat (µs) | BW %   | Lat %  |
|---------------|---------------|--------------|--------|--------|
| Native        | 116           | 91.8         | 100.00 | 100.00 |
| Weave         | 6.91          | 372          | 5.96   | 405.23 |
| Flannel UDP   | 23            | 164          | 19.83  | 178.65 |
| Flannel VXLan | 112           | 129          | 96.55  | 140.52 |

I think that speaks for itself. The only other thing I should mention at this
point is that if you are relying on Weave’s encryption feature, I would
recommend investing in an actual VPN implementation. Weave rolls its own
crypto, and I would not suggest people rely on Weave’s custom protocol for
confidentiality on their network links.