Linux and FreeBSD networking

Incipit

I work on the networking subsystem of the Linux kernel and I find networks rather fascinating. I often read statements about the FreeBSD networking stack being faster and more mature than its Linux counterpart, but I couldn't find any comparative tests between the two OSes, and I was curious enough to run some tests myself.

Test setup

To avoid having to set up cables and interfaces on bare metal systems, I decided to get a single, powerful server and partition it into four VMs: two running Fedora 29 and two running FreeBSD 11.2-RELEASE.

The hypervisor is KVM on Fedora 29 with the latest 4.19 kernel. The server was partitioned so that no task of the host OS could interfere with the guest VCPUs: 8 physical CPUs (12,14,16,18,20,22,24,26) were removed from the scheduler, with RCU callbacks and the timer tick off, and the system was booted with nosmt=force to avoid using their HT siblings. Both 10 Gbit cards were on the first NUMA node, so I completely disabled the second NUMA node by taking all of its CPUs and memory offline with the mem=64G kernel command line parameter and a script run at boot. Intel Turbo Boost and P-states are known to skew benchmarks, so they were disabled via the kernel command line and by writing the appropriate register on each CPU with wrmsr -pX 0x1a0 0x4000850089. Spectre and Meltdown mitigations were disabled via the kernel command line as well. In the end, the kernel command line was:
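The original command line isn't reproduced here; a plausible reconstruction from the parameters described above (the exact hugepage count and mitigation flags are assumptions) would look like:

```
nosmt=force mem=64G isolcpus=12,14,16,18,20,22,24,26 \
nohz_full=12,14,16,18,20,22,24,26 rcu_nocbs=12,14,16,18,20,22,24,26 \
default_hugepagesz=1G hugepagesz=1G hugepages=16 \
intel_pstate=disable nopti nospectre_v2
```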

Guests were interconnected via a DPDK switch, which can handle millions of packets per second with a single core; two cores were dedicated to DPDK. While storage was not relevant for our purposes, an LVM logical volume was created for every VM, instead of using files or, even worse, sparse qcow2 images.

Guest OS setup

Each VCPU was pinned to an isolated physical core, RAM was backed by 1 GB hugepages, and VirtIO drivers and peripherals were used when possible. 2 CPUs and 4 GB of RAM were given to each VM. The FreeBSD wiki suggests disabling entropy harvesting when doing benchmarks, so I did so by adding harvest_mask="351" to /etc/rc.conf. The idea makes sense, even if I didn't find a way to do the same under Linux.
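As a sketch, the pinning and hugepage reservation can be done with libvirt and sysfs; the VM name "fedora1", the core numbers, and the hugepage count below are illustrative assumptions:

```shell
# Pin the two guest VCPUs to two of the isolated physical cores.
virsh vcpupin fedora1 0 12
virsh vcpupin fedora1 1 14

# Reserve 1 GB hugepages on the host for backing guest RAM.
echo 8 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
```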

Linux is a fresh Fedora 29 install, with a vanilla 4.19 kernel recompiled with the Fedora config just to make it possible to unload iptable_filter:
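The config delta isn't shown in the post; presumably it amounts to building the relevant netfilter pieces as modules instead of built-in, along these lines (the exact set of symbols is an assumption):

```
CONFIG_NETFILTER_XTABLES=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_FILTER=m
```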

Syscall overhead

Before starting the network tests I wanted to measure the overhead of syscall invocation, as every I/O operation would trigger at least one.

I wrote a tool to measure the syscall overhead of the OS. Basically it performs a few million syscalls in a loop, measuring the elapsed time by reading the TSC register. The syscall(-1) call which I used on Linux triggered a SIGSYS signal on FreeBSD, which consumes a lot of cycles even if trapped, so I switched to a slightly less accurate getuid(), which just returns a number. The results of the two runs are:

root@fedora1:~# ./ctx_time
ctx: 243 clocks

root@freebsd1:~# ./ctx_time
ctx: 281 clocks

The same test could be done with dd bs=1, which performs a syscall for every byte read or written:
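For instance, a quick comparison (the byte counts here are illustrative) between a per-byte copy and a single-block copy of the same amount of data shows the syscall cost directly:

```shell
# One read() and one write() per byte: two million syscalls in total.
dd if=/dev/zero of=/dev/null bs=1 count=1000000

# The same amount of data in a single block: a handful of syscalls.
dd if=/dev/zero of=/dev/null bs=1000000 count=1
```

Timing the two commands shows that the bs=1 run is dominated by syscall overhead rather than by data movement.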

Attentive readers will note that the Linux buffer sizes differ from the ones on your distro: that's because I changed the Linux values to match the FreeBSD ones with a sysctl, even if the difference turned out to be negligible:
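The sysctl itself isn't reproduced in the post; on FreeBSD 11.x the TCP defaults are net.inet.tcp.sendspace=32768 and net.inet.tcp.recvspace=65536, so the matching Linux change would look something like this (the min/max bounds are assumptions; only the default middle values matter here):

```shell
# Set the default (middle) TCP buffer sizes to FreeBSD's 32 KiB / 64 KiB.
sysctl -w net.ipv4.tcp_wmem="4096 32768 4194304"
sysctl -w net.ipv4.tcp_rmem="4096 65536 6291456"
```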

At first look, the FreeBSD performance looks disastrous, so I shared my results with the #freebsd folks on IRC, and they told me that slow VirtIO drivers for FreeBSD are a known issue. Not a big deal then: KVM was developed on Linux, and surely Linux guest drivers are more optimized. I changed the VM setup to use an emulated Intel Gigabit Ethernet or Realtek rtl8139 card, but I got very poor results from both OSes, so I don't even report them. So I did the test again on the loopback interface of both guests. It's not a very professional test, but at least I'm not facing any VirtIO deficiency. To make the test fair, I lowered the Linux MTU to 16384, as this is the maximum allowed on the FreeBSD loopback device.

“Although close to bare metal, the average interrupt invocation latency of KVM+DID is 0.9µs higher. […] As a result, there is no noticeable difference between the packet latency of the SRIOV-BM configuration and the SRIOV-DID configuration, because the latter does not incur a VM exit on interrupt delivery.”

So I got some Intel 82599ES 10 Gigabit cards, as suggested by the FreeBSD Network Performance Tuning guide, got another identical server, and connected the two servers back to back with this setup.

The server was running a single Linux or FreeBSD machine at a time with PCIe passthrough, while another server was running TRex bound to two 10 Gbit cards. TRex is a powerful DPDK-based traffic generator capable of generating dozens of millions of packets per second. It can easily achieve line rate with 10 Gbit cards by sending 64 byte frames from 4 CPUs.

TRex sends 10 Gbit/s (14.8 million 64 byte packets per second) from NIC1; the packets are received on NIC2 and handled by the OS under test, which sends them out through NIC3, and finally they go back to TRex, which checks them and gathers statistics.

With this setup and PCI passthrough, test results became more stable and reproducible between different runs, so I assume that this is the correct way to do it. Each test was repeated ten times; the graph line plots the average, while min and max values are reported with candlesticks.

The first test is a software bridge. The two interfaces are bridged and packets are just forwarded.
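As an illustration (the interface names are assumptions), the bridge setup boils down to something like:

```shell
# Linux: create a bridge and enslave the two test interfaces.
ip link add name br0 type bridge
ip link set eth1 master br0
ip link set eth2 master br0
ip link set eth1 up
ip link set eth2 up
ip link set br0 up

# FreeBSD equivalent, using if_bridge:
# ifconfig bridge0 create
# ifconfig bridge0 addm ix0 addm ix1 up
```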

L2 forwarding

The first thing that struck me is that the FreeBSD packet rate was substantially the same with one or 8 CPUs. I investigated a bit, and found it to be a known issue: bridging under FreeBSD is known to be slow because the if_bridge driver is practically single-threaded due to excessive locking, as written in the FreeBSD network optimization guide.

The second thing that I noted is that when running a test on a single core FreeBSD guest, the system freezes until traffic is stopped. It only happens to FreeBSD when the guest has only one core. Initially I thought that it could be a glitch of the serial or tty driver, but then I ran a while sleep 1; do date; done loop: if it were just an output issue, the time wouldn't freeze. I looked through all the sysctls to check whether the FreeBSD kernel was preemptible, and it is, so I can't explain what is going on. I made an asciinema recording which better illustrates this weird behavior.

The second test is routing. Two IP addresses belonging to different networks are assigned to the interfaces, and the TRex NIC4 address is set as the default route. TRex sends packets to the first interface and the packets are forwarded.
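For illustration, the Linux side of this setup looks like the following (the addresses and interface names are assumptions):

```shell
# Assign addresses on two different networks and enable IP forwarding.
ip addr add 192.168.1.1/24 dev eth1
ip addr add 192.168.2.1/24 dev eth2
ip route add default via 192.168.2.2    # TRex NIC4, hypothetical address
sysctl -w net.ipv4.ip_forward=1

# FreeBSD equivalent: the same with ifconfig/route, plus
# sysctl net.inet.ip.forwarding=1
```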

When talking about L3 forwarding, both OSes scale quite well. While achieving more or less the same performance with a single core, Linux does a better job with multiple processors.

The third test is about firewalling. The setup is the same as the routing test, except that some rules are loaded into the firewall. The rules are generated in such a way that they can't match any packet sent by TRex (a different port range than the generated traffic); they are there only as dead weight. Both OSes have two firewall systems: Linux has iptables and nftables, while FreeBSD has PF and IPFW. I tested all of them, and in the graph below I report the performance of iptables and IPFW, because they turned out to be faster than the other two solutions.
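A sketch of how such filler rules can be generated for iptables (the port range 40000-40999 is an assumption, chosen so that it never matches the generated traffic):

```shell
# Emit 1000 never-matching filter rules in iptables-restore format.
{
    echo '*filter'
    for port in $(seq 40000 40999); do
        echo "-A FORWARD -p udp --dport ${port} -j DROP"
    done
    echo 'COMMIT'
} > filler_rules.txt

# Load them without flushing the existing ruleset:
# iptables-restore --noflush filler_rules.txt
```

The same kind of loop can emit `ipfw add deny udp from any to any dst-port ${port}` lines for the FreeBSD run.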

packet filtering

As said before, I deliberately omitted the nftables and PF numbers to avoid confusion; if you want to have a look at the full numbers, here are the raw data.

Conclusions

Both OSes perform well, being able to forward more than 1 million pps per core, which lets you achieve 10 Gbit line rate with 1500 byte frames. FreeBSD scales relatively well with the number of cores (except in bridge mode, which is practically single-threaded), but Linux does a near perfect job of using all the power of a multicore system. The same applies to firewalling, where we can see that a large firewall ruleset can disrupt the performance of both kernels, unless tricks like fastpath and HW offloading are used.

About me

I have been working at Red Hat for a year, but I have been contributing to the Linux kernel and a lot of other open source software for 15 years. I used FreeBSD in production back in 2008 and I ported one of my programs to the FreeBSD ports collection. I love open source in general, I'm an FSFE supporter, and I'm firmly against DRM and proprietary software.