It is an exciting time to be a software developer in the networking space, and as the role of the engineer changes, so too do the rules.

For 15 years, the traditional thinking behind high-performance networking has been to push as much of the packet processing functionality as possible into the kernel. That model has been changing as the costs of crossing the divide between kernel and user space, context switching on interrupts to service packets, and copying data have come to limit the performance of packet processing applications.

Many of these lessons have been explored and implemented by projects like mTCP. The techniques pioneered by the mTCP project have since been adopted by similar projects and include replacing expensive system calls with shared-memory access between trusted threads on the same CPU core, efficient flow-level event aggregation, and batch packet processing to achieve higher I/O efficiency.

Using these principles, mTCP claims to improve the performance of various popular applications by 33% (SSLShader) to 320% (lighttpd) compared with the native Linux stack.

Scaling for performance on multicore systems has led to new approaches to architecting network software. For example, all of the functionality and processing the kernel has traditionally handled, including the network drivers, is now being placed directly into the user space application, and the application assumes direct control of NUMA placement, core affinity, and parallelism. Keeping all of the kernel and user space network processing in the same execution context keeps the cache fresh and avoids the latency penalties of other designs. These high-performance user space network stacks dramatically reduce latency and CPU utilization while increasing message rate and bandwidth. Additionally, a run-to-completion model can be replicated across the available cores to independently process similar workloads.

At the heart of the rush to user space, these stacks use DPDK to create an interrupt-free, run-to-completion model for packet processing, and they gain additional performance by mapping the NIC packet buffers directly into user space. In turn, these network stacks provide the TCP processing themselves, since the interfaces are unbound from the kernel and its stack.

Below, I have gathered some of the open source projects I found. Whether you decide to use a vSwitch or a full network stack, network developers have plenty of options for bringing their applications to user space to scale performance on multi-core systems.