Improving Server Performance

How to improve your server's performance by offloading a TCP/IP stack from a Linux-based server onto an iNIC.

Looking to improve the performance of
your high-end Linux server? Is your Linux system connected to a
high-speed network? Are your servers spending too much of their
resources on processing the TCP/IP stack and Ethernet frames? These
are common problems in today's network environment. They arise
because TCP/IP traffic on the Internet and on
private enterprise networks has grown dramatically over the
past decade, and there is no sign that growth will slow down
any time soon. The widespread global adoption of the Internet and
the development of new network storage technologies such as
iSCSI are pushing network speeds even higher. Although the
processors being manufactured today are gaining speed at an
astonishing pace, it is likely that network traffic growth will
continue to outpace processor speeds, pulling
servers away from their primary tasks to process network packets.

Intel has developed a prototype that offloads an entire
TCP/IP stack from a Linux-based operating system onto an
intelligent network interface card (iNIC).

The iNIC Design

The iNIC contains a real-time operating system (RTOS) and an
entire TCP/IP version 4 stack. An I/O processor (IOP) on the iNIC
processes all of the network packets, freeing the host processor
for other tasks. To accomplish this division of labor, a thin
layer of logic is needed on the host side to route all the network
traffic through the iNIC.

Socket Offload

This technology is based on the intelligent I/O (I2O)
architecture, which is already incorporated into the Linux 2.4.x
kernel. Figure 1 gives an overview for readers who are not familiar
with I2O. The I2O specification defines a message-based communication
mechanism between a host operating system (OS) and I/O devices that
are attached to an IOP. The IOP runs an intelligent RTOS (IRTOS),
which contains device-driver modules (DDMs) for each connected
device. To obtain portability, I2O uses a split-driver model. The
specification defines a set of abstract messages for each supported
device class (e.g., LAN, tape, disk). The host OS uses the
abstracted message layer to communicate with DDMs running on an
IOP. The DDM translates these I2O messages to hardware-specific
commands. To communicate with an I2O device, the host OS must have
a driver that knows how to translate OS device commands to I2O
device class commands. This module in the host OS is referred to as
the operating system module (OSM).
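
The split-driver model described above can be sketched in a few
lines of Python. This is only an illustration of the idea: every
class name, the message fields and the "VENDOR_READ" command string
are hypothetical and are not taken from the I2O specification.

```python
from dataclasses import dataclass

# Hypothetical names throughout: this models the split-driver idea only,
# not the real I2O message frames or function codes.


@dataclass(frozen=True)
class ClassMessage:
    """Abstract, hardware-independent request for one device class."""
    device_class: str   # e.g. "disk"
    function: str       # e.g. "read"
    params: dict


class DiskOSM:
    """Host side: turns an OS-level request into an abstract class message."""

    def read_block(self, lba: int, count: int) -> ClassMessage:
        return ClassMessage("disk", "read", {"lba": lba, "count": count})


class VendorDiskDDM:
    """IOP side: turns the abstract message into a device-specific command."""

    SECTOR_SIZE = 512  # assumed sector size for this imaginary device

    def handle(self, msg: ClassMessage) -> str:
        if msg.device_class != "disk" or msg.function != "read":
            raise ValueError(f"unsupported message: {msg}")
        offset = msg.params["lba"] * self.SECTOR_SIZE
        length = msg.params["count"] * self.SECTOR_SIZE
        # An invented command string standing in for real register writes.
        return f"VENDOR_READ offset={offset} length={length}"


def submit(osm: DiskOSM, ddm: VendorDiskDDM, lba: int, count: int) -> str:
    """Carry one request across the host/IOP boundary."""
    return ddm.handle(osm.read_block(lba, count))
```

Because the host side only ever produces the abstract message, a
different DDM could be swapped in for different hardware without
touching the OSM, which is the portability argument made above.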

Figure 1. I2O Architecture

The solution extends this architecture with a new device class,
the socket class. Changes were made to the I2O
architecture to increase performance and minimize latencies usually
associated with a split-driver model.

Figure 2. Socket Offload Architecture

The socket class defines messages needed for communication
between the host OS and the DDM. The two drivers, OSM and DDM,
communicate over a two-layer communication system. The message
layer sets up the communication session, and the transport layer
defines how information will be shared. The DDM is composed of two
modules: an intermediate service module (ISM) and a hardware device
module (HDM). The ISM provides the full functionality of the TCP/IP
version 4 stack. The HDM is the device driver for the iNIC. The
socket OSM is unlike any other network driver in Linux. Normal
network card drivers are protocol-independent and interface with
the Linux kernel at the network application program interface
(API). The socket OSM, on the other hand, interfaces directly
below the socket API. This allows the necessary socket services to
be offloaded onto the IOP running the socket class ISM. The socket
OSM replaces the services that the TCP/IP stack provided to the
kernel, thereby providing the necessary interfaces to the Linux
kernel. It also converts socket requests and data into the
socket offload format and transmits them to the iNIC running the
TCP/IP offload stack.

Socket OSM

The OSM is divided into the following subsystems: user
interface, message interface, kernel interface and memory
management.

The user interface replaces the af_inet socket layer in the
kernel. It returns results to user programs exactly as the
native (non-TCP/IP offloaded) kernel would.

The message interface handles the initialization and control
of the socket offload system. It translates user socket
requests into socket class messages.

The kernel interface provides kernel services to the OSM.
This is the point at which the OSM provides any services normally
provided to the kernel by the TCP/IP stack. This subsystem was
designed to minimize the modifications needed for the Linux
kernel.

The memory management module provides the buffer pools needed
for data transfer to and from the user-space applications. Memory
management was designed to 1) minimize the number of data copies
and DMA requests, 2) minimize the host interrupts, 3) avoid
requiring costly physical-to-virtual address mappings and 4) avoid
the overhead of dynamic memory allocation at runtime. Two pools of
DMA-capable data buffers are maintained in the OSM. The transmit
pool holds the data headed for the iNIC; the receive pool
holds the data arriving in the kernel from the socket device.
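
To make the fourth goal concrete, here is a minimal sketch of a
preallocated buffer pool. It is an assumption about how such a pool
might look, not the OSM's actual code: the BufferPool class and its
methods are hypothetical, and ordinary bytearray objects stand in
for the DMA-capable kernel memory a real driver would use.

```python
from typing import Optional


class BufferPool:
    """Fixed set of equally sized buffers, allocated once at start-up."""

    def __init__(self, count: int, size: int) -> None:
        # All allocation happens here, never on the data path.
        self._buffers = [bytearray(size) for _ in range(count)]
        self._free = list(range(count))  # indices of unused buffers

    def acquire(self) -> Optional[int]:
        """Return the index of a free buffer, or None if the pool is empty."""
        return self._free.pop() if self._free else None

    def release(self, index: int) -> None:
        """Hand a buffer back to the pool for reuse."""
        if index in self._free:
            raise ValueError(f"buffer {index} is already free")
        self._free.append(index)

    def view(self, index: int) -> memoryview:
        """Expose a buffer without copying its contents."""
        return memoryview(self._buffers[index])

    def available(self) -> int:
        """Number of buffers currently free."""
        return len(self._free)


# One pool per direction, mirroring the two pools described above.
transmit_pool = BufferPool(count=4, size=2048)
receive_pool = BufferPool(count=4, size=2048)
```

A caller would acquire an index, fill the buffer through view() and
release it once the transfer completes. When the pool is exhausted,
acquire() returns None rather than allocating more memory, so the
cost of the data path stays predictable.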

Figure 3. Linux TCP/IP Stack

As shown in Figure 3, the Linux network components form a
layered structure. User-space programmers access network services
via sockets, using the functions provided by the Linux socket
layer. The socket structure defined in include/linux/net.h forms
the basis for the implementation of the socket interface. Below the
user layer is the INET socket layer. It manages the communication
end points for the IP-based protocols, such as TCP and UDP. This
layer is represented by the data structure sock defined in
include/net/sock.h. The layer underneath the INET socket layer is
determined by the type of socket and may be the UDP or TCP layer or
the IP layer directly. Below the IP layer are the network devices,
which receive the final packets from the IP layer.
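
The way the socket type selects the layer beneath the INET socket
layer can be shown with a small lookup. This is a simplified model
of the description above, not kernel code: the layers_for function
and the layer names are made up for illustration, and a real kernel
also takes the protocol argument of socket() into account.

```python
import socket
from typing import List

# Simplified mapping for AF_INET sockets (illustrative only).
LAYER_BELOW_INET = {
    socket.SOCK_STREAM: "TCP",
    socket.SOCK_DGRAM: "UDP",
    socket.SOCK_RAW: "IP",   # raw sockets talk to the IP layer directly
}


def layers_for(sock_type: int) -> List[str]:
    """List the layers a packet passes through, top to bottom."""
    try:
        below = LAYER_BELOW_INET[sock_type]
    except KeyError:
        raise ValueError(f"unsupported socket type: {sock_type}") from None
    path = ["socket layer", "INET socket layer", below]
    if below != "IP":
        path.append("IP")        # TCP and UDP both sit on top of IP
    path.append("network device")
    return path
```

For a SOCK_STREAM socket the path runs through TCP and then IP; for
a SOCK_RAW socket the transport layer is skipped.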

The socket OSM replaces the INET socket layer. All
socket-related requests passed from the socket layer are converted
into I2O messages, which are passed to the ISM on the IOP.
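
As a rough illustration of that conversion, the sketch below packs a
connect() request into a binary message on the host side and unpacks
it on the IOP side. The article does not give the socket class
message format, so the header layout, the FN_CONNECT function code
and both function names are invented for this example.

```python
import socket
import struct

# Invented layout for illustration only; this is not the real I2O
# socket class format.
# Header: function code (2 bytes), request id (4 bytes),
# payload length (2 bytes), all big-endian.
HEADER = struct.Struct("!HIH")
FN_CONNECT = 1  # hypothetical function code


def encode_connect(request_id: int, host: str, port: int) -> bytes:
    """Host (OSM) side: pack a connect() request into one message."""
    payload = socket.inet_aton(host) + struct.pack("!H", port)
    return HEADER.pack(FN_CONNECT, request_id, len(payload)) + payload


def decode_message(frame: bytes) -> dict:
    """IOP (ISM) side: unpack the message before handing it to the stack."""
    function, request_id, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    if function != FN_CONNECT:
        raise ValueError(f"unknown function code {function}")
    address = socket.inet_ntoa(payload[:4])
    (port,) = struct.unpack("!H", payload[4:6])
    return {"function": "connect", "request_id": request_id,
            "host": address, "port": port}
```

The request id lets the host match a later reply to the socket call
that is waiting for it, which a message-based interface needs
because replies arrive asynchronously.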

Hi,
thanks for this useful article.
I'm a Linux user and I'm looking at the network device drivers.
I've seen that a few NICs support TCP segmentation offload (as a hardware function), UFO, etc.
But it's not clear to me whether GSO (Generic Segmentation Offload) can actually improve network performance.
What do you think about GSO performance?
Many thanks.