Tuning on-chip 10GbE (NIU) on T5120/T5220

On Solaris 10 releases before Update 8, add set ip:ip_soft_rings_cnt=16 to /etc/system so that Solaris uses 16 soft rings per 10GbE port. If you don't want to reboot the system, you can use ndd to change the number of soft rings instead:

ndd -set /dev/ip ip_soft_rings_cnt 16

and re-plumb the nxge interface for the tuning to take effect. Solaris 10 Update 8 or later uses 16 soft rings for 10GbE by default, so this tuning is no longer needed there.
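
On releases that still need the ndd tuning, re-plumbing might look roughly like the following; nxge0 and the IP address are placeholders for your own interface and addressing:

ifconfig nxge0 unplumb
ifconfig nxge0 plumb 192.168.10.1 netmask 255.255.255.0 up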

Soft rings are kernel threads that offload the processing of received packets from the interrupt CPU, preventing the interrupt CPU from becoming the bottleneck. The trade-off is added latency from handing packets off from the interrupt thread to a soft ring thread. If your workload is latency sensitive, you may want to see whether turning off soft rings helps meet your latency needs while still delivering the required packet rate or throughput.
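
One way to experiment with soft rings disabled is the ip_squeue_soft_ring tunable; treat the name and its default as assumptions to verify against the Solaris Tunable Parameters Reference Manual for your release:

set ip:ip_squeue_soft_ring=0

in /etc/system, followed by a reboot.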

Soft rings can be used with any GLDv3 network driver, such as nxge, e1000g, or bge. A little-known trick is that soft rings can be configured on a per-port basis, so you can, for example, configure the NIU with 16 soft rings and the on-board 1GbE with 2 soft rings. To continue the example:
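
The following is a sketch of one way to do it, assuming the ndd value is read each time an interface is plumbed; interface names and addresses are placeholders for your own configuration.

ndd -set /dev/ip ip_soft_rings_cnt 16
ifconfig nxge0 plumb 192.168.10.1 netmask 255.255.255.0 up

ndd -set /dev/ip ip_soft_rings_cnt 2
ifconfig e1000g0 plumb 192.168.20.1 netmask 255.255.255.0 up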

Tuning Sun Multi-threaded 10GbE PCI-E NIC on T5120/T5220

If your T5120/T5220 has a Sun Multi-threaded 10GbE PCI Express NIC plugged in, two /etc/system tunables are recommended for throughput:

set ip:ip_soft_rings_cnt=16
set ddi_msix_alloc_limit=8

ddi_msix_alloc_limit is a system-wide limit on how many MSI (Message Signaled Interrupts) or MSI-X interrupts a PCI device can allocate. The default allows at most 2 MSIs per device. Since each 10GbE port of the Sun Multi-threaded 10GbE PCI-E NIC has 8 receive DMA channels and each channel can generate one interrupt, a port can generate up to 8 interrupts. To avoid the interrupt CPU becoming the performance bottleneck, it is recommended to set ddi_msix_alloc_limit to 8 so that network receive interrupts can target 8 different CPUs. This tunable will become unnecessary with a patch to Solaris 10 Update 4.
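
To confirm the value on a running system, you can read the kernel variable with mdb; the command below assumes ddi_msix_alloc_limit is an integer kernel global, and it should report 8 after a reboot with the tunable in place:

echo ddi_msix_alloc_limit/D | mdb -k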

Which CPUs are taking interrupts?

If your application threads are pinned too often by interrupts and this becomes a problem, you can create a processor set to dedicate those CPUs to interrupt processing.
To find out which CPUs are taking interrupts, you can use intrstat. On the system used here, intrstat showed CPUs 27-34 taking interrupts from the NIU (niumx is the name of the nexus driver for the NIU).
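
A sketch of how you might check and then fence off those CPUs; the CPU IDs are placeholders taken from the 27-34 example above, so substitute whatever intrstat reports on your system:

intrstat 10
psrset -c 27 28 29 30 31 32 33 34

The processor set keeps unbound application threads off those CPUs, leaving them free to handle the NIU interrupts.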

The same driver, nxge, is used for both the NIU and the Sun Multi-threaded 10GbE PCI-E NIC. If you have both installed, you can tell which instances are the NIU by looking in /etc/path_to_inst: the instances whose device path starts with /niu are the NIU.
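
For illustration only, on a system with both, the nxge entries in /etc/path_to_inst might look something like the lines below; the exact device paths vary by system and PCI-E slot:

"/niu@80/network@0" 0 "nxge"
"/niu@80/network@1" 1 "nxge"
"/pci@0/pci@0/pci@8/pci@0/network@0" 2 "nxge"
"/pci@0/pci@0/pci@8/pci@0/network@1" 3 "nxge"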

In the example above, nxge0 and nxge1 are the NIU ports, and nxge2 and nxge3 are the PCI-E NIC ports.

You may wonder: is there a performance difference between the NIU and the PCI-E NIC? The short answer is that the NIU wins in all micro-benchmarks except one: a single 10GbE port transmitting small UDP packets.

The Sun Multi-threaded 10GbE PCI-E NIC can transmit an impressive 2.1 million 64-byte UDP packets per second out of one port, 50% more than the NIU (1.4 million pps), because it has 50% more transmit DMA channels per port than the NIU (12 vs. 8). So if your workload is mostly sending small UDP packets, the Sun Multi-threaded 10GbE PCI-E NIC may deliver a higher packet rate than the NIU on UltraSPARC T2 systems.

In all other scenarios, the NIU gives higher throughput, or lower CPU utilization at similar throughput. In a two-port 10GbE throughput test, the two NIU ports achieve an impressive 14.6 Gbits/s on TCP transmit, or 18.2 Gbits/s on TCP receive, using 8-Kbyte messages and 145 connections on a 1.4 GHz T5120. For TCP transmit, the CPU efficiency (measured in Gbps/GHz) of the NIU is 23% higher than the PCI-E NIC at maximum throughput with 2 ports, and 46% higher at maximum throughput with 1 port. This clearly demonstrates the performance advantage of integrating 10GbE on-chip.