Symmetric RSS (Receive-Side Scaling)

Receive-Side Scaling (RSS) is a technique, supported by hardware NIC vendors, for load-balancing traffic flows to different cores, based on up to 5 tuple fields. Keeping each flow on one core helps locality of reference and cache coherency, which improves performance.

Figure: RSS (taken from Microsoft)

The most commonly used RSS algorithm, and the one suggested by Microsoft, is the Toeplitz hash. It takes two inputs: (1) a key, and (2) the input fields (tuples) taken from the packet. It outputs a 32-bit hash value, which is used to determine which hardware queue the packet will be delivered to. The relevant core can then poll its associated queue for its data, which might even be application specific. The input is selectable as 2, 3, 4 or 5 tuples, and supports both IPv4 and IPv6. The drawback is that Microsoft’s recommended key does not allow the two directions of a flow to be load-balanced to the same core. To understand why, we need to know how the Toeplitz hash algorithm works.
This is the pseudocode for the hash calculation:

ComputeHash(input[], n)
    result = 0
    For each bit b in input[] from left to right
    {
        if (b == 1) result ^= (left-most 32 bits of K)
        shift K left 1 bit position
    }
    return result
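
As a rough illustration, the pseudocode above can be sketched in Python as follows. The names toeplitz_hash, ipv4_four_tuple and EXAMPLE_KEY are made up for this sketch, and the key is an arbitrary 40-byte value chosen only so the code runs; it is not a recommended key. Instead of shifting K in place, the sketch reads the 32-bit window that K would expose after i shifts.

```python
import ipaddress
import struct


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Return the 32-bit Toeplitz hash of data under key."""
    key_bits = len(key) * 8
    data_bits = len(data) * 8
    # The last input bit still needs a full 32-bit window of key material.
    if key_bits < data_bits + 31:
        raise ValueError("key is too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(data_bits):
        bit_is_set = data[i // 8] & (0x80 >> (i % 8))
        if bit_is_set:
            # Same as "left-most 32 bits of K" after K was shifted left i times.
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def ipv4_four_tuple(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Pack [src IP][dst IP][src port][dst port] in network byte order."""
    return (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + struct.pack("!HH", src_port, dst_port)
    )


# Arbitrary 40-byte key, used only for illustration (an assumption of this sketch).
EXAMPLE_KEY = bytes(range(1, 41))

flow = ipv4_four_tuple("1.1.1.1", "2.2.2.2", 22, 55)
print(f"hash = {toeplitz_hash(EXAMPLE_KEY, flow):#010x}")
```

An input with only its first bit set hashes to the first 32 bits of the key (0x01020304 for this example key), which is a quick sanity check of the bit ordering.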

As you can see, the current left-most 32 bits of the key are XORed into the result whenever the input has a “1” bit, so the contribution of each input bit depends only on its position. Let’s assume we have a frame with IP source 1.1.1.1, IP destination 2.2.2.2, and UDP port 22 to UDP port 55. This means that the input to the hash function for the 4 tuples will be [1.1.1.1][2.2.2.2][22][55], and for the opposite direction [2.2.2.2][1.1.1.1][55][22]. Swapping the direction moves each address by 32 bit positions and each port by 16. To get the same hash value for these two inputs, the first 32 bits of the key need to be identical to the second 32 bits, and the 16 bits after that need to be identical to the next 16 bits.
The problem with these key requirements is that they weaken the key, so that we can get a lot of collisions, leading to a badly distributed load balance.
Luckily I ran into this paper.

The key it proposes yields similar distribution performance. This new, fully symmetric key meets both requirements: the first two 32-bit values are identical (first row), and so are the next two 16-bit values (second half of the row).
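
To make the symmetry concrete, here is a small sketch that hashes the example flow from earlier (1.1.1.1 to 2.2.2.2, port 22 to port 55) in both directions. It assumes a key built by repeating a single 16-bit pattern, which satisfies both requirements by construction; the value 0x6d5a and the names SYMMETRIC_KEY and ARBITRARY_KEY are illustrative choices for this sketch, not something stated in the text.

```python
import ipaddress
import struct


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: XOR a sliding 32-bit key window for every set input bit."""
    key_bits = len(key) * 8
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def four_tuple(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Pack [src IP][dst IP][src port][dst port] in network byte order."""
    return (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + struct.pack("!HH", src_port, dst_port)
    )


# Assumption: a 40-byte key made of one repeated 16-bit pattern.
SYMMETRIC_KEY = bytes([0x6D, 0x5A]) * 20
# Arbitrary non-repeating key, for contrast only.
ARBITRARY_KEY = bytes(range(1, 41))

forward = four_tuple("1.1.1.1", "2.2.2.2", 22, 55)
reverse = four_tuple("2.2.2.2", "1.1.1.1", 55, 22)

hash_fwd = toeplitz_hash(SYMMETRIC_KEY, forward)
hash_rev = toeplitz_hash(SYMMETRIC_KEY, reverse)
print(f"symmetric key: {hash_fwd:#010x} vs {hash_rev:#010x}")

# An arbitrary key gives no guarantee that the two directions match.
print(
    f"arbitrary key: {toeplitz_hash(ARBITRARY_KEY, forward):#010x} "
    f"vs {toeplitz_hash(ARBITRARY_KEY, reverse):#010x}"
)
```

With a 16-bit repeating key, the key window seen by an input bit is the same 16 or 32 positions later, so every bit contributes the same value in either direction and the two hashes are equal.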
With this key configured on the NIC, both directions of a TCP connection are load-balanced to the same core. This is useful for applications such as IPS, DPI, security and data-analytics algorithms, which need to process the opposite direction of the traffic as well.
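
How the key is actually loaded depends on the NIC and its driver. As one possible sketch for Linux, the snippet below only formats a key as the colon-separated byte string accepted by the hkey argument of ethtool -X and prints the resulting command; it does not run it. The interface name eth0, the 40-byte key length and the 0x6d5a pattern are assumptions made for this example, and a given driver may require a different key size or not allow changing the key at all.

```python
def format_hkey(key: bytes) -> str:
    """Format a key as colon-separated hex bytes, e.g. '6d:5a:6d:5a'."""
    return ":".join(f"{byte:02x}" for byte in key)


# Assumption: 40-byte key made of one repeated 16-bit pattern.
symmetric_key = bytes([0x6D, 0x5A]) * 20

# Hypothetical interface name; replace with the real device.
interface = "eth0"

command = ["ethtool", "-X", interface, "hkey", format_hkey(symmetric_key)]
print(" ".join(command))
```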
The Toeplitz hash is widely used, and there are plenty of benchmarks on the web that explore its collision cases. Usually, the more random the traffic is (which usually applies to high-rate links), the more “uniform” the distribution is, and the less there is to worry about.
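
A quick way to get a feel for this is to hash randomly generated flows and count how many land on each queue. The sketch below is only a toy experiment under several assumptions: a 16-bit repeating key (0x6d5a, chosen for illustration), four queues, uniformly random addresses and ports, and a plain modulo to pick the queue, which simplifies the lookup a real NIC performs.

```python
import random
import struct
from collections import Counter


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: XOR a sliding 32-bit key window for every set input bit."""
    key_bits = len(key) * 8
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


KEY = bytes([0x6D, 0x5A]) * 20   # assumed symmetric key
NUM_QUEUES = 4                   # hypothetical number of RX queues
NUM_FLOWS = 2000


def queue_for(data: bytes) -> int:
    # Simplification: a plain modulo stands in for the NIC's queue selection.
    return toeplitz_hash(KEY, data) % NUM_QUEUES


rng = random.Random(1)  # fixed seed so repeated runs use the same flows
counts = Counter()
for _ in range(NUM_FLOWS):
    src, dst = rng.getrandbits(32), rng.getrandbits(32)
    sport, dport = rng.getrandbits(16), rng.getrandbits(16)
    forward = struct.pack("!IIHH", src, dst, sport, dport)
    reverse = struct.pack("!IIHH", dst, src, dport, sport)
    queue = queue_for(forward)
    # Both directions of a flow must land on the same queue.
    assert queue == queue_for(reverse)
    counts[queue] += 1

for queue in sorted(counts):
    print(f"queue {queue}: {counts[queue]} flows")
```

Real traffic is rarely this random, so the counts from such an experiment are an optimistic picture; replaying addresses and ports from an actual capture would be a more telling test.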