Hi,
On Nov 2, 2011, at 2:37 PM, Seth Hall wrote:
>> Some NICs seem to use a per-flow scheme for distributing traffic onto multiple queues. This can lead to problems if you use such NICs for distributing traffic to multiple Bro instances: Client and server traffic of a single TCP connection might be forwarded to different worker nodes.
>> The fairly common RSS feature in NICs does this sort of round-robin packet distribution across queues, but Intel's newer Flow Director feature on their high-end NICs does flow-based load balancing across the queues, so you actually get client and server traffic in the same queue. On Linux, the only way I know to take advantage of that is with TNAPI (from Luca Deri and the other NTOP guys).
We use an Intel card (82598-based PCI-E 10 GbE) that supports flow-based distribution across queues (RSS). And flow-based on this chipset really means uni-directional flow-based: each flow seems to be hashed with a function of the form
H(srcIP, srcPort, dstIP, dstPort, proto)
instead of something commutative like
H(srcIP + srcPort + dstIP + dstPort + proto)
which would be bi-directional: the sum is identical for both directions of a connection, so both would hash to the same queue.
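To make the difference concrete, here is a minimal Python sketch. CRC32 stands in for the NIC's actual hash (RSS hardware typically uses a Toeplitz hash, which this does not reproduce), and the queue count and field encoding are illustrative assumptions:

```python
from zlib import crc32

NUM_QUEUES = 8  # assumed queue count, matching our 8-RX-queue setup

def directional_hash(src_ip, src_port, dst_ip, dst_port, proto):
    # Field order matters: swapping src and dst changes the key,
    # so the two directions of one connection can land on
    # different queues.
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    return crc32(key) % NUM_QUEUES

def symmetric_hash(src_ip, src_port, dst_ip, dst_port, proto):
    # Canonicalizing the endpoint order makes the hash commutative,
    # so both directions of a connection map to the same queue.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a}|{b}|{proto}".encode()
    return crc32(key) % NUM_QUEUES

client = ("10.0.0.1", 49152, "192.168.1.1", 80, 6)  # client -> server
server = ("192.168.1.1", 80, "10.0.0.1", 49152, 6)  # server -> client

# The symmetric variant always keeps both directions together;
# the directional variant generally does not.
assert symmetric_hash(*client) == symmetric_hash(*server)
```

A bi-directional scheme like the second function is what you would want for monitoring, since a worker then sees both halves of every connection it is assigned.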
I just checked one of our setups that employs 8 RX queues with PF_RING + TNAPI: it distributes the client- and server-side traffic of several connections onto different queues.
Since I don't have an 82599-based card, I don't know whether they changed this behavior. However, from a NIC engineer's point of view, it seems perfectly fine not to map bi-flows onto a single queue, but to split the directions of the traffic on a per-flow basis:
If you operate a NIC in a server for normal network communication rather than for traffic monitoring (which is probably the primary use case for NICs), incoming client traffic sits on the RX queues and outgoing server traffic on the TX queues. So both directions can be hashed independently.
Anyways, the original point I was trying to make is: if you rely directly on the hardware features of current NICs, you might run into hardware-specific issues, such as different chipsets (or, even worse, different chipset revisions) that exhibit different behavior.
> Lately I've been very impressed with Myricom's sniffer drivers which do the hardware based load balancing and direct memory injection.
Is this similar to the DNA driver that was developed by Luca Deri?
>> We therefore use software load-balancing for setups with multiple Bro worker nodes on a single machine.
>>> How are you doing this? PF_RING also does software-based load balancing in the kernel, but it's actually slightly wrong because it includes the VLAN ID as one of the tuple fields it balances on, which can cause problems for networks where each direction of the traffic is in a different VLAN. I filed a ticket with them to make the load balancing configurable, though, so hopefully that will be fixed in the next release of PF_RING.
Yes, we use this PF_RING feature for load balancing. I haven't run into the VLAN problem you describe, since none of our PF_RING setups employ multiple VLANs.
But yes, this behavior is definitely a bug and should be fixed.
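For what it's worth, the effect of folding the VLAN ID into the balancing key can be sketched as follows. This is hypothetical Python with CRC32 in place of PF_RING's actual hash (which it does not reproduce); the point is only the structural issue:

```python
from zlib import crc32

NUM_QUEUES = 8  # assumed number of balancing targets

def balance(src_ip, src_port, dst_ip, dst_port, proto, vlan_id=None):
    # Symmetric 5-tuple key: canonical endpoint order keeps both
    # directions of a connection together.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a}|{b}|{proto}"
    if vlan_id is not None:
        key += f"|{vlan_id}"  # the problematic extra tuple field
    return crc32(key.encode()) % NUM_QUEUES

c2s = ("10.0.0.1", 49152, "192.168.1.1", 80, 6)  # client -> server
s2c = ("192.168.1.1", 80, "10.0.0.1", 49152, 6)  # server -> client

# Without the VLAN ID, both directions map to the same target:
assert balance(*c2s) == balance(*s2c)
# If each direction is carried on a different VLAN (say 100 vs. 200),
# the keys differ, so the two directions can end up on
# different targets:
q_c2s = balance(*c2s, vlan_id=100)
q_s2c = balance(*s2c, vlan_id=200)
```

So even a perfectly symmetric 5-tuple hash loses its bi-directionality as soon as a per-direction field like the VLAN ID enters the key.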
Best regards,
Lothar
--
Lothar Braun
Chair for Network Architectures and Services (I8)
Department of Informatics
Technische Universität München
Boltzmannstr. 3, 85748 Garching bei München, Germany
Phone: +49 89 289-18010 Fax: +49 89 289-18033
E-mail: braun at net.in.tum.de