Multicast deploy terribly slow, with a huge re-xmits percentage

I’m trying to deploy a Linux image to about 100 workstations. For testing purposes I’ve tried with only one classroom (16 workstations).

With these 16 workstations, deploying in multicast was terribly slow (between 20 and 50 MB/min), and in the udpcast log I see a lot of timeouts with a really high re-xmits percentage (about 230%), like this:

There is a certain amount of processing power involved with replicating a packet to all ports - and cheap switches just don’t cut it.

That’s the real problem: these aren’t cheap switches (they were really expensive when we bought them many years ago). They have an internal bandwidth of 48.8 Gb/s (for the x250e) and 128 Gb/s (for the x450e). So no problem on that side.

After some research and many, many tests, I found a problem with workstations running at 100Mb. I think it’s probably a bug (or something that needs tweaking) in the workstation’s network driver (kernel 4.17). Let me explain:

First of all: to avoid congestion on the switch, I set a max bitrate of 80Mb in the storage configuration.
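To put that cap in perspective, here is a quick unit conversion. It is only a sketch: it assumes the 80Mb setting means 80 megabits per second, uses decimal units (1 MB = 10^6 bytes), takes the 10 GB image size from the tests as an example, and ignores re-xmits. The function names are made up for illustration.

```python
def mbit_per_s_to_mbyte_per_min(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per minute (decimal units)."""
    return mbit_per_s / 8 * 60


def minutes_to_send(image_mbyte: float, rate_mbyte_per_min: float) -> float:
    """Minutes needed to push an image once at a steady rate, ignoring re-xmits."""
    return image_mbyte / rate_mbyte_per_min


if __name__ == "__main__":
    cap = mbit_per_s_to_mbyte_per_min(80)  # assumed meaning of the 80Mb cap
    image = 10_000                         # a 10 GB image, expressed in MB
    print(f"cap: {cap:.0f} MB/min, {minutes_to_send(image, cap):.1f} min for 10 GB")
    for observed in (20, 50):              # the multicast speeds actually observed
        print(f"{observed} MB/min: {minutes_to_send(image, observed):.0f} min for 10 GB")
```

Under those assumptions the cap allows roughly 600 MB/min, so the observed 20 to 50 MB/min is more than ten times below what the cap itself would permit.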

First test: workstations and server on an x250e (workstations on 100Mb ports, server on a 1Gb port). Result: many packets are dropped (about 1 million for a 10 GB image) and about 50% re-xmits.

Second test: workstations and server on an x450e, all on 1Gb ports (auto-neg). Result: no drops at all and 0% re-xmits.

Third test: workstations and server on an x450e, all on 1Gb ports, but with all the workstations’ ports forced to 100Mb/full duplex. Result: same as the first test.

Conclusion: the problem is not the switches; they can easily handle this load. So I think there’s a problem on the client side, but I have no idea what it is.
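One way to narrow down the client side would be to compare the kernel's per-interface receive counters before and after a multicast session. The following is a minimal sketch, assuming a Linux client that exposes the standard `/sys/class/net/<iface>/statistics` files and has Python available (the imaging environment itself may not). The interface name `eth0` is a placeholder, and which counter a given driver increments on a drop varies.

```python
from pathlib import Path

# Standard Linux sysfs counter names; a driver may not update all of them.
COUNTERS = ("rx_packets", "rx_dropped", "rx_errors",
            "rx_missed_errors", "rx_fifo_errors")


def read_rx_counters(iface: str, base: str = "/sys/class/net") -> dict:
    """Return the receive counters that exist for iface as {name: int}."""
    stats_dir = Path(base) / iface / "statistics"
    counters = {}
    for name in COUNTERS:
        path = stats_dir / name
        if path.is_file():
            counters[name] = int(path.read_text().strip())
    return counters


def counter_delta(before: dict, after: dict) -> dict:
    """Per-counter increase between two snapshots."""
    return {name: after[name] - before.get(name, 0) for name in after}
```

Taking one snapshot with `read_rx_counters("eth0")` before the deploy and one after, then passing both to `counter_delta`, shows whether the drops are being counted on the workstation's NIC at all or are happening before the packets reach it.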

There is a certain amount of processing power involved with replicating a packet to all ports - and cheap switches just don’t cut it.

There’s also maximum total throughput to consider. For example, at home I have a consumer grade Cisco Small business switch. It’s 1Gbps on each port and has 5 ports. But total internal throughput is 3Gbps. That means that I would never be able to multicast at home using that switch at 1Gbps speeds for more than 2 computers at a time. However I have a new 8 port 1Gbps z-link switch from China (for 28 bucks new) that has internal throughput of 5Gbps. Meaning that device would be able to multicast to 4 computers at once with 1Gbps speed to each.
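The arithmetic in that paragraph can be written down as a tiny model. This is only a sketch of the reasoning above, assuming the sender's port and each receiver's port each take one port-speed share of the total internal throughput. Real switch fabrics are specified differently by each vendor, and the function name is made up for illustration.

```python
def max_multicast_receivers(total_gbps: float, port_gbps: float = 1.0) -> int:
    """Receivers that can run at full port speed under the simple model:
    (1 sender + N receivers) * port speed <= total internal throughput."""
    ports_at_full_speed = int(total_gbps // port_gbps)
    # One of those ports is used by the sender; never return a negative count.
    return max(ports_at_full_speed - 1, 0)


if __name__ == "__main__":
    print(max_multicast_receivers(3))  # 5-port switch, 3 Gbps total -> 2
    print(max_multicast_receivers(5))  # 8-port switch, 5 Gbps total -> 4
```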

Again, cheap equipment just doesn’t cut it when it comes to multicast and really needing every port to operate at its maximum speed. The higher-end Cisco equipment usually doesn’t have a problem with this, though; it has the horsepower and typically has very high total internal throughput.

Just for information, the workstations are DELL Optiplex 7010s connected to an Extreme Networks switch (100Mb ports), and the FOG server is hosted on a DELL PowerEdge R420 server connected to the same stack (1Gb port).