Tuesday, November 27, 2012

Slow Network Access Within Virtual Machines - Broadcom and Hyper-V

Time to update this dusty old thing! I figured this is worthy of a blog entry, as it's not exactly something you can google quickly at the time of this writing.

Let's talk about network speeds with Hyper-V. I have been testing Server 2008 R2 with Hyper-V enabled, running a virtual Win7 guest, and for the life of me couldn't figure out why a member server hosting file shares was getting ping response times of 30 to over 200 ms. My research pulled up a dozen or so similar reports of network slowness, and everyone seemed to attribute them to TCP chimney offloading, claiming that disabling it in the operating system as well as on the physical and virtual network adapters would correct the problem instantly.

Well, it didn't. It turns out, after playing with several of the settings within the Broadcom network adapter and recreating the virtual network for the VMs half a dozen times, that the real culprit is a feature called 'Virtual Machine Queues'. If any of you out there are running into issues with your VMs' network performance, try disabling this (along with the TCP offloading, or chimney offloading) and see how that works in your case. It made an immediate difference in mine, and my response time is back to <1 ms as usual, the same as any physical box on the network.
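For reference, here's roughly what that looks like from an elevated prompt. Treat it as a sketch, not a script: the adapter name is a placeholder for your own Broadcom NIC, and the `Disable-NetAdapterVmq` cmdlet only exists on Server 2012 and later — on 2008 R2 you flip the 'Virtual Machine Queues' setting on the Advanced tab of the adapter's properties in Device Manager instead.

```bat
:: Disable TCP chimney offloading at the OS level (2008 R2 and later)
netsh int tcp set global chimney=disabled

:: On Server 2012+, VMQ can be toggled per adapter from PowerShell.
:: "Broadcom NIC 1" is a placeholder -- run Get-NetAdapter to find yours.
powershell -Command "Disable-NetAdapterVmq -Name 'Broadcom NIC 1'"
```

You'll want to do this on the physical NIC(s) bound to the Hyper-V virtual switch; a brief drop in connectivity while the adapter resets is normal.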

35 comments:

Dude! Nice piece of research and a good, concise write-up. Thank you for standing out among the posts about TCP Offloading, Receive Side Scaling, etc.

I think the show stopper on VMQ is that, according to MS's article on the feature (http://technet.microsoft.com/en-us/library/gg162681(v=ws.10).aspx), "To use this feature, VMQ must be supported by the network hardware." So, it sounds like your switches and possibly other network equipment would have to support VMQ in order for it to work.

I know that in my case, as soon as I disabled VMQ on the host NIC (dedicated) to which guest VM NICs were connected, throughput increased by a factor of 10 and latency fell from as high as 30ms (and quite variable) to <1ms and consistent.

Just followed your advice, and I don't think I'm going to have to shoot myself now. As soon as I disabled VMQ, my network sped up about 10 times and latency decreased tremendously. I have two NICs, one dedicated to the host and the other to my VMs. I disabled VMQ on both. Any thoughts on whether it was necessary on the NIC dedicated to the host?

Was having the problem on a new Dell T520 with 4 Broadcom NICs. Network performance was horrible, and ping results were wildly unpredictable. Disabled VMQ on the NICs and saw an immediate improvement; ping results are <1 ms. Thanks.

I am setting up a new Dell R520 server running Hyper-V and Windows Server 2012. The server's Broadcom gigabit adapters (2 LOM and 2 on a PCI card) are connected to a gigabit LAN, but network throughput to the VMs via the Hyper-V virtual switch was dismal, averaging less than 10 Kbps. Virtually unusable, pun intended. (I felt like it was 1980 all over again, and I was downloading a file from Compuserve with my brand new 2400 baud modem.)

Disabling "Virtual Machine Queues" under the ADVANCED tab of the network adapter properties in Device Manager solved the problem immediately. I even went back and toggled the VMQ setting from disabled to enabled several times while monitoring the network throughput to one of the VMs, and it was simply amazing to watch how profoundly it was affecting throughput. Makes me think that Broadcom should change the default setting for VMQ in their NIC driver to DISABLED.

I am so glad I Googled the issue and found this solution. Being new to Hyper-V, I was completely lost as to why the virtual machines were experiencing such poor network performance!
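For anyone who wants to watch the effect while toggling VMQ, a quick connect-latency probe like this works as a rough stand-in for ping (this is my own sketch, not from the post; the host and port below are placeholders for one of your guest VMs):

```python
import socket
import statistics
import time

def tcp_connect_latency_ms(host, port, samples=5):
    """Median TCP connect time in milliseconds -- a rough stand-in
    for ping when ICMP is blocked or you want an app-level number."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        # Each connect is opened and immediately closed; only the
        # handshake time is measured.
        with socket.create_connection((host, port), timeout=2):
            pass
        times.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(times)

# Example (placeholder address): probe a guest VM's SMB port.
# print(tcp_connect_latency_ms("192.168.1.50", 445))
```

Run it before and after disabling VMQ and compare the numbers; on an affected host the difference should be obvious.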

I just resolved this same issue using your instructions on a Hyper-V machine with many guests. I switched my vSwitch from External to Internal, changed my adapter settings as described, and then switched the vSwitch back to External on the same NIC. Worked like a charm, even on fully up systems, with only a 3-minute outage. Thanks!

Awesome post. My Server 2012 VMs were really slow running on a PowerEdge. It took six minutes to log into them via Remote Desktop. I turned off VMQ for the physical NICs as well as the VMs, and login time was cut down to about six seconds.

The real issue here is duplication of every packet running through the VM switch. Wireshark captures and pings under Unix will show DUP packets.

Funny thing was, I was unable to reproduce the issue with 1 Gb NICs, but it was constant with 10 Gb Broadcom NICs. It's clearly a bug, and disabling VMQ on VMs doing a lot of network I/O will spike CPU. The feature is needed, but a fix is needed more.
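If you want to confirm the duplicate-packet symptom from a ping log, you can tally repeated icmp_seq numbers in the output (Unix ping flags these lines with DUP!). A quick sketch of my own — it only handles the standard Linux ping line format:

```python
import re
from collections import Counter

def count_duplicate_replies(ping_output):
    """Count extra ICMP echo replies in Unix `ping` output by tallying
    repeated icmp_seq values (the replies ping flags with DUP!)."""
    seqs = re.findall(r"icmp_seq=(\d+)", ping_output)
    counts = Counter(seqs)
    # Each sequence number seen more than once contributes its extras.
    return sum(n - 1 for n in counts.values() if n > 1)
```

Pipe a minute of `ping` output through this; a nonzero count on traffic through the virtual switch points at the same duplication.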

No one has yet asked why Broadcom or Microsoft hasn't addressed this issue. Obviously many people have wasted their time on it. I spent a solid few hours trying to adjust settings, then another few hours searching online before I finally found a related post. It's sad that no one actually FIXES the issue. Broadcom's driver should obviously default to disabling that feature. Shame on them for wasting all our time.

I have multiple WS 2012 Hyper-V hosts, and most of the guests have run OK -- because I used the built-in NIC driver, not Broadcom's. The one with the Broadcom driver had VMQ enabled, and even though it was disabled in Hyper-V, it still caused slowness. With VMQ enabled in both Hyper-V and on the NIC, it ran better. But it ran best with VMQ disabled on both.


THANK YOU! You've saved what was left of my hair. Brand new Dell PowerEdge T420 hardware, 4 Broadcom physical NICs, two guest VMs (each attached to a physical NIC through an External Virtual Switch), brand new out-of-the-box installs of Server 2012 R2 on the host and 2012 R2 guest VMs, and I was getting 200+ ms latency and about 3 MB/s throughput. The second I disabled Virtual Machine Queues in the host adapter Advanced settings, latency dropped to 8 ms (internet pings) and throughput is maximized now. I was "this close" to reformatting everything and starting over!