HP Critical Advisory – NC522 and NC523 10Gb/s Server Adapters

HP has issued a critical customer advisory regarding some of its high performance server adapters. I've had a number of customers impacted by the NC522 and NC523 10Gb/s server adapters losing connectivity. It's good to see there is now a firmware update that may solve this.

IMPORTANT: The network adapter firmware and driver upgrades provided in the Resolution are required to prevent the loss and recovery of Ethernet connectivity, or adapter unresponsiveness requiring a reboot to recover, from occurring. HP recommends performing these upgrades at the customer's earliest possible convenience. Neglecting to perform the recommended action and not performing the recommended resolution could result in the potential for subsequent errors to occur.

The network adapters listed in the Scope section (below) may encounter either of the following:

* The adapter may temporarily lose Ethernet connectivity, and then automatically recover.

OR

* The adapter may stop responding, requiring a server reboot to recover the operation of the adapter.

Note: There is a low probability of this occurring when operating under a normal network workload.

Driver and Firmware locations and instructions can be found at the following location:

About the Author

Michael is Technical Director, Business Critical Applications Engineering at Nutanix. He has been using VMware products since 1998 and has been deploying ESX solutions since 2002. He specializes in designing virtualization solutions for Unix to Linux migrations, business critical applications, disaster avoidance, and mergers and acquisitions. Michael has been in the IT industry since 1995 and consulting since 2001. Michael is Nutanix Platform Expert (NPX) #007. In addition to VMware Certified Design Expert (VCDX) he holds VCP, VCP-Cloud, VCAP-DCD, VCAP-DCA, VCAP-CID, VCAP-CIA, ITIL Foundation, MCP I, and MCSE (NT4 – 2K3).

[…] recently had a situation at a customer site where a physical NIC failure (Read the Critical Advisory I blogged about) caused an All Paths Down (APD) and management network failure. This required a […]

This "Fix" was a no go for us. We have 20 servers with two 10Gb NC523SFPs each and we continue to have an issue with random NIC flapping. One of the two NICs will randomly lose connection to the switch for about 2 seconds, long enough to trigger a "network redundancy lost" alarm. We are running DL380 G7s and we have two separate datacenters with 10 ESX hosts in each one. We even tried putting two NC522SFPs in a server, and when it "flapped" the only way I could get the network connection to stop bouncing was to reboot the server.

I can believe that. I know a number of customers still have ongoing problems with their NC522 and NC523 NICs and are still experiencing some disconnections, although much less frequently and less severely on the whole. The real solution would be to change to NICs that don't have these problems. I use the Intel X520 dual port cards and have not had any issues. But in a lot of cases it may not be possible to change. I was made aware recently of another firmware revision for these cards, though I'm not sure about the driver. So it's even more important with these cards to keep up to date with firmware and drivers, but really these problems should not happen with equipment that has been tested and certified to work.

I tried going to support.hp.com, but the drivers are not accessible that way. Instead, go to http://www.hp.com and, in the search box at the top right of the page, enter the card you are looking for. A list where you can select your operating system should come up. Click "vmware esx/esxi 4.1". Download the HP Qlogic P2P Flash Update Kit. You have to create a CD/DVD to boot from, and it will patch the firmware from that.

I don't know if this works, but it is the direct link for the NC523SFP card.

Hi Simon, the customers I've helped with issues related to this advisory got the firmware and drivers directly from HP support (and had HP techs update them). It is a bit of a process to go through when doing the update, though, as you have to boot from CD or USB stick. HP definitely do support the servers if the model is one they have certified and it is on the VMware HCL. Your colleagues in NZ IT experienced exactly the same issue and I helped them resolve it.

Hi Simon, just so you are aware, there are a number of other BIOS settings that are recommended in addition to the drivers and firmware. It's recommended that you configure Static High Performance for power management and Enhanced Cooling. One of the main reasons for this is that the cards overheat under load, which is one of the causes of the problems. With these settings and the new drivers and firmware the symptoms should be greatly reduced, although I'm not 100% sure it's a complete fix.

HP techs have confirmed to me that for some customers the firmware and drivers fixed the flapping, but for others it continues. Our solution was to replace our NC523SFP cards with Intel X520-DA2 cards. We have had no more issues and are now able to move on with our migration plans.

HP does have some techs configuring firmware dumps for the network cards to try to capture what is causing the problem, but we had already replaced the NICs when HP sent this to me. When calling them, make sure you ask about this.

Many thanks for all your help. We have run the SPP against all our hosts now (and updated the 'VMWare' NIC driver) and so far… have not experienced the same issue. We reviewed the BIOS settings previously when we had PSOD crashes when the hosts first went in, but that was a separate issue that was resolved by a BIOS update.

Anyway hopefully the latest issue is resolved (fingers crossed!)…

I notice HP haven't updated the advisory I linked to above, which still suggests people go to Qlogic for the firmware. That is strange considering that the Qlogic firmware hangs on these servers (our 6 core CPU builds anyway), and the HP SPP is bootable and does update the firmware successfully.

We spent 2 long months trying to get these adapters to work on our DL580 G7 servers with VMware ESXi 4.1 U1 and had constant issues, many of the same issues Simon was reporting. We use NFS and iSCSI datastores on a NetApp with jumbo frames and Nexus 5Ks, and we had many SQL database and VMDK corruptions due to the NC522SFPs experiencing "firmware hangs" or network entropy states. Each week we had a new driver to update from HP and Qlogic, but they never fixed the issues. We had to pull them and replace them with NC552SFP cards (Emulex OEM). We have not had a single issue since we installed the 552s. We swapped them out in late December, so I don't know if recent firmware has fixed the issues. Good riddance to QLogic: their high performance Ethernet cards should never have passed QA. They put my organization through the wringer.

BTW, we also had similar issues with their QLE8242 CNAs, and the NC522SFP was supposed to bring stability to our environment… double whammy.

Hi J, thanks for the feedback. It's unfortunate how many customers have been impacted by these problems. The first release of vSphere 5 from HP had issues with the OneConnect Emulex NICs (mainly in blade infrastructures). This has been fixed in the latest patch releases. For a while the OneConnect wasn't on the HCL and wasn't supported by HP. So when the time is right to upgrade to ESXi 5, it would pay to test the OneConnect cards for a while and make sure you use the latest patch release.

We have 5 x HP DL585G7 servers with the onboard Quad Port HP NC375i, a separate Quad Port 1G NC375T and the Dual Port 10G NC522SFP (yes, that is 10 NICs per server).

We have never had reliable networking with these servers (ESX 4.1, ESXi 4.1, ESXi 5.0) and it is a constant battle of ESXi versions and updates, driver updates, firmware updates and hope to keep things running. The only reliable way to ensure network connectivity is to reboot the servers every 4 weeks or so, before the network cards poop themselves.

I have calls logged with HP. If I ever hear back with a resolution, I'll let you know.

As an update to this issue, I have been in contact with HP support for the last few weeks about this problem and it has done the rounds of their various support people. The short answer is that I will be receiving new main boards and possibly NICs for my 5 x DL585G7 servers in late May. Unfortunately I can't give any details because the new boards aren't available yet, but they are an update of the existing boards, not a direct replacement. If you are still having this issue with your HP servers and these (IMO) dodgy QLogic NICs, hassle HP support: log a case, follow it up and don't take "No" for an answer until they rectify the problem.

Apparently the firmware and driver updates do fix the problems that some people are seeing so they are definitely worth a shot, but ours needed re-engineering to get past the issue.

I just wanted to let you know that you aren't alone on this one. After a good few calls with HP on this issue, they agreed to send us new system boards and NICs for our DL580 G7 servers which we started switching out. The first server went well but the second box now can't load the onboard NIC drivers even though we have the latest firmware (4.0.585) and the latest drivers (4.0.614). I have an open ticket with VMware and with HP on this issue, so hopefully have resolution soon. I will post the resolution here (if we ever get one)….

HP came onsite almost two weeks ago to replace all 10 of the NICs on two of my servers. They had to replace the riser board that houses the 4 x LOM NC375i NICs, and they replaced the 4 Port NC375T and the 2 Port NC522SFP.

When I restarted the server after the replacement, the only issue was that the iLO had been reset (it resides on the riser board). Apart from that, no issues.

ESX started fine and all networking settings were the same, with no issues with drivers or hardware. I migrated some test VMs to these servers and have not had any outages yet.

Everything seems good now and there have been no problems since. I can't guarantee that this will fix the issue, but it's a start and HP appear confident for the first time. I have HP coming onsite later this week to do the same thing on the last of my servers.

I also have another client with the same problem on a number of servers. They have vSphere 4.1, but are seeing the same problems I have with vSphere 5.0u1. Same escalation with HP and same fix.

Again there are a couple of advisories from HP. I'm very frustrated because for two years now there have been repeated issues with the BIOS firmware and the Emulex firmware and drivers. I would be really interested to know how many people have the same problems with G7 servers, so I created a Facebook group:

Hi Stefano, my customers have seen an improvement when using the BIOS settings of Static High Performance and Enhanced Cooling. However, there is additional power consumption with these settings. I would encourage you to work with support and get a resolution. Many customers have had their NICs replaced, and others have decided to change hardware platforms completely as a result of the experience.

I chatted with HP Support and they told me to:
– install the NC523SFP in PCI-E riser slot 2 instead of slot 1 (slot 1 is closer to the mainboard)
– change the thermal configuration in RBSU from "Optimal Cooling" to "Increased Cooling"

Disclaimer

The views expressed anywhere on this site are strictly mine and not the opinions and views of VMware or anyone else. All content is provided without any form of warranty, explicit or implied, for informational purposes and for use at your own risk.