
I think that we all remember the day: 27 August 2013. On that day Microsoft announced that Windows Server 2012 R2 had been released to manufacturing, which means that Microsoft hands the software over to its hardware partners so they can complete their final system validation.

Back in those days I was involved in a project to upgrade a Windows Server 2008 R2 Hyper-V environment to Windows Server 2012 R2. We started the project with a design phase, followed by a POC phase. Although all of this was successful, we could not continue with implementing Windows Server 2012 R2 Hyper-V in production, because the backup solution in use did not support it. That backup solution is HP Data Protector.

Positive as we are, we asked HP whether Data Protector would support Windows Server 2012 R2 the next month, or the month after that. HP told us to be a little patient: there would be an announcement in November. That announcement was quite disappointing: HP Data Protector support for Windows Server 2012 R2 (including Hyper-V) would be available in January 2014.

Hey, but we are patient! So we sit back and wait…

January 2014: release of HP Data Protector version 8.1, however without support for Windows Server 2012 R2! HP told us they could not make it this time, but that support for Windows Server 2012 R2, including Hyper-V, would be available in April 2014!

April 2014: release of HP Data Protector version 8.11… with support for Windows Server 2012 R2!!! So we could continue our project… no, just kidding, we couldn’t! Although there is support for Windows Server 2012 R2, there is no support for Hyper-V. Astonishment and anger all around; we got the feeling we were being kept on a leash. HP told us, very politely, that they do not expect support for Windows Server 2012 R2 Hyper-V before September 2014.

Let’s hope it will be earlier than September. HP has promised to support Hyper-V 2012 R2 in a future version. When that will be? I’ve no idea, and I’m afraid HP has no idea right now either.

Unfortunately this is not the only frustration: a lot of customers are waiting for the final release of LeftHand OS 11, features like ODX (on 3PAR) are not working well, and so I could continue this story.

A couple of weeks ago Windows Server 2012 R2 and System Center 2012 R2 reached the GA milestone. We started with a lab environment for validating our designs. During the deployment we experienced connectivity issues with VMs and vNICs: at random, a virtual machine or vNIC would lose connectivity completely. After a simple live migration the virtual machine would resume connectivity. After verifying our VLAN configuration a couple of times, things got even weirder: after live migrating the virtual machine back to the host where it had lost connectivity, it was still accessible. Most virtual machines were functioning properly, and there was no clear pattern in which virtual machines were affected or when. Without a way to reproduce the issue on demand, it was complex to troubleshoot.
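For reference, the live migration that temporarily restored connectivity is a one-liner in PowerShell; a minimal sketch for a clustered VM, with hypothetical VM and node names:

# Live migrate a clustered VM to another node (hypothetical names)
Move-ClusterVirtualMachineRole -Name "VM01" -Node "HyperVHost02" -MigrationType Live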

A week later I did an implementation at a customer site. The design was based on a two-node Windows Server 2012 R2 Hyper-V cluster for System Center workloads and a five-node Windows Server 2012 R2 Hyper-V cluster for production workloads. The nodes of the production cluster were deployed using the bare metal deployment process in System Center VMM 2012 R2. All the hosts were deployed successfully, but we had issues creating a cluster from these nodes: the cluster validation wizard showed connectivity issues between the nodes. As you might know from my previous blog on bare metal deployment, System Center VMM 2012 R2 can only create a NIC team with the Logical Switch if a vSwitch is created on top of the NIC team. This requires vNICs in the ManagementOS for host connectivity. After validating the VLAN configuration we rebooted the hosts. Connectivity resumed when a host was rebooted, but at random, different hosts lost connectivity again. We were experiencing a situation similar to the one in our lab environment.
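For context, the converged configuration that VMM builds during bare metal deployment is roughly equivalent to the following host-side setup. This is a minimal sketch with hypothetical adapter, switch, and VLAN names, not the exact commands VMM generates:

# Create a NIC team from the physical adapters (hypothetical member names)
New-NetLbfoTeam -Name "Team1" -TeamMembers "NIC1","NIC2" -TeamingMode SwitchIndependent

# Create a vSwitch on top of the team, without a default management vNIC
New-VMSwitch -Name "LogicalSwitch" -NetAdapterName "Team1" -AllowManagementOS $false

# Add a vNIC in the ManagementOS for host connectivity and tag its VLAN
Add-VMNetworkAdapter -ManagementOS -Name "Management" -SwitchName "LogicalSwitch"
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "Management" -Access -VlanId 10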

There was another similarity between the two environments. Both the customer site and our lab consisted of an HP BladeSystem c7000 with BL460c Gen8 blades containing HP FlexFabric 10Gb 2-port 554FLB adapters. These BladeSystems use Virtual Connect technology for converged networking. We had upgraded our Virtual Connect to the latest version, 4.10, before implementing Windows Server 2012 R2, but the customer was still running version 3.75. The HP FlexFabric 10Gb 2-port 554FLB adapter is based on Emulex hardware, and Microsoft provides an inbox driver with version number 10.0.430.570. After contacting my friend Patrick Lownds at HP, he provided me with a link to the Microsoft Windows Server 2012 R2 Supplement for HP Service Pack. Running this did not update any drivers. A search on the Emulex site turned up a newer version of the driver, but after installing this new driver, version 10.0.430.1003, the issue occurred again.
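If you want to check which driver version is actually loaded (the inbox 10.0.430.570 versus the newer Emulex 10.0.430.1003), the NetAdapter module shows it directly; a quick check:

# List physical adapters with their driver provider and version
Get-NetAdapter -Physical | Select-Object Name, InterfaceDescription, DriverProvider, DriverVersionString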

We submitted a case with Microsoft, and for the last week I have been debugging this issue with a Software Development Engineer from Microsoft (who verified my blog series on NIC Teaming about a year ago). I must say kudos to Silviu for his assistance every evening this week, and to Don Stanwyck for communicating with HP. I also reached out to a couple of community members to see if the issue sounded familiar. Rob Scheepens (Sr. Support Escalation Engineer at Microsoft Netherlands) was aware of another customer with the same issue on exactly the same hardware, and yesterday evening I was contacted by yet another one: same issue, same hardware. This morning I was pinged by Kristian Nese, who has a repro of the issue with 2x IBM OCe11102-N2-X Emulex 10GbE adapters in a team (created from VMM) with Emulex driver version 10.0.430.570.

The issue is not solved yet, but I thought a quick post would prevent a lot of people from wasting valuable time on troubleshooting. Please submit a case with your hardware vendor, as this will raise the priority on their side. I’ll update the blog with any progress or relevant information.

A possible temporary workaround seems to be configuring the NIC team members as Active/Passive. I have not been able to test and confirm this.
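For anyone who wants to try it: setting one team member to standby turns the team into an Active/Passive configuration. A sketch, assuming a hypothetical member name, and as said untested against this specific issue:

# Put one member of the NIC team in standby (Active/Passive)
Set-NetLbfoTeamMember -Name "NIC2" -AdministrativeMode Standby

# Verify the member states afterwards
Get-NetLbfoTeamMember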

A while ago I wrote a blog about problems with virtual guest clusters and NIC teaming. See this link.

I ended this blog with a workaround: disable checksum offloading.

Today I received a message from Microsoft Premier Support that they have found the root cause of this problem: NetFTflt, the Microsoft Failover Cluster Virtual Adapter Performance Filter (NetFT-LWF).

If you disabled Checksum Offloading, re-enable it using:

Get-NetAdapter -Name XXXX | Enable-NetAdapterChecksumOffload
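You can verify the offload settings per adapter afterwards, for example with:

# Show the current checksum offload settings for all adapters
Get-NetAdapterChecksumOffload -Name *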

Here is a more detailed explanation of the symptoms, the cause and the workaround:

SYMPTOMS

Failover clusters running inside virtual machines (sometimes referred to as “guest clusters”) may have problems with nodes joining the cluster.
When using the “Create Cluster Wizard”, the cluster may fail to be created. Additionally, the report from the wizard may contain the following message:

An error occurred while creating the cluster. An error occurred creating cluster ‘<clustername>’.
This operation returned because the timeout period expired.

Note: The above errors can also be seen any time communication between the servers that are specified to be part of the cluster creation does not complete. One known cause is described in this article.

In some scenarios the cluster nodes are successfully created and joined while the VMs are hosted on the same node, but once the VMs are moved to different nodes, communication between the nodes of the guest cluster starts to fail. As a result, nodes may be removed from the cluster.

CAUSE

This can occur because packets do not reach the virtual machines when the VMs are hosted on Windows Server 2012 failover cluster nodes, due to a failover cluster component that is bound to the network adapters of the hosts. The component is called the “Microsoft Failover Cluster Virtual Adapter Performance Filter” and was first introduced in Windows Server 2012.

The problem only affects network packets addressed to cluster nodes hosted in virtual machines.

WORKAROUND

If a Windows Server 2012 failover cluster is going to host virtual machines that are part of guest clusters, it is recommended to unbind the “Microsoft Failover Cluster Virtual Adapter Performance Filter” object from all of the virtual switch network adapters on the Windows Server 2012 failover cluster nodes.

Note: This problem can affect any Windows Server Failover Cluster version that is running inside of virtual machines as a guest cluster. The information mentioned in the cause and workaround of this article is specific to Windows Server 2012 Failover Clusters that are used to host virtual machines.

You can disable the “Microsoft Failover Cluster Virtual Adapter Performance Filter” object using one of the methods below:

Disabling using the GUI

Open “Network Connections” to get the list of network adapters. All network adapters named “vEthernet” (the default name) are the virtual network adapters, i.e. they belong to the virtual switch. Physical adapters that have a Hyper-V virtual adapter configured on top of them will not have the “Microsoft Failover Cluster Virtual Adapter Performance Filter” binding, so there is nothing to disable on those adapters.

Right-click one of the “vEthernet” adapters and select “Properties” from the menu.

Uncheck “Microsoft Failover Cluster Virtual Adapter Performance Filter” in the list of bindings.

Click “OK” to close the dialog; the binding is now disabled for the unchecked item.

Repeat this for all the “vEthernet” adapters.
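To confirm which adapters still have the filter bound after walking through the GUI, you can query the binding state from PowerShell; a quick check:

# List the binding state of the performance filter on every adapter
Get-NetAdapterBinding -DisplayName "Microsoft Failover Cluster Virtual Adapter Performance Filter" | Select-Object Name, DisplayName, Enabled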

Disabling using Windows PowerShell

The following disables the network adapter binding for the “Microsoft Failover Cluster Virtual Adapter Performance Filter” on every adapter on the server that has a ComponentID of “vms_mp”. This ComponentID indicates that the adapter is a Hyper-V adapter used by the virtual switch.

Run this on each node of the cluster so that every server has the binding disabled for the adapters used by the virtual switch.

Open the Windows PowerShell console with Administrator access using the “Run as Administrator” option.
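Based on the description above (disable the binding on every adapter whose ComponentID is “vms_mp”), the command would be along these lines; a sketch, not necessarily the verbatim article text:

# Disable the performance filter binding on all Hyper-V virtual switch adapters
Get-NetAdapter | Where-Object { $_.ComponentID -eq "vms_mp" } |
    Disable-NetAdapterBinding -DisplayName "Microsoft Failover Cluster Virtual Adapter Performance Filter"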