We were dealing with some issues on on several WNLB clusters running on a Windows Server 2012 R2 Hyper-V cluster after a migration from an older cluster. So go read that and come back .

Are you back? Good.

Let’s look at the situation we’ll use to show case one possible solution to the issues. If you have a 2 node Hyper-V cluster, are using NIC Teaming for the switch and depending on how teaming is set up you’ll might run into these issues. Here we’ll use a single switch to mimic a stacked one (the model available to me is non stackable and I have only one anyway).

Make sure you enable MAC Spoofing on the appropriate vNIC or vNICs in the advanced settings

Note that there is no need to use a static MAC address or copy your VIP mac into the settings of your VM with Windows Server 2012 (R2) Hyper-V

Set up WNLB with IGMP multicast as option. While chancing this there will be some advice warnings thrown at you

I’m not going in to the fact that since W2K8 the network default configurations are all about security. You might have to do some configuration work to get the network flow to do what it needs to do. Lots on this weak host/strong host model behavior on the internet . Even wild messy ramblings by myself here.

On to the switch itself!

Why IGMP multicast? Unicast isn’t the best option and multicast might not cut it or be the best option for your environment and IGMP is less talked about yet it’s a nice solution with Windows NLB, bar replacing into with a hardware load balancer. For this demo I have a DELL PowerConnect 5424 at my disposal. Great little switch, many of them are still serving us well after 6 years on the job.

What MAC address do I feed my switch configurations?

Ah! You are a smart cookie, aren’t you. A mere ipconfig reveals only the unicast MAC address of the NIC. The GUI on WNLB shows you the MAC address of the VIP. Is that the correct one for my chosen option, unicast, multicast or IGMP multicast? No worries, the GUI indeed shows the one you need based on the WNLB option you configure. Also, take a peak at nlb.exe /? and you’ll find a very useful option called ip2mac.

Let’s run that against our VIP:

And compare it to what we see in the GUI, you’ll notice that show the MAC to use with IGMP multicast as well.

You might want to get the MAC address before you configure WNLB from unicast to IGMP multicast. That’s where the ip2mac option comes in handy.

Configuring your switch(es)

We have a multicast IP address that we’ll convert into the one we need to use. Most switches like the PowerConnect 5424 in the example will do that for you by the way.

I’m not letting the joining of the members to the Bridge Multicast Group happen automatically so I need to configure this. I actually have to VLANs, each Hyper-V host has 2 LACP NIC team with Dynamic load balancing connected to an LACP LAGs on this switch (it’s a demo, yes, I know no switch redundancy). I have tow as some WNLB nodes have multiple clusters and some of these are on another VLAN.

I create a Bridge Multicast Group. For this I need the VLAN, the IGMP multicast MAC address and cluster IP address

When I specify the IGMP multicast MAC I take care to format it correctly with “:” instead of “–“ or similar.

You can type in the VIP IP address or convert is per this KB yourself. If you don’t the switch will sort you out.

The address range of the multicast group that is used is 239.255.x.y, where x.y corresponds to the last two octets of the Network Load Balancing virtual IP address.

For us this means that our VIP of 172.31.3.232 becomes 239.255.3.232. The switch handles typing in either the VIP or the converted VIP equally well.

This is what is looks like, here there are two WNLB clusters in ICMP multicast mode configured. There are more on the other VLAN.

We leave Bridge Multicast Forwarding here for what it is, no need in this small setup. Same for IGMP Snooping. It’s enable globally and we’ve set the members statically.

We make unregistered multicast is set to forwarding (default).

Basically, we’re good to go now. Looking at the counters of the interfaces & LAGs you should see that the multicast traffic is targeted at the members of the LAGs/LAGs and not all interfaces of the switch. The difference should be clear when you compare the counters adding up before and after you configured IGMP.

The Results

No over the top switch flooding, I can simultaneously live migrate multiple WLNB nodes and have them land on the same switch without duplicate IP address warning. Will this work for you. I don’t know. There are some many permutations that I can’t tell you what you should do in your particular situation to make it work well. I’ll just quote myself from my previous blog post on this subject:

“"If you insist you want my support on this I’ll charge a least a thousand Euro per hour, effort based only. Really. And chances are I’ll spend 10 hours on it for you. Which means you could have bought 2 (redundancy) KEMP hardware NLB appliances and still have money left to fly business class to the USA and tour some national parks. Get the message?”

But you have seen some examples on how to address issues & get a decent configuration to keep WNLB humming along for a few more years. I really hope it helps out some of you struggling with it.

Wait, you forgot the duplicate IP Address Warning!

No, I didn’t. We’ll address that here. There are some causes for this:

There is a duplicate IP address. If so, you need to address this.

A duplicate IP address warning is to be expected when you switch between unicast and multicast NLB cluster modes (http://support.microsoft.com/kb/264645). Follow the advice in the KB article and clearing the ARP tables on the switches can help and you should get rid of it, it’s transient.

There are other cause that are described here Troubleshooting Network Load Balancing Clusters. All come down to the fact that somehow you’re getting multiple MAC address associated with the same IP address. One possible cause can be that you migrated form an old cluster to a new cluster, meaning that the pool of dynamic IP addresses is different and hence the generated VIP MAC … aha!

Another reason, and again associated multiple MAC address associated with the same IP address is that you have an old static ARP entry for that IP address somewhere on your switches. Do some house cleaning.

If all the above is perfectly fine and you’re certain this is due to some Hyper-V live migration, vSwitch, firmware, driver bug you can get rid of the warning by disabling ARP checks on the cluster members. Under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, create a DWORD value with as name “ArpRetryCount” and set the value to 0. Reboot the server for this to take effect. In general this is not a great idea to do. But if you manage your IP addresses well and are sure no static entries are set on the switch it can help avoid this issue. But please, don’t just disable “ArpRetryCount” and ignore the root causes.

Conclusion

You can still get WNLB to work for you properly, even today in 2014. But it’s time to start saying goodbye to Windows NLB. The way the advanced networking features are moving towards layer 3 means that “useful hacks” like MAC spoofing for Windows NLB are going no longer going to work. But until you have implement hardware load balancing I hope this blog has given you some ideas & tips to keep Windows NLB running smoothly for now. I’ve done quite few and while it takes some detective work & testing, so far I have come out victorious. Eat that Windows NLB! I have always enjoyed making it work where people said it couldn’t be done. But with the growing important of network virtualization and layer 3 in our networks, this nice hack, has had it’s time.

For some reasons developers like Windows NLB as “it’s easy and they are in control as it runs on their servers”. Well … as you have seen nothing comes free and perhaps our time is better spend in some advanced health checking and failover in hardware load balancing. DevOps anyone?

Here’s the deal. While Windows NLB on Hyper-V guests might seem to work OK you can run into issues. Our biggest challenge was to keep the WNLB cluster functional when all or multiple node of the cluster are live migrated simultaneously. The live migration goes blazingly fast via SMB over RDMA nut afterwards we have a node or nodes in an problematic state and clients being send to them are having connectivity issues.

After live migrating multiple or all nodes of the Windows NLB cluster simultaneously the cluster ends up in this state:

A misconfigured interface. If you click on the error for details you’ll see

Not good, and no we did not add those IP addresses manually or so, we let the WNLB cluster handle that as it’s supposed to do. We saw this with both fixed MAC addresses (old school WNLB configuration of early Hyper-V deployments) and with dynamic MAC addresses. On all the nodes MAC spoofing is enabled on the appropriate vNICs.

The temporary fix is rather easy. However it’s a manual intervention and as such not a good solution. Open up the properties of the offending node or nodes (for every NLB cluster that running on that node, you might have multiple).

Click “OK” to close it …

… and you’re back in business.

Scripting this out somehow with nlb.exe or PowerShell after a guest gets live migrated is not the way to go either.

But that’s not all. In some case you’ll get an extra error you can ignore if it’s not due to a real duplicate IP address on your network:

We tried rebooting the guest, dumping and recreating the WNLB cluster configuration from scratch. Clearing the switches ARP tables. Nothing gave us a solid result.

No you might say, Who live migrates multiple WNLB nodes at the same time? Well any two node Hyper-V cluster that uses Cluster Aware Updating get’s into this situation and possibly bigger clusters as well when anti affinity is not configured or chose to keep guest on line over enforcing said anti affinity, during a drain for an intervention on a cluster perhaps etc. It happens. Now whether you’ll hit this issue depends on how you configure and use your switches and what configuration of LBFO you use for the vSwitches in Hyper-V.

How do we fix this?

First we need some back ground and there is way to much for one blog actually. So many permutations of vendors, switches, configurations, firmware & drivers …

Unicast

This is the default and Thomas Shinder has an aging but great blog post on how it works and what the challenges are here. Read it. It you least good option and if you can you shouldn’t use it. With Hyper-V we and the inner workings and challenges of a vSwitch to the mix. Basically in virtualization Unicast is the least good option. Only use it if your network team won’t do it and you can’t get to the switch yourself. Or when the switch doesn’t support mapping a unicast IP to a multicast MAC address. Some tips if you want to use it:

Don’t use NIC teaming for the virtual switch.

If you do use NIC teaming for the virtual switch you should (must):

use switch independent teaming on two different switches.

If you have a stack or just one switch use multicast or even better IGMP with multicast to avoid issues.

I know, don’t shout at me, teaming on the same switch, but it does happen. At least it protects against NIC issues which are more common than switch or switch port failures.

Multicast

Again, read Thomas Shinder his great blog post on how it works and what the challenges are here.

It’s an OK option but I’ll only use it if I have a switch where I can’t do IGMP and even then I do hope I can do two things:

Add a static entry for the cluster IP address / MAC address on your switch if it doesn’t support IGMP multicast:

The big rotten thing here is that this is great when you’re dealing with physical servers. They don’t tend to jump form switch port to switch port and switch to switch on the fly like a virtual machine live migrating. You just can’t hardcode all the vSwitch ports into the physical switches, one they move and depending on the teaming choice there are multiple ports, switches etc …it’s not allowed and not possible. So when using multicast in a Hyper-V environment stick to 1). But here’s an interesting fact. Many switches that don’t support 1) do support 2). Fun fact is that most commodity switches do seems to support IGMP … and that’s your best choice anyway! Some high end switches don’t support WNLB well but in that category a hardware load balancer shouldn’t be an issue. But let’s move on to my preferred option.

This is your best option and even on older, commodity switches like a DELL PowerConnect 5424 or 5448 you can configure this. It was introduced in Windows Server 2003 (did not exist in NT4.0 or W2K). It’s my favorite (well, I’d rather use hardware load balancing) in a virtual environment. It works well with live migration, prevents switch flooding and with some ingenuity and good management we can get rid of other quirks.

So Didier, tell us, how to we get our cookie and eat it to?

Well, I will share the IGMP with Multicast solution with you in a next blog. Do note that as stated above there are some many permutations of Windows, teaming, WNL, switches & firmware/drivers out there I give no support and no guarantees. Also, I want to avoid writing a 100 white paper on this subject?. If you insist you want my support on this I’ll charge at least a thousand Euro per hour, effort based only. Really. And chances are I’ll spend 10 hours on it for you. Which means you could have bought 2 (redundancy) KEMP hardware NLB appliances and still have money left to fly business class to the USA and tour some national parks. Get the message?

But don’t be sad. In the next blog we’ll discuss some NIC teaming for the vSwitch, NLB configuration with IGMP with Multicast and show you a simple DELL PowerConnect 5424 switch example that make WNLB work on a W2K12R2 Hyper-V cluster with NIC teaming for the vSwitch and avoids following issues:

Messed up WNLB configuration after the simultaneous live migration of all or multiple NLB Nodes.

I’d show you on redundant Force10 S4810 but for that I need someone to ship me some of those with SFP+ modules for the lab, free of cost for me to keep

Conclusion

It’s time to start saying goodbye to Windows NLB. The way the advanced networking features are moving towards layer 3 means that “useful hacks” like MAC spoofing for Windows NLB are going no longer going to work. But until you have implement hardware load balancing I hope this blog has given you some ideas & tips to keep Windows NLB running smoothly for now. I’ve done quite few and while it takes some detective work & testing, so far I have come out victorious. Eat that Windows NLB!

Ladies & gentleman, on May 30-June 1, 2014 the E2EVC 2014 Brussels Virtualization Conference is taking place. This is a non marketing event by experts in virtualization. So these people design, implement and support virtualization solutions for a living. E2EVC Virtualization Conference is a non-commercial, it does not run a profit for the organizers or speakers. Everybody volunteers. The attendance fee covers the costs of the conference rooms, coffee breaks and such. The value is in the knowledge sharing and the networking.

This community event strives to bring the best virtualisation experts together to exchange knowledge and to establish new connections. It’s a weekend event (so people can attend without interrupting their work or customer services. Filled with presentations, Master Classes and discussions you can have 3 days to network and learn from your peers.

If you’re not Belgian you are also very welcome. So do register for E2EVC 2014 Brussels. If you have knowledge to share, please volunteer to speak. This community event has as a goal to share knowledge and stimulates professionals to present on their subject matters.

A big thank you for Alex Juschin & team for his never ending efforts to help organize this conference!

When it comes to gaining insight and understanding of your virtual environment vKernel has some nifty products. They just added two new utilities, Storage Explorer and Change Explorer, to their free vOPS™ Server Explorer that give you more management capabilities with SCOM/SCVMM or vCenter. Sure it’s to get you looking into and considering buying the paid stuff with more functionality and remediation but it does provide you with tools to rapidly asses your virtualization environment for free as is. So what did they add?

Identifies critical storage issues such as over commitment, low capacity, high latency, VMFS version mismatch

Alerts you to critical VM issues such as low disk space, latency and throughput issues

There’s sorting and searching support

Change Explorer

You get a listing of the changes to resource pools, hosts, data stores and VMs within the past week. They also indicate a risk associated with hat change

You can search & filter to find specific changes

There is a graphical mapping of changes over a time line for rapid reporting/assessment.

So if you need some free tools to help you get a quick insight into your environment or the need to be informed about changes of performance issues you can try these out. The press release is here http://www.vkernel.com/press-kits/vops-server-explorer-6-3. We have smaller environment at work next to our main production infrastructure where we’d like to test this out. So they need to add support for SCVMM 2012 SP1 a.s.a.p. I think

In a world were complexity reduction is paramount and the TCO/ROI needs to be good from day one competition is heating up between 3rd party vendors active in this arena providing tools to make that happen. This is especially true when they are adding more and more Hyper-V support. It also doesn’t hurt to push Microsoft or VMware to make their solutions better.

I enjoyed my time at the TEC2011 conference. The networking with fellow IT Pros was excellent and the discussions were rich in content. It’s fun to see that a lot of people are already looking at Windows 8 server. In that respect the session by Hans Vredevoort “Hyper-V Storage Deep Dive” was a very good one, offering a look at things to come. Information is not yet flowing in on Windows 8, or at least not in the quantities we’d like. We all expect that situation to improve before the end of the year when we’ll (hopefully) find a first beta under the x-mas tree to play with during the holidays season . Jaap Wesselius was haunted by the demo gods. Well it was either the demo gods or all those “always on” IT Pros bringing the wireless down. But he recovered strong in his session “Virtualizing Exchange 2010” that was actually called "Exchange on Hyper-V:do’s and don’ts". Not even the demo gods can keep Jaap down. We also enjoyed seeing Carsten Rachfahl in action in his session on “Hyper-V networking Best Practices” which he brought well and with a sense of humor on his new best practice since the evening before when a bunch of us where testing things out on Hyper-V clusters all over Europe Maarten Wijsman (in Holland) & Rick Slagers (on the scene at TEC2011) form Wortell.nl were both assisting in this endeavor.

I had a good time, met a lot of people from the community like Joachim Nässlander (@Nasslander). The Belgian ProExchange community was well represented and Ilse Van Criekinge was there as well. I learned a lot and I’m happy to have attended.To conclude our conference on Wednesday Carsten Rachfahl (@hypervserver), Hans Vredevoort (@hvredevoort) and I did a video panel interview on Hyper-V Windows 8 Server and some 10Gbps cluster networking. He’ll put it on line on his web site http://www.hyper-v-server.de/videos/ so you can find there when it’s released. Jaap, Hans and I went to dinner and concluded the conference with a beer in the bar. It’s back home tomorrow and then to work.

We had a kind reminder recently that we shouldn’t forget to complete all steps in a Hyper-V cluster node upgrade process. The proof of a plan lies in the execution . We needed to configure a virtual machine with a whooping 50GB of memory for an experiment. No sweat, we have plenty of memory in those new cluster nodes. But when trying to do so it failed with a rather obscure error in System Center Virtual Machine Manager 2008 R2

Error (12711)

VMM cannot complete the WMI operation on server hypervhost01.lab.test because of error: [MSCluster_Resource.Name="Virtual Machine MYSERVER"] The group or resource is not in the correct state to perform the requested operation.

(The group or resource is not in the correct state to perform the requested operation (0x139F))

Recommended Action

Resolve the issue and then try the operation again.

One option we considered was that SCVMM2008R2 didn’t want to assign that much memory as one of the old host was still a member of the cluster and “only” has 48GB of RAM. But nothing that advanced was going on here. Looking at the logs found the culprit pretty fast: lack of disk space.

We saw following errors in the Microsoft-Windows-Hyper-V-Worker-Admin event log:

Sure enough a smaller amount of memory, 40GB, less than the remaining disk space on the CSV did work. That made me remember we still needed to expand the LUNS on the SAN to provide for the storage space to store the large BIN files associated with these kinds of large memory configurations. Can you say "luxury problems"? The BIN file contains the memory of a virtual machine or snapshot that is in a saved state. Now you need to know that the BIN file actually requires the same disk space as the amount of physical memory assigned to a virtual machine. That means it can require a lot of room. Under "normal" conditions these don’t get this big and we provide a reasonable buffer of free space on the LUNS anyway for performance reasons, growth etc. But this was a bit more than that buffer could master.

As it was stated in the planning that we needed to expand the LUNS a bit to be able to deal with this kind of memory hogs this meant that the storage to do so was available and the LUN wasn’t maxed out yet. If not, we would have been in a bit of a pickle.

In my blog post “Introducing 10Gbps & Thoughts On Network High Availability For Hyper-V (Part 3/4)” in a series of thoughts on 10Gbps and Hyper-V networking a discussion on NIC teaming brought up the subject of 10Gbps for virtual machine networks. This means our switches will probably no longer exist in isolation unless those virtual Machines don’t ever need to talk to anything outside what’s connected to those switches. This is very unlikely. That means we need to start thinking and talking about integrating the 10Gbps switches in our network infrastructure. So we’re entering the network engineers their turf again and we’ll need to address some of their concerns. But this is not bad news as they’ll help us prevent some bad scenarios.

Optimizing the use of your 10Gbps switches

As not everyone runs clusters big enough, or enough smaller clusters, to warrant an isolated network approach for just cluster networking. As a result you might want to put some of the remaining 10Gbps ports to work for virtual machine traffic. We’ve already pointed out that your virtual machines will not only want to talk amongst themselves (it’s a cluster and private/internal networks tend to defeat the purpose of a cluster, it just doesn’t make any sense as than they are limited within a node) but need to talk to other servers on the network, both physical and virtual ones. So you have to hook up your 10Gbps switches from the previous example to the rest of the network. Now there are some scenarios where you can keep the virtual machine networks isolated as well within a cluster. In your POC lab for example where you are running a small 100% virtualized test domain on a cluster in a separate management domain but these are not the predominant use case.

But you don’t only have to have to integrate with the rest of your network, you may very well want to! You’ve seen 10Gbps in action for CSV and Live Migration and you got a taste for 10Gbps now, you’re hooked and dream of moving each and every VM network to 10Gbps as well. And while your add it your management network and such as well. This is nothing different from the time you first got hold of 1Gbps networking kit in a 100 Mbps world. Speed is addictive, once you’re hooked you crave for more

How to achieve this? You could do this by replacing the existing 1Gbps switches. That takes money, no question about it. But think ahead, 10Gbps will be common place in a couple of years time (read prices will drop even more). The servers with 10Gbps LOM cards are here or will be here very soon with any major vendor. For Dell this means that the LOM NICs will be like mezzanine cards and you decide whether to plug in 10Gbps SPF+ or Ethernet jacks. When you opt to replace some current 1Gbps switches with 10gbps ones you don’t have to throw them away. What we did at one location is recuperate the 1Gbps switches for out of band remote access (ILO/DRAC cards) that in today’s servers also run at 1Gbps speeds. Their older 100Mbps switches where taken out of service. No emotional attachment here. You could also use them to give some departments or branch offices 1gbps to the desktop if they don’t have that yet.

When you have ports left over on the now isolated 10Gbps switches and you don’t have any additional hosts arriving in the near future requiring CSV & LM networking you might as well use those free ports. If you still need extra ports you can always add more 10Gbps switches. But whatever the case, this means up linking those cluster network 10Gbps switches to the rest of the network. We already mentioned in a previous post the network people might have some concerns that need to be addressed and rightly so.

Protect the Network against Loops & Storms

The last thing you want to do is bring down your entire production network with a loop and a resulting broadcast storm. You also don’t want the otherwise rather useful spanning tree protocol, locking out part of your network ruining your sweet cluster setup or have traffic specifically intended for your 10Gbps network routed over a 1Gbps network instead.

So let us discuss some of the ways in which we can prevent all these bad things from happening. Now mind you, I’m far from an expert network engineer so to all CCIE holders stumbling on to this blog post, please forgive me my prosaic network insights. Also keep in mind that this is not a networking or switch configuration course. That would lead us astray a bit too far and it is very dependent on your exact network layout, needs, brand and model of switches etc.

As discussed in blog post Introducing 10Gbps With A Dedicated CSV & Live Migration Network (Part 2/4) you need a LAG between your switches as the traffic for the VLANs serving heartbeat, CSV, Live Migration, virtual machines but now also perhaps the host management and optional backup network must flow between the switches. As long as you have only two switches that have a LAG between them or that are stacked you have not much risk of creating a loop on the network. Unless you uplink two ports directly with a network cable. Yes, that happens, I once witnessed a loop/broadcast storm caused by someone who was “tidying up” spare CAT5E cables buy plugging all ends up into free port switches. Don’t ask. Lesson learned: disable every switch port not in use.

Now once you uplink those two or more 10Gbps switches to your other switches in a redundant way you have a loop. That’s where the Spanning Tree protocol comes in. Without going into detail this prevents loops by blocking the redundant paths. If the operational path becomes unavailable a new path is established to keep network traffic flowing. There are some variations in STP. One of them is Rapid Spanning Tree Protocol (RSTP) that does the same job as STP but a lot faster. Think a couple of seconds to establish a path versus 30 seconds or so. That’s a nice improvement over the early days. Another one that is very handy is the Multiple Spanning Tree Protocol (MSTP). The sweet thing about the latter is that you have blocking per VLANs and in the case of Hyper-V or storage networks this can come in quite handy.

Think about it. Apart from preventing loops, which are very, very bad, you also like to make sure that the network traffic doesn’t travel along unnecessary long paths or over links that are not suited for its needs. Imagine the Live Migration traffic between two nodes on different 10Gbps switches travelling over the 1Gbps uplinks to the 1Gbps switches because the STP blocked the 10Gbps LAG to prevent a loop. You might be saturating the 1Gbps infrastructure and that’s not good.

I said MSTP could be very handy, let’s address this. You only need the uplink to the rest of the network for the host management and virtual machine traffic. The heartbeat, CSV and Live Migration traffic also stops flowing when the LAG between the two 10Gbps switches is blocked by the RSTP. This is because RSTP works on the LAG level for all VLANs travelling across that LAG and doesn’t discriminate between VLAN. MSTP is smarter and only blocks the required VLANS. In this case the host management and virtual machines VLANS as these are the only ones travelling across the link to the rest of the network.

We’ll illustrate this with some picture based on our previous scenarios. In this example we have the cluster network going to the 10Gbps switches over non teamed NICs. The same goes for the virtual machine traffic but the NICs are teamed, and the host management NICS. Let’s first show the normal situation.

Now look at a situation where RSTP blocks the purple LAG. Please do note that if the other network switches are not 10Gbps that traffic for the virtual machines would be travelling over more hops and at 1Gbps. This should be avoided, but if it does happens, MSTP would prevent an even worse scenario. Now if you would define the VLANS for cluster network traffic on those (orange) uplink LAGs you can use RSTP with a high cost but in the event that RSTP blocks the purple LAG you’d be sending all heartbeat, CSV and Live Migration traffic over those main switches. That could saturate them. It’s your choice.

In the picture below MSTP saves the day providing loop free network connectivity even if spanning tree for some reasons needs to block the LAG between the two 10Gbps switches. MSTP would save your cluster network traffic connectivity as those VLAN are not defined on the orange LAG uplinks and MSTP prevents loops by blocking VLAN IDs in LAGs not by blocking entire LAGs

To conclude I’ll also mention a more “star like” approach in up linking switches. This has a couple of benefits especially when you use stackable switches to link up to. They provide the best bandwidth available for upstream connections and they provide good redundancy because you can uplink the lag to separate switches in the stack. There is no possibility for a loop this way and you have great performance on top. What’s not to like?

Well we’ve shown that each network setup has optimal, preferred network traffic paths. We can enforce these by proper LAG & STP configuration. Other, less optimal, paths can become active to provide resiliency of our network. Such a situation must be addressed as soon as possible and should be considered running on “emergency backup”. You can avoid such events except for the most extreme situations by configuring the RSTP/MSTP costs for the LAG correctly and by using multiple inter switch links in every LAG. This does not only provide for extra bandwidth but also protects against cable or port failure.

Conclusion

And there you have it, over a couple of blog posts I’ve taken you on a journey through considerations about not only using 10Gbps in your Hyper-V cluster environments, but also about cluster networks considerations on a whole. Some notes from the field so to speak. As I told you this was not a deployment or best practices guide. The major aim was to think out loud, share thoughts and ideas. There are many ways to get the job done and it all depends on your needs an existing environment. If you don’t have a network engineer on hand and you can’t do this yourself; you might be ready by now to get one of those business ready configurations for your Hyper-V clustering. Things can get pretty complex quite fast. And we haven’t even touched on storage design, management etc.. The purpose of these blog was to think about how Hyper-V clustering networks function and behave and to investigate what is possible. When you’re new to all this but need to make the jump into virtualization with both feet (and you really do) a lot help is available. Most hardware vendors have fast tracks, reference architectures that have a list of components to order to build a Hyper-V cluster and more often than not they or a partner will come set it all up for you. This reduces both risk and time to production. I hope that if you don’t have a green field scenario but want to start taking advantage of 10Gbps networking; this has given you some food for thought.

I’ll try to share some real life experiences,what improvements we actually see, with 10Gbps speeds in a future blog post.

As you saw in my previous blog post “Introducing 10Gbps With A Dedicated CSV & Live Migration Network (Part 2/4)” we created an isolated network for Hyper-V cluster networking needs, i.e. Heartbeat, Cluster Shared Volume and Live Migration traffic. When you set up failover clustering you’re doing so to achieve some level of high availability. We did this by using 2 switches and setting up redundant paths to them, making use of the fault tolerance the cluster networks offer us. The darks side of high availability is that is always exposes the next single point of failure and when it comes to networking that means you’ll need redundant NICs, NIC ports, cabling and switches. That’s what we’ll discuss in this blog post. All the options below are just that. There is never an obligation to use them everywhere and it might be not needed depending on the type of network and the business needs we’re talking about. But one thing I have learned is to build options into your solutions. You want ways and opportunities to work around issues while you fix them.

Redundant Switches

The first thing you’ll need to address is the loss of a switch. The better ones have redundant power supplies but that’s about it. So you’ll need to have (at least) two switches and make sure you have redundant connections to both switches. That implies both switches can talk to each other as they form one functional unit even when it is an isolated network as in our example.

One of the ways we can achieve this is by setting up a Link Aggregation Group (LAG) over Inter Switch Links (ISL). The LAG makes all the connections available between the switches for the VLANs you define. There are different types of LAG but one of the better ones is a LAG with LACP.

Stacking your switches might also be a solution if they support that. You might need stacking modules for that. Basically this turns two or more switches into one big switch. One switch in the stack acts as the master switch that maintains the entire stack and provides a single configuration and monitoring point. If a switch in the stack fails the remaining switches will bypass the failed switch via the stacking modules. Depending on the quality of your network equipment you can have some disruption during a the failure of the master switch as then another switch needs to take on that role and this can take anything between 3 seconds and a minute depending on vendor, type, firmware, etc. Network people like this. And as each switch contains the entire the stack configuration it’s very easy to replace a dead switch in a stack. Just rip out the dead one, plug in the replacement one and the stack will do the rest.

We note that more people have access to switches that can handle LAGs versus those who have stackable ones. The reason for this is that the latter tend to be more pricy.

Redundant Network Cards & Ports

Now whether you’re using LAGs or stacking the idea is that you connect your NICs to different switches for redundancy. The question is do we need to do something with the NIC configuration or not to benefit from this? Do we have redundancy in via a cluster wide virtual switch or not? If not can we use NIC teaming? Is NIC teaming always needed or a good idea? Ok, let’s address some of these questions.

First of all, Hyper-V in the current Windows Server 2008 SP1 version has no cluster wide virtual switch that can provide redundancy for your virtual machine network(s). But please allow me to dream about Hyper-V 3.0. To achieve redundancy for the virtual machine networks you’ll need to turn to NIC teaming. NIC teaming has various possible configurations depending on vendor and the capabilities of the switches in use. You might be familiar with terminology like Switch Fault Tolerance (SFT), Adaptive Fault Tolerance (AFT), Link Aggregation Control protocol (LACP), VM etc. Apart from all that the biggest thing to remember is that NIC teaming support has to come from the hardware vendor(s). Microsoft doesn’t support it directly for Hyper-V and Hyper-V gets assess to a NIC team NIC via the Windows operating system.

On NIC Teaming

I’m going to make a controversial statement. NIC teaming can be and is often a cause of issues and it can expensive in time to both set up and fix if it fails. Apart from a lot of misconceptions and terminology confusion with all the possible configurations we have another issue. NIC teaming introduces complexity with drivers & software that is at least a hundred fold more likely to cause failures than today’s high quality network cards. On top of that sometimes people forget about the proper switch configuration. Ouch!

Do a search on Hyper-V and NIC teaming and you’ll see the headaches it causes so many people. Do you need to stay away from it? Is it evil? No, I’m not saying that. Far from that, NIC teaming is great. You need to decide carefully where and when to use it and in what form. Remember when you can handle & manage the complexity need to achieve high availability, generally speaking you’re good to go. If complexity becomes a risk in itself, you’re on the wrong track.

Where do I stand on NIC teaming? Use it when it really provides the benefits you seek. Make sure you have the proper NICs, Switches and software/drivers for what you’re planning to do. Do your research and test. I’ve done NIC Teaming that went so smooth I never would have realized the headaches it can give people. I’ve done NIC teaming where buggy software and drivers drove me crazy.

I’d like to mention security here. Some people tend to do a lot of funky, tedious configurations with VLANS in an attempt to enhance security. VLANS are not security mechanisms. They can be used in a secure implementation but by themselves they achieve nothing. If you’re doing this via NIC teaming/VLANS I’d like to note that once someone has access to your Hyper-V management console and /or the switches you’re toast. Logical and physical security cannot be replaced or ignored.

NIC Teaming To Enhance Throughput

You can use NIC teaming enhance bandwidth/throughput. If this is you major or only goal, you might not even be worried about using multiple switches. Now NIC teaming does help to provide better bandwidth but, sure but nothing beats buying 10Gbps switches & NICs. Really, switches with LAGs or stack and NIC Teaming are great but bigger pipes are always better for raw throughput. If you need twice or quadruple the ports only for extra bandwidth this gets expensive very fast. And if, on top of that, you need consultants because you don’t have a network engineer to set it all up just for that purpose, save your money and invest it in hardware.

NIC Teaming For Redundancy

Do you use NIC teaming for redundancy? Yes, this is a very good reason when it fits the needs. Do you do this for all networks? No, it depends. Just for heartbeat, CSV & Live migration traffic it might be overkill. The nature of these networks in a Hyper-V Cluster is such that you don’t really need it as they can mutually provide redundancy for each other. But what if a NIC port fails when I’m doing a live migration? Won’t that mean the live migration will fail? Yes. But once the NIC is out of the picture Live Migration will just work over the CSV network if you set it up that way. And you’re back in business while you fix the issue. Have I seen live migration fail? Yes, sure. But it never left the VM messed up, that kept running. So you fix the issue and Live Migrate it again.

The same goes for the other networks. CSV should not give you worries. That traffic gets queued and send to the next available network available for CSV. Heart beat is also not an issue. You can afford the little “down time” until it is sent over the next available network for cluster communication. Really a properly set up cluster doesn’t go down when a cluster networks fails if you have multiple of them.

But NIC teaming could/would prevent even this ever so slight interruption you say. It can, yes, depending on how you set this up, so not always by definition. But it’s not needed. You’re preventing something benign at great cost. Have you tested it? Is it always a lossless, complete transparent failover? No a single packet dropped? Not one ping failed? If so, well done! At what cost and for what profit did you do it? How often do your NICs and switch ports fail? Not very often. Also remember the extra complexity and the risk of (human) configuration errors. As always, trust but verify, testing is your friend.

Paranoia Is Your Friend

If you set up NIC teaming without separate NIC cards (not ports) and the PCIe slot goes bad NIC teaming won’t save you. So you need multiple network cards. On top of that, if you decide to run all networks over that team you put all your eggs in that one basket. So perhaps you might need 2 teams distributed over multiple NIC cards. Oh boy redundancy and high availability do make for expensive setups.

Combine NIC Teaming & VLANS Work Around Limited NIC Ports

This can be a good idea. As you’ll be pushing multiple networks (VLANs) over the same pipe you want redundancy. So NIC teaming here can definitely help out. You’ll need to consider the amount of network traffic in this case as well. If you use load balancing NIC teaming you can get some extra bandwidth, but don’t expect miracles. Think about the potential for bottle necks, QoS and try to separate bandwidth hogs on separate teams. And remember, bigger pipes are always better, so consider 10Gbps when you are in a bandwidth crunch.

Don’t Forget About The Switches

As a friendly reminder about what we already mentioned above, don’t forget to use different switches for up linking the NIC ports. If you do forget your switch is the single point of failure (SPOF). Welcome to high availability: always hitting the net SPOF and figuring out how big the risk is versus the cost in money and complexity to deal with it J. Switches don’t often fail but I’ve seen sys admins pull out the wrong PDU cables. Yes human error lures in all corners in all possible variations. I know this would never happen to you, and certainly not twice, but other people are not so skillful. And for those who’d rather be lucky than good I have bad news. Luck runs out. Inevitably bad things happen to all of our systems.

Some Closings Thoughts

One rule of thumb I have is not to use NIC teaming to save money by reducing NIC Cards, NIC ports, number of switches or switch ports. Use it when it serves your needs and procure adequate hardware to achieve your goals. You should do it because you have a real need to provide the absolute best availability and then you put down the money to achieve it. If you talk the talk, you need to walk the walk. And while not the subject of this post, your Active Directory or other core infrastructure services are not single points of failure , are they?

If you do want to use it to save money or work around a lack of NIC ports, there is nothing wrong with that, but say so and accept the risk. It’s a valid decision when you have you have your needs covered and are happy with what that solution provides.

When you take all of this option into consideration, where do you end with NIC teaming and network solutions for Hyper-V clusters? You end up with the “Business ready” or “reference architecture” offered by DELL or HP. They weigh all pros and cons against each other and make a choice based on providing the best possible solution for the largest number of customers at acceptable costs. Is this the best for you? That could very well be. It all depends. They make pretty good configurations.

I tend to use NIC teaming only for the Virtual Machine networks. That’s where the biggest potential service interruption exists. I have in certain environments when NIC teaming was something that was not chosen mediated that risk by providing 2 or 3 single NIC for 2 or 3 virtual networks in Hyper-V. That reduced the impact to 1/3 of the virtual machines. And a fix for a broken NIC is easy; just attach the VMs to a different virtual network. You can do this while the virtual machines are running so no shutdown is required. As an added benefit you balance the network traffic over multiple NICs.

10Gbps with NIC teaming and VLANs provide for some very nice scenarios. This is especially true especially if you have bandwidth hungry applications running in boatload of VMs. This all means that we need to start thinking and talking about integrating the 10Gbps switches in our network infrastructure. So that means we’re entering the network engineers their turf and we’ll need to address some of their concerns. But this is not bad news as they’ll help us prevent some bad scenarios. But that will be discussed in a next blog post.

Introduction

In this post we continue along the train of thought we set in a previous blog post “Introducing 10Gbps Networking In Your Hyper-V Failover Cluster Environment (Part 1/4)”. Let’s say you want to set up a Hyper-V cluster for SQL Server virtualization. Your business & IT manager told you the need to provide them with the best performance you can get. They follow up on that statement with a real budget so you can buy high end servers (blades or rack) and spec them out optimally for SQL Server. You take into consideration NUMA issues, vCPU:pCPU ratios, SQL memory demands, the current 4 vCPU limit in hyper-V, etc. By the way, this will be > 16vCPU with Windows Server 8, which leads me to believe the 64GB memory ceiling for virtual machines will also be broken. But for now this means that with regard to CPU & memory you’ve done all you can. That leaves only networking and IO to deal with. Now the IO is food for another & very extensive discussion, but basically you have to design that around the needs of the application(s) or you’ll be toast. The network part is what we’ll tackle here.

Without going into details, what does a Hyper-V cluster need in terms of networking?

Mostly idle. When Live Migrating it demands high bandwidth & low latency required.

Private Cluster Network

Host Management: It is fine to leave this on 1Gbps, unless you have a need to deploy massive amounts of VMs or you backups are consuming all bandwidth. If so consider dedicated NICs for those roles and/or 10Gbps. Also note that you might be able to leverage your SAN for virtual machine deployment / backups.

VM Network: Use multiple “single” NICs or NIC teams to spread both the load and the risk. Remember that you can lose the host management or CSV network of a node, without affecting your virtual machine connectivity but not the virtual machine network(s). So don’t put all your eggs in one basket. So do consider multiple NICs and NIC teaming. Do remember that there are other bottle necks than bandwidth to a virtual machine running apps so don’t go completely overboard as there is no single magic bullet here for virtual machine performance. 2 or 3 will do perfectly fine. What about backups in the guest? Yes, that’s an extra burden but there are better solutions than that and if you hit and bandwidth issue with guest based backups it’s time to investigate them seriously. As you will see in these series I’m not a mincer with NIC ports but there’s no need to have one for every 2 Virtual machines. If you have really high bandwidth needs consider 10Gbps, not a truck load of NIC ports.

Heartbeat: Due to the mostly moderate needs it is often combined with the CSV traffic.

Cluster Shared Volume (CSV): Well you have the need for metadata of the clustered shared volumes. But that’s not all. You also have redirected access when you’re doing backups, defragmenting your CSV storage or when the storage paths are unavailable. So go for 10Gbps when you can, especially since this is your backup path for Live Migration traffic!

Side Note: Don’t say that Redirected Access over the CSV network will never happen when you have redundant storage paths. We’ve seen it happen in an environment with dual FC HBA cards, dual SAN controllers and the works. Redirected Access saved our service availability during that event! What happened exactly and how it all ties together is a long story and complicated but in essence an arbitrated loop management module when haywire and caused a loop, the root cause of this was a defective disk. When that event was over one of the controllers went nuts and decided this wasn’t his cup of tea and called it a day. Guess what? Some servers could not failover to the other controller as something went wrong in the internal workings of the SAN itself, dual HBA didn’t help here. How did our services stay available? Thanks to Redirected Access. It was at 1Gbps speeds so that hurt a little but we kept ‘m running. Our vendor worked through this with us but things where pretty bad and it was pucker time. However this is one example where we kept our services running for 24 hours (whilst working at the issue with the vendor) via redirected access. The bad thing was we needed to take the spare controller of line & restart both to get the replacement controller to be recognized, yes a complete shutdown of the cluster nodes to restart both SAN controllers. I still remember the mail I send and the call I made to management that is was shutting down the business for 30 minutes. But it was not because of Hyper-V, quite the opposite; it helped us out a lot!

Live Migration: The bigger and better the pipe the faster Live Migration gets done. With high density or resource (memory) intensive servers this becomes a lot more important. Think of SQL Server, Exchange consuming 16, 24, 32 or more GB of memory. So do consider 10Gbps.

iSCSI: As we are using Fiber Channel in our SAN we did not include iSCSI in the networking needs table above. Now I do want to draw your attention to the need for iSCSI in the virtual machines themselves. This is needed for clustering within the virtual machines. Today this is almost a requirement as clustering in the guest becomes more and more important. You’ll need at least two NIC ports in production for this, if possible in on two separate cards for ultimate redundancy. Now as a best practice we won’t share the iSCSI NICS between the hosts and the guests. I do this in the lab but won’t have it in production. So that could mean at least two more NIC ports. With 10Gbps you’ll have ample performance but depending on your IO needs you might want 4 if you’re using 1Gbps so those NIC numbers are rising fast.

What

Function

Traffic

Connection Type

iSCSI Guest

Virtual machine shared storage.

High bandwidth need, low latency is required to get good I/O

Dedicated to Hyper-V

iSCSI host

Host shared storage

High bandwidth need, low latency is required to get good I/O

Excluded from cluster, dedicated to the host.

What to move to 10Gbps?

Cool, you think, let’s throw some 10Gbps NICs & switches into our network. After that, depending on the rest of your network equipment & components, your virtual machines might be able to talk to other virtual and physical servers on the network at speeds up to 10Gbps or at least 1Gbps. I kind of hope that none of you are running 100 Mbps in your server racks today. And last but not least, with your 10Gbps network you’ll be able to do get the best performance for your CSV and Live Migration traffic. Life is good!

Until your network engineer hears about your plans. All of a sudden it’s no so cool anymore. You certainly woke the network people up! They’re nervous now they have seen all the double (redundancy) lines you’ve drawn on your copy of the schema representing the rack / server room network. They start mumbling things about redundancy, loops, RSTP, MSTP, LAG, stacking and a boatload other acronyms that sound like you’ve heard ‘m before but can’t quite place. They also talk about doom and gloom scenarios that might very well bring down the network. So unless you are the network admin you should dust of your communication skills and get them on board. So for your sake I hope they’re not the kind of engineers that states that most network problems that can’t be solved by removing servers and applications that ruin the nirvana of their network design. If so they’ll be vary weary of that “virtual switch” you’re talking about as well.

The Easy Way Out – A Dedicated CSV & Live Migration Network

Let’s say that you need a lot more time to get to a fully integrated solution for the 10Gbps network architecture figured out and set up. But your manager states you need to improve the Live Migration and other cluster network speeds today. What are your options? Based on the above information your boss is right, the networks that will benefit the most from a move to 10Gbps are CSV and Live Migration (and Heart Beat that piggy backs along with CSV). Now you have to remember that those cluster networks (subnets/VLANs) are for the Heart Beat, CSV and Live Migration cluster traffic only. So basically the only requirement you have is that these run on separate subnets/VLANs (to present them as distinct networks to your failover cluster) and that every node of the cluster can communicate over those subnets/VLANs. This means that you can leave the switches for those networks completely isolated from the rest of the network as shown in the picture below. I used some very common and often used DELL PowerConnect switches (5424, 6248, 8024F) in some scenario drawings for this blog series. They could make that 8024F an unbeatable price/quality deal if they would make them stackable. The sweet thing about stackable switches is that you can do Active-Active NIC teaming across switches rather than active-passive. I never went that way as I’m waiting to see what virtual switch innovations Hyper-V 3.0 will bring us. You see I’m a little cheap after all

Suddenly things are cool again. The network people get time to figure out an integrated & complete long term solution and you can provide you nodes with 10Gbps for cluster only traffic. By a couple of 10Gbps switches & NICs and you’re on your way. Is this a good idea? I can’t make that call for you. I just provide some ideas. You decide.

The Case For Physically Isolating Them

Now you might wonder if this isn’t very wasteful in resources. Well not necessarily. If your cluster is big enough, let’s say 12-16 nodes or if you have a couple of clusters (4 clusters with 5 nodes for example) this might be not overly expensive. Unless you’re on a converged network, you do (I hope) the same for your storage networks, isolate them that is. You have to when you’re using fiber and you’d better do it when using iSCSI. It provides for the best performance and less complex switch configurations. Remember I mentioned that high availability requires some complexity. Try to keep that complexity as low as possible and when you introduce complexity make sure you can manage it. This serves two purposes. One is making sure that the complexity doesn’t ruin you high availability and two is that you’ll be happy you did it when it comes to trouble shooting and fixing issues. Now you might say that this ruins the concept of converged networks. Academically this is true but when you are filling up ports on switches for a single purpose there is no room for anything else anyway. Don’t lose sight of the aim of a converged network. That is to have the ability to use the same hardware/technology when possible for multiple needs. This gives you options and capabilities where and when needed. It’s not about always using all technology and protocols on each and every switch. Don’t forget also that you’ll need to address QOS/Performance on a converged network per type of traffic. There is also the fact that in brown field scenario’s you’re dealing with replacing a part of the infrastructure and this example is a good way to get 10Gbps where needed and not making any change on the existing network infrastructure. This reduces risk and impact. As a matter effect if you plan this right you can do this without service interruption. That means going node by node (maintenance mode, evacuate all VMs), moving the CSV network first for example and only then the Live Migration network. You’re leveraging the ability of the cluster networks to take on each other’s role here to achieve this.

Another good reason to physically isolate the networks is for security. There was an exploit for manipulating VMs during live migrations in 2008 (http://www.eecs.umich.edu/techreports/cse/2007/CSE-TR-539-07.pdf). You can protect against this via very careful switch configuration and VLAN design. But isolating the switches is very easy, clean and effective as well. Overkill? I don’t know, but perhaps not if you do works for intelligence agencies.

Ethernet Out-of-Band (OOB) Port For Management

Don’t forget you still need to be able to manage those switches but today, in this class of equipment you get an Ethernet Out-of-Band (OOB) port for that. This one you can safely uplink to your regular management network. So if you really don’t need communication with the rest of the network you have no functional reason not to isolate them.

Money, Cost? No Value!

Still you think, isn’t this very expensive? Well look at the purpose. Manageable complexity, high availability and your management stated to eliminate, where possible, any limitation on performance and approved the budget for it all. Put this into perspective. The SQL Server data center editions running on these clusters, combined with the cost of development & maintenance of the databases and applications relying on this infrastructure put those extra € spent on a couple of switches really into perspective. On top of that you’re not wasting those switches. When the network people get their plans finished they’ll be integrated into the final solution if still needed and possible. Don’t forget that you might use all ports for just cluster traffic depending on the number of hosts you have! So even without integrating them into the rest of the network, you’re still getting very solid results. On top of that, sometimes you get to build solutions where budget is not the first, last and only concern. Sweet! I do know some people who’ll call me a money wasting nut case J. But get real, when you’re building high available, highly performing failover clusters and you’re in a discussion about the cost of a couple of NIC ports and you are going to adjust your design over that, perhaps you have a sponsorship issue. Put in into perspective. Hyper-V cluster are not a competition where the one who uses the least NIC ports/cards and switch ports/ switches wins. That’s why it hurts when I see designs like this claiming victory:

What I want to see is more like this:

But that will never fit into a blade design! Really? Have you seen the blades like the DELL M910? It’s a beast, comparable to the R810. It’s was the first blade I really felt like buying. Cisco also entered that market with guns drawn and is pushing HP to keep performing. So Again put the NIC/Switch and NIC port : Switch Port count into perspective against what you’re trying to achieve. To quote Anton Ego “… you know what I’m craving? A little perspective, that’s it. I’d like some fresh, clear, well-seasoned perspective.”

Well I’ve just finished doing the paperwork for attending the Experts 2 Experts conference in London http://www.pubforum.info/pubforum/E2E2011London.aspx. It runs from 18th to 20th November 2011. I’m looking forward to this one as I’m going to meet up with a lot of people from my on line network and have a change to discuss our virtualization experiences and share information in real life, face to face.

It’s good to get to attend vendor independent events and exchange information, enrich and extend our networks. I already know several people from my twitter/blogging network will be attending and I’m happy to meet up with you if you’re there. Just let me know via e-mail, the feedback option on this blog or via twitter (@workinghardinit). Well, I’ll see you there!