NIC Team - Live Migration performance

Question

I've got a 2-node Server 2012 Hyper-V cluster, and each node has 6 gigabit NIC ports. 2 are dedicated to iSCSI traffic, but the other 4 are in a Team. When I Live Migrate 20+ VMs, it seems to me that I should be able to achieve transfers of at least 2-3Gbps (adjusting for network overhead and such). I realize there are a great number of ways to configure a team in Server 2012, and I've experimented with all of them: Static Teaming, Switch Independent, and LACP, each of which can use either Address Hash or Hyper-V Port. I've occasionally seen brief spikes up to 1.5Gbps, but that's it. The rest of the time it's at a very steady 1Gbps.

I've got these servers connected to 2 ProCurve 2910al switches with a 10gb backplane, and the switches are not being taxed with any other workloads. When I use the LACP teaming, I have tried both LACP-enabled trunks and enabling Active LACP on the relevant
ports.

My biggest question is: is this even feasible? Has anyone achieved 2Gbps+ Live Migration speeds without using 10gb NIC cards?

All replies

I can't say that I've seen 2Gbps on live migration, but I did find I had to update my NIC drivers before I could get decent speed on the migrations (due to a bug in the RX buffer). What type of NICs are you using, and what drivers? I was using Broadcom NICs in a Dell PowerEdge 420.

I've got a Dell R310 and an HP DL360p Gen8, with the latest drivers for all the NICs. For the HP, it's all HP (331T and 331 FLR), and for the Dell it's a mix of Broadcom and HP (BCM5716C and 331T). What speed did you see on your migrations? Any idea why we can't exceed 1Gbps?

I only have one 1Gb NIC for live migration, so I've never seen it go any faster, but I wasn't even getting that until my drivers were updated. Are all four NICs in one team? Are they all Broadcom or HP?

> I've got a 2 node Server 2012 Hyper-V cluster, and each node has 6 gigabit NIC ports.
> 2 are dedicated to iSCSI traffic, but the other 4 are in a Team.

With the network teaming, some of that bandwidth is also being used by virtual machine access and management traffic.

As a best practice, for each node of the failover cluster, use more than one network adapter and configure at least one network adapter for the private network. We recommend that you configure separate dedicated networks, gigabit or faster, for live migration traffic and for cluster communication, and that these networks be separate from the network used by the management operating system and from the network used by the virtual machines.

You could dedicate 1 NIC to virtual machine access and management (with management bandwidth capped at 10%), then use the remaining 3 NICs to create a team for live migration.
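A team like that can be created from an elevated PowerShell prompt with the in-box NIC Teaming cmdlets. This is only a sketch; the team name and the adapter names ("NIC2" through "NIC4") are placeholders for whatever your adapters are actually called:

```powershell
# List the physical adapters to find the names to team
Get-NetAdapter

# Create a switch-independent team from three of the gigabit ports,
# hashing on TCP/UDP ports so separate streams can use separate NICs
New-NetLbfoTeam -Name "LM-Team" `
                -TeamMembers "NIC2","NIC3","NIC4" `
                -TeamingMode SwitchIndependent `
                -LoadBalancingAlgorithm TransportPorts

# Verify the team came up with all members active
Get-NetLbfoTeam -Name "LM-Team"
```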

Are you using jumbo frames on the NICs? There is a lot less overhead with jumbo frames for Live Migration. Of course, the fact that you have CSV, management, and cluster communications on the same NIC is not a best practice either. I can't say that I have ever tried with 1GE NICs, as I'm fortunate enough to have 10GE whenever I want it.
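For what it's worth, jumbo frames can usually be enabled per adapter from PowerShell, assuming the driver exposes the standard `*JumboPacket` keyword (the adapter name and the maximum value vary by driver, so treat these as placeholders):

```powershell
# Check whether the driver exposes a jumbo packet setting at all
Get-NetAdapterAdvancedProperty -Name "NIC1" -RegistryKeyword "*JumboPacket"

# Enable ~9KB frames (common values are 9014 or 9000, driver-dependent);
# every device on the path, including the switch ports, must allow the same size
Set-NetAdapterAdvancedProperty -Name "NIC1" -RegistryKeyword "*JumboPacket" -RegistryValue 9014
```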

That's really great for you, Tim. I sincerely doubt that either jumbo frames or some piddly Management and cluster communications would take up 3Gbps of my traffic, thus explaining why I'm only getting 1Gbps. My CSV traffic is over my iSCSI network, which
is separate, and has jumbo frames enabled.

Again, I would really like an answer to this question: Can a multiple gigabit NIC Team achieve 2Gbps+ Live Migration speeds without using 10gb NIC cards?

For your scenario, I think the bottleneck is not the 4-NIC live migration team but the 2 gigabit NICs for iSCSI. They can carry at most 2Gbps of read and write traffic, and I think that is why your live migration speed tops out around 1.5Gbps.

Again, we recommend that you configure separate dedicated networks for live migration traffic and cluster communication, and that these networks be separate from the network used by the management operating system and from the network used by the virtual machines.

This would almost be funny if it wasn't so important that I get a good answer.

I'll try asking again: Can a REAL WORLD ENVIRONMENT (not a theoretical one) with 2 or more nodes, each having FOUR (or more) gigabit NIC cards in a team, achieve 2Gbps+ Live Migration speeds (when multiple VMs are migrating simultaneously)?

An answer I'm desperately trying to get would be: "I've actually done this and I see x Gbps transfers." Honestly, if you don't have something similar to that to say, please stop responding
here, because it's not helping. I've got a SANDBOX with absolutely no traffic of any significance in it, so trying to blame this on needing a separate network for cluster communication is just ridiculous.

That's just how TCP works... I think there is some confusion about what you expect the team to actually do. Yes, a single stream can only run at 1Gbps, but you can run 3 simultaneous streams now... and don't forget the obvious link-failure tolerance advantages of a team.

That is what it does. Each migration would be a separate stream between the source and destination hosts (that's a very basic explanation). Therefore, if you had 3 migrations going, each one would most likely be hashed to a different NIC in your team (in a perfect world) and you would get 3x1Gbps transfers.

OMG, are you all trying to drive me insane??? I've written this about 5 times in the previous posts above, but I'll post it again here, since apparently no one can read anything except the previous post:

Can a REAL WORLD ENVIRONMENT (not a theoretical one, not how it should work) with 2 or more nodes, each having FOUR (or more) gigabit NIC cards in a team, achieve 2Gbps+ Live Migration speeds in total when multiple VMs are migrating simultaneously?

"In total" I think is the key you were leaving out before. Yes, of course you can hit those speeds (cumulatively). The way you were describing it before sounded like you meant a single transfer (regardless of how many were going simultaneously). Anyway, yes.

Glad to hear it's possible, so now the obvious question is: why am I capped at 1Gbps? Does anyone have any specific configurations that they know work here? For example, LACP, Hyper-V Port, or Switch Independent with Address Hash?

Frustrating for us, too. I'm with the others in that I interpreted your initial explanation in a totally different manner from what you meant to say. Now we are getting shot at when trying to figure out what might be the issue.

Yes, you can get aggregate throughput greater than the speed of a single NIC. Yes, I have 10 GE connections, but the same principles apply. I get nearly 20 Gbps of throughput while running two or more live migrations.

Are you using SCVMM? If so, have you set it up to allow for multiple live migrations to occur simultaneously? By default, there is a maximum number of live migrations that can occur simultaneously. How do you start the simultaneous LMs?
How many do you see running simultaneously? How are you measuring your throughput?

> I get nearly 20 Gbps of throughput while running two or more live migrations.

Tim, are you live migrating to the same host, or to more than one host, when you saturate 20 Gbit?

I have VMM 2012 SP1, but haven't really used it for much yet. I created the cluster using Failover Cluster Manager, and have set the maximum number of simultaneous migrations to 14 using Hyper-V Manager on each node of the cluster. SCVMM detects that maximum number, though, and displays it.
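That per-host cap can also be inspected and set from PowerShell with the Hyper-V module; the value 14 here just mirrors the setting described above:

```powershell
# Show the current simultaneous live migration limit on this host
Get-VMHost | Select-Object Name, MaximumVirtualMachineMigrations

# Raise (or lower) the cap; this must be done on every node of the cluster
Set-VMHost -MaximumVirtualMachineMigrations 14
```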

I would read this (and the surrounding sections) to understand the different teaming methods. In fact, I would read the whole doc (as should anyone who doesn't fully understand teaming). Using Address Hash (when set up correctly), you should expect to utilize more than one member in a team when migrating to separate hosts.

As I understand it, you have a two-node cluster, so it is working as expected: you get 1 GBit/s live migration. Only with three or more nodes can you actually get more than 1 GBit of live migration. As I said before, there is no magic here like SMB3.

Not only can you achieve better than 1 Gbps in live migration, you should be able to approach 4Gbps in your 4-NIC team, subject to the following restrictions:

1. The Live Migrations must not all be destined for the same remote machine. Live migration will only use one TCP stream between any pair of hosts. Since neither Windows NIC Teaming nor the adjacent switch will spread traffic from a single stream across multiple interfaces (because that causes potential out-of-order delivery of the TCP packets), live migration between host A and host B, no matter how many VMs you're migrating, will only use one NIC's bandwidth. To get more bandwidth, try migrating your 20 VMs to 5 other hosts (e.g., 4 to each host) and see how well it does.

2. You must use Address Hash (TCP ports) for the NIC Teaming. Hyper-V Port mode will put all the outbound traffic, in this case, on a single NIC.

How do I know this? I'm the PM for NIC Teaming in Windows Networking. All of this is covered, by the way, in the NIC Teaming User's Guide (search for Windows Server 2012 NIC Teaming Deployment and Management). Feedback on the document is welcome. Email me at don.stanwyck -at- microsoft.com. I'm not usually on this forum.
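Checking which load-balancing algorithm an existing team is using, and switching it to the Address Hash (TCP ports) mode described above, looks roughly like this in PowerShell (the team name is a placeholder):

```powershell
# See the current teaming mode and load-balancing algorithm
Get-NetLbfoTeam -Name "Team1" |
    Select-Object Name, TeamingMode, LoadBalancingAlgorithm

# Switch an existing team to hash on TCP/UDP ports, so different
# migration streams can land on different team members
Set-NetLbfoTeam -Name "Team1" -LoadBalancingAlgorithm TransportPorts
```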

Aha! It finally all makes sense. Thanks so much, Don. Now I understand why I've gotten seemingly-contradictory answers: if you've got a lot of nodes, and you simultaneously migrate to all of them, then you can utilize the full bandwidth of your team.
If you're like me, and you've only got 2 nodes, you're never going to see more than 1Gbps.

So, now I have to decide the best way to use my 8 NICs. Instead of my original plan of teaming them all together, I'm now thinking of something like this:

2 for iSCSI
3 in a team for VMs
2 in a team for Live Migration and Heartbeat
1 for management

I actually have a 9th NIC for out-of-band management, so if the management NIC dies I can still get to the server.

Why would Live Migration be restricted to a single TCP stream between two hosts?

In my reading I missed that, and was working off the incorrect interpretation of "simultaneous migrations" as being separate processes (which would involve multiple simultaneous TCP streams).

I am in the same sort of configuration as the original poster: a small shop, where transfers from one host would very likely all be headed for the same physical host, maybe two different ones at best. So Hyper-V being able to use multiple TCP streams for live migrations would be a *significant* benefit for us.

I think there are two improvements in R2: compression and SMB 3, but it sounds like they don't work in conjunction; you pick one or the other, and in small shops (such as ours) compression is going to yield the most benefit. For some bizarre reason this forum isn't allowing me to post links, so I'll have to get creative about the link I read on this: it's on aidanfinn dot com, then add /?p=14907 to the end.
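For reference, on Server 2012 R2 the choice between those migration performance options is exposed through the Hyper-V module; this is only a sketch, and whether Compression or SMB wins for a given shop depends on CPU headroom and NIC features:

```powershell
# Show which performance option the host is currently using
Get-VMHost | Select-Object VirtualMachineMigrationPerformanceOption

# Pick one: TCPIP (plain), Compression (the R2 default), or SMB
# (which can use SMB Multichannel and RDMA if the NICs support it)
Set-VMHost -VirtualMachineMigrationPerformanceOption Compression
```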