Random thoughts and technical bits

Design Scenario: Gigabit network and iSCSI ESXi 5.x

Many months ago I posted some design tips on the VMware forums (I am Gortee there if you are wondering). Today a user updated the thread with a new scenario looking for some advice. While it would be a bad idea personally and professionally for me to give specific advice without a design engagement, I thought I might provide some thoughts about the scenario here. This will allow me to justify some design choices I might make in the situation. In no way should this be taken as law. In reality everyone's situation is different, and little requirements can really change the design. The original post is here.

The scenario provided was the following:

3 ESXi hosts (2 x Dell R620, 1 x Dell R720), each with 3 x 4-port NICs (12 ports total) and 64GB RAM. (Wish I would have put more on them ;-))

2 x Dell 5424 switches dedicated for traffic between the MD3200i and the 3 Hosts

Each host is connected to the iSCSI network though 4 dedicated NIC Ports across two different cards

I personally have never used this array model, so the vendor should be included in the design to make sure my suggestions here are valid with this storage system. Looking at the VMware HCL we learn the following:

From my limited understanding of the array, the cabling follows the best practice guide I could find.

Connections from the ESXi hosts to the switches are made to create as much redundancy as possible, including all available cards. It is critical that the storage be as redundant as possible.

Each uplink (physical NIC) should be connected to an individual vmkernel port group. Each port group should be configured with only one uplink.

Physical switches and port groups should be configured to use the native VLAN, assuming these switches don't do anything other than provide storage traffic between these four devices (three ESXi hosts and one array). If the array and switches provide storage to other systems, you should follow your vendor's best practices for segmenting traffic.

Port binding for iSCSI should be configured per VMware and vendor documentation.
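The port-group and binding steps above can be sketched with esxcli; the adapter name (vmhba33) and vmkernel port names (vmk1 through vmk4) are assumptions for illustration, so substitute the ones from your environment:

```shell
# Enable the software iSCSI initiator (creates an adapter such as vmhba33)
esxcli iscsi software set --enabled=true

# Bind each iSCSI vmkernel port to the software adapter
# (each vmk lives in a port group with exactly one active uplink)
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk3
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk4

# Verify the bindings took effect
esxcli iscsi networkportal list --adapter=vmhba33
```

These commands only run against a live ESXi host, so treat this as a reference sequence rather than a copy-paste script.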

New design considerations from storage:

Four 1Gb ports will be used to represent the maximum traffic the system will provide.
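As a sanity check on that number, four 1Gb links work out to roughly 500 MB/s of theoretical aggregate throughput (decimal units, ignoring protocol overhead):

```shell
# 4 links x 1,000,000,000 bits/s each, divided by 8 bits per byte,
# then expressed in MB/s (decimal megabytes)
echo $((4 * 1000000000 / 8 / 1000000))   # prints 500
```

Real-world iSCSI throughput will land below this because of TCP/IP and iSCSI framing overhead.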

The array does not support 5.5 U1 yet, so don't upgrade.

We have some VAAI primitives to help speed up processes and avoid SCSI locks.
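You can confirm whether the array's VAAI support is active per device with esxcli; the naa identifier below is a placeholder:

```shell
# Show VAAI primitive support (ATS, Clone, Zero, Delete) for all devices
esxcli storage core device vaai status get

# Or narrow it to a single device (naa ID is a placeholder)
esxcli storage core device vaai status get -d naa.600000000000000000000001
```

These run only against a live ESXi host; in the client, the same information shows up as "Hardware Acceleration: Supported" on the datastore.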

Software iSCSI requires that forged transmits be allowed on the virtual switch.
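Checking and setting that policy can be done with esxcli; the vSwitch name is an assumption, so use whichever switch carries your iSCSI vmkernel ports:

```shell
# Check the current security policy on the iSCSI vSwitch (name assumed)
esxcli network vswitch standard policy security get -v vSwitch1

# Allow forged transmits if it is disabled
esxcli network vswitch standard policy security set -v vSwitch1 --allow-forged-transmits=true
```

As with the other esxcli snippets, this requires a live ESXi host.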

You might want to consider Storage DRS to automatically balance load and I/O metrics (requires an Enterprise Plus license but saves so much time). It also has an impact on CBT backups, forcing them to do a full backup after a move.

Hardware iSCSI adapters might also be worth the time... though they have little real benefit in the 5.x generation of ESXi.

Networking

We will assume that we now have 8 total 1Gb ports available on each host. We have a current network architecture that looks like this (avoiding the question of how many virtual switches):

I may have made mistakes, but from my reading a few items pop out to me:

vMotion does not have any redundancy, which means if that card fails we will have to power off VMs to move them to another host.

Backup also does not have redundancy, which is less of an issue than the vMotion network.

All traffic lacks redundant switches, creating single points of failure.

A few assumptions have to be made:

No single virtual machine will require more than 1Gb of traffic at any time (otherwise we have to be looking into LACP or EtherChannel solutions).

Management traffic, vMotion, and virtual machine traffic can live on the same switches as long as they are segmented with VLANs.

Recommended design:

Combine the management switch and VM traffic switch into dual function switches to provide both types of traffic.

This uses VLAN tags to include vMotion and management traffic on the same two uplinks, providing card redundancy (configured active/passive). It could also be configured with multi-NIC vMotion, but I would avoid that due to the complexity around management network starvation in your situation.
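The active/passive arrangement described here can be set per port group with esxcli; the port group and vmnic names below are assumptions for illustration:

```shell
# Management active on vmnic0 with vmnic1 standby;
# vMotion reversed so each traffic type prefers a different card
esxcli network vswitch standard portgroup policy failover set \
    --portgroup-name="Management Network" \
    --active-uplinks=vmnic0 --standby-uplinks=vmnic1

esxcli network vswitch standard portgroup policy failover set \
    --portgroup-name="vMotion" \
    --active-uplinks=vmnic1 --standby-uplinks=vmnic0
```

Crossing the active/standby order like this keeps both links in use during normal operation while either traffic type survives a card failure.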

Backup continues to have its own two adapters to avoid contention.

This does require some careful planning and may not be the best possible use of links. I am not sure you need six links for your VM traffic, but it cannot hurt.

Final Thoughts:

Is any design perfect? Nope, there is lots of room for error and unknowns. Look at the design and let me know what I missed. Tell me how you would have done it differently... share so we can both learn. Either way I hope it helps.


13 thoughts on “Design Scenario: Gigabit network and iSCSI ESXi 5.x”

I have a few questions and comments; I have a similar config 🙂 In my environment I have 4 NIC ports connecting to the ESXi hosts, 2 from NIC 0 and 2 from NIC 1 (ports 1 and 2 from each), but I am not sure about how the ports are connected from the MD3200i (active/passive). Second, I am already on ESXi 5.5 U1, but the latency issues I have been having were there long before the upgrade from 5.1.

Under New design considerations from storage:
“We have some VAAI primitives to help speed up processes and avoid SCSI locks” Not sure what/how to implement these?
Already running 5.5 U1  now what?
Software iSCSI: I can’t seem to find anything about forged transmits on the Dell 5424 switches.

Advise to speed up iSCSI storage:
“Bind your bottle neck – is it switch speeds, array processors, ESXi software iSCSI and solve it.” Can you elaborate???
“You might want to consider Storage DRS on your array to automatically balance load and IO metrics (requires enterprise plus license but saves so much time) – Also has an impact on CBT backups making them do a full backup.” I will consider that, although a call to support stated that DRS probably would not help much. CBT backups? Is that with or without DRS?

These are the points I am most concerned about so far, thanks again for your help and expertise!

Carl,
Thanks for reading.
“We have some VAAI primitives to help speed up processes and avoid SCSI locks” Not sure what/how to implement these? – They should be enabled out of the box, with no work required by you (click on the datastore to check and look for Hardware Acceleration: Supported).

Already running 5.5 U1, now what? – You’re on the latest version; that is good.

Software iSCSI, can’t seem to find anything about forged transmits on the Dell 5424 switches. – This is a setting on your virtual switches, not the physical Dell switches. It should have been enabled already when you enabled software iSCSI in VMware... just something to keep in mind when doing work on or moving virtual switches.

“Bind your bottle neck – is it switch speeds, array processors, ESXi software iSCSI and solve it.” Can you elaborate???
-> Spelling error: that should read “find your bottleneck.” Basically, you reported having some storage issues. Look at your storage ports on the switch and see if they are at 1Gb all the time. Look at the storage processors on your array for max CPU. Look at esxtop with the storage views and see where the problem lies. You have to identify where the problem exists before you can resolve it. My guess is it’s the storage processors on your array, but you will not know without reviewing the source.
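A minimal esxtop workflow for the checks described above, shown as interactive keys plus a batch-mode capture for offline review (file name is an assumption):

```shell
# Interactive: run esxtop on the host, then press:
#   d  - disk adapter view (watch DAVG/cmd for array-side latency)
#   u  - disk device view (per-LUN latency and queue depth)
#   n  - network view (check whether iSCSI vmnics sit at line rate)
esxtop

# Batch mode: capture 30 samples at 5-second intervals for later analysis
esxtop -b -d 5 -n 30 > esxtop-capture.csv
```

Sustained high DAVG/cmd with low KAVG/cmd generally points at the array side rather than the host, which fits the storage-processor suspicion above.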

I will consider that, although a call to support stated that DRS probably would not help much.
-> Support, I assume, reviewed logs and has more information than me about the situation. Storage DRS will only help if you have multiple LUNs to move between. I really doubt the cost is worth it; the money would be better spent on more storage.

One thing to consider: this is just general advice. Always review and consult your vendors; I cannot provide advice for an environment without a chance to review it completely.

More comments, sorry. In our case the backup LAN is using one NIC, but it also has to have a management kernel port as required by the backup system; it uses management to create snapshots for backups. Do you see that being an issue?

Thank you so very much; the information you provided helps us out a lot! It seems that based on the esxtop review we have a few VMs that are causing a lot of writes to the datastores, and the fact that the MD3200i max throughput is 4,000MBs is also a problem. Even with 8 1Gb connections we seem to bump the top of that 4GB threshold, and according to the Dell support agent we spoke to, it is physically impossible for it to achieve the full 4GB. So now I need to review new storage. Again, thank you for your help; this is very useful information!

I am glad you were able to identify the source of the issue... Storage will kill an ESXi environment faster than anything else. Remember, you don’t have to throw away your current array; it can still run some machines while another array provides additional performance.

I know this is an old post, but I found your insights into VMware designs very helpful. We are an SMB that’s starting to cross over to the world of virtualization. I could really use your help on our network design. This is the current equipment we have:

About Author

Joseph Griffiths is a virtualization-focused solutions architect who works with complex cloud-based solutions. He currently holds many IT certifications, including VMware VCDX-DCV and VCDX-CMA #143. This blog represents his random technical notes and thoughts. The thoughts expressed here do not reflect Joseph’s current employer in any way. You can follow Joseph on Twitter @Gortees