iSCSI and ESXi: multipathing and jumbo frames

For my home lab I have decided to run ESXi from a 2 GB USB flash disk without any local storage. The main reason is that the ESXi host no longer produces as much noise and heat as before. VMware has stated that it will retire ESX classic in the future, so I take this as an opportunity to learn and be prepared for the switch to a service-console-less hypervisor.

For the shared storage I am currently using Openfiler with the iSCSI protocol. I am dedicating a pair of 1 Gb NICs on the ESXi host and on the Openfiler server to the iSCSI traffic. Taking advantage of all the available bandwidth requires a not-so-trivial setup, which I am going to describe below.

iSCSI multipathing

In order to use iSCSI multipathing in vSphere, we need to create two VMkernel ports, bind each to a different uplink and attach them to the software iSCSI HBA.
My ESXi host has 4 NICs. Two are assigned to vSwitch0, which has a Management VM port group and three VMkernel ports: one for management and two for iSCSI. The following picture shows vSwitch0 in the networking tab of the vSphere Client:

The management traffic is untagged; the iSCSI traffic is on VLAN 1000. As I also wanted to use jumbo frames (yes, ESXi supports jumbo frames, despite official documentation claiming otherwise for a long time), I had to create the VMkernel ports from the CLI. The binding of the iSCSI VMkernel ports to the software iSCSI HBA must also be done from the CLI. ESXi does not have a service console, therefore the first step is to install vMA (VMware Management Assistant), which replaces it.

VMware Management Assistant (vMA)

The VMware approach to a service-console-less hypervisor is based on the following rationale: if we have many ESX hosts in a datacenter, each has its own service console, which needs to be maintained and patched and consumes host resources. The patching often requires host restarts, which means we have to vMotion workloads or accept downtime. vMA basically offers the same functionality as the service console, is detached from the host (in fact it can run virtual or physical) and can control many hosts. vMA comes as a 560 MB OVF package that can be downloaded from the VMware website. I deployed it in VMware Workstation running on my laptop. It is a Red Hat Enterprise Linux 5 VM which takes about 2.5 GB of hard drive space. The setup is quite straightforward, with network and password questions.

There are various ways to connect vMA to an ESXi host. I decided to use vSphere FastPass: first we define the connection, and then we can initialize it anytime with one command.
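A minimal sketch of the FastPass setup in vMA 4.x (the host name esxi01.lab.local is a placeholder for your own ESXi host):

```shell
# Register the ESXi host with FastPass (prompts for credentials once).
sudo vifp addserver esxi01.lab.local

# Verify the host is registered.
vifp listservers

# Initialize the FastPass session; subsequent vicfg-*/esxcli commands
# in this shell are then run against this host without re-authenticating.
vifpinit esxi01.lab.local
```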

VMkernel port bindings

Now we have to assign a different uplink to each iSCSI VMkernel port. This can be done through the vSphere Client interface: go to the ESXi host Configuration tab, Networking, vSwitch0 properties, select the iSCSI1 or iSCSI2 port, click Edit, and in the NIC Teaming tab make only one vmnic active and the other unused. The following pictures show the result:

Next we have to bind the vmk ports to the software iSCSI HBA. First find the number of your software iSCSI adapter in host Configuration, Storage Adapters. In my case it was vmhba39. Then run in vMA:
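A sketch of the binding commands on vSphere 4, assuming the VMkernel ports are vmk1 and vmk2 and the adapter is vmhba39 (substitute the values from your own host):

```shell
# Bind both iSCSI VMkernel ports to the software iSCSI adapter.
esxcli swiscsi nic add -n vmk1 -d vmhba39
esxcli swiscsi nic add -n vmk2 -d vmhba39

# Verify that both vmk ports are now bound to the adapter.
esxcli swiscsi nic list -d vmhba39
```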

In order to successfully use jumbo frames, the whole network path from the vmk port to the Openfiler NIC port must support them. This means all the virtual and physical NICs and all the virtual and physical switches must support jumbo frames. Therefore we must enable jumbo frames on vSwitch0 by running:

esxcfg-vswitch --mtu 9000 vSwitch0
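The jumbo-frame VMkernel ports mentioned earlier also have to be created from the CLI, since the vSphere Client cannot set the vmk MTU. A sketch of the commands (the port group names and VLAN match my setup; the IP addresses are placeholders):

```shell
# Create the two iSCSI port groups on vSwitch0 and tag them for VLAN 1000.
esxcfg-vswitch -A iSCSI1 vSwitch0
esxcfg-vswitch -v 1000 -p iSCSI1 vSwitch0
esxcfg-vswitch -A iSCSI2 vSwitch0
esxcfg-vswitch -v 1000 -p iSCSI2 vSwitch0

# Create the VMkernel ports with a 9000-byte MTU.
esxcfg-vmknic -a -i 10.0.100.11 -n 255.255.255.0 -m 9000 iSCSI1
esxcfg-vmknic -a -i 10.0.100.12 -n 255.255.255.0 -m 9000 iSCSI2
```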

To check whether jumbo frames work we can generate a vmkping of the right size. This must be run from the busybox console of the ESXi host. To get there, press Alt-F1 on the actual physical server, type unsupported and log in.

vmkping -s 8972 -d <openfiler IP address>

Note the payload size of 8972 bytes: 9000 minus 28 bytes of IP and ICMP headers. The -d flag sets the don't-fragment bit, so the ping only succeeds if the full jumbo frame makes it through unfragmented.

Multipathing

Now we can set up the iSCSI HBA on the ESXi host. Discover the targets and see how the host finds all the available paths. In my case I had two targets, each presenting two LUNs over two paths, so altogether we can see eight different paths.

We can check whether the load balancing works correctly by generating storage traffic and monitoring the vmk port usage. This can be done, for example, by creating a Fault Tolerance compatible virtual hard drive (this creates an eager-zeroed thick disk, which writes zeros to all disk blocks) and running resxtop (the equivalent of esxtop) from vMA. Press the 'N' key (network) and see if you get similar numbers in the vmk1 and vmk2 rows.

Openfiler setup

I also created a NIC team with jumbo frames on my Openfiler server. For some reason I was not able to do it from the GUI, so here are the commands I ran. The NIC team (bond0) is created from eth1 and eth2. The link aggregation mode is 802.3ad with xmit_hash_policy layer2+3. The NICs are connected to a switch port group with LACP enabled.
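A minimal sketch of such a bond using the classic Linux bonding tools (I am assuming the ifenslave utilities available on Openfiler 2.3; the IP address is a placeholder for your iSCSI subnet):

```shell
# Load the bonding driver in 802.3ad (LACP) mode with layer2+3 hashing.
modprobe bonding mode=802.3ad xmit_hash_policy=layer2+3 miimon=100

# Take the slave NICs down and enslave them to bond0.
ifconfig eth1 down
ifconfig eth2 down
ifconfig bond0 up
ifenslave bond0 eth1 eth2

# Enable jumbo frames and assign the iSCSI address to the bond.
ifconfig bond0 mtu 9000
ifconfig bond0 10.0.100.20 netmask 255.255.255.0 up
```

To make the bond survive a reboot, the same settings belong in the ifcfg-bond0/ifcfg-ethX files, but the one-off commands above are enough to test with.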

I think I was using version 2.3 at the time of the post. I see that the current Openfiler version is 2.99 but I have not tested it. I am not using Openfiler any more as I prefer VSA appliances from Falconstor or HP LeftHand.

I have also only used Openfiler 2.3, so it would be interesting to see what is new in 2.9. I assume that, for example, HP LeftHand would be much better in a production environment, of course, but for a test or lab Openfiler performs well. Do HP or FalconStor have any "free for lab" options, or only short-term evaluation licenses?

Not only do HP and FalconStor offer trial versions (with replication, snapshots and other enterprise features), but they also provide free limited editions of their virtual storage appliances. HP LeftHand VSA (now called StorageWorks P4000) is the only free VSA I know of that provides VAAI integration with VMware vSphere (see my other post: http://wp.me/pG062-2D). It can also be used to create a VMware Site Recovery Manager lab environment in a box.

Just thought I’d say thanks, this is by far the most concise and straightforward article on getting ESXi to talk mpio with jumbo frames I’ve seen. Maybe that’s because I’ve managed to wrap my mind around it now or something, but… either way, I appreciate the thorough explanation.

I did one minor change to this – I created two virtual switches instead and put one vmnic per vSwitch. Apparently there is no best practice for choosing between that and having just one vSwitch with one vmnic assigned to each vmk – it just seems clearer to me: vSwitch1 has vmk1 and vmnic2, and vSwitch2 has vmk2 and vmnic3, round-robining away.

One thing you don’t cover is how much traffic the round robin thing sends across each line – that’s tunable, as I found in another very useful post elsewhere:
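For reference, the tunable in question is the round-robin IOPS limit of the native multipathing plugin, which controls how many I/Os go down one path before switching to the next. A sketch for vSphere 4 (the naa device ID is a hypothetical placeholder – use the ID of your own LUN):

```shell
# Show the current round-robin settings for the device.
esxcli nmp roundrobin getconfig -d naa.6090a038f0cd4e5b0000000000000000

# Switch paths after every single command instead of the default 1000 IOPS.
esxcli nmp roundrobin setconfig -d naa.6090a038f0cd4e5b0000000000000000 -t iops -I 1
```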

This is a great post and was very helpful for understanding the multipathing setup for vSphere. I have been using Openfiler for a couple of years now and continue to learn its subtleties and ways to make it more efficient. I was wondering, though, if you could point out where the free versions of both the HP and FalconStor software can be found. I have located the trial versions but not the free versions, and I would like to put them through their paces and implement them if possible in my test lab.

Thanks for writing this up. I followed these instructions (although I’m using OF 2.99.2 and had no problem creating the bond interface via the GUI), and everything *appears* to be set up correctly — at least ESXi shows two paths to the iSCSI target — but I’m noticing something rather odd:

Both the OF server *and* the ESXi server are showing traffic being evenly-divided between interfaces when data is TRANSMITTED from itself, but is showing all of the RECEIVED data on one interface (the first).

For example, after booting up two Ubuntu guests on ESXi, I see the following on OF:

…so ESXi interleaves its iSCSI write requests between the two interfaces, but the OF box only sees them come in on the first interface. Likewise, when ESXi requests data to be read from the volume, OF sends the data back to ESXi that it reads interleaved over its two interfaces, but the ESXi box only sees all of that data come into it via the first interface.

One thought that I had was that perhaps it would be better to not bond the interfaces on the OF server, but give them two separate IP addresses as well? The iSCSI target name advertised by OF would be identical, so hopefully ESXi would recognize that it is the same target even though it is being seen from two separate IP addresses.

Another possibility that crossed my mind is that perhaps this is an “optical illusion” of sorts, and that it is working just fine. I’m not sure if there is a good way to benchmark this, though.

Both servers are plugged into a Dell 5324. The two OF interfaces are in a LACP channel-group together, and the two ESXi interfaces are in a static (non-LACP) channel-group together.

That’s because NIC bonding/pairing only works for failover, not for load balancing iSCSI, period.
You need MPIO (Multipath I/O) on both sides if you want to use multiple NICs for load balancing.

iSCSI isn’t a random data generator and expects a certain order to the sessions/streams/commands it receives and sends.
Anything other than MPIO will always use at most the bandwidth of 1 NIC, because iSCSI first chooses a network path and then starts a session over it.
The session cannot be split up over multiple NICs, otherwise iSCSI might receive commands/data out of order, because it is not aware the data might be coming in over different NICs with different latencies.
MPIO sits above the network at the storage layer and thus takes care of ordering/load balancing before anything gets put out on the wire.

This setup might be of some benefit if multiple initiators (vmware machines) connect to the target where each initiator might end up being routed to target over a different NIC.

Imho this article was probably glued together from pieces of info from around the internet without the author actually understanding what he is doing.
The fact that no throughput tests were done kinda says enough.