Random thoughts and technical bits

How does storage multipathing work?

Every week I spend some time answering questions on the VMware forums. It also gives me great ideas for blog posts like this one. This post started with a simple question (how does multipathing work?) along with a lot of well-thought-out specific questions. I tried to answer the questions but figured it would be best done with some diagrams and a blog post. I will focus this post on Fibre Channel multipathing. First, it’s important to understand that Fibre Channel is nothing more than L2 communication using frames to carry SCSI commands. Fibre Channel switches are tuned to pass those frames as fast as possible.

Types of Arrays

There are really three types of connectivity with Fibre Channel (FC) arrays:

Active/Active – I/O can be sent to a LUN via any of the array’s storage processors (SPs) and ports. Normally this is implemented in larger arrays with lots of cache. Writes are sent to the cache and then destaged to disk. Since everything is delivered to cache, the SP and port do not matter.

Active/Passive – I/O is sent to the single SP and port that owns the LUN. If I/O is sent down any other path, the array rejects it.

Pseudo Active/Active – I/O can be sent down any SP and port, but one SP and port combination owns the LUN. Traffic sent to the owner of the LUN is much faster than traffic sent to non-owners.

The most common implementation of pseudo active/active is asymmetric logical unit access (ALUA), defined in the SCSI-3 standards (SPC-3). In ALUA the array tells the host which SP owns a LUN through SCSI commands and sense codes.

Access States

ALUA has a few possible access states for any SP and port combination:

Active/Optimized (AO) – this is the SP and port that owns the LUN; it is the best possible path to use for performance

Active/Non-Optimized (ANO) – this is an SP and port that can be used to access a LUN, but it’s slower than the AO path

Transitioning – the LUN is changing from one state to another and is not available for I/O – Not used in most ALUA arrays now

Standby – Not active but available – Not used in most ALUA arrays now

Unavailable – SP and port not available

In an active/active array the following states exist:

Active – All SP and ports should be this state.

Unavailable – SP and port not available

In an active/passive array the following states exist:

Active – SP and port used to access the LUN (single owner)

Standby – SP and port available if the active one is gone

Transitioning – Switching to Active or Standby

In ALUA arrays you also have target port groups (TPGs), which are sets of SP ports that share the same state. For example, all the ports on a single SP may form a TPG, since the LUN is owned by the SP.
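The states and target port groups above can be modeled in a few lines. This is a minimal sketch for illustration only: the class names, port names (SPa-0, SPb-0 and so on) and the `usable_ports` helper are hypothetical and do not come from any array or multipathing API.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class AccessState(Enum):
    ACTIVE_OPTIMIZED = "AO"
    ACTIVE_NON_OPTIMIZED = "ANO"
    STANDBY = "standby"
    UNAVAILABLE = "unavailable"
    TRANSITIONING = "transitioning"


@dataclass
class TargetPortGroup:
    name: str
    ports: list[str]
    state: AccessState  # every port in the group shares this state for the LUN


def usable_ports(groups: list[TargetPortGroup]) -> list[str]:
    """Return ports that can carry I/O: AO ports first, then ANO ports."""
    order = [AccessState.ACTIVE_OPTIMIZED, AccessState.ACTIVE_NON_OPTIMIZED]
    result = []
    for wanted in order:
        for group in groups:
            if group.state is wanted:
                result.extend(group.ports)
    return result


# Hypothetical view of LUN1: SPa owns it, SPb can reach it more slowly.
lun1_groups = [
    TargetPortGroup("SPb", ["SPb-0", "SPb-1"], AccessState.ACTIVE_NON_OPTIMIZED),
    TargetPortGroup("SPa", ["SPa-0", "SPa-1"], AccessState.ACTIVE_OPTIMIZED),
]
print(usable_ports(lun1_groups))  # ['SPa-0', 'SPa-1', 'SPb-0', 'SPb-1']
```

Grouping ports this way means a host only has to track one state per group rather than one per port.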

How does your host know what the state is?

Great question. A host and an array communicate state using SCSI commands. There are lots of commands in the standard. I will show three management commands used with ALUA arrays since they are the most interesting:

Inquiry – Asks the device to identify itself, including whether it supports ALUA

Report target port groups – Reports the state of each TPG, including which one has the optimized path

Set target port groups – Asks the array to switch the target port group ownership

This brings up some fun scenarios: who can initiate these commands, and when? All of these scenarios use an ALUA array.
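To make the report command a little more concrete, here is a rough sketch of decoding the start of one target port group descriptor from its response. The byte layout (preferred bit and access state in byte 0, group ID in bytes 2 and 3, port count in byte 7) and the state codes reflect my reading of SPC-3 and should be checked against the standard; the function name and the sample bytes are made up for illustration.

```python
import struct

# Asymmetric access state codes (low 4 bits of descriptor byte 0).
ALUA_STATES = {
    0x0: "active/optimized",
    0x1: "active/non-optimized",
    0x2: "standby",
    0x3: "unavailable",
    0xF: "transitioning",
}


def decode_tpg_descriptor(desc: bytes) -> dict:
    """Decode the 8-byte header of one target port group descriptor."""
    if len(desc) < 8:
        raise ValueError("descriptor header needs at least 8 bytes")
    state_byte, _support, group_id, _reserved, _status, _vendor, port_count = (
        struct.unpack(">BBHBBBB", desc[:8])
    )
    return {
        "group_id": group_id,
        "preferred": bool(state_byte & 0x80),  # PREF bit
        "state": ALUA_STATES.get(state_byte & 0x0F, "reserved"),
        "port_count": port_count,
    }


# Made-up sample: group 1, preferred, active/optimized, two ports.
sample = bytes([0x80, 0x0F, 0x00, 0x01, 0x00, 0x00, 0x00, 0x02])
print(decode_tpg_descriptor(sample))
```

A real response carries a length header and one such descriptor per group, each followed by its port entries; this sketch only covers a single descriptor header.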

Setup:

So we have a server with two HBAs connected to SAN switches. In turn, the SPs are connected to the SAN switches. SPa owns LUN1 (its ports are AO for LUN1) and SPb owns LUN2 (its ports are AO for LUN2).

Consider the following failures:

HBA1 fails – assuming the pathing software on the OS is set up correctly (more on this later), the operating system accesses LUN1 via the ANO path to SPb so it can continue to reach storage. It then issues a set target port groups command to SPb asking it to take over LUN1. The array fulfills the request, and hosts that issue a report target port groups command afterwards learn that SPb is now the AO path for LUN1.

SPa fails – assuming the pathing in the OS is good, access to LUN1 via SPa fails, so the OS fails over to SPb and initiates the LUN failover.

This setup is designed just to show the interaction. In a real environment you would want SAN switches A and B both connected to SPa and SPb, if possible, for redundancy.
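The two failure scenarios can be walked through with a toy model. This is only a sketch of the interaction, not how any real array or host driver is implemented: `ToyArray`, `choose_sp` and the method names are hypothetical, and the model assumes HBA1 is the only way to reach SPa, as in the simplified setup.

```python
class ToyArray:
    """Toy ALUA array that tracks which SP owns (is AO for) each LUN."""

    def __init__(self, owners, sps=("SPa", "SPb")):
        self.owners = dict(owners)  # LUN name -> owning SP
        self.sps = tuple(sps)
        self.failed = set()         # SPs that are down

    def report_target_port_groups(self, lun):
        """Return {SP: state} for the LUN, as a host would learn it."""
        states = {}
        for sp in self.sps:
            if sp in self.failed:
                states[sp] = "unavailable"
            elif self.owners[lun] == sp:
                states[sp] = "AO"
            else:
                states[sp] = "ANO"
        return states

    def set_target_port_groups(self, lun, new_owner):
        """A host asks the array to move ownership of the LUN."""
        if new_owner in self.failed:
            raise RuntimeError(new_owner + " is down")
        self.owners[lun] = new_owner


def choose_sp(array, lun, reachable):
    """Pick an SP for I/O; if only an ANO SP is left, ask it to take over."""
    states = array.report_target_port_groups(lun)
    usable = [sp for sp in reachable if states[sp] in ("AO", "ANO")]
    if not usable:
        raise RuntimeError("no usable path to " + lun)
    for sp in usable:
        if states[sp] == "AO":
            return sp
    array.set_target_port_groups(lun, usable[0])
    return usable[0]


# Scenario 1: HBA1 fails, so the host can only reach SPb.
array = ToyArray({"LUN1": "SPa", "LUN2": "SPb"})
hba_choice = choose_sp(array, "LUN1", reachable=["SPb"])
after_hba_failure = array.report_target_port_groups("LUN1")
print(hba_choice, after_hba_failure)

# Scenario 2: SPa itself fails; both HBAs still work.
array2 = ToyArray({"LUN1": "SPa", "LUN2": "SPb"})
array2.failed.add("SPa")
sp_choice = choose_sp(array2, "LUN1", reachable=["SPa", "SPb"])
after_sp_failure = array2.report_target_port_groups("LUN1")
print(sp_choice, after_sp_failure)
```

In both cases the host ends up on SPb and SPb becomes the owner of LUN1; LUN2 is untouched because SPb already owned it.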

How does ESXi deal with paths?

ESXi has three possible path states:

Active

Standby

Dead – cable unplugged, bad connection or switch

ESXi will always try to access the LUN via any available path.

Why does path selection policy matter?

The path selection policy can make a huge difference. For example, if you have an ALUA array you would not want a policy that rotates across every path regardless of its state. Doing this would cause at least half your I/Os to go down the ANO path, which would be slow. ESXi supports three policies out of the box:

Fixed – Uses the preferred path whenever it is available; most commonly used with ALUA arrays

Most recently used (MRU) – Ignores the preferred path and uses the most recently used path until it’s dead (used with active/passive arrays)

Round Robin (RR) – Sends a fixed number of I/Os or bytes down a path, then switches to the next path. On an ALUA array it rotates across the AO paths by default and uses ANO paths only when no AO path is available. Normally used with active/active arrays
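The practical difference between Fixed and MRU is what happens when a failed path comes back. Here is a small sketch of that behavior; the function names and the path names A and B are made up, and this is not ESXi's actual implementation.

```python
def next_live(alive):
    """Pick any live path (alphabetical order keeps the example deterministic)."""
    if not alive:
        raise RuntimeError("all paths dead")
    return sorted(alive)[0]


def fixed_policy(preferred, current, alive):
    """Fixed: use the preferred path whenever it is alive (fails back)."""
    if preferred in alive:
        return preferred
    return current if current in alive else next_live(alive)


def mru_policy(current, alive):
    """MRU: stay on the current path until it dies (no failback)."""
    return current if current in alive else next_live(alive)


# Path A is preferred. It dies in step 2 and returns in step 3.
timeline = [{"A", "B"}, {"B"}, {"A", "B"}]

fixed_path = mru_path = "A"
fixed_history, mru_history = [], []
for alive in timeline:
    fixed_path = fixed_policy("A", fixed_path, alive)
    mru_path = mru_policy(mru_path, alive)
    fixed_history.append(fixed_path)
    mru_history.append(mru_path)

print(fixed_history)  # ['A', 'B', 'A']
print(mru_history)    # ['A', 'B', 'B']
```

Fixed returns to path A as soon as it is back, while MRU stays on path B until B itself fails.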

The number of I/Os or bytes sent before switching in RR can be configured; the defaults are 1000 I/Os and 10485760 bytes.
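The switching rule can be sketched as a counter that moves to the next path once a limit is reached. The class below is hypothetical; in particular, switching on whichever limit is hit first is an assumption of this sketch, not a statement about how ESXi combines the two settings.

```python
class RoundRobin:
    """Toy round-robin selector with I/O-count and byte limits per path."""

    def __init__(self, paths, iops_limit=1000, bytes_limit=10485760):
        self.paths = list(paths)
        self.iops_limit = iops_limit
        self.bytes_limit = bytes_limit
        self.index = 0       # position of the current path
        self.io_count = 0    # I/Os sent on the current path
        self.byte_count = 0  # bytes sent on the current path

    def submit(self, io_bytes):
        """Return the path for this I/O, then rotate if a limit was reached."""
        path = self.paths[self.index]
        self.io_count += 1
        self.byte_count += io_bytes
        if self.io_count >= self.iops_limit or self.byte_count >= self.bytes_limit:
            self.index = (self.index + 1) % len(self.paths)
            self.io_count = 0
            self.byte_count = 0
        return path


# A tiny limit of 3 I/Os makes the rotation easy to see.
rr = RoundRobin(["path1", "path2"], iops_limit=3)
used = [rr.submit(4096) for _ in range(7)]
print(used)
# ['path1', 'path1', 'path1', 'path2', 'path2', 'path2', 'path1']
```

With the default limit of 1000, a lightly loaded host can stay on one path for a long time, which is why the limit is tunable.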

Which policy should you use? That depends on your storage array, and you should work with your vendor to understand their best practices. In addition, a number of vendors have their own multipathing software that you should use (for example, EMC’s PowerPath).

About Author

Joseph Griffiths is a virtualization-focused solutions architect who works with complex cloud-based solutions. He currently holds many IT certifications including VMware VCDX-DCV and VCDX-CMA #143. This blog represents his random technical notes and thoughts. The thoughts expressed here do not reflect Joseph’s current employer in any way. You can follow Joseph on Twitter @Gortees