XenServer's LUN scalability

An important consideration when planning a deployment of VMs on XenServer is the sizing of your storage repositories (SRs). Two questions I often hear: is performance acceptable if you have more than a handful of VMs in a single SR? And will some VMs perform well while others suffer?

In the past, XenServer's SRs didn't always scale well, so it was not advisable to cram too many VMs into a single LUN. But all that changed in XenServer 6.2, which scales well up to very large numbers of VMs, and the subsequent 6.5 release improved matters further.

The following graph shows the total throughput enjoyed by varying numbers of VMs doing I/O to their VDIs in parallel, where all VDIs are in a single SR.

In XenServer 6.1 (blue line), a single VM would experience a modest 240 MB/s. But, counter-intuitively, adding more VMs to the same SR would cause the total to fall, reaching a low point at around 20 VMs, which together achieved a total of only 30 MB/s – an average of just 1.5 MB/s each!

On the other hand, in XenServer 6.5 (red line), a single VM achieves 600 MB/s, and it only takes three or four VMs to max out the LUN's capabilities at 820 MB/s. Crucially, adding further VMs no longer causes the total throughput to fall; it remains constant at the maximum rate.

And how well was the available throughput distributed? Even with 100 VMs, it was spread very evenly -- on XenServer 6.5 with 100 VMs in a LUN, the highest average throughput achieved by a single VM was only 2% greater than the lowest. The following graph shows how consistently the available throughput is distributed amongst the VMs in each case:

Specifics

Host: Dell R720 (2 x Xeon E5-2620 v2 @ 2.1 GHz, 64 GB RAM)

SR: Hardware HBA using FibreChannel to a single LUN on a Pure Storage 420 SAN

VMs: Debian 6.0 32-bit

I/O pattern in each VM: 4 MB sequential reads (O_DIRECT, queue-depth 1, single thread). The graph above has a similar shape for smaller block sizes and for writes.
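The post doesn't say which tool generated this I/O pattern, but it can be reproduced with a short sketch like the one below (a hypothetical reimplementation, not the Performance Team's actual benchmark): 4 MiB sequential reads with O_DIRECT, queue depth 1, in a single thread.

```python
import mmap
import os
import time

BLOCK = 4 * 1024 * 1024  # 4 MiB per read, matching the pattern above

def sequential_read_throughput(path, seconds=10.0, direct=True):
    """Read `path` sequentially at queue depth 1 and return MB/s."""
    # O_DIRECT bypasses the page cache so the reads hit the LUN itself.
    flags = os.O_RDONLY | (os.O_DIRECT if direct else 0)
    fd = os.open(path, flags)
    # An anonymous mmap is page-aligned, which O_DIRECT requires.
    buf = mmap.mmap(-1, BLOCK)
    total = 0
    start = time.monotonic()
    try:
        while time.monotonic() - start < seconds:
            n = os.readv(fd, [buf])      # single thread, one request in flight
            total += n
            if n < BLOCK:                # hit end of file/device: wrap around
                os.lseek(fd, 0, os.SEEK_SET)
    finally:
        os.close(fd)
    return total / (time.monotonic() - start) / 1e6
```

Running one instance of this inside each VM against its VDI, and summing the results, would give numbers comparable to the totals in the graph.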

About the author

Jonathan is a Principal Software Engineer at Citrix where he is the lead engineer for XenServer's Performance Team. This team has oversight of the performance and scalability of all aspects of XenServer.

Comments

Very nice, Jonathan, and it is always good to raise discussions about standards that are known to change over time. This is particularly important when planning for projects involving large numbers of VMs, such as for XenDesktop configurations. One thing to obviously keep in mind is that at some point, the LUN itself will run out of the ability to properly handle the increased I/O load, and then of course, it's a good time to create a separate LUN using totally different disks to handle the expansion. It is also important that different types of storage (iSCSI, NFS, etc.) and the underlying disks (SAS vs. SATA, and spinning disk vs. SSD, etc.) and of course the configuration (RAID type, number of disks in the RAID configuration, any cache or SDS options, etc.) will all ultimately dictate what the limits might be for an individual configuration. Running tests to evaluate one's specific configurations will always be a good idea and naturally, the XenServer dom0 settings also might need some changes at some point.


Indeed, depending on the specific characteristics of each storage array there will be some maximum queue depth per connection (port). The number of I/O requests per port that can be handled will greatly affect performance. Storage ports have vastly different queue depths, typically from a few hundred per port to several thousand. The number of initiators a single port can support depends on the number of available queues. A typical LUN queue depth is 32 to 64, so you can balance this with either a small number of initiators (such as an HBA or iSCSI connection) and many LUNs, or many initiators and a small number of LUNs. No matter what the combination is, exceeding the queue depth will rapidly result in degraded performance.
In this particular case, perhaps this limit was not exceeded, hence there was no evidence of degraded performance?
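The arithmetic behind the comment above can be sketched as follows; all the numbers here are hypothetical illustrations, not figures from the post or from any particular array.

```python
# Hypothetical worked example of the port queue-depth budget described above.
PORT_QUEUE_DEPTH = 2048   # total outstanding I/Os the storage port accepts
LUN_QUEUE_DEPTH = 32      # per-LUN queue depth on each initiator

def port_is_oversubscribed(initiators, luns_per_initiator):
    # Worst case: every initiator fills the queue of every LUN it sees.
    outstanding = initiators * luns_per_initiator * LUN_QUEUE_DEPTH
    return outstanding > PORT_QUEUE_DEPTH

# 8 hosts each seeing 8 LUNs can queue 8 * 8 * 32 = 2048 I/Os: exactly at budget.
assert not port_is_oversubscribed(8, 8)
# A ninth host pushes past the port's limit, where performance degrades rapidly.
assert port_is_oversubscribed(9, 8)
```

This is why the trade-off is between initiator count and LUN count: the product of the two (times the per-LUN queue depth) is what must stay within the port's budget.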


Thanks for your comments, Tobias and John. You're absolutely right -- the LUN's capabilities are an important consideration. And now that XenServer doesn't get in the way and impose an additional scalability limitation, it's the only remaining consideration.


What utility can I use to see VM throughput in my own environment like the graph you have above? I am running a XenServer 6.5 pool with 10 hosts and approximately 120 Windows 2008 R2 VMs. All of the VMs live in one SR, but I am curious whether having more than one SR in the pool would provide a performance benefit.

