With storage providers adding better functionality to provide features like QoS, fast snapshot & clone and with the advent of storage-as-a-service, we are interested in the ability to utilize these features from XenServer. VMware’s VVols offering already allows integration of vendor provided storage features into their hypervisor. Since most storage allows operations at the granularity of a LUN, the idea is to have a one-to-one mapping between a LUN on the backend and a virtual disk (VDI) on the hypervisor. In this post we are going to talk about the supplemental pack that we have developed in order to enable VDI-per-LUN.

Xenserver Storage

To understand the supplemental pack, it is useful to first review how XenServer storage works. In XenServer, a storage repository (SR) is a top-level entity which acts as a pool for storing VDIs which appear to the VMs as virtual disks. XenServer provides different types of SRs (File, NFS, Local, iSCSI). In this post we will be looking at iSCSI based SRs as iSCSI is the most popular protocol for remote storage and the supplemental pack we developed is targeted towards iSCSI based SRs. An iSCSI SR uses LVM to store VDIs over logical volumes (hence the type is lvmoiscsi). For instance:

Here c67132ec-0b1f-3a69-0305-6450bfccd790 is the UUID of the SR. Each VDI is represented by a corresponding LV which is of the format VHD-. Some of the LVs have a small size of 8MB. These are snapshots taken on XenServer. There is also a LV named MGT which holds metadata about the SR and the VDIs present in it. Note that all of this is present in an SR which is a LUN on the backend storage.

Now XenServer can attach a LUN at the level of an SR but we want to map a LUN to a single VDI. In order to do that, we restrict an SR to contain a single VDI. Our new SR has the following LVs:

If a snapshot or clone of the LUN is taken on the backend, all the unique identifiers associated with the different entities in the LUN also get cloned and any attempt to attach the LUN back to XenServer will result in an error because of conflicts of unique IDs.

Resignature and supplemental pack

In order for the cloned LUN to be re-attached, we need to resignature the unique IDs present in the LUN. The following IDs need to be resignatured

LVM UUIDs (PV, VG, LV)

VDI UUID

SR metadata in the MGT Logical volume

We at CloudOps have developed an open-source supplemental pack which solves the resignature problem. You can find it here. The supplemental pack adds a new type of SR (relvmoiscsi) and you can use it to resignature your lvmoiscsi SRs. After installing the supplemental pack, you can resignature a clone using the following command

Here, instead of creating a new SR, the supplemental pack re-signatures the provided LUN and detaches it (the error is expected as we don’t actually create an SR). You can see from the error message that the SR has been re-signed successfully. Now the cloned SR can be introduced back to XenServer without any conflicts using the following commands:

This supplemental pack can be used in conjunction with an external orchestrator like CloudStack or OpenStack which can manage both the storage and compute. Working with SolidFire we have implemented this functionality, available in the next release of Apache CloudStack. You can check out a preview of this feature in a screencast here.

With the exciting release of the latest XenServer Dundee beta, the immediate reaction is to download it to give it a whirl to see all the shiny new features (and maybe to find out if your favourite bug has been fixed!). Unfortunately, it's not something that can just be installed, tested and uninstalled like a normal application - you'll need to find yourself a server somewhere you're willing to sacrifice in order to try it out. Unless, of course, you decide to use the power of virtualisation!

XenServer as a VM

Nested virtualisation - running a VM inside another VM - is not something that anyone recommends for production use, or even something that works at all in some cases. However, since Xen has its origins way back before hardware virtualisation became ubiquitous in intel processors, running full PV guests (that don't require any HW extensions) when XenServer is running as a VM actually works very well indeed. So for the purposes of evaluating a new release of XenServer it's actually a really good solution. It's also ideal for trying out many of the Unikernel implementations, such as Mirage or Rump kernels as these are pure PV guests too.

XenServer works very nicely when run on another XenServer, and indeed this is what we use extensively to develop and test our own software. But once again, not everyone has spare capacity to do this. So let's look to some other virtualisation solutions that aren't quite so server focused and that you might well have installed on your own laptop. Enter Oracle's VirtualBox.

VirtualBox, while not as performant a virtualization solution as Xen, is a very capable platform that runs XenServer without any problems. It also has the advantage of being easily installable on your own desktop or laptop. Therefore it's an ideal way to try out these betas of XenServer in a quick and convenient way. It also has some very convenient tools that have been built around it, one of which is Vagrant.

Vagrant

Vagrant is a tool for provisioning and managing virtual machines. It targets several virtualization platforms including VirtualBox, which is what we'll use now to install our XenServer VM. The model is that it takes a pre-installed VM image - what Vagrant calls a 'box' - and some provisioning scripts (using scripts, Salt, Chef, Ansible or others), and sets up the VM in a reproducible way. One of its key benefits is it simplifies management of these boxes, and Hashicorp run a service called Atlas that will host your boxes and metadata associated with them. We have used this service to publish a Vagrant box for the Dundee Beta.

Try the Dundee Beta

Once you have Vagrant installed, trying the Dundee beta is as simple as:

vagrant init xenserver/dundee-betavagrant up

This will download the box image (about 1 Gig) and create a new VM from this box image. As it's booting it will ask which network to bridge onto, which if you want your nested VMs to be available on the network should be a wired network rather than wireless.

The XenServer image is tweaked a little bit to make it easier to access - for example, it will by default DHCP all of the interfaces, which is useful for testing XenServer, but wouldn't be advisable for a real deployment. To connect to your XenServer, we need to find the IP address, so the simplest way of doing this is to ssh in and ask:

So you should be able to connect using one of those IPs via XenCenter or via a browser to download XenCenter (or via any other interface to XenServer).

Going Deeper

Let's now go all Inception and install ourselves a VM within our XenServer VM. Let's assume, for the sake of argument, and because as I'm writing this it's quite true, that we're not running on a Windows machine, nor do we have one handy to run XenCenter on. We'll therefore restrict ourselves to using the CLI.

As mentioned before, HVM VMs are out so we're limited to pure PV guests. Debian Wheezy is a good example of one of these. First, we need to ssh in and become root:

The VM doesn't have any network connection yet, so we'll need to add a VIF. We saw the IP addresses of the network interfaces above, and in my case eth1 corresponds to the bridged network I selected when starting the XenServer VM using Vagrant. So I need the uuid of the network, so I'll list the networks:

A few keystrokes later and we've got ourselves a nice new VM all set up and ready to go.

All of the usual operations will work; start, shutdown, reboot, suspend, checkpoint and even, if you want to set up two XenServer VMs, migration and storage migration. You can experiment with bonding, try multipathed ISCSI, check that alerts are generated, and almost anything else (with the exception of HVM and anything hardware specific such as VGPUs, of course!). It's also an ideal companion to the Docker build environment I blogged about previously, as any new things you might be experimenting with can be easily built using Docker and tested using Vagrant. If anything goes wrong, a 'vagrant destroy' followed by a 'vagrant up' and you've got a completely fresh XenServer install to try again in less than a minute.

The Vagrant box is itself created using Packer, a tool often used to create Vagrant boxes. The configuration for this is available on github, and feedback on this box is very welcome!

In a future blog post, I'll be discussing how to use Vagrant to manage XenServer VMs.

By Tobias Kreidl and Duane Booher

Northern Arizona University, Information Technology Services

Over eight years ago, back in the days of XenServer 5, not a lot of backup and restore options were available, either as commercial products or as freeware, and we quickly came to the realization that data recovery was a vital component to a production environment and hence we needed an affordable and flexible solution. The conclusion at the time was that we might as well build our own, and though the availability of options has grown significantly over the last number of years, we’ve stuck with our own home-grown solution which leverages Citrix XenServer SDK and XenAPI (http://xenserver.org/partners/developing-products-for-xenserver.html). Early versions were created from the contributions of Douglas Pace, Tobias Kreidl and David McArthur. During the last several years, the lion’s share of development has been performed by Duane Booher. This article discusses the latest 3.0 release.

A Bit of History

With the many alternatives now available, one might ask why we have stuck with this rather un-flashy script and CLI-based mechanism. There are clearly numerous reasons. For one, in-house products allow total control over all aspects of their development and support. The financial outlay is all people’s time and since there are no contracts or support fees, it’s very controllable and predictable. We also found from time-to-time that various features were not readily available in other sources we looked at. We also felt early on as an educational institution that we could give back to the community by freely providing the product along with its source code; the most recent version is available via GitHub at https://github.com/NAUbackup/VmBackup for free under the terms of the GNU General Public License. There was a write-up at https://www.citrix.com/blogs/2014/06/03/another-successful-community-xenserver-sdk-project-free-backup-tools-and-scripts-naubackup-restore-v2-0-released/ when the first GitHub version was published. Earlier versions were made available via the Citrix community site (Citrix Developer Network), sometimes referred to as the Citrix Code Share, where community contributions were published for a number of products. When that site was discontinued in 2013, we relocated the distribution to GitHub.

Because we “eat our own dog food,” VMbackup gets extensive and constant testing because we rely on it ourselves as the means to create backups and provide for restores for cases of accidental deletion, unexpected data corruption, or in the event that disaster recovery might be needed. The mechanisms are carefully tested before going into production and we perform frequent tests to ensure the integrity of the backups and that restores really do work. A number of times, we have relied on resorting to recovering from our backups and it has been very reassuring that these have been successful.

What VMbackup Does

Very simply, VMbackup provides a framework for backing up virtual machines (VMs) hosted on XenServer to an external storage device, as well as the means to recover such VMs for whatever reason that might have resulted in loss, be it disaster recovery, restoring an accidentally deleted VM, recovering from data corruption, etc.

The VMbackup distribution consists of a script written in Python and a configuration file. Other than a README document file, that’s it other than the XenServer SDK components which one needs to download separately; see http://xenserver.org/partners/developing-products-for-xenserver.html for details. There is no fancy GUI to become familiar with, and instead, just a few simple things that need to be configured, plus a destination for the backups needs to be made accessible (this is generally an NFS share, though SMB/CIFS will work, as well). Using cron job entries, a single host or an entire pool can be set up to perform periodic backups. Configurations on individual hosts in a pool are needed in that the pool master performs the majority of the work and it can readily change to a different XenServer, while individual host-based instances are also needed when local storage is also made use of, since access to any local SRs can only be performed from each individual XenServer. A cron entry and numerous configuration examples are given in the documentation.

To avoid interruptions of any running VMs, the process of backing up a VM follows these basic steps:

A snapshot of the VM and its storage is made

Using the xe utility vm-export, that snapshot is exported to the target external storage

The snapshot is deleted, freeing up that space

In addition, some VM metadata are collected and saved, which can be very useful in the event a VM needs to be restored. The metadata include:

An additional option is to create a backup of the entire XenServer pool metadata, which is essential in dealing with the aftermath of a major disaster that affects the entire pool. This is accomplished via the “xe pool-dump-database” command.

In the event of errors, there are automatic clean-up procedures in place that will remove any remnants plus make sure that earlier successful backups are not purged beyond the specified number of “good” copies to retain.

There are numerous configuration options that allow to specify which VMs get backed up, how many backup versions are to be retained, whether the backups should be compressed (1) as part of the process, as well as optional report generation and notification setups.

New Features in VMbackup 3.0

A number of additional features have been added to this latest 3.0 release, adding flexibility and functionality. Some of these came about because of the sheer number of VMs that needed to be dealt with, SR space issues as well as with changes coming to the next XenServer release. These additions include:

VM “preview” option: To be able to look for syntax errors and ensure parameters are being defined properly, a VM can have a syntax check performed on it and if necessary, adjustments can then be made based on the diagnosis to achieve the desired configuration.

Support for VMs containing spaces: By surrounding VM names in the configuration file with double quotes, VM names containing spaces can now be processed.

Wildcard suffixes: This very versatile option permits groups of VMs to be configured to be handled similarly, eliminating the need to create individual settings for every desired VM. Examples include “PRD-*”, “SQL*” and in fact, if all VMs in the pool should be backed up, even “*”. There are however, a number of restrictions on wildcard usage (2).

Exclude VMs: Along with the wildcard option to select which VMs to back up, clearly a need arises to provide the means to exclude certain VMs (in addition to the other alternative, which is simply to rename them such that they do not match a certain backup set). Currently, each excluded VM must be named separately and any such VMs should de defined at the end of the configuration file.

Export the OS disk VDI, only: In some cases, a VM may contain multiple storage devices (VDIs) that are so large that it is impractical or impossible to take a snapshot of the entire VM and its storage. Hence, we have introduced the means to backup and restore only the operating system device (assumed to be Disk 0). In addition to space limitations, some storage, such as DB data, may not be able to be reliably backed up using a full VM snapshot. Furthermore, the next XenServer release (Dundee) will likely support up to as many as perhaps 255 storage devices per VM, making a vm-export even more involved under such circumstances. Another big advantage here is that currently, this process is much more efficient and faster than a VM export by a factor of three or more!

Root password obfuscation: So that clear-text passwords associated with the XenServer pool are not embedded in the scripts themselves, the password can be basically encoded into a file.

The mechanism for a running VM from which only the primary disk is to be backed up is similar to the full VM backup. The process of backing up such a VM follows these basic steps:

A snapshot of just the VM's Disk 0 storage is made

Using the xe utility vdi-export, that snapshot is exported to the target external storage

The snapshot is deleted, freeing up that space

As with the full VM export, some metadata for the VM are also collected and saved for this VDI export option.

These added features are of course subject to change in future releases, though typically later editions generally encompass the support of previous versions to preserve backwards compatibility.

Comment lines start with a hash mark and may be contained anywhere with the file. The hash mark must appear as the first character in the line. Note that the default number of retained backups is set here to four. The destination directory is set next, indicating where the backups will be written to. We then see a case where only the OS disk is being backed up for the specific VM "PROD-CentOS7-large-user-disks" and below that, all VMs beginning with “PROD” are backed up using the default settings. Just below that, a definition is created for all VMs starting with "DEV-RH" and the default number of backups is reduced for all of these from the global default of four down to three. Finally, we see two excludes for specific VMs that fall into the “PROD*” group that should not be backed up at all.

To launch the script manually, you would issue from the command line:

./VmBackup.py password weekend.cfg

To launch the script via a cron job, you would create a single-line entry like this:

This would run the task at ten minutes past midnight on Saturday and create a log entry called VmBackup.log. This cron entry would need to be installed on each host of a XenServer pool.

Additional Notes

It can be helpful to break up when backups are run so that they don’t all have to be done at once, which may be impractical, take so long as to possibly impact performance during the day, or need to be coordinated with when is best for specific VMs (such as before or after patches are applied). These situations are best dealt with by creating separate cron jobs for each subset.

There is a fair load on the server, comparable to any vm-export, and hence the queue is processed linearly with only one active snapshot and export sequence for a VM being run at a time. This is also why we suggest you perform the backups and then asynchronously perform any compression on the files on the external storage host itself to alleviate the CPU load on the XenServer host end.

For even more redundancy, you can readily duplicate or mirror the backup area to another storage location, perhaps in another building or even somewhere off-site. This can readily be accomplished using various copy or mirroring utilities, such as rcp, sftp, wget, nsync, rsync, etc.

This latest release has been tested on XenServer 6.5 (SP1) and various beta and technical preview versions of the Dundee release. In particular, note that the vdi-export utility, while dating back a while, is not well documented and we strongly recommend not trying to use it on any XenServer release before XS 6.5. Doing so is clearly at your own risk.

In Conclusion

This is a misleading heading, as there is not really a conclusion in the sense that this project continues to be active and as long as there is a perceived need for it, we plan to continue working on keeping it running on future XenServer releases and adding functionality as needs and resources dictate. Our hope is naturally that the community can make at least as good use of it as we have ourselves.

Footnotes:

Alternatively, to save time and resources, the compression can potentially be handled asynchronously by the host onto which the backups are written, hence reducing overhead and resource utilization on the XenServer hosts, themselves.

Certain limitations exist currently with how wildcards can be utilized. Leading wildcards are not allowed, nor are multiple wildcards within a string. This may be enhanced at a later date to provide even more flexibility.

Building bits of XenServer outside of Citrix has in the past been a bit of a challenging task, requiring careful construction of the build environment to replicate what 'XenBuilder', our internal build system, puts together. This has meant using custom DDK VMs or carefully installing by hand a set of packages taken from one of the XenServer ISOs. With XenServer Dundee, this will be a pain of the past, and making a build environment will be just a 'docker run' away.

Part of the work that's being done for XenServer Dundee has been moving things over to using standard build tools and packaging. In previous releases there have been a mix of RPMs, tarballs and patches for existing files, but for the Dundee project everything installed into dom0 is now packaged into an RPM. Taking inspiration and knowledge gained while working on xenserver/buildroot, we're building most of these dom0 packages now using mock. Mock is a standard tool for building RPM packages from source RPMs (SRPMS), and it works by constructing a completely clean chroot with only the dependencies defined by the SRPM. This means that everything needed to build a package must be in an RPM, and the dependencies defined by the SRPM must be correct too.

From the point of view of making reliably reproducible builds, using mock means there is very little possibility of having the build dependent upon the the environment. But there is also a side benefit of this work: If you actually want to rebuild a bit of XenServer you just need to have a yum repository with the XenServer RPMs in, and use 'yum-builddep' to put in place all of the build dependencies, and then building should be as simple as cloning the repository and typing 'make'.

The simplest place to do this would be in the dom0 environment itself, particularly now that the partition size has been bumped up to 20 gigs or so. However, that may well not be the most convenient. In fact, for a use case like this, the mighty Docker provides a perfect solution. Docker can quickly pull down a standard CentOS environment and then put in the reference to the XenServer yum repository, install gcc, OCaml, git, emacs and generally prepare the perfect build environment for development.

In fact, even better, Docker will actually do all of these bits for you! The docker hub has a facility for automatically building a Docker image provided everything required is in repository on Github. So we've prepared a repository containing a Dockerfile and associated gubbins that sets things up as above, and then the docker hub builds and hosts the resulting docker image.

Let's dive in with an example on how to use this. Say you have a desire to change some aspect of how networking works on XenServer, something that requires a change to the networking daemon itself, 'xcp-networkd'. We'll start by rebuilding that from the source RPM. Start the docker container and install the build dependencies:

To patch this, it's just the same as for CentOS, Fedora, and any other RPM based distro, so follow one of the manyguidesavailable.

Alternatively, you can compile straight from the source. Most of our software is hosted on github, either under the xapi-project or xenserver organisations. xcp-networkd is a xapi-project repository, so we can clone it from there:

Most of our RPMs have version numbers constructed automatically containing useful information about the source, and where the source is from git repositories the version information comes from 'git describe'.

The important part here is the hash, in this case '96c3fcc'. Comparing with the SRPM version, we can see these are identical. We can now just type 'make' to build the binaries:

[root@15729a23550b xcp-networkd]# make

this networkd binary can then be put onto your XenServer and run.

The yum repository used by the container is being created directly from the snapshot ISOs uploaded to xenserver.org, using a simple bash script named update_xs_yum.sh available on github. The container default will be to use the most recently available release, but the script can be used by anyone to generate a repository from the daily snapshots too, if this is required. There’s still a way to go before Dundee is released, and some aspect of this workflow are in flux – for example, the RPMs aren’t currently signed. However, by the time Dundee is out the door we hope to make many improvements in this area. Certainly here in Citrix, many of us have switched to using this for our day-to-day build needs, because it's simply far more convenient than our old custom chroot generation mechanism.

A few weeks ago, I received an invitation to participate in the first new XenServer class to be rolled out in over three years, namely CXS-300: Citrix XenServer 6.5 SP1 Administration. Those of you with good memories may recall that XenServer 6.0, on which the previous course was based, was officially released on September 30, 2011. Being an invited guest in what was to be only the third time the class had been ever held was something that just couldn’t be passed up, so I hastily agreed. After all, the evolution of the product since 6.0 has been enormous. Plus, I have been a huge fan of XenServer since first working with version 5.0 back in 2008. Shortly before the open-sourcing of XenServer in 2013, I still recall the warnings of brash naysayers that XenServer was all but dead. However, things took a very different turn in the summer of 2013 with the open-source release and subsequent major efforts to improve and augment product features. While certain elements were pulled and restored and there was a bit of confusion about changes in the licensing models, things have stabilized and all told, the power and versatility of XenServer with the 6.5 SP1 release is at a level now some thought it would never reach.

FROM 6.0 TO 6.5 – AND BEYOND

XenServer (XS for short) 6.5 SP1 made its debut on May 12, 2015. The feature set and changes are – as always – incorporated within the release notes. There are a number of changes of note that include an improved hotfix application mechanism, a whole new XenCenter layout (since 6.5), increased VM density, more guest OS support, a 64-bit kernel, the return of workload balancing (WLB) and the distributed virtual switch controller (DVSC) appliance, in-memory read caching, and many others. Significant improvements have been made to storage and network I/O performance and overall efficiency. XS 6.5 was also a release that benefited significantly from community participation in the Creedence project and the SP1 update builds upon this.

One notable point is that XenServer has been found to now host more XenDesktop/XenApp (XD/XA) instances than any other hypervisor (see this reference). And, indeed, when XenServer 6.0 was released, a lot of the associated training and testing on it was in conjunction with Provisioning Services (PVS). Some users, however, discovered XenServer long before this as a perfectly viable hypervisor capable of hosting a variety of Linux and Windows virtual machines, without having even given thought to XenDesktop or XenApp hosting. For those who first became familiar with XS in that context, the added course material covering provisioning services had in reality relatively little to do with XenServer functionality as an entity. Some viewed PVS an overly emphasized component of the course and exam. In this new course, I am pleased to say that XS’s original roots as a versatile hypervisor is where the emphasis now lies. XD/XA is of course discussed, but the many features available that are fundamental to XS itself is what the course focuses on, and it does that well.

COURSE MATERIALS: WHAT’S INCLUDED

The new “mission” of the course from my perspective is to focus on the core product itself and not only understand its concepts, but to be able to walk away with practical working knowledge. Citrix puts it that the course should be “engaging and immersive”. To that effect, the instructor-led course CXS-300 can be taken in a physical classroom or via remote GoToMeeting (I did the latter) and incorporates a lecture presentation, a parallel eCourseware manual plus a student exercise workbook (lab guide) and access to a personal live lab during the entire course. The eCourseware manual serves multiple purposes, providing the means to follow along with the instructor and later enabling an independent review of the presented material. It adds a very nice feature of providing an in-line notepad for each individual topic (hence, there are often many of these on a page) and these can be used for note taking and can be saved and later edited. In fact, a great takeaway of this training is that you are given permanent access to your personalized eCourseware manual, including all your notes.

The course itself is well organized; there are so many components to XenServer that five days works out in my opinion to be about right – partly because often question and answer sessions with the instructor will take up more time than one might guess, and also, in some cases all participants may have already some familiarity with XS or other hypervisor that makes it possible to go into some added depth in some areas. There will always need to be some flexibility depending on the level of students in any particular class.

A very strong point of the course is the set of diagrams and illustrations that are incorporated, some of which are animated. These compliment the written material very well and the visual reinforcement of the subject matter is very beneficial. Below is an example, illustrating a high availability (HA) scenario:

The course itself is divided into a number of chapters that cover the whole range of features of XS, enforced by some in-line Q&A examples in the eCourseware manual and with related lab exercises. Included as part of the course are not only important standard components, such as HA and Xenmotion, but some that require plugins or advanced licenses, such as workload balancing (WLB), the distributed virtual switch controller (DVSC) appliance and in-memory read caching. The immediate hands-on lab exercises in each chapter with the just-discussed topics are a very strong point of the course and the majority of exercises are really well designed to allow putting the material directly to practical use. For those who have already some familiarity with XS and are able to complete the assignments quickly, the lab environment itself offers a great sandbox in which to experiment. Most components can readily be re-created if need be, so one can afford to be somewhat adventurous.

The lab, while relying heavily on the XenCenter GUI for most of the operations, does make a fair amount of use of the command line interface (CLI) for some operations. This is a very good thing for several reasons. First off, one may not always have access to XenCenter and knowing some essential commands is definitely a good thing in such an event. The CLI is also necessary in a few cases where there is no equivalent available in XenCenter. Some CLI commands offer some added parameters or advanced functionality that may again not be available in the management GUI. Furthermore, many operations can benefit from being scripted and this introduction to the CLI is a good starting point. For Windows aficionados, there are even some PowerShell exercises to whet their appetites, plus connecting to an Active Directory server to provide role-based access control (RBAC) is covered.

THE INSTRUCTOR

So far, the materials and content have been the primary points of discussion. However, what truly can make or break a class is the instructor. The class happened to be quite small, and primarily with individuals attending remotely. Attendees were in fact from four different countries in different time zones, making it a very early start for some and very late in the day for others. Roughly half of those participating in the class were not native English speakers, though all had admirable skills in both English and some form of hypervisor administration. Being all able to keep up a common general pace allowed the class to flow exceptionally well. I was impressed with the overall abilities and astuteness of each and every participant.

The instructor, Jesse Wilson, was first class in many ways. First off, knowing the material and being able to present it well are primary prerequisites. But above and beyond that was his ability to field questions related to the topic at hand and even to go off onto relevant tangential material and be able to juggle all of that and still make sure the class stayed on schedule. Both keeping the flow going and also entertaining enough to hold students’ attention are key to holding a successful class. When elements of a topic became more of a debatable issue, he was quick to not only tackle the material in discussion, but to try this out right away in the lab environment to resolve it. The same pertained to demonstrating some themes that could benefit from a live demo as opposed to explaining them just verbally. Another strong point was his adding his own drawings to material to further clarify certain illustrations, where additional examples and explanations were helpful.

SUMMARY

All told, I found the course well structured, very relevant to the product and the working materials to be top notch. The course is attuned to the core product itself and all of its features, so all variations of the product editions are covered.

Positive points:

Good breadth of material

High-quality eCourseware materials

Well-presented illustrations and examples in the class material

Q&A incorporated into the eCourseware book

Ability to save course notes and permanent access to them

Relevant lab exercises matching the presented material

Real-life troubleshooting (nothing ever runs perfectly!)

Excellent instructor

Desiderata:

More “bonus” lab materials for those who want to dive deeper into topics

More time spent on networking and storage

A more responsive lab environment (which was slow at times)

More coverage of more complex storage Xenmotion cases in the lecture and lab

In short, this is a class that fulfills the needs of anyone from just learning about XenServer to even experienced administrators who want to dive more deeply into some of the additional features and differences that have been introduced in this latest XS 6.5 SP1 release. CXS-300: Citrix XenServer 6.5 SP1 Administration represents a makeover in every sense of the word, and I would say the end result is truly admirable.

SUMMARY

The configuration of a XenApp virtual machine (VM) hosted on XenServer that supports two concurrent graphics processing engines in passthrough mode is shown to work reliably and provide the opportunity to give more flexibility to a single XenApp VM rather than having to spread the access to the engines over two separate XenApp VMs. This in turn can provide more flexibility, save operating system licensing costs and ostensibly, could be extended to incorporate additional GPU engines.

INTRODUCTION

A XenApp virtual machine (VM) that supports two or more concurrent graphics processing units (GPUs) has a number of advantages over running separate VM instances, each with its own GPU engine. For one, if users happen to be unevenly relegated to particular XenApp instances, some XenApp VMs may idle while other instances are overloaded, to the detriment of users associated with busy instances. It is also simpler to add capacity to such a VM as opposed to building and licensing yet another Windows Server VM. This study made use of an NVIDIA GRID K2 (driver release 340.66), comprised of two Kepler GK104 engines and 8 GB of GDDR5 RAM (4 GB per GPU). It is hosted in a base system that consists of a Dell R720 with dual Intel Xeon E5-2680 v2 CPUs (40 VCPUs, total, hyperthreaded) hosting XenServer 6.2 SP1 running XenApp 7.6 as a VM with 16 VCPUs and 16 GB of memory on Windows 2012 R2 Datacenter.

PROCEDURE

It is important to note that these steps constitute changes that are not officially supported by Citrix or NVIDIA and are to be regarded as purely experimental at this stage.

Registry changes to XenApp were made according to these instructions provided in the Citrix Product Documentation.

Get the address of the existing GPU engine, if one is currently associated:# xe vm-param-get param-name=other-config uuid=934c889e-ebe9-b85f-175c-9aab0628667cvgpu_pci: 0/0000:44:00.0; pci: 0/0000:44:0.0; mac_seed: d229f84d-73cc-e5a5-d105-f5a3e87b82b7; install-methods: cdrom; base_template_name: Windows Server 2012 (64-bit)(Note: ignore any vgpu_pci parameters that are irrelevant now to this process, but may be left over from earlier procedures and experiments.)

Dissociate the GPU via XenCenter or via the CLI, set GPU type to “none”.Then, add both GPU engines following the recommendations in assigning multiple GPUs to a VM in XenServer using the other-config:pci parameter:# xe vm-param-set uuid=934c889e-ebe9-b85f-175c-9aab0628667c other-config:pci=0/0000:44:0.0,0/0000:45:0.0In other words, do not use the vgpu_pci parameter at all.

Check if the new parameters took hold:# xe vm-param-get param-name=other-config uuid=934c889e-ebe9-b85f-175c-9aab0628667c params=allvgpu_pci: 0/0000:44:00.0; pci: 0/0000:44:0.0,0/0000:45:0.0; mac_seed: d229f84d-73cc-e5a5-d105-f5a3e87b82b7; install-methods: cdrom; base_template_name: Windows Server 2012 (64-bit)Next, turn GPU passthrough back on for the VM in XenCenter or via the CLI and start up the VM.

On the XenServer you should now see no GPUs available:# nvidia-smiFailed to initialize NVML: Unknown ErrorThis is good, as both K2 engines now have been allocated to the XenApp server.On the XenServer you can also run “xn –v pci-list 934c889e-ebe9-b85f-175c-9aab0628667c” (the UUID of the VM) and should see the same two PCI devices allocated:# xn -v pci-list 934c889e-ebe9-b85f-175c-9aab0628667cid pos bdf0000:44:00.0 2 0000:44:00.00000:45:00.0 1 0000:45:00.0More information can be gleaned from the “xn diagnostics” command.

Next, log onto the XenApp VM and check settings using nvidia-smi.exe. The output will resemble that of the image in Figure 1.

Figure 1. Output From the nvidia-smi utility, showing the allocation of both K2 engines.

Note the output shows correctly that 4096 MiB of memory are allocated for each of the two engines in the K2, totaling its full capacity of 8196 MiB. XenCenter will still show only one GPU engine allocated (see Figure 2) since it is not aware that both are allocated to the XenApp VM and has currently no way of making that distinction.

Figure 2. XenCenter GPU allocation (showing just one engine – all XenServer is currently capable of displaying)

So, how can you tell if it is really using both GRID engines? If you run the nvidia-smi.exe program on the XenApp VM itself, you will see it has two GPUs configured in passthrough mode (see the earlier screenshot in Figure 1). Depending on how apps are launched, you will see one or the other or both of them active. As a test, we ran two concurrent Unigine "Heaven" benchmark instances and both came out with the same metrics within 1% of each other as well as when just one instance was run, and both engines showed as being active. Displayed in Figure 3 is a sample screenshot of the Unigine ”Heaven” benchmark running with one active instance; note that it sees both K2 engines present, even though the process is making use of just one.

Figure 3. A sample Unigine “Heaven” benchmark frame. Note the two sets of K2 engine metrics displayed in the upper right corner.

It is evident from the display in the upper right hand corner that one engine has allocated memory and is working, as evidenced by the correspondingly higher temperature reading and memory frequency. The result of a benchmark using openGL and a 1024x768 pixel resolution is seen in Figure 4. Note again the difference between what is shown for the two engines, in particular the memory and temperature parameters.

Figure 4. Outcome of the benchmark. Note the higher memory and temperature on the second K2 engine.

When another instance is running concurrently, you see its memory and temperature also rise accordingly in addition to the load evident on the first engine, as well as activity on both engines in the output from the nvidia-smi.exe utility (Figure 5).

Figure 5. Two simultaneous benchmarks running, using both GRID K2 engines, and the nvidia-smi output.

You can also see with two instances running concurrently how the load is affected. Note in the performance graphs from XenCenter shown in Figure 6 how one copy of the “Heaven” benchmark impacts the server and then about halfway across the graphs, a second instance is launched.

Figure 6. XenCenter performance metrics of first one, then a second concurrent Unigine “Heaven” benchmark.

CONCLUSIONS

The combination of two GRID K2 engines associated with a single, hefty XenApp VM works well for providing adequate capacity to support a number of concurrent users in GPU passthrough mode without the need of hosting additional XenApp instances. As there is a fair amount of leeway in the allocation of CPUs and memory to a virtualized instance under XenServer (up to 16 vCPUs and 128 GB of memory under XenServer 6.2 when these tests were run), one XenApp VM should be able to handle a reasonably large number of tasks. As many as six concurrent sessions of this high-demand benchmark with 800x600 high-resolution settings have been tested with the GPUs still not saturating. A more typical application, like Google Earth, consumes around 3 to 5% of the cycles of a GRID K2 engine per instance during active use, depending on the activity and size of the window, so fairly minimal. In other words, twenty or more sessions could be handled by each engine, or potentially 40 or more for the entire GRID K2 with a single XenApp VM, provided of course that the XenApp’s memory and its own CPU resources are not overly taxed.

XenServer 6.2 already supports as many as eight physical GPUs per host, so as servers expand, one could envision having even more available engines that could be associated with a particular VM. Under some circumstances, passthrough mode affords more flexibility and makes better use of resources compared to creating specific vGPU assignments. Windows Server 2012 R2 Datacenter supports up to 64 sockets and 4 TB of memory, and hence should be able to support a significantly larger number of associated GPUs. XenServer 6.2 SP1 has a processor limit of 16 VCPUs and 128 GB of virtual memory. XenServer 6.5, officially released in January 2015, supports up to four K2 GRID cards in some physical servers and up to 192 GB of RAM per VM for some guest operating systems as does the newer release documented in the XenServer 6.5 SP1 User's Guide, so there is a lot of potential processing capacity available. Hence, a very large XenApp VM could be created that delivers a lot of raw power with substantial Microsoft server licensing savings. The performance meter shown above clearly indicates that VCPUs are the primary limiting factor in the XenApp configuration and with just two concurrent “Heaven” sessions running, about a fourth of the available CPU capacity is consumed compared to less than 3 GB of RAM, which is only a small additional amount of memory above that allocated by the first session.

These same tests were run after upgrading to XenServer 6.5 and with newer versions of the NVIDIA GRID drivers and continue to work as before. At various times, this configuration was run for many weeks on end with no stability issues or errors detected during the entire time.

ACKNOWLEDGEMENTS

I would like to thank my co-worker at NAU, Timothy Cochran, for assistance with the configuration of the Windows VMs used in this study. I am also indebted to Rachel Berry, Product Manager of HDX Graphics at Citrix and her team, as well as Thomas Poppelgaard and also Jason Southern of the NVIDIA Corporation for a number of stimulating discussions. Finally, I would like to greatly thank Will Wade of NVIDIA for making available the GRID K2 used in this study.

INTRODUCTION

There are a number of ways to connect storage devices to XenServer hosts and pools, including local storage, HBA SAS and fiber channel, NFS and iSCSI. With iSCSI, there are a number of implementation variations including support for multipathing with both active/active and active/passive configurations, plus the ability to support so-called “jumbo frames” where the MTU is increased from 1500 to typically 9000 to optimize frame transmissions. One of the lesser-known and somewhat esoteric iSCSI options available on many modern iSCSI-based storage devices is Asymmetric Logical Unit Access (ALUA), a protocol that has been around for a decade and is furthermore mysterious and intriguing because of its ability to be used not only with iSCSI, but also with fiber channel storage. The purpose of this article is an attempt to both clarify and outline how ALUA can be used more flexibly now with iSCSI on XenServer 6.5.

HISTORY

ALUA support on XenServer goes way back to XenServer 5.6 and initially only with fiber channel devices. The support of iSCSI ALUA connectivity started on XenServer 6.0 and was initially limited to specific ALUA-capable devices, which included the EMC Clariion, NetApp FAS as well as the EMC VMAX and VNX series. Each device required specific multipath.conf file configurations to properly integrate with the server used to access them, XenServer being no exception. The upstream XenServer code also required customizations. The "How to Configure ALUA Multipathing on XenServer 6.x for Enterprise Arrays" article CTX132976 (March 2014, revised March 2015) currently only discusses ALUA support through XenServer 6.2 and only for specific devices, stating: “Most significant is the usability enhancement for ALUA; for EMC™ VNX™ and NetApp™ FAS™, XenServer will automatically configure for ALUA if an ALUA-capable LUN is attached”.

It was announced in the XenServer 6.5 Release Notes that XenServer will automatically connect to one of these aforementioned documented devices and it is now running the updated device mapper multipath (DMMP) version 0.4.9-72. This rekindled my interest in ALUA connectivity and after some research and discussions with Citrix and Dell about support, it appeared this might now be possible specifically for the Dell MD3600i units we have used on XenServer pools for some time now. What is not stated in the release notes is that XenServer 6.5 now has the ability to connect generically to a large number of ALUA-capable storage arrays. This will be gone into detail later. It is also of note that MPP-RDAC support is no longer available in XenServer 6.5 and DMMP is the exclusive multipath mechanism supported. This was in part because of support and vendor-specific issues (see, for example, the XenServer 6.5 Release Notes or this document from Dell, Inc.).

But first, how are ALUA connections even established? And perhaps of greater interest, what are the benefits of ALUA in the first place?

ALUA DEFINITIONS AND SETTINGS

As the name suggests, ALUA is intended to optimize storage traffic by making use of optimized paths. With multipathing and multiple controllers, there are a number of paths a packet can take to reach its destination. With two controllers on a storage array and two NICs dedicated to iSCSI traffic on a host, there are four possible paths to a storage Logical Unit Number (LUN). On the XenServer side, LUNs then are associated with storage repositories (SRs). ALUA recognizes that once an initial path is established to a LUN that any multipathing activity destined for that same LUN is better served if routed through the same storage array controller. It attempts to do so as much as possible, unless of course a failure forces the connection to have to take an alternative path. ALUA connections fall into five self-explanatory categories (listed along with their associated hex codes):

Active/Optimized : x0

Active/Non-Optimized : x1

Standby : x2

Unavailable : x3

Transitioning : xf

For ALUA to work, it is understood that an active/active storage path is required and furthermore that an asymmetrical active/active mechanism is involved. The advantage of ALUA comes from less fragmentation of packet traffic by routing if at all possible both paths of the multipath connection via the same storage array controller as the extra path through a different controller is less efficient. It is very difficult to locate specific metrics on the overall gains, but hints of up to 20% can be found in on-line articles (e.g., this openBench Labs report on Nexsan), hence this is not an insignificant amount and potentially more significant that gains reached by implementing jumbo frames. It should be noted that the debate continues to this day regarding the benefits of jumbo frames and to what degree, if any, they are beneficial. Among numerous articles to be found are: The Great Jumbo Frames Debate from Michael Webster, Jumbo Frames or Not - Purdue University Research, Jumbo Frames Comparison Testing, and MTU Issues from ESNet. Each installation environment will have its idiosyncrasies and it is best to conduct tests within one's unique configuration to evaluate such options.

The SCSI Architecture Model version defines these SCSI Primary Commands (SPC-3) used to determine paths. The mechanism by which this is accomplished is target port group support (TPGS). The characteristics of a path can be read via an RTPG command or set with an STPG command. With ALUA, non-preferred controller paths are used only for fail-over purposes. This is illustrated in Figure 1, where an optimized network connection is shown in red, taking advantage of routing all the storage network traffic via Node A (e.g., storage controller module 0) to LUN A (e.g., 2).

Figure 1. ALUA connections, with the active/optimized paths to Node A shown as red lines and the active/non-optimized paths shown as dotted black lines.

Various SPC commands are provided as utilities within the sg3_utils (SCSI generic) Linux package.

There are other ways to make such queries, for example, VMware has a “esxcli nmp device list” command and NetApp appliances support “igroup” commands that will provide direct information about ALUA-related connections.

Let us first examine a generic Linux server containing ALUA support connected to an ALUA-capable device. In general, note that this will entail a specific configuration to the /etc/multipath.conf file and typical entries, especially for some older arrays or XenServer versions, will use one or more explicit configuration parameters such as:

hardware_handler ”1 alua”

prio “alua”

path_checker “alua”

Consulting the Citrix knowledge base article CTX132976, we see for example the EMC Corporation DGC Clariion device makes use of an entry configured as:

To investigate the multipath configuration in more detail, we can make use of the TPGS setting. The TPGS setting can be read using the sg_rtpg command. By using multiple “v” flags to increase verbosity and “d” to specify the decoding of the status code descriptor returned for the asymmetric access state, we might see something like the following for one of the paths:

Noting the boldfaced characters above, we see here specifically that target port ID 1 is an active/non-optimized ALUA path, both from the “target port group id” line as well as from the “status code”. We also see there are two paths identified, with target port IDs 1,1 and 1,2.

There are a slew of additional “sg” commands, such as the sg_inq command, often used with the flag “-p 0x83” to get the VPD (vital product data) page of interest, sg_rdac, etc. The sg_inq command will in general return, in fact, TPGS > 0 for devices that support ALUA. More on that will be discussed later on in this article. One additional command of particular interest, because not all storage arrays in fact support target port group queries (more also on this important point later!), is sg_vpd (sg vital product data fetcher), as it does not require TPG access. The base syntax of interest here is:

sg_vpd –p 0xc9 –hex /dev/…

where “/dev/…” should be the full path to the device in question. Looking at an example of the output of a real such device, we get:

If one reads the source code for various device handlers (see the multipath tools hardware table for an extensive list of hardware profiles as well as the Linux SCSI device handler regarding how the data are interpreted through the device handler), one can determine that the value of interest here is that of avte_cvp (part of the RDAC c9_inquiry structure), which is the sixth hex value, and will indicate if the connected device is using ALUA (if shifted right five bits together with a logical AND with 0x1, in the RDAC world, known as IOSHIP mode), AVT, or Automatic Volume Transfer mode (if shifted right seven bits together with a logical AND with 0x1), or otherwise defaults in general to basic RDAC (legacy) mode. In the case above we see “61” returned (indicated in boldface), so (0x61 >> 5 & 0x1) is equal to 1, and hence the above connection is indeed an ALUA RDAC-based connection.

I will revisit sg commands once again later on. Do note that the sg3_utils package is not installed on stock XenServer distributions and as with any external package, the installation of external packages may void any official Citrix support.

MULTIPATH CONFIGURATIONS AND REPORTS

In addition to all the information that various sg commands provide, there is also an abundance of information available from the standard multipath command. We saw a sample multipath.conf file earlier, and at least with many standard Linux OS versions and ALUA-capable arrays, information on the multipath status can be more readily obtained using stock multipath commands.

For example, on an ALUA-enabled connection we might see output similar to the following from a “multipath –ll” command (there will be a number of variations in output, depending on the version, verbosity and implementation of the multipath utility):

Recalling the device sde from the section above, note that it falls under a path with a lower priority of 10, indicating it is part of an active, non-optimized network connection vs. 50, which indicates being in an active, optimized group; a priority of “1” would indicate the device is in the standby group. Depending on what mechanism is used to generate the priority values, be aware that these priority values will vary considerably; the most important point is that whatever path has a higher “prio” value will be the optimized path. In some newer versions of the multipath utility, the string “hwhandler=1 alua” shows clearly that the controller is configured to allow the hardware handler to help establish the multipathing policy as well as that ALUA is established for this device. I have read that the path priority will be elevated to typically a value of between 50 and 80 for optimized ALUA-based connections (cf. mpath_prio_alua in this Suse article), but have not seen this consistently.

The multipath.conf file itself has traditionally needed tailoring to each specific device. It is particularly convenient, however, that using a generic configuration is now possible for a device that makes use of the internal hardware handler and is rdac-based and can auto-negotiate an ALUA connection. The italicized entries below represent the specific device itself, but others should now work using this generic sort of connection:

THE CURIOUS CASE OF DELL MD32XX/36XX ARRAY CONTROLLERS

The LSI controllers incorporated into Dell’s MD32xx and MD36xx series of iSCSI storage arrays represent an unusual and interesting case. As promised earlier, we will get back to looking at the sg_inq command, which queries a storage device for several pieces of information, including TPGS. Typically, an array that supports ALUA will return a value of TPGS > 0, for example:

Highlighted in boldface, we see in this case above that TPGS is reported to have a value of 1. The MD36xx has supported ALUA since RAID controller firmware 07.84.00.64 and NVSRAM N26X0-784890-904, however, even with that (or newer) revision level, an sg_inq returns the following for this particular storage array:

Various attempts to modify the multipath.conf file to try to force TPGS to appear with any value greater than zero all failed. Above all, it seemed that without access to the TPGS command, there was no way to query the device for ALUA-related information. Furthermore, the command mpath_prio_alua and similar commands appear to have been deprecated in newer versions of the device-mapper-multipath package, and so offer no help.

This proved to be a major roadblock in making any progress. Ultimately it turned out that the key to looking for ALUA connectivity in this particular case comes oddly from ignoring what TPGS reports, and rather focusing on what the MD36xx controller is doing. What is going on here is that the hardware handler is taking over control and the clue comes from the sg_vpd output shown above. To see how a LUN is mapped for these particular devices, one needs to hunt back through the /var/log/messages file for entries that appear when the LUN was first attached. To investigate this for the MD36xx array, we know it uses the internal “rdac” connection mechanism for the hardware handler, so a Linux grep command for “rdac” in the /var/log/messages file around the time the connection was established to a LUN should reveal how it was established.

Sure enough, if one looks at a case where the connection is known to not be making use of ALUA, you might see entries such as these:

In contrast, an ALUA-based connection to LUNs shown below on an MD3600i that has new enough firmware to support ALUA and using an appropriate client that also supports ALUA and has a properly configured entry in the /etc/multipath.conf file will instead show the IOSHIP connection mechanism (see p. 124 of this IBM System Storage manual for more on I/O Shipping):

The even better news is that not only is ALUA now functional in XenServer 6.5 but should, in fact, work now with a large number of ALUA-capable storage arrays, both with custom configuration needs as well as potentially many that may work generically. Another surprising find was that for the MD3600i arrays tested, it turns out that even the “stock” version of the MD36xxi multipath configuration entry provided with XenServer 6.5 creates ALUA connections. The reason for this is that the hardware handler is being used consistently, provided no specific profile overrides are intercepted, and so primarily the storage device is doing the negotiation itself instead of being driven by the file-based configuration. This is what made the determination of ALUA connectivity more difficult, namely that the TPGS setting was never changed from zero and could consequently not be used to query for the group settings.

CONCLUSIONS

First off, it is really nice to know now that many modern storage devices support ALUA and that XenServer 6.5 now provides an easier means to leverage this protocol. It is also a lesson that documentation can be either hard to find and in some cases, is in need of being updated to reflect the current state. Individual vendors will generally provide specific instructions regarding iSCSI connectivity, and should of course be followed. Experimentation is best carried out on non-production servers where a major faux pas will not have catastrophic consequences.

To me, this was also a lesson in persistence as well as an opportunity to share the curiosity and knowledge among a number of individuals who were helpful throughout this process. Above all, among many who deserve thanks, I would like to thank in particular Justin Bovee from Dell and Robert Breker of Citrix for numerous valuable conversations and information exchanges.

It's coming up on time for OpenStack Summit Vancouver where OpenStack developers and administrators will come together to discuss what it means and takes to run a successful cloud based on OpenStack technologies. As in past Summits, there will be a realistic focus on KVM based deployments due to KVM, or more precisely libvirt, having "Group A" status within the compute driver test matrix. XenServer currently has "Group B" status, and when you note that the distinction between A and B really boils down to which can gate a commit, there is no logical reason why XenServer shouldn't be a more prevalent option.

Having XenServer be thought of as completely appropriate for OpenStack deployments is something I'm looking to increase, and I'm asking for your help. The OpenStack Summit organizers want to ensure the content matches the needs of the community. In order to help ensure this, they invite their community to vote on the potential merit of all proposals. This is pretty cool since it helps ensure that the audience gets what they want, but it also makes it a bit harder if you're not part of the "mainstream". That's where I reach out to you in the XenServer community. If you're interested in seeing XenServer have greater mindshare within OpenStack, then please vote for one or both of my submissions. If your personal preference is for another cloud solution, I hope that you agree with me that increasing our install base strengthens both our community and XenServer, and will still take the time to vote. Note that you may be required to create an account, and that voting closes on February 23rd.

Now that Creedence has shipped as XenServer 6.5, and we've even addressed some early issues with hotfixes (in record time no less), it was time to give xenserver.org a bit of an update as well. All of the content you've known to be on xenserver.org is still here, but this face lift is the first in a series of changes you'll see coming over the next few months.

Our Role

The role of xenserver.org will be shifting slightly from what we did in 2014 with an objective that by the end of 2015 it is the portal virtualization administrators use to find the information they need to be successful with XenServer. That's everything from development blogs, pre-release information, but also deeper technical content. Not everything will be hosted on xenserver.org, but we'll be looking for the most complete and accurate content available. Recognizing that commercial support is a critical requirement for production use of any technology, if we list a solution we'll also state clearly if its use is commercially supportable by Citrix or whether it could invalidate your support contract. In the end, this about successfully running a XenServer environment, so some practices presented might not be "officially sanctioned" and tested to the same level as commercially supported features, but are known by the community to work.

Community Content

The new xenserver.org will also have prominent community content. By its very nature, XenServer lives in a data center ecosystem populated by third party solutions. Some of those solutions are commercial in nature, and because commercial solutions should always retain "supported environment" status for a product, we've categorized them all under the "Citrix Ready" banner. Details on Citrix Ready requirements can be found on their FAQ page. Other solutions can be found within open source projects. We on the XenServer team are active in many, and we're consolidating information you'll need to be successful with various projects under the "Community" banner.

Commercial Content

We've always promoted commercial support on xenserver.org, and that's not changing. If anything, you'll see us bias a bit more towards promoting both support and some of the premium features within XenServer. After-all there is only one XenServer and the only difference between the installer you get from xenserver.org and from citrix.com is the EULA. Once you apply a commercial license, or use XenServer as part of an entitlement within XenDesktop, you are bound by the same commercial EULA regardless of where the installation media originated.

Contributing Content

Public content contributions to xenserver.org have always been welcome, and with our new focus on technical information to run a successful XenServer installation, we're actively seeking more content. This could be in the form of article or blog submissions, but I'm willing to bet the most efficient way will be just letting us know about content you discover. If you find something, tweet it to me @XenServerArmy and we'll take a look at the content. If it is something we can use, we'll write a summary blog or article and link to it. Of course before that can happen we'll need to verify if the content could create an unsupported configuration and warn users upfront if it does.

What kind of content are we looking for? That's simple, anything you find useful to manage your XenServer installation. It doesn't matter how big or small that might be, or what tooling you have in place, if it helps you to be productive, we think that's valuable stuff for the community at large.

Today the entire XenServer team is very proud to announce that Creedence has officially been released as XenServer 6.5. It is available for download from xenserver.org, and is recommended for all new XenServer installs. We're so confident in what has been produced that I'm encouraging all XenServer 6.2 users to upgrade at their earliest convenience. So what have we actually accomplished?

The headline features

Every product release I've ever done, and there have been quite a large number over the years, has had some headline features; but Creedence is a bit different. Creedence wasn't about new features, and Creedence wasn't about chasing some perceived competitor. Creedence very much was about getting the details right for XenServer. It was about creating a very solid platform upon which anyone can comfortably, and successfully, build a virtualized data center regardless of workload. Creedence consisted of a lot of mundane improvements whose combination made for one seriously incredible outcome; Creedence restored the credibility of XenServer within the entire virtualization community. We even made up some t-shirts that the cool kids want ;)

So let's look at some of those mundane improvements, and see just how significant they really are.

64 bit dom0 freed us from the limitations of dreaded Linux low memory, but also allows us to use modern drivers and work better with modern servers. From personal experience, when I took alpha.2 and installed it on some of my test Dell servers, it automatically detected my hardware RAID without my having to jump through any driver disk hoops. That was huge for me.

The move to a 3.10 kernel from kernel.org meant that we were out of the business of having a completely custom kernel and corresponding patch queue. Upstream is goodness.

The move to the Xen Project hypervisor 4.4 meant that we're now consuming the most stable version of the core hypervisor available to us.

We've updated to an ovs 2.10 virtual switch giving us improved network stability when the virtual switch is under serious pressure. While we introduced the ovs way back in December of 2010, there remained cases where the legacy Linux bridge worked best. With Creedence, those situations should be very few and far between

Network datapath optimizations allow us to drive line rate for 10Gbps NICs, and we're doing pretty well with 40Gbps NICs.

Storage was improved through an update to tapdisk3, and the team did a fantastic job of engaging with the community to provide performance details. Overall we've seen very significant improvements in aggregate disk throughput, and when you're virtualizing it's the aggregate which matters more than the single VM case.

What this really means for you is that XenServer 6.5 has a ton more headroom than 6.2 ever did. If you happen to be on even older versions, you'll likely find that while 6.5 looks familiar, it's not quite like any other XenServer you've seen. As has been said multiple times in blog comments, and by multiple people, this is going to be the best release ever. In his blog, Steve Wilson has a few performance graphs to share for those doubters.

The future

While today we've officially released Creedence, much more work remains. There is a backlog of items we really want to accomplish, and you've already provided a pretty long list of features for us to figure out how to make. The next project will be unveiled very soon, and you can count on having access to it early and being able to provide feedback just as the thousands of pre-release participants did for Creedence. Creedence is very much a success of the community as it is an engineering success.

Over the year end break, there were a couple of posts to the list which asked a very important question: "Does the DVSC work with the Release Candidate?" The answer was a resounding "maybe", and this post is intended to help clarify some of the distinction between what you get from xenserver.org, what you get from citrix.com, and how everything is related.

At this point most of us are already familiar with XenServer virtualization being "open source", and that with XenServer 6.2 there was no functional difference between the binary you could download from citrix.com and that from xenserver.org. Logically, when we started the Creedence pre-release program, many assumed that the same download would exist in both locations, and that everything which might be part of a "XenServer" would also always be open source. That would be really cool for many people, and potentially problematic for others.

The astute follower of XenServer technology might also have noticed that several things commonly associated with the XenServer product never had their source released. StorageLink is a perfect example of this. Others will have noticed that the XenServer Tech Preview run on citrix.com included a number of items which weren't present in any of the xenserver.org pre-release builds, and for which the sources aren't listed on xenserver.org. There is of course an easy explanation for this, but it goes to the heart of what we're trying to do with xenserver.org.

xenserver.org is first and foremost about the XenServer platform. Everyone associated with xenserver.org, and by extension the entire team, would love for the data centers of the world to standardize on this platform. The core platform deliverable is called main.iso, and that's the thing from which you install a XenServer host. The source for main.iso is readily available, and other than EULA differences, the XenServer host will look and behave identically regardless of whether main.iso came from xenserver.org or citrix.com. The beauty of this model is that when you grow your XenServer based data center to the point where commercial support makes sense, the software platform you'd want supported is the same.

All of which gets me back to the DVSC (and other similar components). DVSC, StorageLink and certain other "features" include source code which Citrix has access to under license. Citrix provides early access to these feature components to those with a commercial relationship. Because there is no concept of a commercial relationship with xenserver.org, we can't provide early access to anything which isn't part of the core platform. Now of course we do very much want everyone to obtain the same XenServer software from both locations, so when an official release occurs, we mirror it for your convenience.

I hope this longish explanation helps clarify why when questions arise about "features" not present in main.iso that the response isn't as detailed as some might like. It should also help explain why components obtained from prior "Tech Preview" releases might not work with newer platform builds obtained as part of a public pre-release program.

Over the past few weeks, and particularly as part of the Creedence World Tour, I've been getting questions about precisely when Creedence will be released. To the best of my ability, I've tried to take those questions head on, but the reality is we haven't been transparent about what happens when we release XenServer, and that's part of the problem. I'm going to try and address some of that in this post.

Now before I get into too much detail, it's important to note that XenServer is a packaged product which Citrix sells, and which is also available freely as open source. Citrix is a public company, so there is often a ton more detail I have, but which isn't appropriate for public disclosure. A perfect case in point is the release date. Since conceivably someone could change a PO based on this information, disclosing that can impact revenue and, well, I like my pay-cheque so I hope you understand when I'm quiet on some things.

So back to the question of what happens during finalization of a release, and how that can create a void. The first thing we do is take a look at all the defects coming in from all sources; with bugs.xenserver.org being one of many. We look at the nature of any open issues and determine what the potential for us to have a bad release from them. Next we create internal training to be delivered to the product support teams. These two tasks typically occur with either a final beta, or first release candidate. Concurrent to much of this work is finalization of documentation, and defining the performance envelope of the release. With each release, we have a "configuration limits" document, and the contents of that document represent both what Citrix is willing to deliver support on and what constitutes the limits of a stable XenServer configuration. For practical purposes, many of you have pushed Creedence in some way beyond what we might be comfortable defining as a "long term stable configuration", so its entirely possible the final performance could differ from what you've experienced so far.

Those are the technical bits behind the release, but this is also something which needs to be sold and that means we need to prepare for that as well. In the context of XenServer, selling means both why XenServer is great with XenDesktop, but also why it's great for anyone who is tired of paying more for their core virtualization requirements than really necessary. Given how many areas of data center operations XenServer touches, and the magnitude of the changes in Creedence, getting this right is critical. Then of course there is all the marketing collateral, and you get a sense of how much work is involved in getting XenServer out the door.

Of course, it can be argued that much of this "readiness" stuff could be handled in parallel, and for another project you'd be right. The reality is XenServer has had its share of releases which should've had a bit more bake time. I hope you agree with me that Creedence is better because we haven't rushed it, and that with Creedence we have a solid platform upon which to build. So with that in mind, I'll leave you with it hopefully being obvious that we intend to make a big splash with Creedence. Such a splash can't occur if we release during a typical IT lockdown period, and will need a bit larger stage than the one I'm currently on.

A very big thank you for everyone who participated in the Creedence Alpha/Beta programme! The programme was very successful and raised a total of 177 issues, of which 138 were resolved during the Alpha/Beta period. We are reviewing how the pre-release process can be improved and streamlined going forward.

The Creedence Alpha/Beta programme has now come to an end with the focus of nightly snapshots moving on to the next version of XenServer.

Next week the world of server virtualization and cloud will turn its attention to the Moscone Center in San Fransisco and VMworld 2014 to see what VMware has planned for its offerings in 2015. As the leader in closed source virtualization refines its "No Limits" message; I wish my friends, and former colleagues, now at VMware a very successful event. If you're attending VMworld, I also wish you a successful event, and hope that you'll find in VMware what you're looking for. I personally won't be at VMworld this year, and while I'll miss opportunities to see what VMware has planned to push vSphere forward, how VMware NSX for multi-hypervisors is evolving, and whether they're expanding support for XenServer in vCloud Automation Center, I'll be working hard ensuring that XenServer Creedence delivers clear value to its community. Of course, I'll probably have a live stream of the keynotes; but that's not quite the same ;)

If you're attending VMworld and have an interest in seeing an open source choice in a VMware environment, I hope you'll take the time to ask the various vendors about XenServer; and most importantly to encourage VMware to continue supporting XenServer in some of its strategic products. No one solution can ever hope to satisfy everyone's needs and choice is an important thing. So while you're benefiting from the efforts VMware has put into informing and supporting their community, I hope they realize that with choice everyone is stronger, and embracing other communities only benefits the user.

Overview

In this blog post, I introduce a new feature of XenServer Creedence alpha.4, in-memory read caching, the technical details, the benefits it can provide, and how best to use it.

Technical Details

A common way of using XenServer is to have an OS image, which I will call the golden image, and many clones of this image, which I will call leaf images. XenServer implements cheap clones by linking images together in the form of a tree. When the VM accesses a sector in the disk, if a sector has been written into the leaf image, this data is retrieved from that image. Otherwise, the tree is traversed and data is retrieved from a parent image (in this case, the golden image). All writes go into the leaf image. Astute readers will notice that no writes ever hit the golden image. This has an important implication and allows read caching to be implemented.

tapdisk is the storage component in dom0 which handles requests from VMs (see here for many more details). For safety reasons, tapdisk opens the underlying VHD files with the O_DIRECT flag. The O_DIRECT flag ensures that dom0's page cache is never used; i.e. all reads come directly from disk and all writes wait until the data has hit the disk (at least as far as the operating system can tell, the data may still be in a hardware buffer). This allows XenServer to be robust in the face of power failures or crashes. Picture a situation where a user saves a photo and the VM flushes the data to its virtual disk which tapdisk handles and writes to the physical disk. If this write goes into the page cache as a dirty page and then a power failure occurs, the contract between tapdisk and the VM is broken since data has been lost. Using the O_DIRECT flag allows this situation to be avoided and means that once tapdisk has handled a write for a VM, the data is actually on disk.

Because no data is ever written to the golden image, we don't need to maintain the safety property mentioned previously. For this reason, tapdisk can elide the O_DIRECT flag when opening a read-only image. This allows the operating system's page cache to be used which can improve performance in a number of ways:

The number of physical disk I/O operations is reduced (as a direct consequence of using a cache).

Latency is improved since the data path is shorter if data does not need to be read from disk.

Throughput is improved since the disk bottleneck is removed.

One of our goals for this feature was that it should have no drawbacks when enabled. An effect which we noticed initially was that data appeared to be read twice from disk which increases the number of I/O operations in the case where data is only read once from the VM. After a little debugging, we found that disabling O_DIRECT causes the kernel to automatically turn on readahead. Because data access pattern of a VM's disk tends to be quite random, this had a detrimental effect on the overall number of read operations. To fix this, we made use of a POSIX feature, posix_fadvise, which allows an application to inform the kernel how it plans to use a file. In this case, tapdisk tells the kernel that access will be random using the POSIX_FADV_RANDOM flag. The kernel responds to this by disabling readahead, and the number of read operations drops to the expected value (the same as when O_DIRECT is enabled).

Administration

Because of difficulties maintaining cache consistency across multiple hosts in a pool for storage operations, read caching can only be used with file-based SRs; i.e. EXT and NFS SRs. For these SRs, it is enabled by default. There shouldn't be any performance problems associated with this; however, if necessary, it is possible to disable read caching for an SR:

xe sr-param-set uuid=<UUID> other-config:o_direct=true

You may wonder how read caching differs from IntelliCache. The major difference is that IntelliCache works by caching reads from the network onto a local disk while in-memory read caching caches reads from either into memory. The advantage of in-memory read caching is that memory is still an order of magnitude faster than an SSD so performance in bootstorms and other heavy I/O situations should be improved. It is possible for them both to be enabled simultaneously; in this case reads from the network are cached by IntelliCache to a local disk and reads from that local disk are cached in memory with read caching. It is still advantageous to have IntelliCache turned on in this situation because the amount of available memory in dom0 may not be enough to cache the entire working set and reading the remainder from local storage is quicker than reading over the network. IntelliCache further reduces the load on shared storage when using VMs with disks that are not persistent across reboots by only writing to the local disk, not the shared storage.

Talking of available memory, XenServer admins should note that to make best use of read caching, the amount of dom0 memory may need to be increased. Ideally the amount of dom0 memory would be increased to the size of the golden image so that once cached, no more reads hit the disk. In case this is not possible, an approach to take would be to temporarily increase the amount of dom0 memory to the size of the golden image, boot up a VM and open the various applications typically used, determine how much dom0 memory is still free, and then reduce dom0's memory by this amount.

Performance Evaluation

Enough talk, let's see some graphs!

In this first graph, we look at the number of bytes read over the network when booting a number of VMs on an NFS SR in parallel. Notice how without read caching, the number of bytes read scales proportionately with the number of VMs booted which checks out since each VM's reads go directly to the disk. When O_DIRECT is removed, the number of bytes read remains constant regardless of the number of VMs booted in parallel. Clearly the in-memory caching is working!

How does this translate to improvements in boot time? The short answer: see the graph! The longer answer is that it depends on many factors. In the graph, we can see that there is little difference in boot time when booting less than 4 VMs in parallel because the NFS server is able to handle that much traffic concurrently. As the number of VMs increases, the NFS server becomes saturated and the difference in boot time becomes dramatic. It is clear that for this setup, booting many VMs is I/O-limited so read caching makes a big difference. Finally, you may wonder why the boot time per VM increases slowly as the number of VMs increases when read caching is enabled. Since the disk is no longer a bottleneck, it appears that some other bottleneck has been revealed, probably CPU contention. In other words, we have transformed an I/O-limited bootstorm into a CPU-limited one! This improvement in boot times would be particularly useful for VDI deployments where booting many instances of the same VM is a frequent occurrence.

Conclusions

In this blog post, we've seen that in-memory read caching can improve performance in read I/O-limited situations substantially without requiring new hardware, compromising reliability, or requiring much in the way of administration.

As future work to improve in-memory read caching further, we'd like to remove the limitation that it can only use dom0's memory. Instead, we'd like to be able to use the host's entire free memory. This is far more flexible than the current implementation and would remove any need to tweak dom0's memory.

What is Scientific Linux?

In short, Scientific Linux is an customized RedHat/CentOS Linux distribution provided by CERN and Fermilab: popular in educational institutions as well as laboratory environments. More can be read about Scientific Linux here: https://www.scientificlinux.org/

From my own long-term testing - before XenServer 6.2 and our pre-release/Alpha - Creedence - I have ran both Scientific Linux 5 and Scientific Linux 6 without issues. This article's scope is to show how one can install Scientific Linux and, more specifically, ensure the XenTools Guest Additions for Linux are installed as these do not require any form of "Xen-ified" kernel.

XenServer and Creedence

The following are my own recommendations to run Scientific Linux in XenServer:

I recommend using XenServer 6.1 through any of the Alpha releases due to improvements with XenTools

I recommend using Scientific Linux 5 or Scientific Linux 6

The XenServer VM Template one will need to use will either be of CentOS 5 or CentOS 6: 32 or 64 bit depends on the release of Scientific Linux you will be using

Scientific Linux 5 or 6 Guest VM Installation

With XenCenter, the process of installing Scientific Linux 5.x or Scientific Linux 6 uses the same principles. You need to create a new VM, select the appropriate CentOS template, and define the VM parameters for disk, RAM, and networking:

1. In XenCenter, select "New VM":

2. When prompted for the new VM Template, select the appropriate CentOS-based template (5 or 6, 32 or 64 bit):

4. From the console, follow the steps as to install Scientific Linux 5 or 6 based on your preferences.

5. After rebooting, login as root and execute the following command within the Guest VM:

yum update

6. Once yum has applied any updates, reboot the Scientific Linux 5 or 6 Guest VM by executing the following within the Guest VM:

reboot

7. With the Guest VM back up, login as root and mount the xs-tools.iso within XenCenter:

7. From the command line, execute the following commands to mount xs-tools.iso within the Guest VM as well as to run the install.sh utility:

cd ~mkdir toolsmount /dev/xvdd tools/cd tools/Linux/./install.sh

8. With Scientific Linux 5 you will be prompted to install the XenTools Guest Additions - select yes and when complete, reboot the VM:

reboot

9. With Scientific Linux 6 you will notice the following output:

Fatal Error: Failed to determine Linux distribution and version.

10. This is not a Fatal Error, but an error induced because the distro build and revision are not presented as expected. This means that you will manually need to install the XenTools Guest Additions by executing the following commands and rebooting:

rpm -ivh xe-guest-utilities-xenstore-<version number here.x86_64.rpm

rpm -ivh xe-guest-utilities-<version number here>.x86_64.rpm

reboot

Finally after the last reboot (post guest addition install) one will notice from XenCenter that the network address, stats, and so forth are available (including the ability to migrate the VM):

I hope this article helps any of you out there and feedback is always welcomed!

XenServer.next Alpha Available

The XenServer engineering team is pleased to announce the availability an alpha of the next release of XenServer, code named “Creedence”. XenServer Creedence is intended to represent the latest capabilities in XenServer with a target release date determined by feature completeness. Several key areas have been improved over XenServer 6.2, and singificantly we have also introduced a 64 bit control domain architecture and updated the Xen Project hypervisor to version 4.4. Due to these changes, we are requesting tests using this alpha be limited to core functionality such as the installation process and basic operations like VM creation, start and stop. Performance and scalability tests should be deferred until a later build is nominated to alpha or beta status.

This is pre-release code and as such isn’t appropriate for production use, and is unlikely to function properly with provisioning solutions such as Citrix XenDesktop and Citrix CloudPlatform. It is expected that users of Citrix XenDesktop and Citrix CloudPlatform will be able to begin testing Creedence within the XenServer Tech Preview time-frame announced at Citrix Synergy. In preparation for the Tech Preview, all XenServer users, including those running XenDesktop, are encouraged to validate if Creedence is able to successfully install on their chosen hardware.

Key Questions

When does the alpha period start?

The alpha period starts on May 19th 2014

When does the alpha period end?

There is no pre-defined end to the alpha period. Instead, we’re providing access to nightly builds and from those nightly builds we’ll periodically promote builds to “alpha.x” status. The promotion will occur as key features are incorporated and stability targets are reached. As we progress the alpha period will naturally transition into a beta or Tech Preview stage ultimately ending with a XenServer release. Announcements will be made on xenserver.org when a new build is promoted.

Where do I get the build?

If I encounter a defect, how do I enter it?

Defects and incidents are expected with this alpha, and they can be entered at https://bugs.xenserver.org. Users wishing to submit or report issues are advised to review our submission guidelines to ensure they are collecting enough information for us to resolve any issues.

Where can I find more information on Creedence?

We are pleased to announce a public wiki has been created at https://wiki.xenserver.org to contain key architectural information about XenServer; including details about Creedence.

How do I report compatibility information?

The defect system offers Hardware and Vendor compatibility projects to collect information about your environment. Please report both successes and failures for our review.

What about upgrades?

The alpha will not upgrade any previous version of XenServer, including nightly builds from trunk, and there should be no expectation the alpha can be upgraded.

Do I need a new XenCenter?

Yes, XenCenter has been updated to work with the alpha and can be installed from the installation ISO.

Will I need a new SDK?

If you are integrating with XenServer, the SDK has also been updated. Please obtain the SDK for the alpha from the download page.

Where can I ask questions?

Since the Creedence alpha is being posted to and managed by the xenserver.org team, questions asked on Citrix Support Forums are likely to go unanswered. Those forums are intended for released and supported versions of XenServer. Instead we are inviting questions on the xs-devel mailing list, and via twitter to @XenServerArmy. In order to post questions, you will need to subscribe to the mailing list which can be done here: http://xenserver.org/discuss-virtualization/mailing-lists.html. Please note that the xs-devel mailing list is monitored by the engineering team, but really isn’t intended as a general support mechanism. If your question is more general purpose and would apply to any XenServer version, please validate if the issue being experienced is also present with XenServer 6.2 and if so ask the question on the Citrix support forums. We've also created some guidelines for submitting incidents.

On April 7th, 2014 a security vulnerability in the OpenSSL library was disclosed, and was given the monkier of "HeartBleed". This vulnerability has received a ton of press, and there is a very nice summary of what this all means on heartbleed.com. Since XenServer includes the OpenSSL libraries, there was the potential it could be impacted as well. The good news for anyone using a released version of XenServer is that all supported versions of XenServer use version 0.9.8. So if you have XenServer in production, you can have confidence in that XenServer deployment.

Of course, since XenServer is open source, there are other ways to deploy XenServer than using a released version. The first is to either build from sources or to take xenserver-core and install it on your preferred Linux distribution. If that was your path to creating a XenServer deployment, then you will need to double check if your dom0 distribution is at risk. The second way would be to install XenServer from a nightly snapshot. The bad news is that these nightly snapshots do include a vulnerable version of OpenSSL, but we're working on it. Now of course those snapshots aren't considered production ready, and aren't eligible for support from Citrix, but we all know they could be in labs someplace and still should be checked.

Of course, the action certainly didn’t stop there. Not only were the Windows PV drivers open-sourced, but Paul, Ben and Owen completely overhauled them to make them compatible with upstream xen. Previously the drivers relied on a customisation contained within the xenserver patch queue. Now the drivers should work well, on every xen system.

Virtualising graphics... the right way

In another exciting development, Paul's work on creating multiple device emulators for HVM guests enabled safe sharing of physical GPUs among VMs, a feature we call vGPU. Just as xen allows its components to be isolated in separate VM containers (known as dom0 disaggregation), it’s exciting to see the isolation being taken to the level of individual virtual PCI devices. (I’m hoping to try writing my own virtual PCI device sometime in 2014)

User interfaces

Continuing with the Windows theme, at the top of the xenserver stack, the XenCenter interface has received several great usability enhancements. It has been redesigned to simplify the user experience for navigation between different views of resources and for viewing different types of notifications. This was all thanks to the hard work of Tina (expect another blog on this subject soon!)

Scaling up

2013 was also a great year for xenserver scalability. It’s quite a challenge making a system as complex as xenserver scale well: you have to have a deep understanding of the whole system in order to find -- and fix -- all the important bottlenecks. Thanks to the laser-like focus of Felipe, the storage datapath has been extensively analysed and understood. Meanwhile large increases in basic system resources such as David’s new event channel ABI, reducing the number of grant references needed by disabling receive-side copy and absorbing upstream xen goodness such as Wei’s patch to use poll(2) in consoled have led to big improvements in VM density.

XenServer: the distro

The xenserver distro is the foundation upon which everything else is -- literally -- based. Anyone who has downloaded one of the regular development snapshot builds (thanks to Craig and Peter for organising those) should have noticed that it has been recently rebased on top of CentOS 6.4 with a shiny new Linux 3.x kernel and xen 4.3. This means that we have access to new hardware drivers, access to more modern tools (e.g. newer versions of python) and lots of other great stuff.

(No-one likes) patch queues

Speaking of the distro, I have to mention the “patch queue problem”. Patch queues are a sequence of source code customisations applied to an “upstream” (e.g. the official xen-4.3 release) to produce the version we actually use. Patch queues are important tools for distro builders. They can be used for good (e.g. backporting important security fixes) and for evil (e.g. forward-porting stuff that shouldn’t exist: “technical debt” in its most concrete form). Every time a new upstream release comes out, the patch queue needs careful rebasing against the new release -- this can be very time-consuming. In recent years, the xenserver xen patch queue had grown to such a large size that it was almost blocking us from moving xenserver to more recent versions of xen. I’m happy to report that the past year has seen heroic efforts from Andy, Malcolm and David to reduce it to more manageable levels. Andy tells me that while it took more than 1 year (!) to rebase and fix xenserver from xen 3.4 to 4.1; and then -- a still surprising -- 3 months to get from 4.1 to 4.2; it recently only took 3 days to rebase from 4.2 to 4.3! Phew!

Build and packaging

Our goal is to get to a world where the xenserver.iso is simply a respin of a base (CentOS) distro with an extra repo of packages and overrides on top. Therefore in 2013 we made a concerted effort to clean up our xenserver distro build and packaging more generally. Thanks to Euan, Jon and Frediano we're now using standard build tools like mock and rpmbuild. In the past we cut corners by either leaving files unpackaged (bad) or applying large patch queues in the packages (terrible, as we’ve seen already). To help sort this out, Euan created a set of experimental RPM and .deb packages for the toolstack, shook out the bugs and forced us to fix things properly. As a result we’ve found and fixed lots of portability problems in the upstream software (e.g. hard-coded CentOS paths which break on Debian), which should make the lives of other distro package maintainers easier.

Last, but certainly not least, xenserver gained many, many bug-fixes making it into an even-more robust platform to which you can trust your infrastructure. Working on xenserver in 2013 was really fun and I’m looking forward to (the rest of) 2014!

In a previous article, I described how dom0 event channels can cause a hard limitation on VM density scalability.

Event channels were just one hard limit the XenServer engineering team needed to overcome to allow XenServer 6.2 to support up to 500 Windows VMs or 650 Linux VMs on a single host.

In my talk at the 2013 Xen Developer Summit towards the end of October, I spoke about a further six hard limits and some soft limits that we overcame along the way to achieving this goal. This blog article summarises that journey.

Firstly, I'll explain what I mean by hard and soft VM density limits. A hard limit is where you can run a certain number of VMs without any trouble, but you are unable to run one more. Hard limits arise when there is some finite, unsharable resource that each VM consumes a bit of. On the other hand, a soft limit is where performance degrades with every additional VM you have running; there will be a point at which it's impractical to run more than a certain number of VMs because they will be unusable in some sense. Soft limits arise when there is a shared resource that all VMs must compete for, such as CPU time.

Here is a run-down of all seven hard limits, how we mitigated them in XenServer 6.2, and how we might be able to push them even further back in future:

Cause of limitation: blktap2 only supports up to 1,024 minor numbers, caused by #define MAX_BLKTAP_DEVICE in blktap.h.

Mitigation for XenServer 6.2: We doubled that constant to allow up to 2,048 devices.

Mitigation for future: Move away from blktap2 altogether?

aio requests in dom0

Cause of limitation: Each blktap2 instance creates an asynchronous I/O context for receiving 402 events; the default system-wide number of aio requests (fs.aio-max-nr) was 444,416 in XenServer 6.1.

Mitigation for XenServer 6.2: We set fs.aio-max-nr to 1,048,576.

Mitigation for future: Increase this parameter yet further. It's not clear whether there's a ceiling, but it looks like this would be okay.

dom0 grant references

Cause of limitation: Windows VMs used receive-side copy (RSC) by default in XenServer 6.1. In netbk_p1_setup, netback allocates 22 grant-table entries per virtual interface for RSC. But dom0 only had a total of 8,192 grant-table entries in XenServer 6.1.

Mitigation for XenServer 6.2: We could have increased the size of the grant-table, but for other reasons RSC is no longer the default for Windows VMs in XenServer 6.2, so this limitation no longer applies.

Mitigation for future: Continue to leave RSC disabled by default.

Connections to xenstored

Cause of limitation: xenstored uses select(2), which can only listen on up to 1,024 file descriptors; qemu opens 3 file descriptors to xenstored.

Mitigation for future: We could modify xenstored to accept more connections, but in the future we expect to be using upstream qemu, which doesn't connect to xenstored, so it's unlikely that xenstored will run out of connections.

Okay, so what does this all mean in terms of how many VMs you can run on a host? Well, since some of the limits concern your VM configuration, it depends on the type of VM you have in mind.

Let's take the example of Windows VMs with PV drivers, each with 1 vCPU, 3 disks and 1 network interface. Here are the number of those VMs you'd have to run on a host in order to hit each limitation:

Limitation

XS 6.1

XS 6.2

Future

dom0 event channels

150

570

no limit

blktap minor numbers

341

682

no limit

aio requests

368

869

no limit

dom0 grant references

372

no limit

no limit

xenstored connections

333

500

no limit

consoled connections

no limit

no limit

no limit

dom0 low memory

650

650

no limit

The first limit you'd arrive at in each release is highlighted. So the overall limit is event channels in XenServer 6.1, limiting us to 150 of these VMs. In XenServer 6.2, it's the number of xenstore connections that limits us to 500 VMs per host. In the future, none of these limits will hit us, but there will surely be an eighth limit when running many more than 500 VMs on a host.

What about Linux guests? Here's where we stand for paravirtualised Linux VMs each with 1 vCPU, 1 disk and 1 network interface:

Limitation

XS 6.1

XS 6.2

Future

dom0 event channels

225

1000

no limit

blktap minor numbers

1024

2048

no limit

aio requests

368

869

no limit

dom0 grant references

no limit

no limit

no limit

xenstored connections

no limit

no limit

no limit

consoled connections

341

no limit

no limit

dom0 low memory

650

650

no limit

This explains why the supported limit for Linux guests can be as high as 650 in XenServer 6.2. Again, in the future, we'll likely be limited by something else above 650 VMs.

What about the soft limits?

After having pushed the hard limits such a long way out, we then needed to turn our attention towards ensuring that there weren't any soft limits that would make it infeasible to run a large number of VMs in practice.

Testimonial

"Our job is to accommodate all the faculties’ needs as much as possible so we needed to find a solution that could support a large number of applications as well as save storage space and staff resources. This is where Citrix stepped in."

Jose ChanHead of IT DepartmentMacau Polytechnic Institute

"We expect that total costs for server infrastructure will be reduced by more than 35% because of XenServer."