PCI Pass-Through on XenServer 7.0

Plenty of people have asked me over the years how to pass through generic PCI devices to virtual machines running on XenServer. Whilst it isn't officially supported by Citrix, it's nonetheless perfectly possible to do; just note that your mileage may vary, because clearly it's not rigorously tested with all the different types of device people might want to pass through (from TV cards, to storage controllers, to USB hubs...!).

The process on XenServer 7.0 differs somewhat from previous releases, in that the Dom0 control domain is now CentOS 7.0-based, and UEFI boot (in addition to BIOS boot) is supported. Hence, I thought it would be worth writing up the latest instructions, for those who are feeling adventurous.

Of course, XenServer officially supports pass-through of GPUs to both Windows and Linux VMs, hence this territory isn't as uncharted as it might first appear: pass-through in itself is fine. Any wrinkles will be to do with the particular piece of hardware in question.

A Short Introduction to PCI Pass-Through

Firstly, a little primer on what we're trying to do.

Your host will have a PCI bus, with multiple devices hosted on it, each with its own unique ID on the bus (more on that later; just remember this as "B:D.f"). In addition, each device has a globally unique vendor ID and device ID, which allows the operating system to look up what its human-readable name is in the PCI IDs database text file on the system. For example, vendor ID 10de corresponds to the NVIDIA Corporation, and device ID 11b4 corresponds to the Quadro K4200. Each device can then (optionally) have multiple sub-vendor and sub-device IDs, e.g. if an OEM has its own branded version of a supplier's component.
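To make those identifiers concrete, here's a small sketch of how you might pull the B:D.f and the vendor:device pair out of one line of `lspci -nn` output. The sample line below is a stand-in for real output (a Quadro K4200, matching the example above); the exact description text will vary by system:

```shell
# A captured sample of `lspci -nn` output (stand-in for the real thing):
line='04:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK104GL [Quadro K4200] [10de:11b4]'

# B:D.f is the first whitespace-delimited field:
bdf=${line%% *}

# The vendor:device pair is the last [xxxx:xxxx] bracket on the line
# (the [0300] class code doesn't match, as it has no colon):
ids=$(echo "$line" | grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}\]' | tail -1 | tr -d '[]')

echo "$bdf $ids"   # prints: 04:00.0 10de:11b4
```

On a real host you'd simply run `lspci -nn` and read the values off directly; the parsing above is just to show where each identifier sits in the output.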

Normally, XenServer's control domain, Dom0, is given all PCI devices by the Xen hypervisor. Drivers in the Linux kernel running in Dom0 each bind to particular PCI device IDs, and thus make the hardware actually do something. XenServer then provides synthetic devices (emulated or para-virtualised) such as SCSI controllers and network cards to the virtual machines, passing the I/O through Dom0 and then out to the real hardware devices.

This is great, because it means the VMs never see the real hardware, and thus we can live migrate VMs around, or start them up on different physical machines, and the virtualised operating systems will be none the wiser.

If, however, we want to give a VM direct access to a piece of hardware, we need to do something different. The main reason one might want to do so is that the hardware in question isn't easy to virtualise, i.e. the hypervisor can't provide a synthetic device to a VM and then somehow "share out" the real hardware between those synthetic devices. This is the case for everything from an SSL offload card to a GPU.

Aside: Virtual Functions

There are three ways of sharing out a PCI device between VMs. The first is what XenServer does for network cards and storage controllers, where a synthetic device is given to the VM, but then the I/O streams can effectively be mixed together on the real device (e.g. it doesn't matter that traffic from multiple VMs is streamed out of the same physical network card: that's what will end up happening at a physical switch anyway). That's fine if it's I/O you're dealing with.

The second is to use software to share out the device. Effectively you have some kind of "manager" of the hardware device that is responsible for sharing it between multiple virtual machines, as is done with NVIDIA GRID GPU virtualisation, where each VM still ends up with a real slice of GPU hardware, but controlled by a process in Dom0.

The third is to virtualise at the hardware device level, and have a PCI device expose multiple virtual functions (VFs). Each VF provides some subset of the functionality of the device, isolated from other VFs at the hardware level. Several VMs can then each be given their own VF (using exactly the same mechanism as passing through an entire PCI device). A couple of examples are certain Intel network cards, and AMD's MxGPU technology.

OK, So How Do I Pass-Through a Device?

Step 1

Firstly, we have to stop any driver in Dom0 from claiming the device. In order to do that, we'll need to ascertain the ID of the device we're interested in passing through. We'll use B:D.f (Bus:Device.function) numbering to specify it, and then hide the device from Dom0 by adding a xen-pciback.hide entry to the host's boot configuration.

(In other words, we edit /boot/grub/grub.cfg, or, if you're booting using UEFI, /boot/efi/EFI/xenserver/grub.cfg instead!)
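As a concrete sketch of this step, assuming the device turns out to live at 04:00.0, and assuming your host has the xen-cmdline helper that XenServer ships for editing the boot configuration (worth double-checking the path on your own installation before relying on it), the sequence looks something like:

```shell
# Find the B:D.f of the device (example: searching for an NVIDIA card):
lspci -nn | grep -i nvidia

# Hide the device from Dom0 by adding xen-pciback.hide to the Dom0 kernel
# command line; xen-cmdline rewrites grub.cfg (BIOS or UEFI variant) for you.
# 04:00.0 is a placeholder here: substitute your own B:D.f.
/opt/xensource/libexec/xen-cmdline --set-dom0 "xen-pciback.hide=(04:00.0)"
```

These commands only make sense run as root in Dom0 on the XenServer host itself.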

Step 2

Reboot! At the moment, a driver in Dom0 probably still has hold of your device, hence you need to reboot the host to get it relinquished.

Step 3

The easy bit: tell the toolstack to assign the PCI device to the VM. Run:

xe vm-list

And note the UUID of the VM you're interested in, then:

xe vm-param-set other-config:pci=0/0000:<B:D.f> uuid=<vm uuid>

Where, of course, <B:D.f> is the ID of the device you found in step 1 (like 04:00.0), and <vm uuid> corresponds to the VM you care about.
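To make the format concrete, here's a tiny sketch that composes the full command for a hypothetical device and VM (both values below are placeholders, not real):

```shell
# Placeholder values for illustration only:
bdf='04:00.0'
vm_uuid='6a53d9d3-0000-0000-0000-000000000000'

# Note the literal 0/0000: prefix in front of the B:D.f:
cmd="xe vm-param-set other-config:pci=0/0000:${bdf} uuid=${vm_uuid}"
echo "$cmd"
```

If you need to hand the same VM more than one device (say, a GPU plus its companion audio function), the other-config:pci value can take a comma-separated list, e.g. 0/0000:04:00.0,0/0000:04:00.1 — though it's worth verifying that against your XenServer version.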

Step 4

Start your VM. At this point if you run lspci (or equivalent) within the VM, you should now see the device. However, that doesn't mean it will spring into life, because...

Step 5

Install a device driver for the piece of hardware you passed through. The operating system within the VM may already ship with a suitable device driver, but if not, you'll need to go and get the appropriate one from the device manufacturer. This will normally be the standard Linux/Windows/other one that you would use for a physical system; the only difference occurs when you're using a virtual function, where the VF driver is likely to be a special one.

Health Warnings

As indicated above, pass-through has advantages and disadvantages. You'll get direct access to the hardware (and hence, for some functions, higher performance), but you'll forgo luxuries such as the ability to live migrate the virtual machine around (there's state now sitting on real hardware, versus virtual devices), and the ability to use high availability for that VM (because HA doesn't take into account how many free PCI devices of the right sort you have in your resource pool).

In addition, not all PCI devices take well to being passed through, and not all servers like doing so (e.g. if you're extending the PCI bus in a blade system to an expansion module, this can sometimes cause problems). Your mileage may therefore vary.

If you do get stuck, head over to the XenServer discussion forums and people will try to help out, but just note that Citrix doesn't officially support generic PCI pass-through, hence you're in the hands of the (very knowledgeable) community.

Conclusion

Hopefully this has helped clear up how pass-through is done on XenServer 7.0; do comment and let us know how you're using pass-through in your environment, so that we can learn what people want to do, and think about what to officially support on XenServer in the future!

About the author

I head up the Product Management and Partner Engineering teams for XenServer at Citrix, spending lots of time talking to customers and engineering teams at OEMs, IHVs, and ISVs, as well as being responsible for XenServer's OpenStack engineering team. I work in the Cambridge, UK office, having previously spent some time in academia at the University of Cambridge Computer Laboratory. I'm interested in cloud computing, wireless networks, intelligent transport, network virtualisation, Christianity, Spanish, French, Dutch, Finnish, swimming, cycling, and orienteering. Oh, and my wife and I spend lots of time looking after our two small children! I can be found on Twitter as @DavidCottingham.

Comments

I have done this lots of times. I use a Z800 server that has onboard FireWire. I ported it into a VM for hooking up a camera so my kids could do stop-motion animation from a XenApp session. I also ported the USB into a VM for a licensing key fob. Wish all the cool stuff like this you can do was more publicly known.

I need to pass through a Quadro 5000 GPU to one of my VMs.
Typing lspci in the XenServer console gives me back two device IDs, one for the actual GPU and one for the built-in audio device.
Do I need to use this procedure for each device ID?

If you want both the audio and GPU devices given to your VM, then yes, you need to use the procedure once for each device.

However, if you're not looking to have the audio device passed through, then you _should_ be able to just pass-through the GPU device.

I think virtual functions for I/O devices like Intel 10+ Gbit cards should be supported by Citrix. I haven't measured how much performance it gains, but it should definitely help offload the CPU. And it is quite simple to implement: the option should be enabled if every server in the cluster has a supported card with virtual functions, and available VFs assigned automatically to VMs.

Understood: it would be a performance gain for at least some use cases, as you're getting raw access to the NIC. The downside is then that you no longer benefit from migration, ACLs on the OVS, and so on.

In terms of implementation, it's more about the testing: each hardware vendor will need to test/certify that pass-through of VFs works with various guest OSs. Certainly not impossible, just a large test matrix.

But yes, supporting pass-through of SR-IOV VFs is something I'm interested in doing. The more people who can tell me what they'd use it for, the more likely it is that it will go up the priority list ;-).

As I'm very much interested in AMD's MxGPU, I'd like to know whether PCI pass-through is the standard/only way MxGPU works. Is it planned to integrate the setup of GPU pass-through into XenCenter in the future, like it works for NVIDIA GRID or Intel Iris Pro? (create new VM --> GPU tab --> select GPU)

If this is the standard way, how can AMD provide high availability with this solution? Also, the live migration feature would be nice.

Thanks for dealing with my questions!

Best regards,

Stefan

AMD MxGPU uses SR-IOV VFs, hence yes, it does use pass-through. I'm not aware of whether AMD offers HA capabilities with this, but remember that HA is there to automatically restart VMs that are dead (due to a host failure); hence, provided that a host has an appropriate GPU card in place, theoretically there's no technical problem with HA just starting up a new copy of the dead VM.

The only thing that would need to be done is to get the HA planner algorithm to point out to the user if they have enough GPUs in hosts in the pool to tolerate failures. Today, XenServer's HA planner doesn't have that capability.

Live migration is more difficult, because there's state in the GPU hardware that somehow needs to be migrated.

In terms of future plans, I can't comment on roadmap, but as you'd expect, we're working with AMD on GPU technologies in XenServer (as we already support pass-through of whole AMD GPUs today).

Hope this helps!

Cheers,

David.

Thank you very much for your explanation! Now it is clearer to me.

I'm really looking forward to seeing how things develop in XenServer's support of AMD MxGPU, because I personally think that this technology, combined with attractive hardware costs (without continuous licensing fees), will push virtualization forward without suffering a lack of GPUs. Intel Iris Pro is a bit weak in terms of scalability, I think, and NVIDIA GRID is too expensive for small companies.

I hope the integration will lead to a user-friendly way in XenCenter, which prevents faults and is solid for a production environment. Please provide updates on new developments in the context of AMD GPU virtualization (you're the only one I've found during my ongoing search of the internet who gives that much insight into what is possible at the moment!!!).

Best regards,

Stefan

I've modified the grub.cfg file to hide a certain PCI device, and restarted XenServer several times, but when I use the lspci command from the console I still see the device, so I understand that it can't be assigned directly to a VM, am I right?
What else can I do to make it work?
Thanks

In XenServer 7, I've done the steps to pass through a PCIe device to a VM. I observe that if the VM is Linux (Ubuntu), the device driver loads fine. However, when I have Windows running in the VM, the device driver couldn't load. The device driver is built into the Linux/Windows OS.
I did not have any issue in XenServer 6.2: PCI pass-through to a Windows VM and the Windows device driver worked fine. I cannot doubt my hardware.
Wondering if it is the Dom0 CentOS 7 combination.
I would like to move to XenServer 7 and would appreciate any pointers or workarounds.

Hi, I have tried the steps, and at Step 5 my device drivers in the VM failed to load on the device.
The same device and driver worked fine on XenServer 6.2 and 6.5, but driver loading fails on XenServer 7.
I am seeing a problem only when the VM is Windows. In XenServer 7, I could pass through to a Linux VM and load the in-box driver fine, but Windows did not work.
What could have changed in XenServer 7 related to this?
Any insight would be appreciated. Thanks!
