NVIDIA’s VGX cards bring big graphics performance to virtual machines

However, the new hardware and software are not without limitations.

Though virtual machines have become indispensable in the server room over the last few years, desktop virtualization has been less successful. One of the reasons has been performance, and specifically graphics performance—modern virtualization products are generally pretty good at dynamically allocating CPU power, RAM, and drive space as clients need them, but graphics performance just hasn't been as good as it is on an actual desktop.

NVIDIA wants to solve this problem with its VGX virtualization platform, which it unveiled at its GPU Technology Conference in May. As pitched, the technology will allow virtual machines to use a graphics card installed in a server to accelerate applications, games, and video. Through NVIDIA's VGX Hypervisor, compatible virtualization software (primarily from Citrix, though Microsoft's RemoteFX is also partially supported) can use the GPU directly, allowing thin clients, tablets, and other devices to more closely replicate the experience of using actual desktop hardware.

NVIDIA's VGX K1 is designed to bring basic graphics acceleration to a relatively large number of users.


When last we heard about the hardware that drives this technology, NVIDIA was talking up a board with four GPUs based on its Kepler architecture. That card, now known as the NVIDIA VGX K1, is built to provide basic 3D and video acceleration to a large number of users—up to 100, according to NVIDIA's marketing materials. Each of this card's four GPUs uses 192 of NVIDIA's graphics cores and 4GB of DDR3 RAM (for a total of 768 cores and 16GB of memory), and has a reasonably modest TDP of 150 watts—for reference, NVIDIA's high-end GTX 680 desktop graphics card has a TDP of 195W, and the dual-GPU version (the GTX 690) steps this up to 300W.

Where the VGX K1 serves many users with basic graphics functionality, the all-new VGX K2 can go the opposite direction and give a few users a lot of graphics power. The new card has two GPUs with 1536 cores and 4GB of GDDR5 RAM each, and has a TDP of 225W—the card is being marketed as two of NVIDIA's Quadro K5000 workstation graphics cards stuck together.

The VGX K2 is designed to bring workstation-class graphics performance to virtual machines, but only for two users at once, and only using Citrix products.


For the VGX K2, NVIDIA's hypervisor supports a "pass-through mode" that can give full control of one of the GPUs to one virtual machine at a time—this means that, as of this writing, only two users can actually use the VGX K2 in this way at once, but each of those users has the same capabilities and power that they would have if they were sitting in front of an actual workstation.

These functions are enabled using the same basic drivers that NVIDIA uses on the desktop, so when used with compatible virtualization software—again, Citrix products are the main example, particularly XenDesktop 5.6 FP1 and XenServer 6—VMs have access to Direct3D, OpenGL, OpenCL, CUDA, and any other feature you'd normally have on a full-fledged desktop, allowing for full use of CAD, Adobe, and other GPU-accelerated applications from within virtual machines. If you're using Microsoft's Hyper-V and RemoteFX, the functionality is a bit more limited—you can support multiple users per GPU, but only Direct3D graphics acceleration is supported.

NVIDIA says that the VGX K2 will eventually support pass-through mode features for multiple users per GPU, but that the software to enable that feature still needs work—software updates for NVIDIA's and Citrix's software should enable this feature next year. Obviously, the amount of power available to each user will decrease as the number of users increases, but for people and applications with more modest demands it should increase the bang-to-buck ratio for these cards. Due to its weaker hardware, no support for pass-through mode is planned for the VGX K1 card.

Despite the promised performance, NVIDIA says the bandwidth requirements for the VGX K2's features won't be too onerous. Will Wade, NVIDIA's Senior Product Manager for Quadro Virtualization and Remoting, mentioned on the call that he was currently using a VGX-enabled virtual machine from his home office, and that bandwidth requirements for good performance could be measured in megabits, rather than tens of megabits.

These cards both have a lot of promise—laggy video and nonexistent 3D support have long been limitations on the user experience in virtual machines, and the VGX K1 looks like a solid and cost-efficient way to improve this situation. The VGX K2, on the other hand, strikes me as a bit more of a niche product—very useful for heavy users who work remotely while traveling or in their home offices, but perhaps less so for on-site users, especially if you have a lot of them. Increased buy-in from vendors would also be nice to see, and NVIDIA says it's working on it: while Citrix is the sole vendor of fully compatible solutions today, Wade says that VMware is "right on their heels" and will be announcing compatibility early next year. The company is also working with Microsoft and Red Hat, though those solutions are far enough off that no availability information is available.

NVIDIA is making the VGX K2 available to its OEM partners (including Cisco, Dell, HP, IBM, and Supermicro) now, and servers using the card should begin appearing on the market by the beginning of 2013.

49 Reader Comments

The day I can deliver a fully featured desktop experience via thin client over our School network would be the day I could die happy. Furthermore, if the student had a good fibre connection and could use their School desktop session at home, life would be even more wonderful. Virtualisation will probably put me out of a job one day, but I still love it.

If you can implement virtualization in a way that puts you out of your job, I'm sure you will be in very high demand.

At what price? My IT group has been wanting to try virtualizing the engineering workstations (3D CAD and FEA mainly) for a few years now, but the performance has been unusable. The K2 might solve the performance issue, but at what price? Will it cost more than 2 dedicated workstations?

I'd be interested to learn more about the secret sauce used to make this work on remote ICA sessions on reasonable bandwidth and to carve up the K1 into little chunks.

With recent hypervisor software and certain chipset features, hanging a physical PCI(e) device directly off a specific VM is doable and standard. Very handy for high-speed NICs and such in server virtualization; it can also be quite useful for desktop VM situations where you want to use your Windows virtual machine to run some collection of Windows-only software and a few Windows-only peripheral cards.

However, that doesn't account for what Nvidia is promising here, which presumably incorporated direct peripheral attachment; but then goes on to do something clever with it that allows networked clients to make use of the 3d acceleration functions, and with relatively minimal bandwidth no less. How does that work?
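For the direct-attachment piece specifically, the usual prerequisite is an IOMMU (Intel VT-d or AMD-Vi): a device can only be handed to a VM cleanly if its entire IOMMU group goes to the same VM. As a rough illustration—the sysfs layout is the standard one Linux exposes, but the helper itself is just a sketch, not part of any vendor tooling—you can enumerate the groups like this:

```python
import os

def list_iommu_groups(sysfs_root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the PCI addresses of the devices in it.

    A device can only be passed through to a VM if everything else in its
    IOMMU group is assigned to that same VM (or is safely unused).
    """
    groups = {}
    if not os.path.isdir(sysfs_root):
        # No IOMMU exposed: VT-d/AMD-Vi disabled or unsupported.
        return groups
    for group in sorted(os.listdir(sysfs_root), key=int):
        devdir = os.path.join(sysfs_root, group, "devices")
        groups[group] = sorted(os.listdir(devdir))
    return groups
```

On a machine with the IOMMU enabled, a GPU and its HDMI audio function typically show up together in one group, which is why both get handed to the VM as a unit.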

At a guess, each VM renders to an independent framebuffer and the hypervisor streams it out with some VNC-esque protocol. The interesting part here is that the rendering is done by the GPU.

Drivers and HW already know how to context switch between independent applications in a single OS, each rendering to its own buffers to later be composited. The driver in the HV and each client VM could be extended to do this for 'm' apps running in 'n' VMs, since from the GPU's perspective it's still just a bunch of apps submitting rendering commands. You 'just' (famous last words) track VM-ID in addition to the PID for each and keep IOMMUs updated or sanitize DMA commands, etc.

Since NV designed their own chip, my guess is this could then be accelerated by duplicating the front-end register page, so that each VM's driver pokes commands into its own direct-mapped HW FIFOs. Scheduling logic on the card itself would know how to arbitrate state and execution across the available processing cores—whether time-sliced or with the 700-odd CUDA cores partitioned among users.

I guess each of the 100 clients is guaranteed at least either all ~700 cores for 1% of the time, ~7 cores all the time, or some blend of the two extremes depending on the total system load at that particular instant.

Oh, and you'd be extra careful about DMA locations and GPU lockups, since it'd likely be less tolerated for faults to escape one VM into another than if you got cross-app leakage/crashes on a standard desktop system.

At least, that's conceptually how I'd do it if I wasn't a SW guy and could design HW!
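The guaranteed-share arithmetic in that guess is easy to write down explicitly. The figures below come from NVIDIA's published K1 specs earlier in the article; the two extremes (pure spatial vs. pure temporal partitioning) are purely illustrative, since nobody outside NVIDIA knows the actual policy:

```python
# Back-of-the-envelope sharing for the VGX K1:
# 4 GPUs x 192 cores = 768 cores, marketed at up to 100 users per board.
TOTAL_CORES = 4 * 192   # 768
MAX_USERS = 100

# Extreme 1 — pure spatial partitioning: each user permanently
# owns a fixed slice of the cores.
cores_per_user = TOTAL_CORES / MAX_USERS    # ~7.7 cores each, all the time

# Extreme 2 — pure temporal partitioning: each user gets the whole
# board for a fraction of the time.
time_slice_per_user = 1 / MAX_USERS         # all 768 cores, 1% of the time
```

Any real scheduler would presumably sit somewhere between these two points depending on instantaneous load, as the commenter suggests.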

When are we desktop users gonna get some love? I want a full Win98SE VM that will give me enough performance that I can play all the classic Win9x games with full graphics!

You already can with VMware workstation. It passes D3D and OGL calls through a wrapper to your normal graphics card. Many, though not all, games run just fine.

Also, the assumption that you don't have video/3D acceleration in VDI VMs is a bit false. VMware, Hyper-V, and Citrix XenDesktop do a great job at running Aero and mid-grade video inside virtual machines. This covers about 80 percent or more of desktop computing in the workplace. What they don't do is full-screen HD video and workstation-class 3D for visualization, rendering, and engineering tasks. Everything else gets 3D acceleration through very efficient software emulation just fine.

Most proper VDI setups I have worked with get better Aero graphics scores than cheap low end desktops with integrated graphics (pre-Ivy Bridge).

Running Windows 9x-era games at decent frame rates should already be possible. Around 2003 I used to play the PC version of "Escape from Monkey Island" on a 900 MHz G3 iBook (in VPC 6 if I remember correctly). Modern emulators even do "3D acceleration" (and I'm not talking about the "3D acceleration" that "Escape from Monkey Island" supports ;-)

The dirty secret of "virtual desktops" is that they end up costing as much as a "real" desktop, when all is said and done.

The only real advantage of virtual desktops is easier management of those desktops. You can do neat things like installing software on one virtual desktop, and automatically have it installed on ALL virtual desktops. No "deployment" or screwing around with installers or making your own .msi packages. That's pretty nice. But it ain't cheap.

We have thought similar things. However, we (IT guys) think that doing so would be bad for business. We have lots of engineers distributed geographically all over the US, so the high-end CAD stations cost us $6k. But what happens when/if the network goes down and you have all the six-figure engineers sitting there at their boxes, unable to access the VM'ed workstation? Desktop, normal user/shop floor thin client, sure, yeah, I can see that—but high-end engineering done in a VM? I can't say that I would support such a thing. I would like to see a demo of it.

What solutions are there for "home" users to run thin clients that are still powerful enough to play Youtube videos and local MKV files? I don't want to game on the thin clients, but I definitely want as near "dummy" machines as possible that can still watch lag-free video and play local video files.

I'm certain Microsoft and Citrix would love to sell me a solution, but this is for home use and I'm looking to mostly save money vs. spend a bunch more.

Frankly, in almost every case, Terminal Services is better than VDI.

I thought the end of the article was a bit odd. This would be *especially* useful when you have tons of on-site users. You get all the flexibility of VMs, and all the graphics performance of workstation-class cards.

I run Windows 7 in a virtual machine and can play every modern game with no frame rate issues so you can certainly play Win9x era games. We've been able to do VGA passthrough with Linux hypervisors for a long time now.

My HD5770 runs 20-year-old DOS classics beautifully under Win7 x64 (through DOSBox, of course), and some 99%+ of those games run great at 1920x1200... For the 16-bit Windows games that you presumably want to run, what about installing a separate boot partition with Win7 x86? It's been a while since I thought about it, but don't the 32-bit OS versions from Vista on up support Win98-era Win16 executables natively?

Heck, even direct GLIDE emulation under D3d is no problem these days (with nGlide.) Current middle-of-the-road machines have so much more gpu and cpu horsepower than the older machines that even emulating the most demanding of Win98-era d3d games should barely cause a current machine (built within the last 3-5 years) to break a sweat...

In fact, in terms of my gaming, lately I've been doing little besides running much older games--lots of fun, there. I won't go out of my way to run an emulation--but these days running DOS & Win16 games, even GLIDE games, requires very little setup on the part of the end user.

Finally. I wanted this card two years ago. It solves the fundamental problem of playing two games on one machine. Seems like a novelty, but even with an SLI card you can't do that, despite having twice the power.

With the Oculus Rift around the corner this might just be the thing to have in a gaming setup that can drive the two independent video channels. So I hope they go mainstream with this.

When my previous employer was testing servers for doing 3D work on (fluid dynamics pre-processing, structural FE modelling, and so forth), we found that the display lag made it very uncomfortable to work with 3D graphics. Spinning models, zooming, basic things like that were ... just not pleasant. Probably partly it was the weaker graphics on the server, but I suspect probably the network to a great extent - I don't believe it's reasonable to expect the same 'speed' in screen refreshes over a network as over a local display cable *.

How does the sort of card this article's about handle that? Is it relevant?

I also have some ideas for web apps that need server side rendering. This might make them possible. :-)

You can already do this pretty easily now without buying a crazy expensive VGX card. I no longer dual-boot Linux and Windows. I just run Windows 7 in a virtual machine and all of my games run great. Usually there's no performance difference, but if there is, it's less than 5%. Games that are more CPU-bound take the bigger hit.

So is there anything new in the hardware, or is this just NVIDIA finding yet another way to charge extra for an SLI or dual-SLI card with some different drivers?

My bet is on the card having multiple instantiations of the front-end registers, FIFOs, etc - one mapped into each guest VM. This way the client drivers can directly bang on the HW to get stuff done. The HW, perhaps under guidance of, or in conjunction with the HV then schedules the work to be done by the shared backend of the card (all the cuda cores, etc).

Kind of like intel's hyperthreading or AMD's new shared FPU stuff - two frontends allow the OS to schedule two independent threads, and then the HW decides how to map them on to the shared pool of ALUs.

This removes the need for each guest OS to ask the HV to submit work to the card on its behalf (performance++), and could explain why there's an upper limit on the number of users. If it's just driver trickery, you can have as many users as you have GPU time to share and RAM to store the contexts you must manually manage.
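That "duplicated front end, shared back end" guess can be modeled with a toy scheduler. Everything here is hypothetical—the class, the per-VM FIFOs, and the round-robin policy are illustrative choices, not anything NVIDIA has described:

```python
from collections import deque

class ToyGpuScheduler:
    """Toy model: each VM submits into its own command FIFO (the duplicated
    'front end'), and a shared arbiter drains the FIFOs round-robin onto the
    common pool of cores (the shared 'back end')."""

    def __init__(self):
        self.fifos = {}  # vm_id -> deque of pending commands

    def submit(self, vm_id, cmd):
        """A guest driver pokes a command into its own FIFO directly,
        without a hypervisor round-trip."""
        self.fifos.setdefault(vm_id, deque()).append(cmd)

    def drain_round_robin(self):
        """Yield (vm_id, cmd) pairs, one command per VM per pass, until
        every FIFO is empty."""
        while any(self.fifos.values()):
            for vm_id, queue in self.fifos.items():
                if queue:
                    yield vm_id, queue.popleft()
```

The upper limit on users then falls out naturally: the hardware only has so many FIFO instances to map into guests, even though a software-only scheme could multiplex arbitrarily many.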

Seems this would be perfect for your own private LAN version of OnLive.

This. So this. I wish NVIDIA would enable something just like this. You have your display-less gaming server that streams out gaming to each of your HTPCs, your ultrathin laptop, your ITX computer in your bedroom, your tablet, to whatever.

You put all the noise and heat in one box in a room (or closet) you never go into. You have your silence and convenience everywhere else. You only upgrade one box to play games instead of trying to keep every device ahead of the curve.

And you stream the games over your personal network so you incur a lot less latency than with something like OnLive or Gaikai.

Yeah, I wish they would have the vision to see how this keeps people buying their discrete GPUs forever.

It sounds great from the workstation point of view, and I do agree with people complaining about desktop virtualization performance. When I run a Windows XP VM in VMware Fusion on top of Mac OS X 10.6.8, on my Mac Pro (Xeon 2.8GHz, 12GB RAM), I can't play any games I want. I've tried Half-Life: Lost Coast and the very recent beta of Firefall (Red 5 Studios), and neither would be usable at all. I'm not in the mood for a Left 4 Dead 2 try, and I would really love playing Dishonored without having to reboot my 40+-days-up Mac into Windows. Modern 3D games over a virtualized OS are just not possible yet, even on top of good hardware, and I'm waiting for the day I'll be able to use a video card at full power in a VM. My dream workstation runs a powerful bare-metal hypervisor, lets me run a few VMs at the same time (FreeBSD, Windows, Mac OS…), and allows me to use the video card as if there were no virtualization at all. The VGX K2 looks promising.

That's been done with Xen and VMware for at least two years. Not sure if it works well enough with KVM yet. VirtualBox doesn't have [workable] 3D yet, AFAIK.

That's the problem I'm having. Ubuntu is my primary OS, and for some reason VMware doesn't compile for me (maybe it doesn't like my hardware?), so I'm stuck with VirtualBox. I can't do Xen because that's basically an OS; I'd have to either dual-boot or migrate my Ubuntu to a VM, and I don't like either of those solutions. I've never used Xen before, so I don't know much about it.

Is this something that will support VMWare Fusion & Parallels eventually?

I don't think so.

- The VGX K1's target is the datacenter, and the VGX K2's target is probably the datacenter too, or at least a powerful shared server.
- The price of VGX will probably exceed the price of a decent PC plus a good graphics card; users of Fusion would probably rather buy a small dedicated PC for their alternate OS instead of buying a VGX K2.
- For now, VGX K2 is OEM-only, and Apple is not listed here, so unless you buy a PC server (HP, Supermicro, IBM, Cisco…) you won't be able to get one. None of these will run Mac OS X, so none of them will run VMware Fusion.

Andrew Cunningham / Andrew has a B.A. in Classics from Kenyon College and has over five years of experience in IT. His work has appeared on Charge Shot!!! and AnandTech, and he records a weekly book podcast called Overdue.