At the release of Windows Server 2019 last year, we announced support for a set of hardware devices in Windows containers. One popular type of device missing support at the time: GPUs. We’ve heard frequent feedback that you want hardware acceleration for your Windows container workloads, so today, we’re pleased to announce the first step on that journey: starting in Windows Server 2019, we now support GPU acceleration for DirectX-based apps and frameworks in Windows containers.

The best part is, you can use the Windows Server 2019 build you have today—no new OS patches or configuration is necessary. All you need is a new build of Docker and the latest display drivers. Read on for detailed requirements and to learn how you can get started with GPU accelerated DirectX in Windows containers today.

Background: Why GPU acceleration?

Containers are an excellent tool for packaging and deploying many kinds of workloads. For many of these, traditional CPU compute resources are sufficient. However, for a certain class of workload, the massively parallel compute power offered by GPUs (graphics processing units) can speed up operations by orders of magnitude, bringing down cost and improving throughput immensely.

GPUs are already a common tool for many popular workloads, from traditional rendering and simulation to machine learning training and inference. With today’s announcement, we’re unlocking new app scenarios for Windows containers and enabling more applications to be successfully shifted into Windows containers.

GPU-accelerated DirectX, Windows ML, and more

For some users, DirectX conjures associations with gaming. But DirectX is about more than games—it also powers a large ecosystem of multimedia, design, computation, and simulation frameworks and applications.

As we looked at adding GPU support to Windows containers, it was clear that starting with the DirectX APIs—the foundation of accelerated graphics, compute, and AI on Windows—was a natural first step.

By enabling GPU acceleration for DirectX, we’ve also enabled GPU acceleration for the frameworks built on top of it. One such framework is Windows ML, a set of APIs providing fast and efficient AI inferencing capabilities. With GPU acceleration in Windows containers, developers now have access to a first-class inferencing runtime that can be accelerated across a broad range of capable GPU hardware.

Usage

On a system meeting the requirements (see below), start a container with hardware-accelerated DirectX support by specifying the --device option at container runtime, as follows:
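For example, on a host meeting the requirements, a run command might look like the following sketch. The device class GUID is the one quoted later in this post’s comments, and the image tag is illustrative:

```shell
REM Run a Windows container with GPU-accelerated DirectX enabled.
REM Process isolation is required (see Requirements); the --device flag
REM exposes the DirectX display device interface class to the container.
docker run --isolation process --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 mcr.microsoft.com/windows:1809
```

Note that `--device` here takes a device interface class GUID rather than a file path, which differs from how the flag is used with Linux containers.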

Note that this does not assign GPU resources exclusively to the container, nor does it prevent GPU access on the host. Rather, GPU resources are scheduled dynamically across the host and containers in much the same way as they are scheduled among apps running on your personal device today. You can have several Windows containers running on a host, each with hardware-accelerated DirectX capabilities.

Requirements

For this feature to work, your environment must meet the following requirements:

The container host must be running Windows Server 2019 or Windows 10, version 1809 or newer.

The container base image must be mcr.microsoft.com/windows:1809 or newer. Windows Server Core and Nano Server container images are not currently supported.

The container must be run in process isolation mode. Hyper-V isolation mode is not currently supported.

The container host must be running Docker Engine 19.03 or newer.

The container host must have a GPU running display drivers version WDDM 2.5 or newer.

To check the WDDM version of your display drivers, run the DirectX Diagnostic Tool (dxdiag.exe) on your container host. On the tool’s “Display” tab, find the “Driver Model” entry in the “Drivers” section; it reports the WDDM version (for example, “WDDM 2.5”).
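If you prefer the command line, dxdiag can also write its report to a text file instead of opening its GUI. A sketch (the “Driver Model” field name follows typical dxdiag report output):

```shell
REM Write the DirectX Diagnostic report to a text file.
REM dxdiag may take a few seconds to finish writing before the file is complete.
dxdiag /t %TEMP%\dxdiag.txt

REM Search the report for the driver model line, e.g. "Driver Model: WDDM 2.5"
findstr /C:"Driver Model" %TEMP%\dxdiag.txt
```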

Getting started

Operating system support for this feature is already complete and broadly available as part of Windows Server 2019 and Windows 10, version 1809. Formal Docker support is scheduled for the upcoming Docker EE Engine 19.03 release. Until then, if you’re eager to try out the feature early, you can check out our sample on GitHub and follow the README instructions to get started. We’ll show you how to acquire a nightly build of Docker and use it to run a containerized Windows ML inferencing app with GPU acceleration.

Going forward

We look forward to getting your feedback on this experience. Please leave a comment below or tweet us with your thoughts. What are the next things you’d like to be able to do with GPU acceleration in containers on Windows?

Does it support hardware acceleration for video encoding with Microsoft Media Foundation, and hardware acceleration for video decoding with DXVA 2.0?

Thanks for your question! Currently Windows containers do support hardware-accelerated video decode using DXVA, but do not support hardware-accelerated video encode using Media Foundation Transforms. We're investigating enabling the latter in an upcoming release.

Tensorflow itself is just an ML framework that you can accelerate with a GPU run time as the back-end (so you could, for example, run Tensorflow right now in a Windows container and have it use the CPU--but that's probably not very interesting to you). In the context of running Tensorflow workloads on the GPU, which GPU back-end is of interest to you?

@Craig Wilhite Hello. Are you providing a GPU to the container, or just doing some kind of mapping for DirectX? I mean, is the GPU available inside the container with this technology? For example, is it possible to add GPU drivers to the container and use a specific API, like the NVIDIA Video Codec SDK?

Hi Ivan. Thank you for your question. The technology enabling this configuration is essentially "providing a GPU to the container;" it's not just a DirectX API forwarding layer. As of today we only officially support GPU for DirectX inside a Windows container, but we understand there are plenty of container workloads that use non-DirectX APIs. So we're actively investigating support for those non-DirectX APIs, such as NVIDIA's as you mentioned.

@rickman_MSFT Hi. Thank you for this explanation. You say "officially support GPU for DirectX," so could I "unofficially" try to use this technology with other APIs? Or do you restrict this somehow at the Docker level? I just want to do some experiments with other APIs.

"unofficially" I could try to use this technology with other API? Or you restrict this somehow at Docker level? Just want to do some experiments with other API.

I'm not aware of anything at the Docker or OS level intentionally restricting GPU acceleration with these other APIs, however I would not expect them to work. I believe there is missing OS and/or driver functionality that would be required to make these work; this is the focus of our investigations into enabling them.

invalid argument "class/5B45201D-F2F2-4F3B-85BB-30FF1F953599" for "--device" flag: class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 is not an absolute path
See 'docker run --help'.

How should the --device flag be set?

This error occurs if you're not running a Docker client/engine that supports the '--device' argument for Windows containers. This functionality is not yet available in non-edge editions of Docker Desktop for Windows. Have you verified you're running the latest version of Docker Desktop for Windows Edge?

@docker1575, process isolation on Windows 10 should work for dev/test workflows. We no longer outright block process isolation mode on Windows 10 client SKUs in Docker, but it's not a production-supported scenario. The point is, everything described in this blog post should work on Windows 10 if you're running a version 1809 or newer host with the latest Docker engine.

Are you asking about a performance comparison between DirectX and CUDA? Performance depends on a number of different factors (the model being evaluated, input types, device hardware, graphics drivers, etc.) so results tend to be specific to a developer's unique scenario. However, the developers behind DirectX and the Windows AI stack (WinML, DirectML, and related technologies) work extremely closely with hardware vendors to ensure consistent results and performance across the broad range of Windows devices and GPUs.

2. Our code is all in `TensorFlow`, running inside `nvidia-docker`. Can you please elaborate on how hard it would be to port the models over?

Again, this depends on the details of your unique situation, but Microsoft does provide tools for porting models to ONNX, the Open Neural Network Exchange format. You can learn more about model conversion here: Convert ML models to ONNX with WinMLTools.

Does it support installing the NVIDIA driver in the container to use NVIDIA GPUs?

Thanks for your question. I can see two possible ways to interpret your question, so I will answer both interpretations:

1. Does this enable my Windows containers to get hardware acceleration on NVIDIA GPUs?

Yes. If you have NVIDIA drivers installed on the container host (that meet the requirements described in the blog post), then when you run a container with the --device parameter as described in the blog post, your apps can get hardware acceleration whenever they use the DirectX graphics and compute APIs inside the container. You don't even need to install those drivers in the container; Docker automatically makes the right drivers from the host available to the container.

2. Does this enable my Windows containers to get hardware-accelerated CUDA?

No. Hardware acceleration is currently only supported for the DirectX APIs (and higher-level APIs built on DirectX) but does not include CUDA or similar APIs. We've heard lots of feedback that customers are interested in those, and we're actively investigating support for those.

I second the need to support CUDA in Docker. Our researchers need to accelerate machine learning code written in, e.g., TensorFlow using CUDA, or run scientific simulations that use the GPU. It would be great if CUDA and GPU passthrough could be supported in Docker and in WSL 2. Currently we are forced to use Linux because of this, but we would prefer to use Windows Server if GPU passthrough becomes possible.

Kubernetes seems to launch containers in process isolation mode already, corresponding to the --isolation process option mentioned in the article. But what is the Kubernetes equivalent of the --device class/5B45201D-F2F2-4F3B-85BB-30FF1F953599 option? We see no need for a Kubernetes device plugin, which seems like the wrong approach anyway, since it would assign GPU resources exclusively to a container.

We can successfully run our application in a GPU accelerated Windows container via docker run. We can also create Windows containers via Kubernetes, but without GPU access yet. Optimistically hoping that it is only a small thing we are missing.

Is it possible to launch such GPU accelerated Windows containers via Kubernetes?

Unfortunately, Kubernetes does not yet support resource allocation and enablement of GPU acceleration for Windows containers. It's something we're actively looking into, however, as we know many container customers prefer to deploy their containers using Kubernetes.