DirectX 12 Detailed; DX11 GPU Compatibility & More

Microsoft has taken the covers off its DirectX 12 API, and it features several surprises, among them backwards compatibility with DX11 GPUs.

DirectX 11 has been with us for a long time. It was first rolled out in 2009 alongside Windows 7, but it took a while to gain traction as compatible graphics processors and supporting game titles slowly made their way to market. This gradual progression from launch to broad-scale acceptance proved to be a challenge for Microsoft and developers alike. Now, the upcoming DirectX 12 looks to build on the lessons learned from DX11 and offer developers what they've been asking for: easier system-level access and an immediate installed user base.

Let's start with DX12's goals before getting into how this will affect gamers. First and foremost, Microsoft's new API was designed over the course of four years with continual input from game developers, programmers and hardware manufacturers. This was a truly collaborative effort that required the buy-in of everyone rather than Microsoft simply saying "here's a new API, it's what we think you want".

Game developers are a fickle bunch since they've always had a list of demands that were impossible to accomplish with previous iterations of DirectX and Direct3D. Primary among their requests has always been more direct control over a system's hardware resources through a thinner, more efficient, more transparent API. DX11 took some steps towards addressing those shortcomings, but DX12 has been designed from the ground up to provide an infrastructure that can better access multi-core CPUs and advanced GPU hardware features.

DirectX 12's reduction of resource overhead is accomplished through the use of a minimalist application-level programming interface along with new features designed to streamline the rendering pipeline. This will not only bring about significantly improved multithread scaling and CPU utilization but also allow modern graphics cards to better interact with other system components. If all of this sounds familiar, that's because AMD's Mantle pretty much promises the same thing, but is exclusive to Radeon GPUs.

As with any current application, DirectX 12 has some serious cross-platform aspirations. With the direct involvement of NVIDIA, AMD, Intel, Qualcomm and other industry players from all walks of life, Microsoft is targeting the PC, mobile, tablet and console markets in one fell swoop. This is particularly important for the Xbox One since the native efficiencies built into DirectX 12 could address its somewhat lackluster hardware specs, which have already become a bone of contention in the battle against Sony's PlayStation 4.

So how does Microsoft plan to get DX12 into the hands of as many gamers as possible when it launches sometime in late 2015 (likely during the holiday season)? They're offering complete backwards compatibility with current generation DX11 graphics cards. That's right: NVIDIA's Fermi, Kepler and Maxwell GPUs along with AMD's GCN-based Radeon cards will all support DX12 through future driver updates. That's HUGE news folks!

Thus far no operating system support has been announced, but considering that DX12's late 2015 launch date coincides with Microsoft's Windows 9 roadmap, we'll likely see it rolled into both the Windows 8 and 9 OSes.

According to additional press releases rolled out today, all DX12 systems at the Game Developers Conference are currently running TITAN Black cards, pointing towards a close relationship between Microsoft and NVIDIA. However, from what we hear, AMD has also been front and center during DirectX 12's evolution, providing key input, and several integral features from Mantle will be integrated directly into Microsoft's new API. It should be noted, however, that DX12, and by extension Direct3D 12, is NOT Mantle. Rather, while their goals are the same, the way each interface is managed by developers is quite different. We'll have a more thorough analysis at a later date.

Below we have reposted Microsoft's detailed rundown from the MSDN blog. It's certainly an interesting read and if you're interested in a more nuanced look at the technology, we highly suggest you look it over.

What's the big deal?
DirectX 12 introduces the next version of Direct3D, the graphics API at the heart of DirectX. Direct3D is one of the most critical pieces of a game or game engine, and we’ve redesigned it to be faster and more efficient than ever before. Direct3D 12 enables richer scenes, more objects, and full utilization of modern GPU hardware. And it isn’t just for high-end gaming PCs either – Direct3D 12 works across all the Microsoft devices you care about. From phones and tablets, to laptops and desktops, and, of course, Xbox One, Direct3D 12 is the API you’ve been waiting for.

What makes Direct3D 12 better? First and foremost, it provides a lower level of hardware abstraction than ever before, allowing games to significantly improve multithread scaling and CPU utilization. In addition, games will benefit from reduced GPU overhead via features such as descriptor tables and concise pipeline state objects. And that’s not all – Direct3D 12 also introduces a set of new rendering pipeline features that will dramatically improve the efficiency of algorithms such as order-independent transparency, collision detection, and geometry culling.

Of course, an API is only as good as the tools that help you use it. DirectX 12 will contain great tools for Direct3D, available immediately when Direct3D 12 is released.

We think you’ll like this part: DirectX 12 will run on many of the cards gamers already have.

Is this marketing spin?
We (the product team) read the comments on Twitter and game development/gamer forums and many of you have asked if this is real or if our marketing department suddenly received a budget infusion. Everything you are reading is coming directly from the team who has brought you almost 20 years of DirectX.

It’s our job to create great APIs and we have worked closely with our hardware and software partners to prove the significant performance wins of Direct3D 12. And these aren’t just micro-benchmarks that we hacked up ourselves – these numbers are for commercially released game engines or benchmarks, running on our alpha implementation. The screenshots below are from real Direct3D 12 app code running on a real Direct3D 12 runtime running on a real Direct3D 12 driver.

3DMark – Multi-thread scaling + 50% better CPU utilization
If you’re a gamer, you know what 3DMark is – a great way to do game performance benchmarking on all your hardware and devices. This makes it an excellent choice for verifying the performance improvements that Direct3D 12 will bring to games. 3DMark on Direct3D 11 uses multi-threading extensively, however due to a combination of runtime and driver overhead, there is still significant idle time on each core. After porting the benchmark to use Direct3D 12, we see two major improvements – a 50% improvement in CPU utilization, and better distribution of work among threads.

[Screenshot: 3DMark on Direct3D 11]

[Screenshot: 3DMark on Direct3D 12]

Forza Motorsport 5 Tech Demo – console-level efficiency on PC
Forza Motorsport 5 is an example of a game that pushes the Xbox One to the limit with its fast-paced photorealistic racing experience. Under the hood, Forza achieves this by using the efficient low-level APIs already available on Xbox One today. Traditionally this level of efficiency was only available on console – now, Direct3D 12, even in an alpha state, brings this efficiency to PC and Phone as well. By porting their Xbox One Direct3D 11.X core rendering engine to use Direct3D 12 on PC, Turn 10 was able to bring that console-level efficiency to their PC tech demo.

Where does this performance come from?
Direct3D 12 represents a significant departure from the Direct3D 11 programming model, allowing apps to go closer to the metal than ever before. We accomplished this by overhauling numerous areas of the API. We will provide an overview of three key areas: pipeline state representation, work submission, and resource access.

Pipeline state objects
Direct3D 11 allows pipeline state manipulation through a large set of orthogonal objects. For example, input assembler state, pixel shader state, rasterizer state, and output merger state are all independently modifiable. This provides a convenient, relatively high-level representation of the graphics pipeline, however it doesn’t map very well to modern hardware. This is primarily because there are often interdependencies between the various states. For example, many GPUs combine pixel shader and output merger state into a single hardware representation, but because the Direct3D 11 API allows these to be set separately, the driver cannot resolve things until it knows the state is finalized, which isn’t until draw time. This delays hardware state setup, which means extra overhead, and fewer maximum draw calls per frame.

Direct3D 12 addresses this issue by unifying much of the pipeline state into immutable pipeline state objects (PSOs), which are finalized on creation. This allows hardware and drivers to immediately convert the PSO into whatever hardware native instructions and state are required to execute GPU work. Which PSO is in use can still be changed dynamically, but to do so the hardware only needs to copy the minimal amount of pre-computed state directly to the hardware registers, rather than computing the hardware state on the fly. This means significantly reduced draw call overhead, and many more draw calls per frame.
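
To make the contrast concrete, here is a minimal toy model in plain C++. It is not the Direct3D 12 API: every name in it (PipelineDesc, PipelineState, FakeGpu, compileHardwareState) is hypothetical, and the "compile" step is only a stand-in for whatever work a real driver does to merge interdependent states. It also exaggerates the Direct3D 11 side, since real drivers cache aggressively. The point is only where the cost lands: at every draw, or once at creation.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: none of these types exist in Direct3D.
struct PipelineDesc {
    std::uint32_t pixelShaderId;
    std::uint32_t blendMode;       // assumed to fit in 16 bits
    std::uint32_t rasterizerMode;  // assumed to fit in 16 bits
};

// Stand-in for the driver work of merging interdependent states into one
// hardware-native value. compileCount records how often that work happens.
inline std::uint64_t compileHardwareState(const PipelineDesc& d, int& compileCount) {
    assert(d.blendMode < 0x10000u && d.rasterizerMode < 0x10000u);
    ++compileCount;
    return (static_cast<std::uint64_t>(d.pixelShaderId) << 32) |
           (static_cast<std::uint64_t>(d.blendMode) << 16) |
           static_cast<std::uint64_t>(d.rasterizerMode);
}

// Immutable once constructed: the hardware state is computed exactly once.
class PipelineState {
public:
    PipelineState(const PipelineDesc& desc, int& compileCount)
        : hwState_(compileHardwareState(desc, compileCount)) {}
    std::uint64_t hardwareState() const { return hwState_; }
private:
    const std::uint64_t hwState_;
};

struct FakeGpu {
    std::uint64_t stateRegister = 0;
    int drawCalls = 0;
    int compiles = 0;

    // Direct3D 11-style toy: states can change independently, so the
    // combined hardware state is resolved at draw time.
    void drawWithLooseState(const PipelineDesc& current) {
        stateRegister = compileHardwareState(current, compiles);
        ++drawCalls;
    }

    // Direct3D 12-style toy: copy the precomputed state, no compilation.
    void drawWithPso(const PipelineState& pso) {
        stateRegister = pso.hardwareState();
        ++drawCalls;
    }
};
```

In this sketch, a thousand calls to drawWithLooseState trigger a thousand state compilations, while a thousand calls to drawWithPso reuse the single compilation done when the PipelineState was built.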

Command lists and bundles
In Direct3D 11, all work submission is done via the immediate context, which represents a single stream of commands that go to the GPU. To achieve multithreaded scaling, games also have deferred contexts available to them, but like PSOs, deferred contexts also do not map perfectly to hardware, and so relatively little work can be done in them.

Direct3D 12 introduces a new model for work submission based on command lists that contain the entirety of information needed to execute a particular workload on the GPU. Each new command list contains information such as which PSO to use, what texture and buffer resources are needed, and the arguments to all draw calls. Because each command list is self-contained and inherits no state, the driver can pre-compute all necessary GPU commands up-front and in a free-threaded manner. The only serial process necessary is the final submission of command lists to the GPU via the command queue, which is a highly efficient process.
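
The shape of that model can be sketched with standard C++ threads. This is a hedged illustration, not Direct3D 12 code: Command, RecordedList, ToyCommandQueue, recordChunk and recordInParallel are all hypothetical names, and the "GPU" is just a vector. Each worker records into its own list without touching shared state, and only the final submit step is serial.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Toy model only; these are not Direct3D 12 types.
struct Command {
    std::string pso;   // which pipeline state to use
    int resourceId;    // texture or buffer the draw reads
    int vertexCount;   // draw-call argument
};

using RecordedList = std::vector<Command>;

// Each list is self-contained: everything a draw needs is in the Command.
inline RecordedList recordChunk(int firstObject, int objectCount) {
    RecordedList list;
    list.reserve(static_cast<std::size_t>(objectCount));
    for (int i = 0; i < objectCount; ++i) {
        list.push_back(Command{"opaque", firstObject + i, 36});
    }
    return list;
}

// Records one list per worker thread. Every thread writes only to its own
// slot in the vector, so no locking is needed.
inline std::vector<RecordedList> recordInParallel(int workers, int objectsPerWorker) {
    assert(workers > 0 && objectsPerWorker >= 0);
    std::vector<RecordedList> lists(static_cast<std::size_t>(workers));
    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w) {
        threads.emplace_back([&lists, w, objectsPerWorker] {
            lists[static_cast<std::size_t>(w)] =
                recordChunk(w * objectsPerWorker, objectsPerWorker);
        });
    }
    for (std::thread& t : threads) t.join();
    return lists;
}

struct ToyCommandQueue {
    std::vector<Command> executed;

    // The only serial step: hand the finished lists to the "GPU" in order.
    void submit(const std::vector<RecordedList>& lists) {
        for (const RecordedList& l : lists) {
            executed.insert(executed.end(), l.begin(), l.end());
        }
    }
};
```

Calling recordInParallel(4, 250) and passing the result to ToyCommandQueue::submit yields 1000 commands in a deterministic order, regardless of which worker finished recording first.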

In addition to command lists, Direct3D 12 also introduces a second level of work pre-computation, bundles. Unlike command lists which are completely self-contained and typically constructed, submitted once, and discarded, bundles provide a form of state inheritance which permits reuse. For example, if a game wants to draw two character models with different textures, one approach is to record a command list with two sets of identical draw calls. But another approach is to “record” one bundle that draws a single character model, then “play back” the bundle twice on the command list using different resources. In the latter case, the driver only has to compute the appropriate instructions once, and creating the command list essentially amounts to two low-cost function calls.
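
The two-character example can be modelled in a few lines of C++. Again, this is a sketch under stated assumptions rather than the real API: ToyBundle, BundleHostList and recordCharacterBundle are hypothetical names, and "state inheritance" is reduced to a single texture binding that the bundle picks up from the command list at playback time.

```cpp
#include <cassert>
#include <vector>

// Toy model only; not Direct3D 12 types.
struct ToyDraw {
    int meshPart;
    int textureId;
};

// A bundle records which mesh parts to draw but leaves the texture open;
// it inherits that binding from the command list that plays it back.
struct ToyBundle {
    std::vector<int> meshParts;
};

struct BundleHostList {
    std::vector<ToyDraw> draws;
    int boundTexture = -1;
    int playbackCalls = 0;

    void setTexture(int textureId) { boundTexture = textureId; }

    // "Play back" the bundle using whatever texture is currently bound.
    void executeBundle(const ToyBundle& bundle) {
        assert(boundTexture >= 0);  // a texture must be bound first
        ++playbackCalls;
        for (int part : bundle.meshParts) {
            draws.push_back(ToyDraw{part, boundTexture});
        }
    }
};

// "Record" the character once: three mesh parts (say body, head, weapon).
inline ToyBundle recordCharacterBundle() {
    return ToyBundle{{0, 1, 2}};
}
```

Recording the bundle once, then calling setTexture and executeBundle twice with different texture IDs, produces six draws from two playback calls, which mirrors the "two low-cost function calls" described above.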

Descriptor heaps and tables
Resource binding in Direct3D 11 is highly abstracted and convenient, but leaves many modern hardware capabilities underutilized. In Direct3D 11, games create “view” objects of resources, then bind those views to several “slots” at various shader stages in the pipeline. Shaders in turn read data from those explicit bind slots which are fixed at draw time. This model means that whenever a game wants to draw using different resources, it must re-bind different views to different slots, and call draw again. This is yet another case of overhead that can be eliminated by fully utilizing modern hardware capabilities.

Direct3D 12 changes the binding model to match modern hardware and significantly improve performance. Instead of requiring standalone resource views and explicit mapping to slots, Direct3D 12 provides a descriptor heap into which games create their various resource views. This provides a mechanism for the GPU to directly write the hardware-native resource description (descriptor) to memory up-front. To declare which resources are to be used by the pipeline for a particular draw call, games specify one or more descriptor tables which represent sub-ranges of the full descriptor heap. As the descriptor heap has already been populated with the appropriate hardware-specific descriptor data, changing descriptor tables is an extremely low-cost operation.
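
One way to picture the heap-and-table relationship is as an array plus an (offset, count) pair. The C++ below is a simplified sketch, not the Direct3D 12 interface: ToyDescriptor, ToyDescriptorHeap, ToyDescriptorTable and resolve are hypothetical, and the descriptor contents are made-up fields chosen only to have something to store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only; the fields are illustrative, not a real descriptor layout.
struct ToyDescriptor {
    std::uint64_t gpuAddress;
    std::uint32_t format;
};

// The heap is populated up front; each descriptor is written exactly once.
class ToyDescriptorHeap {
public:
    std::size_t create(ToyDescriptor d) {
        slots_.push_back(d);
        return slots_.size() - 1;  // index of the new descriptor
    }
    const ToyDescriptor& at(std::size_t i) const {
        assert(i < slots_.size());
        return slots_[i];
    }
    std::size_t size() const { return slots_.size(); }
private:
    std::vector<ToyDescriptor> slots_;
};

// A table is only a sub-range of the heap, so switching tables means
// changing two integers rather than rebinding individual views.
struct ToyDescriptorTable {
    std::size_t offset;
    std::size_t count;
};

// Look up the index-th descriptor of a table.
inline const ToyDescriptor& resolve(const ToyDescriptorHeap& heap,
                                    ToyDescriptorTable table,
                                    std::size_t index) {
    assert(index < table.count);
    assert(table.offset + table.count <= heap.size());
    return heap.at(table.offset + index);
}
```

With eight descriptors in the heap, two tables {0, 4} and {4, 4} can describe the resources for two different draws without any descriptor being rewritten between them.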

In addition to the improved performance offered by descriptor heaps and tables, Direct3D 12 also allows resources to be dynamically indexed in shaders, providing unprecedented flexibility and unlocking new rendering techniques. As an example, modern deferred rendering engines typically encode a material or object identifier of some kind to the intermediate g-buffer. In Direct3D 11, these engines must be careful to avoid using too many materials, as including too many in one g-buffer can significantly slow down the final render pass. With dynamically indexable resources, a scene with a thousand materials can be finalized just as quickly as one with only ten.
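
A CPU-side analogy in C++ shows why the material count stops mattering. This is an assumption-laden sketch: the per-material loop is just one way an engine without dynamic indexing might resolve a g-buffer, not a description of any specific renderer, and ToyMaterial, shadePerMaterialPass and shadeDynamicIndex are hypothetical names. The pixelVisits counter stands in for final-pass cost.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: a material is reduced to a single value.
struct ToyMaterial {
    float albedo;
};

// Without dynamic indexing (toy): one pass over every pixel per material,
// so cost grows with materials.size() * gbufferIds.size().
inline std::vector<float> shadePerMaterialPass(const std::vector<int>& gbufferIds,
                                               const std::vector<ToyMaterial>& materials,
                                               std::size_t& pixelVisits) {
    std::vector<float> out(gbufferIds.size(), 0.0f);
    for (std::size_t m = 0; m < materials.size(); ++m) {
        for (std::size_t p = 0; p < gbufferIds.size(); ++p) {
            ++pixelVisits;
            if (static_cast<std::size_t>(gbufferIds[p]) == m) {
                out[p] = materials[m].albedo;
            }
        }
    }
    return out;
}

// With dynamic indexing (toy): one pass, and each pixel uses its stored
// material ID to index the material array directly.
inline std::vector<float> shadeDynamicIndex(const std::vector<int>& gbufferIds,
                                            const std::vector<ToyMaterial>& materials,
                                            std::size_t& pixelVisits) {
    std::vector<float> out(gbufferIds.size(), 0.0f);
    for (std::size_t p = 0; p < gbufferIds.size(); ++p) {
        ++pixelVisits;
        const std::size_t id = static_cast<std::size_t>(gbufferIds[p]);
        assert(id < materials.size());
        out[p] = materials[id].albedo;
    }
    return out;
}
```

Both functions produce the same shaded output, but in this model only the first one gets slower as the number of materials grows.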
