Windows 8.1 brings an incremental update to the driver model, WDDM 1.3, which enables new GPU computing functionality. One of the important pieces is the ability to "map default buffers" (which I will call MDB), which should be particularly interesting for compute shaders running on APUs/SoCs that combine CPU and GPU on a single chip.

We can explain the feature as follows. In a typical discrete card, the GPU has its own onboard graphics memory. The application allocates buffers in GPU memory, and shaders read/write data from this memory. Buffers allocated in GPU memory are called "default buffers" in Direct3D parlance. Now assume a GPU shader has written some output that you want to read on the CPU. Currently this is done in multiple stages. First, the application allocates a "staging buffer", which the Direct3D driver places in a special area of system memory so that the GPU can transfer data between default buffers and staging buffers efficiently over the PCI Express bus. The GPU copies the data from the default buffer to the staging buffer, and the CPU then issues a "map" command that lets it read/write the staging buffer. This multi-stage process is wasteful on APUs/SoCs, where the GPU shares physical memory with the CPU. In Direct3D 11.2, the staging buffer and the extra copy operation are no longer required on supported hardware, and the CPU can access default buffers directly. Thus, MDB should be a big win for many GPU computing scenarios on APUs/SoCs due to the reduced copy overhead.
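To make the two paths concrete, here is a rough sketch of today's staging-buffer readback in Direct3D 11, with the 11.2 direct-map path shown in a comment. This is not complete code: it assumes an already-created `device`, immediate `context`, and a default buffer `gpuBuf` holding the shader output, omits all error handling, and the exact 11.2 usage may differ in the final SDK.

```cpp
// Today's path: copy the default buffer into a staging buffer, then map it.
D3D11_BUFFER_DESC desc = {};
gpuBuf->GetDesc(&desc);
desc.Usage          = D3D11_USAGE_STAGING;      // CPU-accessible system memory
desc.BindFlags      = 0;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
desc.MiscFlags      = 0;

ID3D11Buffer* stagingBuf = nullptr;
device->CreateBuffer(&desc, nullptr, &stagingBuf);

context->CopyResource(stagingBuf, gpuBuf);      // extra GPU copy over the bus

D3D11_MAPPED_SUBRESOURCE mapped = {};
context->Map(stagingBuf, 0, D3D11_MAP_READ, 0, &mapped);
// ... read the shader output through mapped.pData ...
context->Unmap(stagingBuf, 0);

// D3D 11.2 path on supported hardware: create the default buffer itself with
// CPU access flags and Map() it directly -- no staging buffer, no copy:
//   context->Map(gpuBuf, 0, D3D11_MAP_READ, 0, &mapped);
```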

Intel recently rolled out its own extension, called InstantAccess, for Haswell. My understanding is that InstantAccess is a bit more general than MDB: InstantAccess allows mapping of textures as well as buffers, whereas D3D 11.2 only allows mapping of default buffers, not textures. Extensions similar to MDB are also common in OpenCL. Both Intel and AMD allow the CPU to read/write OpenCL GPU buffers. In addition, Intel also exposes some ability for the GPU to read/write preallocated CPU memory, which as far as I know is not allowed in Direct3D yet. The efficiency of the different solutions is still an open question. For example, AMD's OpenCL extension allows the CPU to access GPU memory on Llano, but the CPU reads data from GPU memory at a very slow speed, while writing is still pretty fast.
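For comparison, this is roughly what the zero-copy pattern looks like in OpenCL host code on an APU. The sketch assumes an existing context `ctx` and command queue `queue` and omits error handling; whether the allocation is truly zero-copy depends on the vendor driver, with `CL_MEM_ALLOC_HOST_PTR` being the usual hint.

```cpp
// Allocate a buffer the driver may place in memory visible to both CPU and GPU.
cl_int err;
cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                            nbytes, NULL, &err);

// ... enqueue kernels that write to buf ...

// Map the buffer for CPU access; on a zero-copy allocation this returns a
// pointer into the shared memory instead of performing a transfer.
float* p = (float*)clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_READ,
                                      0, nbytes, 0, NULL, NULL, &err);
// ... read results through p ...
clEnqueueUnmapMemObject(queue, buf, p, 0, NULL, NULL);
```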

UPDATE: Intel confirmed support for MDB on Ivy Bridge onwards.

At this time, there is no official confirmation of which hardware will support MDB. My expectation is that MDB will likely be available on all recent single-chip CPU/GPU systems such as AMD's Trinity and Kabini as well as Intel's Haswell and Ivy Bridge. AMD has already rolled out WDDM 1.3 drivers, but curiously those do not work on Llano and Zacate APUs, so I am a little pessimistic about whether those APUs will support the new feature. Microsoft, for its part, has only stated that it expects MDB to be "broadly available" once WDDM 1.3 drivers are rolled out. I will update the article when we get official word from the vendors about hardware support.

Apart from MDB, Microsoft has also added support for runtime shader linking, which will be quite useful for both compute and graphics shaders. The idea is that one can precompile shader functions beforehand and ship the compiled code, with linking done at runtime. Separate compilation and linking has been available in CUDA 5 and OpenCL 1.2 as well. Runtime shader linking is a software feature and will be available on all hardware under Windows 8.1.
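Based on the preliminary docs, the workflow looks roughly like the sketch below: compile functions into an HLSL library offline, then load and link them at runtime. `entryInstance` stands for the entry-point module instance, which is built with a function-linking graph (`D3DCreateFunctionLinkingGraph`) and omitted here for brevity; error handling is omitted and the API details may change before release.

```cpp
#include <d3dcompiler.h>   // link against d3dcompiler_47.lib

// 1. Ahead of time: compile shader functions into a library (target "lib_5_0").
ID3DBlob* libBlob = nullptr;
D3DCompile(src, srcLen, nullptr, nullptr, nullptr,
           nullptr, "lib_5_0", 0, 0, &libBlob, nullptr);

// 2. At runtime: load the precompiled library as a module instance.
ID3D11Module* module = nullptr;
D3DLoadModule(libBlob->GetBufferPointer(), libBlob->GetBufferSize(), &module);

ID3D11ModuleInstance* moduleInstance = nullptr;
module->CreateInstance("", &moduleInstance);

// 3. Link the library functions into a final shader for a concrete target.
ID3D11Linker* linker = nullptr;
D3DCreateLinker(&linker);
linker->UseLibrary(moduleInstance);

ID3DBlob* shaderBlob = nullptr;
ID3DBlob* errorBlob  = nullptr;
linker->Link(entryInstance, "main", "ps_5_0", 0, &shaderBlob, &errorBlob);
```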

C++ AMP, Microsoft's C++ extension for GPU computing, has also been updated in the upcoming VS2013. I think the biggest feature update is that C++ AMP programs gain a shared-memory feature on APUs/SoCs, where the compiler and runtime can eliminate extra data copies between CPU and GPU. This feature will also be available only on Windows 8.1, and it is likely built on top of MDB, as Microsoft's AMP implementation uses Direct3D under the hood. C++ AMP also brings some other nice additions, including enhanced texture support and better debugging abilities.
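The shared-memory feature matters because the usual C++ AMP idiom already leaves data movement to the runtime, as in this small sketch (MSVC-only; the doubling kernel is just an illustrative example):

```cpp
#include <amp.h>
#include <vector>
using namespace concurrency;

void double_all(std::vector<int>& data) {
    // array_view wraps CPU memory; the runtime copies to and from the
    // accelerator only when it has to.  On a shared-memory APU under
    // Windows 8.1, those copies can be elided entirely.
    array_view<int, 1> av(static_cast<int>(data.size()), data);

    parallel_for_each(av.extent, [=](index<1> i) restrict(amp) {
        av[i] *= 2;                 // runs on the GPU
    });

    av.synchronize();               // makes the results visible on the CPU
}
```

Because `array_view` expresses intent rather than explicit transfers, existing AMP code can pick up the optimization without source changes.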

In addition to compute, Microsoft also introduced a number of graphics updates, such as tiled resources, but we will likely cover those separately. More information about the Direct3D changes can be found in the preliminary D3D 11.2 docs and a talk at BUILD.

He was talking about upgrading to 8.1. Not about upgrading hardware. Although some features are only available on newer hardware.

Everyone who has Win 8 will get WDDM 1.3 and DX11.2 though, as it's a free upgrade from 8 to 8.1. Win 8 already has 13% of Steam users and is already the second most used OS on Steam, so gamers are taking to it.


Would you, as a company that intends to make a profit and so needs to manage its development resources with care, develop for a program you are phasing out? I wouldn't likely do that myself.

I'm not saying you are entirely wrong here, I'm just saying there is likely more to the picture than just selling a newer product.

Don't cut off your nose to spite your face here. :) It's a great benefit, and not just for SoC-type uses either: it makes graphics engineering easier to manage for everyone, and so better for any end-user in terms of getting more from whatever budget they spend on making what appears on screen look its best.

The improvements in efficiency that came from DX11 itself are one of the things that allowed Blizzard to upgrade World of Warcraft's engine and still let lower-end computers handle the game decently. WoW is still mostly a DX9 game, as are most of the games that have some DX11 functionality (it might help to understand that DirectX and OpenGL are collections of APIs that a developer can pretty much pick and choose from; they don't have to use all of it to benefit). This change will allow for more of the same kind of thing.

I've never really thought about how an APU might fare in a GPU computing environment; I sort of ignored it, thinking 'integrated graphics == rubbish'. But that isn't really true anymore. With the reduced memory overhead, I think this could actually be quite useful.

In my experience GPU computing is often not applicable unless you are using very big data sets (literally millions of elements); between the transfer overhead and slower clock speed, a GPU only wins out when all of its processing cores are in use for an extended period. With memory overhead reduced, I can see the threshold for porting computation to the GPU lowering quite a bit on some systems.

Ummm. A 1080p screen is millions of elements. Any sort of video editing benefits, even without MDB. Likewise for editing those high-res pics from just about any camera, converting to and from JPEG, etc. Data sets with millions of elements are really quite common.