Sharing resources between DirectX and OpenGL

I’ve recently had a need to simultaneously render using both DirectX and OpenGL. There are a number of reasons why this is useful, particularly in tools where you may wish to compare multiple rendering engines simultaneously. With this technique it is also possible to efficiently perform some rendering operations on one API to a render target, and switch to the other API to continue rendering to that render target. It can also be used to perform all rendering in a specific API, while presenting that final render target using another API. Providing direct access to textures and render targets in graphics memory regardless of API has the potential of efficiently pipelining surfaces through multiple discrete renderers.

Generally speaking, it would be advantageous to share resources (particularly surface data) between the two APIs. This can be achieved in many ways, with varying degrees of performance. However, the ideal scenario would be to load only a single copy of the texture data into graphics memory, and sample from that same texture data using both DX and OGL calls at the same time. Fortunately there is an OpenGL extension to do exactly this, called WGL_NV_DX_interop for DX9, and WGL_NV_DX_interop2 for DX10+.
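Since the extension is optional, a renderer would normally check for it before relying on it. The helper below is a minimal sketch of such a check, not code from the demo: HasWglExtension is a name I made up, and it assumes the extension list is the usual space-separated string that wglGetExtensionsStringARB returns for a device context.

```cpp
#include <cstring>

// Returns true if `name` appears as a whole, space-delimited token in `extensions`.
// A plain strstr() is not enough here: "WGL_NV_DX_interop" is a prefix of
// "WGL_NV_DX_interop2", so a substring match could report the wrong extension.
bool HasWglExtension(const char* extensions, const char* name)
{
    if (extensions == nullptr || name == nullptr || *name == '\0')
        return false;

    const std::size_t nameLen = std::strlen(name);
    const char* cursor = extensions;

    while ((cursor = std::strstr(cursor, name)) != nullptr)
    {
        // The match only counts if it is bounded by spaces or the string ends.
        const bool startsToken = (cursor == extensions) || (cursor[-1] == ' ');
        const char after = cursor[nameLen];
        const bool endsToken = (after == ' ') || (after == '\0');
        if (startsToken && endsToken)
            return true;
        cursor += nameLen; // keep scanning past this partial match
    }
    return false;
}
```

On Windows the string would come from wglGetExtensionsStringARB(hdc) once a GL context is current, and the check would be made for "WGL_NV_DX_interop" (DX9) or "WGL_NV_DX_interop2" (DX10+) depending on the device in use.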

This extension has historically had spotty hardware vendor support. It was initially proposed by Nvidia and support was later added to AMD graphics drivers. With the latest drivers for Intel’s current chips (HD Graphics 4200+), compatibility for this extension has been added. I’ve tested this code on an Intel HD 4400, AMD 6950, and Nvidia GTX 570. One major caveat is that I have not been able to successfully present a shared render target that is simultaneously being sampled, even with the synchronization objects available in the extension. The render target will be updated, and the sample will happen correctly, but some unknown limitation prevents the original render target from being presented. The code samples below perform a GPU copy from video memory to video memory in order to present both the DX and GL rendered results. This GPU copy is significantly faster than any CPU copy (e.g. memcpy), and is only possible through this sharing extension. However, an ideal implementation should be able to avoid this copy.

Let’s look at the special considerations for initializing DX and OGL. This demo uses DX9; DX10+ support should also be available, although I have not personally tested it. For DX9, it is important that a D3D9Ex device is used (instead of a plain D3D9 device). D3D9Ex has been available since Windows Vista, and Windows 7+ adds a new, more efficient flip presentation mode; more importantly, D3D9Ex enables the creation of shared resources. These resources are identified by the share handles returned by CreateOffscreenPlainSurface or CreateTexture. In the code below, CreateOffscreenPlainSurface in InitDX() generates a shared handle, which is later registered in InitGL() using the extension function wglDXSetResourceShareHandleNV. This is what makes the GL renderer aware of the DX resource. This handle represents the GPU video memory that backs the respective DX/GL textures. In this demo, the DX renderer’s shared surface is called g_pSharedSurface and its handle is called g_hSharedSurface, while the GL renderer’s texture is called g_GLTexture and the corresponding handle is called g_hGLSharedTexture. Before registering the shared resource with GL, the DX device must be associated with the GL context using wglDXOpenDeviceNV.

// g_pSharedSurface should be able to be opened in OGL via the WGL_NV_DX_interop extension
// Vendor support for various textures/surfaces may vary
hr = g_pDevice->CreateOffscreenPlainSurface(rtDesc.Width,
                                            rtDesc.Height,
                                            rtDesc.Format,
                                            D3DPOOL_DEFAULT,
                                            &g_pSharedSurface,
                                            &g_hSharedSurface);
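The GL side of the registration can be sketched as follows. This is an illustration under assumptions, not the demo's InitGL(): the struct and function names are hypothetical, the typedefs are portable stand-ins for the Windows and GL types, and the extension entry points are passed in as a table of function pointers (on Windows they would be fetched with wglGetProcAddress, and the pointer types would carry WINAPI). Registering as GL_TEXTURE_2D with read-only access is also my assumption, chosen because the GL renderer only samples the surface.

```cpp
#include <cassert>

// Portable stand-ins; on Windows these come from <windows.h>, <GL/gl.h> and wglext.h.
typedef void*        HANDLE;
typedef int          BOOL;
typedef unsigned int GLuint;
typedef unsigned int GLenum;

const GLenum kGlTexture2D       = 0x0DE1; // GL_TEXTURE_2D
const GLenum kWglAccessReadOnly = 0x0000; // WGL_ACCESS_READ_ONLY_NV

// Entry points of WGL_NV_DX_interop used during setup.
struct DxInteropFns
{
    HANDLE (*openDevice)(void* dxDevice);                          // wglDXOpenDeviceNV
    BOOL   (*setShareHandle)(void* dxObject, HANDLE shareHandle);  // wglDXSetResourceShareHandleNV
    HANDLE (*registerObject)(HANDLE hDevice, void* dxObject, GLuint name,
                             GLenum type, GLenum access);          // wglDXRegisterObjectNV
    BOOL   (*closeDevice)(HANDLE hDevice);                         // wglDXCloseDeviceNV
};

struct SharedBinding
{
    HANDLE glDevice; // interop handle for the D3D device (hypothetical name)
    HANDLE glObject; // plays the role of g_hGLSharedTexture
};

// Returns true and fills `out` on success; on failure nothing is left open.
bool RegisterSharedSurface(const DxInteropFns& fns, void* dxDevice, void* dxSurface,
                           HANDLE shareHandle, GLuint glTexture, SharedBinding* out)
{
    assert(out && fns.openDevice && fns.setShareHandle &&
           fns.registerObject && fns.closeDevice);
    out->glDevice = nullptr;
    out->glObject = nullptr;

    // 1. Associate the D3D device with the current GL context.
    HANDLE device = fns.openDevice(dxDevice);
    if (!device)
        return false;

    // 2. Tell GL which share handle backs the D3D surface (must precede step 3).
    // 3. Bind the D3D surface to the GL texture name.
    HANDLE object = nullptr;
    if (fns.setShareHandle(dxSurface, shareHandle))
        object = fns.registerObject(device, dxSurface, glTexture,
                                    kGlTexture2D, kWglAccessReadOnly);

    if (!object)
    {
        fns.closeDevice(device); // undo step 1 so the caller has nothing to clean up
        return false;
    }

    out->glDevice = device;
    out->glObject = object;
    return true;
}
```

In the demo's terms, the call would look roughly like RegisterSharedSurface(fns, g_pDevice, g_pSharedSurface, g_hSharedSurface, g_GLTexture, &binding), with g_GLTexture created by glGenTextures beforehand.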

After the textures are created as shared surfaces and these surfaces are registered with the GL renderer, we can begin rendering. The most important consideration here is proper synchronization of reads/writes of the shared surface between the two renderers. This example shows the DX renderer writing a triangle to a render target, which is then copied to the shared surface, and the GL renderer reading from that shared surface and texturing a quad with it. Because of the potential read-after-write hazards, it is necessary for the GL renderer to acquire a lock on the shared surface via wglDXLockObjectsNV, and to release it with wglDXUnlockObjectsNV before the DX renderer touches the surface again. This lock does not result in the surface being copied to CPU space (as would happen with a normal DX9 Lock operation). Instead, this lock triggers the GPU to perform the necessary flushing and stalling to guarantee that the surface has finished being written to before reading from it. This is necessary even if the DX and GL renderers operate sequentially (i.e. not multi-threaded), because the rendering commands scheduled on the GPU execute asynchronously.

// Copy the render target to the shared surface
// StretchRect between two D3DPOOL_DEFAULT surfaces will be a GPU Blt.
// Note that GetRenderTargetData() cannot be used because it is intended to copy from GPU to CPU.
hr = g_pDevice->StretchRect(g_pSurfaceRenderTarget, NULL, g_pSharedSurface, NULL, D3DTEXF_NONE);
hr = g_pDevice->EndScene();
hr = g_pDevice->Present(NULL, NULL, NULL, NULL);
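On the GL side, the lock around the read can be wrapped in a small scoped guard so the unlock is never forgotten. This is a sketch of one possible approach, not what the demo does: ScopedDxLock is a hypothetical class, the typedefs are portable stand-ins for the Windows and GL types, and the two function pointers are assumed to be wglDXLockObjectsNV and wglDXUnlockObjectsNV as loaded through wglGetProcAddress (where the real pointer types also carry WINAPI).

```cpp
#include <cassert>

// Portable stand-ins; on Windows these come from <windows.h> and <GL/gl.h>.
typedef void* HANDLE;
typedef int   BOOL;
typedef int   GLint;

// Signature shared by wglDXLockObjectsNV and wglDXUnlockObjectsNV.
typedef BOOL (*DxLockFn)(HANDLE hDevice, GLint count, HANDLE* hObjects);

// Holds the interop lock for the lifetime of the guard, so GL only touches the
// shared surface between construction and destruction.
class ScopedDxLock
{
public:
    ScopedDxLock(DxLockFn lock, DxLockFn unlock, HANDLE device, HANDLE object)
        : unlock_(unlock), device_(device), object_(object), locked_(false)
    {
        assert(lock && unlock);
        locked_ = lock(device_, 1, &object_) != 0;
    }

    ~ScopedDxLock()
    {
        // Hand the surface back to D3D, but only if the lock was actually taken.
        if (locked_)
            unlock_(device_, 1, &object_);
    }

    ScopedDxLock(const ScopedDxLock&) = delete;
    ScopedDxLock& operator=(const ScopedDxLock&) = delete;

    // False means the lock failed and the texture must not be sampled this frame.
    bool locked() const { return locked_; }

private:
    DxLockFn unlock_;
    HANDLE   device_;
    HANDLE   object_;
    bool     locked_;
};

// Intended use in the GL render function (hGLDevice is a hypothetical name for
// the handle returned by wglDXOpenDeviceNV):
//
//   {
//       ScopedDxLock guard(lockFn, unlockFn, hGLDevice, g_hGLSharedTexture);
//       if (guard.locked()) { /* bind g_GLTexture and draw the textured quad */ }
//   }   // unlocked here; the DX renderer may write to the surface again
```

Keeping the guard's scope as tight as possible matters, since the DX renderer must not access the shared surface while GL holds the lock.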

The final teardown step is extremely straightforward. The call to wglDXUnregisterObjectNV will disassociate the shared resource from GL, and wglDXCloseDeviceNV will close the interop handle to the device that created the shared surface.
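The teardown order can be sketched like this. As with the other sketches, the names are hypothetical, the typedefs stand in for the Windows types, and the two entry points are assumed to have been loaded with wglGetProcAddress; the demo itself may structure this differently.

```cpp
#include <cassert>

// Portable stand-ins; on Windows these come from <windows.h>.
typedef void* HANDLE;
typedef int   BOOL;

struct DxInteropTeardownFns
{
    BOOL (*unregisterObject)(HANDLE hDevice, HANDLE hObject); // wglDXUnregisterObjectNV
    BOOL (*closeDevice)(HANDLE hDevice);                      // wglDXCloseDeviceNV
};

// Unregisters the shared object first, then closes the interop device.
// Both handles are reset to null, so calling this twice is a harmless no-op.
bool ReleaseSharedSurface(const DxInteropTeardownFns& fns, HANDLE* device, HANDLE* object)
{
    assert(device && object && fns.unregisterObject && fns.closeDevice);
    bool ok = true;

    if (*device && *object)
    {
        ok = (fns.unregisterObject(*device, *object) != 0) && ok;
        *object = nullptr;
    }
    if (*device)
    {
        ok = (fns.closeDevice(*device) != 0) && ok;
        *device = nullptr;
    }
    return ok;
}
```

The GL texture and the D3D surface and device would then be released through their usual calls (glDeleteTextures and Release()), after the interop handles are gone.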