Name
WGL_NV_gpu_affinity
Name Strings
WGL_NV_gpu_affinity
Contact
Barthold Lichtenbelt, NVIDIA (blichtenbelt 'at' nvidia.com)
Notice
Copyright NVIDIA Corporation, 2005-2006.
Status
Completed.
Version
Last Modified Date: 11/08/2006
Author revision: 11
Number
355
Dependencies
WGL_ARB_extensions_string is required.
This extension interacts with WGL_ARB_make_current_read.
This extension interacts with WGL_ARB_pbuffer.
This extension interacts with GL_EXT_framebuffer_object
Overview
On systems with more than one GPU it is desirable to be able to
select which GPU(s) in the system become the target for OpenGL
rendering commands. This extension introduces the concept of a GPU
affinity mask. OpenGL rendering commands are directed to the
GPU(s) specified by the affinity mask. GPU affinity is immutable.
Once set, it cannot be changed.
This extension also introduces the concept called affinity-DC. An
affinity-DC is a device context with a GPU affinity mask embedded
in it. This restricts the device context to only allow OpenGL
commands to be sent to the GPU(s) in the affinity mask.
Handles for the GPUs present in a system are enumerated with the
command wglEnumGpusNV. An affinity-DC is created by calling
wglCreateAffinityDCNV. This function takes a list of GPU handles,
which make up the affinity mask. An affinity-DC can also
indirectly be created by obtaining a DC from a pBuffer handle, by
calling wglGetPbufferDC, which in turn was created from an
affinity-DC by calling wglCreatePbuffer.
A context created from an affinity DC will inherit the GPU
affinity mask from the DC. Once inherited, it cannot be changed.
Such a context is called an affinity-context. This restricts the
affinity-context to only allow OpenGL commands to be sent to those
GPU(s) in its affinity mask. Once created, this context can be
used in two ways:
1. Make the affinity-context current to an affinity-DC. This
will only succeed if the context's affinity mask is the same
as the affinity mask in the DC. There is no window
associated with an affinity DC, therefore this is a way to
achieve off-screen rendering to an OpenGL context. This can
either be rendering to a pBuffer, or an application created
framebuffer object. In the former case, the affinity-mask of
the pBuffer DC, which is obtained from a pBuffer handle,
will be the same affinity-mask as the DC used to created the
pBuffer handle. In the latter case, the default framebuffer
object will be incomplete because there is no window-system
created framebuffer. Therefore, the application will have to
create and bind a framebuffer object as the target for
rendering.
2. Make the affinity-context current to a DC obtained from a
window. Rendering only happens to the sub rectangles(s) of
the window that overlap the parts of the desktop that are
displayed by the GPU(s) in the affinity mask of the context.
Sharing OpenGL objects between affinity-contexts, by calling
wglShareLists, will only succeed if the contexts have identical
affinity masks.
It is not possible to make a regular context (one without an
affinity mask) current to an affinity-DC. This would mean a way
for a context to inherit affinity information, which makes the
context affinity mutable, which is counter to the premise of this
extension.
New Procedures, Functions and Structures:
DECLARE_HANDLE(HGPUNV);
typedef struct _GPU_DEVICE {
DWORD cb;
CHAR DeviceName[32];
CHAR DeviceString[128];
DWORD Flags;
RECT rcVirtualScreen;
} GPU_DEVICE, *PGPU_DEVICE;
BOOL wglEnumGpusNV(UINT iGpuIndex,
HGPUNV *phGpu);
BOOL wglEnumGpuDevicesNV(HGPUNV hGpu,
UINT iDeviceIndex,
PGPU_DEVICE lpGpuDevice);
HDC wglCreateAffinityDCNV(const HGPUNV *phGpuList);
BOOL wglEnumGpusFromAffinityDCNV(HDC hAffinityDC,
UINT iGpuIndex,
HGPUNV *hGpu);
BOOL wglDeleteDCNV(HDC hdc);
New Tokens
New error codes set by wglShareLists, wglMakeCurrent and
wglMakeContextCurrentARB:
ERROR_INCOMPATIBLE_AFFINITY_MASKS_NV 0x20D0
New error codes set by wglMakeCurrent and
wglMakeContextCurrentARB:
ERROR_MISSING_AFFINITY_MASK_NV 0x20D1
Additions to the WGL Specification
GPU Affinity
To query handles for all GPUs in a system call:
BOOL wglEnumGpusNV(UINT iGpuIndex, HGPUNV *phGPU);
is an index value that specifies a GPU.
upon return will contain a handle for GPU number
. The first GPU will be index 0.
By looping over wglEnumGpusNV and incrementing ,
starting at index 0, all GPU handles can be queried. If the
function succeeds, the return value is TRUE. If the function
fails, the return value is FALSE and will be unmodified.
The function fails if is greater or equal than the
number of GPUs supported by the system.
To retrieve information about the display devices supported by a
GPU call:
BOOL wglEnumGpuDevicesNV(HGPUNV hGpu,
UINT iDeviceIndex,
PGPU_DEVICE lpGpuDevice);
is a handle to the GPU to query.
is an index value that specifies a display device,
supported by , to query. The first display device will be
index 0.
pointer to a GPU_DEVICE structure which will receive
information about the display device at index .
By looping over the function wglEnumGpuDevicesNV and incrementing
, starting at index 0, all display devices can be
queried. If the function succeeds, the return value is TRUE. If
the function fails, the return value is FALSE and
will be unmodified. The function fails if is
greater or equal than the number of display devices supported by
.
The GPU_DEVICE structure has the following members:
typedef struct _GPU_DEVICE {
DWORD cb;
CHAR DeviceName[32];
CHAR DeviceString[128];
DWORD Flags;
RECT rcVirtualScreen;
} GPU_DEVICE, *PGPU_DEVICE;
is the size of the GPU_DEVICE structure. Before calling
wglEnumGpuDevicesNV, set to the size, in bytes, of
GPU_DEVICE.
is a string identifying the display device name. This
will be the same string as stored in the field of the
DISPLAY_DEVICE structure, which is filled in by
EnumDisplayDevices.
is a string describing the GPU for this display
device. It is the same string as stored in the
field in the DISPLAY_DEVICE structure that is filled in by
EnumDisplayDevices when it describes a display adapter (and not a
monitor).
Indicates the state of the display device. It can be a
combination of any of the following:
DISPLAY_DEVICE_ATTACHED_TO_DESKTOP If set, the device is part
of the desktop.
DISPLAY_DEVICE_PRIMARY_DEVICE If set, the primary
desktop is on this device. Only one device in the system can have
this set.
specifies the display device rectangle, in
virtual screen coordinates. The value of is
undefined if the device is not part of the desktop, i.e.
DISPLAY_DEVICE_ATTACHED_TO_DESKTOP is not set in the
field.
The function wglEnumGpuDevicesNV can fail for a variety of
reasons. Call GetLastError to get extended error information.
Possible errors are as follows:
ERROR_INVALID_HANDLE is not a valid GPU handle.
A new type of DC, called an affinity-DC, can be used to direct
OpenGL commands to a specific GPU or set of GPUs. An affinity-DC
is a device context with a GPU affinity mask embedded in it. This
restricts the device context to only allow OpenGL commands to be
sent to the GPU(s) in the affinity mask. An affinity-DC can be
created directly, using the new function wglCreateAffinityDCNV and
also indirectly by calling wglCreatePbufferARB followed by
wglGetPbufferDCARB. To create an affinity-DC directly call:
HDC wglCreateAffinityDCNV(const HGPUNV *phGpuList);
is a NULL-terminated array of GPU handles to which the
affinity-DC will be restricted. If an element in the list is not a
GPU handle, as returned by wglEnumGpusNV, it is silently ignored.
If successful, the function returns an affinity-DC. If it fails,
NULL will be returned.
To create an affinity-DC indirectly, first call
wglCreatePbufferARB passing it an affinity-DC. Next, pass the
handle returned by the call to wglCreatePbufferARB to
wglGetPbufferDCARB to create an affinity-DC for the pBuffer. The
DC returned by wglGetPbufferDCARB will have the same affinity mask
as the DC used to create the pBuffer handle by calling
wglCreatePbufferARB.
An affinity-DC has no window associated with it, and therefore it
has no default window-system-provided framebuffer. (Note: This is
terminology borrowed from EXT_framebuffer_object). A context made
current to an affinity-DC will only be able to render into an
application-created framebuffer object, or a pBuffer. The default
window-system-framebuffer object, when bound, will be incomplete.
The EXT_framebuffer_object specification defines what 'incomplete'
means exactly.
A context created from an affinity-DC, by calling wglCreateContext
and passing it an affinity-DC, is called an affinity-context. This
context will inherit the affinity mask from the DC. This affinity-
mask cannot be changed. The affinity mask restricts the affinity-
context to only allow OpenGL commands to be sent to those GPU(s)
in its affinity mask.
The function wglCreateAffinityDCNV can fail for a variety of
reasons. Call GetLastError to get extended error information.
Possible errors are as follows:
ERROR_NO_SYSTEM_RESOURCES Insufficient resources exist to
create the affinity-DC.
ERROR_INVALID_DATA is empty or contains no
valid GPU handles
An affinity-context can only be made current to an affinity-DC
with the same affinity-mask, otherwise wglMakeCurrent and
wglMakeContextCurrentARB will fail and return FALSE. In the case
of wglMakeContextCurrentARB, the affinity masks of both the "read"
and "draw" DCs need to match the affinity-mask of the context.
If a context that has no affinity mask is made current to an
affinity-DC, wglMakeCurrent and wglMakeContextCurrentARB will fail
and return FALSE. In the case of wglMakeContextCurrentARB it will
fail if either the "read" or "draw" DC is an affinity-DC.
If an affinity-context is made current to a DC obtained from a
window, by calling GetDC, then rendering will only happen to the
subrectangle(s) of the window that overlap the parts of the
desktop that are displayed by the GPU(s) in the affinity-mask of
the context. Note that a DC obtained from a window does not have
an affinity mask set.
The following error codes are added to the description of
wglMakeCurrent and wglMakeContextCurrentARB:
ERROR_INCOMPATIBLE_AFFINITY_MASKS_NV The device context(s) and
rendering context have non-matching affinity masks.
ERROR_MISSING_AFFINITY_MASK_NV The rendering context does
not have an affinity mask set.
Sharing OpenGL objects between affinity-contexts, by calling
wglShareLists, will only succeed if the contexts have identical
affinity masks. The following error codes are added to the
description of wglShareLists:
ERROR_INCOMPATIBLE_AFFINITY_MASKS_NV The contexts have non-
matching affinity masks.
To delete an affinity-DC call:
BOOL wglDeleteDCNV(HDC hdc)
Is a handle of an affinity-DC to delete.
If the function succeeds, TRUE is returned. If the function fails,
FALSE is returned. Call GetLastError to get extended error
information. Possible errors are as follows:
ERROR_INVALID_HANDLE is not a handle of an affinity-DC.
To retrieve a list of GPU handles that make up the affinity-mask
of an affinity-DC, call:
BOOL wglEnumGpusFromAffinityDCNV(HDC hAffinityDC,
UINT iGpuIndex,
HGPUNV *phGpu);
is a handle of the affinity-DC to query.
is an index value of the GPU handle in the affinity
mask of to query.
upon return will contain a handle for GPU number
. The first GPU will be at index 0.
By looping over wglEnumGpusFromAffinityDCNV and incrementing
, starting at index 0, all GPU handles associated with
the DC can be queried. If the function succeeds, the return value
is TRUE. If the function fails, the return value is FALSE and
will be unmodified. The function fails if is
greater or equal than the number of GPUs associated with
.
Call GetLastError to get extended error information. Possible
errors are as follows:
ERROR_INVALID_HANDLE is not a handle of an
affinity-DC.
Interactions with WGL_ARB_make_current_read
If the make current read extension is not supported, all language
referring to wglMakeContextCurrentARB is deleted.
Interactions with WGL_ARB_pbuffer
If the pbuffer extension is not supported, all language referring
to puffers, wglGetPbuferDC and wglCreatePbuffer are deleted.
Interactions with GL_EXT_framebuffer_object
If the framebuffer object extension is not supported, all language
referring to framebuffer objects is deleted.
Usage examples
// Example 1 - Normal window creation, DC setup and
// context creation.
PIXELFORMATDESCRIPTOR pfd;
int pf;
HDC hDC;
HGLRC hRC;
HWND hWnd;
hWnd = CreateWindow(...);
hDC = GetDC(hWnd);
memset(&pfd, 0, sizeof(pfd));
pfd.nSize = sizeof(pfd);
pfd.nVersion = 1;
pfd.dwFlags = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL;
pfd.iPixelType = PFD_TYPE_RGBA;
pfd.cColorBits = 32;
// Note, for ease of code reading no error checking is done.
pf = ChoosePixelFormat(hDC, &pfd);
SetPixelFormat(hDC, pf, &pfd);
DescribePixelFormat(hDC, pf, sizeof(PIXELFORMATDESCRIPTOR),
&pfd);
hRC = wglCreateContext(hDC);
wglMakeCurrent(hDC, hRC);
// Example 2 - Offscreen rendering to one GPU using a FBO
// It is assumed that a context already has been created (and
// possibly destroyed) and was used to query the proc addresses
// of the WGL affinity related entrypoints.
#define MAX_GPU 4
PIXELFORMATDESCRIPTOR pfd;
int pf, gpuIndex = 0;
HGPUNV hGPU[MAX_GPU];
HGPUNV GpuMask[MAX_GPU];
HDC affDC;
HGLRC affRC;
// Get a list of the first MAX_GPU GPUs in the system
while ((gpuIndex < MAX_GPU) && wglEnumGpusNV(gpuIndex,
&hGPU[gpuIndex])) {
gpuIndex++;
}
// Create an affinity-DC associated with the first GPU
GpuMask[0] = hGPU[0];
GpuMask[1] = NULL;
affDC = wglCreateAffinityDCNV(GpuMask);
// Set a pixelformat on the affinity-DC
pf = ChoosePixelFormat(affDC, &pfd);
SetPixelFormat(affDC, pf, &pfd);
DescribePixelFormat(affDC, pf, sizeof(PIXELFORMATDESCRIPTOR),
&pfd);
affRC = wglCreateContext(affDC);
wglMakeCurrent(affDC, affRC);
// Make a previously created FBO current so we have something
// to render into. Since there's no window, the default system
// created FBO is incomplete.
glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fb);
// Example 3 - Offscreen rendering to one GPU using a pBuffer
// It is assumed that a context already has been created (and
// possibly destroyed) and was used to query the proc addresses
// of the WGL affinity and pbuffer related entrypoints.
#define MAX_GPU 4
int gpuIndex = 0;
HGPUNV hGPU[MAX_GPU];
HGPUNV GpuMask[MAX_GPU];
HDC affDC, pBufferAffDC;
HGLRC affRC;
// Get a list of the first MAX_GPU GPUs in the system
while ((gpuIndex < MAX_GPU) && wglEnumGpusNV(gpuIndex,
&hGPU[gpuIndex])) {
gpuIndex++;
}
// Create an affinity-DC associated with the first GPU
GpuMask[0] = hGPU[0];
GpuMask[1] = NULL;
affDC = wglCreateAffinityDCNV(GpuMask);
// Setup desired pixelformat attributes for the pbuffer
// including WGL_DRAW_TO_PBUFFER_ARB.
HPBUFFERARB handle;
int width = 512, height = 512, format = 0;
unsigned int nformats;
int attribList[] =
{
WGL_RED_BITS_ARB, 8,
WGL_GREEN_BITS_ARB, 8,
WGL_BLUE_BITS_ARB, 8,
WGL_ALPHA_BITS_ARB, 8,
WGL_STENCIL_BITS_ARB, 0,
WGL_DEPTH_BITS_ARB, 0,
WGL_DRAW_TO_PBUFFER_ARB, true,
0,
};
wglChoosePixelFormatARB(affDC, attribList, NULL, 1,
&format, &nformats);
handle = wglCreatePbufferARB(affDC, format, width, height, NULL);
// pbufferAffDC will have the same affinity-mask as affDC.
pBufferAffDC = wglGetPbufferDCARB(handle);
// affRC will inherit the affinity-mask from pBufferAffDC.
affRC = wglCreateContext(pBufferAffDC);
wglMakeCurrent(pBufferAffDC, affRC);
Issues
1) Do we really need an affinity-DC, or can we do with just an
affinity context?
DISCUSSION: If affinity is not part of a DC, a new function will
need to be defined to create an affinity-context or set an
affinity-mask for an existing context. Passing NULL as a HDC to
wglMakeCurrent will then be one way to create an off-screen
rendering context, where rendering will have to go to a FBO. If
the HDC passed to wglMakeCurrent is one for a pBuffer, the
affinity-mask in the affinity-context dictates where rendering is
direct to. This might mean pBuffer resources will have to move, or
alternatively, duplicated across all GPUs in a system. That is
counter to the whole idea of this extension. Thus an affinity-DC
is definitely needed for a pBuffer.
Thus the question reduces to, do we need an affinity-DC in order
to facilitate off-screen rendering to a FBO? Having an affinity-DC
has the following advantages:
a) It is consistent with making current to a pBuffer or window,
that does need a DC.
b) passing NULL as a HDC to wglMakeCurrent might be filtered out
by the MS layer on future OSes.
c) The driver implementation might benefit from knowing at DC
creation time what the affinity-mask is, rather than at
wglMakeCurrent time.
RESOLUTION: Yes.
2) Should the GPU affinity concept also apply to D3D and/or GDI
commands?
DISCUSSION: It could be especially desirable to apply the
affinity concept to D3D. However, D3D is sufficiently different
that this extension doesn't directly apply.
RESOLUTION: That falls outside this extension.
3) Should setting a pixelformat on an affinity-DC be required?
DISCUSSION: Setting a pixelformat on an affinity-DC is not
strictly necessary if the application does off-screen rendering to
a FBO. However, the Microsoft layer of wglMakeCurrent requires
that the pixelformats of the DC and RC passed to it match. This
becomes an issue when making an affinity-context current to a DC
obtained from a window. The DC has a pixelformat set by the
application, and therefore the affinity-context needs to have the
same pixelformat. This means the affinity-DC, that the affinity-
context is created from, needs to have the same pixelformat set.
RESOLUTION: YES. Setting a pixelformat on an affinity-DC is
required.
4) Is it allowed to make an affinity-context current to an
affinity-DC where the mask of the context spans more GPUs than the
mask in the DC?
5) Is it allowed to make an affinity-context current to an
affinity-DC where the mask of the context spans less GPUs than the
mask in the DC?
DISCUSSION: Issues 4 and 5 are lumped together in this discussion.
For example, is this scenario something we want to support: An
application wants to share objects across two contexts and have
these two contexts each render to a different GPU. It can do this
by creating two affinity-DCs. One has an affinity mask for the
first GPU, the other for the second GPU. It also creates two
affinity-contexts that both have an affinity-mask that spans both
GPUs. Making one context current to the first affinity-DC will
lock the context to the GPU in the mask of that affinity-DC. Make
another context current to the second affinity-DC will lock that
context to the second GPU. This is effectively what issue 4) is
asking. . The simplest solution is to disallow these cases, and
that is how the spec is currently written.
RESOLUTION: NO, we will not allow this to keep the spec simple. If
necessary, these restrictions can always be lifted later.
6) What should an application do if the enum functions that return
BOOL fail for another reason than they are done? For example, if
they fail because they run out of memory?
RESOLUTION: An application will have to call GetLastError to find
out the reason of failure.
7) The "Enum" API commands in this extension assume that the list
of things being enumerated does not change dynamically. Is that
reasonable?
DISCUSSION: Display devices, and possibly GPUs in the future, can
be changed dynamically and/or hotplugged. Thus yes, this is a
potential issue. Existing OS functionality like EnumDisplayDevices
and even wglMakeCurrent will suffer from this too. In the latter
case, the application could make a context current to a device
that was removed from the system. A possible solution would be
some sort of notification mechanism to the application. Possibly
combined with being able to snapshot state first, then enumerate
that snapshot. That snapshot of state might immediately become
invalid, but at least the enumeration will walk a consistent list.
RESOLUTION: This is a wider issue than just this specification,
and not currently addressed.
8) How do I transfer data efficiently between two affinity-
contexts?
DISCUSSION: It is desired for an application to render in one
context, and transfer the result of that rendering to another
context. These two contexts can be on different GPUs. If they are,
how does the application efficiently transfer this data? Currently
OpenGL provides two mechanisms, neither of which are ideal:
1) The application can do a ReadPixels followed by a DrawPixels /
TexImage call. This involves transfer through host memory, which
can be slow.
2) The application can share objects among the two contexts using
wglShareLists(). This will work, but is counter to the premise of
this extension where each GPU has its own set of resources, not
shared with another GPU.
RESOLUTION: This is a hole which needs to be addressed separately.
Revision history
None