
Work groups with priorities

Problem:
Currently, you have no way of telling/hinting the GL (or the driver) which of your commands are important (real time critical) and which ones are not. You want to keep the command queue full, but at the same time you do not want to compete for resources (GPU time, memory, bandwidth) with time critical tasks.

Examples:

You are running a complicated physical simulation (in an old ping-pong pixel shader, in a compute shader, or as an OpenCL kernel) which is calculated at 10Hz. The renderer runs at a typical 60Hz, limited by vsync, and interpolates the simulation results over several frames. The simulation takes considerable time (say, 5-10ms), but it suffices if the result is ready 5 to 6 frames (= 83 to 100ms) in the future.

You are calculating a histogram of the previous frame to do tonemapping. The calculation could start as soon as the frame is available as a texture (at the same time as tonemapping/displaying it) and could execute while the GPU is not doing anything (such as during vsync), but it should not compete with tonemapping/blitting the previous frame or delay swapping buffers.

You are doing a non-trivial amount of render-to-texture (say, to display a "page of text" out of a book in an e-reader, or in a game). The frame rate should be constant, as it would be disturbing to see all other animations "freeze" for a moment when one opens a book. On the other hand, nobody would notice if 2-3 frames passed before the book is opened or a page is flipped -- as long as everything stays "smooth".

Your worker thread has just finished loading a texture from disk into a mapped buffer object. Now you would like to use it (next frame). So you unmap the buffer and call glTexImage to allocate storage and define the texture's contents. You want to do this early to give the driver a chance to asynchronously upload the data, but you do not want to compete with the rendering (frame time budget!) for PCIe bandwidth or GPU memory. You certainly do not want to stall for half a millisecond while the GL or driver is doing memory allocator work (and maybe even evicts a texture that is still needed later this frame!) to make room for the new texture.

You have no way of telling the GL that you don't need the physics immediately. You have no way of telling the GL to start calculating the histogram but not to compete with the rendering -- or worse, wait for histogram calculation to complete before swapping buffers. You have no way of telling the GL to allocate and upload the texture whenever there is time (i.e. generally as soon as possible), but not at the cost of something that must finish this frame.

Yes, swapping buffers likely won't be delayed by "unimportant" tasks since most implementations render 2-3 frames ahead anyway, so there is no clean-cut end of frame. But still, you cannot be certain of this implementation detail, and you do not even have a way of hinting as to what's intended.
The driver must assume that anything you pass to GL (or... CL) is equally important, and anything you submit should be ready as soon as possible. At the same time, you want to push as many tasks to the GL as fast as you can, so as to keep the GPU from going idle.

With some luck, the driver is smart enough (or lucky enough) to get it just right, but ideally you would be able to hint it, so it can do a much better job.

Proposal:
Commands submitted to the GL are grouped into workgroups (name it differently if you like). There is a single default workgroup with "normal" priority to accommodate programs that are not workgroup-aware.
A subset of commands can be enclosed in a different workgroup with a different (lower) priority using a begin/end command pair (say, glBeginGroup(GLenum priority); and glEndGroup();). Implementations that are unwilling to implement the feature simply treat the begin/end function calls as no-ops.
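
To make the begin/end idea concrete, here is a minimal sketch of how a frame might look from the application side. Nothing in it is real OpenGL: glBeginGroup, glEndGroup and the GL_PRIORITY_* values are the hypothetical names from this proposal, stubbed out locally so the snippet compiles without any GL headers, and submit() is an invented stand-in for "any GL command".

```c
#include <stdio.h>

/* HYPOTHETICAL: none of this exists in OpenGL. The enum values are
   arbitrary placeholders, and the functions are local stubs that only
   remember the priority so the example compiles without GL headers. */
typedef unsigned int GLenum;
#define GL_PRIORITY_NORMAL     0x0001u
#define GL_PRIORITY_NEXT_FRAME 0x0003u

static GLenum current_priority = GL_PRIORITY_NORMAL;

static void glBeginGroup(GLenum priority) { current_priority = priority; }
static void glEndGroup(void)              { current_priority = GL_PRIORITY_NORMAL; }

/* Invented stand-in for any GL command: reports and returns the
   priority of the workgroup it was issued in. */
static GLenum submit(const char *what)
{
    printf("%-28s priority 0x%04x\n", what, current_priority);
    return current_priority;
}

/* One frame. Returns how many commands were issued outside the
   default (normal priority) workgroup. */
int render_frame_sketch(void)
{
    int deferred = 0;

    /* Default workgroup: must finish this frame. */
    if (submit("draw scene") != GL_PRIORITY_NORMAL) deferred++;

    /* Lower-priority workgroup: fine if this finishes a frame or two later. */
    glBeginGroup(GL_PRIORITY_NEXT_FRAME);
    if (submit("upload texture") != GL_PRIORITY_NORMAL) deferred++;
    if (submit("render book page to texture") != GL_PRIORITY_NORMAL) deferred++;
    glEndGroup();

    /* Back in the default workgroup. */
    if (submit("tonemap and swap") != GL_PRIORITY_NORMAL) deferred++;

    return deferred;
}
```

Calling render_frame_sketch() from main prints one line per command. An implementation that treats the pair as no-ops would simply run all four commands at normal priority, which is exactly today's behaviour.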

(As a more complicated alternative, one could consider "workgroup objects" much like query objects or buffer objects. This would allow querying the workgroup's status and/or synchronizing with its completion, and one might change a workgroup's priority at a later time, or even cancel the entire workgroup. However, the already present synchronization mechanisms in OpenGL are actually entirely sufficient, and it's questionable whether changing priorities and cancelling workgroups are really advantageous features. They might add more complexity than they are worth.)

An elaborate system of priorities (with dozens/hundreds of priorities as offered by operating systems) is needlessly complex and has no real advantage -- a simple system with fewer than half a dozen possible levels, maybe only 2 or 3, would be more than enough.

For example:
GL_PRIORITY_NORMAL --> default, want to see this ready as soon as possible
GL_PRIORITY_END_OF_FRAME --> not immediately important, best start when done with this frame (or when main task is stalled)
GL_PRIORITY_NEXT_FRAME --> don't care if this is ready now or the next frame (or in 2 frames), but still want result in "finite time"
GL_PRIORITY_LOW --> rather than going idle, process this task -- otherwise do a higher priority one
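
How an implementation would act on these levels is entirely up to the driver. Purely as an illustration, here is a toy sketch of one possible selection rule: run the most important pending workgroup, and promote a NEXT_FRAME group once it has waited long enough, so that it still completes in "finite time". The struct, the function names and the MAX_WAIT_FRAMES threshold are all assumptions made up for this sketch, not part of the proposal.

```c
#include <stddef.h>

/* Toy model only. Lower value = more important, mirroring the list above. */
enum priority {
    PRIORITY_NORMAL,
    PRIORITY_END_OF_FRAME,
    PRIORITY_NEXT_FRAME,
    PRIORITY_LOW
};

struct group {
    enum priority prio;
    int frames_waited;  /* frames since the group was submitted */
    int done;           /* nonzero once the group has completed */
};

/* Assumption: after this many frames a NEXT_FRAME group is treated
   like normal work, so it cannot be starved forever. */
#define MAX_WAIT_FRAMES 2

static int effective_rank(const struct group *g)
{
    if (g->prio == PRIORITY_NEXT_FRAME && g->frames_waited >= MAX_WAIT_FRAMES)
        return PRIORITY_NORMAL;
    return (int)g->prio;
}

/* Returns the index of the group to run next, or -1 if nothing is
   pending. Ties go to the earlier (first submitted) group. */
int pick_next(const struct group *groups, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (groups[i].done)
            continue;
        if (best < 0 ||
            effective_rank(&groups[i]) < effective_rank(&groups[best]))
            best = (int)i;
    }
    return best;
}
```

With a queue holding a LOW, a NEXT_FRAME and a NORMAL group, pick_next selects the NORMAL one first and only reaches the LOW group when nothing else is pending, which is the "rather than going idle" behaviour described above.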