Asynchronous action

OpenGL rendering commands are assumed to be asynchronous. If you call any of the glDraw* functions to initiate rendering, there is no guarantee that the rendering has finished by the time the call returns. Indeed, it is perfectly legitimate for rendering to not even have started when the function returns. The OpenGL specification gives implementations the freedom to get around to rendering commands whenever it is best for them.

This is not a weakness of OpenGL; it is a strength. It allows for many optimizations of the rendering command pathway. Issuing a command to the internal rendering command buffer can be a fairly slow process, because it typically requires a CPU transition from user mode into kernel mode. This transition eats up a lot of cycles, so if the driver can store 30 rendering commands and then issue all of them with only one transition, this is faster than making one transition per rendering function call.
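The batching argument above comes down to simple arithmetic. The following is a minimal sketch of that arithmetic, not a description of any real driver: the function names and the per-transition cycle cost are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical cost model: each submission to the hardware pays one fixed
   transition cost, no matter how many commands the submission carries. */
static size_t transitions_needed(size_t commands, size_t batch_size)
{
    if (batch_size == 0)
        return 0; /* invalid batch size: nothing can be submitted */
    return (commands + batch_size - 1) / batch_size; /* ceiling division */
}

/* Total transition overhead, in (assumed) cycles, for a run of commands. */
static unsigned long submission_overhead(size_t commands, size_t batch_size,
                                         unsigned long cycles_per_transition)
{
    return (unsigned long)transitions_needed(commands, batch_size)
           * cycles_per_transition;
}
```

With an assumed cost of 1000 cycles per transition, submission_overhead(30, 1, 1000) charges the cost 30 times, while submission_overhead(30, 30, 1000) charges it once.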

Legacy Note: If you are using client-side vertex arrays, drawing commands are required to pull all vertex data from your arrays before the glDraw* command returns. This is usually done as a simple mem-copy from your client arrays into renderer-side memory. Indeed, any function that involves client-side memory must finish using that client-side memory before it returns.

All of this means that OpenGL is a very asynchronous renderer.

Command state

An OpenGL rendering command can be in one of three conceptual states: unissued, issued but not complete, and complete.

A command is unissued if the command has been given to the OpenGL driver, but the driver has not yet given the command to the hardware to actually execute. When the rendering hardware starts running out of actually issued commands to process, the OpenGL driver can take some of the unissued commands and issue them.

An issued but not complete command is one that has been given to the hardware, but the full results of the command are not yet ready. The hardware has a queue of these commands; unless there is a hardware fault of some kind, the hardware will execute all of the commands in that queue.

A command is complete when it is out of the pipeline entirely. For rendering commands, this means that its effects have been written to the framebuffer or transform feedback buffers, as appropriate to the current state. For pixel transfers to buffer objects, this means that the pixel data is now stored in the buffer object as requested. For pixel transfers from buffer objects, this means that the pixel data is now stored in the destination texture object. And so forth.

Synchronization options

Asynchronous rendering is nice. However, it is often useful to synchronize your actions with OpenGL. And OpenGL provides several alternatives for doing so.

Conceptually, the GPU has something called a "command queue". This is a list of commands written by the OpenGL driver at the behest of the user. Just about every OpenGL function will map to one or more commands that will be added to the command queue. Any command that is placed in the command queue will be read by the GPU and executed.

However, the command queue has a finite length. If you add too many commands in a short space of time, the driver cannot write them all to the GPU's command queue. What the driver can do is write them to internal memory. These commands are in the "unissued" state. Sometime later, the unissued commands are added to the GPU's queue. When the driver does this is the question.

The driver may set up some kind of asynchronous message that tells it when the GPU's queue is nearly empty so that it can add more if possible. However, this is generally not the case. OpenGL allows the driver to have the freedom to not make this check until you actually execute an OpenGL call. And even then, not all calls will make this check.

What this means is that it is theoretically possible for OpenGL to be sitting there, with lots of unissued commands in the driver's buffer, but with the GPU command queue being totally empty. The driver knows that there is work, but if you don't execute another command (any command), it never gets the chance to perform this check.

Normally this isn't a problem. But imagine this circumstance. You render a lot of stuff, all in a short space of time. You sort your data to achieve maximum efficiency, so submitting the data takes less time than rendering it.

Because you add a lot of commands in a short space of time, the driver has to buffer many of these commands. But if you don't make any OpenGL calls after you have submitted all of the data, then the driver never has the chance to push these buffered commands into the command queue. Obviously, you will be issuing commands next frame, but you'd probably rather not wait that long.

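The purpose of glFlush is to tell OpenGL to push every buffered command into the GPU's command queue, waiting if necessary until all of them have been added. This won't take as long as glFinish, which does not return until every previously issued command has completed, but it can still be time consuming.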
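The interaction between the driver's buffer, the finite GPU queue, and the three command states can be sketched with a toy model. This is an illustration only and makes no OpenGL calls: the gl_model structure, the model_* functions, and the queue length of 4 are all hypothetical, and a real driver is free to behave differently.

```c
#include <stddef.h>

enum cmd_state { CMD_UNISSUED, CMD_ISSUED, CMD_COMPLETE };

#define GPU_QUEUE_LEN 4 /* hypothetical finite hardware queue length */

/* Commands are ordered, so three counters describe every command's state. */
struct gl_model {
    size_t submitted; /* commands handed to the driver */
    size_t issued;    /* commands the driver has written to the GPU queue */
    size_t completed; /* commands the GPU has finished */
};

static enum cmd_state model_state(const struct gl_model *m, size_t i)
{
    if (i < m->completed) return CMD_COMPLETE;
    if (i < m->issued)    return CMD_ISSUED;
    return CMD_UNISSUED;
}

/* Driver: move buffered commands into the GPU queue while there is room. */
static void model_pump(struct gl_model *m)
{
    while (m->issued < m->submitted &&
           m->issued - m->completed < GPU_QUEUE_LEN)
        m->issued++;
}

/* An API call: buffer the command, then check the queue. This model checks
   on every call; a real driver need not. */
static void model_submit(struct gl_model *m)
{
    m->submitted++;
    model_pump(m);
}

/* GPU: retire up to n issued commands. The driver is not notified. */
static void model_gpu_execute(struct gl_model *m, size_t n)
{
    for (; n > 0 && m->completed < m->issued; n--)
        m->completed++;
}

/* Analogue of glFlush as described above: return once nothing is unissued.
   When the queue is full, the model waits for the GPU to free one slot. */
static void model_flush(struct gl_model *m)
{
    while (m->issued < m->submitted) {
        model_pump(m);
        if (m->issued < m->submitted)
            model_gpu_execute(m, 1);
    }
}

/* Analogue of glFinish: return once every command is complete. */
static void model_finish(struct gl_model *m)
{
    model_flush(m);
    model_gpu_execute(m, m->issued - m->completed);
}
```

In this model, submitting 10 commands in a burst leaves 4 issued and 6 unissued. If the GPU then drains its queue and the application makes no further calls, the queue sits empty while work is still buffered, which is the situation glFlush exists to resolve.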

Implicit synchronization

Some operations implicitly force a full glFinish synchronization, while others force the user to halt until all commands up to a certain point have completed. And some force a glFlush.

Swapping the back and front buffers on the default framebuffer may cause some form of synchronization (though the actual moment of synchronization may be delayed until later GL commands), if there are still commands affecting the default framebuffer that have not yet completed. Swapping buffers only technically needs to sync to the last command that affects the default framebuffer, but it may perform a full glFinish. However, it will at least glFlush to the last command that affected that framebuffer.

Any attempt to read from a framebuffer to CPU memory (not to a buffer object) will halt until all rendering commands affecting that framebuffer have completed. Most attempts to write to a buffer object, either with glBufferSubData or mapping, will halt until all rendering commands reading from that buffer object have completed. However, if you call glBufferData with a NULL data pointer (and the same size and usage as before), this allows the implementation to allocate new storage for the buffer and simply orphan the old one (deleting it when it is no longer used). You can do something similar by using the GL_MAP_INVALIDATE_BUFFER_BIT with glMapBufferRange. For more details, see this page on buffer streaming.
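Why orphaning avoids the stall can be shown with a small model of a buffer object and its backing storage. This is a sketch under stated assumptions, not how any particular implementation manages memory: the storage and buffer structures and every function below are hypothetical, and no OpenGL calls are made. The real call being modelled by buffer_orphan is glBufferData(target, size, NULL, usage).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical backing storage for a buffer object. */
struct storage {
    size_t size;
    int pending_reads; /* GPU commands that still read this storage */
    bool orphaned;     /* detached from its buffer object */
};

struct buffer {
    struct storage *store; /* storage currently attached to the buffer */
};

static struct storage *storage_new(size_t size)
{
    struct storage *s = malloc(sizeof *s);
    if (s) {
        s->size = size;
        s->pending_reads = 0;
        s->orphaned = false;
    }
    return s;
}

/* A draw call reading the buffer pins whatever storage is attached now. */
static struct storage *buffer_draw(struct buffer *b)
{
    b->store->pending_reads++;
    return b->store;
}

/* The GPU finishes one read. Orphaned storage is freed once it is unused.
   Returns true if the storage was freed. */
static bool storage_read_done(struct storage *s)
{
    s->pending_reads--;
    if (s->orphaned && s->pending_reads == 0) {
        free(s);
        return true;
    }
    return false;
}

/* Would a synchronized write (glBufferSubData or mapping) have to wait? */
static bool buffer_write_would_block(const struct buffer *b)
{
    return b->store->pending_reads > 0;
}

/* Orphan: attach fresh storage; the old storage lives on until the GPU
   is done with it. Returns false if allocation fails. */
static bool buffer_orphan(struct buffer *b)
{
    struct storage *fresh = storage_new(b->store->size);
    struct storage *old = b->store;
    if (!fresh)
        return false;
    b->store = fresh;
    if (old->pending_reads == 0)
        free(old);
    else
        old->orphaned = true;
    return true;
}
```

After buffer_orphan, a write targets storage that no pending command reads, so nothing needs to wait, while commands already in flight keep reading the old storage undisturbed.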

If you use the GL_MAP_UNSYNCHRONIZED_BIT with glMapBufferRange, OpenGL will forgo any synchronization operations. You are on your own, however, as to the consequences of modifying any parts of that buffer that may be in use.

Similarly, attempts to change texture data from CPU memory with commands like glTexSubImage2D can block until commands that use that texture have finished. They may not block, as some implementations will just allocate some CPU memory and copy the user's pixel data into that. They will do the DMA directly to the texture some time later. Changing textures from buffer objects will not force a synchronization.

There may be a few commands that, on some implementations, cause a synchronization to some point in the command stream. OpenGL does not require these commands to do so, but implementations are free to do so if they deem it necessary. Framebuffer object binding and rendering may cause a sync to the last command that affected the previously bound framebuffer object.

Sync Objects

glFinish is a decent start on synchronization. However, it is often useful to be able to do the kind of synchronization that OpenGL itself does implicitly. That is, being able to sync to a specific point in the command stream.

You do this by creating a fence object. This is a token in the command stream that you can test to see if it has been completed. Since the stream is an ordered list, if the fence has completed, then every command issued before that fence was issued has also completed.
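The ordering guarantee behind fences can be illustrated with a toy command stream. This is a minimal sketch that makes no OpenGL calls; the stream and fence structures and the function names are hypothetical. In real code the corresponding calls are glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0) to insert the fence, glClientWaitSync to test or wait on it, and glDeleteSync to release it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ordered command stream: commands complete strictly in order. */
struct stream {
    size_t submitted; /* commands placed in the stream */
    size_t completed; /* commands the GPU has finished */
};

/* A fence remembers how many commands were submitted before it. */
struct fence {
    size_t position;
};

static void stream_submit(struct stream *s)
{
    s->submitted++;
}

/* Analogue of glFenceSync: mark the current point in the stream. */
static struct fence stream_insert_fence(const struct stream *s)
{
    struct fence f = { s->submitted };
    return f;
}

/* GPU: retire up to n commands, in submission order. */
static void stream_gpu_execute(struct stream *s, size_t n)
{
    for (; n > 0 && s->completed < s->submitted; n--)
        s->completed++;
}

/* Non-blocking test, analogous to glClientWaitSync with a zero timeout. */
static bool fence_signaled(const struct stream *s, struct fence f)
{
    return s->completed >= f.position;
}

static bool command_complete(const struct stream *s, size_t index)
{
    return index < s->completed;
}
```

Because the stream is ordered, fence_signaled returning true implies that command_complete is true for every command submitted before the fence, while commands submitted after it may still be pending.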