Sync finished callback

It would be great if we could register a callback function that gets called by a driver thread when a GLsync object becomes signaled, instead of having to poll or stall on the GLsync object until it's finished.

I'm not sure of the practicality of such a feature.

If the callback is to be invoked in the thread which called glFenceSync(), that implies that the driver performs the equivalent of glClientWaitSync() on every GL command (and if that thread isn't executing GL commands, there's no way for the driver to invoke the callback in that thread).

If the callback is invoked in some other thread, it will need a context, and that can't be a context which is current in any other thread. Also, if this is being done transparently by the driver, the thread would presumably be a plain OS thread, which won't necessarily interact well with languages or toolkits which have their own thread management (e.g. Qt doesn't like it if you call Qt functions from a thread which isn't a QThread).

I'm not sure a suggestion targeted at resolving one very specific problem, occurring in a scenario where you're trying to do something that goes against vendor advice to begin with, has much merit.

GClements : The rules for the callback can follow the same ideas as the debug callback when GL_DEBUG_OUTPUT_SYNCHRONOUS_ARB is disabled. There is no guarantee about which thread will invoke the debug callback, and no guarantee about an active GL context during the callback. I think this is fine. The idea would be for the callback to queue up some task that would be executed in the correct thread that does have a GL context (if a GL context is even needed for what needs to happen). Whatever issues there are with other languages/toolkits would also be a problem with the ARB_debug_output extension. Are there such problems?

mhagain: It is true that a particular problem brought up the need, but I see other needs as well. I'm confused about which vendor advice I'm ignoring, though. Can you enlighten me?

Another example use case: say I'm downloading video frames using the Nvidia dual copy engines (i.e. downloading frames in another thread with a second context). If I don't want my main thread to stall waiting for a download to finish, I need to poll the state of the download's sync object at arbitrary times until it says it's done, and only then can I process the frame however I need. It would be cleaner if a callback occurred when the download is done, so I can queue a task to be executed in the main thread or elsewhere.
Also, the callback mechanism removes the need for both a CPU and a GPU sync object, which is what we need right now (wait for the CPU sync object to signal that the GLsync has been created, then wait on the GPU GLsync). I just think it's cleaner regardless.

Nvidia told me this was because DeviceContexts are actually thread-affine, and GDI functions should only be called on them in the threads that created them. Quote from Nvidia "Calling GDI functions using an HDC from the non-hdc-affine thread has always been wrong (OpenGL is a GDI api and GDI objects like HDCs are thread-affine), but the failure cases are in general hard to repro."

(my emphasis)

Functionality added to OpenGL to enable you to work around something that's wrong to begin with doesn't seem like a good thing. Furthermore, this is GDI behaviour, not OpenGL behaviour, so any fix for it has its proper place in GDI.

But I'm trying to follow that advice: I'm trying to ensure that I call SwapBuffers from the thread that created the DC, the window's thread. Your remarks actually bring up another example, which to me further shows how useful this could be, since it allows GL to interact with non-GL APIs more easily.

Further example:
Say I'm using the DX_interop extension. One thread is doing 100% DX work, another is doing 100% OpenGL work. The OpenGL thread renders to a texture, and the DX thread will be consuming this texture. With this callback, the DX thread can be alerted that the texture is ready to be used without the OpenGL thread polling the GLsync object at all. The GL thread can queue up its commands, create the sync, and it doesn't need to do anything else.

What are your thoughts about that case and my dual copy engine example? We should try to keep this thread on topic: the general usability of this feature.

Example 1. Right now, to synchronize two threads using two GL contexts, we need both an OS CPU event and a GLsync object. Both threads need to use both objects.
Thread1:
Create GLsync
Signal OSEvent

Thread2:
Wait On OSEvent
Wait On GLSync

This is needed to ensure Thread2 doesn't start waiting on a GLsync that doesn't exist yet.
With a callback mechanism this becomes simpler:
Thread1:
Create GLsync -> the callback will signal the OSEvent when the GLsync becomes signaled.

Thread2:
Wait on OSEvent

Example 2:

One OpenGL thread that is creating data, and one thread that knows nothing about OpenGL consuming the data

Thread 1:
Initiate download of data into a PBO created with STREAM_READ hint.
Map the PBO using the UNSYNCHRONIZED_BIT option with READ_ONLY
Pass this pointer off to thread 2
Create GLsync -> the callback will signal an OSEvent

Thread 2:
Wait on OSEvent
Consume Pointer

The point here is that it allows GL to produce data to be consumed by a thread that knows nothing about GL. This also makes the producer and consumer entirely modular. Neither needs to know what API the other is using to produce or consume data, since the only synchronization primitive between them is an OS event. You can drop in/out different producers and consumers that use different APIs.
Yes, this is doable with the current sync API by the producer polling the GLsync object. But the callback can potentially give us the lowest latency possible, since the consumer can begin consuming the data immediately.