Details

This is an early WIP of the GLX backend. I go through that with a chainsaw
at the moment and hope for some feedback by people knowing all the history
about it and stop me before accidentally cutting off the branch we are
sitting on. ;)

Overall goals:

Make the code less convoluted.

Sync paint with vblanks if extensions required for that are available

Reduce delay by pushing paint to end of vblank cycle, see [1].

Currently doing:

Remove triple buffering. A compositor does not need this when it doesn't screw up in some other way.

Remove vsync option. Always sync with vblanks.

Replace GLX_MESA_, GLX_EXT_. and GLX_SGI_swap_control with GLX_OML_sync_control only, since this is the standard nowadays and we don't need to optimize for every legacy X device in the world.

Remove blocksForRetrace, instead use the Intel swap event or glFinish to sync with vblank. glFinish should block till vblank, see [2][3]. (doesn't seem to be true per se, maybe driver dependent)

In endRenderingFrame don't present a second time. Why was it there in the first place? -> Thanks to @fredrik's comments this is now clear: in beginRenderingFrame it's called if we don't block on retrace and if do block in endRenderingFrame and according to comment because damage gets reset in the later case it would be noop in beginRenderingFrame. Needs to get replaced though to make the code less complex.

Added 19-08-13:

When swap events are available don't use the fallback timer.

Delay timer reduces latency between paint and vsync

Added 19-08-20:

Move present to endRenderingFrame to have less than one frame of latency.

Added 19-08-21:

For now perform compositing always from event loop. Without it Wayland session crashes. Probably otherwise hidden issue in Effects.

Cleanup: Remove more plumbing for vsync and retrace blocking and swap-profiler

Open questions:

Is glFinish really blocking till vblank? X Present extension provides events to get notified about vblank, but since we channel through GLX this is not available directly to my knowledge. So it's kind of weird that there is only the Intel swap event to wrap that behavior explicitly.

Compositor has a compositing timer. It seems we always wait for it, but if we sync with vblank there is no reason to do it this way. Instead only use is as a fallback in case sync with vblank is not supported?

I could test how it works on amdgpu (RX560), if you need help with that, when it's ready to test

Thank you, I have an AMD setup here as well. But of course more testing on other setups is always better! If you have a git checkout of KWin you can test now already (with arc do arc patch D23105 to easily apply the diff to your local checkout). It should work just fine.

I've briefly tested it on my NVidia GTX 550Ti with nvidia driver. Nothing appears to be horribly broken. Anything in particular I should look out for?

That's awesome! Thank you very much. There is nothing particular to look out for. Just the standard stuff: no tearing, no lag. I have sometimes a freeze directly after startup on an Intel-device on X11 (Wayland session works fine though). So needs some more work for sure, but we'll get there.

NVIDIA doesn't support the OML extensions. They can't be implemented efficiently on their hardware IIRC.
[...]

That's good to know. Thanks! I believe we can let in some of these extensions again without increasing the complexity too much as long as the SGI ones is ignored and we have no manual control of vsync. The complexity in the old code came mostly from that.

But with KWIN_USE_INTEL_SWAP_EVENT=0 or when unset, I see serious "lags" when dragging the window, it becomes very hard to drag window on screen, looks like some mouse movements are dropped(?) Resizing for example is still fine:

NVIDIA doesn't support the OML extensions. They can't be implemented efficiently on their hardware IIRC.
[...]

That's good to know. Thanks! I believe we can let in some of these extensions again without increasing the complexity too much as long as the SGI ones is ignored and we have no manual control of vsync. The complexity in the old code came mostly from that.

That would leave the user without a way to override the driver's default vsync setting.

Ok and in other case the back buffer also has what's currently on the front buffer and we can paint it partly over and then swap?

No, but the buffer age extension makes it possible to query the number of frames that have elapsed since the current back buffer was the front buffer. Based on that we can calculate how much we need to repaint to bring it up-to-date.

In practice we don't call sleep of course; instead we start the composite timer with the timeout set to just before the next vblank, and return to the event loop.

But the only driver where SwapBuffers() behaves like this is the NVIDIA driver, and its not the default behavior. It's enabled by setting __GL_MaxFramesAllowed to 1 before initializing OpenGL.

The reason the next frame is rendered immediately after the buffer swap instead of doing it as late as possible is to avoid missing the vblank when there is a sudden increase in render time. Also if we miss the vblank by one millisecond, the main thread will be blocked in SwapBuffers for 15 milliseconds, which is less than ideal.

But there's obviously a trade-off here between smoothness and latency.

NVIDIA doesn't support the OML extensions. They can't be implemented efficiently on their hardware IIRC.
[...]

That's good to know. Thanks! I believe we can let in some of these extensions again without increasing the complexity too much as long as the SGI ones is ignored and we have no manual control of vsync. The complexity in the old code came mostly from that.

That would leave the user without a way to override the driver's default vsync setting.

I want KWin to run v-synced always. I don't think it makes sense to make this configurable by the user. It's the default in our Wayland session and on X11 the compositor should always run v-synced as well. Or is there a valid use case for running it async in your opinion?

Thank you very much for the comprehensive explanation. I see the logic behind it now when the SwapBuffers call is assumed to be blocking. When it's not it seems just unnecessary complicated to me. Just do the SwapBuffers directly after paint() for the next frame at some point before the upcoming vblank. Then on buffer swap event or vblank interval timer timeout repeat for the next frame.

Also if we miss the vblank by one millisecond, the main thread will be blocked in SwapBuffers for 15 milliseconds, which is less than ideal.

But only if the driver blocks on SwapBuffers and as you said no current driver behaves like this per default, right? Otherwise if we miss we "just" have a delay of one frame for the next paint result on screen (what we have with the current code on master all the time).

But with KWIN_USE_INTEL_SWAP_EVENT=0 or when unset, I see serious "lags" when dragging the window, it becomes very hard to drag window on screen, looks like some mouse movements are dropped(?) Resizing for example is still fine:

Thanks for testing! Yea, I currently trying to move the present call to endRenderingFrame and I also notice lags and sometimes freezes on opening a window while using the timer. There are no problems with the swap event.

This just means on damage we paint always with one frame later this damage. We could improve this by checking if the Compositor was idle for longer than one frame when compositing or not and only have the timer run when below this time.

I tested it now with Nvidia proprietary driver (v430, Neon unstable, GTX 950) and as expected the fallback timer is used since OML sync control and intel swap event are not available. This means there is no synchronization / latency minimization through delay timer in relation to the vblanks. There is of course the fallback timer providing a gap of ~16ms and the result looks still ok for me. Did not notice any tearing or freezes.

@alexeymin Can you test again with the update? I just accidentally had the fallback timer disabled before. Now works fine for me without swap event.

Now it works fine without KWIN_USE_INTEL_SWAP_EVENT or with KWIN_USE_INTEL_SWAP_EVENT=0.

But because test plan still mentions testing with KWIN_USE_INTEL_SWAP_EVENT=1, I tested it too. Window movement is somewhat intermittent, it moves like with some small hops. I guess I should not use intel swap event on amdgpu, right? 😉

NVIDIA doesn't support the OML extensions. They can't be implemented efficiently on their hardware IIRC.
[...]

That's good to know. Thanks! I believe we can let in some of these extensions again without increasing the complexity too much as long as the SGI ones is ignored and we have no manual control of vsync. The complexity in the old code came mostly from that.

That would leave the user without a way to override the driver's default vsync setting.

I want KWin to run v-synced always. I don't think it makes sense to make this configurable by the user. It's the default in our Wayland session and on X11 the compositor should always run v-synced as well. Or is there a valid use case for running it async in your opinion?

That's not my point though. My point is that if you don't set the swap interval, you leave it up to the driver, and not all drivers enable vsync by default.

@alexeymin Can you test again with the update? I just accidentally had the fallback timer disabled before. Now works fine for me without swap event.

Now it works fine without KWIN_USE_INTEL_SWAP_EVENT or with KWIN_USE_INTEL_SWAP_EVENT=0.

But because test plan still mentions testing with KWIN_USE_INTEL_SWAP_EVENT=1, I tested it too. Window movement is somewhat intermittent, it moves like with some small hops. I guess I should not use intel swap event on amdgpu, right? 😉

It should work with all Mesa drivers. The extension is just called GLX_INTEL_swap_event because Intel developed it.

That's not my point though. My point is that if you don't set the swap interval, you leave it up to the driver, and not all drivers enable vsync by default.

Ok, I understood what you said differently. But good to know, then I'll not remove the call to the extensions. Although I will thin it out: GLX_OML_sync_control for mesa and GLX_EXT_swap_control for Nvidia should be enough. Also it will be a simple call to activate vsync and no interface to switch it off again.

I will upload smaller patches now with changes factored out from this playground. First one will be removal of triple buffering functionality.

Regarding the fallback to the composite timer when using the proprietary NVIDIA driver, while it doesn't really support anything equivalent to GLX_INTEL_swap_event, there is https://www.khronos.org/registry/OpenGL/extensions/NV/GLX_NV_delay_before_swap.txt which has the similar goal of reduced input latency. See the extension description for details, but essentially it allows the application to block until a specified time before the next swap. I'm not sure if that's exactly fits what KWin needs here, but maybe worth considering?.

Also, I'd be able to test this patch on a PRIME (hybrid graphics) system if nobody has done so yet. Given that there have been synchronization-related issues there before, it's probably worth checking.

However, the patch doesn't seem to apply cleanly to HEAD any longer. Would you be able to post a rebased version?