Issue description

There is a very easy way to achieve nearly perfect VSYNC synchronization in Chrome under Windows, that:
- wakes up animation code within around 60 to 120 microseconds of true vsync
- works on Vista, Win7, Win8, etc
- does not spin wait, so is very CPU efficient
- does not use the Windows Desktop Window Manager (DWM)
- ...so it works even with DWM/Aero turned off
- works even with the OS default 15.625ms timer (in)accuracy
- not affected by notebooks running on battery power
Chrome attempts to wake up requestAnimationFrame() animations at vsync intervals, but the inter-frame timings for these animations (under Windows) still have significant jitter (around 1ms to 4ms) -- caused primarily by the inaccuracy of Windows WaitForXXX timing functions (that Chrome uses to wait for the vsync time). This can be seen at www.vsynctester.com, especially with notebooks, and even more so for notebooks not on AC (on battery). At 120Hz that jitter represents an unacceptably large portion of the 8.3ms frame budget.
Windows' Sleep/WaitForXXX functions are accurate only to within plus or minus an entire timer quantum, which on many Windows computers is 15.625ms. Chrome uses timeBeginPeriod() to attempt to reduce that to 1ms while on AC power, and 4ms while on battery power (see time_win.cc), which increases power usage.
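To illustrate why the timer quantum matters, here is a rough Python sketch. It assumes a deliberately simplified model in which a timed wait can only end on the next timer tick; real Windows scheduling is more complicated, and the function names and the 16667 microsecond frame period are assumptions made for the illustration only.

```python
def wake_lateness_us(target_us, quantum_us):
    """Lateness if a timed wait can only end on the next timer tick (simplified model)."""
    ticks = -(-target_us // quantum_us)  # ceiling division on integers
    return ticks * quantum_us - target_us


def worst_lateness_us(period_us, quantum_us, frames=600):
    """Worst lateness over `frames` consecutive vsync targets spaced `period_us` apart."""
    return max(wake_lateness_us(n * period_us, quantum_us)
               for n in range(1, frames + 1))


# Compare the default quantum with the 1 ms / 4 ms values Chrome requests.
for quantum_us in (15625, 4000, 1000):
    print(quantum_us, worst_lateness_us(16667, quantum_us))
```

Under this model the worst-case lateness approaches one full quantum, which is consistent with jitter in the 1ms to 4ms range once the timer resolution has been raised.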
THE SOLUTION: The solution is to use the same interface that Windows' own Desktop Window Manager already uses to implement an OS-level compositing manager -- the DirectX graphics kernel subsystem. So, use D3DKMTWaitForVerticalBlankEvent():
https://msdn.microsoft.com/en-us/library/windows/hardware/ff547265(v=vs.85).aspx
A successful proof of concept prototype used D3DKMTOpenAdapterFromHdc() to obtain handles from the application's hDC that were then passed into D3DKMTWaitForVerticalBlankEvent(), to create a vsync synchronization loop, that woke up an animation loop every vsync.
Hopefully this can be added into Chrome in a very timely manner...

Thanks a lot for continuing to research this jerry. :)
IYO, would this solve the multi-window/tooltip desync problem that affects the current approach? That issue is still a problem for games and apps that are mouse-controlled, as a moving cursor tends to 'bleed over' and raise tooltips from tabs, address bar, etc, causing stutters when vsync is handed back and forth from system level to Chrome.
I really hope this works out; it sounds like a much better approach.

briander/2: RE findings, yes, correct. Additionally:
* IDXGIOutput::WaitForVBlank: I also saw comments where people complained that
using it blocks all other DXGI calls until it returns (all unconfirmed).
* DwmFlush: I just found a situation where it does not return for over
3ms past vsync, which makes it no longer usable for the purpose of
finding 'near vsync'.
* D3DKMTWaitForVerticalBlankEvent: I personally tested on Win7 and Win81. The docs
say it has been there since Vista, which I have no reason to doubt, since Vista
is when DWM came to exist (and we know that DWM must vsync align efficiently).
sunn/3: I probably misunderstood your comment (about needing DWM to be disabled), but the beauty of D3DKMTWaitForVerticalBlankEvent is that it works *with* or *without* DWM, as D3DKMT is in the Windows kernel and sits even *below* DWM.
bajones/4: When you (or anyone @chromium.org) has serious cycles to work on this, just email me and I will turn over my prototype to jump start development. Using D3DKMTWaitForVerticalBlankEvent is crazy easy. The tough part is going to be how *best* to add this into Chrome.
TiAmIsT/5: I guess I am not 'close' enough to that to even know.

1. I just confirmed it works under Vista.
2. Attached you will find inter-frame times from my prototype running on a Win81 box (specs at http://www.vsynctester.com/manual.html#testsetup)
In the prototype, there is an animation loop that calculates the next vsync wake up time and uses WaitForXXX on an event object and the appropriate timeout. As soon as it returns, it obtains a time (the frame time).
There is then a VSYNC synchronizer thread that loops calling D3DKMTWaitForVerticalBlankEvent and triggers the event object, which wakes up the animation loop.
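The two-thread structure described above can be sketched roughly as follows. This is a minimal, portable Python illustration and not the prototype's code: `fake_wait_for_vblank` is a hypothetical stand-in for the blocking D3DKMTWaitForVerticalBlankEvent call (here it just sleeps), and the 5 ms period, the timeout choice, and all names are assumptions made for the sketch.

```python
import threading
import time

PERIOD_S = 0.005  # assumed fake refresh period; a real 60Hz display is ~16.6 ms


def fake_wait_for_vblank():
    """Hypothetical stand-in for the blocking vblank wait (it just sleeps)."""
    time.sleep(PERIOD_S)


def vsync_thread(event, stop):
    """Synchronizer thread: block until 'vblank', then signal the waiter."""
    while not stop.is_set():
        fake_wait_for_vblank()
        event.set()


def animation_loop(event, frames):
    """Wait on the event with a timeout; take the frame time right after waking."""
    frame_times = []
    signaled_count = 0
    for _ in range(frames):
        # The timeout only matters if the vsync signal never arrives.
        signaled = event.wait(timeout=4 * PERIOD_S)
        event.clear()
        frame_times.append(time.perf_counter())  # the "frame time"
        signaled_count += int(signaled)
    return frame_times, signaled_count


def run_demo(frames=10):
    """Start the synchronizer thread, run the animation loop, then shut down."""
    event, stop = threading.Event(), threading.Event()
    worker = threading.Thread(target=vsync_thread, args=(event, stop), daemon=True)
    worker.start()
    try:
        return animation_loop(event, frames)
    finally:
        stop.set()
        worker.join()
```

Calling `run_demo()` returns the recorded frame times and how many wake-ups came from the signal rather than the timeout. A signal that arrives between `wait` and `clear` is lost in this sketch, which a production implementation would need to handle.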
win81-no-load is my prototype app running alone. The Windows default timer accuracy is 15.625 ms.
win81-load is my prototype app running, but with Canary also up and running the www.vsynctester.com animation (where Canary is using timeBeginPeriod(1)).
The known precise inter-frame time for the Win81 box is 16.637ms.
not too bad?

Hi Jerry,
Just wanted to mention that you should be able to upload to Rietveld (our
code review tool) any patches you have for Chrome code. It would be great
if you could do this, even if these patches are 100% experimental and full
of hacks because it allows us to better see your ideas in code, replicate
your results and save round trips when we have questions. We can then work
on a real implementation if they prove to be useful.
If you have stand alone code, it would also be awesome if you could upload
the source to GitHub or similar code repository. This would importantly
allow us to understand if we can map your solution into the Chromium
codebase or if we need a very different approach.
Thank you very much for your work here, it is very much appreciated. Our
aim is to make Chrome the *best* browser out there but we have a long way
to go before this is the case here.

https://gist.github.com/anonymous/4397e4909c524c939bee shows example code using D3DKMTWaitForVerticalBlankEvent.
Chromium needs to figure out how to 'best' unblock the task (cause it to run immediately) that ultimately runs requestAnimationFrame(), based upon an (additional/optional) 'external' signal, like from 'vsyncSignalAllWaiters()' in the example code above, instead of solely based upon a time estimate (which has micro-jitter) of the next vsync time.

Thanks to TiAmIsTiAm for testing the prototype (and forcing me to investigate, and I think solve, multi-monitor issues).
What I found is that DWM is crazy buggy (on Win7) -- that DwmGetCompositionTimingInfo() can return a qpcRefreshPeriod that looks like "16.xxx, 16.xxx, 26.xxx" in a repeating pattern, or that DWM can sometimes return 16.667 (the 1/60 constant, which seems more like a fallback value when there is some internal DWM error), when that is not the actual period.
The great news is that using D3DKMTWaitForVerticalBlankEvent works around all of the DWM bugs, because DWM is thrown out as a source for vsync information.
What does DWM do when there are multiple monitors running at different vsync frequencies? The Chromium source code suggests that DWM uses the frequency of the primary monitor. Does anyone know why DWM would not just drive each adapter at its native frequency? The prototype currently mimics the presumed 'sync to primary' behavior.
But with DWM off, the prototype syncs to the adapter that the prototype window is (mostly) running on.
When there are no tests running, the prototype will display in its main window the monitor and ms/Hz that it thinks it should be (and is) vsync'ing to. So as DWM goes up/down, as the primary adapter changes, or as the window is dragged around (with dwm off), you can actively see it change.
Anyone willing to play around with the prototype can:
http://www.duckware.com/test/chrome/dwm-vsync-tests.exe
(source code has been sent to briander..., bajones, sunn... and mit...)

I'm having a pretty terrible time finding any actual Microsoft-written documentation of DWM behavior in multi-monitor scenarios. My documentation in the code was based on personal testing. I've got a 60hz monitor and a 120hz monitor connected to my Windows machine at work, and a 60hz monitor and 75hz monitor (an Oculus Rift) connected to my machine at home. They both exhibit behavior of limiting refresh rates on the higher refresh monitor unless that monitor is set as the primary device.
It's encouraging to hear that D3DKMTWaitForVerticalBlankEvent seems more reliable than the DWM provided values.

Under Windows, Canary 43.0.2353.0 (r323184) (April 1) dramatically improved VSYNC accuracy from 1ms to nearly 100% spot on (only on AC; not on battery). When I asked around, nobody responded, so today I took the time to track when the change took place -- it was introduced between r323177 and r323182:
https://chromium.googlesource.com/chromium/src/+log/0bd2a738f107ad7021c70bfd36ae41e4565fe946..cdb7395d70f3f04fe91c75b39e67cca7abc8251f
What stands out is "Truncate the timeout of WaitableEvent::TimedWait on Windows" (waitable_event_win.cc):
https://codereview.chromium.org/1040833002
The side effect on VSYNC accuracy (when the PC is on AC power) is very positive -- but at the expense that there must now be "spin waiting" going on somewhere. On average, there will now be a spin wait of 0.5 ms sixty times/sec (during an animation) -- meaning 30ms of spin wait every 1 second (3% increased overhead for a single core).
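The overhead estimate above is simple arithmetic and can be checked with a short sketch. The 0.5 ms average spin per frame is the estimate given in this comment, not a measurement, and the function name is made up for the illustration.

```python
def spin_overhead(avg_spin_ms, frames_per_second):
    """Return (ms spent spinning per second, fraction of one core)."""
    spin_ms_per_second = avg_spin_ms * frames_per_second
    return spin_ms_per_second, spin_ms_per_second / 1000.0


ms_per_second, core_fraction = spin_overhead(0.5, 60)
print(ms_per_second, core_fraction)  # 30.0 0.03
```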
Was this change 'known' -- an intentional change to improve VSYNC accuracy -- or a side effect?

The patch doesn't have a bug associated with it, so it is unclear why it was done. I believe it is unlikely to be an intentional change related to vsync. It is more likely related to other latency, power saving or general clean up changes.
I don't see any extra spin wait occurring here, the patch just changes the code so we wake up early rather than late. The Chrome rendering system shouldn't be dependent on *running* code at exactly a given time, just getting data to the video card *by* a given time. There is no reason to delay/wait that extra 0.5 ms when the wakeup happens early.

mit/15: This is Windows only. See the attached performance charts for vsynctester.com running against r323177 and r323182. The '1ms' precision (inaccuracy) of the OS has been eliminated.
The jitter seen in r323182.jpg is mostly due to the jitter in the metrics coming out of DwmGetCompositionTimingInfo -- because sometimes Canary starts in perfect 16.666 mode, ignoring DWM timing info, and then the line is virtually perfectly flat, as it is in the attached 60Hz.
I say 'spin waiting' because that logic is ALL over the place in Chrome in the form of 'if a deadline has not passed, continue to issue the Sleep/WaitFor/etc, until the deadline has passed'. (on return from waits, the deadline is checked *again* against current time). On an early return from these wait functions, that only results in a slew of 0.xxx ms being passed into WaitForSingleObject(), which is (now) truncated to 0ms which "If dwMilliseconds is zero, the function does not enter a wait state if the object is not signaled; it always returns immediately."
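The effect described above can be sketched with a simulated clock. This is a hedged Python model, not Chrome's code: it assumes an idealized wait that sleeps exactly the requested number of whole milliseconds, and a made-up cost of 10 microseconds for each zero-timeout call that returns immediately.

```python
def wait_until(deadline_us, now_us, round_up, poll_cost_us=10):
    """Simulate a 'keep waiting until the deadline has passed' loop.

    Returns (wake_time_us, zero_timeout_calls).
    """
    zero_timeout_calls = 0
    while now_us < deadline_us:
        remaining_us = deadline_us - now_us
        if round_up:
            timeout_ms = -(-remaining_us // 1000)  # round up to whole ms
        else:
            timeout_ms = remaining_us // 1000      # truncate to whole ms
        if timeout_ms == 0:
            # A 0 ms wait returns immediately, so the loop busy-polls.
            zero_timeout_calls += 1
            now_us += poll_cost_us
        else:
            now_us += timeout_ms * 1000
    return now_us, zero_timeout_calls


print(wait_until(16600, 0, round_up=False))  # (16600, 60)
print(wait_until(16600, 0, round_up=True))   # (17000, 0)
```

In this model truncation hits the deadline exactly at the cost of many immediate-return calls, while rounding up never polls but wakes late.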
There is nothing wrong per se with this new spin wait behavior. It is actually a very effective and cheap (logic wise) way to achieve nearly perfect VSYNC. The only issues are (1) was it intended, and (2) it does not work well 'on battery' where timer precision is often greater than 1ms (so the line is no longer flat even with spin waiting), and it reduces battery life, which maybe is not an issue given the low priority given to battery issues (no progress on issue 439751, reporting that issue 153139 is broken).

Regarding the spinning mentioned in comment #16, that should now be avoided starting with crrev.com/2086123002.
I would love to see us syncing to the actual vblank. I believe that IE/Edge treat setInterval/setTimeout values of near 16-17 ms as if they are requesting vblank synchronization, and I think it would be appropriate if we did the same.

I am going to experiment with D3DKMTWaitForVerticalBlankEvent.
The idea for now is to invoke this on GPU side, on a dedicated thread, on demand from PassThroughImageTransportSurface::StartSwapBuffers and see what kind of timing I get from this.

stanisc/29: Thanks!
And since D3DKMTWaitForVerticalBlankEvent is Windows only, I have a question I hope someone else out in the community can answer. For a long time, Chrome (Windows only) would not vsync properly until the Chrome app was resized (recently fixed; see issue 465356 and issue 632785 ). But the curious thing is that Chrome (before the app resize) had one frame of input lag (but no vsync). After resizing the Chrome window, Chrome had two frames of input lag (but vsync worked).
--> What changed in Chrome that caused the extra frame of input lag? And more importantly, is it possible to have both: (1) keep vsync, and (2) revert back to one frame of input lag?
I bring this up now, only because I wonder if this is caused by vsync timing and when Chrome sends frames to the OS, and how the OS (Windows) then composites and sends those frames to the screen. If Chrome swaps buffers based upon vsync, is that not 'too late' (under Windows only) since the Windows OS is compositing frames? If not, then great. But if it is too late, should Chrome 'swap buffers' be based upon a deadline that is maybe some split millisecond (0.5ms or something similar) *before* the next anticipated vsync event -- in hopes of getting the current Chrome buffer in the next Windows OS composited frame?

Chrome tries to use vsync to trigger when we start to generate a frame. If the entire process completes in less than 16 ms then the frame will be ready and should be presented on the *next* vsync. There is a lot of pipelining in the process (GPUs in particular are highly pipelined and get additional throughput when they buffer one or more frames), and DWM (Desktop Window Manager) can also add some latency.
Even video games (my previous career) usually have a few frames of latency from input to photons. VR apps work particularly hard to reduce latency because the problem of latency is much more severe in that context. See this article for thoughts on that:
http://oculusrift-blog.com/john-carmacks-message-of-latency/682/
I'm not sure what Chrome's input-to-photon latency is. I would like to measure it. There are tradeoffs (increased power, reduced scene complexity, increased code complexity) for pushing latency to extremely low levels so I don't think Chrome will try to minimize input lag as aggressively as VR apps do, but keeping it "as low as reasonable" is a worthy goal.

brucedaw, the surprising find is that Chrome presenting a rendered frame "on the *next* vsync", with Aero ON under Windows -- actually itself adds one frame of input lag.
When vsync happens, DWM has already swapped buffers at an OS level (starts the NEXT frame). Then Chrome acts on the vsync signal, and it is too late for Chrome's 'present' to make it to the CURRENT frame (it is now the NEXT frame, which does not make it to the screen until one frame later).
With Aero ON, presenting frames on vsync is the wrong present location.

stanisc, after having several offline discussions, I am now convinced that Chrome can both (1) use D3DKMTWaitForVerticalBlankEvent now and (2) later solve issue 658601 (there are several possible strategies).
If you have something you want tested regarding D3DKMTWaitForVerticalBlankEvent, let me know...

As a FYI, the attached graphs show why DwmFlush() is not suitable as a method to synchronize to VSYNC.
DwmFlush() 'wakes up' late even on a system with no load, and when the system is under load (running FishIE Tank in IE) , DwmFlush() wakes up really late.
[It is interesting to note that under the same load, Chrome today -- using timers -- performs *better* than DwmFlush]
Tested on a Dell Inspiron 15R notebook (Intel i7-4500U 1.80GHz with 2 cores (4 threads), Intel HD Graphics 4400, 12GB, Windows 8.1) running the "High Performance" power plan.

So far I've been able to confirm a few things with my prototype:
1) No problem using D3DKMTWaitForVerticalBlankEvent from a dedicated background thread.
2) The latency of IPC calls from GPU process to Browser process seems reasonable on my dev workstation - 98.3% of calls are delivered within 1 ms so this has a good chance of being better than the timer based solution latency.
3) I don't want background thread to just generate VSync signals continuously and send them over IPC to the browser process - that would result in at least 60 idle wakeups on GPU process side and the same amount on Browser process side. So there should be a scheduling mechanism for waiting for VSync on GPU process side.
4) I tried a naive implementation where VSync waiting is triggered by each frame swap very similarly to how VSyncProvider is pinged to refresh VSync parameters in the current implementation. However that didn't work well and I realized I need a new IPC call from Browser to GPU to enable/disable VSync production.
5) I think we could and should use the existing GpuCommandBufferMsg_UpdateVSyncParameters IPC to deliver each VSync signal to the compositor code.
6) On the browser side this could be implemented as a new type of BeginFrameSource which generates a new BeginFrame signal every time it gets delivered a VSync IPC from GPU side.
The same BeginFrameSource class would be responsible for making an IPC back to GPU side to enable/disable VSync production there.
I am figuring out how to integrate this with the existing Compositor architecture and whether some refactoring would be needed to accommodate this.
7) There are some parts of code that still have to know VSync parameters (recent VSync basetime and interval) that are regularly updated by the current implementation (see UpdateVSyncParameters). The basetime for each VSync signal can be generated on GPU side right after D3DKMTWaitForVerticalBlankEvent returns from wait. The interval will probably have to be calculated on GPU side from the last few timestamps.
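As a rough illustration of point 7, the interval could be estimated from the last few wake-up timestamps. The Python sketch below is only one possible approach: the class name, the history length of 8, and the use of a median (to dampen a single late wake-up) are assumptions, not something taken from the prototype.

```python
from collections import deque
from statistics import median


class VSyncIntervalEstimator:
    """Estimate the vsync interval from recent wake-up timestamps (microseconds)."""

    def __init__(self, history=8):
        self._timestamps = deque(maxlen=history)

    def on_vsync(self, timestamp_us):
        """Record the time taken right after the vblank wait returns."""
        self._timestamps.append(timestamp_us)

    def interval_us(self):
        """Median of successive differences, or None with fewer than 2 samples."""
        ts = list(self._timestamps)
        if len(ts) < 2:
            return None
        return median(b - a for a, b in zip(ts, ts[1:]))


estimator = VSyncIntervalEstimator()
for t in (0, 16667, 33334, 50301, 66668):  # one wake-up is 300 us late
    estimator.on_vsync(t)
print(estimator.interval_us())
```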

stanisc/37: Great news. Thanks for your time for working on this.
Some tips on using vsynctester.com to help in comparing before/after implementations...
(1) There is a new "Use rAF time arg as frame time". Under Chrome, the time argument to the rAF callback is the vsync time from Windows/DWM. And because this is so tightly grouped (microseconds), it is hard to see variation, so there is now the ability to set the graph scale. See attached for an example of 50 microsecond inter-frame jitter in Chrome today on a notebook computer. You should also be able to replicate a similar tight grouping when using D3DKMTWaitForVerticalBlankEvent(), even under high system load (all cores/threads maxed out). [tip: if not, review priority of background thread]. In my testing in a native Win32 app, timings from D3DKMTWaitForVerticalBlankEvent() wake up times mimic the tight grouping of times coming from DWM even under load (but the grouping range is computer specific).
(2) In Chrome, the 'late' line at vsynctester.com effectively graphs the time from vsync wakeup until rAF callback. Chrome a while back (due to a bug) spin-waited for the vsync time and was spot on. When I went back and tested against that version (r389148), I see a 200 microsecond delay for 'late' (attached) on one system, and a 100 microsecond delay on a second system.
(3) any 'power plan' in effect can greatly affect timings. I tested under 'High Performance' for best results.

I've got the prototype working end-to-end on my Windows 10 workstation and on a test Windows 10 laptop. The results look promising so far - see snapshots from vsynctester.com made with the prototype vs. Stable build of Chrome. These screenshots were captured on a workstation with a large number of cores.
I'll share more details in the next couple of days.

Here is the end-to-end prototype which gets VSync timing from D3DKMTWaitForVerticalBlankEvent running on GPU process on a separate thread.
This is still work in progress but should give an idea of what I am trying to achieve. For now this builds on Windows only.
https://codereview.chromium.org/2555173003/

Sadly the GPU VSync solution with D3DKMTWaitForVerticalBlankEvent waiting for VBlank doesn't seem to work well on my Windows 7 workstation with NVidia GPU.
D3DKMTWaitForVerticalBlankEvent itself works fine and returns from wait every 16.6 ms as expected. But all other graphics calls seem to freeze / run very slowly resulting in a super slow refresh rate - about 2-3 frames per second.
Looking in a chrome trace profile it seems other GPU related tasks align with D3DKMTWaitForVerticalBlankEvent finishing the wait. Also I've tried replacing D3DKMTWaitForVerticalBlankEvent call with a simple Sleep(16) and that resolves the freezing.
I don't know if this is specific to all Windows 7 clients or just the ones running NVidia driver. I might try to update my graphics driver to see if that resolves the issue. But this is very concerning. Too bad I didn't test this on a Windows 7 machine earlier.
The same code works very nicely on another of my Windows 10 workstations with NVidia GPU and on a Windows 10 laptop with Intel GPU.

The reason for locking described in c#46 is dxgkrnl.sys!DXGFASTMUTEX::Acquire call. Both GPU main thread and VSync thread run into this but the main thread is blocked way more.
Possible solutions for Windows 7 that come to my mind are:
1) Use DwmFlush which is reported in c#36 to have a higher latency
2) Use some sort of combined timed wait and D3DKMTWaitForVerticalBlankEvent so that D3DKMTWaitForVerticalBlankEvent waits only for the final 0-1 ms. This might work but would be less reliable. And this solution won't allow us to stop increasing the system timer frequency which is one of the main goals of using GPU VSync signal instead of the timer based one.
3) Don't use GPU VSync on Windows7 and just stick to the current timer based implementation.

As a Win 7 experiment I replaced D3DKMTWaitForVerticalBlankEvent with DwmFlush in my code - the rest of the code is pretty much the same. It works fine on Windows 7 and the vsynctester.com chart looks much the same as with D3DKMTWaitForVerticalBlankEvent on Win 10. But I tested this on a pretty beefy machine so I might not be seeing the latency issues mentioned in c#36.
I am now considering another small change to get actual frame timing right after finishing the wait using DwmGetCompositionTimingInfo. This would be similar to the approach taken by Mozilla developers (https://bugzilla.mozilla.org/show_bug.cgi?id=1127151). This is already implemented in VSyncProviderWin so basically this code just needs to call VSyncProviderWin after finishing the wait using D3DKMTWaitForVerticalBlankEvent on Win 8+ and DwmFlush on Win 7 and it should get back an accurate vsync timestamp and vsync interval.

Another great test for web browsers (Chrome) that pass the actual vsync time as the time argument to the rAF callback....
Visit vsynctester.com, check "Use rAF time arg as frame time", wait 20 seconds, check "locked", then uncheck "Use rAF time arg as frame time" -- and then the blue line effectively shows the delay from *true* vsync until the rAF callback -- so a great way to compare how well/fast timers / DwmFlush() / D3DKMTWaitForVerticalBlankEvent are working (or not). Especially when you put the system under a load that maxes out all cores.

So this assumes rAF callback is called with the actual vsync timestamp, right?
I don't know if that is the case but I'll look into that.
I've been adding my own tracing events that measure the latency from the vsync time reported by D3D (DwmGetCompositionTimingInfo) to the moment vsync gets handled on GPU and Compositor. This should help me to compare the latency between different implementations.

Yes. As of https://codereview.chromium.org/787763006 (Chrome 45.0.2415.0 and later), the rAF callback time has been the vsync time. FF tries, but gets it wrong (they intentionally fake the time argument). So being able to graph and see that difference is a great tool.
tip: the vsync times you get from DwmGetCompositionTimingInfo can be anywhere from 2 frames behind to two frames ahead of real time. The easy way to deal with this is to simply conform the current time to the last vsync time (using the Dwm numbers), similar to the formula seen in section 4 of http://www.vsynctester.com/firefoxisbroken.html
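One way to read the tip above is as a phase-snapping calculation. The Python sketch below is an interpretation, not the exact formula from the linked page. It assumes the DWM vblank time and refresh period (for example from the qpcVBlank and qpcRefreshPeriod fields) have already been converted to microseconds, and the function name is hypothetical.

```python
def last_vsync_at_or_before(now_us, dwm_vblank_us, period_us):
    """Conform `now_us` to the most recent vsync on the DWM timebase.

    Works whether the DWM-reported vblank is behind or ahead of `now_us`,
    because integer floor division rounds toward negative infinity.
    """
    if period_us <= 0:
        raise ValueError("period_us must be positive")
    whole_periods = (now_us - dwm_vblank_us) // period_us
    return dwm_vblank_us + whole_periods * period_us


# DWM value two frames behind the current time:
print(last_vsync_at_or_before(1_040_000, 1_000_000, 16667))  # 1033334
# DWM value ahead of the current time:
print(last_vsync_at_or_before(970_000, 1_000_000, 16667))    # 966666
```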

Looking at call stacks in ETW profile it appears on Windows 7 D3DKMTWaitForVerticalBlankEvent is essentially the same as IDXGIOutput::WaitForVBlank - both end up calling gdi32.dll!NtGdiDdDDIWaitForVerticalBlankEvent (see callstack here - https://bugzilla.mozilla.org/show_bug.cgi?id=1199468#c8).
As mentioned in comment #7 above:
* IDXGIOutput::WaitForVBlank: I also saw comments where people complained that
using it blocks all other DXGI calls until it returns (all unconfirmed).
That doesn't seem to be an issue on Windows 8+. I am going to initially limit D3D VSync implementation to Windows 8+ and keep the timer based VSync on Windows 7 until we find a better solution.

The problem mentioned in the comment above is due to Direct Composition being disabled as a driver workaround. That results in taking a codepath that doesn't turn the GPU VSync feature on on the GPU process side (while it gets turned on on the browser process side).
This will be fixed in the next patch (currently in code review).

stanisc, Thanks! It now runs on Win 8.1. This just missed today's Canary, so once it hits tomorrow's Canary, I will provide some performance comparisons. Early testing on snapshot builds looks very good.

stanisc, Running Chrome with the new "--enable-features=D3DVsync" feature is looking very good.
One very minor issue I just noticed with multiple monitors in 'extend the desktop' mode is that the time argument passed to the rAF callback *drifts* (check 'late' box at vsynctester.com to see the drift; best seen when displays are different Hz) -- which I believe is the frame time Chrome uses internally.
Is this issue because the code is still using DWM timing information (which I believe is based upon whatever the 'primary' display is)? Comments in the code imply this.
Would a solution be to simply use the wake up time from the 'wait for vsync' thread as the frame time?

In the early prototype of this feature the code used the current time at wake-up as the v-sync frame time, but later I replaced that with code that takes it from DwmGetCompositionTimingInfo because that seemed more accurate. The current time was typically about 30 microseconds behind the time reported by DwmGetCompositionTimingInfo.
I tested the early prototype on multi-monitor setup with one monitor running with custom resolution @ 50 Hz and it worked correctly with vsynctester.com.
But I can see the problem you are describing now. I guess DwmGetCompositionTimingInfo still returns v-blank timestamps for the main monitor.
I could go back to using the current time which would be slightly less accurate in a regular single-monitor case. I'll see what else could be done to address this.

The rules for multi-monitor with dwm are a bit weird - see https://msdn.microsoft.com/en-us/library/windows/desktop/hh437350(v=vs.85).aspx . DWM apparently times when it tries to draw based on the primary monitor, though when the flip happens is probably based on the actual monitor that's being presented on. We've seen this be a big problem when having a higher framerate (e.g. 120Hz) on the non-primary monitor.
Though it's possible they've fixed that in windows 10.

[This comment corrects/replaces comments 71/72/74/75 -- to document how Windows DWM works with multiple monitors in 'Extend' mode]
REFERENCES: Thanks to jbauman for pointing to https://msdn.microsoft.com/en-us/library/windows/desktop/hh437350(v=vs.85).aspx – which spells out the steps used by the "DirectComposition composition engine", and talks about dwm.exe and DWM -- so the presumption is the doc is talking about how DWM works. Also, review https://youtu.be/E3wTajGZOsA, which discusses the presentation modes in Windows, by Jesse Natalie.
DWM RUNS AT HZ OF PRIMARY MONITOR: The Windows DWM compositor operates at the frequency/Hz of the primary monitor -- and that is the frequency that DWM then 'presents' to the secondary monitor (regardless of the Hz of the secondary monitor).
DWM PRESENTS TO MONITORS ON VSYNC: But there is no 'tearing' on the secondary monitor even when the Hz does not match the primary monitor Hz, which implies that DWM updates all monitors on the vsync of the individual monitor (this was confirmed by MS). This means that on the primary monitor, after DWM composites, that the DWM 'present' has to wait for nearly an entire frame for the next vsync.
DWM IN SUMMARY: DWM wakes up the compositing loop on vsync of the primary monitor, composites all monitors, and then 'presents' to each monitor, which makes it to the monitor display on the NEXT vsync of the monitor (may be nearly an entire frame later).
DWMFLUSH: This DWM behavior now fully explains why DwmFlush() sometimes returns well after vsync – because it returns after DWM 'presents' (after composition).
WHY GAMES ARE NOT AFFECTED: Games take advantage of certain D3D modes (fullscreen) that bypass DWM.
CHROME+PRIMARY: When Chrome is run on the primary monitor (regardless of a dual monitor or not), Chrome can always successfully vsync to the primary monitor. The corollary to this is that if you are running Chrome on a secondary monitor and vsync is not working, just change the secondary monitor to be the primary monitor to 'fix' vsync problems. Annoying, but it works.
IE+SECONDARY: IE syncs to the primary monitor, even when run on the secondary monitor. So when IE runs on a secondary monitor that operates at a different Hz than the primary, there is horrible jank. I tested at 60Hz / 50Hz. The reason for the jank is the interference pattern created by the two Hz – where a display frame is not receiving exactly one rendered frame.
CHROME+SECONDARY: When Chrome release is run on a secondary monitor in a primary=60 secondary=50 situation, vsynctester.com shows a very messy inter-frame graph, but the VSYNC indicator seems to work. It shows that Chrome is attempting to vsync at 60Hz, but runs at (an average; inter-frame has large spikes) 50fps. The presumption is that the Present(1) in ANGLE is syncing to 50Hz and not 60Hz? Is this then back pressure from the 'GPU' syncing to the 50Hz monitor?
D3DVSYNC+SECONDARY: This 'seems' to work with vsynctester.com in a primary=60 secondary=50 situation. Won’t know for sure until issue discussed in comment #69 is fixed.
ANGLE VSYNC + OTHER ISSUES: Testing the new "D3DVsync" feature is complicated by the fact that Chrome actually has a secondary vsync method which is *always* turned on (ANGLE vsync) -- it can not be turned off. It would sure help to validate the new "D3DVsync" feature if ANGLE vsync could be selectively turned off ( issue 693761 ). And resizing the Chrome app window still changes something regarding vsync in Chrome, so this still plays some role in things (see issue 480361 )?
HOW I TESTED: Notebook computer with an internal display operating at 60Hz. Notebook connected via HDMI to a Vizio HDTV running at 50Hz. To set this up, right click on desktop; Screen Resolution; click on HDTV icon; click advanced settings; monitor tab; select 50Hz.
CAVEATS: When I use chrome://tracing to attempt to look at how ANGLE vsync was affecting things, sometimes it was clearly on and sometimes off. I think the second tab was interfering with results.
DWM BEHAVIOR MAY BE CHANGING: When I discussed the issue with some MS people, they pointed out the DWM does not work well when primary/secondary Hz are *different* values and that "There is an RS3 feature to better enable DWM to handle this situation".
--> Does anyone have contacts within MS to investigate this and confirm / plan for this upcoming feature?
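The interference pattern mentioned under IE+SECONDARY can be made concrete with a small counting exercise. The Python sketch below assumes an idealized model with perfectly regular frame production and perfectly regular refreshes starting in phase; it ignores DWM buffering, and the function name is made up for the illustration.

```python
from collections import Counter


def frames_per_refresh(render_hz, display_hz, seconds=1):
    """Histogram of how many rendered frames land in each display refresh.

    Frame i is produced at i / render_hz seconds, so it falls into display
    interval floor(i * display_hz / render_hz). Integer math avoids rounding.
    """
    per_interval = Counter()
    for i in range(render_hz * seconds):
        per_interval[(i * display_hz) // render_hz] += 1
    return Counter(per_interval[k] for k in range(display_hz * seconds))


# Rendering at 60 fps onto a 50Hz display: some refreshes get two frames.
print(sorted(frames_per_refresh(60, 50).items()))  # [(1, 40), (2, 10)]
# Rendering at 50 fps onto a 60Hz display: some refreshes get no frame.
print(sorted(frames_per_refresh(50, 60).items()))  # [(0, 10), (1, 50)]
```

In this model ten refreshes per second receive either two frames or none, which would show up as a beat of ten hitches per second.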

Guys, if you could, please also test what happens after the laptop or tablet is
undocked and the primary switches from the plugged-in monitor to the built-in monitor.
I'm getting vsync and touch issues every time with Chrome in this
scenario, and am forced to reset to resolve them. Win 10, Surface Pro 4,
Chrome 56.0.2924.87
Cheers.

Update: D3DVsync can now be activated via a flag on chrome://flags.
One outstanding issue that still needs to be addressed is that the v-sync timestamp is kind of broken when running on a secondary monitor. The implementation waits for v-blank event for the monitor the chrome window is on, but then it gets the timestamp and interval from DWM which returns the timestamp synchronized with the primary monitor. So if the secondary monitor isn't exactly in sync with the primary one this causes the timestamp drift relative to the timing of v-sync events. Not good!
The possible alternatives are:
1) Always wait for v-blank corresponding to the primary monitor. This would be more or less in sync with the currently existing timer based v-sync.
2) Ignore DWM and get the now() timestamp from the v-sync thread when it is awakened by the v-blank event. This will require this solution to also calculate the v-sync interval based on a recent history of v-sync timestamps. Based on my experiments this will make the accuracy of the RAF timestamp a bit worse compared to DWM.
3) A hybrid approach - use DWM timestamps when running on a primary monitor; otherwise use now() timestamp from the thread.
I am leaning towards #3.
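To make alternatives 2 and 3 concrete, here is a minimal C++ sketch. All names in it (VSyncParams, IntervalEstimator, ChooseParams) are hypothetical and are not taken from Chrome's source; timestamps are plain microsecond counts rather than base::TimeTicks. It assumes the caller already knows whether the window is on the primary monitor and already has the DWM-reported values.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical container for what the v-sync provider reports.
struct VSyncParams {
  int64_t timestamp_us;  // time of the most recent v-blank
  int64_t interval_us;   // estimated period between v-blanks
};

// Alternative 2: derive the interval from a short history of the
// now() timestamps captured when the v-sync thread wakes up.
class IntervalEstimator {
 public:
  explicit IntervalEstimator(std::size_t capacity) : capacity_(capacity) {}

  void AddTimestamp(int64_t wake_us) {
    history_.push_back(wake_us);
    if (history_.size() > capacity_) history_.pop_front();
  }

  // Average spacing across the history; returns |fallback_us| until at
  // least two samples have been recorded.
  int64_t IntervalOr(int64_t fallback_us) const {
    if (history_.size() < 2) return fallback_us;
    return (history_.back() - history_.front()) /
           static_cast<int64_t>(history_.size() - 1);
  }

 private:
  std::size_t capacity_;
  std::deque<int64_t> history_;
};

// Alternative 3 (hybrid): trust DWM on the primary monitor, otherwise
// use the wake-up timestamp and the locally estimated interval.
VSyncParams ChooseParams(bool on_primary_monitor,
                         const VSyncParams& dwm,
                         int64_t wake_us,
                         IntervalEstimator& estimator,
                         int64_t fallback_interval_us) {
  estimator.AddTimestamp(wake_us);
  if (on_primary_monitor) return dwm;
  return {wake_us, estimator.IntervalOr(fallback_interval_us)};
}
```

Averaging over the history is only one possible smoothing choice; a median or a filter that rejects late wake-ups might behave better in practice, which this sketch does not attempt.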

The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/873b91f1c41220589c05e39a9424ffc99eab785a
commit 873b91f1c41220589c05e39a9424ffc99eab785a
Author: stanisc <stanisc@chromium.org>
Date: Wed May 17 01:08:31 2017
D3D V-sync: prevent timestamp drift on a secondary monitor
I got back some preliminary UMA data from Canary experiment that
confirm the timestamp drift relative to the timing of v-sync signal
which makes BeginImplFrameLatency2 UMA to be all over the place with
a distribution that is spread evenly in the entire 0 - 16667 range.
This happens because D3D V-sync signal is generated based on v-blank
event for a display that contains the window (the current
display), but the timestamp is obtained from DWM which is based on
the most recent v-blank timing for the primary monitor. So if a
secondary monitor frequency is even slightly different that causes
v-sync / RAF timestamp drift that is clearly visible on some websites
like vsynctester.com.
One possible solution is to capture the timestamp when v-blank event
is received, but that seems to be a bit less smooth than the DWM
timestamp. So the compromise is to use DWM timing only when running on
a primary monitor; otherwise use the v-blank wake-up timestamp.
I've verified that this fixes BeginImplFrameLatency2 UMA distribution on
my setup where the secondary monitor refresh rate seems to differ from
the primary monitor by about 0.15 Hz.
BUG=467617, 680639
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel
Review-Url: https://codereview.chromium.org/2874833003
Cr-Commit-Position: refs/heads/master@{#472279}
[modify] https://crrev.com/873b91f1c41220589c05e39a9424ffc99eab785a/gpu/ipc/service/BUILD.gn
[modify] https://crrev.com/873b91f1c41220589c05e39a9424ffc99eab785a/gpu/ipc/service/gpu_vsync_provider_win.cc
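As a rough illustration of why a small frequency mismatch spreads the latency across the entire 0 - 16667 range, the C++ sketch below computes how quickly the phase between two displays slips. The function names are hypothetical, and the 60 Hz / 59.85 Hz figures in the usage note are assumed values chosen to match the roughly 0.15 Hz difference mentioned in the commit message, not measurements.

```cpp
#include <cmath>

// Seconds it takes for the phase between two displays to slip by one
// full frame, given their refresh rates in Hz (which must differ).
double BeatPeriodSeconds(double primary_hz, double secondary_hz) {
  return 1.0 / std::fabs(primary_hz - secondary_hz);
}

// Microseconds by which the offset between the two displays' v-blanks
// changes on every frame (the difference of the two frame periods).
double DriftPerFrameUs(double primary_hz, double secondary_hz) {
  return std::fabs(1e6 / secondary_hz - 1e6 / primary_hz);
}
```

With an assumed 60 Hz primary and 59.85 Hz secondary, the offset changes by about 42 microseconds per frame and wraps through a full frame roughly every 6.7 seconds, so over any longer sampling window the measured offset covers the whole range about evenly.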

Another consideration is variable refresh rate.
The brand new Apple iPad now supports custom refresh rates (24Hz, 48Hz) all the way up to 120Hz. Also, on the PC, it's possible for refresh rate to vary while Chrome is running. It is expected that (eventually) Apple will probably end up supporting variable refresh rate within the Safari web browser, at least to a limited extent (e.g. full-screen HTML5 video playback, playing back 24fps videos at 24Hz) and potentially full screen WebGL.
For example, consider a windowed videogame running in GSYNC/FreeSync (or HDMI 2.1 VRR / VESA Adaptive-Sync) next to a Chrome window: the requestAnimationFrame() rate varies with the framerate, but the framebuffer flipping is erratic. One can obtain a GSYNC monitor (e.g. www.blurbusters.com/gsync/list), enable windowed GSYNC, run a game window alongside a Chrome window, and reproduce variable refresh rate stutter problems with Chrome. However, variable refresh rate support should be natively baked-in.
As a W3C Web Platform Working Group Invited Expert, I'm collaborating on standardizing support for variable refresh rates; see the ongoing work at https://github.com/w3c/html/issues/375 -- the proposal is at the bottom.

So far this effort has been more about the internal v-sync signal that is used for requestAnimationFrame and for kicking off BeginFrame events.
Frame buffer flipping is a related but separate issue that we need to address too. Chrome sets the swap interval to 1, meaning that frame buffers should be swapped on the next v-blank. But the implementation relies on DWM to do that, which in a multi-monitor setup seems to be tied to the primary monitor. At least that is my understanding.

The best solution regarding multiple monitor vsync would be to petition Microsoft to add multi-monitor vsync support to the Windows DWM.
I think they said somewhere that they may change the DWM to support it in Redstone 3 or later.

fxyydd, curious what results you get when you run the dwm-vsync-tests.exe from comment #11 above (just look at Hz displayed in the window of that app) and drag the app between primary/secondary displays?

fxyydd, please be specific. What Hz is each display (primary/secondary) running at, or set to (display properties), and what Hz do you observe (vsynctester) while running Chromium on each display (with and without D3D V-sync)?

I can't reproduce it with the dwm-vsync-tests.exe, but it's behaving a bit better than on older builds.
My primary display is at 144Hz with the secondary at 75Hz. Running Chromium on both displays without D3D V-sync, behavior is normal with the expected results of 144 & 75 fps/Hz.
Running it with the D3D V-sync flag, the refresh rate now seems to display correctly on the latest builds, but fps is bouncing between 115-125 on the 144Hz display and is constantly at 50fps on the 75Hz one.
It also seems that fps is affected by other programs running in the foreground: running a fullscreen exclusive program like MPC-HC on the secondary monitor actually fixes the issue and makes it behave correctly with 144/75fps, while in windowed mode it actually throttles it to the monitor refresh rate. This behavior started appearing with the Redstone 16215 builds and it seems it is still not fixed on the RTM Candidate build 16299.15.
I'm actually unsure if the problem is with the implementation of vsync or if the fault lies with Microsoft.