Enter FCAT

In our comprehensive look at stuttering and FRAPS, we laid out what our ideal method would be for measuring frame intervals. We would tag each frame with a time stamp at the start of the rendering pipeline, then compare the intervals between those time stamps to the intervals between the same frames at the end of the rendering pipeline, when they are displayed. Ideally these two sets of intervals would match up (or be close enough), with both the simulation time between frames and the frame interval itself coming at an even pace.

Of course in the real world this isn’t quite impossible, but it’s highly impractical because it requires the participation of the application itself to write the time stamps (by the time draw calls are being made, it’s too late). In lieu of that, simply being able to look at the end of the rendering pipeline would be a major benefit. After all, the end of the rendering pipeline is where frame swaps actually happen, and it is the position in the rendering pipeline that best describes what the user is seeing. If FRAPS isn’t enough because it can only see the start of the rendering pipeline, then the logical next step is to look at the end of the rendering pipeline instead.

This brings us to the subject of today’s article, FCAT, the Frame Capture Analysis Tool.

As we mentioned in our look at stuttering yesterday, both NVIDIA and AMD agree that there is a fundamental problem with trying to judge frame intervals from the start of the rendering pipeline. For the past couple of years NVIDIA has been working on an alternative tool to measure frame latency at the end of the rendering pipeline, and at long last they are releasing this tool to reviewers and the public. This tool is FCAT.

So what is FCAT? FCAT is essentially a collection of tools, but at its most fundamental level FCAT is a simple, yet ingenious method to measure frame latency at the end of the rendering pipeline. Rather than attempting to tap into the video drivers themselves – a process inherently fraught with problems if you’re intending to do it in a vendor-neutral manner that works across all video cards – through FCAT NVIDIA can do true frame analysis, capturing individual frames and looking at them to determine when a buffer swap occurred, and in turn using that to measure the frame interval.

How FCAT Works

So how does FCAT work? FCAT is essentially a two-part solution. We’ll dive into greater detail on this in part 2 of our FCAT article, but in summary, it relies on the fact that both monitors and PC capture cards work at fixed intervals. Regardless of the frame rate an application is running at, most PC LCD monitors operate at a 60Hz refresh rate. With v-sync enabled, buffer swaps are synchronized with the refresh interval (which among other things caps the framerate at 60fps), but when v-sync is disabled, buffer swaps can occur in the middle of a refresh. As a result any given refresh interval can be composed of parts of multiple frames, which is what makes it possible to display well over 60fps on what’s otherwise a 60Hz monitor.
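To put rough numbers on this, the short Python sketch below works out how long one refresh lasts and how many buffer swaps land inside it on average when v-sync is off. The function names are our own and purely illustrative; only the 60Hz figure comes from the discussion above.

```python
def refresh_interval_ms(refresh_hz: float = 60.0) -> float:
    """Length of one refresh (or one captured video frame) in milliseconds."""
    return 1000.0 / refresh_hz


def swaps_per_refresh(fps: float, refresh_hz: float = 60.0) -> float:
    """Average number of buffer swaps that land inside one refresh with v-sync off."""
    return fps / refresh_hz


if __name__ == "__main__":
    print(f"one refresh lasts {refresh_interval_ms():.2f} ms")
    for fps in (30, 60, 150):
        print(f"{fps} fps -> {swaps_per_refresh(fps):.2f} swaps per refresh")
```

Anything above one swap per refresh means the monitor is showing slices of more than one frame in a single refresh.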

PC capture cards work on the same principle: just as a monitor refreshes at 60Hz, a PC capture card captures at 60Hz. So while a PC capture card can’t see more than 60 whole frames per second, it can see parts of the additional frames, and being able to see parts of frames is good enough. In fact it sees the same parts of those frames that a user would see, since the 60Hz refresh rate on a monitor causes the same effect.

Ultimately by capturing frames and analyzing them, it is possible to tell how many frames were delivered in any given refresh interval, and furthermore by counting the time between those partial frames and comparing it to the refresh interval, it is possible to compute just how long the frame interval was and how long any individual frame was visible.
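As a rough illustration of that arithmetic, the sketch below converts the number of scanlines a frame occupies into on-screen time. It assumes a 1080-line capture at 60Hz and ignores the blanking interval; both are simplifying assumptions of ours, and the function names are hypothetical rather than anything FCAT exposes.

```python
from typing import List, Tuple


def scanlines_to_ms(scanlines: int, lines_per_refresh: int = 1080,
                    refresh_hz: float = 60.0) -> float:
    """Convert a scanline count into milliseconds of display time."""
    refresh_ms = 1000.0 / refresh_hz
    return scanlines / lines_per_refresh * refresh_ms


def frame_display_times(segments: List[Tuple[str, int]]) -> List[Tuple[str, float]]:
    """Merge consecutive segments of the same frame and return (frame, ms visible).

    `segments` lists (frame_id, scanlines) in scan-out order across captured
    refreshes; a frame that spans two refreshes appears twice in a row.
    """
    totals: List[List] = []
    for frame_id, scanlines in segments:
        if totals and totals[-1][0] == frame_id:
            totals[-1][1] += scanlines  # same frame continues into the next refresh
        else:
            totals.append([frame_id, scanlines])
    return [(frame_id, scanlines_to_ms(lines)) for frame_id, lines in totals]


if __name__ == "__main__":
    # Frame A fills half a refresh, B the other half plus a whole refresh, C a quarter.
    captured = [("A", 540), ("B", 540), ("B", 1080), ("C", 270)]
    for frame_id, ms in frame_display_times(captured):
        print(f"frame {frame_id}: {ms:.2f} ms on screen")
```

A frame that covers half of the captured image was therefore visible for roughly half a refresh, and a frame that spills into the next capture simply accumulates more scanlines.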

Of course doing this on a raw game feed would be difficult in the best of situations. As a simple thought experiment, consider a game where the player isn’t moving. If nothing changes in the image, how is one to be able to tell if a new frame has been delivered or not?

The solution to this is in the first half of FCAT, the overlay tool. The overlay tool at its most basic level is a utility that color-codes each frame entering the rendering pipeline. By tagging frames with color bars, it is possible to tell apart individual frames by looking at the color bars. Regardless of the action on the screen (or lack thereof), the color bars will change with each successive frame, making each frame clear and obvious.
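The tagging idea can be sketched in a few lines of Python. The palette below is a placeholder of our own invention, not the actual color sequence FCAT uses; the only property that matters here is that consecutive frames never share a color.

```python
from typing import Tuple

# Hypothetical palette: the real FCAT color sequence is not given in this article.
PALETTE = [
    ("white", (255, 255, 255)),
    ("lime", (0, 255, 0)),
    ("blue", (0, 0, 255)),
    ("red", (255, 0, 0)),
]


def bar_color(frame_index: int) -> Tuple[str, Tuple[int, int, int]]:
    """Return the (name, RGB) of the bar drawn on the given frame.

    Cycling through a fixed list guarantees that two consecutive frames
    always carry different colors, even if the game image is identical.
    """
    return PALETTE[frame_index % len(PALETTE)]


if __name__ == "__main__":
    for i in range(6):
        name, rgb = bar_color(i)
        print(f"frame {i}: {name} {rgb}")
```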

On a technical level, the FCAT overlay tool ends up working almost identically to video game overlays as we see with FRAPS, MSI Afterburner, and other tools that insert basic overlays into games. In all of these cases, these tools are attaching themselves to the start of the rendering pipeline, intercepting the Present call, adding their own draw commands for their overlay, and then finally passing on the Present call. The end result is that much like how FRAPS is able to quickly and simply monitor framerates and draw overlays, the FCAT overlay tool is able to quickly insert the necessary color bars, and to do so without ever touching the GPU or video drivers.

With the frames suitably tagged, the other half of the FCAT solution comes into play, the extractor tool. By using a PC capture card, the entire run of a benchmark can be captured and recorded to video for analysis. The extractor tool in turn is what’s responsible for looking at the color bars the overlay tool inserts, parsing the data from a video file to find the individual frames and calculate the frame intervals. Though not the easiest thing to code, conceptually this process is easy; the tool is merely loading a frame, analyzing each line of the color bar, finding the points where the color bar changes, and then recording those instances.
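Conceptually, the per-frame scan looks something like the Python sketch below. This is our own simplified illustration, not NVIDIA's extractor: it assumes the color bar has already been reduced to one RGB value per scanline and that the colors are exact, whereas a real capture would need some tolerance for noise.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def find_bar_runs(column: List[RGB]) -> List[Tuple[RGB, int, int]]:
    """Split one captured frame's color bar into runs of identical color.

    `column` holds the bar color on each scanline, top to bottom. Each result
    is (color, first_scanline, scanline_count); a new run starts wherever the
    color changes, which is where a buffer swap happened mid-refresh.
    """
    runs = []
    start = 0
    for line in range(1, len(column) + 1):
        if line == len(column) or column[line] != column[start]:
            runs.append((column[start], start, line - start))
            start = line
    return runs


if __name__ == "__main__":
    WHITE, LIME, BLUE = (255, 255, 255), (0, 255, 0), (0, 0, 255)
    captured_bar = [WHITE] * 400 + [LIME] * 600 + [BLUE] * 80
    for color, first, count in find_bar_runs(captured_bar):
        print(f"color {color}: starts at line {first}, spans {count} lines")
```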

This ultimately results in a tab-separated values (TSV) file that contains a list of frames, when they occurred, the color bar they were attached to, and more. From here it is possible to then further process the data to calculate the frame intervals.
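As a minimal sketch of that further processing, the Python below reads a small tab-separated sample and computes the interval between successive frames. The column names (`frame`, `time_ms`, `color`) and the sample values are made up for illustration; the actual layout of the extractor's output is not described here.

```python
import csv
import io
from typing import List

# Hypothetical sample; column names and values are illustrative only.
SAMPLE_TSV = (
    "frame\ttime_ms\tcolor\n"
    "0\t0.0\twhite\n"
    "1\t16.5\tlime\n"
    "2\t18.0\tblue\n"
    "3\t50.0\tred\n"
)


def frame_intervals(tsv_text: str) -> List[float]:
    """Return the time in ms between each frame and the one before it."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    times = [float(row["time_ms"]) for row in reader]
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    print(frame_intervals(SAMPLE_TSV))
```

In this invented sample the second interval is far shorter than its neighbors, which is the kind of uneven pacing the frame interval data is meant to expose.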

The end result of this process is that through the use of marking frames, capturing the output of a video card, and then analyzing that output, it is possible to objectively and quantitatively measure the output of a video card as an end-user would see it. This process doesn’t answer the subjective questions for us – mainly, how much stutter is enough to be noticed – but it gives us numbers that we can use to determine those answers ourselves.

Finally, for the purposes of this article we’ll be glossing over the analysis portion of FCAT, but we’ll quickly mention it. Along with the overlay and extractor tools, FCAT also includes a tool to analyze the output of the extractor tool, from which it can generate graphs, identify so-called “runt” frames, and more. The analysis tool is not strictly necessary to use FCAT – one can always do their own analysis – but the analysis tool does simplify the use of the suite by quickly and conveniently handling that last step of the process. We’ll get into the analysis tool in much greater detail in part 2 of our article, where we can apply it to our full suite of test results to better understand what it looks for and what it’s representing.


88 Comments

I'm not sure you realize how development of silicon based semiconductor product works. This is not on the scale of a planting season where you put down some seed and later that year you harvest your crop. You start off with the specification phase of the product, you move to development of the HDL and verification of everything and then you go to NRE and silicon samples. This is a multi-year process. AMD may well have known about this issue just as long as nvidia but silicon products don't go from specification to product overnight. That is why vendors offer driver/firmware/microcode updates. As to taking a pot shot at AMD laying off R&D people, it's called a business decision. Sometimes you need to let people go so that the rest of the employees can remain employed. Otherwise you can end up facing bankruptcy and massive layoffs. I don't know if it's the right move at this time or not, and I'm no financial analyst. Currently nvidia has no gpu chips directly that share a piece of silicon with the cpu, unlike AMD and intel. The next generation of consoles seem to have gone with AMD. I haven't heard of any big chipsets from nvidia. Anyway, sorry for the long response. I believe AMD, Intel, and nvidia all have strong points and areas where they can improve. This seems like it might be largely a driver issue and I'll admit that AMD seem to have had issues with drivers. Your post just seemed like an easy shot against AMD. I am not affiliated with AMD, nvidia, or Intel and the views expressed in this post are my own opinions and not those of any current or previous employer.

From what I heard/read somewhere, Nvidia knew of such a problem back in their 8000 series cards (8800gtx/gts/gt). So it's something they knew about for a long time. Nvidia has hardware built in to their cards to keep the cards in sync frame wise whereas AMD doesn't. It could take something like 2 years for AMD to fix it properly, and a software driver fix will most likely cost some fps.

I'd like an AnandTech investigation of a slightly different issue: How fast does Apple update the screen on iPads and iPhones when displaying movies? Specifically, when displaying movies, do they switch to updating the screen only every 24th of a second?

The reason I think this is an interesting question is that, in my experience, movies displayed on iPad show none of the stuttering when panning that is so obvious on both TVs and computers, stuttering which is, as far as I can tell, generated by the 3:2 pulldown. (I don't have a 120Hz TV, so I can't comment on how well they deal with this.)

So we have the visual suggestion that this is what Apple is doing, along with the obvious point that it would presumably save power to only refresh the screen at 24Hz (though the power savings may be negligible).

I must admit I would find very interesting an investigation of this (perhaps by similar techniques to what are being used here, like a movie consisting of a color coded sequence of frames, and time-stamped video capture; though you'd probably have to use an external video camera.)

And this is not just an Apple specific issue; it would be interesting to know if Android and MS likewise are capable of displaying stutter-free movies on their mobile devices (unlike on the desktop where, sure, you have far less control, but for fsck's sake --- can't you at least do the job right in full screen mode?)

Wow, so this issue of stuttering has been talked about amongst users for at least a decade, then Scott Wasson from techreport decides to run a test of latency just weeks before Nvidia releases their super duper new tool to test latency?......and Nvidia have been working on it for 2 years? What a coincidence!! ...and NVidia cards perform better in this regard? Double coincidence!! So does that mean Nvidia is a benevolent company who wants to help AMD fix their stuttering issues so AMD can sell more cards?? Wow they must be saints!! We can add this to Nvidia's long list of open, honest and transparent business practices.

Not sure if joking or troll.... Scott's latency tests started in their August 23, 2012 article entitled "Inside the second: Gaming performance with today's CPUs". That's not "just weeks" ago. Second, if NVIDIA thinks Scott's FRAPS tests are so awesome for them, why would they bother to release a tool that measures at an entirely different point in the pipeline? Your conspiracy theory is not only factually wrong, it doesn't even make a good conspiracy theory.

I also agree with you and Spoelie. Simulation step stutter is also an issue that should be covered. It would be really nice to get simulation timestamps directly from the output of the simulator and match them up to their corresponding output frames. However, this would probably require collaboration with game developers that you probably won't get. Until then, using a tool that works at the output of the renderer (like FRAPS) and can associate simulation steps with output frames would be nice.

That said, there is really little that AMD or Nvidia can do to fix issues in the application other than trick it into working correctly. These results would be more useful for game developers developing new engines. Also keep in mind, simulation steps are tied loosely (through queues) to GPU's processing capabilities (unless the bottleneck happens to be before the command is dispatched to D3D). Simulation steps should be roughly equal to frame times on average. If the GPU processing were completely consistent, then the latency between the simulation step and output would be fixed and the output would appear smooth. It is variations in frame times that cause variations in simulation steps. On average, the variations of each should be roughly equal. The worst-case stutter, which should be something like double the frame time variation (when the simulator is compensating in the opposite direction as the frame time variation), is what we need to look out for. That said, variation in frame time is generally smaller than frame time itself. I would suggest that simulation step stuttering is a smaller problem than frame time stuttering and becomes smaller as frame times get shorter. Point of interest, Nvidia's Frame metering may actually increase simulation step stuttering.

I've just added FCAT overlay rendering support to OSD server of MSI Afterburner and EVGA Precision. Still need some time to discuss exact RGB color sequence with NVIDIA, then I guess we'll release it to public.