For as long as I can remember talking about video cards and GPU performance at AnandTech, there has been debate over the type of benchmarks used to represent that performance. In the old days, the debate was mostly manufacturer-driven. Curiously enough, the discourse usually fired up when one manufacturer was at a significant deficit in GPU performance. NVIDIA made a big deal about moving away from timedemos and average frame rates during the early GeForce FX (NV30) days, when its cards might have delivered a decent gaming experience but were slaughtered in most benchmarks. Even Intel advocated a shift away from most CPU-bound gaming benchmarks during the early years of the Pentium 4, again for obvious reasons.

It’s a shame that these revolutions in gaming performance testing were always associated with underperforming products (and were dropped once the product stack improved a generation or two later). It’s a shame because there has always been merit in introducing additional metrics to provide the most complete picture of gaming performance.

The issue lay mostly dormant over the past several years. Every now and then there’d be a new attempt to revolutionize GPU performance testing, but most failed to gain widespread traction for one reason or another. Broad repeatability, one of the basic tenets of the scientific method, was usually sacrificed in many of these new approaches, which ultimately limited their acceptance.

A year and a half ago, Scott Wasson over at the Tech Report did something no one since Dr. Pabst had been able to do: he actually brought about a revolution in the 3D game benchmarking scene.

The approach seemed ridiculously simple; we’ve all had the tools for so very long. Scott used FRAPS to record frame times, calculating how long each frame in a benchmark took to render. By focusing on individual frame latencies, Scott’s method could better characterize the little hiccups and stutters that get smoothed out in an average frame rate. With the new method came a bunch of nifty graphs, and the world changed.
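To make the difference concrete, here is a minimal sketch of the frame-time idea. It assumes a log of cumulative per-frame timestamps in milliseconds (an assumption for illustration, not FRAPS's exact output format), and the function names, sample runs and the two latency metrics shown (a nearest-rank 99th-percentile frame time and time spent beyond 50 ms) are hypothetical choices of the kind frame-time reviews use, not the Tech Report's actual tooling. Both runs average exactly 50 fps, yet only the per-frame view exposes the two 60 ms hitches in the second one.

```python
import math
from itertools import accumulate


def frame_times(timestamps_ms):
    """Turn cumulative per-frame timestamps (ms) into individual frame times (ms)."""
    return [later - earlier for earlier, later in zip(timestamps_ms, timestamps_ms[1:])]


def average_fps(times_ms):
    """Frames rendered divided by elapsed seconds: the classic benchmark number."""
    return len(times_ms) * 1000.0 / sum(times_ms)


def percentile_frame_time(times_ms, pct):
    """Nearest-rank percentile: pct% of frames finished in this many ms or less."""
    ordered = sorted(times_ms)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def time_beyond(times_ms, threshold_ms):
    """Milliseconds spent past the threshold, counting only each slow frame's excess."""
    return sum(t - threshold_ms for t in times_ms if t > threshold_ms)


# Hypothetical runs of 12 frames each, both spanning 240 ms in total.
smooth = list(accumulate([0] + [20] * 12))                           # every frame takes 20 ms
stutter = list(accumulate([0] + [12] * 5 + [60] + [12] * 5 + [60]))  # two 60 ms hitches

for name, stamps in (("smooth", smooth), ("stutter", stutter)):
    times = frame_times(stamps)
    print(f"{name}: avg {average_fps(times):.1f} fps, "
          f"99th pct {percentile_frame_time(times, 99)} ms, "
          f"beyond 50 ms: {time_beyond(times, 50)} ms")
```

Running it prints an identical 50.0 fps average for both runs, while the 99th-percentile frame time (20 ms vs. 60 ms) and the time beyond 50 ms (0 ms vs. 20 ms) separate them, which is exactly the kind of stutter an average frame rate hides.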

The methodology wasn’t perfect, as FRAPS lacks a holistic view of the 3D rendering pipeline, but it did reveal some surprising issues (in addition to spawning further work that uncovered even more issues on the multi-GPU front). Interestingly enough, many of the issues uncovered by this focus on frame times/latency seemed to primarily impact AMD hardware.

AMD remained curiously quiet as to exactly why its hardware and drivers were so adversely impacted by these new testing methods. While our own foray into evolving GPU testing will come later this week, we had the opportunity to sit down with AMD to understand exactly what’s been going on.

Although neither strictly a defense nor merely an explanation of what we’ve been seeing over the past year, AMD wanted to sit down and better explain its position. This includes both why AMD’s products have been impacted in the manner they were, and why at the same time (and not unlike NVIDIA) AMD is worried about FRAPS being given more weight than it should be. Ultimately AMD believes that it’s to the benefit of buyers and journalists alike to better understand just what is happening, why it’s happening, and just what the most common tools can measure and are actually measuring.

What follows is based on our meeting with some of AMD's graphics hardware and driver architects, where they went into depth on all of these issues. In the following pages we’ll get into a high-level explanation of how the Windows rendering pipeline works, why this leads to single-GPU issues, why this leads to multi-GPU issues, and what various tools can measure and see in the rendering process.

I agree with the double blinding idea. Tech Report had some videos on the Skyrim stuttering, and I showed them to my bro with the card names covered, and he actually preferred the AMD card. Personally I thought both of them were playable, since the 240fps video exaggerated any stuttering issues. If you can't tell the difference on a 60Hz or 120Hz video/monitor, there is no difference.

It would be nice if someone would develop a tool to measure the frames as they are being displayed, that is, as they are actually being viewed.

The benefit of blind testing is twofold: it removes all the complexity involved in testing and gets to the point where it matters. Second, we get an opinion on what benefit a game gains from going to higher quality settings. AnandTech, for the most part, has the same staff, so we would get to know Ryan's preferences in just a few rounds of testing. Then he can have a nice assistant changing the cards for him :)

While not blind, HardOCP's maximum playable settings testing is done to capture this. They report min/avg/max FPS along with graphs, but at whatever combination of settings gave the most eye candy while still being fast and smooth enough to be enjoyable. At times this has resulted in observations that "while the raw FPS numbers imply that turning on X should be doable the gameplay results indicated otherwise" (generally due to stuttering problems).

"Playable" and "optimal" are different things; for the most part no one is suggesting the games and cards that have more problems with stuttering are "unplayable".

And some people don't notice what bugs the fire out of others. Stuttering is one of those things. I think this has a lot to do with the fact that these problems have existed for quite a while and people just got used to them, so they kind of automatically ignore them.

So, I agree, if you don't notice it then it's not important. But if you do, then it is. :) I noticed this phenomenon years ago, and am very excited to see numbers quantifying the situation so that it can be discussed on more than a seat-of-the-pants level.

The problem with double blinding is that some people notice more than others. If you're used to high-end equipment on a 120Hz monitor, you'll notice a hell of a lot more problems than some dude off the street who normally plays on his laptop.