Visual Computing Still Decades from Computational Apex

The Truth

There are few people in the gaming industry that you simply must pay attention to when they speak. One of them is John Carmack, co-founder of id Software, creator of Doom, and a friend of the site. Another is Epic Games' Tim Sweeney, another pioneer in the field of computer graphics, who brought us the magic of Unreal before bringing the rest of the gaming industry the Unreal Engine.

At DICE 2012, a trade show where game developers demo their wares and learn from each other, Sweeney gave a talk on the future of computing hardware. (You can see the source of my information and slides here at Gamespot.) Many pundits, media, and even developers have brought up the idea that the next console generation - the one we know is coming - will be the last: we will have reached the point in our computing capacity where gamers and designers are comfortable with the quality and realism provided. Forever.

Think about that for a moment; has anything ever sounded so obviously crazy? Yet, in a world where gaming has seemed to regress into the handheld spaces of the iPhone and iPad, many would have you believe that it is indeed the case. Companies like NVIDIA and AMD that spend billions of dollars developing new high-powered graphics technologies would simply NOT do so anymore and would instead focus only on low power. Actually...that is kind of happening with NVIDIA's Tegra and AMD's move to APUs, but both claim that developing leading graphics technology is what allows them to feed the low end - sub-$100 graphics cards, SoCs for phones and tablets, and more.

Sweeney started the discussion by teaching everyone a little about human anatomy.

The human eye has been studied quite extensively, and the amount of information we know about it would likely surprise you. With 120 million monochrome receptors and 5 million color receptors, the eye and brain are able to do what even our most advanced cameras cannot.

With a resolution of about 30 megapixels, the human eye is able to gather information at about 72 frames per second, which explains why many gamers debate whether frame rates higher than 70 are needed in games at all. One area that Sweeney did not touch on, but that I feel is worth mentioning, is the brain's ability to recognize patterns - or more precisely, changes in them. When you hear the term "stuttering" or "microstutter" on forums, this is what gamers are perceiving. A game could average 80 FPS consistently, but if the frame rate suddenly varies from 90 FPS to 80 FPS, a gamer may "feel" that difference even though it doesn't show up in traditional frame rate measurements.
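As a rough sketch of why average frame rate hides stutter, consider two hypothetical frame-time captures with identical averages (illustrative numbers of my own, not from the talk):

```python
# Two captures with the same total time and average frame rate,
# but one contains a single long frame - a "stutter".
smooth = [12.5] * 80              # 80 frames at 12.5 ms each (1000 ms total)
stutter = [11.0] * 79 + [131.0]   # same 1000 ms total, but one 131 ms spike

def avg_fps(frame_times_ms):
    """Average FPS over the capture: frames divided by total seconds."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)

def worst_frame(frame_times_ms):
    """Longest single frame time - the number that correlates with 'feel'."""
    return max(frame_times_ms)

print(avg_fps(smooth), worst_frame(smooth))    # 80.0 FPS, 12.5 ms worst frame
print(avg_fps(stutter), worst_frame(stutter))  # 80.0 FPS, 131.0 ms worst frame
```

Both captures report exactly 80 FPS by the traditional measurement, yet the second one contains a frame more than ten times longer than normal - which is precisely the change in pattern the brain picks up on.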

In terms of raw resolution, Sweeney then posits that the maximum resolution required for the human eye to reach its apex in visual fidelity is 2560x1600 with a 30 degree field of view, or 8000x4000 with a 90 degree FOV. That 2560x1600 resolution is what we see today on modern 30-in LCD panels, but the 8000x4000 resolution is about 16x the pixel count of current HDTVs.
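For a sense of scale, the pixel-count arithmetic is easy to check (a quick sketch; the 1920x1080 figure is my assumption for "current HDTVs"):

```python
# Pixel counts for the resolutions discussed in the talk.
res_30in  = 2560 * 1600   # modern 30-in panel: 4,096,000 pixels (~4.1 MP)
res_90fov = 8000 * 4000   # 90-degree-FOV target: 32,000,000 pixels (32 MP)
res_1080p = 1920 * 1080   # a 1080p HDTV: 2,073,600 pixels

print(res_90fov / res_1080p)  # ~15.4x a 1080p HDTV - the "about 16x" figure
print(res_90fov / res_30in)   # ~7.8x a 30-in 2560x1600 panel
```

Note that the 32-megapixel target also lines up neatly with the roughly 30-megapixel figure quoted for the eye itself.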

According to the Nyquist sampling theorem, which describes how much information is required to present a "good enough" result at a given resolution, game engines would need about 40 billion triangles per second to reach perfection at the 8000x4000 resolution. Currently, the fastest GPU for triangle processing can handle 2.8 billion per second, and Sweeney claims we are only a factor of 50x from reaching that goal. That difference could likely be reached in another two generations of architecture, which actually gives some credence to those who say the end is near.
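Running the round numbers through a quick sanity check is instructive (these are the talk's figures, not measurements of mine). The stated 40 billion and 2.8 billion actually imply a gap closer to 14x than 50x, so at least one of the round numbers is off somewhere:

```python
import math

# Figures as quoted in the talk, taken at face value.
target  = 40e9   # triangles/sec needed for "perfect" 8000x4000 rendering
current = 2.8e9  # peak triangle rate of the fastest GPU cited

gap = target / current
print(gap)  # ~14.3x - note the quoted gap is 50x, so the figures disagree

# Generations needed if triangle throughput roughly doubles each generation
# (a common rule-of-thumb assumption, not a guarantee):
print(math.ceil(math.log2(gap)))
```

Either way, whether the gap is 14x or 50x, it is small enough that a handful of architecture generations could plausibly close it - for triangle throughput alone.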

But triangle processing hasn't been the primary focus of gaming engines for some time and it doesn't tell the whole story.

This article just made my day. Thanks Ryan! It makes me happy to see developers that care about innovating for the entire industry, not just for themselves. It does seem strange that the gaming industry doesn't have some kind of universal algorithm for AI, etc.
I also can't wait to see this engine! That screenshot looks gorgeous.

Not necessarily. We don't know what other visual technologies will come out in the next 16 years that will require much more horsepower from graphics cards (or whatever they may be by that time). Think of all we have seen in the past 12 years: anti-aliasing, anisotropic filtering, higher resolutions, multi-monitor, stereo 3D, etc. A high-end monitor in 1996 was a 17-in CRT/Trinitron that cost upwards of $1000 and was limited to 1280x1024 @ 60Hz. Now even the most basic monitor can do 1080p for under $200.

I'm pretty sure we are going to see some neat stuff by then that will continue to push the computational needs of cutting edge games and applications.

Do these statements consider that particle-based effects are becoming more prevalent? We also need to consider that emulating/simulating reality is not the only goal; we are also interested in some unreal things that may require more graphics horsepower.

Also, video games generally have a limited number of characters and creatures running around doing stuff. I have not seen any games or demos with hundreds of characters running around doing stuff in the same scene. That, and most games do not have insects and birds flying around in realistic numbers.

It is one thing to render a photorealistic scene, but it is another to render a photorealistic scene with extremely high numbers of extremely complex elements.

And what about physics simulations? We may realize improvements well beyond a 2000x increase in GPU power.

Also, what if someone wants to make a game like Minecraft with cubes that are 1x1x1cm? How much power would that take? Add in complex physics simulations and photorealistic graphics.

Please note that I am not an expert. I would like to hear responses to these comments.

The reason you don't see lots of people in a scene AND the scene being photorealistic is that developers had to pick one or the other; otherwise 95% (a made-up stat, but A LOT) of people wouldn't be able to run it, because they don't have crazy extreme hardware (expensive multi-GPU setups, expensive 6-core+ CPUs like the i7 used in the tests).

And I guess a game where you have lots of people but not the best graphics would be almost any MMORPG, while a game where you have great graphics and few people in a scene would be almost any FPS.

The power needed for your new version of Minecraft is probably A LOT, and you may be better off with a complete particle system since the "cubes" are so small - just a guess, though; I don't know how much horsepower you need for Minecraft in the first place because I have never played it. Although I did look up that on the PC it uses Java, so that would need to change for your situation because it is probably nowhere near fast enough.
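Some back-of-envelope arithmetic shows why "A LOT" is an understatement (my own hypothetical numbers: standard Minecraft blocks are 1 m on a side, and I'm using its 16 x 16 x 256 m chunk size purely for scale):

```python
# Voxel count if Minecraft-style blocks shrank from 1 m to 1 cm cubes.
blocks_per_m3 = 100 ** 3          # 100 x 100 x 100 = 1,000,000 voxels per cubic meter

# One standard Minecraft chunk covers 16 x 16 m of ground, 256 m tall:
chunk_m3 = 16 * 16 * 256          # 65,536 cubic meters

voxels_per_chunk = chunk_m3 * blocks_per_m3
print(voxels_per_chunk)           # 65,536,000,000 - ~65.5 billion voxels per chunk
```

Even at one byte per voxel that is tens of gigabytes for a single chunk before any physics or rendering, which is why a sparse or particle-style representation, as suggested above, would be essential.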

Particle simulations will be trivial before graphics are trivial. I don't know of any large simulations even in feature films which take longer than an 8k render. Many simulations are still done on single beefy machines. And GPUs are getting better and better at crunching fluid sims which are the most intensive particle simulations to date. Newtonian hard surfaces and even cloth simulations are already nearly real-time at respectable detail levels on a GPU.

As to hundreds of characters, again full photo-realistic rigs are nearly real-time today. I've seen some GPU skin algorithms that are realtime.

As to there being hundreds of thousands of characters, that shouldn't be a problem. That will be a limitation of RAM, and we can assume RAM will scale in parallel with processor speed. By the time we get to 8k true-to-life rendering we'll almost certainly be raytracing, and raytracing doesn't particularly care about triangle count so much as resolution, as long as everything fits into memory. With fast disk arrays and smart caching you can load and unload everything not in view. And LOD systems aren't going anywhere anytime soon.

You have made a factual mistake. At the bottom of page 1, you state that the best GPUs are capable of drawing 2.8 _million_ triangles per second, yet this is obviously wrong, as the original Crysis often had scenes displaying that many triangles (and the 8800 series GPUs rendered them about 30 times per second!). Secondly, if you refer to your link, it shows ~2800 _million_ (2.8 billion) triangles per second! Clearly we have come a long way! Although doing this at 72Hz is a whole other matter ;). Keep in mind these are absolute peak numbers.

A better means of assessing how far we are from real-time, truly photo-real graphics would be to examine the processing power of current render farms and the frame render times for film CGI. I know there are farms hundreds of times more powerful than the most powerful GPU, yet they can take _hours_ to render a single frame, much less render 72 frames per second!!

That's a terrible methodology for determining how far we've come in real time... On one hand, CPUs and GPUs are built around completely different architectures and handle compute problem sets extremely differently.

On the other, we don't utilize an ENTIRE farm on a single frame of a movie, because the network overhead would be immense: there are literally several gigs of light/geometry/image data derived/cached/read from disk and used to draw the final 2D pixel representation. (This is assuming a 3D render and not a composite, which is even worse, since now we're reading 50 passes of 20-50 megabyte image files that we're applying/caching complex 2D and 3D operations on, but I digress.) Honestly, we generally render one frame on one machine. With this nugget alone, you are now comparing render times of a 16-24 (32 now with Sandy Bridge!) core Xeon machine against the thousands of CUDA cores in strung-up Teslas, which is a moot comparison anyway since their cores are not analogous.

There's a reason why my industry is slowly building more gpu-centric tool sets.

Also, most of our calculations are unoptimized, because we're lazy and don't read/apply the latest SIGGRAPH whitepapers on computer graphics as quickly as we should.

I think there is a lot more power needed! First off, I want a larger than 90 degree FOV, so let's start at 24,000 x 4,000 for three screens wide. Then, to prevent micro stuttering, you need to be able to consistently hit 90 FPS. Okay, what about 3D? You need to either double the FPS or double the resolution of the monitors. Let's talk about environments: instead of a tree swaying in the breeze, to be realistic each branch and unique leaf has to be fluttering in the wind, in a forest of unique trees with all the unique ground clutter. Let's not forget the artificial limit we have for on-screen NPCs and other characters: let's have an army of thousands of unique characters running, jumping, and fighting in that forest, and let's start to introduce AI for each of those unique NPCs. Okay, what about particle effects like smoke and dust swirling around that army in the forest, and realistic damage for all of it? Now try to model all the noise and echoes from all those interactions in 9.3 surround sound at 96kHz or better. Let's not forget to incorporate head, eye, and other motion capture controls without the lag and inaccuracy of things like Kinect. Okay NVIDIA, AMD, Intel, and Epic: make it happen. Seriously, I want a user experience like The Matrix, where reality and gaming are indistinguishable from each other.

Let's face it. Gaming/Graphics computing won't reach its peak until we have an equivalent of the matrix. Complete, totally immersive virtual reality almost indistinguishable from reality. Then I think we could say we're done. :)

I'm not sure where these people get their information, but the 72FPS figure quoted is nonsense. The Primary Visual Cortex in the rear of the brain (casually known as V1) perceives visual information at 120Hz/FPS; anything slower than this will be perceived as 'not real', though one may not be aware of how or why. This kind of divide between knowing something instinctively (because your brain knows) and not being able to explain it (because your brain doesn't bother passing the information to the conscious mind) is well-established in all 5 visual cortices. Look up 'blindsight', for example, which is the ability for people who are blind because of a brain lesion (injury) to navigate around obstacles they can't consciously see: they 'know' where they are and what's around them, because their eyes and the non-lesioned parts of their brain still work, but they can't consciously see because the part of their brain that is lesioned is the one that would pass that information to the conscious mind. All this is information any neuroscience student could give you; it's a shame it's been misrepresented to so many here.

I had a good laugh at the whole 72 fps thing. When I wrote my 30 vs 60 fps article back in 1999, I used 72 fps as a basis for "good enough for most everyone" because back then 72 Hz on a CRT monitor was considered mostly flicker free. My thinking was wrong there, but that rumor of 72 fps has unfortunately persisted.

Considering the somewhat dual digital/analog nature of our visual system, it is also not terribly accurate to say, "Our upper limit is 120 Hz" because that is a purely mechanical definition that does not really reflect the reality of light and sight. With some components of the eye able to detect even a single photon of light, it can quickly detect changes in light reaching it. So saying, "We can only perceive up to 120 Hz/FPS" is again false. The limit is higher, and some research has shown that even upwards of 240+ fps gives a greater feeling of reality than 120 fps.

The long and short of it is this: the real limitation is most likely the visual cortex, but through training and immersion in higher-fps, higher-quality visual simulations, the cortex will learn to recognize the differences in speed and input quality.

Eh, yes and no on the bandwidth issue. The performance gains that we have seen through the years scale really nicely, and in fact have surpassed even the loosest interpretation of Moore's Law, due to advances not just in process technology but in the ability of the major graphics guys to radically change their architectures. If we compare GPU development over the past 10 years with that of CPUs, I think we can see that the change has been much more radical on the GPU side, which isn't stuck with the x86 CPU architecture.

Memory has also moved forward, but one area that you are discarding totally is that of the rise of large caches within the GPU. Yes, we still need as much bandwidth as possible, but now that we have started to really use mathematics based shading rather than pure texture based rendering, memory bandwidth is not as huge a concern as it once was. Not saying that it is trivial or unimportant, as people still expect high quality texturing in modern games. So, while GDDR-5 with a 384 bit bus will give a pretty impressive 250+ GB/sec of bandwidth, the much larger and more complex caches in the GPUs are doing a lot of the work... so it helps to minimize the impact that slower/higher latency off chip memory has on overall performance.

As time passes, we will see much larger and complex cache structures on GPUs, which will allow better performance even though external memory will always be behind GPU development.

I think we've already reached the point of diminishing returns with graphics. We just don't NEED photo-realism for games.

All those extra little touches to add realism will mostly just jack up the number of man-hours needed to develop a game, and we're already getting shorter gameplay experiences and longer development cycles just so they can add extra bits of realism to the vast array of terrain and items that go into a game.

It hasn't escaped my notice that the most fun I've had with new games comes from titles that don't even come close to photo-realism, such as Recettear, Minecraft, and Torchlight.