I don't see what's wrong with this question -- it's perfectly natural to be curious about how other developers have accomplished certain things. We should be encouraging this sort of curiosity, not punishing it with close votes.
–
John Feminella Feb 7 '10 at 17:31

2

@user146780: who asked the question... The best programmers I've met were working in CGI. Gurus from SGI, people working on parallelizing Adobe Photoshop, etc. People here don't realize how complicated it is to write a modern game nor how skilled these coders are. If you want a humbling experience, look at what the Germans from Crytek did with the Crysis engine. There are videos on YouTube. You simply won't believe it. It's not just about "using octrees". Typically these programmers are simply much more skilled than the average programmer. And you can bet that the GT4 coders are very good.
–
SyntaxT3rr0r Feb 7 '10 at 18:50

2

You got GTA4 running at 60fps!? GW! GTA4 is a P.O.S. that runs quite poorly; I've heard Force Unleashed does too. I'd say Euphoria is the culprit. Honestly, "CPU usage" is a very poor way to compare: simply uncap the frame rate and see which one runs fastest. That's the proper way to do it. Also remember that while this "complicated game" might render lots of stuff, there is still only a screen's worth of stuff, and if it's rendered in the right order, you might end up with nearly the same amount of pixel work as your "simple" demo, and pixel work is really what kills it.
–
matt Feb 8 '10 at 7:11

Good question too. I had (and still have) a similar question in relation to font rendering ever since I laid my eyes on Windows 3.1. I have always been surprised that screenfuls of text are rendered so quickly.
–
Agnel Kurian Feb 24 '10 at 10:55

7

You need a profiler that shows you how much the GPU (Graphics Processing Unit) is used. I bet GTA IV shows you ~99%, and the demo 3%.
–
0scar Jul 7 '10 at 1:53

16 Answers
16

In general, it's because the games (1) are careful to render only what they need to, and (2) take special advantage of your hardware.

For instance, one easy optimization you can make involves not actually trying to draw things that can't be seen. Consider a complex scene like a cityscape from Grand Theft Auto IV. The renderer isn't actually rendering all of the buildings and structures. Instead, it's rendering only what the camera can see. If you could fly around to the back of those same buildings, facing the original camera, you would see a half-built hollowed-out shell structure. Every point that the camera cannot see is not rendered -- since you can't see it, there's no need to try to show it to you.
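One common way to skip what the camera cannot see is view-frustum culling. The following is only a minimal sketch of that idea, not how GTA IV or any particular engine does it: the types `Vec3`, `Plane`, `Sphere` and the functions `isVisible` and `countVisible` are hypothetical names invented for illustration, and each object is assumed to be wrapped in a bounding sphere.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical types for illustration; not taken from any real engine.
struct Vec3 { float x, y, z; };

// A plane in the form dot(normal, p) + d = 0, with the normal pointing
// toward the inside of the view volume.
struct Plane { Vec3 normal; float d; };

// Bounding volume assumed to enclose one drawable object.
struct Sphere { Vec3 center; float radius; };

inline float signedDistance(const Plane& p, const Vec3& v) {
    return p.normal.x * v.x + p.normal.y * v.y + p.normal.z * v.z + p.d;
}

// Returns true if the sphere is at least partly inside all six planes.
inline bool isVisible(const std::array<Plane, 6>& frustum, const Sphere& s) {
    assert(s.radius >= 0.0f);
    for (const Plane& p : frustum) {
        if (signedDistance(p, s.center) < -s.radius) {
            return false;  // entirely behind this plane, so never drawn
        }
    }
    return true;
}

// Counts how many objects would actually be submitted for drawing.
inline int countVisible(const std::array<Plane, 6>& frustum,
                        const std::vector<Sphere>& objects) {
    int visible = 0;
    for (const Sphere& s : objects) {
        if (isVisible(frustum, s)) ++visible;
    }
    return visible;
}
```

The test costs a handful of multiply-adds per object, which is typically far cheaper than sending the object's triangles to the graphics card.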

Furthermore, optimized instructions and special techniques exist when you're developing against a particular set of hardware, to enable even better speedups.

On the second point, it's common for graphics API examples to fall back to what's called a software renderer when your hardware doesn't support all of the features needed to show a pretty example, like shadows, reflection, ray-tracing, physics, et cetera. This mimics the function of a completely full-featured hardware device which is unlikely to exist, in order to show off all the features of the API. Since the hardware doesn't actually exist, it runs on your CPU instead. That's far less efficient than delegating to a graphics card.

but a demo is unlikely to be optimal about it.
–
µBio Feb 7 '10 at 17:25

2

@tur1ng, the teapot demo, for example, may have enabled reflection, shadows and other effects.
–
Nick Dandoulakis Feb 7 '10 at 17:27

2

The teapot might have more polygons than a GTA4 scene. The fact is, the current bottleneck in graphics rendering is more in texture effects, like techniques derived from bump mapping that add detail, and in other post-rendering effects.
–
Klaim Feb 7 '10 at 17:31

5

Textures - the teapot is being created from a large number of individual triangles all with normals and lighting interactions. What looks like an insanely complex 3d world in the game is often fairly simple large blocks covered with a detailed picture. A lot of the '3d' is clever shadow and perspective artistic effects in a static 2d image drawn on the 3d shape
–
Martin Beckett Feb 7 '10 at 19:02

3D games are great at tricking your eyes. For example, there is a technique called screen space ambient occlusion (SSAO) which will give a more realistic feel by shadowing those parts of a scene that are close to surface discontinuities. If you look at the corners of your wall, you will see they appear slightly darker than the centers in most cases.

The very same effect can be achieved using radiosity, which is based on a rather accurate simulation. Radiosity will also take into account more effects of bouncing light, etc., but it is computationally expensive: it's a global illumination technique.

This is just one example. There are hundreds of algorithms for real-time computer graphics, and they are essentially based on good approximations and typically make a lot of assumptions. For example, spatial sorting must be chosen very carefully depending on the speed, the typical position of the camera, and the amount of change to the scene geometry.

These 'optimizations' are huge - you can implement an algorithm efficiently and make it run 10 times faster, but choosing a smart algorithm that produces a similar result ("cheating") can make you go from O(N^4) to O(log(N)).

Optimizing the actual implementation is what makes games even more efficient, but that is only a linear optimization.

I know that this question is old, but it's surprising that no one has mentioned VSync!!!???

You compared the CPU usage of the game at 60fps to CPU usage of the teapot demo at 60fps.

Isn't it apparent that both run (more or less) at exactly 60fps? That leads to the answer...

Both apps run with vsync enabled! This means (dumbed-down) that the rendering framerate is locked to the "vertical blank interval" of your monitor. The graphics hardware (and/or driver) will only render at max. 60fps. 60fps = 60Hz (Hz=per second) refresh rate. So you probably use a rather old, flickering CRT or a common LCD display. On a CRT running at 100Hz you will probably see framerates of up to 100Hz. VSync also applies in a similar way to LCD displays (they usually have a refresh rate of 60Hz).

So, the teapot demo may actually run much more efficiently! If it uses 30% of CPU time (compared to 50% CPU time for GTA IV), then it probably uses less CPU time each frame, and just waits longer for the next vertical blank interval. To compare both apps, you should disable vsync and measure again (you will measure much higher fps for both apps).
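The reasoning above can be turned into rough arithmetic. This is a back-of-the-envelope sketch under loud assumptions: the reported CPU fraction is treated as genuine work on a single core, the loop is taken to be purely CPU-bound, and the function names are made up for illustration. Under those assumptions, 30% at 60 Hz is about 5 ms of work in a 16.7 ms frame, which would suggest roughly 200 fps with vsync off, against roughly 120 fps for 50%.

```cpp
#include <cassert>

// Time available per frame at a given refresh rate, in milliseconds.
inline double frameBudgetMs(double refreshHz) {
    assert(refreshHz > 0.0);
    return 1000.0 / refreshHz;
}

// Rough estimate of the busy time per frame, assuming the reported CPU
// fraction (0..1) is real work done on one core by the render loop.
inline double busyMsPerFrame(double cpuFraction, double refreshHz) {
    assert(cpuFraction > 0.0 && cpuFraction <= 1.0);
    return cpuFraction * frameBudgetMs(refreshHz);
}

// Frame rate the loop might reach with vsync off, if it were purely CPU-bound.
inline double estimatedUncappedFps(double cpuFraction, double refreshHz) {
    return 1000.0 / busyMsPerFrame(cpuFraction, refreshHz);
}
```

On a multi-core machine the percentage shown by the task manager is spread over all cores, so real numbers will differ; actually disabling vsync and measuring remains the reliable test.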

Sometimes it's OK to disable vsync (most games have an option for it in their settings). Sometimes you will see "tearing artefacts" when vsync is disabled.

I agree, to get a better comparison you should disable vsync. However, the root of the issue still stands. 30% for the teapot vs 50% cpu usage for the game is a smaller difference than one might normally expect. But I remember in the early days when environment mapping just started to become popular and the cool nVidia demo at the time was an environment-mapped teapot spinning around. Those demos usually wouldn't even hit 60 fps back in the day. I guess my point is that these teapot demos often push the boundaries of a new visual effect.
–
Steve Wortham Jan 18 '11 at 19:23

Interesting read. I was aware of the limit of monitor refresh rates, but I didn't know that vsync should be disabled to see how fast things could go! :)
–
Nick Miller Jan 27 at 21:22

Perhaps the best example (certainly one of the best known) is id Software. They realised very early, in the days of Commander Keen (well before 3D), that coming up with a clever way to achieve something [1], even if it relied on modern hardware (in this case an EGA graphics card!), would make a game graphically superior to the competition and so make it stand out. This was true, but they further realised that, rather than having to come up with new games and content themselves, they could licence the technology, thus getting income from others whilst being able to develop the next generation of engine and leapfrog the competition again.

The abilities of these programmers (coupled with business savvy) are what made them rich.

That said it is not necessarily money that motivates such people. It is likely just as much the desire to achieve, to accomplish. The money they earned in the early days simply means that they now have time to devote to what they enjoy. And whilst many have outside interests almost all still program and try to work out ways to do better than the last iteration.

Put simply the person who wrote the teapot demo likely had one or more of the following issues:

less time

fewer resources

less reward incentive

less internal and external competition

lesser goals

less talent

The last may sound harsh [2], but clearly some are better than others; bell curves sometimes have extreme ends, and the people at those ends tend to be attracted to the corresponding extreme ends of what is done with that skill.

The lesser goals one is actually likely to be the main reason. The target of the teapot demo was just that, a demo. But not a demo of the programmer's skill [3]. It would be a demo of one small facet of a (big) OS, in this case DX rendering.

To those viewing the demo it wouldn't matter if it used way more CPU than required, so long as it looked good enough. There would be no incentive to eliminate waste when there would be no beneficiary. In comparison, a game would love to have spare cycles for better AI, better sound, more polygons, more effects.

[1] in that case, smooth scrolling on PC hardware

[2] likely more than me, so we're clear about that

[3] strictly speaking, it would have been a demo to his/her manager too, but again the drive here would be time and/or visual quality.

@stacker: are you implying that all the computations in top-notch 3D games that are not done by the GPU are actually mono-threaded and would, by some chance, fill 100% of the CPU? Meaning that the game's performance would be bound to one non-GPU core? I find that very hard to believe.
–
SyntaxT3rr0r Feb 7 '10 at 18:36

4

It doesn't imply the program is mono-threaded - it just implies that at least one thread is going as fast as it possibly can. Which is reasonable, because why would you want it to go any slower? On the other hand, many games are almost entirely mono-threaded. It's very difficult to write complex simulations in an effective way when multithreading because the typical situation in concurrent/distributed systems of accepting a little more latency to buy a lot more throughput is no good for a game that is supposed to be responsive.
–
Kylotan Feb 8 '10 at 10:24

Sometimes a scene may have more going on than it appears. For example, a rotating teapot with thousands of vertices, environment mapping, bump mapping, and other complex pixel shaders all being rendered simultaneously amounts to a whole lot of processing. A lot of times these teapot demos are simply meant to show off some sort of special effect. They also may not always make the best use of the GPU when absolute performance isn't the goal.

In a game you may see similar effects, but they're usually done in a compromised fashion in an effort to maximize the frame rate. These optimizations extend to everything you see in the game. The issue becomes, "How can we create the most spectacular and realistic scene with the least amount of processing power?" It's what makes game programmers some of the best optimizers around.

Among all the good, qualified answers given, the one that matters is still missing: the CPU utilization counter of Windows is not very reliable. I guess that this simple teapot demo just calls the rendering function in its idle loop, blocking at the buffer swap.

Now the Windows CPU utilization counter just looks at how much CPU time is spent within each process, but not how this CPU time is used. Try adding a

I had a DX teapot demo which always used 25% of my CPU. Turns out it was because I was on a quad core and, to lock the game loop at 60 fps, I had an "Are we there yet" loop which constantly checked the time. I changed it to sleep(timeToNextFrame) and the cpu usage dropped to near 0.
–
Jonathan Pierce Feb 8 '13 at 16:20
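The fix described in the comment above, sleeping until the next frame instead of polling the clock, might look roughly like this in portable C++. It is a sketch, not the commenter's actual code: `runPaced` and its parameters are hypothetical, and `std::this_thread::sleep_until` stands in for whatever sleep call the original demo used.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Runs `frames` iterations of a loop paced at `fps`, sleeping between frames
// instead of spinning on the clock. Returns the total elapsed time.
template <typename FrameFn>
Clock::duration runPaced(int frames, int fps, FrameFn renderFrame) {
    assert(frames >= 0 && fps > 0);
    const Clock::duration frameTime =
        std::chrono::duration_cast<Clock::duration>(std::chrono::seconds(1)) / fps;
    const Clock::time_point start = Clock::now();
    Clock::time_point nextFrame = start;
    for (int i = 0; i < frames; ++i) {
        renderFrame(i);
        nextFrame += frameTime;
        // The thread is descheduled while it waits, so it uses almost no CPU,
        // unlike an "are we there yet" loop that keeps one core fully busy.
        std::this_thread::sleep_until(nextFrame);
    }
    return Clock::now() - start;
}
```

A busy-wait loop on a quad core shows up as about 25% usage because it saturates exactly one of four cores, which matches the figure reported in the comment.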

In addition, there are many many tricks from an artistic standpoint to save computational power. In many games, especially older ones, shadows are precalculated and "baked" right into the textures of the map. Many times, the artists tried to use planes (two triangles) to represent things like trees and special effects when it would look mostly the same. Fog in games is an easy way to avoid rendering far-off objects, and often, games would have multiple resolutions of every object for far, mid, and near views.
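The "multiple resolutions of every object" idea is usually called level of detail (LOD). A minimal sketch of picking a mesh by camera distance follows; the function name `selectLod` and the threshold values in the usage note are assumptions for illustration, not values from any particular game.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Picks a level-of-detail index from the distance to the camera.
// `thresholds` holds ascending switch distances (hypothetical, typically
// tuned by hand); index 0 is the most detailed mesh.
inline std::size_t selectLod(float distance, const std::vector<float>& thresholds) {
    assert(distance >= 0.0f);
    std::size_t lod = 0;
    while (lod < thresholds.size() && distance >= thresholds[lod]) {
        ++lod;
    }
    return lod;  // thresholds.size() means "use the coarsest mesh"
}
```

With thresholds of 10 and 50 units, an object 5 units away would use the near mesh (index 0), one 30 units away the mid mesh (index 1), and one 100 units away the far mesh (index 2).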

The core of any answer should be this -- the transformations that 3D engines perform are mostly specified as additions and multiplications (linear algebra) with no branches or jumps, and the operations for drawing a single frame are often specified in a way that lets many such add-mul jobs be done in parallel. GPU cores are very good at add-muls, and GPUs have dozens or hundreds of add-mul cores.
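To make the add-mul point concrete, here is a CPU-side sketch of the per-vertex work: a 4x4 matrix times a 4-component vector. The names `Vec4`, `Mat4`, `transform` and `transformAll` are invented for this example, and a real GPU would run the equivalent in a vertex shader rather than in a loop like this.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // row-major: m[row][col]

// One vertex transform: 16 multiplications and 12 additions, with no
// data-dependent branches.
inline Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (int r = 0; r < 4; ++r) {
        out[r] = m[r][0] * v[0] + m[r][1] * v[1] + m[r][2] * v[2] + m[r][3] * v[3];
    }
    return out;
}

// Each output depends only on its own input vertex, which is what lets a GPU
// process many vertices at the same time. `out` is assumed pre-allocated,
// much like a GPU buffer would be.
inline void transformAll(const Mat4& m, const std::vector<Vec4>& in,
                         std::vector<Vec4>& out) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = transform(m, in[i]);
    }
}
```

Because no iteration reads another iteration's result, the loop in `transformAll` could be split across any number of cores without synchronization.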

The CPU is left with doing simple stuff -- like AI and other game logic.

How can a great big PC game like GTA IV use 50% of my CPU and run at 60fps while a DX demo of a rotating Teapot @ 60fps uses a whopping 30% ?

While GTA is quite likely to be more efficient than the DX demo, measuring CPU efficiency this way is essentially broken. Efficiency could be defined, e.g., by how much work you do in a given time. A simple counterexample: spawn one thread per logical CPU and let a simple infinite loop run on each. You will get CPU usage of 100%, but it is not efficient, as no useful work is done.

This also leads to an answer: how can a game be efficient? When programming "great big games", a huge effort is dedicated to optimizing the game in all aspects (which nowadays usually also includes multi-core optimizations). As for the DX demo, its point is not running fast, but rather demonstrating concepts.

Look at the answer on vsync; that is why they are running at same frame rate.

Secondly, CPU usage is misleading in a game. A simplified explanation is that the main game loop is just an infinite loop:

while (1) {
    update();  /* advance the game state */
    render();  /* draw the frame */
}

Even if your game (or in this case, teapot) isn't doing much, you are still eating up CPU in your loop.

The 50% CPU in GTA is "more productive" than the 30% in the demo, since more than likely the demo is not doing much at all, while GTA is updating tons of details. Even adding a "Sleep (10)" to the demo will probably drop its CPU usage by a ton.

Lastly, look at GPU usage. The demo is probably taking <1% on a modern video card, while GTA will probably be taking the majority of it during gameplay.

From what I know of the Unreal series, some conventions, like encapsulation, are broken. Code is compiled to bytecode or directly into machine code depending on the game. Also, objects are rendered and packaged in the form of meshes, and things such as textures, lighting and shadows are precalculated, whereas a pure 3D animation has to do this in real time. When the game is actually running there are also some optimizations, such as rendering only the visible parts of an object and displaying texture detail only when close up. Finally, it's probable that video games are designed to get the best out of a platform at a given time (e.g. Intel x86 MMX/SSE, DirectX, ...).

I think there is an important part of the answer missing here. Most of the answers tell you to "Know your data". The fact is that you must, in the same way and with the same degree of importance, also know your:

CPU (clock and caches)

Memory (frequency and latency)

Hard drive (in terms of speed and seek times)

GPU (#cores, clock and its Memory/Caches)

Interfaces: Sata controllers, PCI revisions, etc.

BUT, on top of that, with current computers, you would never be able to play a real 1080p video at >>30 fps (a single 1080p image at 64 bits per pixel would take roughly 16 MB). The reason is the sampling/precision. A video game would never use double precision (64 bits) for pixels, images, data, etc., but rather a lower custom precision (~4-8 bits), and sometimes even less precision rescaled with interpolation techniques, to allow reasonable computation time.
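The frame-size figure is easy to check with a few lines of arithmetic. This sketch assumes "1080p" means 1920x1080 and that "64 bits" means 8 bytes stored per pixel with no compression; the function names are made up for the example. By that count one frame comes to 16,588,800 bytes, and 30 such frames per second to 497,664,000 bytes per second, while a 32-bit-per-pixel frame needs half as much.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for one uncompressed frame at the given bits per pixel.
inline std::uint64_t frameBytes(std::uint64_t width, std::uint64_t height,
                                std::uint64_t bitsPerPixel) {
    assert(bitsPerPixel % 8 == 0);  // keep the sketch to whole bytes
    return width * height * (bitsPerPixel / 8);
}

// Bytes per second needed to stream such frames at a given frame rate.
inline std::uint64_t bytesPerSecond(std::uint64_t bytesPerFrame, std::uint64_t fps) {
    return bytesPerFrame * fps;
}
```

The point of the comparison is that precision is a direct multiplier on memory traffic, so halving the bits per pixel halves the data moved every frame.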

There are other techniques as well, such as clipping the data (both with the OpenGL standard and with software implementations), data compression, etc. Keep in mind also that current GPUs can be >300 times faster than current CPUs in terms of hardware capability. However, a good programmer may get only a 10-20x factor, unless the problem is fully optimized and completely parallelizable (particularly task parallelizable).

From experience, I can tell you that optimization is like an exponential curve. To reach optimal performance, the time required may be enormous.

So, to get back to the teapot, you should look at how its geometry is represented and sampled, and with what precision, versus what you see in GTA 5 in terms of geometry/textures and, most importantly, the details (precision, sampling, etc.).