4 Answers

Scene 2 is cheaper. That's why bump mapping, normal mapping, parallax mapping, relief mapping, etc. were invented in the first place: you would need a huge number of vertices to reach the same level of detail. Note, however, that with the advent of hardware tessellation, that too has recently become possible, though you still don't store the hugely detailed base mesh but generate it procedurally, e.g. from textures.

@Anon why accept this? It'd be nice if somebody explained this from a technical point of view instead of saying "if rendering large amounts of vertices were efficient then we wouldn't have normal mapping", because you've asked a good question and this answer is hardly more than stating the obvious
–
dreta Sep 23 '12 at 14:57

This answer is generally true but specifically wrong. You can basically make the shaders and/or the vertex count as 'heavy' as you want them to be. The only limit is the CPU.
–
zehelvion Sep 24 '12 at 19:41

This works because texture mapping is a form of both precalculation and compression. The idea is that rather than having to store and continually recalculate all of the info needed to derive the end result per-vertex at a very fine level of tessellation, you instead encode it into a texture map (and at the most basic level, if you think about it, that's all that a texture map actually is) and do a fast lookup.
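To make that precalculation idea concrete, here's a minimal sketch (the function and table size are hypothetical, not from any real renderer): an expensive per-fragment computation is "baked" into a lookup table once, and rendering then does a cheap lookup instead of recomputing it.

```python
import math

def expensive_lighting(u):
    # stand-in for a costly per-fragment computation
    return 0.5 + 0.5 * math.sin(10 * u)

# "Bake" step: evaluate the expensive function once per texel.
TEX_SIZE = 256
baked = [expensive_lighting(i / (TEX_SIZE - 1)) for i in range(TEX_SIZE)]

def sample(u):
    # Fast nearest-neighbour lookup replaces the recomputation;
    # real hardware would also filter between neighbouring texels.
    i = min(int(u * TEX_SIZE), TEX_SIZE - 1)
    return baked[i]
```

The baked table is also where the compression angle comes in: the texture stores the end result at a fixed resolution rather than the inputs needed to derive it per-vertex.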

I think it strongly depends on the scene being rendered and the shaders used. It's usually better to use low-poly geometry and add as much detail as possible via textures and pixel-shader operations. But you have to keep two things in mind:

The GPU has to process all of the source geometry (at least to the point where the position in view space is determined; after that, culling optimizations can be applied and geometry that won't be visible is omitted from further rendering). If you do not generate geometry on the GPU (using geometry shaders or tessellation shaders), these optimizations can be very significant, but if the source geometry is huge, this stage can be the bottleneck.

Per-pixel operations depend on the render-target resolution, if we assume each pixel is written only once. But this is usually not true, because pixels get rewritten whenever newly rendered geometry is closer to the camera than the scene rendered so far (depth test). This per-pixel performance is often called pixel fill rate and can be a bottleneck as well.

For example: particle engines are usually limited on the vertex-count side, while GPU raytracing apps are limited by pixel-shader operations.
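As a rough illustration of those two budgets, here's a back-of-the-envelope sketch (all counts and per-unit costs are made up) comparing where the work lands for the two example workloads:

```python
def vertex_work(num_vertices, cost_per_vertex=1.0):
    # every source vertex is transformed at least once
    return num_vertices * cost_per_vertex

def pixel_work(width, height, overdraw=1.0, cost_per_pixel=1.0):
    # overdraw > 1 models pixels shaded more than once before
    # the depth test settles on the closest surface
    return width * height * overdraw * cost_per_pixel

# A particle system: huge vertex count, cheap pixels.
particles = vertex_work(5_000_000)
# A full-screen raytracing pass: trivial geometry, expensive pixels.
raytrace = pixel_work(1920, 1080, cost_per_pixel=50)
```

Which term dominates tells you which side of the pipeline to budget for; in the particle case the vertex term dwarfs the pixel term, and in the raytracing case it's the reverse.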

GPUs incur a large amount of overhead when triangles are too small on-screen. There are two main reasons:

There are fixed costs per triangle, associated with setting it up to be rasterized and pixel-shaded. For example, each triangle must be clipped to the viewport, edge equations for rasterization must be calculated, and plane equations for attribute interpolation must be calculated. It is possible for this to become a bottleneck if there are very many triangles.

GPUs shade many pixels in parallel. Current GPUs work on groups of 32-64 pixels, and most likely they must all come from the same triangle; pixels from different triangles can't currently be merged into one work-group. Therefore, triangles smaller than this waste hardware resources that could otherwise be doing useful work.
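A quick way to see the waste: assuming a work-group of 32 lanes that must all come from one triangle (a simplification of the 32-64 range above), utilization drops sharply for tiny triangles.

```python
import math

WARP = 32  # assumed work-group size; real hardware varies

def lane_utilization(pixels_per_triangle):
    # each triangle occupies whole work-groups; lanes not covered
    # by the triangle sit idle
    groups = math.ceil(pixels_per_triangle / WARP)
    return pixels_per_triangle / (groups * WARP)

# A triangle covering 4 pixels uses 4 of 32 lanes: 12.5% utilization.
```

So a mesh tessellated down to a few pixels per triangle can leave the vast majority of the shading hardware idle, independent of how cheap the shader itself is.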

See this article for some more details (the rest of that series is also a must-read if you wish to understand how GPUs work).

The takeaway is that very highly tessellated meshes, with each triangle covering only a few pixels, are likely to be very slow to draw and do not use the GPU's hardware efficiently. Therefore, assuming that you want to render an image that contains pixel-level detail, the better way is to use a texture.

Texture samples are a lot of work too. However, there are some compensations:

Mipmapping and the texture cache help you a lot by amortizing the slowest part: fetching the texel data from memory and decompressing it. Usually, nearby pixels will sample nearby parts of the texture, so your cache hit ratio is going to be high.

The texture sampling hardware is designed to have many, many requests in flight at once. Texture samples can still be a bottleneck, but it's harder to bottleneck this than the triangle setup unit.

When all of the pixels in a GPU work-group are waiting for texture samples, the GPU can suspend that work-group (like a thread sleeping) and do something else with that hardware until the texture samples come back.
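To illustrate the mipmapping point above: the standard isotropic mip-selection rule picks LOD = log2 of the texel footprint of one screen pixel, so minified surfaces sample a smaller mip level, which is exactly what keeps nearby pixels hitting the same cache lines. A sketch:

```python
import math

def mip_level(texels_per_pixel):
    # texels_per_pixel: linear texel footprint of one screen pixel;
    # footprints below one texel clamp to the base level (0)
    return max(0.0, math.log2(texels_per_pixel))

# A surface where each pixel spans 4 texels samples mip level 2,
# an image 16x smaller in area, so neighbouring pixels fetch
# neighbouring (often already-cached) texels.
```

This is the idealized rule; real hardware also handles anisotropy and blends between the two nearest mip levels.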
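And a toy model of that latency hiding (cycle counts are made up): with enough resident work-groups, fetch latency overlaps with other groups' ALU work, so only one fetch latency stays exposed instead of one per group.

```python
def frame_time(groups, alu_cycles, fetch_cycles):
    # no latency hiding: every group pays its fetch latency in full
    serial = groups * (alu_cycles + fetch_cycles)
    # ideal hiding: while one group waits on its fetch, the others
    # run ALU work; only the final fetch remains exposed
    overlapped = groups * alu_cycles + fetch_cycles
    return serial, overlapped
```

With 8 groups, 10 ALU cycles each, and a 100-cycle fetch, the serial model costs 880 cycles while the overlapped one costs 180, which is why high occupancy makes texture latency so much more forgiving than it looks on paper.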