The big problem here is the "shadow volumes" - that's geometry that may be as complex as the scene, but isn't the scene, and changes as the sun moves. We can get clever and try to write a geometry shader, but in the old days you had to build your shadow volumes on the CPU. Ouch.

Another problem with stenciled shadow volumes is that "fake" geometry shaped by transparency is ignored - a tree drawn as an alpha-textured quad casts the shadow of the whole quad, not the shape of the tree.

Shadow Maps

Shadow maps bake distance calculations into a texture, from the light source's point of view. But we have two ways to code this, both pretty painful:

1. For each light source:
2.    Draw the scene (depth only) from the light's perspective
3. For the scene:
4.    Draw the scene with a shader that knows about the shadow maps

Of course, while that looks nice and fast, that single shader must perform one shadow calculation per light source, for every pixel.

An alternative is to build the scene in passes:

1. For each light source:
2.    Draw the scene (depth only) from the light's perspective
3.    Draw the scene with a shader to accumulate that light's contribution

The first option uses a huge shader and a lot of VRAM; the second one hits the scene graph a lot. The second case might be a win if we can use our scene graph culling code to draw only the part of the scene that could be affected by the light (assuming the culling works that well).
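The cost difference between the two strategies can be sketched as pass lists in Python - the "draw" steps here are hypothetical stand-ins for real GPU work, and we only count passes over the scene graph:

```python
# Sketch of the two shadow-map strategies. Each tuple is one pass over the
# scene graph; nothing here is a real GL call, we just count the work.

def one_pass(lights):
    """One big shader: N depth passes, then a single lit pass."""
    passes = [("depth_only", light) for light in lights]  # one shadow map per light
    passes.append(("big_shader", lights))                 # one pass, N shadow lookups per pixel
    return passes

def multi_pass(lights):
    """Accumulation: the scene is drawn twice per light."""
    passes = []
    for light in lights:
        passes.append(("depth_only", light))   # shadow map for this light
        passes.append(("accumulate", light))   # full scene pass adding this light's contribution
    return passes

lights = ["sun", "lamp1", "lamp2"]
print(len(one_pass(lights)))    # 4 passes: 3 depth + 1 lit
print(len(multi_pass(lights)))  # 6 passes: 2 per light
```

The pass counts make the trade-off concrete: the one-pass version touches the scene graph fewer times but concentrates all shadow lookups (and all shadow-map VRAM) in one shader, while the multi-pass version keeps the shader simple at the cost of re-traversing the scene per light.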

It should be noted that for one directional light (e.g. the sun) the shadow map case is great: we need one texture, one extra pass, and we automatically get shadows on everything. Transparent geometry is not shadowed, and we don't have to ask "how do I make a stencil volume" for everything that might show up in the scene graph. Render time for the shadow pass can be very fast because we can skip texturing and color write-out.

The Problem With Shadow Maps

The problem with shadow maps is shadow resolution. You lose resolution in three dimensions: the width/height of the shadow map (which shows up as big blocky pixels in the shadow) and "depth" - if the precision of the stored depth values isn't good enough, "close calls" between the occluder and the occludee are not resolved correctly due to rounding errors.

Depth precision is less of a factor now that modern hardware supports formats like R32F (a single 32-bit floating point channel) but horizontal resolution is limited by texture size; even a 2048x2048 texture doesn't provide much detail over more than about 500 meters of "world". For a flight simulator, that's not enough.
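A toy Python sketch of the depth-precision problem - `quantize` and `in_shadow` are illustrative names, not a real API:

```python
# Toy shadow-map depth compare. quantize() rounds a depth value to a given
# bit depth, as a low-precision depth texture would; in_shadow() is the
# standard "is the surface farther than what the light saw?" test.

def quantize(depth, bits):
    levels = (1 << bits) - 1
    return round(depth * levels) / levels

def in_shadow(surface_depth, map_depth, bias=0.0):
    return surface_depth > map_depth + bias

# An occluder at depth 0.500000 and a receiver just behind it at 0.500004.
occluder, receiver = 0.500000, 0.500004

# With full float precision the close call is resolved correctly:
print(in_shadow(receiver, occluder))  # True - receiver is shadowed

# After quantizing to 16 bits, both depths round to the same value and the
# shadow is lost:
print(in_shadow(quantize(receiver, 16), quantize(occluder, 16)))  # False
```

This is exactly the "close call" failure: the two surfaces are distinct in the scene but identical in the shadow map, so the comparison cannot tell them apart. An R32F depth channel pushes the failure distance far below anything visible.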

To make matters worse, the shadow texture's position is based on the light source, not the viewer's camera. This means that often the most detailed part of the shadow map is somewhere useless (like far away), while the ugly crude part is right in front of the user. There are a number of algorithms, like trapezoidal shadow maps (TSM) that try to reshape the shadow map to use the pixels better, but if your scene graph is huge, this isn't an improvement of the right magnitude.

Splitting the Lights

One way to solve the resolution problem is to break the light source into multiple "lights", each shadowing a different part of the scene. (Think of a directional light as really bounded by a cube, because this is the area the shadow map casts a shadow for. We are making many small cubes that cover the volume of the original light cube, which had to be big enough to cover the whole scene.)

This approach gives us a potentially arbitrary improvement in resolution, but it puts us in the "multiple light" scenario - two passes over (at least part of) the scene graph per light. If you have efficient culling there might be a win, but shadows are cast over distance, so a small light volume can end up shadowing a large area at some light-source angles.
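The cube-splitting bookkeeping is simple; here is a minimal Python sketch (axis-aligned in light space, with illustrative numbers):

```python
# Split one directional light's bounding footprint into an n x n grid of
# smaller "lights", each of which gets its own full-resolution shadow map.

def split_light_cube(min_xy, max_xy, n):
    """Return n*n sub-cube ((min), (max)) footprints covering the light's extent."""
    (x0, y0), (x1, y1) = min_xy, max_xy
    sx, sy = (x1 - x0) / n, (y1 - y0) / n
    return [((x0 + i * sx, y0 + j * sy), (x0 + (i + 1) * sx, y0 + (j + 1) * sy))
            for i in range(n) for j in range(n)]

# A 4 km x 4 km sun footprint split 4x4: each 2048x2048 map now covers
# 1000 m instead of 4000 m - 4x the linear shadow resolution.
subs = split_light_cube((0.0, 0.0), (4000.0, 4000.0), 4)
print(len(subs))   # 16 sub-lights
print(subs[0])     # ((0.0, 0.0), (1000.0, 1000.0))
```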

Stenciled Shadow Maps With GBuffers

I'm not sure if anyone has done this before - for all I know it is a known production technique and I just haven't found the paper. The technique I describe here uses a G-Buffer to save on iterations over the scene graph, and the stencil buffer to save on shadow map memory. If you already have a G-Buffer for deferred shading, this works well.

What Is a G-Buffer

A G-Buffer is a series of frame buffers which encode everything you could want to know about a given pixel. Typically for a given pixel X, at a minimum you would know the XYZ position in eye space, the normal vector, and the unlit ("raw") color of the source point on the source geometry that filled pixel X.

In other words, a G-Buffer is sort of a spatial index, giving you constant time access to your scene graph, given screen-coordinate X & Y as an input. With deferred shading, you can now create light effects - for a given screen-space pixel, you know what a given light would have done to that pixel, because you still have its normal, position, original color, etc.
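The "spatial index" idea can be modeled directly - a minimal Python sketch with an illustrative layout (field names and channel choices are assumptions, not a fixed format):

```python
# A G-Buffer modeled as parallel per-pixel arrays: given screen (x, y) you get
# constant-time access to eye-space position, normal, and unlit albedo.

W, H = 4, 3  # tiny "screen" for illustration

def make_gbuffer(w, h):
    n = w * h
    return {
        "position": [(0.0, 0.0, 0.0)] * n,  # eye-space XYZ
        "normal":   [(0.0, 0.0, 1.0)] * n,  # surface normal
        "albedo":   [(0.0, 0.0, 0.0)] * n,  # raw, unlit source color
    }

def write_pixel(gb, x, y, pos, nrm, alb):
    i = y * W + x
    gb["position"][i], gb["normal"][i], gb["albedo"][i] = pos, nrm, alb

def read_pixel(gb, x, y):
    i = y * W + x
    return gb["position"][i], gb["normal"][i], gb["albedo"][i]

gb = make_gbuffer(W, H)
write_pixel(gb, 2, 1, (1.0, 2.0, -5.0), (0.0, 1.0, 0.0), (0.8, 0.8, 0.8))
print(read_pixel(gb, 2, 1))  # everything a lighting pass needs, by screen coordinate
```

A deferred lighting pass is then just a function of these three lookups per pixel - no scene-graph traversal required.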

G-Buffering is a win for lighting because we can radically reduce the number of pixels for which per-pixel lighting is applied.

Because we will apply lights later in screen space (by reading the G-Buffer to get the normals, positions, etc. of the pixels involved) there is never any overdraw. If your scene was created by drawing the screen 4x over, the G-Buffer contains only the final result - the pixels that showed up. So with 4x overdraw, G-Buffering represents a 4x reduction in lighting calculations.
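The arithmetic behind that claim, with illustrative numbers:

```python
# Lighting cost, forward vs. deferred, at 1080p with 4x overdraw and 8 lights.

pixels = 1920 * 1080
overdraw = 4
lights = 8

forward  = pixels * overdraw * lights  # every rasterized fragment lights every light
deferred = pixels * lights             # one lighting calc per *visible* pixel per light

print(forward // deferred)  # 4 - the saving is exactly the overdraw factor
```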

If a light has limited scope (e.g. it is directional or attenuated), you only have to apply the lighting calculation to the small set of pixels that it could affect. With traditional lights, every pixel in the scene must consider every light (or you must turn the lights off and on very carefully). With G-Buffering, the "second-pass" of accumulating lighting can be done for small screen areas by drawing a trivial bounding volume for the light. This means that small lights are much, much cheaper in terms of pixel fill rate.
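The fill-rate win for a small light can be estimated by comparing a full-screen pass against the light's screen-space bounding rectangle - a rough Python sketch (the rectangle projection is simplified; a real implementation would project the light's bounding volume):

```python
# Pixels shaded for a small light when we rasterize only its screen-space
# bounding rectangle, clamped to the screen.

def light_rect_pixels(cx, cy, radius_px, screen_w, screen_h):
    x0, x1 = max(0, cx - radius_px), min(screen_w, cx + radius_px)
    y0, y1 = max(0, cy - radius_px), min(screen_h, cy + radius_px)
    return max(0, x1 - x0) * max(0, y1 - y0)

screen = 1920 * 1080
small_light = light_rect_pixels(960, 540, 50, 1920, 1080)
print(small_light)            # 10000 pixels for a 100x100 on-screen light
print(screen // small_light)  # ~207x fewer shaded pixels than a full-screen pass
```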

Be warned that G-Buffering also has three weaknesses that you have to eat just to get started:

To draw the scene itself, you have to fill a huge amount of data - perhaps 12 to 16 floating point values per pixel. That's a much higher fill rate to the framebuffer than you'd have with conventional texturing (or even HDR). So you'll pay a price just to have a G-Buffer. You only win if you save enough lighting calculations later to justify the G-Buffer now.

FSAA is not an option; the G-Buffer needs exact values from precisely one scene graph element for each pixel. So you may need to render at a higher res (ouch) or apply post-processing to smooth out geometry. (Post processing is recommended in GPU Gems 2.)

Alpha blending works, um, badly. Basically you only get the final values of the front-most pixel. So you will have to compromise and accept a hack... for example, the front-most pixel's lighting calculation applies to the blend of the texture colors of all contributing pixels. If this hack ruins your content, your only alternative is to fully render the scene before blending, then do a second pass - at that point the G-Buffer is probably not a win.

G-Buffering and Shadow Maps

G-Buffering works well with shadow maps; you can "project" a shadow map onto a scene using only the contents of the G-Buffer, not the original scene itself. (To project the shadow you really only need object position, which is in the G-Buffer.)

But what about shadow map resolution? We can use the stencil buffer to accumulate the shape of the shadow cast by each shadowing element in the scene graph, and then bake the light. The full rendering loop would look like this:

1. Draw the entire scene to the G-Buffer
2. For each light source:
3.    Clear the stencil buffer
4.    For each shadow-casting object (for this source):
5.       Draw the shadow map for this object
6.       Project the shadow map into the stencil buffer using the G-Buffer
7.    Accumulate the contribution of this light, using the stencil buffer
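The seven steps above as a Python skeleton - every pass name is a hypothetical stand-in for real GPU work, and we record the pass list so the structure is visible:

```python
# Skeleton of the stenciled-shadow-maps-with-G-Buffer loop. casters_for is a
# hypothetical callback returning the shadow casters relevant to one light.

def render(scene_objects, lights, casters_for):
    passes = [("gbuffer", "entire scene")]              # step 1
    for light in lights:                                # step 2
        passes.append(("clear_stencil", light))         # step 3
        for obj in casters_for(light):                  # step 4
            passes.append(("shadow_map", obj))          # step 5
            passes.append(("project_to_stencil", obj))  # step 6
        passes.append(("accumulate_light", light))      # step 7
    return passes

p = render(["terrain", "house"], ["sun"], lambda lt: ["house"])
print([name for name, _ in p])
# ['gbuffer', 'clear_stencil', 'shadow_map', 'project_to_stencil', 'accumulate_light']
```

Note that the scene graph is touched once (step 1) plus once per shadow caster (step 5); steps 6 and 7 are pure screen-space work driven by the G-Buffer.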

Note that for the sun-light case, our time complexity is relatively similar to the original shadow-map algorithm: step 5 in the worst case involves drawing a shadow map for the entire scene. (I'm waving my hands here - we'll be filling a lot more pixels, because we've got a full-sized shadow map for each individual element. It's the equivalent of being allowed to have a truly enormous shadow map.)

But what's interesting about this is that the cost of these steps isn't necessarily as high as you'd think in terms of fill rate.

Step 5 (drawing the shadow map for one object) can use as many or as few pixels as we want -- we can choose our shadow map size based on the position of the object in eye space! That is, we can finally spend our shadow fill rate budget on things that are close to us. Since each object gets its own shadow map, our max resolution can be relatively high.
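A simple way to spend the budget on nearby objects is to halve the map size as eye-space distance doubles - a Python sketch with illustrative thresholds:

```python
# Pick a per-object shadow map size from eye-space distance: full resolution
# up close, halved per rough doubling of distance, clamped to a minimum.

def shadow_map_size(eye_distance, base=1024, min_size=64):
    size = base
    d = eye_distance
    while d > 100.0 and size > min_size:
        size //= 2
        d /= 2.0
    return size

print(shadow_map_size(50.0))     # 1024 - close object, full budget
print(shadow_map_size(400.0))    # 256
print(shadow_map_size(10000.0))  # 64  - clamped to the minimum
```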

Step 6 requires us to draw some kind of "shadow casting volume" onto the screen - that is, to cover the pixels that could be shadowed by a given occluder. This has the potential to be a lot less than the entire screen, if we can draw relatively sane volumes. (But the only penalty for a cruder, larger volume is a few more shader ops.)

Step 7 requires us to burn in the light, which (as with all G-Buffering) only requires touching the pixels that the light could illuminate; so we get a win for lights with limited range or tight focuses.

This algorithm has some nice scalability properties too:

Because we know where in the final scene's eye space we are working, we can tune quality on a per-shadow basis to spend our computation budget where we need it. (E.g. turn off PCF for far-away shadows, use smaller maps, don't shadow at all.)

We can even make content-level decisions about shadows (e.g. shadow only buildings).

The algorithm trades time for quality. Traditional shadow maps are limited by maximum texture size, which is a painful way to scale. This lets us simply trade a faster CPU/GPU combo for better results, with no maximum limit.
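The per-shadow tuning boils down to picking a technique tier per occluder - a sketch with hypothetical tier names and cutoffs:

```python
# Per-shadow quality tiers keyed off eye-space distance. The names and
# distances are illustrative; the point is that each occluder gets its own
# quality decision instead of one global shadow-map setting.

def shadow_tier(eye_distance):
    if eye_distance < 100.0:
        return "pcf"    # soft-filtered shadow map, big texture
    if eye_distance < 1000.0:
        return "hard"   # plain depth compare, smaller map
    if eye_distance < 5000.0:
        return "blob"   # cheap dark decal, no map at all
    return "none"       # too far away to matter

print([shadow_tier(d) for d in (50, 500, 2000, 9000)])
# ['pcf', 'hard', 'blob', 'none']
```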

Future Parallelization

If we ever get GPUs that can dispatch multiple command streams, the inner loop (steps 4-6) could be parallelized. Essentially you'd have a few shadow-map textures, and you'd pipeline step 5 across multiple cores, feeding the results to step 6. So a few GPUs would be prepping shadow maps, while one would be baking them into the stencil buffer. This would let us scale up the number of shadow-casting elements.