A while ago I started looking into DirectX Raytracing to generate shadows for area lights of different shapes. Since I didn’t have access to a graphics card with native raytracing support, I used the fallback layer, which emulates raytracing with compute shaders on devices without hardware raytracing support. While I got visually pretty good results, the performance was not good enough to be feasible for large-scale games. Therefore I was thinking of a fallback solution that is fast enough, yet produces results that are closer to the ground truth than a conventional PCSS (Percentage Closer Soft Shadows) approach.

In my research I stumbled upon the paper “Real-Time, All-Frequency Shadows in Dynamic Scenes” (by T. Annen et al.). Basically, first an extended Convolution Shadow Map is generated, using either a mip-map chain or a summed-area table. With the help of this structure it is possible to determine an average occluder depth. Then a frustum, formed by the point to shade and the area of the light source, is intersected with a virtual plane at the determined average occluder depth in order to look up the amount of shadowing via the extended Convolution Shadow Map. So I was thinking of replacing the Convolution Shadow Map part with a stochastic sampling approach followed by a denoising step, similar to how real-time ray-traced shadows are usually done. In this way it is possible to get rid of many disadvantages of Convolution Shadow Mapping (e.g. trigonometric basis functions, high texture memory requirements) and to support arbitrarily shaped and oriented area lights, not only rectangular lights that are aligned with the viewport of the shadow map.

The basic steps of the idea are:

Generate shadow map:
In my demo I used a regular spot light shadow map, but it would also be possible to use omni-directional shadow maps.

Generate shadow mask:
In this step, first the average occluder depth is determined in the spirit of PCSS. We sample the shadow map per pixel with 4 samples randomly distributed over the area light shape, which change per screen-pixel location and over time, plus 1 additional sample located in the center of the area light. It turned out that this gives almost the same visual results as using 256 randomly distributed samples per pixel. The resulting average occluder depth is used to form a virtual occluder plane, with which 4 rays from the pixel to shade to the randomly distributed light samples are intersected. At each intersection a shadow map comparison is performed, and the average shadowing value is written into the shadow mask.
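The two sub-steps can be sketched as follows. This is a minimal illustration, not the demo’s shader code: I assume light-space coordinates where the light looks down +z, “depth” is simply z, and `shadow_map` is a hypothetical callable `(x, y) -> closest stored depth`.

```python
import numpy as np

def average_occluder_depth(receiver_z, sample_depths):
    """Blocker search in the spirit of PCSS: average only the
    shadow-map samples that lie between the light and the receiver."""
    blockers = [z for z in sample_depths if z < receiver_z]
    return sum(blockers) / len(blockers) if blockers else None

def shadow_factor(p, light_samples, avg_z, shadow_map, bias=1e-3):
    """Intersect each ray from the shaded point p to a light sample
    with the virtual occluder plane z = avg_z, perform a shadow-map
    comparison at the intersection, and average the visibility."""
    visible = 0.0
    for s in light_samples:
        d = s - p
        t = (avg_z - p[2]) / d[2]   # ray/plane intersection parameter
        hit = p + t * d             # point on the virtual occluder plane
        if shadow_map(hit[0], hit[1]) + bias >= avg_z:
            visible += 1.0
    return visible / len(light_samples)
```

In the real pass the 4 light samples would be the same stochastic samples used for the blocker search, and the result is written into the shadow mask texture.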

Denoise shadow mask:
To eliminate the noise that stems from changing the light samples per screen-pixel location and over time, first 2 bilateral blur passes are performed, followed by a temporal pass similar to the one used for temporal anti-aliasing.
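The idea behind both passes can be sketched in a toy 1D form, under assumptions of mine (Gaussian depth weighting, a simple history clamp): a depth-aware bilateral blur that avoids bleeding shadow values across geometry edges, followed by a TAA-style exponential moving average.

```python
import math

def bilateral_blur(values, depths, center, sigma_d=0.1):
    """Weight each neighbor by its depth similarity to the center
    pixel, so shadow values do not leak across depth discontinuities."""
    weights = [math.exp(-((d - depths[center]) ** 2) / (2.0 * sigma_d ** 2))
               for d in depths]
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def temporal_accumulate(history, current, alpha=0.1, lo=0.0, hi=1.0):
    """Blend the reprojected history toward the current frame; the
    history sample is clamped to a plausible range to limit ghosting."""
    history = min(max(history, lo), hi)
    return history + alpha * (current - history)
```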

A reference raytracing-based approach generates the shadow mask by tracing 1 ray per pixel to the same stochastic light samples and applies the same denoising step as described above. In the comparison screenshots from my demo below, the images on the left were generated via the described shadow map-based approach and the images on the right via the reference raytracing-based approach. The top row shows a 50x50cm rectangular area light, the middle row a 100x10cm rectangular area light with a 45-degree rotation, and the bottom row a 20x100x50cm cuboid area light. Please note that the demo uses the Crytek Sponza scene and the dwarf model from the Microsoft DirectX SDK as dynamic mesh.

From the screenshots it is visible that the shape and orientation of the area light source has a significant impact on the shape of the shadows. Obviously the shadow map-based approach doesn't produce the exact same results as the raytracing-based approach, but manages to maintain the overall shape of the shadows. For 2D light shapes like rectangles or ellipses one could try to manually skew the PCF (Percentage Closer Filtering) sample kernel used for conventional PCSS, but this becomes impractical for 3D light shapes like ellipsoids or cuboids.

For the lighting part I used LTC (Linearly Transformed Cosines) as presented by Stephen Hill and Eric Heitz. Since my demo supports rectangle, ellipse, ellipsoid and cuboid area light sources, I added an LTC implementation for cuboid lights. This is a pretty straightforward extension of rectangular area lights, which integrate over the edges of a rectangle according to Stokes’ theorem. So all we need to do is determine the silhouette edges of the box as seen from the point to shade, ensure a consistent counterclockwise edge winding and integrate over these edges. A full-screen deferred lighting pass on an Nvidia GeForce GTX 1070 at 1080p takes ca. 0.89 ms for a cuboid LTC light, which is slower than a rectangular LTC light at ca. 0.32 ms, but still feasible for real-time applications. In the screenshot below I removed normal mapping and set the material roughness to a low value to better illustrate the box-shaped specular highlight of the cuboid LTC light.
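At the core of this sits the standard polygonal edge integral used by LTC lights; once the silhouette edges of the cuboid are found, they are fed through the same integration as a rectangle’s four edges. Below is a sketch of just that integral, with vectors assumed to be in the shading frame (+z = surface normal). The normalization is relative (a polygon covering the whole upper hemisphere evaluates to 1), and horizon clipping as well as the LTC matrix transform itself are omitted for brevity.

```python
import numpy as np

def integrate_edge(v1, v2):
    """Vector contribution of one edge of the spherical polygon."""
    v1 = v1 / np.linalg.norm(v1)
    v2 = v2 / np.linalg.norm(v2)
    theta = np.arccos(np.clip(np.dot(v1, v2), -1.0, 1.0))
    cross = np.cross(v1, v2)
    n = np.linalg.norm(cross)
    return cross * (theta / n) if n > 1e-8 else np.zeros(3)

def polygon_irradiance(verts):
    """Sum the edge integrals of a counterclockwise-wound polygon
    (e.g. the silhouette of a cuboid light) and project onto +z."""
    total = np.zeros(3)
    for i in range(len(verts)):
        total += integrate_edge(verts[i], verts[(i + 1) % len(verts)])
    return max(total[2], 0.0) / (2.0 * np.pi)
```

Note how a consistent counterclockwise winding matters: reversing the vertex order flips the sign of the summed vector, which is why the silhouette extraction for the box has to produce consistently ordered edges.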

Finally I combined the stochastically generated shadow mask with the analytically computed lighting according to “Combining Analytic Direct Illumination and Stochastic Shadows” (by E. Heitz, S. Hill and M. McGuire). In the comparison below, the left screenshot used the shadow map-based approximation, while the right screenshot used the raytracing-based approach.
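The core of that paper’s ratio estimator is tiny: the noisy shadow information only modulates the noise-free analytic lighting, so fully lit and fully shadowed regions stay free of residual noise. A sketch, where the clamp and the epsilon guard are stabilizing assumptions of mine rather than part of the paper:

```python
def combine(analytic, denoised_shadowed, denoised_unshadowed, eps=1e-4):
    """Final = analytic * (shadowed / unshadowed), with both
    stochastic terms taken after denoising."""
    ratio = denoised_shadowed / max(denoised_unshadowed, eps)
    return analytic * min(max(ratio, 0.0), 1.0)
```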

Performance-wise, I got the following results for generating the denoised shadow mask for a 50x50cm rectangular area light source on an Nvidia GeForce GTX 1070 at 1080p: