Clustered Shading is a technique for efficient lighting on modern GPUs, first proposed by Olsson et al., and I have been evangelizing it for the last few years. My contribution is an adaptation for practical use in a real AAA game engine. While I have provided slides and given several conference presentations on the subject, the game using this technique has not yet been released as of this writing, and I haven't published any running samples of it before. This demo attempts to fill that gap.

The main motivations for Clustered Shading are performance, flexibility, and simplicity. It normally outperforms competing techniques, such as tiled shading, particularly in worst-case performance, which is what matters most. It's compatible with both deferred and forward shading, making it a unified lighting solution that doesn't interfere with any other technical choices.

This demo differs in a few ways from the implementation we are using in the Avalanche Engine, but the underlying principle remains the same. In this demo, forward shading is used, and because of that, MSAA just works. It's also using a very simple world-space clustering scheme, instead of the more typical view-space one. Since the number of lights is fairly limited in this demo, I'm exploring encoding lights as bits in a bitfield, with one bit per light in each cluster. This may very well be good enough for AAA titles too, but won't scale forever. At some point a list of indices is going to be more compact and faster in practice. But for a lower workload, the bitfield saves one indirection in the inner loop and keeps the memory footprint fixed.
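To make the bitfield idea concrete, here is a minimal CPU-side sketch in C++. It is not the demo's shader code: the names (`ClusterGrid`, `addLight`, `lookup`, `forEachLight`), the 32-light limit, the uniform cell size, and the bounding-box light assignment are all assumptions chosen for illustration. It shows a world-space grid where each cluster stores one 32-bit mask, and an inner loop that visits only the set bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };
struct Light { Vec3 pos; float radius; };

// Hypothetical world-space cluster grid: one 32-bit mask per cell,
// bit i set means light i may affect that cell (so at most 32 lights).
struct ClusterGrid {
    Vec3 minCorner;   // world-space position of the grid's minimum corner
    float cellSize;   // edge length of one cubic cluster
    int nx, ny, nz;   // number of clusters along each axis
    std::vector<std::uint32_t> masks;

    ClusterGrid(Vec3 mn, float cell, int x, int y, int z)
        : minCorner(mn), cellSize(cell), nx(x), ny(y), nz(z),
          masks(std::size_t(x) * y * z, 0u) {}

    // Map a world coordinate to a cell coordinate, clamped to the grid.
    int cell(float v, float origin, int n) const {
        int c = int(std::floor((v - origin) / cellSize));
        return std::max(0, std::min(n - 1, c));
    }

    std::size_t index(int x, int y, int z) const {
        return (std::size_t(z) * ny + y) * nx + x;
    }

    // Conservative assignment: set the light's bit in every cluster that its
    // axis-aligned bounding box overlaps. Assumes the light lies inside the
    // grid, since out-of-range coordinates are clamped to the border cells.
    void addLight(int lightIndex, const Light& l) {
        assert(lightIndex >= 0 && lightIndex < 32);
        int x0 = cell(l.pos.x - l.radius, minCorner.x, nx);
        int x1 = cell(l.pos.x + l.radius, minCorner.x, nx);
        int y0 = cell(l.pos.y - l.radius, minCorner.y, ny);
        int y1 = cell(l.pos.y + l.radius, minCorner.y, ny);
        int z0 = cell(l.pos.z - l.radius, minCorner.z, nz);
        int z1 = cell(l.pos.z + l.radius, minCorner.z, nz);
        for (int z = z0; z <= z1; ++z)
            for (int y = y0; y <= y1; ++y)
                for (int x = x0; x <= x1; ++x)
                    masks[index(x, y, z)] |= (1u << lightIndex);
    }

    // Fetch the light mask for the cluster containing a world-space point.
    std::uint32_t lookup(Vec3 p) const {
        return masks[index(cell(p.x, minCorner.x, nx),
                           cell(p.y, minCorner.y, ny),
                           cell(p.z, minCorner.z, nz))];
    }
};

// Index of the lowest set bit; the caller must pass a non-zero mask.
inline int lowestBitIndex(std::uint32_t mask) {
    int i = 0;
    while ((mask & 1u) == 0u) { mask >>= 1; ++i; }
    return i;
}

// Inner loop: call fn(lightIndex) for each set bit, lowest index first.
// The bit position is the light index, so no index list has to be read.
template <class F>
void forEachLight(std::uint32_t mask, F&& fn) {
    while (mask != 0u) {
        fn(lowestBitIndex(mask));
        mask &= mask - 1u;  // clear the lowest set bit
    }
}
```

In a shader the bit scan would typically be a single hardware instruction rather than the portable loop used here. The storage cost is one fixed-size word per cluster regardless of how many lights overlap it, which is the fixed memory property mentioned above, and also why the scheme stops scaling once the light count exceeds the word width.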

As a reference I have pretty much copied my Deferred Shading 2 demo from 2008, and made a few minor improvements to it. At the time I was fairly proud of it, but it doesn't really represent the state of the art in Deferred Shading anymore. In particular it's a Classic Deferred implementation, which gets very bandwidth-bound on modern hardware. Without MSAA I see Clustered Shading outperforming it by a factor of 2, and with MSAA enabled the gap only widens, up to a factor of 5 at 8x MSAA.