Path Tracing: Boosting performance with ray coherence (Part 2)

Olivier Therrien, programmer and researcher at CDRIN
September 05, 2018

In the last post we saw how to shoot rays so that they remain coherent across pixels but still random over the sample index, so that they eventually converge to the same result as independent sampling. It was a nice experiment, but it’s not perfect: in a real-time application, the temporal changes in the structured noise will be visible to the user. Temporal Anti-Aliasing (TAA) might help a bit, but at the low sample counts we can actually afford in games, the bias will still be very high, because all pixels sample a similar direction.
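As a reminder of what "coherence" means here, below is a minimal Python sketch. It assumes a single coherence parameter in [0, 1] and is one simple way to get the behaviour, not necessarily the exact scheme from the previous post. The 2D random number has a shared part that depends only on the sample id, plus a per-pixel part whose weight shrinks as coherence goes up. `sample_2d`, `coherence` and `seed` are hypothetical names.

```python
import random

def sample_2d(x, y, sample_id, coherence, seed=0):
    """2D random number in [0,1)^2 for pixel (x, y) and sample index sample_id.

    coherence = 1: every pixel gets the same number for a given sample_id.
    coherence = 0: each pixel gets its own independent number.
    Each result stays uniform for any coherence, because adding something
    independent of a uniform variable, modulo 1, keeps it uniform.
    """
    # Shared part: seeded only by the sample id, so identical for all pixels.
    shared = random.Random(f"shared:{seed}:{sample_id}")
    # Per-pixel part: seeded by the pixel position and the sample id.
    local = random.Random(f"pixel:{seed}:{x}:{y}:{sample_id}")
    u = (shared.random() + (1.0 - coherence) * local.random()) % 1.0
    v = (shared.random() + (1.0 - coherence) * local.random()) % 1.0
    return u, v
```

With `coherence = 1.0`, `sample_2d(0, 0, 5, 1.0)` and `sample_2d(10, 3, 5, 1.0)` return the same pair, so all rays leave in a similar direction. The values still change with `sample_id`, which is what lets the image converge over time.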

In the OptiX path tracing demo, modified to trace only one primary ray, one bounce and the light rays, this is what I get (GTX 980, 512×512):

With full coherence:

So even on a small scene, coherent rays are quite a bit faster. What if there were a way to get the performance benefits of coherent rays while still keeping a diverse ray distribution within a single frame?

When path tracing, a 2D random number is generated on the unit square and then mapped onto the hemisphere around the surface normal.
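For reference, here is one common way to do that mapping, cosine-weighted hemisphere sampling, as a small Python sketch. It assumes a local frame where the surface normal is +z. The function name is illustrative, and a given renderer may use a different distribution.

```python
import math

def cosine_sample_hemisphere(u1, u2):
    """Map (u1, u2) in [0,1)^2 to a unit direction on the hemisphere around +z
    (the surface normal in local space), with density proportional to cos(theta)."""
    r = math.sqrt(u1)            # radius on the unit disk
    phi = 2.0 * math.pi * u2     # angle around the normal
    x = r * math.cos(phi)
    y = r * math.sin(phi)
    z = math.sqrt(max(0.0, 1.0 - u1))  # lift the disk point onto the hemisphere
    return x, y, z
```

The mapping is continuous, so nearby points in the unit square give nearby directions. Choosing which region of the square a pixel draws from is effectively choosing which region of the hemisphere its ray goes to.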

What if we used a few coherent random numbers and interlaced them over a few pixels? For example, a 2×2 sampling pattern would ensure that at least the 4 major directions are all present in each 2×2 pixel area.
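Here is a minimal Python sketch of that interlacing. It assumes an N×N pattern where pixel (x, y) uses the stratum (x mod N, y mod N) of the unit square. Under full coherence, a single shared random offset is used for the whole screen. `interleaved_sample` and its parameters are hypothetical names.

```python
def interleaved_sample(x, y, n, shared_u, shared_v):
    """Random 2D number for pixel (x, y) in an n x n interlaced pattern.

    The pixel's stratum is (x mod n, y mod n). With full coherence,
    (shared_u, shared_v) in [0,1)^2 is the same for every pixel, so there are
    only n*n distinct directions per frame, and every n x n block of pixels
    contains all of them.
    """
    sx, sy = x % n, y % n
    # Offset into the stratum, then scale the unit square down by n.
    return (sx + shared_u) / n, (sy + shared_v) / n
```

For n = 2, the four pixels of any 2×2 block land in the four different quadrants of the unit square. Pixels two apart, such as (0, 0) and (2, 2), get exactly the same number.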

This would make it much easier for a denoiser to reconstruct a coherent, representative image from a low sample count.

So I tried it, and this is what it looks like (4×4 pattern, tested with both zero and full coherence):

Performance is pretty much the same as with the fully incoherent rays we had initially, for both the coherent and the incoherent variants. So even if interlaced coherent rays are easier to denoise, this is useless, because we lose all the performance advantages coherence had over the incoherent solution…

But it would probably be friendlier for the GPU to have coherent work grouped spatially, in clusters. And since a ray’s direction usually has more influence than its starting position on where it ends up hitting, why not take each of the 4 directions and group (bin) them side by side on the screen? Neighbouring rays will now start farther apart, but their directions will be much more correlated.

This would ensure that all rays going the same way are grouped together on the screen, so they can potentially be processed by the same group of threads and share more data with their neighbours. The image can still be rearranged back into the right positions later as a post-process.

Or at least that’s what I hope.
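Here is a minimal Python sketch of the index remapping that the ray generation step would need. It assumes the image size is a multiple of N and that bin (bx, by) gets the (width/N)×(height/N) tile at tile position (bx, by) on the screen. For each launch index in the binned layout, it returns the original pixel the ray origin comes from and the direction bin to sample. The layout and the names are assumptions for illustration.

```python
def tiled_to_original(tx, ty, width, height, n):
    """Map launch index (tx, ty) in the binned layout to the original pixel
    (x, y) that provides the ray origin, and to the direction bin (bx, by)."""
    assert width % n == 0 and height % n == 0, "image size must be a multiple of n"
    tile_w, tile_h = width // n, height // n
    bx, ix = divmod(tx, tile_w)  # which tile column, and position inside it
    by, iy = divmod(ty, tile_h)  # which tile row, and position inside it
    # Inside a tile, neighbouring launch indices are n pixels apart in the
    # original image, and all of them share the same direction bin.
    return (ix * n + bx, iy * n + by), (bx, by)
```

With an 8×8 image and a 4×4 pattern, launch indices (0, 0) and (1, 0) both sample bin (0, 0), but their ray origins come from original pixels (0, 0) and (4, 0). That is the trade described above: origins farther apart, directions shared.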

Let’s try it! (4×4 pattern, full coherence):

Wow! This actually works! We are sampling 16 directions within a 4×4 pixel region and are able to keep most of the performance we get when sampling only 1 direction for the whole screen! This means that a denoiser could blend those regions together and get the equivalent of 16 samples.
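To get a feel for the "equivalent of 16 samples" claim, here is the crudest possible stand-in for a denoiser, as a Python sketch: a box filter applied in the original (interlaced) pixel layout. Any N consecutive columns and rows contain every residue mod N, so each N×N window mixes all N² directions. A real denoiser would be edge-aware and far smarter. `box_blend` is a hypothetical name.

```python
def box_blend(image, n):
    """Average each pixel over an n x n window, shifted to stay inside the image.

    Because the window always spans n consecutive columns and rows, it always
    contains every stratum (x mod n, y mod n) of an interlaced pattern.
    Requires the image to be at least n x n.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        y0 = min(max(y - n // 2, 0), height - n)  # clamp window start vertically
        for x in range(width):
            x0 = min(max(x - n // 2, 0), width - n)  # clamp window start horizontally
            total = sum(image[j][i] for j in range(y0, y0 + n) for i in range(x0, x0 + n))
            out[y][x] = total / (n * n)
    return out
```

If each pixel stores its stratum index (x mod 4) + 4·(y mod 4), every output pixel becomes 7.5, the mean of 0 to 15. In other words, each window saw all 16 strata exactly once.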

After rendering, the pixels of the binned image need to be moved back to their original positions. This can be done with a post-process shader using point filtering.
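Here is a Python sketch of that resolve pass. It assumes the binned layout puts original pixel (x, y) into the (width/N)×(height/N) tile of bin (x mod N, y mod N), at position (x div N, y div N) inside that tile. Each output pixel copies exactly one texel from the binned image, which is what a point-filtered fetch does in a shader. Bilinear filtering would instead average texels that are not neighbours in the final image. The function names are hypothetical.

```python
def original_to_tiled(x, y, width, height, n):
    """Position in the binned image of original pixel (x, y)."""
    tile_w, tile_h = width // n, height // n
    return (x % n) * tile_w + x // n, (y % n) * tile_h + y // n

def resolve(tiled_image, n):
    """Gather every output pixel from its exact texel in the binned image
    (the equivalent of a point-filtered fetch in a post-process shader)."""
    height, width = len(tiled_image), len(tiled_image[0])
    assert width % n == 0 and height % n == 0, "image size must be a multiple of n"
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            tx, ty = original_to_tiled(x, y, width, height, n)
            row.append(tiled_image[ty][tx])
        out.append(row)
    return out
```

For a 4×2 binned image with rows `a b c d` and `e f g h` and a 2×2 pattern, the resolved rows are `a c b d` and `e g f h`. Texel `b` was rendered next to `a` inside bin (0, 0)'s tile, and it moves two pixels to the right in the final image.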

It’s also possible to use completely random numbers, as long as each one is restricted to the range covered by the direction bin it belongs to, like the following:

Intuitively, each tile is assigned a region of the global random distribution, and together the tiles cover it entirely. To make sure that a generated random number falls in the correct tile’s region, it must be offset and scaled according to the size of the bin on the screen:
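In other words, bin (bx, by) out of N×N owns the sub-square [bx/N, (bx+1)/N) × [by/N, (by+1)/N) of the unit square, and each pixel draws its own random number inside it. Below is a minimal Python sketch. It assumes the bin is simply the index of the screen tile the pixel is rendered in. `binned_sample` and the use of Python's `random.Random` are illustrative choices.

```python
import random

def binned_sample(bx, by, n, rng):
    """Fresh per-pixel random 2D number, offset by the bin index and scaled
    by 1/n so that it falls in bin (bx, by)'s sub-square of the unit square."""
    u = (bx + rng.random()) / n
    v = (by + rng.random()) / n
    return u, v
```

Every pixel in a tile still gets an independent random number, but all of them stay inside the same sub-square, and the N² tiles together cover the whole unit square.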

This is essentially pure random sampling (in fact stratified over the bins), but spatially grouped, so it should perform a bit better. Let’s try it! (4×4 pattern, zero coherence):

Pretty nice to see that even fully incoherent rays perform better when grouped! The gains are not as large as with full coherence, but it’s still a welcome improvement, and it doesn’t require altering the sampling pattern, which might be a problem for some projects.

One question remains: what is the optimal number of bins, and how much does this actually improve performance?

At that point, the important metric is not only the FPS but also how well the denoiser can hide the interlaced pattern. So a 2×2 or 4×4 pattern with fully coherent rays will probably be the best for games. It will also be important to scramble the pattern over time so that TAA can make the most of it. If patchy patterns are still visible, it’s possible to add a bit of randomness to the rays as seen in the previous posts, but this must be done with care to avoid introducing too much bias into the image. Some applications will want to keep fully stochastic sampling, and tiling their rays like this will probably still help them get a bit more performance.
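One way to scramble the pattern over time is sketched below in Python, assuming a per-frame permutation seeded by the frame index (a hypothetical scheme). Each frame decides which stratum of the unit square each direction bin samples. All pixels in a bin still share one stratum, so coherence within a tile is preserved, and every stratum still appears exactly once per N×N block. Which pixel gets which direction changes from frame to frame, which gives TAA different samples to accumulate. `frame_strata` and `seed` are hypothetical names.

```python
import random

def frame_strata(frame, n, seed=0):
    """Map each direction bin (bx, by) to the stratum (sx, sy) of the unit
    square it samples during this frame.

    The mapping is a permutation: all n*n strata are used exactly once, so
    each frame still covers every direction. A new permutation per frame
    changes which pixels get which direction over time.
    """
    rng = random.Random(seed * 1_000_003 + frame)  # deterministic per frame
    strata = [(sx, sy) for sy in range(n) for sx in range(n)]
    rng.shuffle(strata)
    bins = [(bx, by) for by in range(n) for bx in range(n)]
    return dict(zip(bins, strata))
```

The random number for a bin would then be offset into `frame_strata(frame, n)[(bx, by)]` instead of into the bin's own sub-square.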

It would be very interesting to see the results on a larger scene with more complex materials, and to isolate the performance of indirect rays only. It would also be nice to test this in a game engine with TAA and other denoising techniques to see what kind of results are possible. Also, if this works for ray tracing, it probably works for screen-space effects that ray march the frame buffer, like SSAO! I would be very curious to see the numbers on that. Maybe in a later post!