We've gotten to the point now where it's possible to make a real-time renderer with 1000 dynamic lights, but the problem is that we can't really generate 1000 real-time shadow maps yet.

Most games only have a handful of dynamic shadow-casting lights, plus either a large number of small point lights without shadows, or a large number of static lights with baked shadows.

What if for all these lights where we can't afford to generate shadows for them, we spun the problem around backwards --- instead of calculating the visibility from the perspective of each light, what if we calculate the approximate visibility from each surface?

That's crazy talk, Hodgman! There's millions of surfaces (pixels) that need to be shaded, and only thousands of lights, so it should be more expensive... at least until we've also got millions of dynamic lights...

However, the results don't have to be perfect -- approximate blurry shadows are better than no shadows for all these extra lights.

And if it's possible, it's a fixed cost; you calculate this per-pixel visibility once, and then use it to get approximate shadows for any number of lights.

There are only a few techniques that come to mind when thinking along these lines:

DSSDO -- an SSAO type effect, but you store occlusion per direction in an SH basis per pixel. When shading, you can retrieve an approximate occlusion value in the direction of each light, instead of an average occlusion value as with SSAO.
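For illustration, here's a rough CPU-side sketch of the DSSDO idea in Python (the function names and sample data are made up): per-sample occlusion is projected into a first-order, 4-coefficient SH basis per pixel, then evaluated toward a given light direction at shading time.

```python
import math

# First-order SH basis (4 coefficients) for a unit direction.
def sh_basis(d):
    x, y, z = d
    return [0.282095, 0.488603 * y, 0.488603 * z, 0.488603 * x]

def sh_project_occlusion(samples):
    """Project (direction, occlusion) pairs into 4 SH coefficients.
    samples: list of ((x, y, z), occlusion) with unit directions."""
    coeffs = [0.0, 0.0, 0.0, 0.0]
    weight = 4.0 * math.pi / len(samples)  # uniform solid-angle weight
    for d, occ in samples:
        for i, b in enumerate(sh_basis(d)):
            coeffs[i] += occ * b * weight
    return coeffs

def sh_eval(coeffs, d):
    """Reconstruct the approximate occlusion toward direction d."""
    return sum(c * b for c, b in zip(coeffs, sh_basis(d)))
```

With only 4 coefficients the reconstruction is very blurry, of course -- that's the price of getting a directional value instead of SSAO's single average.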

Screen-space shadow tracing -- not sure what people call this one. Similar to SSAO, but you check occlusion in a straight line (in the direction of your light source) instead of checking occlusion in a hemisphere. I've used it on PS3/360, and IIRC it was used in Crysis 1 too.
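A minimal sketch of that kind of screen-space shadow trace, assuming a simple 2D depth buffer and a light direction already projected into screen space (all names here are hypothetical):

```python
def screen_space_shadow(depth, px, py, step, steps=16, bias=0.02):
    """March from pixel (px, py) toward the light in screen space.
    depth:  2D array of view-space depths (depth[y][x]).
    step:   (dx, dy, dz) per-iteration step toward the light, already
            projected into (pixel, pixel, depth) units.
    Returns 1.0 if the pixel is lit, 0.0 if an occluder is found."""
    x, y, z = float(px), float(py), depth[py][px]
    dx, dy, dz = step
    for _ in range(steps):
        x += dx; y += dy; z += dz
        xi, yi = int(round(x)), int(round(y))
        if not (0 <= yi < len(depth) and 0 <= xi < len(depth[0])):
            break  # ray left the screen: assume unoccluded
        if depth[yi][xi] < z - bias:  # depth buffer in front of the ray
            return 0.0
    return 1.0
```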

The problem with #2 is that it's still per-light -- for each light, you'd have to trace an individual ray, and save out thousands of these occlusion results...

The problem with #1 is that it's just an occlusion value, disregarding distance -- you might find an occluder that's 2m away but have a light that's only 1m away, and the light will still be occluded. This means it can only be used for very fine details (details smaller than the distance from the light to the object).

To make technique #1 more versatile with ranges: instead of storing occlusion percentage values, what if we stored depth values, like a typical depth map / shadow map? You could still store it in SH, as long as you use a shadowing algorithm like VSM that tolerates blurred depth maps (in this case you would have one set of SH values to store z, and another set to store z² for the VSM algorithm).
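As a sketch of that storage idea (self-contained, assuming the same kind of first-order SH projection that DSSDO uses -- all names are mine):

```python
import math

# First-order SH basis (4 coefficients) for a unit direction.
def sh_basis(d):
    x, y, z = d
    return [0.282095, 0.488603 * y, 0.488603 * z, 0.488603 * x]

def project_depth_hemisphere(samples):
    """samples: (direction, occluder_distance) pairs for one pixel/texel.
    Returns two SH coefficient sets: one encoding z, one encoding z*z --
    the two moments the VSM algorithm needs."""
    sh_z = [0.0] * 4
    sh_z2 = [0.0] * 4
    w = 4.0 * math.pi / len(samples)
    for d, z in samples:
        for i, b in enumerate(sh_basis(d)):
            sh_z[i] += z * b * w
            sh_z2[i] += z * z * b * w
    return sh_z, sh_z2

def sh_eval(coeffs, d):
    """Reconstruct the filtered moment toward direction d."""
    return sum(c * b for c, b in zip(coeffs, sh_basis(d)))
```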

You could then generate this data per-pixel using a combination of techniques -- you could bake these "depth hemispheres" per texel for static objects, bake out "depth probes" for mid-range, and do screen-space ray-tracing for very fine details, then merge the results together.

Then when lighting a pixel, you could read its z and z² values for the specific lighting direction and apply the VSM algorithm to approximately shadow the light.
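The shading-time part would just be the standard Chebyshev upper bound from VSM; something like this (a hypothetical helper, not actual engine code):

```python
def vsm_shadow(mean_z, mean_z2, light_dist, min_variance=1e-4):
    """Chebyshev upper bound on visibility, as in variance shadow maps.
    mean_z / mean_z2 are the filtered depth moments (here they'd come
    from evaluating the two SH sets in the light's direction)."""
    if light_dist <= mean_z:
        return 1.0  # receiver is closer than the mean occluder: fully lit
    variance = max(mean_z2 - mean_z * mean_z, min_variance)
    d = light_dist - mean_z
    return variance / (variance + d * d)
```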

I haven't tried implementing this yet, it's just day-dreaming, but can anyone point out any obvious flaws in the idea?

To make technique #2 work for more than one light, what if we only use it to shadow specular reflections, not diffuse light? We can make the assumption that any light source that contributes a visible specular reflection must be located somewhere in a cone that's centred around the reflection vector, and whose angle is defined by the surface roughness.

Yeah, this assumption isn't actually true for any microfacet specular (it is true for Phong), but it's close to true a lot of the time.

So, if we trace a ray down the R vector, and also trace some more rays that are randomly placed in this cone, find the distance to the occluder on each ray (or use 0 if no occluder is found), and then average all these distances, we've got a filtered z map. If we square the distances and average them, we've got a filtered z² map, and we can do VSM.
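A rough sketch of the cone setup and the moment averaging (the ray tracing itself is omitted; `cone_rays` and `depth_moments` are made-up names):

```python
import math
import random

def cone_rays(r_vec, half_angle, count, seed=0):
    """Generate `count` unit directions uniformly distributed inside a cone
    of the given half-angle, centred on the unit reflection vector r_vec."""
    rng = random.Random(seed)
    rx, ry, rz = r_vec
    # Build an orthonormal basis (u, v) perpendicular to r_vec.
    ux, uy, uz = (-ry, rx, 0.0) if abs(rz) < 0.999 else (1.0, 0.0, 0.0)
    ul = math.sqrt(ux * ux + uy * uy + uz * uz)
    ux, uy, uz = ux / ul, uy / ul, uz / ul
    vx, vy, vz = ry * uz - rz * uy, rz * ux - rx * uz, rx * uy - ry * ux
    dirs = []
    for _ in range(count):
        cos_t = 1.0 - rng.random() * (1.0 - math.cos(half_angle))
        sin_t = math.sqrt(max(0.0, 1.0 - cos_t * cos_t))
        phi = rng.random() * 2.0 * math.pi
        dirs.append(tuple(
            r * cos_t + u * sin_t * math.cos(phi) + v * sin_t * math.sin(phi)
            for r, u, v in zip(r_vec, (ux, uy, uz), (vx, vy, vz))))
    return dirs

def depth_moments(distances):
    """Average per-ray occluder distances (0.0 for a miss) into the
    filtered z and z^2 values, i.e. one texel each of the VSM moment maps."""
    n = len(distances)
    return sum(distances) / n, sum(d * d for d in distances) / n
```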

When shading any light, we can use these per-pixel values to shadow just the specular component of the light.

I have tried implementing this one over the past two days. I hadn't implemented decent SSR/SSRT/RTLR before, so I did that first, with 8 coarse steps at 16 pixels, then 16 fine steps one pixel at a time to find the intersection point. When using this technique to generate depth maps instead of reflection colours, I found that I could completely remove the "fine" tracing part (i.e. use a large step distance) with minimal artefacts -- this is because the artefact is basically just a depth bias, where occluders are pushed slightly towards the light.

At the moment, tracing 5 coarse rays in this cone costs 3.5ms at 2048x1152 on my GeForce GTX 460.

In this Gif, there's a green line-light in the background, reflecting off the road. Starting with a tiny cone centred around the reflection vector, the cone grows to an angle of 90º:

Instead of animating the cone width for testing purposes, my next step is to read the roughness value out of the G-buffer and use that to determine the cone width.

This effect will work best for extremely smooth surfaces, where the cone is small, so that the results are the most accurate. For rough surfaces, you're using the average depth found in a very wide cone, which is a big approximation, but the shadows fade out in this case and it still seems to give a nice "contact" hint.

But any screen-space technique is just asking to be unstable; wide search areas for SSAO already end up looking like a weird kind of unsharp mask as it is. Not to mention you'd get light bleeding everywhere, since you've only got screen-space data to work from.

I mean, it's a neat idea for some sort of "better than SSAO" or directional-SSAO kind of thing. But I'm afraid I'd be skeptical of pushing anything more into screen space than is already done, beyond effects that are inherently screen-space. Even SSR looks weird and kind of wonky in practice, e.g. in Crysis 3 and Killzone Shadow Fall.

The problem with ISMs is that they're designed for RSM/etc, where you have a lot of little lights, each with their own little ISM, which all add up together to approximate a large-area bounce light. Because each of these "bounce lights" is made up of, say, 1000 VPLs, each with their own ISM, the amount of error present in each individual ISM isn't noticeable.

I haven't tried it, but I have a feeling that if I've got a street with 100 individual streetlamps on it, and I use ISMs to efficiently generate shadows for them, then the ISM artefacts will be hugely noticeable... Maybe I could use ISMs for my "backwards" shadowing idea though, where my "visibility probes" are rendered using the ISM technique? That would allow for huge numbers of dynamic probes...

In Killzone, they use SSR to supplement reflection probes -- SSR, local probes and a global sky probe are all merged.

I'm thinking of a similar thing for the "per-pixel depth hemisphere" system -- you'd have a huge number of approximate visibility probes about the place, which give very coarse mid-range blockers -- e.g. if you're inside a room with one window, don't accept light contributions from a spotlight from behind the wall that's opposite that one window. Obviously these probes would have to be quite conservative, with bleeding/tolerance around the same scale as their radius of influence.

You could then supplement this with screen-space data, in the cases where it's possible to get any results.

On SSAO, it really gives me a headache when games do stuff like this (Far Cry 3, why u do this?)... They've got this great lighting and GI system, and they go and shove a bloody outline filter in the middle of it!?

The last SSAO filter I wrote looks like this -- just the contact-darkening on the players; the grass is old-school planar-projection stuff, and yeah, there's no shadow-mapping/etc. I personally don't think it's as objectionable as the above (which I would call a contrast filter, really). Maybe I'm biased though...

With the screen-space specular masking, at the moment, I'm pretty hopeful I'll be able to use it. It does break down at the edges of the screen, with thin geo, and with complex depth overlaps... but I also do a lot of image based lighting, and there's not a lot of good ways to shadow an IBL probe, and the scene looks a lot better with these shadows fading in, in the areas where the technique works...

How about two-level DSSDO: one for small-scale detail and one for larger, then pick the better value per pixel?

Yeah, that could work -- you could kinda have a hierarchy of occlusion values, or shells of occlusion hemispheres. If you know the depth range of each "shell" where the assumptions are valid (occluders are closer than the light source), then you can use just the valid ones... You probably wouldn't need too many of these layers to get half-decent results...

Perhaps the biggest blocker right now is that there's no way to fill a voxel structure like this with occlusion data in real-time.

The most efficient way I can see would be regular rasterization, but where the shader (or the rasterizer) decides on the fly which layer of the 3D texture the pixel should be rendered to, based on its interpolated depth (quantized). However, I'm not aware of any API or GPU that has this capability. This would be highly parallel.

Geometry shaders allow selecting which render target a triangle should be rendered to, but there is no way to select which render target a pixel should be rendered to (which could be fixed-function; it doesn't necessarily need to be done in shaders).
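The per-pixel routing decision itself is just a depth quantization; a trivial sketch of the layer-index selection the hardware would have to make (invented names, linear depth assumed):

```python
def depth_to_layer(z, z_near, z_far, layers):
    """Quantize a view-space depth into a 3D-texture layer index -- the
    per-pixel routing decision the rasterizer would need to make."""
    t = (z - z_near) / (z_far - z_near)
    return min(layers - 1, max(0, int(t * layers)))
```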

I tend to agree. Not to mention the fact that the information you need might not even be there when working with screenspace data. Personally I feel like experimenting with voxels and other optimized data formats for occluders would be a much more promising avenue to explore, but I'm okay with being proven wrong.


I like this line of thinking.

You probably don't need 1024 vertical voxels, so it's possible to spend more on horizontal ones, or to store, say, an 8-bit distance instead of a 1-bit solid/empty flag.

You could keep two separate voxel structures: a static one, and a dynamic one based on slices. You could sample both when required (when a surface is near both a light and a dynamic object).

You could also do some tricks so that highly dense but noisy things like leaves could be faked and not traced directly, just with an appropriate noise function, say.

I've thought about the voxel approaches. Even with an octree they are extremely complicated, especially if you want self-shadowing. There are lots of ideas to speed it up, like varying quality based on distance to the camera, but even then you end up having to voxelize geometry for occluders, or find other methods to mark which voxels are shaded and which aren't.

That said, I think a really fast theoretical approach (as in, I made this up a while ago) would be to use RTW in a single pass with a low-resolution texture (like 64x64). You find all the objects within the radius of your light source, then generate a frustum at your point light. Point the frustum at (1, 0, 0) and cut the world into quadrants. Now, for all the objects in front of the near plane of the frustum, do nothing. For the 4 quadrants behind the near plane, assign each object to a quadrant; if an object overlaps two (or all four) quadrants, duplicate it into each of them. In 2D:

Now render all the geometry in each quadrant, passing into a shader the centre of the point light and transforming the vertices into world space for each quadrant. Then normalize the vertex angles from the 0-to-180 range into 0 to 45 degrees, so they're all inside the frustum. If your triangles are small enough, there should be no artifacts, really. Here's a 2D example of what I mean by artifacts: the red line represents our geometry, and we normalize the angle so it's squished into the frustum. This distorts the line (if we look at all the points along it) into the blue line; if we only look at the vertices, we see the magenta line. You then render a depth map using RTW. If you're good with math, you can probably create a fragment shader that correctly interpolates the vertices and calculates the correct depth (removing the artifact). What you'd end up with would be an RTW'd low-resolution spherical map. When you sample to see if a shadow exists, you'd need to perform a look-up on the texture for each light source.

You'd only need a texture for lights that collide with geometry and you can choose a texture size based on the distance to the camera. (RTW will correctly warp to give a higher resolution closer to the camera also). I hope that makes sense. I made it up mostly on paper a few months ago and haven't been able to tell anyone to see if it's viable.
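If I've read the idea right, the "normalization" is a remap of each vertex's angle around the light (0..180 degrees squished down to 0..45, i.e. divide the angle by 4) while preserving its distance. A 2D sketch of that, with invented names:

```python
import math

def squish_vertex(light, vertex):
    """Remap a vertex's angle around the light from the quadrant's
    0..180 degree range into the frustum's 0..45 degree range (divide
    the angle by 4), preserving its distance from the light. 2D version."""
    dx, dy = vertex[0] - light[0], vertex[1] - light[1]
    r = math.hypot(dx, dy)
    theta = math.atan2(dy, dx)
    theta_squished = theta / 4.0
    return (light[0] + r * math.cos(theta_squished),
            light[1] + r * math.sin(theta_squished))
```

Since only vertices get remapped, long edges end up straight lines between squished endpoints rather than true arcs -- that's exactly the blue-vs-magenta artifact described above.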

I've been meaning to come back to this, but have been working full time on stuff that pays the bills

Here's some gifs that I actually produced months ago. Most of the lighting in the scene is from a cube-map, with a few (green) dynamic lights in there too. There's no shadows, so the "SSSVRT" adds all of the shadowing seen in the odd frames of the gifs:

Seems to work really well on shiny env-map-lit (IBL) objects to 'ground' them.

Re voxels - that's a challenge for another day

I imagine you could use both. Screen-space stuff like this is great for capturing the really fine details (which would require an insane amount of memory in a voxel system), so you could combine this with a voxel method for more coarse-detail / longer-distance rays, and/or image probes for really long-range rays.

We are using screen-space shadow tracing. It's quite cheap: I use 24 gather4 samples from a quarter-resolution 16-bit depth buffer, then just count the intersecting samples and use an exponential shadow term: pow(shadowTerm, intersectedSamples).
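In other words, something along these lines (a toy CPU version; the real thing fetches the quarter-res depth buffer with gather4):

```python
def count_intersections(buffer_depths, ray_depths, bias=0.01):
    """Count the samples where the depth buffer lies in front of the ray."""
    return sum(1 for buf, ray in zip(buffer_depths, ray_depths)
               if buf < ray - bias)

def shadow_term(buffer_depths, ray_depths, term=0.8):
    """pow(shadowTerm, intersectedSamples): every intersecting sample
    multiplies in another attenuation factor."""
    return term ** count_intersections(buffer_depths, ray_depths)
```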

Is all the shadowing done in screen-space, or are there traditional techniques used as well?

Typical cascaded shadow maps might be showing from the moonlight, but I can't be sure because those point lights are so much brighter than anything else. There is also a temporally smoothed SAO variation with multi-bounce lighting that contributes to fully shadowed areas quite well. https://www.dropbox.com/s/x7tvd8bags5x3pj/GI.png

I think it's best to combine screen-space shadow tracing with tiled shading. For directional lights we surely need shadow maps, but point and spot lights usually have small ranges, and we can assume most of the shadow casters that are going to affect the final result are in the G-buffer.

This way we can create raytrace jobs for each light and each pixel. Consider the pixel at (0,0): we know from tiled shading that, say, 16 lights may light this pixel, so we create 16 trace jobs (each with a direction and start point). We can then dispatch a thread per trace job in a compute shader, write the results to a buffer, and use that data when shading.
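A sketch of the job-building step, with invented names (the real thing would run as compute-shader threads, not a Python loop):

```python
def build_trace_jobs(pixels, lights_per_tile, tile_of):
    """Emit one screen-space trace job per (pixel, light) pair, using the
    per-tile light lists from tiled shading. Each job would carry a ray
    start (the pixel's position) and a direction (toward the light)."""
    jobs = []
    for p in pixels:
        for light in lights_per_tile[tile_of(p)]:
            jobs.append((p, light))  # one compute-shader thread per job
    return jobs
```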

Take a look at the AMD Leo demo -- I think they used a somewhat similar approach.

I also believe a combination is best. In games there are usually many static light sources, which can be handled efficiently by CSMs that are updated only every n'th frame, so they won't consume much time. The animated objects like players and enemies are usually small and completely visible in the scene, so screen-space shadows could be efficient there, rather than adding another render pass.


Re filling a voxel structure with occlusion data in real-time: you could use a variant of the KinectFusion algorithm to build a volumetric representation of the scene. The basic idea is to get a depth image (or a depth buffer, in the rendering case) and then find the camera location relative to your volume representation. Then, for each pixel of the depth image, you trace through the volume, updating each voxel as you go with the distance information you have from the depth image. The volume representation is the signed distance from a surface at each voxel. For the next frame, the volume representation is used to find out where the Kinect moved to, and the process is repeated. The distances are updated over a time constant to eliminate the noise from the sensor and to allow for moving objects.
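The core voxel update is pretty simple; a sketch of the per-voxel signed-distance blend (names and the weighting scheme are my own simplification of the KinectFusion-style update):

```python
def update_tsdf(voxel_value, voxel_weight, surface_dist, trunc, max_weight=64):
    """Blend a new signed-distance measurement into a voxel's running
    average: clamp to the truncation band, then take a weighted average so
    sensor noise and moving objects fade out over time."""
    d = max(-trunc, min(trunc, surface_dist))
    new_weight = min(voxel_weight + 1, max_weight)
    new_value = (voxel_value * voxel_weight + d) / new_weight
    return new_value, new_weight
```

Capping the weight at `max_weight` is what gives the "time constant": old measurements stop dominating, so the field can track moving geometry.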

This is a little bit of a heavy algorithm to run in addition to all of the other work of rendering a scene, but there are key parts of it that wouldn't be needed anymore -- for example, you don't need to solve for the camera location, since you already have it. That signed-distance voxel representation could easily be modified and/or used to calculate occlusion. It might be worth investigating further to see if it could run in real-time...