Deferred shading and light volumes

Hello,
currently I'm trying to optimize my deferred shading by using light volumes to process only those fragments that are affected by the light source. I've read many tutorials and I'm still very confused about how to do that.
First of all, I've created a crude sphere around my point light source and rendered it for testing purposes in the first (geometry) pass (see screenshot). But this doesn't do any good, since the determination of which pixels to light actually happens during the second (light) pass, right? Right now I draw a 2D fullscreen quad with vertices in the range [0,1], supply the VS with the quad vertex coordinates (gl_Position = vec4(li_vtx.xyz, 1) * 2.0 - 1.0) and apply the light fragment shader. Fine so far. But how do I do that with a 3D sphere around the light source during the second light pass? I guess this has to be done somehow in screen space for every pixel? If I draw the sphere in 3D space in the light pass and apply the FS, I see only the sphere, and it seems that the FS tries to draw the textures onto the sphere. I simply don't understand how to do exactly this step: using the sphere to determine the pixels the FS has to cover for the lighting pass.

I'm trying to optimize my deferred shading by using light volumes to process only those fragments that are affected by the light source. I've read many tutorials and I'm very confused about how to do that. ... how do I do that with a 3D sphere around the light source during the second light pass?

You can do that, though conventional wisdom is that if you have a lot of light sources (the goal), the overhead of all that per-light light-volume rasterization and state changes can hinder lighting performance. It can be cheaper to just use screen-aligned quads ... more on that in a sec. That also opens you up to tile-based deferred shading, where the real perf++ can be had if you have lots of light sources and/or light-volume overlap in screen space.

For the light volumes thing, think about it this way. If on top of your G-buffer, you rasterize (with the same MV and PROJ) an opaque solid representing a light volume (where a light volume is the furthest extent of the light source's illumination), then you know that it will occlude (in screen-space XY) every single G-buffer sample that it could possibly illuminate (and potentially a lot more if you consider screen-space Z). Right? And it's typically less fill than a full-screen quad, right? OK, so don't render it opaque. Just use its rasterization to give you the samples in the G-buffer that you need to run your lighting shader on for that light source. Congrats! You've just limited your lighting fill.
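To make "furthest extent of the light source's illumination" concrete: with the classic constant/linear/quadratic attenuation model, you can solve for the distance at which the light's contribution drops below some cutoff and use that as your sphere radius. A sketch (the function name and the 1/256 cutoff are just illustrative, not from any particular engine):

```cpp
#include <cassert>
#include <cmath>

// Radius at which a point light's attenuated intensity drops below 'cutoff',
// assuming the classic  I(d) = maxIntensity / (kc + kl*d + kq*d^2)  model.
// Solve  kq*d^2 + kl*d + (kc - maxIntensity/cutoff) = 0  for d.
float lightVolumeRadius(float kc, float kl, float kq,
                        float maxIntensity, float cutoff)
{
    float c = kc - maxIntensity / cutoff;
    if (kq == 0.0f)                         // purely linear falloff
        return -c / kl;
    float disc = kl * kl - 4.0f * kq * c;   // positive, since c < 0 here
    return (-kl + std::sqrt(disc)) / (2.0f * kq);
}
```

Anything outside that radius contributes less than the cutoff, so the sphere is a conservative bound on the light's screen-space footprint.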

Now, about the over-coverage in screen-space Z... Think of a light-source volume 1) completely in front of an opaque fragment, or 2) completely behind an opaque fragment. With the normal single-sided depth test, you can take care of one of these cases (rendering front faces with a LEQUAL depth test, or rendering back faces with a GEQUAL depth test, where the depth buffer is the G-buffer's depth buffer). But to take care of both, you need some special sauce. One approach is to do both passes, using stencil to AND the first result into the second. That gives you a precise volume-solid test in Z, but is even more expensive. Another approach is to use the depth bounds test, which tests against both a min Z and a max Z at once. That avoids the multipass per light as with the stencil trick, though it gives you only a constant min and max screen Z to trim fragments in Z (in practice, the min/max Z values of your light-source volume solid), and while it's supported on NVidia, I don't think I've seen confirmation that AMD supports it yet.
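The two-sided case boils down to a per-pixel containment test in Z. As a plain-C++ sketch of what the two depth tests, ANDed together via stencil, end up computing (depths here are window-space values in [0,1]; this is just the logic, not GL code):

```cpp
#include <cassert>

// A G-buffer sample is inside the light volume in Z only if its depth lies
// between the volume's front-face and back-face depths at that pixel:
//   front faces pass LEQUAL  when volumeFrontDepth <= sceneDepth,
//   back faces  pass GEQUAL  when volumeBackDepth  >= sceneDepth,
// and ANDing both results (via stencil) gives containment.
bool insideLightVolumeZ(float sceneDepth,
                        float volumeFrontDepth, float volumeBackDepth)
{
    return volumeFrontDepth <= sceneDepth && sceneDepth <= volumeBackDepth;
}
```

Either single-sided test alone accepts one of the two bad cases (volume entirely in front of, or entirely behind, the opaque fragment); only the conjunction rejects both.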

As I said, these are all nice, but can be problematic to batch together to avoid a bunch of state change "pipeline bubbles" per light. From what I gather, often what is done nowadays is to take the bounding solid for each light, bin it by a screen-space "tile", and then blast the lights in a single tile all-in-one-go. Saves G-buffer read/write fill when there is potentially a lot of light volume overlap in screen space, and even more importantly allows you to batch a bunch of light sources together with no state changes in between.
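The CPU-side binning pass can be as simple as projecting each light volume to a screen-space rectangle and pushing the light's index into every tile that rectangle overlaps. A minimal sketch (the names and the inclusive-bounds Rect are my own, not from any particular engine; clamping assumes rects at least partially on screen):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Rect { int x0, y0, x1, y1; };   // inclusive pixel bounds

// Returns one light-index list per tile, tiles stored row-major.
std::vector<std::vector<int>> binLights(const std::vector<Rect>& lightRects,
                                        int screenW, int screenH, int tileSize)
{
    int tilesX = (screenW + tileSize - 1) / tileSize;
    int tilesY = (screenH + tileSize - 1) / tileSize;
    std::vector<std::vector<int>> tiles(tilesX * tilesY);
    for (int li = 0; li < (int)lightRects.size(); ++li) {
        const Rect& r = lightRects[li];
        // Map the pixel rectangle to the range of tiles it touches.
        int tx0 = std::max(0, r.x0 / tileSize);
        int ty0 = std::max(0, r.y0 / tileSize);
        int tx1 = std::min(tilesX - 1, r.x1 / tileSize);
        int ty1 = std::min(tilesY - 1, r.y1 / tileSize);
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                tiles[ty * tilesX + tx].push_back(li);
    }
    return tiles;
}
```

Each tile's list can then be uploaded and all of that tile's lights applied in a single shader invocation, with no state changes between lights.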

First, I'd like to thank you for taking the time to explain the details to me. But I still don't get the whole picture.

For the light volumes thing, think about it this way. If on top of your G-buffer, you rasterize (with the same MV and PROJ) an opaque solid representing a light volume (where a light volume is the furthest extent of the light source's illumination), then you know that it will occlude (in screen-space XY) every single G-buffer sample that it could possibly illuminate (and potentially a lot more if you consider screen-space Z). Right?

Let me see if I got this: what exactly do you mean by "on top of the G-buffer"? I guess that I should draw the sphere together with the geometry into the G-buffer, like I already did? Or should I do this in a separate pass, together with the G-buffer read/light pass? Can you shed some more light on that, please? I'm having a hard time wrapping my mind around this.

Ok, so don't render it opaque. Just use its rasterization to give you samples in the G-buffer that you need to run your lighting shader on for that light source. Congrats! You've just limited your lighting fill.

Well, if I render the sphere with the geometry to the G-buffer and make the sphere not opaque (transparent?), then the fragment shader has no way of detecting it after rasterization during the light pass, right? So this leaves me with the question of when exactly I have to rasterize the sphere (geometry or light pass)? And I guess that the fullscreen 2D quad I draw is only for the fragments not affected by any light source, right?

I'll postpone the screen-Z issue until I've got the basics right.

Some thoughts about tile-based clipping: it would probably suit my needs best, since I'll have many overlapping light sources and the target hardware has narrow memory bandwidth (Intel Sandy Bridge with Mesa 9.1 on Linux). However, most implementations determine the tiles using DirectCompute or OpenCL. Since the target hardware on Linux isn't capable of this, I'll probably end up computing the tiles on the CPU, and I fear I'll lose any performance advantage that way. Any opinions here?

In case you don't already know, it has been shown that tile-based deferred shading trumps classic deferred shading and light pre-pass by a substantial margin. I refer you to Andrew Lauritzen's slides. In there you'll also find the same advice on bounding geometry that Dark Photon already suggested. Throw away those full-fledged spheres and cones!

I don't understand why mapping light sources to tiles must be implemented using OpenCL or DirectCompute. It's a simple mapping of bounding geometry to tile bounds. I suggest you implement it and compare it to your non-tiled solution.

BTW, SandyBridge + Linux + OpenCL works - just not with the GPU. So you could implement all the stuff using CL kernels and when the time comes that the GPUs are finally supported on Linux you're already set.

Hi,
@thokra: You're probably right that tiled shading is the best way to go. I'm kinda new to shading and don't know much about OpenCL, so I'll give it a shot after I've got the old-style light volumes working properly. I guess CPU-emulated OpenCL must be slow as hell, however.

@Dark Photon: I believe I found at least one more answer, but I'm not there yet.
Here is what I do:
-Create FBO MRTs
-Enable writing to the FBO (first geometry pass)
-Draw my scene geometry
-Disable writing to the FBO
-Bind the MRT textures
-Now, instead of drawing a fullscreen quad, I draw the sphere with the same ModelView/Projection and do the lighting pass; I do NOT draw the fullscreen quad.

I guess this is the way to go. If all goes right, I should be seeing ONLY the parts that intersect the sphere, i.e. the fragments that are lit, right? Well, it's not happening, though. All I see is the sphere in 3D space, and it seems that the shader maps the colors onto the sphere (see screenshot).

Let me see if I got this: what exactly do you mean by "on top of the G-buffer"? I guess that I should draw the sphere together with the geometry into the G-buffer, like I already did? Or should I do this in a separate pass, together with the G-buffer read/light pass?

I meant into the same buffer as the G-buffer, with depth testing and writing enabled, after you rasterize the opaque scene into it. While you could actually do this if you want, rendering an opaque sphere like that is just meant to give you a good mental picture of what would happen.

(Hmmm. Need a pic. Search images.google.com...) Ah! Here we go:

Imagine the plane is the opaque ground plane already in your G-buffer (the "scene"). Now imagine that sphere is the maximum illumination extent of a point light source. Further imagine your eyepoint is looking straight down on that plane with the sphere in-view. You can see that the sphere is going to cover all of the pixels on the G-buffer that could potentially be illuminated by the point light source. Right?

Now for the screen-Z issue, check this out:

If (same assumptions) the plane is the scene in your G-buffer and the sphere is the light-source volume around a point light, and you're looking down on the plane from above the sphere, then you can see that while the sphere has screen-space XY coverage over some of the plane (G-buffer) pixels, it doesn't actually contain any of those G-buffer pixels in Z. So it can't illuminate any of the plane pixels.

Make sense?

Well, if I render the sphere with the geometry to the G-buffer and make the sphere not opaque (transparent?), then the fragment shader has no way of detecting it after rasterization during the light pass, right?

Not quite, no. You're only using the rasterization of the light-volume solid (or quad, or whatever) to determine which samples from the G-buffer you're going to read and apply illumination to for that point light source. The lit result is written to (and blended with) the lighting buffer.

So you don't actually draw an opaque sphere to the lighting buffer or the G-buffer. You just use the pixels that would be rendered as an excuse to apply lighting to those pixels for that light source.

So this leaves me with the question of when exactly I have to rasterize the sphere (geometry or light pass)?

After you've finished rasterizing your opaque scene into the G-buffer, and after you've retargeted your rendering from the G-buffer to the lighting buffer. The lighting passes "read" from the G-buffer (via G-buffer textures bound to texture units, for example) and "write" (blend) to the lighting buffer.
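Whichever geometry you rasterize in the light pass, the lighting shader typically has to reconstruct the fragment's position from the G-buffer depth it reads. Assuming a standard GL perspective projection, converting a sampled depth in [0,1] back to a positive view-space distance looks roughly like this (a sketch of the math, shown as plain C++ rather than GLSL):

```cpp
#include <cassert>
#include <cmath>

// Convert a G-buffer depth sample d in [0,1] to positive view-space distance,
// assuming a standard GL perspective projection with the given near/far planes.
float linearizeDepth(float d, float zNear, float zFar)
{
    float zNdc = 2.0f * d - 1.0f;   // window [0,1] -> NDC [-1,1]
    return 2.0f * zNear * zFar / (zFar + zNear - zNdc * (zFar - zNear));
}
```

Sanity check: d = 0 lands on the near plane and d = 1 on the far plane, which is also a handy assertion when debugging a lighting pass that reads depth.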

And I guess that the fullscreen 2D quad I draw is only for the fragments not affected by any light source, right?

Well, if they're not illuminated by any light source, you can't see them, right?

Most often, you'd rip a full-screen quad for the directional light source (sun/moon/etc.) if there is one in your scene. If you're in a dungeon though, you might not have one.

Some thoughts about tile-based clipping: it would probably suit my needs best, since I'll have many overlapping light sources and the target hardware has narrow memory bandwidth (Intel Sandy Bridge with Mesa 9.1 on Linux). However, most implementations determine the tiles using DirectCompute or OpenCL. Since the target hardware on Linux isn't capable of this, I'll probably end up computing the tiles on the CPU, and I fear I'll lose any performance advantage that way. Any opinions here?

It depends, but my guess is it's probably still worth it if you've got a lot of lights and a lot of light overlap. If you've got slow memory and a fast CPU (the usual case), you're likely gonna save a crapload of slow memory reads and writes with tiled. Try non-tiled first and see what you think. Optimize it. Then try tiled if you need more perf. You just need a way to bin them efficiently. If you're doing culling already, you've probably already got a half-way efficient binning algorithm lying around that you can extend.

And no reason you can't easily implement tiled on the CPU. Personally, I'd always implement something on the CPU first -- it's just too darn "easy" to get running and optimized compared to implementing it on the GPU. Then if you need more speed even after optimization, you can go multicore or go GPU with it.

Hi,
@thokra: You're probably right that tiled shading is the best way to go. I'm kinda new to shading and don't know much about OpenCL, so I'll give it a shot after I've got the old-style light volumes working properly.

If/when you do, allocate some time to learning it slowly. It's not as easy as writing GLSL shaders. You have to think very carefully about parallel execution issues.

@Dark Photon: I believe I found at least one more answer, but I'm not there yet.
Here is what I do:
-Create FBO MRTs
-Enable writing to the FBO (first geometry pass)
-Draw my scene geometry
-Disable writing to the FBO
-Bind the MRT textures
-Now, instead of drawing a fullscreen quad, I draw the sphere with the same ModelView/Projection and do the lighting pass; I do NOT draw the fullscreen quad.

Yep, you got it!

I guess this is the way to go. If all goes right, I should be seeing ONLY the parts that intersect the sphere, i.e. the fragments that are lit, right?

Right.

What's all that blue stuff around your sphere?

Looks like you've got the sphere rendering. Now it's just down to figuring out if/why you're not generating the necessary illuminated fragments for the sphere pixels.

Again, thanks a bunch for your patience and your effort. And this time I believe I've finally got it.
The image below basically shows the sphere after some code tweaks. There is a "nice" artifact on the right side of the sphere, below the light-bolt texture, indicating that the sphere is indeed a 3D object. Btw., to make the sphere visible I've used some GL_FUNC_ADD blending.

And here is another shot with an additional FS Quad supplying all fragments with static ambient light:

However, I'm still not happy with it. First, I do a glCullFace(GL_FRONT), and that makes the light disappear as soon as the camera enters the light volume. Disabling culling seems to help. Is there some other solution?
Second, drawing the fullscreen quad and the sphere produces overdraw (rendering the lit fragments inside the sphere and then again with the new lit values).

And still there are many fragments that are not actually inside the sphere, but in front of it or completely behind it. You mentioned the screen-Z issue, and I guess I can cull out many fragments lying in front of or completely behind the sphere, right? Can I use the stencil buffer (similar to Carmack's reverse) to do the culling?

How do I do that? I guess I can render my geometry as usual into the G-buffer, but I have to set up a separate stencil buffer, right? I found some hints to attach the depth buffer as a stencil buffer, much like this:

Well, this one actually works for me. The OpenGL wiki says the format is 24 bits for the depth buffer and 8 bits for stencil (32 bits total). I'm confused about GL_UNSIGNED_INT_24_8. The depth texture sampling gives us a value between 0..1, yet the format is unsigned int. So how is this supposed to work? Does the shader internally convert the unsigned int to float when sampling with texture()?
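If I understand the format correctly, the packed texel keeps depth in the high 24 bits and stencil in the low 8, and the driver normalizes the 24-bit fixed-point depth to [0,1] when you sample it with texture(). My mental model of the layout, as a sketch (not actual driver code):

```cpp
#include <cassert>
#include <cstdint>

// A DEPTH24_STENCIL8 / GL_UNSIGNED_INT_24_8 texel: high 24 bits are
// fixed-point depth, low 8 bits are stencil.
float unpackDepth24(uint32_t texel)
{
    uint32_t depthBits = texel >> 8;     // top 24 bits
    return depthBits / 16777215.0f;      // / (2^24 - 1) -> normalized [0,1]
}

uint32_t unpackStencil8(uint32_t texel)
{
    return texel & 0xFFu;                // low 8 bits
}
```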

Assuming the MRT setup is good, how do I go on from there?
- I guess I disable writing into the depth buffer after populating the G-buffer with the geometry.
- Do an extra sphere render pass where I set up the stencil, like decrement for the front side and increment for the back side of a polygon?
- Render the sphere again with the lighting pass enabled and set glStencilFunc correctly, right?

This is a crude description of how I think this might actually work, but I'm very fuzzy about the particular details.
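In code, my mental model of what the stencil passes compute per pixel is something like this (depth-fail style, as in shadow volumes / Carmack's reverse; depths in [0,1]; just a sketch to check my understanding, not GL calls):

```cpp
#include <cassert>

// Render the light volume with depth writes off and a LESS depth test:
// increment stencil when a back face FAILS the depth test, decrement when a
// front face FAILS it. Afterwards, stencil != 0 means the G-buffer sample at
// that pixel lies inside the volume.
int stencilAfterVolumePasses(float sceneDepth,
                             float frontFaceDepth, float backFaceDepth)
{
    int stencil = 0;
    if (backFaceDepth > sceneDepth)  stencil += 1;  // back face depth-fails -> INCR
    if (frontFaceDepth > sceneDepth) stencil -= 1;  // front face depth-fails -> DECR
    return stencil;
}
```

Samples entirely in front of the volume get +1 and -1 (net 0), samples behind it get neither, and only samples between the front and back faces end up nonzero.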

I've made some progress using the stencil culling, and it almost works. There is, however, one more issue: depending on the camera view, the stencil buffer seems to have trouble culling out fragments that are completely in front of the sphere.

The screenshot below shows the sphere viewed in direction (0,0,-1):

Well, this looks pretty good, actually. The blue background is intentional, by the way.

Now let's see how the result looks when the camera is facing direction (-1,0,0):

And in direction (0,0,1):

It seems that some floor and ceiling fragments covering the part of the sphere that reaches below the floor/ceiling failed the stencil test. This is especially strange, since it depends on the viewpoint. How can I fix that?