Linearize the Depth buffer

Just want to share some thoughts about the famous depth buffer resolution issues when drawing large scenes.

It all comes from the uniform division of all coordinate components by w, which I think is undesirable in some cases.

As I am rendering a terrain as large as 100x100 km (!), I had to win the battle against Z-fighting. To do that, I had to linearize the Z-buffer.

I do understand that the default non-linear Z-buffer allows eye-to-world transformations to be done with a single inverse matrix multiplication, while linear depth requires some separate magic for the perspective-divided xy and the linear z value to achieve the same transformation. But IMO this disadvantage is not as big as the Z-fighting issues.

So let's estimate the precision aspect of the problem - what level of tolerance do we need from a Z-buffer? I think in most scenes surfaces hardly ever get closer than 1 mm to each other, so a tolerance of +-0.5 mm is enough for the Z-buffer to distinguish them. If the z-values are linearly related to the distance, what would be the maximum drawable distance with a 32-bit depth buffer? Surprisingly, it can handle over 2000 km (!), distinguishing surfaces 1 mm apart from each other. At any distance! With a 24-bit depth buffer we get about 8 km, but that is still not bad, is it?

Try to do that with the conventional Z-buffer, huh? The zNear would have to be pushed so far back that a second camera would be needed to cover the gap, and then there would be a mess with two depth buffers, or a fragmented one... Headache.

So how do we make a Z-buffer linear instead of the default perspective-divided?

In some places I've seen the advice to premultiply the vertex shader output's z-component by w, so that after the unavoidable division by w we get the original z back. I came to this naive idea too, and THIS WAY IS WRONG!!! I got pop-through artifacts for primitives intersecting the clipping planes. The source of that artifact is clipping, which is performed BEFORE the division by w. Once a primitive gets clipped, new vertices are inserted, and their x,y,z,w are the product of linear interpolation, which results in an incorrect value assigned to the z-coordinate - it becomes bigger than it actually should be, letting the primitives just behind the rendered ones bleed through.

So the correct way to make the Z-buffer linear is to write the desired depth value directly into gl_FragDepth inside the fragment shader. We take z from the result of the model-view transformation, negate it, divide by zFar and get a depth value which is 0.0 at the camera's position and 1.0 at the most distant point in the scene. We calculate this value inside the vertex shader, store it in an additional output component so it is interpolated across the primitive, and simply write it directly to gl_FragDepth inside the fragment shader.
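Mirrored in host-side C for clarity (a sketch of the same math; z_eye is the z after the model-view transform, negative in front of the camera):

```c
/* Linear depth as described above: take z after the model-view
   transform, negate it and divide by zFar, giving 0.0 at the camera
   and 1.0 at the most distant point in the scene. */
double linear_depth(double z_eye, double z_far)
{
    return -z_eye / z_far;
}
```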

And here is the drawback: writing gl_FragDepth switches off early Z-cull, which is a very desirable feature, especially if the fragment shader's code is heavy.

And it seems silly to me that no extension exists yet which turns the division of the z-component by w on or off. Or better yet, let the vertex shader specify the clipping and division vectors explicitly instead of the fixed-functionality-generated {w,w,w,w}|{-w,-w,-w,w} pair for clipping, and {w,w,w,w*w} for division.

As far as I have read into it, there are 2 ways:
- calculating your own log depth and writing it to gl_FragDepth; a 16-bit buffer should then already give you really good results (but you lose possible hardware optimizations like hyper-Z and early-Z rejection)
- a 32-bit float depth buffer with an inverted range (possible with glDepthRangedNV from GL_NV_depth_buffer_float, also supported by AMD/ATI but not by Intel)

Here "MaxViewDistance" is the maximal distance a point in the scene could possibly have. As a result, gl_Position.w ranges over (0.0, 1.0], and with the setup shown the near clipping plane is located 1 mm away from the camera's plane and there is no far clipping plane - instead, the assumption is that no point can be further away from the camera than "MaxViewDistance".
"LinearFragDepth" is an output variable linearly interpolated across the polygon and written as the final fragment depth inside the fragment shader:

Code :

gl_FragDepth = LinearFragDepth;

Yeah, simply writing the interpolated and scaled depth value directly to the z-buffer, saving it from the f@#%ing division, which we cannot avoid FOR NO REASON - that is the only thing so far I've learned to hate in my beloved OpenGL.
To unproject the coordinates back into View space, we need to divide XY by the depth (Z), negate Z, and then multiply all three by MaxViewDistance. As simple as that. No brainwashing remappings required.

This z-linear algorithm works fine for my terraforming engine; however, the fps suffers from the unavailability of early Z-cull, because I touch gl_FragDepth, which switches that feature off. And that is so silly! In fact, I just write a statically interpolated attribute - the same could be done by the fixed functionality by simply NOT dividing one of the components by w.

And this clip-space z range [-1...1] being mapped to [0...1] - it doesn't make sense to me either. An additional operation included just for brainwashing? It takes one glance at a single line of DirectX code to hate it, but some things in it are better than they are in OpenGL. Such a shame, ay?

Can we expect more control over the clipping and perspective division operations in future releases of OpenGL? Or are they here already, unknown to me?

Originally Posted by Yandersen

As I am rendering a terrain of a size as large as 100x100 km (!), I had to win in the battle of Z-fighting. I had to linearize the Z-buffer in order to do that.

That is not an extremely huge terrain. Even if the viewer stands on the edge of that block, the whole terrain should be visible without artifacts with a 24-bit depth buffer by setting the front clipping plane at 1 m and the back clipping plane at 100 km.
I'm using far/near = 1e5 and it suits a 24-bit depth buffer fine. Just enable back-face culling.

Originally Posted by Yandersen

So let's estimate the precision aspect of the problem - what level of tolerance do we need from a Z-buffer? I think, in most scenes the surfaces hardly get closer than 1 mm to each other. So having the tolerance of +-0.5mm for Z-buffer to distinguish is enough. In case the z-values linearly related to the distance, what would be the maximum drawable distance with a depth buffer of 32-bit precision? Surprisingly, it can handle over 2000 km (!), distinguishing the surfaces which are 1mm apart from each other. At any distance! In case of 24-bit depth buffer we get 8 km only, but it is still not bad, isn't it?

I'm not sure I understand this math. There is no basis for such general claims. The size of the object, or to be more precise the screen-space error, depends not only on the distance but also on the FOV. The pixel is what we should use for distinguishing objects, not a distance in millimeters. Or are you talking about the front clipping plane when you mention tolerance?

Many years ago, I also thought a regular 24-bit depth buffer was not enough for large terrain rendering, but then, after switching to a topocentric coordinate system, I realized that I don't need higher precision. The near clipping plane is usually kept at 1/2 or 1/3 of the distance to the nearest object or terrain surface, while the far clipping plane is 1e5 times further away. If there is an object near the viewer's eyes while he looks at the distant horizon, it probably should be rendered in a separate pass with different clip distances. By the way, to have a realistic display of such a scene, either that object or the horizon should be blurred.

Well, currently I have far/near = 30000/0.001 = 3e7 and a 24-bit depth buffer which I fill with linear Z values, and it works fine too. As the maximum value stored as uint24 corresponds to the distance 30000 and the minimum to 0, I have a buffer with constant precision over the whole range, capable of distinguishing two surfaces 30000/2^24 ~= 0.00179 apart from each other, no matter where exactly in the scene they are positioned.

Originally Posted by Aleksandar

I'm not sure I understand this math. There is no bases for such general claims. The size of the object, or to be more precise a screen-space error depends not only on the distance, but also on FOV. The pixel is something we should used for distinguishing object, not distance in millimeters. Or you are talking about front clipping plane when you mention tolerance.

Sorry for being a stubborn old-fashioned European here, but for me the value 1.0 stands for "1 meter (m)" and 0.001 is "1 millimeter (mm)". In other words, I implicitly tag the abstract numbers to give them some real-world meaning, so they taste more realistic, you know.

Originally Posted by Aleksandar

The near clipping plane is usually kept at 1/2 or 1/3 of the distance to the nearest object or terrain surface

Yes, one has to mess around doing these checks all the time, and we still look through walls in games and have an invisible body for the character being played. C'mon, let's move on, let the camera be simulated like a real object - let us place it wherever we want on the model. A "real" camera is OK with zNear being a couple dozen millimeters away from the view's center, and it has to be able to "see" hundreds of kilometers away. That is not possible with the conventional Z-buffer, but it becomes trivial with a linear Z-buffer by simply avoiding the w division for the third component. The target of the rendering pipeline is the projection from 3D onto a surface. The surface has x and y coordinates, but z is just an attribute. It does not need to be divided or clipped by the same values as x and y; it has to be handled differently, as it serves a different purpose. And as I see, in GL4.5 there is some DirectX-inspired movement in that direction - glClipControl, which allows changing the clipping volume for z from -1...1 to 0...1. Good. But not enough yet, I think.

---

Back to the topic of clipping and perspective division.

The fixed functionality takes the w component of whatever was written to gl_Position and generates 2 vectors from it: {w,w,w} and {-w,-w,-w}, which are used to clip the xyz values. (With the help of glClipControl the second vector can be {-w,-w,0}.) After clipping, new primitives are generated which lie entirely inside the clip volume. That means that the subsequent division of the components by the upper bound ({w,w,w}) brings the values into the range [-1...1] (for z it could be [0...1]). And from that point of view, I see no reason for fixed functionality here: the division vector could be an output of the vertex shader just like gl_Position, so the vertex shader could assemble it not necessarily as {w,w,w}, but place any values into it. If that were allowed, a linear Z-buffer could be implemented as simply as:
gl_PositionDiv = vec3(gl_Position.ww, zFar);
the orthographic projection could be done similarly:
gl_PositionDiv = vec3(1.0,1.0,1.0);
the conventional perspective projection:
gl_PositionDiv = gl_Position.www;
So gl_PositionDiv, together with an automatically generated negated version of it, would be used for clipping, and division by gl_PositionDiv is made afterwards to bring the coordinates into the range [-1...1].

We can even push more functionality into gl_PositionDiv by expanding it from 3 components to 4, using the last component for an additional clip test based on the value of gl_Position.w, which would serve as a "free" clipping plane.

gl_PositionDiv can essentially substitute for the projection matrix, making it obsolete in some way:
gl_Position = ModelViewMatrix * VertexCoord;
gl_PositionDiv = vec4(a*(-gl_Position.z), b*(-gl_Position.z), -far, (-gl_Position.z)/near);
where 'a' = tan(HalfAngle)*aspect and 'b' = tan(HalfAngle).
Such a setup makes a perspective projection with a linear z-buffer. The last component, w, is discarded after it is used in clipping. In the example above it serves as a cull plane, culling the geometry behind the near clipping plane.

The conventional projection, defined as
gl_PositionDiv = gl_Position.wwww;
is in most cases equal to
gl_PositionDiv = -gl_Position.zzzz;
It could be modified to allow better z-buffer utilization by dividing the third component not by -z, but by -z with some offset:
gl_PositionDiv = vec4(-gl_Position.zz, -gl_Position.z+zOffset, -gl_Position.z);
As a result, the distribution of depth values will be close to linear in the area near the viewer, but for more distant objects the precision will degrade, approaching 0 at infinite distance. Such a mapping of the Z-buffer may be the most optimal solution, I think.

There are numerous tricks that could be played with the gl_PositionDiv variable. If the shader does not write to it, it is automatically constructed (gl_PositionDiv = gl_Position.wwww), resulting in the equivalent of the conventional fixed functionality - just like in the case of gl_FragDepth.

What do you guys think about such a GLSL extension, which would allow explicitly specifying the division vector?

Originally Posted by Yandersen

Well, currently I have far/near = 30000/0.001 = 3e7 and 24 bits of depth buffer which I stuff with the linear Z values, and it works fine too. As a maximum value stored as uint24 corresponds to the distance 30000 and the minimum is 0, then I have the buffer with constant precision over the whole range, capable to distinguish between two surfaces 0.00179 apart from each other, no matter where exactly in the scene are those positioned.

Can you provide some proof of this? A screenshot or something else.
Because with those settings the polygons are scattered all over the scene much before 30 km. Take a look at the link. pfd.cDepthBits = 24!
The near clipping plane should be set to at least 1 m in order to have consistent rendering without artifacts.

On the other hand, what should be viewed at 1 mm from the eye?

Originally Posted by Yandersen

Yes, one have to mess-up doing the checkings all the time, and we still look through the walls in games and have invisible body of the character being played. C'mon, let's move on, let the camera be simulated like it is a real object - let place it wherever we want on the model. The "real" camera is OK with zNear to be a couple dozen of millimeters away from the view' center and it has to be able to "see" hundreds kilometers away.

I'm not sure what you are developing, but every game should have some kind of collision detection, so the engine has to check distances anyway. Even with a 1e-6 m nearZ you can walk through a wall if you don't have collision detection.

Honestly, it is much easier to deal with higher precision and not care about settings and scene splitting (if it is really necessary). I do understand why you are trying to find a solution that could make your life easier, but as you can see, playing with depth values will just ruin the framerate.

Originally Posted by Aleksandar

Can you provide some proof of this? A screen shot or something else.
Because, with that settings the polygons are scattered all over the scene much before 30km. Take a look at the link. pfd.cDepthBits = 24!
Near clipping plane should be set to at least 1m in order to have consistent rendering without artifacts.

Should I repeat that I use a linear z-buffer? Here, I dropped the camera as close to the ground as I could without intersecting the surface (it is around a couple of cm above the ground in the pic) and rendered a 240 km view distance (240000) with the near clipping plane set at 1 mm (0.001) - see the image in the attachment. Obviously, the fps is crippled with so many polygons rendered, but you asked for proof of the 24-bit linear z-buffer's capability, so there it is. If you want to see the code, I keep it all open here: https://drive.google.com/folderview?...Tg&usp=sharing
This is just a hobby for me, nothing commercial. And as I do all this for fun, I want to enjoy it. And zNear = 1 m makes me unhappy. Messing with two cameras makes me unhappy. Poor z-buffer resolution makes me unhappy. But solving all these problems with a linear z-buffer makes me very happy and proud of myself! And that is why I brought this topic up for discussion - to share the trick and make someone else happy, right? And if the proposed extension gets implemented, the z-buffer could be used more efficiently and everybody can be happy! Without the extension, the only drawback is the unavailability of early Z-cull, which does not work if the fragment shader writes depth - but I have no choice, because the fixed functionality likes to divide z, so I have to manually save it from that.

Originally Posted by Aleksandar

On the other side, what should be viewed at 1mm from the eye?

Um, the nose, I guess?..
I am trying to prove that by using the depth buffer in a more optimal way we can do much more and much better with it without making it bigger. The only thing in the way, IMHO, is the stubbornness of minds and legacy conventions.

Originally Posted by Aleksandar

I'm not sure what you are developing, but every game should have some kind of collision detection. So, the engine should check distances. Even with 1e-6m nearZ you can walk through the wall if you don't have collision detection.

With collision detection you wouldn't be able to push your camera with a 1 m zNear through a door anyway. It would be funny to watch the artifacts on the distant landscape suddenly appear as the character walks into a house, forcing the game engine to squeeze zNear to adjust to a short-distance environment - unless two cameras are implemented, with all the unnecessary mess that brings along. I really do not see the reason for all this. All the problems go away once the perspective division of x and y leaves the z-component alone. The easy implementation I suggested in the post above.

Originally Posted by Aleksandar

Honestly, it is much easier to deal with higher precision and don't care for settings and splitting scene (if it is really necessary). I do understand why you are trying to find a solution that could make your life easier, but as you can seen playing with depth values will just ruin the framerate.

That is why I suggest an extension which would allow the user to explicitly set the perspective division vector instead of letting the fixed functionality assemble it from gl_Position.www. Technically, what difference does it make which values go into that vector? Letting the user specify different values would greatly expand the functionality and potential of the shaders, allowing better use of the depth buffer, as I suggested in the post above.

I've given some thought to what would be the most optimal function for calculating the depth value, targeting the most optimal z-buffer utilization AND simulating the camera's perspective as closely to reality as possible. I've come to the conclusion that an inverse function is the best option:

Depth[-1...1) = (-z + a) / (-z + b);

This function removes the limitation of the far clipping plane, because the depth approaches 1 as z approaches negative infinity. As a consequence, the precision of the depth buffer is highest in the area close to the camera and degrades with the distance of the object being drawn. The good thing is that the rate at which precision degrades can be controlled by the 'a' and 'b' coefficients: half of the depth buffer is utilized at the distance z = a. Therefore, by changing 'a' we can control the distribution of the z values. The increase in precision near the camera results in a decrease of precision at further distances, which makes perfect sense, as distant objects are normally drawn with less detail. But in contrast to the standard approach, the distribution of z-values can be controlled to prevent excessive resolution near the camera even if the near clipping plane is very close.
To control the distribution of depth values, let's define two constants:

zNear - the distance from the camera (-z) closer than which all geometry should be clipped (corresponds to Depth == -1); consider this the position of the camera's lens;

zHalf - the distance from the camera (-z) at which half of the depth buffer is utilized (Depth == 0); approximately 1/3 of the desired view distance should be used there, IMO; and obviously, that value should be bigger than zNear;

And again, remember: Depth==1 is an unreachable limit.

Having the basic parameters specified, the exact formula looks like this:

Depth[-1...1) = (-z - zHalf) / (-z + zHalf - 2*zNear);

The resulting depth will be higher than -1 for all z below -zNear, and as z approaches negative infinity the depth approaches 1 (think of an object being drawn further and further away from the camera). So with this function the clip volume becomes an open half-space, letting fragments be drawn as long as their z is below the -zNear limit.

That removes the zFar clipping plane, which puts an artificial limit on the view distance.
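The formula is easy to sanity-check in plain C (a direct transcription; zNear and zHalf are the constants defined above):

```c
/* The inverse depth mapping proposed above:
   Depth == -1 at -z == zNear (near clip),
   Depth ==  0 at -z == zHalf (half the buffer used),
   Depth -> 1 as -z -> infinity (no far plane). */
double inverse_depth(double z, double z_near, double z_half)
{
    return (-z - z_half) / (-z + z_half - 2.0 * z_near);
}
```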

The x and y still have to be divided by -z to simulate the effect of perspective. And this contrasts with the z component, which we want to divide by a different value (-z + zHalf - 2*zNear). I see no other way to solve this efficiently than the extension (yeah, I will repeat myself, roll your eyes, guys) which would let the vertex shader output a custom division vector.

Emulating this without the extension involves the use of a clipping plane (it is possible to cheat here, though), adds an additional division in the vertex shader (which is done before clipping, and that gives divisions by 0 in some cases) and requires gl_FragDepth, which turns off early Z-cull.

Should I post the proposed extension somewhere in "Suggestions for the next release of OpenGL"?

A small trick I've just discovered, which I've never seen anywhere else so far: the projection matrix can easily be configured to have no far plane - just the near clipping plane. Such a projection matrix looks like this:

f/a  0    0    0
0    f    0    0
0    0   -1  -2*near
0    0   -1    0

where 'f' is the cotangent of half the view angle and 'a' is the aspect ratio of the screen (x/y), so those two values are the same as gluPerspective uses. The only structural difference from the gluPerspective matrix can be seen in the P[10] and P[14] members.
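A host-side C sketch of such an infinite-far matrix (column-major, OpenGL-style indexing, so the P[10] and P[14] members mentioned above are exactly the ones that differ from gluPerspective):

```c
/* Build the infinite-far perspective matrix in column-major order,
   as OpenGL stores it. f = cot(half view angle), a = aspect (x/y),
   same parameters as gluPerspective; there is no far plane. */
void infinite_perspective(double p[16], double f, double a, double z_near)
{
    for (int i = 0; i < 16; ++i) p[i] = 0.0;
    p[0]  = f / a;
    p[5]  = f;
    p[10] = -1.0;            /* gluPerspective: (far+near)/(near-far) */
    p[11] = -1.0;
    p[14] = -2.0 * z_near;   /* gluPerspective: 2*far*near/(near-far) */
}

/* NDC z produced by that matrix for an eye-space z:
   clip.z = -z - 2n, clip.w = -z  =>  ndc.z = (-z - 2n) / (-z). */
double infinite_ndc_z(double z_eye, double z_near)
{
    return (-z_eye - 2.0 * z_near) / (-z_eye);
}
```

At z = -near the NDC z is exactly -1, and it only approaches 1 as the distance grows without bound - the behavior described below.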

Using this type of perspective matrix, all geometry behind the near clipping plane is clipped (objects behind it have Zndc < -1). The value of Zndc approaches 1 as the object's distance (Zc) approaches negative infinity. As a result, geometry in front of the camera is never clipped ("theoretically"). But in practice, as I tested, due to precision errors distant objects suffer from flickering, and at very large distances some of them actually somehow reach the limit and partially disappear.

The distance at which flickering starts to occur depends only on the resolution of the depth buffer and the distance at which the near clipping plane is placed - the further away it is, the better. Just like with the conventional setup. The only difference is that one does not need to bother with a far clipping plane.

Rough testing has shown that with 'near' set to 0.1 and 24 bits of depth buffer precision, one can draw objects as far as ~20000 before the flickering becomes noticeable. The result seems consistent with the conventional far-clipped perspective matrix, IMO.

Has anyone ever encountered the usage of such perspective matrices? It does not seem to be common...
I understand that with both planes specified one can ensure that the whole depth buffer range is utilized. But in practice, if the ratio far/near is just 20, the difference in z-buffer utilization is only 10%, and for larger ratios the difference becomes indistinguishable. Therefore I see no good reason to bother with the far clipping plane set by the conventional perspective matrix - unless I am missing something?..

Originally Posted by Yandersen

Yeah, simply writing the interpolated and scaled depth value directly to the z-buffer, saving it from division, which we can not avoid FOR NO REASON - that is the only thing so far I learned to hate in my beloved OpenGL.

There's actually a good reason why the Z-buffer works the way it does: Z-buffer depth is linear in screen space. In other words, it's "noperspective".

One implication of this is that it can be interpolated cheaply. That might not seem so important on TFLOPS GPUs, but keep in mind that the Z-reject rate has to be high enough to avoid frequent pipeline bubbles (and MSAA multiplies the number of depth tests required per fragment). Another implication is that with screen-space linear values the depth buffer can be compressed efficiently using simple delta encoding schemes.

With view-space linear depth you lose both of these advantages. I would argue that you're better off storing Znear/Zview in a float depth buffer, since that is still linear in screen space and has a constant relative error.