Customized clipping volume

According to the current specification, the clipping volume is defined by:

-Wc <= Xc <= Wc
-Wc <= Yc <= Wc
Zmin <= Zc <= Wc

where Xc,Yc,Zc,Wc are the clip coordinates produced by the vertex shader as the components of gl_Position, and Zmin is either -Wc or 0 depending on the depth mode set by glClipControl. After clipping, {Xc,Yc,Zc} is divided component-wise by {Wc,Wc,Wc}, producing the normalized device coordinates.

Since the upper clipping bound (which is also the division vector) is assembled from three identical values {Wc,Wc,Wc}, all three coordinates {Xc,Yc,Zc} are divided by the same value. Technically, I see no reason to restrict the division vector to equal components (not necessarily {Wc=Wc=Wc}). By letting the vertex shader output the division vector explicitly, additional functionality can be achieved (discussed below).

So the proposed extension amends the shading language, adding an additional _optional_ output for the vertex shader stage:

out vec3 gl_PositionDiv;

If the vertex shader writes to gl_PositionDiv, then the clipping volume is defined by:

-gl_PositionDiv.x <= Xc <= gl_PositionDiv.x
-gl_PositionDiv.y <= Yc <= gl_PositionDiv.y
Zmin <= Zc <= gl_PositionDiv.z

where Zmin is either -gl_PositionDiv.z or 0 depending on the depth mode set by glClipControl. If the vertex shader does not write to gl_PositionDiv then that vector is automatically assembled as:

gl_PositionDiv = gl_Position.www

which is essentially equivalent to the fixed functionality implemented currently.
After clipping, the normalized device coordinates are calculated by dividing gl_Position.xyz component-wise by gl_PositionDiv.

The major problem this extension targets is poor z-buffer utilization due to the uniform division of the x, y and z components. Specifying separate division coefficients for xy and z opens new possibilities to control the distribution of depth values. In particular, the far clipping plane can be eliminated to allow drawing objects any distance away, and the near clipping plane can be set much closer than in conventional setups. This makes it possible to render large scenes without introducing separate cameras or an overpushed near clipping plane; special/oversized depth buffer formats would not be required either.

NOTE: gl_PositionDiv is defined in that post as a vec4 type. I cannot edit the post as the time limit has expired. But it might be even better to make gl_PositionDiv a four-component vector and use the fourth component as the clipping bound for Wc the same way the first three are used (unless that has a considerable performance cost). This would further increase the functionality of the proposed extension.

I can't sleep well having no answer on this topic - I want this extension that badly.
I even reposted it on the nVidia forums and placed a reference on the AMD forum - dead silence! Is there any way to contact anyone who can at least tell whether this extension is possible to implement at all? Anybody? Please!..

If you want this so badly, why don't you divide the components of gl_Position by your special division vector, set the w component to 1.0, and see if it really is what you want.

Division must be done after clipping. Doing it in the vertex shader is a potential source of overflow and division-by-zero for vertices that fall close to the clipping plane (the W plane, the "division plane").

Originally Posted by Agent D

Also, if it can be implemented in the shader by yourself and nobody else on earth needs it, why would you create a GL extension for it?

Do you even have a use case for this?

Right now I save the negated z-component, interpolate it, and write it as gl_FragDepth, implementing a linear z-buffer which lets me draw the scene without artifacts with zFar > 200000 and zNear = 0.001 (it can be even smaller) using a 24-bit depth buffer. With this setup the z-buffer can distinguish between surfaces 0.01 apart over the whole 200000 distance.
This is essentially equivalent to dividing the negated z-component by 1.0 (instead of Wc, which is still used for the x and y components) and using Zmin=0 instead of Zmin=-Wc (which may be set via glClipControl).

Making the components of the division vector separately specifiable, instead of uniformly set to the same value, would let us control the distribution of z-values independently of xy.

What will we get? We could draw incomparably larger scenes with an incomparably smaller constant zNear without needing to introduce additional cameras or dynamically adjust zNear, and even a standard 24-bit depth buffer would be sufficient.

Practically, with this extension we could attach a camera object as small as a human eye right onto the character's model, and it would capture the large open-world scene as well as the parts of the character's model the camera is mounted on. This would let game engines render and simulate the character's body uniformly with the other game objects, without "making it transparent" or introducing fake parts rendered individually, which causes them to not receive shadows or decals, not collide with other objects, etc.

Well, there are techniques that make this achievable even without the extension, yes, and there are games that prove it is all possible. But all those tricks have computational expenses, while the proposed extension adds no additional computational overhead: there are 3 values (Xc,Yc,Zc) being divided anyway, and I see no technical difference in dividing them by a vector with different components instead of equal ones. So all it takes is to let us specify those components explicitly instead of having them assembled by the fixed functionality. Or is there something I do not take into account?

The hardware units that deal with depth writes and testing are fixed-function units (as are color blending/writes); perhaps that's why only fixed 16-bit/24-bit and [0,1] FP32 formats are used. If the full FP32 range were available for the z-buffer, I would think we'd at least see an extension to glDepthRange allowing values outside [0,1]. (I also wouldn't mind seeing an increased z-buffer range for some bad cases artists create, btw.)

I see you did not read the actual definition - just the topic's name, ay?
The resulting NDC coordinates still fall in the range [-1...1]. Only the clipping bounds do I suggest making unequal - instead of
{Wc,Wc,Wc} vs {-Wc,-Wc,Zmin} I suggest using custom, possibly unequal, values for each of the coordinates:
{Wcx,Wcy,Wcz} vs {-Wcx,-Wcy,Zmin}, which are derived from the explicitly set division vector
{Wcx,Wcy,Wcz} instead of the single tripled value Wc taken from the fourth component of gl_Position.

Generally they tend to reduce the fixed functions, not extend them. I would rather have them remove perspective division altogether and let the shaders do it themselves if they need it.
Division is one of the more hardware-taxing operations; its cost is relatively high and it is good to avoid when not necessary.
Your proposal implies 3 divisions instead of a single one per vertex. (Note that dividing x, y and z by the same value w really means a single division (1/w) and 3 multiplications, which are far cheaper.)

Generally they tend to reduce the fixed functions, not extend them. I would rather have them remove perspective division altogether and let the shaders do it themselves if they need it.

Once again: perspective division must be done after clipping. You cannot clip primitives in the vertex shader. Division cannot be done when the divisor is zero or tiny, as the result is undefined. And clipping and dividing by the same number also ensures that the result lies in the range [-1...1]. My point is that this number may be different for different components, not necessarily the same for all.

Originally Posted by l_belev

Division is one of the more hardware-taxing operations; its cost is relatively high and it is good to avoid when not necessary.
Your proposal implies 3 divisions instead of a single one per vertex. (Note that dividing x, y and z by the same value w really means a single division (1/w) and 3 multiplications, which are far cheaper.)

?! O.O Sorry, but as far as I know, GPU hardware is vectorized - processing 4 items at once. Calculating 1/w is the same as calculating {1/w,1/w,1/w,1/w}. The source register may be stuffed with 4 different items to produce {1/wx,1/wy,1/wz,1/ww}, and it will take the same number of cycles as if all the values were equal - it doesn't matter what the actual contents of the source xmm register are (assuming hardware with SSE support).

At any rate, this extension idea seems terribly fishy anyways. As pointed out, normalize your w to one, then divide by those gl_PositionDiv factors. The only case where that is a problem is when one of the fields of gl_PositionDiv is negative or close to zero. But the world has not ended: you can do it all with a geometry shader and do the clipping yourself [a triangle clipped N times produces a triangle fan of no more than N+1 triangles, by the way]. Also, the entire point of dividing by the same value, w, is to do perspective. What it looks like you really want is just different depth values, which can be done by hand from the vertex shader anyways by normalizing yourself. The implicit requirement w>0 means in front of the eye (but not the near plane), which I assume you'd want anyways. Normalize the z's yourself is my advice.

Asking for a different divide value for each element of gl_Position.xyz would make the clipper (a fixed-function part of the GPU) take even more sand in order to keep the same triangles-per-clock performance, because various shortcuts for the most common situation (namely, all w's positive and away from zero) would be gone. On that subject, most hardware (if not all) has a dedicated unit handling triangle setup and clipping rolled into one. Additionally, most implementations have guard-band logic to avoid clipping and let scissoring do the job.

To be precise: if all of a triangle's w's are positive (and away from zero) and the z's are in the happy range [-1,1], then scissoring essentially takes care of the clipping volume. If all the w's are positive and some of the z's are icky, then the triangle needs to be clipped against just the two z requirements, which results in at most 3 triangles. The really ugly case is when one or more of the w's is negative (but not all); in that case the clipper more often than not does the full clipping computation against all the clipping planes. That part sucks, always sucks, and uses up a fair amount of sand. There has been hardware (like old Intel GPUs) that did not have a dedicated clipper; the clipping and divide work was done by the programmable EUs. It was not happy, so they added a dedicated clipper.

My advice: likely all you want is to normalize z your own way (which is just a VS job), but if you really want the whole enchilada, write a GS to implement what you are after.