
Tag: Z-Buffer

I wrote an earlier posting, in which I tried to simplify the concept of Multi-Sampling. This posting will not make sense to the reader, unless he or she has already read the earlier posting, or already knows how Multi-Sampling is different from Full-Screen Anti-Aliasing.

After some thought, I’ve come to realize a major flaw in my earlier description. Even though the rendering of each triangle is unaware of the rendering of other triangles, a distinction nevertheless needs to exist, between how the ability of one triangle-edge to fill only part of a screen-pixel should affect the lighting of triangles belonging to the same model / entity, and how this should affect the lighting of triangles belonging to some other model / entity.

If two triangles belong to the same model, and the first fills 47% of a screen-pixel, then this should not make the second triangle less bright, and the two of them may yet succeed at filling that screen-pixel completely. Yet, if the second triangle belonged to another model rendered later, and assumed to be placed behind the first model, then its brightness should in fact be reduced to 53%.

I think that the only way this can be solved, is to involve another buffer. This one could be called a ‘Multi-Sample Mask’. Triangles are super-sampled, and start to fill this mask with single bits per super-sample, kind of like a stencil. Then, the triangles belonging to the same model / entity would be singly-sampled, but would only write their shaded color to the screen-pixel, to whatever degree the corresponding patch in the multi-sample mask fills the screen-pixel.

(By default, whatever fraction of the output-color results would be added to the screen-pixel, as long as the screen-pixels started out as zeroes or black, before rendering of the model / entity began.)

And then, before another entity can be rendered, the mask would need to be cleared – i.e. set back to zeroes.
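The mask idea above can be sketched for a single screen-pixel. This is only one possible reading of the description, not an account of how any GPU implements Multi-Sampling. The names `SAMPLES` and `render_entity`, the choice of 4 super-samples, and the use of a single grey-scale value per pixel are all assumptions made for illustration.

```python
SAMPLES = 4  # hypothetical number of super-samples per screen-pixel


def render_entity(triangles, pixel=0.0):
    """Accumulate one entity's triangles into a single grey-scale pixel.

    Each triangle is a pair (coverage_bits, shaded_value), where
    coverage_bits holds one bit per super-sample that the triangle covers.
    """
    mask = 0  # the multi-sample mask starts out cleared for each entity
    for coverage, value in triangles:
        new_bits = coverage & ~mask  # super-samples not yet claimed
        fraction = bin(new_bits).count("1") / SAMPLES
        pixel += value * fraction    # additive write, scaled by coverage
        mask |= coverage             # remember what has been filled
    return pixel, mask
```

Two triangles of the same entity that cover complementary halves of the pixel, such as `0b0011` and `0b1100`, then fill it completely without either one being dimmed. Calling `render_entity` again for the next entity starts from a cleared mask, which corresponds to the reset described above.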

As it stands, the Z-buffer would need to have the resolution of the Multi-Sample Mask – as if FSAA was being applied.

I think that the question, of whether only the edges of each entity will be anti-aliased, or of each triangle, will be answered by how often this mask is reset.

(Updated 12/06/2017 : )

(As it stood 12/05/2017 : )

AFAICT, this represents a special problem with alpha-textures, and alpha-entities.

In the past, when I was writing about hardware-accelerated graphics – i.e., graphics rendered by the GPU – such as in this article, I chose the phrasing, according to which the Fragment Shader eventually computes the color-values of pixels ‘to be sent to the screen’. I felt that this over-simplification could make my topics a bit easier to understand at the time.

A detail which I had deliberately left out, was that the rendering target may not be the screen in any given context. What happens is that memory-allocation, even the allocation of graphics-memory, is still carried out by the CPU, not the GPU. And ‘a shader’ is just another way to say ‘a GPU program’. In the case of a “Fragment Shader”, what this GPU program does can be visualized better as shading, whereas in the case of a “Vertex Shader”, it just consists of computations that affect coordinates, and may therefore be referred to just as easily as ‘a Vertex Program’. Separately, there exists the graphics-card extension, that allows for the language to be the ARB-language, which may also be referred to as defining a Vertex Program. ( :4 )

The CPU sets up the context within which the shader is supposed to run, and one of the elements of this context, is to set up a buffer, to which the given, Fragment Shader is to render its pixels. The CPU sets this up, as much as it sets up 2D texture images, from which the shader fetches texels.

The rendering target of a given shader-instance may be, ‘what the user finally sees on his display’, or it may not. Under OpenGL, the rendering target could just be a Framebuffer Object (an ‘FBO’), which has also been set up by the CPU as an available texture-image, from which another shader-instance samples texels. The result of that would be Render To Texture (‘RTT’).

The concept seems rather intuitive, by which a single object or entity can be translucent. But another concept which is less intuitive, is that the degree to which it is so can be stated once per pixel, through an alpha-channel.

Just as every pixel can possess one channel for each of the three additive primary colors: Red, Green and Blue, it can possess a 4th channel named Alpha, which states on a scale from [ 0.0 … 1.0 ] , how opaque it is.

This does not just apply to the texture images, whose pixels are named texels, but also to Fragment Shader output, as well as to the pixels actually associated with the drawing surface, which provide what is known as destination alpha, since the drawing surface is also the destination of the rendering, or its target.

Hence, there exist images whose pixels have a 4-channel format, as opposed to others, with a mere 3-channel format.

Now, there is no clear way for a display to display alpha. In certain cases, alpha in an image being viewed is hinted by software, as a checkerboard pattern. But what we see is nevertheless color-information and not transparency. And so a logical question can be, what the function of this alpha-channel is, which is being rendered to.

There are many ways in which the content from numerous sources can be blended, but most of the high-quality ones require, that much communication takes place between rendering-stages. A strategy is desired in which output from rendering-passes is combined, without requiring much communication between the passes. And alpha-blending is a de-facto strategy for that.

By default, closer entities, according to the position of their origins in view space, are rendered first. What this does is put closer values into the Z-buffer as soon as possible, so that the Z-buffer can prevent the rendering of the more distant entities as efficiently as possible. 3D rendering starts when the CPU gives the command to ‘draw’ one entity, which has an arbitrary position in 3D. This may be contrary to what 2D graphics might teach us to predict.
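The benefit of rendering closer entities first can be illustrated for a single pixel. The sketch below assumes a plain 'less-than' depth test and a hypothetical helper named `count_shaded`. It only counts how many fragments would survive the test, and is not meant to describe any particular engine.

```python
def count_shaded(depths):
    """Count the fragments that pass a 'less-than' depth test at one pixel.

    `depths` lists the fragment depths in the order the entities are drawn.
    """
    z_buffer = float("inf")  # nothing has been drawn yet
    shaded = 0
    for z in depths:
        if z < z_buffer:     # fragment is closer than what is stored
            z_buffer = z     # write the Z-buffer
            shaded += 1      # this fragment would be shaded
    return shaded
```

With three overlapping fragments, drawing them front-to-back as `[1.0, 2.0, 3.0]` shades only one of them, while drawing them back-to-front as `[3.0, 2.0, 1.0]` shades all three.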

Alas, alpha-entities – aka entities that possess alpha textures – do not write the Z-buffer, because if they did, they would prevent more-distant entities from being rendered. And then, there would be no point in the closer ones being translucent.

The default way in which alpha-blending works, is that the alpha-channel of the display records the extent to which entities have been left visible, by previous entities which have been rendered closer to the virtual camera.
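As a minimal sketch of that idea, the function below composites layers front-to-back for one grey-scale pixel, and keeps a running 'visibility' value that plays the role of the destination alpha. The name `composite_front_to_back` and the exact arithmetic are assumptions made for illustration. The blend settings that a real engine chooses may differ.

```python
def composite_front_to_back(layers):
    """Blend (color, alpha) layers, listed closest-first, into one pixel.

    Returns the accumulated color and the visibility that remains for
    anything rendered further away.
    """
    color = 0.0
    visibility = 1.0  # nothing closer has been drawn yet
    for src_color, src_alpha in layers:
        # The layer contributes only as much as closer layers left visible.
        color += visibility * src_alpha * src_color
        # Whatever this layer covers is no longer visible behind it.
        visibility *= 1.0 - src_alpha
    return color, visibility
```

A half-opaque white layer in front of a fully opaque white layer, `[(1.0, 0.5), (1.0, 1.0)]`, yields a color of 1.0 and leaves a visibility of 0.0 for anything further back.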

Any game-engine currently on the market, uses the GPU of your computer – or your tablet – to do most of the work of rendering 3D scenes to a 2D screen, that also represents a virtual camera-position. There are two constants about this process which the game-engine defines, which are the closest distance at which fragments are allowed to be rendered, which I will name ‘clip-near’, and the maximum distance rendering is to be extended to, which I will name ‘clip-far’.

Therefore, what some users might expect, is that the Z-buffer, which determines the final outcome of the occlusion of the fragments, should contain a simple value from [ clip-near … clip-far ) . However, this is not truly how the Z-buffer works. And the reason why has to do with its origins. The Z-buffer belonging to the earliest rendering-hardware was only a 16-bit value, associated with each output pixel! And so a system needed to be developed that could use this extremely low resolution, according to which distances closer to (clip-near) would be spaced closer together, and according to which distances closer to (clip-far) could receive a smaller number of Z-values, since at that distance, the ability of the player even to distinguish differences in distances, was also diminished.
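To show what such uneven spacing looks like, here is a sketch using one commonly-used non-linear mapping, which is 0 at clip-near and approaches 1 at clip-far. It is an assumption chosen for illustration, and is not necessarily the mapping that this posting goes on to describe. The function name `depth_value` is hypothetical.

```python
def depth_value(z, clip_near, clip_far):
    """Map a view-space distance z to a non-linear depth fraction.

    The result is 0.0 at clip_near and approaches 1.0 at clip_far, with
    most of the range spent on distances close to clip_near.
    """
    scale = clip_far / (clip_far - clip_near)
    return scale * (1.0 - clip_near / z)
```

With clip-near at 1 and clip-far at 100, a distance of only 2 already maps to a depth value slightly above 0.5, so that half of the available Z-values are spent on the first unit of distance.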

And so the way hardware-rendering began, was in this Z-buffer-value representing a fractional value between [ 0.0 … 1.0 ) . In other words, it was decided early-on, that these 16 bits followed a decimal point – even though they were ones and zeros – and that while (0) could be reached exactly, (1.0) could never be reached. And, because game-engine developers love to use 4×4 matrices, there could exist a matrix which defines conversion from the model-view matrix to the model-view-projection matrix, just so that a single matrix could minimally be sent to the graphics card for any one model to render, which would do all the necessary work, including to determine screen-positions and to determine Z-buffer-values.
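The 16-bit fraction described above can be sketched as follows. The helpers `to_z16` and `from_z16` are hypothetical, and follow the wording of this posting rather than the storage format of any specific graphics card.

```python
def to_z16(depth):
    """Quantize a depth fraction into 16 bits that follow the binary point."""
    code = int(depth * 65536)
    return min(max(code, 0), 65535)  # clamp into the 16-bit range


def from_z16(code):
    """Recover the fractional depth that a 16-bit code represents."""
    return code / 65536
```

The largest code, 65535, decodes to 65535 / 65536, which is slightly less than 1.0, while the code 0 decodes to exactly 0.0.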

The rasterizer is given a triangle to render, and rasterizes the 2D space between, to include all the pixels, and to interpolate all the parameters, according to an algorithm which does not need to be specialized, for one sort of parameter or another. The pixel-coordinates it generates are then sent to any Fragment Shader (in modern times), and three main reasons their number does not actually equal the number of screen-pixels are:

Occlusion obviates the need for many FS-calls.

Either Multi-Sampling or Super-Sampling tampers with the true number of fragments that need to be computed, and in the case of Multi-Sampling, in a non-constant way.

“Alpha Entities”, whose textures have an Alpha channel in addition to R, G, B per texel, are translucent and do not write the Z-buffer, thereby requiring that Entities behind them additionally be rendered.

And so there exists a projection-matrix which I can suggest which will do this (vertex-related) work:

One main assumption I am making, is that a standard, 4-component position-vector is to be multiplied by this matrix, which has the components named X, Y, Z and W, and the (W) component of which equals (1.0), just as it should. But as you can see, now, the output-vector has a (W) component, which will no longer equal (1.0).

The other assumption which I am making here, is that the rasterizer will divide (W) by (Z), once for every output fragment. This last request is not unreasonable. In the real world, when objects move further away from us, they seem to get smaller in the distance. Well in the game-world, we can expect the same thing. Therefore by default, we would already be dividing (X) and (Y) by (Z), to arrive at screen-coordinates from ( -1.0 … +1.0 ), regardless of what the real-world distances from the camera were, that also led to (Z) values.

This gives the game-engine something which photographic cameras fail to achieve at wide angles: Flat Field. The position from the center of the screen, becomes the tangent-function, of a view-angle from the Z-coordinate.

Well, to divide (X) by (Z), and then to divide (Y) by (Z), would actually be two GPU-operations, where to scalar-multiply the entire output-vector, including (X, Y, Z, W) by (1 / Z), would only be one GPU-operation.
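The scalar multiplication, and the tangent relationship it produces, can be sketched as follows. The function name `perspective_divide` is hypothetical, and the sketch takes the posting at its word in dividing by (Z), which is an assumption about the intended convention.

```python
import math


def perspective_divide(v):
    """Scale a 4-component vector (x, y, z, w) by 1 / z in one step."""
    x, y, z, w = v
    inv_z = 1.0 / z  # computed once, then applied to every component
    return (x * inv_z, y * inv_z, z * inv_z, w * inv_z)


# A point seen 30 degrees off-axis, at a distance of 5 units along Z.
angle = math.radians(30.0)
point = (5.0 * math.tan(angle), 0.0, 5.0, 1.0)
screen_x = perspective_divide(point)[0]  # close to math.tan(angle)
```

Whatever the distance along Z, the resulting `screen_x` depends only on the view-angle, which is the Flat Field property mentioned above.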

Well in the example above, as (Z -> clip-far), the operation would compute:

Of course I understand that a modern graphics card will have a 32-bit Z-buffer. But then all that needs to be done, for backwards-compatibility with the older system, is to receive a fractional value that has 32 bits instead of 16.

Now, there are two main derivations of this approach, which some game engines offer as features, but which can be achieved just by feeding in a slightly different set of constants to a matrix, which the GPU can work with in an unchanging way: