Perspective Projection in OpenGL

Until now we have only defined the focal length $f$ as the distance between the image plane and the camera center, but nothing was stated about the size of the image plane in the $x$- and $y$-directions.

In the end, only the ratio between the size of the image plane and the focal length matters, and this ratio is uniquely defined by the opening
angle $\Theta$. All configurations with the same opening angle result in the same image (only with scaled $x$- and $y$-coordinates).

In OpenGL, the size of the image plane is always chosen such that the resulting $x$- and $y$-coordinates are in the range $[-1; 1]$.

For a given opening angle the focal length is therefore obtained by (compare figure):

$f = \frac{1}{\tan\left(\frac{\Theta}{2}\right)}$

Transformation Matrices in OpenGL

For the projection from the camera coordinate system into the image plane the GL_PROJECTION matrix is used.

The manipulation of this matrix is activated by

glMatrixMode(GL_PROJECTION);

All functions for matrix manipulation, such as glLoadIdentity, glLoadMatrix, glMultMatrix, glRotate,
glScale, glTranslate, glPushMatrix, glPopMatrix, gluPerspective are then executed on the GL_PROJECTION matrix.

The current state of the GL_PROJECTION matrix influences the transformation of objects only at the moment they are drawn (OpenGL is a state machine).
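As a minimal sketch of this setup (assuming a GLUT-style reshape callback; the opening angle of 60 degrees and the near/far values are illustrative choices, not prescribed here):

#include <GL/glu.h>

void reshape(int width, int height)
{
    glViewport(0, 0, width, height);                 /* map normalized device coordinates to the window */

    glMatrixMode(GL_PROJECTION);                     /* subsequent matrix calls act on GL_PROJECTION */
    glLoadIdentity();                                /* start from the identity matrix */
    gluPerspective(60.0,                             /* opening angle in y-direction (degrees) */
                   (double)width / (double)height,   /* aspect ratio */
                   0.1, 100.0);                      /* near- and far-plane */

    glMatrixMode(GL_MODELVIEW);                      /* switch back for object transformations */
}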

Example: "Dolly Zoom" or "Vertigo Effect"

The idea of the "Dolly Zoom" effect is to compensate a camera translation in the $z$-direction ("Dolly")
by a change in focal length ("Zoom").

Mathematically, it is easy to see from the projection equation that this compensation is possible,
because for the $y$-coordinate of a projected point we have:

$\tilde{p}_y = f \frac{p_y}{-p_z}$

Since there is only one focal length $f$ but typically many
3D points with different depth values $p_z$ in the scene, the compensation can only be
achieved for one selected depth value. This creates an interesting perspective effect.
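A small sketch of this compensation (the helper name dollyZoomFovy and its parameters are illustrative, not part of the OpenGL API): it keeps the image size of points at the selected reference distance constant while the camera moves.

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Keep f * p_y / d constant while the camera distance to the selected
   depth plane changes from d0 to d. With f = 1 / tan(theta / 2) this
   yields tan(theta' / 2) = tan(theta0 / 2) * d0 / d. */
double dollyZoomFovy(double theta0Deg, double d0, double d)
{
    double t = tan(theta0Deg * M_PI / 360.0) * d0 / d;  /* tan of half the new angle */
    return 2.0 * atan(t) * 180.0 / M_PI;                /* new opening angle in degrees */
}

The returned angle can be passed to gluPerspective each frame while the camera is translated along the $z$-axis.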

A well-known example is the movie Vertigo (1958)
by Alfred Hitchcock, who used this effect to convey the protagonist's dizziness.

In OpenGL, all transformations except for the projection matrix $\mathtt{A}$ are combined into the so-called GL_MODELVIEW matrix.

Thus, the GL_MODELVIEW matrix directly describes the transformation from the respective local coordinate system to the camera coordinate system

The GL_PROJECTION matrix $\mathtt{A}$ describes the mapping from the camera coordinate system into the image plane
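A vertex $\mathbf{p}$, given in local coordinates, is thus mapped to the image plane by the concatenation of both matrices (writing $\mathtt{M}$ for the GL_MODELVIEW matrix):

$\tilde{\mathbf{p}} \sim \mathtt{A}\,\mathtt{M}\,\mathbf{p}$

followed by the perspective division.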

gluLookAt

To simplify the definition of the matrix $\mathtt{T}_{\mathrm{\small cam}}^{-1}$, there is the GLU function

gluLookAt(eyex, eyey, eyez, refx, refy, refz, upx, upy, upz);

By specifying an eye point $\mathbf{C}_{\mathrm{\small eye}}$, a reference point $\mathbf{P}_{\mathrm{\small ref}}$ to look at,
and a vector $\mathbf{v}_{\mathrm{\small up}}$ (which defines the direction in which the $y$-axis of the camera points),
the basis vectors of the camera coordinate system can be computed:
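Following the GLU reference (with the camera looking along its negative $z$-axis, consistent with the projection equation above), the basis vectors are:

$\mathbf{z}_{\mathrm{cam}} = \frac{\mathbf{C}_{\mathrm{eye}} - \mathbf{P}_{\mathrm{ref}}}{\left\|\mathbf{C}_{\mathrm{eye}} - \mathbf{P}_{\mathrm{ref}}\right\|} \qquad \mathbf{x}_{\mathrm{cam}} = \frac{\mathbf{v}_{\mathrm{up}} \times \mathbf{z}_{\mathrm{cam}}}{\left\|\mathbf{v}_{\mathrm{up}} \times \mathbf{z}_{\mathrm{cam}}\right\|} \qquad \mathbf{y}_{\mathrm{cam}} = \mathbf{z}_{\mathrm{cam}} \times \mathbf{x}_{\mathrm{cam}}$

A typical call (the eye position and reference point are illustrative) places the camera at $(0, 2, 5)$ looking at the origin:

glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
gluLookAt(0.0, 2.0, 5.0,   /* eye point C_eye */
          0.0, 0.0, 0.0,   /* reference point P_ref */
          0.0, 1.0, 0.0);  /* up vector v_up */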

Z-Buffer

Depth Test

In the previous examples glEnable(GL_DEPTH_TEST)
and glClear(GL_DEPTH_BUFFER_BIT) were used without discussing their functionality

The function call glEnable(GL_DEPTH_TEST) activates the depth test in OpenGL.

If the depth test is disabled, the primitives are written into the framebuffer in the order in which they are passed into the OpenGL pipeline

This means that primitives drawn later cover those drawn earlier.

This is typically not the desired behavior

Instead, primitives that are closer to the camera should cover more distant ones, regardless of the order of drawing

Ideally, the decision should be made per pixel in the framebuffer, because individual primitives can interpenetrate each other.

In OpenGL the Z-Buffer method is employed
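A minimal sketch of the required calls (GLUT-style; the function names init and display are illustrative):

#include <GL/glut.h>

void init(void)
{
    /* a depth buffer must have been requested at window creation, e.g. with
       glutInitDisplayMode(GLUT_RGB | GLUT_DOUBLE | GLUT_DEPTH) */
    glEnable(GL_DEPTH_TEST);  /* activate the depth test once */
}

void display(void)
{
    /* clear color and depth buffer together at the start of each frame */
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    /* ... draw primitives in any order; the depth test resolves visibility ... */

    glutSwapBuffers();  /* for double-buffered contexts */
}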

Z-Buffer Method

(Figure: the projection matrix $\mathtt{A}$ maps the camera coordinate system with axes $x$, $y$, $z$ to normalized device coordinates with axes $x$, $y$, $z$.)

Although, strictly speaking, the $z$-coordinate in the camera coordinate system is the one to consider, the depth test can be carried
out after the perspective division, since the depth relations are not changed.

However, in normalized device coordinates the $z$-axis is reversed with respect to the camera coordinate system,
i.e., more distant points have a larger $z$ (note the left-handed coordinate system here).

For points on the near-plane in the camera coordinate system, $\tilde{p}_z = -1$ now holds, and for points on the far-plane, $\tilde{p}_z = 1$.
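For reference (an explicit form, not derived above): with the standard perspective matrix and near/far distances $z_n$ and $z_f$, consistent with gluPerspective, the mapped depth after the perspective division is

$\tilde{p}_z = \frac{z_f + z_n}{z_f - z_n} + \frac{2\, z_f z_n}{(z_f - z_n)\, p_z}$

which indeed gives $\tilde{p}_z = -1$ for $p_z = -z_n$ and $\tilde{p}_z = 1$ for $p_z = -z_f$, and which is non-linear in $p_z$.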


The Z-Buffer method requires (in addition to the usual framebuffer which contains the color information) a depth buffer of the same dimensions, which contains the depth values

(Figure: framebuffer with color values and the corresponding depth buffer.)


At the beginning of the rendering process, the depth buffer is initialized with the z-values of the far-plane.
This is done in OpenGL using the command glClear(GL_DEPTH_BUFFER_BIT)

Writing a pixel in the frame- and depth-buffer occurs during the per-fragment operations in the OpenGL pipeline

The depth value for each pixel is interpolated by the rasterizer using the transformed vertex information

If the depth value of the pixel is smaller than the one currently stored in the depth buffer, the
color value is written into the framebuffer and the depth value into the depth buffer; otherwise both remain unchanged:

FOR each primitive
    FOR each pixel of the primitive at position (x,y) with color c and depth d
        IF d < depthbuffer(x,y)
            framebuffer(x,y) = c
            depthbuffer(x,y) = d
        END IF
    END FOR
END FOR

Z-Fighting

(Figure: depth resolution along the $z$-axis.)

The depth buffer has only limited accuracy: typically an integer value with 16, 24, or 32 bits of precision.

The interval [-1.0; 1.0] is mapped to [0.0, 1.0] and then to [0, MAX_INT], e.g., [0, 65535] for 16 bits

The value is rounded to the nearest integer
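A sketch of this quantization for a 16-bit depth buffer (the helper name quantizeDepth is illustrative):

#include <math.h>

unsigned short quantizeDepth(double zNdc)  /* zNdc in [-1.0; 1.0] */
{
    double z01 = 0.5 * zNdc + 0.5;         /* map [-1.0; 1.0] to [0.0; 1.0] */
    return (unsigned short)floor(z01 * 65535.0 + 0.5);  /* round to the nearest integer in [0, 65535] */
}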

Because the "Normalized device coordinates" have already been divided by $p_w$ the rounding errors for objects
close to the camera are smaller (and consequently their depth accuracy is higher)

Therefore, for distant primitives that lie close together, the so-called
"Z-Fighting" can sometimes be observed: random inaccuracies in the z-values cause at times the one, at times the other primitive to be shown.

To avoid Z-Fighting, it is important to choose the near- and far-plane with care, since these ultimately define the z-range over which the available integer depth values are spread.

Therefore, the near- and far-plane should be selected as close together as possible, such that they just enclose the depicted 3D scene.