Vertex shader transformations

Let's recapitulate what the vertex shader does: it receives a vertex in model space, outputs a vertex in projected eye space, and optionally attaches varying data to the vertices, which is then interpolated across triangles and handed to the fragment shader at each fragment position.
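In old-style (GLSL 1.2 / compatibility profile) terms - which the code in this text uses - even the most minimal vertex shader shows this chain:

```glsl
varying vec4 color;   // varying data, interpolated across the triangle

void main()
{
    // attach per-vertex data for the fragment shader
    color = gl_Color;
    // model space -> projected eye space
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}
```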

We can utilize the vertex shader anywhere along this chain for:

optimizing performance by doing expensive computations only per vertex where possible (i.e. when triangles are large, so there are far fewer vertices than fragments)

changing or distorting the mesh

transforming parts of the computation to a special coordinate system

culling parts of the mesh based on a condition

Let's go through examples for some of these!

Mesh distortions

Sometimes, you may want to animate a mesh in a way that can't be put into transformation matrices. Consider the wing beat of a bird (or a dragon) for instance - it's a complicated motion that not only moves the wings up and down but also rolls from the leading wing edge to the trailing edge - definitely not captured by a simple rotation around the wing attachment point. For an example, watch the dragon beat her wings and then sweep them backward to dive down into the Grand Canyon:

(video capture by Wayne Bragg)

We can achieve such distortions in the vertex shader. This is most conveniently done in model coordinates. If we arrange the model such that the head points in the y-direction and the body axis is at x=0, we know that the wings stretch in the positive and negative x-direction. If we require y=0 in the center of the wing, we further know that the difference between the leading and trailing wing edge lies along the positive and negative y-direction.

Based on this knowledge, we can define a vertical mesh displacement function which is non-linear: in the case of the dragon, it's in fact linear up to the arm reach and parabolic in the x-direction beyond that. In addition, there's a forward displacement of the whole mesh growing with x for the upward stroke and a backward displacement for the downward stroke.

The whole motion is driven by the periodic (but not quite sinusoidal) wingbeat parameters wingflex_alpha and a slightly lagging wingflex_trailing_alpha. These are engineered such that there's a time difference between the upward and downward stroke. Both parameters appear as scale factors in the displacement function.
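To make this concrete, here is a hedged sketch of such a displacement function - only the two uniform names come from the text; the arm-reach constant and the exact blend between leading and trailing edge are illustrative assumptions:

```glsl
uniform float wingflex_alpha;           // main wingbeat parameter
uniform float wingflex_trailing_alpha;  // slightly lagging parameter

// vertical displacement of a wing vertex at model coordinates (x, y):
// linear up to the arm reach, parabolic beyond it (constants assumed)
float wing_displacement(float x, float y)
{
    float arm_reach = 0.8;
    float ax = abs(x);
    float flex = (ax < arm_reach) ? ax
                 : arm_reach + (ax - arm_reach) * (ax - arm_reach);
    // leading edge (y > 0) follows wingflex_alpha, trailing edge lags behind
    float alpha = mix(wingflex_trailing_alpha, wingflex_alpha, step(0.0, y));
    return alpha * flex;
}
```

In the shader this would be applied as something like `vertex.z += wing_displacement(vertex.x, vertex.y);` before the final projection.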

In addition, the wing sweep is controlled by a uniform wingsweep_factor - which inhibits the flapping motion and induces a different kind of displacement function in which the mesh is distorted inward-backward.

The relevant parts of the vertex shader responsible for the mesh transformations look like this:

// if the wings are folded, we sweep them back
vertex.y += 0.5 * x_factor * wingsweep_factor;

float sweep_x = 0.5;
if (vertex.x > 0.0) {sweep_x = -0.5;}

vertex.x += sweep_x * (1.0 + 0.5 * x_factor) * wingsweep_factor;

gl_Position = gl_ModelViewProjectionMatrix * vertex;

By arranging the original model coordinates in a suitable way and cleverly designing the displacement function, quite compelling visuals can be achieved without having to animate tens of different submodels separately.

Rotation matrices

Imagine you have a model which you want to render with correct attitude with respect to an environment scene (for instance, you're interested in the environment reflecting in glossy parts of your model). Model coordinates are unsuitable for the task as they carry no information on how your model has moved. Eye coordinates are not particularly useful either, because it's not straightforward to see how the environment is oriented in eye space (in particular, finding the skyward up direction in eye space isn't trivial).

In fact, it'd be best to do the computation in local horizon coordinates in which the z-direction is up, x and y can point north and east, and we just rotate the model into the correct attitude within this coordinate system.

Such attitude rotations are described by a rotation matrix. Unfortunately, the conventions for these are not unique, so you need to pick the right one. In the flight simulation context, pitch, yaw and roll are usually used to describe the attitude of the vehicle, so we're interested in the rotation matrix for Tait-Bryan angles.

What makes rotations yet more awkward is that the order in which they're carried out matters. In particular, the combination of pitch, yaw and roll means: assume you have an object pointing north with the nose on the horizon in upright attitude. First rotate around the object's z-axis ('up') by the yaw angle to get to the correct heading. Then rotate around the object's y-axis (which was the east-west axis before the yaw rotation; for an aircraft it's the axis along which the wings stretch) to bring the nose up or down to a certain pitch. Finally rotate around the object's nose-to-tail axis by the roll angle.

Since the axis of each later rotation is changed by the previous rotations, the precise order is important and any other order will not do. The exception here is the yaw rotation, which is always carried out around the z-axis perpendicular to the horizon plane.

We can construct the appropriate rotation matrix from pitch, yaw and roll passed as uniforms inside the vertex shader. First, let's define two matrix-generating functions as helpers:
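The original listing is not reproduced here; a plausible sketch of such helpers could look like the following - the function names, the Rodrigues axis-angle form, and the assumption that the nose points along the x-axis with z up are all illustrative choices, and the signs depend on the model's axis conventions:

```glsl
uniform float yaw;
uniform float pitch;
uniform float roll;

// rotation about the z-axis (GLSL matrix constructors are column-major)
mat4 rotation_z(float angle)
{
    float s = sin(angle);
    float c = cos(angle);
    return mat4( c,   s,   0.0, 0.0,
                -s,   c,   0.0, 0.0,
                 0.0, 0.0, 1.0, 0.0,
                 0.0, 0.0, 0.0, 1.0);
}

// rotation about an arbitrary axis (Rodrigues rotation formula)
mat4 rotation_axis(float angle, vec3 axis)
{
    float s = sin(angle);
    float c = cos(angle);
    float t = 1.0 - c;
    vec3 a = normalize(axis);
    return mat4(t*a.x*a.x + c,     t*a.x*a.y + s*a.z, t*a.x*a.z - s*a.y, 0.0,
                t*a.x*a.y - s*a.z, t*a.y*a.y + c,     t*a.y*a.z + s*a.x, 0.0,
                t*a.x*a.z + s*a.y, t*a.y*a.z - s*a.x, t*a.z*a.z + c,     0.0,
                0.0,               0.0,               0.0,               1.0);
}

void main()
{
    // yaw first, then pitch, then roll (intrinsic z-y'-x'' order);
    // nose assumed along x, wings along y, z pointing up
    mat4 finalMatrix = rotation_z(yaw)
                     * rotation_axis(pitch, vec3(0.0, 1.0, 0.0))
                     * rotation_axis(roll,  vec3(1.0, 0.0, 0.0));
    // finalMatrix now rotates model-space vectors into the horizon frame
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}
```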

In a similar way, a rotation matrix for any available angle convention can be constructed and used to move a mesh into the desired orientation:

vec4 targetCoord = finalMatrix * gl_Vertex;

Object positioning

The way we have discussed the rendering pipeline so far, objects are correctly positioned in the scene by the translation part of the transformation matrix from model to eye space. That, however, may not be the most efficient way to accomplish the task.

Imagine we want to render a forest with O(10,000) trees. We have just a few basic types of trees, and we want to repeat them over and over, possibly just scaling their size and rotating them to make the repetition less obvious.

gl_ModelViewProjectionMatrix is technically a uniform mat4 - which means it's constant across each draw call the GPU executes. Since all trees have different positions and potentially sizes, that means we need a separate draw call for every tree. As we will discuss later, that's very inefficient, because a GPU loses a sizable chunk of time switching from one draw call to the next. Thus, we can't use this solution.

We could pre-compute all the translations and bake them into the trees in model space; then all trees could be processed in a single draw call, and all gl_ModelViewProjectionMatrix would do is change from model to eye space without any scalings and translations. That works fast, but is not particularly memory-efficient: if a single tree model has 100 vertices, 10,000 trees have a million vertices.

There's a better way to do it: we make just one copy of a tree in model space (which means we need to store its 100 vertices only once in memory and otherwise just pass references to them) and append the position data as a vertex attribute (again, we need only one position offset for the whole tree, so we hold the translation and scale vectors once and reference them for the rest of the mesh). The vertex shader is then responsible for correctly placing and scaling the object.

We end up with a special-purpose shader, of course - all this shader will ever do is render trees. But that means we can repurpose existing vertex attributes: we can for instance hard-code the color of the mesh into the shader and use gl_Color to hold position data. As argued earlier, if branches are represented by texture sheets, the normal of a tree has no real meaning either - so we may use gl_Normal for scaling.
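A hedged sketch of such a tree shader - the hard-coded color value and the use of gl_Normal.x as the scale factor are illustrative assumptions:

```glsl
void main()
{
    vec4 vertex = gl_Vertex;
    vertex.xyz *= gl_Normal.x;   // per-tree scale, smuggled in via the normal
    vertex.xyz += gl_Color.xyz;  // per-tree position, smuggled in via the color
    vertex.w = 1.0;
    gl_Position = gl_ModelViewProjectionMatrix * vertex;
    // mesh color is hard-coded since gl_Color is taken (assumed value)
    gl_FrontColor = vec4(0.35, 0.5, 0.2, 1.0);
}
```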

Culling

For applications like LOD systems, it's sometimes useful to drop objects from further processing early on. Taking the tree example above, we may for instance want to drop half of the trees beyond some distance to improve rendering performance.

In the vertex shader, this can be done pretty efficiently by moving a vertex out of the view frustum and performing no other operations - the vertex will then be dropped by the rasterizer.

Say we have a uniform detail_range and have already computed the distance to a tree mesh (and, as above, the actual tree position is contained in gl_Color.xyz). We can then, for example, key on the fractional part of one of the position components to divide the trees into two groups (those with a fractional x-component of less than 0.5, and the others - visually they're hardly distinguishable, as a tree is much larger than half a coordinate meter) and only process one of the groups beyond a distance. The relevant structure in the vertex shader looks like:
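A hedged sketch of that structure - detail_range and the gl_Color.xyz position come from the text, while the variable name dist and the exact clip-space parking spot are assumptions:

```glsl
if ((dist > detail_range) && (fract(gl_Color.x) > 0.5))
{
    // park the vertex outside the view frustum and do nothing else -
    // the whole tree is then clipped away before rasterization
    gl_Position = vec4(0.0, 0.0, 10.0, 1.0);
    return;
}

// ... normal placement and projection for the trees we keep ...
```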