Thursday, February 24, 2011

Many moons ago at Insomniac, we used to use a partial derivative scheme to encode normals into texture maps. Artists complained that it was having detrimental effects on their normal maps, and rightly so. Then one day, our resident math guru Mike Day turned me onto this little trick in order to make amends. I hadn’t come across it before so thought it might be worth sharing.

A common technique for storing normals in a texture map is to compact them down to a 2 component xy pair and reconstruct z in the pixel shader. This is commonly done in conjunction with DXT5 compression since you can store one component in the DXT rgb channels and the other in the DXT alpha channel, and they won’t cross pollute each other during DXT compression stages. DirectX 10 and 11 go a step further and introduce a completely new texture format, BC5, which caters for storing 2 components in a texture map explicitly.

The code to reconstruct a 3 component normal from 2 components typically looks like this:

Graphically, what we’re doing is projecting points on the xy plane straight up and using the height of intersection with a unit hemisphere as our z value. Since the x, y, and z components all live on the surface of the unit sphere, no normalization is required.

Is this doing a good job of encoding our normals? Ultimately the answer depends on the source data we’re trying to encode, but let’s assume that we make use of the entire range of directions over the hemisphere with good proportion of those nearing the horizon. In the previous plot, the quads on the surface of the sphere are proportional to the solid angle covered by each texel in the normal map. As our normals near the horizon, we can see these quads are getting larger and larger, meaning that we are losing more and more precision as we try to encode them. This can be illustrated in 3D by plotting the ratio of the differential solid angle over the corresponding 2D area we’d get from projecting it straight down onto the xy plane. Or equivalently, since the graph is symmetrical around the z-axis, we can take a cross section and show it on a 2D plot.

The vertical asymptotes at -1 and 1 (or 0 and 1 in uv space if you prefer) indicate that as we near the borders of our normal map texture, our precision gets worse and worse until eventually we have none. Your mileage may vary, but in our case, we tended to use wide ranges of directions in our normal maps and would like to trade off some of the precision in the mid-regions for some extra precision at the edges. Precision in the center is the most important so let’s try and keep that intact. What options do we have? Well it turns out we can improve the precision distribution to better meet our needs with a couple of simple changes…

Nothing is constraining us to using a hemisphere to generate our new z component. By experimenting with different functions which map xy to z, better options become apparent. One function we have had success with is that of a simplified inverted elliptic paraboloid (that’s a mouthful, but is more simple than it sounds).

z = 1 – (x*x + y*y)

This is essentially a paraboloid with the a and b terms both set to 1, and inverted by subtracting the result from 1. Here is a graphical view:

Notice that we don’t have the large quads at the horizon anymore and overall, the distribution looks a lot more evenly spread over the surface. Wait a second though… this doesn’t give us a normalized back vector anymore! Fortunately the math to reconstruct the normalized version is roughly the same as before in terms of cost. We need to do a normalize operation but save on the square root.

Below, the overlaid green plot shows the angle to texture area ratio of the new scheme. The precision is still highest in the center where we want it, and we’re trading off some precision in the mid-ranges for the ability to better store normals near the horizon.

Looking at the 3D graphs, another observation is that we’re wasting chunks of valid ranges in our encodings. In fact, a quick calculation shows that ~14% of the possible xy combinations are going to waste. A straightforward extension to the scheme would be to use a function which covers the entire [-1, 1] range in x and y to make use of this lost space. There are an infinite amount of such functions, but a particularly simple one is this sort of dual inverted paraboloid shape.

z = (1-x^2)(1-y^2)

which looks like:

One downside of this is that we need a little more math in the shaders in order to reconstruct the normal. Because of this, on current generation consoles, we went with the basic inverted paraboloid encoding. Going forward though, with the ratio of ALU ops to memory accesses still getting higher and higher, the latter scheme might make more sense to squeeze a little extra precision out.

Finally, here is the code used when generating the new normal map texels (the initial circular version not the second square version). The input is the normal you want to encode and the output is the 2D xy which is written into the final normal map.