When to pack textures together?


I've noticed in a lot of recent games that textures are packed together, presumably to save space and improve performance.

For example, albedo in RGB and roughness in alpha, or a normal map in RG with roughness and metalness in BA, and so on.

But some games just use separate albedo, roughness, specular, AO, emissive, and mask textures, meaning a lot of textures per model.

In the recent Titanfall 2, there were six or seven different texture maps loaded for each of the head/body/legs/arms of the character. That's a lot of textures!

Could someone explain the benefits of packing them together, and also the downsides of doing so? This is clearly not being adopted by every studio.

Also, performance-wise, what would be the overhead of having many different sampler reads? Surely packing would be beneficial here, since you would only need to load a couple of textures? Or would this have very little impact at the shader stage?

My initial guess would be that compression/quality factors into the decision to pack?


In my engine, we author all of our assets as independent maps, but the shader code contains instructions for how they should be packed. The data compiler will automatically merge together as many independent maps into "packed" maps as required. This also lets us very quickly experiment with different packing schemes without the art team even being aware.
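A minimal sketch of what such a compiler step might do, assuming maps are stored as 2D grayscale arrays (all names here, like pack_maps and the scheme dictionary, are hypothetical, not the engine's real API):

```python
# Hypothetical asset-compiler packing step: merge several independent
# grayscale maps into the channels of one RGBA texture, driven by a
# packing scheme that could be swapped out without touching source art.

def pack_maps(scheme, maps):
    """scheme: channel -> map name, e.g. {"R": "roughness", "G": "metalness"}.
    maps: map name -> 2D list of floats in [0, 1].
    Returns a 2D list of (R, G, B, A) texels; unmapped channels are 0."""
    any_map = next(iter(maps.values()))
    height, width = len(any_map), len(any_map[0])
    packed = []
    for y in range(height):
        row = []
        for x in range(width):
            # Route each source map into its assigned channel.
            texel = tuple(
                maps[scheme[c]][y][x] if c in scheme else 0.0
                for c in "RGBA")
            row.append(texel)
        packed.append(row)
    return packed

roughness = [[0.2, 0.4]]
metalness = [[1.0, 0.0]]
packed = pack_maps({"R": "roughness", "G": "metalness"},
                   {"roughness": roughness, "metalness": metalness})
print(packed[0][0])  # (0.2, 1.0, 0.0, 0.0)
```

Because the scheme is just data, trying a different packing layout only means editing the dictionary and recompiling assets.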

Also, performance-wise, what would be the overhead of having many different sampler reads? Surely packing would be beneficial here, since you would only need to load a couple of textures? Or would this have very little impact at the shader stage?

AMD-specific advice: the overhead will be small, so it may have no perf impact on some shaders, but a noticeable impact on others. AMD GPUs use a bindless resource model, which basically means that all texture bindings are stored inside cbuffers, and cbuffers are backed by RAM just like vertex buffers. Each texture fetch must therefore first load the texture "descriptor" (a small struct containing pointer, format, size, etc.) from a buffer, and then issue the actual texture-fetch instruction. The latency on a fetch could be as high as 1000 cycles, but perhaps the GPU is able to keep 10 thread-groups in flight at once, bringing observable latency down to 100 cycles, and if you exploit cache coherency you'll reduce that time by another order of magnitude... CBuffer loads are almost always cache-coherent, and texture loads are quite often coherent in normal situations (unless you're using random texture coordinates, not using mip-mapping, or otherwise sampling high-frequency data badly).
If your shader ends up in a situation where a texture fetch does have a latency of 100 cycles, then you basically get 100 cycles' worth of free ALU computation... and if you don't have 100 cycles' worth of math to do in the meantime, you end up sitting idle waiting for that fetch to arrive. In these situations, the "small overheads" of having extra textures aren't going to impact you, because your shader isn't at max performance anyway.
It's mostly when you've already optimized absolutely everything else that you'll actually be able to notice this overhead.
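The latency-hiding arithmetic above can be sketched as a back-of-envelope model (the cycle counts and wavefront counts are the illustrative figures from the text, not measured values):

```python
# Illustrative latency-hiding model using the numbers quoted above.
def observable_latency(raw_latency_cycles, groups_in_flight, cache_speedup=1):
    # With N thread-groups in flight, the GPU overlaps their fetches,
    # so each group observes roughly raw/N cycles of latency; coherent
    # cache hits cut that down by a further factor.
    return raw_latency_cycles / groups_in_flight / cache_speedup

print(observable_latency(1000, 10))      # 100.0 cycles observable
print(observable_latency(1000, 10, 10))  # 10.0 cycles with coherent caching
```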

My initial guess would be that compression/quality factors into the decision to pack?

Yep.
Normal maps are 3-channel, so BC1 makes sense... however, BC1 interpolates between entries in an RGB palette, which means there's interference between channels when compressing -- e.g. a strong signal in R will leak over into G. Early on, people came up with a two-channel normal map scheme (often called DXT5_NM) where you'd use BC3 and place your X data in G, and Y data in A (and leave R/B blank). This gave better quality because with BC3, RGB and A are compressed independently, so there's no interference.
This led to the development of BC4/5, which are basically just one/two BC3 alpha channels, which lets you use the high-quality two-channel normal map trick without wasting any bits on those empty R/B channels.
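With any two-channel scheme (the BC3 trick or BC5), the shader only samples X and Y and reconstructs Z, relying on the normal being unit-length. A sketch of the standard reconstruction, in Python for illustration:

```python
import math

def reconstruct_normal(x, y):
    """Rebuild a unit normal from its X/Y components, as a shader would
    after sampling a two-channel (BC5 / DXT5_NM-style) normal map.
    Inputs are assumed already unpacked to [-1, 1] (the usual *2-1 step).
    Z is clamped at 0 since tangent-space normals face outward."""
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)

print(reconstruct_normal(0.6, 0.0))  # roughly (0.6, 0.0, 0.8)
```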

When using the BC3 trick, you could store extra data (roughness, etc.) in the R/B channels, but it would affect the compression quality of your normals, which gives the developer a size/quality trade-off.

Likewise, if you've got three completely independent maps -- e.g. roughness, specular, AO -- then you could pack them into a single BC1 texture, but the compression will cause cross-talk between them all, so signals from one channel may show up as slight artifacts in the other channels... but it does get you a 6:1 compression ratio.
Alternatively, you could store these as three BC4 textures, which gives much improved quality with no cross-talk between the maps, but only a 2:1 compression ratio.
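The ratios quoted above follow directly from the block sizes: BC1 and BC4 both store a 4x4 texel block in 8 bytes. A quick sketch of the arithmetic:

```python
# Block-compression size arithmetic behind the 6:1 and 2:1 figures above.
BLOCK_TEXELS = 16      # BC formats compress 4x4 texel blocks
BC1_BLOCK_BYTES = 8    # 8 bytes per block, carries 3 color channels
BC4_BLOCK_BYTES = 8    # 8 bytes per block, carries 1 channel

def compression_ratio(uncompressed_bytes_per_texel, block_bytes):
    return uncompressed_bytes_per_texel * BLOCK_TEXELS / block_bytes

# Three 8-bit maps packed into one BC1 texture: 3 bytes/texel -> 0.5
print(compression_ratio(3, BC1_BLOCK_BYTES))  # 6.0
# One 8-bit map per BC4 texture: 1 byte/texel -> 0.5
print(compression_ratio(1, BC4_BLOCK_BYTES))  # 2.0
```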


Packing several textures into one is a way of reducing size and clutter; the downside is that it limits the compression formats that can be used.

Some engines have restrictions on how textures can be packed; you can get around this with custom shaders.

A single RGBA image has four channels, allowing you to store four grayscale images in one texture.

The reason some engines don't bother with heavy packing is that the aggressive compression that can be applied to single-purpose textures, without detail loss, saves more space than packing does.

For this reason, engines like Unreal opt to pack only similar textures. An example: metal, gloss, AO, and emissive are all grayscale and can be packed into one texture and then compressed. In Unreal, albedo and normal maps should have their own textures, so you end up with three textures per model.
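To put rough numbers on such a three-texture scheme, here is an illustrative per-texel memory comparison (the format choices and 2048x2048 resolution are assumptions for the example, mip levels ignored):

```python
# Approximate GPU memory for a material, ignoring mip chains.
# Bytes per texel for common block-compressed formats:
BYTES_PER_TEXEL = {"BC1": 0.5, "BC3": 1.0, "BC4": 0.5, "BC5": 1.0}

def material_bytes(width, height, formats):
    return sum(width * height * BYTES_PER_TEXEL[f] for f in formats)

# Packed scheme: albedo in BC1, normal in BC5, and the four grayscale
# maps (metal/gloss/AO/emissive) packed into one BC3 texture.
packed = material_bytes(2048, 2048, ["BC1", "BC5", "BC3"])

# Unpacked scheme: same albedo/normal plus four separate BC4 maps.
separate = material_bytes(2048, 2048,
                          ["BC1", "BC5", "BC4", "BC4", "BC4", "BC4"])

print(packed / 2**20, separate / 2**20)  # 10.0 vs 14.0 MiB
```

The packed version is smaller here, but the four separate BC4 maps compress each channel independently, which is the quality argument made above.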