Know Your Texture Filters!

Over at the forums, gamefish and BurningHand got a little worried about the supposedly bad performance of TextureAtlas. They did some benchmarking on G1/G2/Hero level hardware and were amazed to find that the benchmark could only achieve 6 frames per second when using TextureAtlas but a nice and nifty 60 frames per second using a Texture directly. Nate and I proclaimed strongly that there’s no difference in the render path when using a TextureAtlas or a Texture. They are indeed exactly the same. Sadly I only found the time to figure out what’s wrong an hour ago, and I wanted to share my findings.

To understand what’s going on we need two things: a test case and knowledge about how texturing works. Let’s start with the former.

gamefish provided us with a TextureAtlas pack file and a nice little benchmark. Here’s the original code:

Pretty straightforward. It just creates a SpriteBatch for rendering, loads a texture atlas and creates a sprite based on one of the entries in the texture atlas. The font is used to output the current frames per second. The render() method is no surprise either. We just clear the screen, tell the SpriteBatch that we want to render stuff, render the sprite and the fps text and are done with it.

Next question: what’s the size of the sprite and what does the texture atlas look like? Here’s the texture page of the atlas:

Some fine pixel art. Here’s the pack file of the texture atlas describing the different regions:

We load the “map” region to our Sprite instance. This means that our sprite will actually map to the background image in the texture atlas. Since we don’t specify a size for the sprite it will take on the size of the texture region in pixels which is 855×480. Note that there’s no need for the sprite’s dimensions and position to be given in pixels. We could set the width and height of the sprite to 4×2 and it would still map to that 855×480 pixel region in the texture atlas.

Now, the issues were reported on G1 level hardware with a 480×320 pixel screen. Luckily I have my trusty Hero which has that exact hardware configuration (MSM7500, 480×320). So when we render the sprite as defined in the code above we actually only see the lower 480×320 pixels of the background image. The SpriteBatch will set up an orthographic projection so that its view will span from (0,0) to (Gdx.graphics.getWidth(),Gdx.graphics.getHeight()) in our world. On the Hero we thus view the coordinates (0,0)-(480,320), on a Nexus One we’d view (0,0)-(800,480), on a Droid we’d view (0,0)-(855,480) and so on. The output on the Hero looks as follows:

Omg, 6 FPS, libgdx sucks! TextureAtlas sucks! Or does it?

Let’s think about what happens here. We have a pixel perfect projection. That means each texel (pixel from the texture) corresponds to exactly one pixel on screen. Even better we do not even see the whole image on screen, so that should be fast right? All the silly Hero has to do is scan over 480×320 pixels and directly draw them to the screen. Silly Hero!

Well, actually it’s not that simple. If you look back at the original texture you’ll see that it’s actually 1024×1024. Even though the Hero only needs to fetch the texels for the lower 480×320 pixels, it actually has to do more. Texels are laid out in a linear block of memory. Texels (0,0) to (479,0) are at addresses 0 to 479; texel (0,1), however, starts at address 1024! When the texture unit of the GPU fetches that texel after texel (479,0) it has to “jump” over 544 texels (not literally of course, just imagine it actually has to). That will mess with any caches and in this case it totally destroys performance. We could fix that by making the background image its own smaller texture. However, given its size (855×480) we’d have to create a 1024×512 texture as we need to have power of two dimensions. So that’s not really a solution.
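To make the “jump” a bit more tangible, here is a minimal sketch of the address arithmetic. It assumes the simplified model used above: a plain row-major layout where one texel takes one address. Real GPUs may store textures in tiled or swizzled layouts, so treat this as an illustration only. The class and method names are made up for this example.

```java
// Sketch: texel indices in a simple row-major texture (simplified model, see text).
final class TexelAddress {

    // Index of texel (x, y) when rows are stored one after another.
    static int linearIndex(int x, int y, int textureWidth) {
        return y * textureWidth + x;
    }

    // Number of texels lying between the last visible texel of one row
    // and the first visible texel of the next row.
    static int skippedBetweenRows(int textureWidth, int visibleWidth) {
        int lastOfRow = linearIndex(visibleWidth - 1, 0, textureWidth);
        int firstOfNextRow = linearIndex(0, 1, textureWidth);
        return firstOfNextRow - lastOfRow - 1;
    }

    public static void main(String[] args) {
        int textureWidth = 1024; // width of the atlas page
        int visibleWidth = 480;  // texels per row that end up on the Hero's screen

        System.out.println("last visible texel of row 0: "
                + linearIndex(visibleWidth - 1, 0, textureWidth));
        System.out.println("first texel of row 1: "
                + linearIndex(0, 1, textureWidth));
        System.out.println("texels skipped per row: "
                + skippedBetweenRows(textureWidth, visibleWidth));
    }
}
```

The wider the atlas page is compared to the part we actually draw, the bigger that gap per row gets.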

There’s a second more subtle problem though that kills our performance even more. I just said the texture unit only has to fetch 480×320 texels as that’s the amount that’s visible on screen on the Hero. Look at the title of this post, then look at the pack file. Can you spot the problem?

filter: Linear,Linear

This is our problem. That line in the pack file specifies the minification and magnification texture filters to be used by the texture atlas texture. It’s an OpenGL thing and works like this.

If we draw a triangle/quad/sprite and the pixel to texel ratio is bigger than 1 the texture is magnified. Say you have a 16×16 texture and your quad takes up 32×32 pixels on screen -> bam, magnification

If we draw a triangle/quad/sprite and the pixel to texel ratio is smaller than 1 the texture is minified. Say you have a 32×32 texture and your quad takes up 16×16 pixels on screen -> bam, minification

There’s one filter for minification and one for magnification. In the pack file the first one is the min-filter, the second one is the mag-filter. So what does linear mean? Here are the two big options we have:

GL_NEAREST, fetch the single texel that maps best to the pixel on screen.

GL_LINEAR, fetch the four nearest texels that map best to the pixel on screen and blend them.

So, you might wonder why that has any effect since we are actually having a 1:1 pixel to texel ratio. Well, OpenGL will use the magnification filter in this case. Yes, it’s silly, but that’s life. So even though we are rendering pixel perfect OpenGL will fetch 4 texels per pixel, quadrupling the memory bandwidth needed to render our fullscreen quad.
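Here is a rough back-of-the-envelope sketch of that cost. It uses the simplified model from this post: nearest costs one texel fetch per pixel, linear costs four, and a pixel to texel ratio of exactly 1 counts as magnification. Caches and the actual hardware are ignored, and all names are invented for this example.

```java
// Sketch: texel fetches per frame for a fullscreen quad (simplified model).
final class FilterCost {

    enum Filter {
        NEAREST(1), // one texel per pixel
        LINEAR(4);  // four texels per pixel, blended

        final int texelsPerPixel;

        Filter(int texelsPerPixel) {
            this.texelsPerPixel = texelsPerPixel;
        }
    }

    // Picks the filter that applies for a given pixel to texel ratio.
    // A ratio of exactly 1 is treated as magnification, as described in the text.
    static Filter activeFilter(double pixelsPerTexel, Filter min, Filter mag) {
        return pixelsPerTexel >= 1.0 ? mag : min;
    }

    static long texelFetchesPerFrame(int screenWidth, int screenHeight, Filter filter) {
        return (long) screenWidth * screenHeight * filter.texelsPerPixel;
    }

    public static void main(String[] args) {
        // Pixel perfect rendering on the Hero with "filter: Linear,Linear".
        Filter active = activeFilter(1.0, Filter.LINEAR, Filter.LINEAR);
        System.out.println("active filter: " + active);
        System.out.println("fetches with linear: "
                + texelFetchesPerFrame(480, 320, Filter.LINEAR));
        System.out.println("fetches with nearest: "
                + texelFetchesPerFrame(480, 320, Filter.NEAREST));
    }
}
```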

Let’s change the magnification filter in the pack file:

filter: Linear, Nearest

Here’s the output of our little program with the magnification filter set to Nearest (fetch one texel per pixel in case of pixel perfect rendering):

Omg, libgdx rocks! TextureAtlas rocks! The Hero rocks! We are now really only sampling 480×320 pixels instead of four times that amount. We even have blending enabled (SpriteBatch enables it automatically). That’s great. We are done and can finally go to bed after being awake for 37 hours. Or can we?

Nope. You see, I’m pretty sure the original author of that test wants the complete background to be shown on the Hero as well, not just on the 855×480 pixel screen of the Droid. For that we have to tell the SpriteBatch that we want to see 855×480 units of our world (which maps directly to the texels of our background image). To make this happen we just add this line after the creation of the SpriteBatch:

This will make SpriteBatch show the region (0,0)-(855,480) of our world. That means that we will see the full sprite, not just its lower left part, even on the Hero. Let’s check the output:

Two things we learn from this: OMG LIBGDX SUCKS AGAIN! And, it’s not a good idea to scale text. Ever. Always render it pixel perfect or use an extremely big font size. Otherwise you’ll get that mess. Squinting a little we can see that we are back at 6fps again. Of course, we sample 480×320×4 texels again as our minification filter is still set to linear (we map the 855×480 texels to 480×320 pixels on screen -> bam, minification). So let’s change the minification filter to nearest as well:

filter: Nearest,Nearest

And we are back up to 60 FPS:

Super awesome sauce! However, when you look at the smooth rolling hills you’ll see that they are not all that smooth anymore. We see nasty edges and steps. GL_LINEAR smooths those out, GL_NEAREST presents them in their full glory. For some games that pixelated look is totally OK and we can be happy with our solution. For other games we’d like to have the smoothed out experience. But we can’t use GL_LINEAR because that will kill our performance, right? Well, we could downsample our texture to a more fitting size for the 480×320 screen, say down to 512×512. The texture unit wouldn’t need to sample that many texels anymore and we could use GL_LINEAR at full speed. But that would mean we’d have to have multiple versions of our assets for different resolutions. Now that sucks.

Right, that’s why we use mip mapping. What is mip mapping? Let me show you a nice little picture illustrating a so called mip mapping chain for our 1024×1024 source image above:

What happens here? As you can see, I downsample the original image to a fourth of its size (1024×1024 to 512×512, so half the width and half the height). I apply that to all images in the chain until I reach a 1×1 image. That makes up our mip mapping chain (also called pyramid sometimes, depends on how you present it to your readers).
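If you want to see the sizes in such a chain, here is a tiny sketch that lists them. It only handles square power of two textures like our 1024×1024 page, and the class name is made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: edge lengths of all images in a mip map chain for a square texture.
final class MipChain {

    // Halves the edge length until the 1x1 image is reached.
    static List<Integer> levelSizes(int baseSize) {
        List<Integer> sizes = new ArrayList<>();
        for (int size = baseSize; size >= 1; size /= 2) {
            sizes.add(size);
        }
        return sizes;
    }

    public static void main(String[] args) {
        List<Integer> sizes = levelSizes(1024);
        for (int level = 0; level < sizes.size(); level++) {
            int size = sizes.get(level);
            System.out.println("level " + level + ": " + size + "x" + size);
        }
    }
}
```

For a 1024×1024 image that gives 11 images in total, from level 0 (1024×1024) down to level 10 (1×1).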

“OK dude, but how does this help? I mean sure, I can generate those images at runtime and all, but I’d still have to care about what image is used on what device, right?”. Nope, OpenGL supports us a lot here. And it does so by the mechanism called mip mapping. Here’s how it works:

determine the pixel to texel ratio and select the best fitting image from the mip map chain to fetch the texels from

That’s it :). All we have to do is send OpenGL not only our original image data but also the other images of our mip mapping chain. Libgdx does that automatically for you when you specify one of the TextureFilter.MipMapXXX enum values as the minification filter in Graphics.newTexture()/newUnmanagedTexture(). OpenGL will then use whatever is fitting for the current pixel to texel ratio automatically if we set the minification filter to a mip map filter. And yes there’s more than one.

GL_NEAREST_MIPMAP_NEAREST, this will fetch the best fitting image from the mip map chain based on the pixel/texel ratio and then sample the texels with a nearest filter.

GL_LINEAR_MIPMAP_NEAREST, this will fetch the best fitting image from the mip map chain based on the pixel/texel ratio and then sample the texels with a linear filter. Ya, that’s no typo: you specify the mip map filter last (NEAREST here) and the actual texel filter first.

GL_NEAREST_MIPMAP_LINEAR, this will fetch the two best fitting images from the mip map chain and then sample the nearest texel from each of the two images, combining them to the final output pixel

GL_LINEAR_MIPMAP_LINEAR, this will fetch the two best fitting images from the mip map chain and then sample the four nearest texels from each of the two images, combining them to the final output pixel
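To compare the four filters, here is a small sketch that counts texel fetches per pixel, following the descriptions above: one or four texels per image, from one or two images of the chain. That count is only one part of the real cost, which also depends on caches and the specific GPU. The enum below is made up for this example and is not the libgdx TextureFilter enum.

```java
// Sketch: texel fetches per pixel for the four mip map minification filters.
final class MipmapFilterCost {

    enum MinFilter {
        NEAREST_MIPMAP_NEAREST(1, 1),
        LINEAR_MIPMAP_NEAREST(4, 1),
        NEAREST_MIPMAP_LINEAR(1, 2),
        LINEAR_MIPMAP_LINEAR(4, 2);

        final int texelsPerImage; // 1 for nearest, 4 for linear
        final int imagesSampled;  // 1 for MIPMAP_NEAREST, 2 for MIPMAP_LINEAR

        MinFilter(int texelsPerImage, int imagesSampled) {
            this.texelsPerImage = texelsPerImage;
            this.imagesSampled = imagesSampled;
        }

        int texelsPerPixel() {
            return texelsPerImage * imagesSampled;
        }
    }

    public static void main(String[] args) {
        for (MinFilter filter : MinFilter.values()) {
            System.out.println("GL_" + filter + ": "
                    + filter.texelsPerPixel() + " texel fetch(es) per pixel");
        }
    }
}
```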

As you can see, the list goes from cheapest to fanciest. My recommendation: use GL_LINEAR_MIPMAP_NEAREST, it gives the best results on all current hardware I know of. The corresponding TextureFilter is TextureFilter.MipMapLinearNearest. Let’s specify this in the pack file and see the effect:

filter: MipMapLinearNearest,Nearest

Output:

Cool. So that will use the 512×512 image of the mip map chain on the Hero. This also means that the background will have roughly 427×240 texels (half of 855×480). Hrm, that’s slightly less than the 480×320 pixels of the screen of the Hero. So we actually upscale that a little. Depending on your assets that might not be ideal.
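Why the 512×512 image? Here is a rough sketch of how a level could be picked for a MIPMAP_NEAREST style filter. It is an approximation and an assumption on my side: take the larger texel to pixel ratio of the two axes, take log2 of it and round to the nearest level. Actual GPUs and drivers may compute this a bit differently. All names are made up for this example.

```java
// Sketch: approximate mip level selection for a MIPMAP_NEAREST style filter.
final class MipLevel {

    // Picks a level from how many texels are squeezed into one pixel.
    // Approximation: log2 of the larger texel to pixel ratio, rounded to nearest.
    static int selectLevel(double texelsWide, double texelsHigh,
                           double pixelsWide, double pixelsHigh, int maxLevel) {
        double texelsPerPixel = Math.max(texelsWide / pixelsWide, texelsHigh / pixelsHigh);
        double lod = Math.log(texelsPerPixel) / Math.log(2.0);
        int level = (int) Math.round(lod);
        return Math.max(0, Math.min(maxLevel, level));
    }

    // Size of one dimension at a given level (halved per level, never below 1).
    static int sizeAtLevel(int baseSize, int level) {
        return Math.max(1, baseSize >> level);
    }

    public static void main(String[] args) {
        // The 855x480 texel background drawn across the Hero's 480x320 pixel screen.
        int level = selectLevel(855, 480, 480, 320, 10);
        System.out.println("selected level: " + level);
        System.out.println("image size at that level: "
                + sizeAtLevel(1024, level) + "x" + sizeAtLevel(1024, level));
        System.out.println("background region at that level: "
                + sizeAtLevel(855, level) + "x" + sizeAtLevel(480, level));
    }
}
```

With pixel perfect rendering (480×320 texels on 480×320 pixels) the same function stays at level 0, the original image.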

If you need pixel perfect rendering on all screen resolutions there’s no other way than storing multiple versions of your assets specifically designed for each resolution. In almost all cases you can get away with using a highres texture and downscaling it for devices like the Hero, optionally with mipmapping if you want things to be less pixelated (albeit a little bit upscaled).

Choose your poison. If you read this far then you get the medal of honor of today. If you didn’t then I won’t answer any “performance issue” questions from you related to filters on the forums. Deal with it.

Btw, same thing on the Droid gets 40fps max. Hero >> Droid! Nah, that’s a different issue. The Droid (and the Nexus One and many other second gen devices) are heavily fillrate limited. They will drop immediately to 40-45ish FPS if you draw a single fullscreen quad. However, they’ll stay at that framerate for a long time if you add additional objects to the scene. Funny, eh?
