Hi. I'm not 'new' to OpenGL or programming, but I've only recently started work on independent projects outside of messing around in textbooks with fixed-pipeline and programmable-pipeline projects, GLSL, etc. Basically the basics are basic to me ;)

However, I've only started experimenting with the texture filter modes. On a heightmap terrain I had the mag filter as GL_LINEAR and the min filter set as the same, and my app was crushingly slow. Now I set the min filter as GL_NEAREST_MIPMAP_LINEAR and the performance shot through the roof!

I was under the false assumption that using mipmaps would slow things down somewhat. So what gives?

Mipmapping selects the mipmap level where the size of the texels best matches the size of the pixels. This is good for the cache, because rasterizing adjacent pixels then also means sampling adjacent texels. If, instead, the much larger base-level texture is selected (which is effectively what happens when you have a non-mipmapping filter), then the texels are much smaller than the pixels on the screen, and sampling the texture for adjacent pixels means sampling very distant texels; this is very bad for the cache.
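To put rough numbers on the cache argument, here is a small Python sketch of picking a level and measuring how far apart the samples for two neighbouring pixels land. The function names and the plain log2 rule are assumptions made for illustration; real hardware derives the level from screen-space derivatives and handles more cases than this.

```python
import math

def mip_level(texels_per_pixel, num_levels):
    """Pick the level whose texels are closest in size to one pixel (simplified)."""
    if texels_per_pixel <= 1.0:
        return 0  # magnification: the base level is used
    level = int(round(math.log2(texels_per_pixel)))
    return min(level, num_levels - 1)  # clamp to the smallest level available

def texel_stride(texels_per_pixel, level):
    """Distance, in texels of the given level, between samples for adjacent pixels."""
    return texels_per_pixel / (2 ** level)

levels = 11       # a 1024x1024 texture has levels 1024, 512, ..., 1
footprint = 16.0  # hypothetical: one pixel spans 16 base-level texels per axis
lvl = mip_level(footprint, levels)
print("chosen level:", lvl)
print("stride at base level:", texel_stride(footprint, 0))
print("stride at chosen level:", texel_stride(footprint, lvl))
```

With the base level, neighbouring pixels read texels 16 apart; with the chosen level they read neighbouring texels, which is the access pattern a texture cache likes.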

Mipmapping, especially full linear filtering on all dimensions, is computationally more expensive. But that is only a problem if you are computationally saturated. In your case, you are probably saturated by the memory transfer and so reducing that and increasing the computational complexity is an overall win.

Mip-maps were designed at least partially for performance reasons. When a texture is filtered, pixels are sampled from an entire region of the texture. How large this region is depends upon the fragment. Say you have a quad mapped with a texture. If the quad is drawn very near the camera such that there is a 1:1 relationship between screen pixels and quad texels, then for each pixel that is drawn, one texel is sampled. Now if the same quad is drawn far from the camera so that it appears the size of one pixel on screen, then when it is drawn, each pixel samples and filters every texel in the texture. For large textures, this can be a seriously expensive pixel. Add many objects like that in the background, and you can lose framerate in a hurry.

Mip-mapping reduces the resolution so that when this quad is drawn far away, instead of sampling from the top-level mip map, it samples from a vastly reduced resolution version appropriate for the on-screen pixel size of the object. A pre-filtered version, you could say. It is conceivable that if you build mipmaps to a sufficient depth that the mipmap that is sampled in this case will be 1x1 pixels, so that the single pixel of that distant quad only samples a single texel, rather than 1024x1024 texels (or however large the original texture may be). By anybody's reckoning, sampling a single texel is going to be much faster than sampling and blending 1024x1024 texels.
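For a sense of how deep the chain goes and what it costs in memory, here is a quick Python sketch. The helper name mip_chain is made up for this example, and it assumes the usual halve-and-round-down rule for each successive level.

```python
def mip_chain(width, height):
    """Return the (width, height) of every level, from the base down to 1x1."""
    sizes = [(width, height)]
    while width > 1 or height > 1:
        width = max(1, width // 2)    # each level halves, never below 1
        height = max(1, height // 2)
        sizes.append((width, height))
    return sizes

chain = mip_chain(1024, 1024)
base_texels = 1024 * 1024
total_texels = sum(w * h for w, h in chain)

print("levels:", len(chain))
print("smallest level:", chain[-1])
print("total texels relative to base:", total_texels / base_texels)
```

A 1024x1024 texture ends up with 11 levels, the last one 1x1, and the whole chain costs only about a third more memory than the base level on its own.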

Brother Bob. That is a very informative answer! I have always thought about the cache with regards to Data-Oriented programming and trying to 'sort' virtual function calls in arrays but that's the first time I ever thought about memory look ups on texels.

JTippetts, thank you also for your explanation. There can never be too many points of view of the same problem, even if they are similar, as the different wording of each answer can convey different aspects with more clarity.

In general, memory bandwidth is improving at a slower rate than processor arithmetic speeds are improving.

In relative terms, over the years, the amount of memory that we can transfer in the same time as one CPU clock-cycle has actually decreased!

All the same issues apply to GPUs -- bandwidth often becomes the bottleneck instead of ALU cycles. Using smaller data formats can often give huge speed boosts -- as above, mip-mapping is kinda-sorta a form of data compression for the case where you're viewing a texture at a distance.

It's also interesting to look at the specs of cheap vs expensive models of video cards -- often one of the things that separates them is an order of magnitude difference in their memory bandwidth! e.g. picking some nVidia cards:

GeForce 205 -> 8 GiB/s

GeForce GTX 285 -> 159.0 GiB/s

Your high-end users can process a hell of a lot more data per frame than your low-end users can, which is why the low-end cards will have to use lower-resolution render-targets and textures, lower vertex counts, etc, etc...
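As a back-of-the-envelope check, the two bandwidth figures above can be turned into a per-frame budget. This is only a sketch: the 60 fps target and the helper name are assumptions, and real throughput will sit below the theoretical peak.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def bytes_per_frame(bandwidth_gib_per_s, fps):
    """Upper bound on the bytes the GPU can move through its memory bus in one frame."""
    return bandwidth_gib_per_s * GIB / fps

# Peak bandwidth figures quoted above; 60 fps is an assumed target.
cards = [("GeForce 205", 8.0), ("GeForce GTX 285", 159.0)]
for name, bandwidth in cards:
    budget = bytes_per_frame(bandwidth, 60)
    print(f"{name}: {budget / MIB:.1f} MiB per frame at 60 fps")
```

That works out to roughly 137 MiB per frame on the low-end card against about 2.65 GiB on the high-end one, and every texture read, render-target write, and vertex fetch in the frame has to fit inside that budget.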

CDProp, this might not be EXACTLY what you're looking for, but Coursera.org has a great course on CUDA and GPU. You learn a lot about bandwidth issues like the ones Hodgman has been mentioning. His input made me think about it actually.

Hodgman, do you know of a good resource for learning more about the hardware architecture of video cards as it relates to how data flows through the rendering pipeline?

The manuals and documentation that come with the console dev-kits are extremely useful, but aren't publicly available.

nVidia and ATI occasionally publish programming guides for different generations of hardware... even the CUDA/etc ones, or presentations about certain hardware families (e.g. "Fermi") might give some insight into how the hardware works.