Texture bind performance?

We encountered a strange performance slowdown in our object renderer on NVIDIA hardware. I hadn't implemented mesh sorting by material id yet; I knew this could be an issue, but I expected it to cost 1 ms at most. The actual slowdown was much bigger. Later I found out that it was indeed caused by glBindMultiTextureEXT. So I ran a couple of tests: first using one big texture (2048x2048) for all meshes, then adding sorting by material id.
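For readers who want to see what sorting by material id buys in terms of bind count, here is a minimal sketch. It is not the renderer's actual code: DrawItem, sortByMaterial and countMaterialChanges are hypothetical names, and the material id simply stands in for whatever set of textures a mesh needs bound.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw record; field names are illustrative only.
struct DrawItem {
    std::uint32_t materialId; // stands in for the texture set used by this mesh
    std::uint32_t meshId;
};

// Sort draws so that meshes sharing a material end up adjacent.
// stable_sort keeps the original submission order within one material.
inline void sortByMaterial(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.materialId < b.materialId;
                     });
}

// Count how often the material (and so the texture bindings) would change
// if the list were drawn in order, skipping redundant binds.
inline std::size_t countMaterialChanges(const std::vector<DrawItem>& items) {
    std::size_t changes = 0;
    bool first = true;
    std::uint32_t current = 0;
    for (const DrawItem& d : items) {
        if (first || d.materialId != current) {
            current = d.materialId;
            first = false;
            ++changes; // a real renderer would issue its texture bind here
        }
    }
    return changes;
}
```

With a draw list whose material ids are 2, 1, 2, 1, 3, the unsorted order needs five material changes and the sorted order needs three. The counter only models the number of binds, not what each bind costs on a given driver.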

Objects in the test scene had about 400k tris, ~250 meshes and 42 textures.
All textures of a given type had the same format (diffuse DXT1, normal 3Dc/ATI2, etc.).
Times in ms for my test scene (the final object pass):

                    NV460                         AMD 6850
                    drv 306.97, 306.94, 310.33    drv 12.10
without sorting     15.0                          5.0
one texture         3.3                           4.0
sort by material    6.8                           4.3

Those numbers are pretty bad for NVIDIA. The weird thing is that we have no such issue in the terrain renderer, which uses unique textures for every terrain tile, about 200 meshes/textures per frame, yet the performance there is great. There is no difference in the render states: the object renderer follows immediately after the terrain renderer without any changes to the render states. I have tried a lot of tests (a simple shader, non-DSA functions for texture binding, rendering without terrain, etc.) without luck: every time I started to change the textures per mesh I hit the wall. (I used two textures for this test and the time was almost 15 ms.)

From what I remember of an Nvidia talk, changing the GL state will cause the GPU to be reprogrammed, and this introduces a bubble in the pipeline where it needs to wait for the previous work to finish before it can start on the new batch with the new state. In this case you're varying the number of times that you switch the texture state, so your results seem consistent with what they were saying.

Texture atlases or texture arrays are another way around this problem, if you run out of texture bindings.
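As a rough illustration of the atlas route, here is a sketch of remapping a mesh's UVs into one tile of an atlas. It assumes a uniform grid of equally sized tiles numbered row by row from the UV origin, which is my simplification and not something from your setup; AtlasRect, atlasRectForTile and remapUV are made-up names.

```cpp
#include <cassert>

// Scale and offset that map a mesh's [0,1] UVs into one atlas tile.
struct AtlasRect {
    float scaleU, scaleV;
    float offsetU, offsetV;
};

// Assumption: the atlas is a uniform cols x rows grid of equally sized tiles,
// numbered row by row starting at the UV origin.
inline AtlasRect atlasRectForTile(int tile, int cols, int rows) {
    assert(cols > 0 && rows > 0 && tile >= 0 && tile < cols * rows);
    AtlasRect r;
    r.scaleU = 1.0f / cols;
    r.scaleV = 1.0f / rows;
    r.offsetU = (tile % cols) * r.scaleU; // column index times tile width
    r.offsetV = (tile / cols) * r.scaleV; // row index times tile height
    return r;
}

// Apply the remap to a single UV pair; this could equally be done in the
// vertex shader with the rect passed as a per-mesh uniform.
inline void remapUV(const AtlasRect& r, float& u, float& v) {
    u = u * r.scaleU + r.offsetU;
    v = v * r.scaleV + r.offsetV;
}
```

All meshes can then share one bound texture, with only the per-mesh rect changing. Note that this simple remap only works for UVs inside [0,1]; repeating textures and mipmap bleeding between tiles need extra care with atlases, which is one reason texture arrays may be the more convenient option.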

Thanks for your answer, but it doesn't explain why this slowdown happens only in our object renderer. The terrain is a much worse case, and it runs at the same speed as it does with one 2048 texture. I should try to isolate this into a simple app sometime... but my workaround seems to be faster than simple material ordering; I'm just curious...

How big are your terrain meshes? It seems like your object meshes are ~1600 tris on average, given the stats you provided. Are the terrain meshes larger? Perhaps the terrain meshes keep the GPU busy enough to minimize the impact of changing textures. Are you accessing all of the textures in the fragment shader, or does the vertex shader access some?