I'm implementing manual interpolation between texels in a 3D texture so that I can discard some of them when needed. Compared to hardware interpolation, this process takes a lot of time: if I simply use hardware interpolation via the texture function, rendering takes ~5 ms, but with manual fetches using texelFetch it rises to ~13 ms (using texture to fetch the 8 texels manually increases frame time even further, to ~35 ms). The multiplication/summing part of the manual lerp doesn't matter much - I've tried fetching the same texel manually 8 times and the performance is close to hardware interpolation.

I thought fetching neighboring texels should give me good cache behavior, but maybe that only applies to 2D textures, because neighboring layers of a 3D texture are not so cache friendly? I don't know what to think here. It would be nice if there were some way to use hardware interpolation but discard some texels from the process. The absence of a textureGather overload taking a 3D texture also makes me think I'm missing something.
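For illustration, a minimal sketch of this kind of manual trilinear fetch (GLSL; names like uVolume and manualTrilinear are illustrative, not from my actual code):

```glsl
// Manual trilinear interpolation of a 3D texture via texelFetch.
uniform sampler3D uVolume;  // illustrative name

vec4 manualTrilinear(vec3 coord) {
    ivec3 size = textureSize(uVolume, 0);
    vec3  p  = coord * vec3(size) - 0.5;               // texel-space position
    ivec3 lo = clamp(ivec3(floor(p)), ivec3(0), size - 1);
    ivec3 hi = clamp(lo + 1,          ivec3(0), size - 1);
    vec3  f  = fract(p);                               // interpolation weights

    // Fetch the 8 corners of the texel bounding box; a texel could be
    // discarded here by forcing its weight to zero and renormalizing.
    vec4 c000 = texelFetch(uVolume, ivec3(lo.x, lo.y, lo.z), 0);
    vec4 c100 = texelFetch(uVolume, ivec3(hi.x, lo.y, lo.z), 0);
    vec4 c010 = texelFetch(uVolume, ivec3(lo.x, hi.y, lo.z), 0);
    vec4 c110 = texelFetch(uVolume, ivec3(hi.x, hi.y, lo.z), 0);
    vec4 c001 = texelFetch(uVolume, ivec3(lo.x, lo.y, hi.z), 0);
    vec4 c101 = texelFetch(uVolume, ivec3(hi.x, lo.y, hi.z), 0);
    vec4 c011 = texelFetch(uVolume, ivec3(lo.x, hi.y, hi.z), 0);
    vec4 c111 = texelFetch(uVolume, ivec3(hi.x, hi.y, hi.z), 0);

    // Combine along x, then y, then z.
    vec4 c00 = mix(c000, c100, f.x);
    vec4 c10 = mix(c010, c110, f.x);
    vec4 c01 = mix(c001, c101, f.x);
    vec4 c11 = mix(c011, c111, f.x);
    return mix(mix(c00, c10, f.y), mix(c01, c11, f.y), f.z);
}
```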

Well, textureGather is already a slight bit hacky in that it restricts you to a single channel, apparently for the sole reason that it can't return 4 vec4s API-wise. So I'd guess the lack of a textureGather for 3D textures is simply due to the interface problem of returning 8 values instead of 4 from a function.
– Christian Rau, May 13 at 21:33
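For reference, the 2D form being discussed (GLSL 4.00+; uTex and vTexCoord are illustrative names). textureGather returns one chosen component from each of the four texels in the 2x2 bilinear footprint, so the result exactly fills a vec4:

```glsl
uniform sampler2D uTex;
in vec2 vTexCoord;

void main() {
    // One component (here 0 = red) from each of the 4 texels that
    // bilinear filtering would use at vTexCoord, packed into one vec4.
    vec4 reds = textureGather(uTex, vTexCoord, 0);
    // A 3D version would need the 8 corners of a 2x2x2 footprint:
    // two vec4s per component, which no single return value can hold.
}
```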

If your texture fetches are indirect (that is, the coordinates are computed in the shader rather than passed in directly), that can slow you down. What you can do instead is pass the texture coordinates for each sampler call into your vertex shader from your CPU code. You'd set it up as if you were doing multi-texturing, but all the texture units would point to the same texture, just sampled at different coordinates.

Taking the 2D case, instead of passing in a single texture coordinate like this:
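(sketched here in GLSL with illustrative names such as aTexCoord and uTexelSize):

```glsl
// Vertex shader: one coordinate, interpolated as usual.
in vec2 aTexCoord;
out vec2 vTexCoord;
void main() {
    vTexCoord = aTexCoord;
    // ... position setup ...
}

// Fragment shader: neighbor coordinates computed per fragment.
uniform sampler2D uTex;
uniform vec2 uTexelSize;  // 1.0 / texture dimensions
in vec2 vTexCoord;
void main() {
    vec4 t0 = texture(uTex, vTexCoord);
    vec4 t1 = texture(uTex, vTexCoord + vec2(uTexelSize.x, 0.0));
    // ... remaining neighbors and the lerp ...
}
```

you would pass one precomputed coordinate per sample, so the fragment shader does no coordinate math at all:

```glsl
// Vertex shader: one output per sample, as if multi-texturing, but
// every texture unit points at the same texture.
in vec2 aTexCoord;
uniform vec2 uTexelSize;
out vec2 vTexCoord0;
out vec2 vTexCoord1;
out vec2 vTexCoord2;
out vec2 vTexCoord3;
void main() {
    vTexCoord0 = aTexCoord;
    vTexCoord1 = aTexCoord + vec2(uTexelSize.x, 0.0);
    vTexCoord2 = aTexCoord + vec2(0.0, uTexelSize.y);
    vTexCoord3 = aTexCoord + uTexelSize;
    // ... position setup ...
}
```

The fragment shader then samples the texture at each vTexCoordN directly.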

They are absolutely indirect, yes. The idea of moving the calculation to the vertex shader is interesting. There is a problem, though: I don't simply add or subtract 1 from the tex coords; I use floor and ceil to find the bounding box of a coordinate in texel space, and because of that I don't see how this can be solved in a vertex shader.
– Павел Муратов, May 17 at 7:03
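To illustrate the point in the comment above, the per-fragment computation looks something like this (illustrative names; uVolumeSize would be vec3(textureSize(uVolume, 0))). Because floor and ceil are nonlinear in the interpolated coordinate, these corners can't be computed per vertex and then interpolated:

```glsl
vec3 p  = vTexCoord * uVolumeSize;  // position in texel space
vec3 lo = floor(p);                 // lower corner of the bounding box
vec3 hi = ceil(p);                  // upper corner
vec3 f  = p - lo;                   // weights for the manual lerp
```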

This answer is somewhat outdated/wrong. First, doing math on the texture coordinates before sampling is not a big source of performance issues. You might be thinking of dependent texture fetches, where the texture coordinate depends on the output of an earlier texture sample — that will serialize those samples, and because they have a long latency, it can greatly extend the latency of the whole shader. But that's not what we have here; all the sample coords just depend on a single shader input, not on other samples.
– Nathan Reed, Jul 16 at 1:35
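For contrast, a dependent fetch in the sense this comment means, next to the non-dependent pattern from the question (GLSL; uLookup, uTex, and uTexelSize are illustrative):

```glsl
// Dependent: the second sample's coordinate comes from the first
// sample's result, so the two fetches are serialized.
vec2 offset   = texture(uLookup, vTexCoord).xy;
vec4 indirect = texture(uTex, vTexCoord + offset);

// Not dependent: every coordinate derives from a shader input, so the
// hardware can issue all fetches up front and hide their latency together.
vec4 a = texture(uTex, vTexCoord);
vec4 b = texture(uTex, vTexCoord + vec2(uTexelSize.x, 0.0));
```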

Moreover, vertex interpolation isn't free either; GPUs have limited space for interpolated values (using too many may hurt occupancy), and depending on hardware some pixel shader instructions may be needed to perform the actual interpolation anyway. So it's not necessarily a win to move a bunch of values into interpolators vs using a small amount of math to calculate them directly in the pixel shader.
– Nathan Reed, Jul 16 at 1:38

It may depend on the exact hardware and OS. My experience has been that removing any sort of indirect texture fetch, not just dependent ones, has generally helped performance, particularly on mobile hardware. But I haven't used every piece of hardware out there, so my suggestion was mainly meant to share what I've experienced in case it was helpful.
– user1118321, Jul 16 at 1:39