AMD glLinkProgram Performance Tips?

Hi,

I am having trouble with glLinkProgram on AMD drivers (both Windows and Linux). The compilation time is absolutely absurd, and is killing my game engine. I am seeing 5 to 10 seconds for single shader compiles, where on NV drivers it is immeasurable. The shaders in question are generated by the engine and are quite math-heavy.

Does anyone have general tips or advice for speeding up shader compilation on AMD? For example, should I try hand-inlining heavily nested function calls, hand-unrolling loops, etc.? Given my lack of knowledge about shader compilers, I don't really know how to make my shaders more compiler-friendly.

Hmm, I don't know about the GLSL compilers, but those for C/C++ can sometimes run into performance issues with very large functions (blocks really) that have many variables, due to the use of algorithms that are quadratic in the number of instructions or variables for example.

Haven't used AMD cards in a while, but 5-10 secs sounds outrageously long to me. Are you using a debug context? You could try shader binaries (if your hardware supports them) that you cache on disk, that way you only pay the link time penalty once.
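
A minimal sketch of the disk side of such a cache, in C++17. The record layout and the names `CachedProgram`, `hashSource`, `saveProgram` and `loadProgram` are made up for illustration. The GL calls named in the comments (glProgramParameteri with GL_PROGRAM_BINARY_RETRIEVABLE_HINT, glGetProgramiv with GL_PROGRAM_BINARY_LENGTH, glGetProgramBinary, glProgramBinary) come from GL 4.1 / ARB_get_program_binary and are left as comments so the snippet builds without a GL context.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Hypothetical on-disk record: the format enum reported by the driver
// followed by the opaque program blob.
struct CachedProgram {
    std::uint32_t format = 0;         // GLenum binaryFormat from glGetProgramBinary
    std::vector<unsigned char> blob;  // bytes filled in by glGetProgramBinary
};

// FNV-1a hash of the shader source text, used here to name the cache file.
std::uint64_t hashSource(const std::string& src) {
    std::uint64_t h = 14695981039346656037ull;
    for (unsigned char c : src) {
        h ^= c;
        h *= 1099511628211ull;
    }
    return h;
}

// Call after a successful link. Before linking, set
// glProgramParameteri(prog, GL_PROGRAM_BINARY_RETRIEVABLE_HINT, GL_TRUE),
// then query GL_PROGRAM_BINARY_LENGTH and fill `p` via glGetProgramBinary.
bool saveProgram(const std::string& path, const CachedProgram& p) {
    assert(!path.empty());  // an empty path is a caller bug
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(&p.format), sizeof p.format);
    out.write(reinterpret_cast<const char*>(p.blob.data()),
              static_cast<std::streamsize>(p.blob.size()));
    return static_cast<bool>(out);
}

// Returns the cached record, or nothing if the file is missing or truncated.
// The caller would then hand it to
// glProgramBinary(prog, format, blob.data(), blob.size()).
std::optional<CachedProgram> loadProgram(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return std::nullopt;
    CachedProgram p;
    if (!in.read(reinterpret_cast<char*>(&p.format), sizeof p.format))
        return std::nullopt;
    p.blob.assign(std::istreambuf_iterator<char>(in),
                  std::istreambuf_iterator<char>());
    return p;
}
```

After glProgramBinary, check GL_LINK_STATUS and fall back to a normal compile and link if it failed, since a driver is free to reject a binary produced by a different driver version or GPU.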

Thanks very much Carsten, I was completely unaware of the GL binary facilities! Wish I had known about these sooner. That will certainly help. Still open to compilation insights if anyone has them; in the meantime I am sure binaries will lift a lot of the load.

And no, it's not a debug context. It's pretty terrible because, as I said, on NV drivers it's virtually instant...ouch, come on now AMD...

Sometimes drivers trick you: compilation can look virtually instant because the driver has handed the actual compile job to a separate thread, so your code is not blocked until you actually try to use the shader. The latency hasn't disappeared, it has only been deferred.

So, first, I would make sure you are measuring compilation time properly. To do so:
1. Compile your shaders
2. Render some simple primitive using the shaders (e.g. a point)
3. Use glReadPixels or another mechanism to make sure the rendering actually happened and was not deferred as well
4. Measure the time of all 3 steps together; that gives a better estimate of how much time the compilation actually required.
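
The steps above could be wrapped up roughly like this. It is only a sketch: `CompileTiming` and `timeShaderBuild` are hypothetical names, and the three callbacks stand in for your own GL code (compile and link, a one-point draw call, and a glReadPixels of a single pixel) so the harness itself needs no GL context.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

struct CompileTiming {
    double compileMs;  // wall time of the compile + link calls alone
    double totalMs;    // compile + link + first draw + readback
};

// Times the compile step on its own and the full compile/draw/readback
// sequence. If totalMs is much larger than compileMs, the driver most
// likely deferred the real compilation until the program was first used.
CompileTiming timeShaderBuild(const std::function<void()>& compileAndLink,
                              const std::function<void()>& drawOnce,
                              const std::function<void()>& readBack) {
    assert(compileAndLink && drawOnce && readBack);
    using Clock = std::chrono::steady_clock;
    using Ms = std::chrono::duration<double, std::milli>;

    const auto t0 = Clock::now();
    compileAndLink();  // glCompileShader / glLinkProgram
    const auto t1 = Clock::now();
    drawOnce();        // e.g. glUseProgram + a one-point draw
    readBack();        // e.g. glReadPixels, forces the draw to complete
    const auto t2 = Clock::now();

    return {Ms(t1 - t0).count(), Ms(t2 - t0).count()};
}
```

Compare totalMs across vendors rather than compileMs, since only the total is guaranteed to include whatever work the driver pushed off to later.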

Disclaimer: This is my personal profile. Whatever I write here is my personal opinion and none of my statements or speculations are anyhow related to my employer and as such should not be treated as accurate or valid and in no case should those be considered to represent the opinions of my employer.
Technical Blog: http://www.rastergrid.com/blog/

Thanks aqnuep, but the shaders are used immediately to generate geometry, so I am quite sure of the compilation time. On NV they are compiled and able to start displaying the geometry with virtually no delay, so I do think it's actually the AMD compiler. But it is very surprising to me that the difference is so dramatic...

...the shaders are used immediately to generate geometry, so I am quite sure of the compilation time. On NV they are compiled and able to start displaying the geometry with virtually no delay

Unless you are nuking the NV-driver-internal on-disk precompiled GL shader cache before doing this test, don't be so sure.

If you've run with that shader before, it's probably just loading a precompiled version off-disk (or more likely, from a memory cache of that on-disk data thanks to the OS caching of disk accesses, so it's blindingly fast), not actually compiling it on-the-fly. There are precompiled caches for OpenCL/CUDA kernels as well.

On Linux, the default paths for these caches are: $HOME/.nv/GLCache and $HOME/.nv/ComputeCache, respectively.

On Windows, %APPDATA%\NVIDIA\GLCache and %APPDATA%\NVIDIA\ComputeCache, respectively.
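
If you want to locate those directories from a test harness before a cold-cache run, something along these lines would do. This is a sketch only: `NvCache`, `nvCacheDirLinux` and `nvCacheDirWindows` are invented names, and the paths are just the defaults listed above, which your driver install may override.

```cpp
#include <cassert>
#include <string>

enum class NvCache { GL, Compute };

// Leaf directory name for each cache kind, as listed above.
inline std::string nvCacheLeaf(NvCache kind) {
    return kind == NvCache::GL ? "GLCache" : "ComputeCache";
}

// Default Linux location: $HOME/.nv/<leaf>. `home` is the value of $HOME.
std::string nvCacheDirLinux(const std::string& home, NvCache kind) {
    assert(!home.empty());
    return home + "/.nv/" + nvCacheLeaf(kind);
}

// Default Windows location: %APPDATA%\NVIDIA\<leaf>.
// `appData` is the value of %APPDATA%.
std::string nvCacheDirWindows(const std::string& appData, NvCache kind) {
    assert(!appData.empty());
    return appData + "\\NVIDIA\\" + nvCacheLeaf(kind);
}
```

Feed it the result of std::getenv("HOME") or std::getenv("APPDATA") (after checking for null), then move the directory aside or empty it before timing, so the NV number reflects a real compile and not a cache hit.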