This commit only concerns r600 GPUs, not radeonsi. I'm still working on scheduling, and this commit isn't likely to get the best out of the GPU at the moment; however, it makes performance more consistent (by gathering fetch instructions and thus using the hardware's built-in fetch latency hiding). In fact, it fixes some situations where the performance of LLVM-generated code was worse than that of the classic generated code.
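
As a rough illustration of what gathering fetch instructions buys, here is a toy Python sketch (not the actual LLVM pass). It assumes the usual r600 model where ALU and fetch instructions live in separate clauses, so every switch between the two kinds starts a new clause. `Instr`, `count_clauses` and `gather_fetches` are hypothetical names made up for this example.

```python
from collections import namedtuple

# Hypothetical toy instruction: a name, a kind ("ALU" or "FETCH"),
# and the names of the instructions it depends on.
Instr = namedtuple("Instr", "name kind deps")


def count_clauses(order):
    """Count clauses, assuming each change of kind starts a new clause."""
    clauses = 0
    prev_kind = None
    for ins in order:
        if ins.kind != prev_kind:
            clauses += 1
            prev_kind = ins.kind
    return clauses


def gather_fetches(instrs):
    """Greedy list scheduling that stays in the current clause kind
    as long as some ready instruction of that kind exists.
    Assumes the dependencies form a DAG."""
    done = set()
    remaining = list(instrs)
    order = []
    current_kind = None
    while remaining:
        ready = [i for i in remaining if all(d in done for d in i.deps)]
        same_kind = [i for i in ready if i.kind == current_kind]
        pick = (same_kind or ready)[0]
        order.append(pick)
        done.add(pick.name)
        remaining.remove(pick)
        current_kind = pick.kind
    return order


# Three texture fetches, each followed by an ALU op that uses it.
program = [
    Instr("t0", "FETCH", ()),
    Instr("a0", "ALU", ("t0",)),
    Instr("t1", "FETCH", ()),
    Instr("a1", "ALU", ("t1",)),
    Instr("t2", "FETCH", ()),
    Instr("a2", "ALU", ("t2", "a0", "a1")),
]

gathered = gather_fetches(program)
print(count_clauses(program), "clauses before,", count_clauses(gathered), "after")
```

In this toy case the three independent fetches end up in one fetch clause followed by one ALU clause, instead of alternating between the two kinds.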

If I understood the mailing list conversations correctly, Vadim's shader optimization work is currently only useful for the TGSI back-end, not the LLVM back-end... Although there's a chance that the lessons learned could guide future LLVM back-end development.

Yes, that is exactly what I meant: using ideas from it, not porting code.

sb600 actually postprocesses the bytecode emitted by either TGSI or LLVM, so it can be used with both backends. However, sb600 applies optimisations that LLVM already applies internally, so it makes more sense to use it with TGSI.

Originally Posted by Drago

Yes, that is exactly what I meant: using ideas from it, not porting code.

Indeed, I compared code emitted by sb600 to find where LLVM is not optimal when it comes to scheduling, but most of the design ideas came from discussions on IRC with vadimg. For instance, future patches will bring bottom-up scheduling instead of top-down scheduling, because work on sb600 showed it to be better in most situations. Vadimg also has better knowledge of the hw than I do, and I figured out that some constraints LLVM puts on instructions were not mandatory (mainly on dot4 instructions); his experience is really helpful.

Is there any hope that the LLVM back-end will emit code as optimal as r600-sb's?

Hopefully. There is really nothing that prevents LLVM from matching r600-sb wrt code quality; there is no real limitation in LLVM's internal representation of instructions or of the control flow graph. And if there were, we could improve LLVM core if necessary.
Currently LLVM's performance sits right between pure TGSI and sb600 in Unigine: the LLVM backend is 20-30% faster than TGSI, and the sb600 backend is 20-30% faster than LLVM. I suspect this is because LLVM doesn't convert if clauses as much as sb600 does; there is an LLVM pass that we could use, but it would require some work to adapt it to R600's needs.
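
To make the "convert if clause" point concrete, here is a small Python sketch, assuming it refers to classic if-conversion: both sides of a short branch are computed unconditionally and the result is picked with a select, so no control flow is needed. The function names are hypothetical, and Python's conditional expression only stands in for a hardware conditional-move style instruction.

```python
def shade_branchy(x):
    """Original form: a real branch around two short computations."""
    if x > 0.5:
        return x * 2.0
    else:
        return x + 1.0


def select(cond, if_true, if_false):
    """Stand-in for a conditional-move / select instruction."""
    return if_true if cond else if_false


def shade_if_converted(x):
    """If-converted form: compute both sides, then select one result."""
    then_val = x * 2.0   # computed unconditionally
    else_val = x + 1.0   # computed unconditionally
    return select(x > 0.5, then_val, else_val)


# Both forms agree on a few sample inputs.
samples = [0.0, 0.25, 0.5, 0.75, 1.0]
print(all(shade_branchy(s) == shade_if_converted(s) for s in samples))
```

The trade-off is that both sides always execute, so this only pays off when the branches are short and have no side effects.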

On the other hand, LLVM is tailored to CPU needs, which means there are a lot of features we don't use, like debug symbols (afaik gpu step-by-step debugging a la gdb is not possible, I don't think anyone can keep track of hundreds of threads at once...) or linking (glsl programs are self-contained). This does not affect the generated code but rather the compiler's performance: if you ever tried LLVM, you probably noticed some "freezes" when a shader is loaded, either when a new character/weapon is loaded in nexuiz, or at some places in the Unigine Heaven benchmark. This is because of shader compilation, which is almost immediate with pure TGSI code and sb600 but noticeably slower with LLVM. It may be possible to slightly improve this, but I don't think we can beat sb600 on this.
However, in the long run the LLVM backend is probably the only way to support OpenCL; TGSI still lacks a well-tested representation of indirect addressing, functions, and pointers (for shared memory). sb600 can be run on LLVM-generated code, but I think it would be better if the LLVM backend were able to do all optimisations on its own.

Anyway, IMHO there is no right or wrong answer on the usefulness of sb600 versus the LLVM backend; having an already fast backend like sb600 allows vadimg to spot some non-obvious bottlenecks like the stack reservation (a fix that benefits every backend) and provides a better option for people who want performance rather than OpenCL support. Having an LLVM backend for r600 allows me to rely on the well-tested algorithms used by clang and to work on things that could be reused by radeonsi later.