
OpenCL, GLSL Back-End For LLVM May Soon Open Up

08-27-2011, 08:40 AM

Phoronix: OpenCL, GLSL Back-End For LLVM May Soon Open Up

A university student who successfully wrote OpenCL and GLSL back-ends for the Low Level Virtual Machine (LLVM) is arranging to have the code open-sourced if there is interest, which LLVM developers are already requesting...

Comment

Mesa is trying to compile GLSL and OpenCL into a form the hardware understands, while this is doing the opposite.

IIUC, it will allow you to write some code in C and have LLVM transform that into an OpenCL program.

I think the purpose is to use the LLVM optimizers.
I just hope TGSI will be replaced with an enhanced LLVM bitcode, one that supports graphics semantics, and Gallium drivers will translate LLVM->hw. For more information...the site I gave above.

Comment

> I think the purpose is to use the LLVM optimizers.
> I just hope TGSI will be replaced with an enhanced LLVM bitcode, one that supports graphics semantics, and Gallium drivers will translate LLVM->hw. For more information...the site I gave above.

The LLVM stuff is for software rendering only, because LLVM cannot handle a VLIW GPU architecture.

That means you can build a software pipe with it, but not GPU acceleration.

Comment

> The LLVM stuff is for software rendering only, because LLVM cannot handle a VLIW GPU architecture.
>
> That means you can build a software pipe with it, but not GPU acceleration.

What's the problem with LLVM and VLIW? Can one of the devs explain briefly? Is a graphics compute load so much more different than a non-graphics one?
Somewhere I read about a different memory model for graphics than for non-graphics, but I didn't understand it quite well.

Comment

The main issue as I understand it is that while one of the main attractions of llvm is the existing set of optimizing tools, those tools generally do not explicitly recognize VLIW, and as a consequence many of the optimizer passes can't take advantage of the hardware in the way that a hardware-specific compiler could. You can write new optimizers specific to the HW, but it's not obvious that writing those optimizers in the llvm framework is easier than, or even as easy as, writing a dedicated compiler back end.

This is mostly a graphics issue, in the sense that graphics workloads tend to use a lot of 3- and 4-element vector operations while compute does not. GPU hardware is often designed to directly support those operations in hardware, either via a fixed vector instruction set or a VLIW instruction set with registers capable of storing multiple components for easier and more efficient addressing on those short vector operations.

Going from a vector workload through a scalar compiler stack and back to vector hardware efficiently is tough. The LunarGLASS initiative tries to address this by moving to a two-level IR, so that optimizers can be written in a more portable way and re-used more easily (at least that's what I remember on a Sunday night).

Compute is a different story in the sense that the workload tends to be either scalar or very long vector depending on whether you look at the innermost loop or the next one out. Either way there is essentially no short-vector workload so you aren't losing as much by going through an essentially scalar compiler stack.

That's how you end up with llvm "making sense for compute but not for graphics".

Of course we recently announced that some of our future cores would move away from VLIW, so the discussion of which compiler stack(s) and IR(s) to best cover recent, current and future hardware can get really "interesting".

Comment

> That's how you end up with llvm "making sense for compute but not for graphics".
>
> Of course we recently announced that some of our future cores would move away from VLIW, so the discussion of which compiler stack(s) and IR(s) to best cover recent, current and future hardware can get really "interesting".

It supports OpenCL 1.0, not 1.1, and it works on Cell, ARM, some DSPs, and I've used it in 32-bit x86. It probably wouldn't take much to get it working in 64-bit x86 as well (it compiles, but throws a run-time CPU detection error).

I've used it in a project I'm working on, and it functions as would be expected of an OpenCL 1.0-compliant runtime library.