Thursday, October 28, 2010

Intermediate representation again

I wrote about intermediate representation a few times already, but I've been blogging so rarely that it feels like eons ago. It's an important topic and one that is made a bit more convoluted by Gallium.

The intermediate representation (IR) we use in Gallium is called Tokenized Gallium Shader Instructions, or TGSI. In general, when you think about IR you think about some middle layer. In other words, you have a language (e.g. C, GLSL, Python, whatever) which is compiled into some IR, which is then transformed/optimized and finally compiled into some target language (e.g. x86 assembly).

This is not how Gallium and TGSI work or were meant to work. We realized that people were making that mistake, so we tried to backpedal on the usage of the term IR when referring to TGSI and started calling it a "transport" or "shader interface", which better describes its purpose but is still pretty confusing. TGSI was simply not designed as a transformable representation. It can be done, but it's a lot like a dance-off at a geek conference: painful and embarrassing for everyone involved.

The way it was meant to work was:

Language -> TGSI ->[ GPU specific IR -> transformations -> GPU ]

with the parts in the brackets living in the driver. Why like that? Because GPUs are so different that we thought each of them would require its own transformations and would need its own IR to operate on. Because we're not compiler experts and didn't think we could design something that would work well for everyone. Finally, and most importantly, because it's how Direct3D does it. The Direct3D functional specification is unfortunately not public, which makes it a bit hard to explain.
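The flow above can be sketched roughly as follows. This is only a toy illustration, not Gallium code: the tuple-based instruction format, `frontend_to_tgsi`, and the `ToyDriver` class with its `lower`, `transform` and `emit` methods are all hypothetical names invented for this example. The point it tries to show is that the frontend hands over a flat instruction stream and everything after that (the GPU-specific IR, the optimizations, the final code) lives inside the driver.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for one TGSI instruction: (opcode, destination, sources).
Instr = Tuple[str, str, Tuple[str, ...]]


def frontend_to_tgsi(source: str) -> List[Instr]:
    """Pretend language frontend: ignores `source` and returns a fixed stream."""
    return [
        ("MUL", "TEMP[0]", ("IN[0]", "CONST[0]")),
        ("MOV", "TEMP[1]", ("TEMP[0]",)),
        ("MOV", "OUT[0]", ("TEMP[1]",)),
    ]


class ToyDriver:
    """Everything in this class corresponds to the bracketed part of the diagram."""

    def lower(self, tgsi: List[Instr]) -> List[Dict]:
        # Build the driver's own, mutable IR from the transport format.
        return [{"op": op.lower(), "dst": dst, "srcs": list(srcs)}
                for op, dst, srcs in tgsi]

    def transform(self, ir: List[Dict]) -> List[Dict]:
        # Toy copy propagation: forward plain MOVs into temporaries.
        # Assumes no register is ever written twice, which real code cannot.
        copies: Dict[str, str] = {}
        out = []
        for ins in ir:
            ins["srcs"] = [copies.get(s, s) for s in ins["srcs"]]
            if ins["op"] == "mov" and ins["dst"].startswith("TEMP"):
                copies[ins["dst"]] = ins["srcs"][0]
                continue  # the copy itself is no longer needed
            out.append(ins)
        return out

    def emit(self, ir: List[Dict]) -> List[str]:
        # Stand-in for real code generation: print a made-up assembly syntax.
        return ["{} {}, {}".format(i["op"], i["dst"], ", ".join(i["srcs"]))
                for i in ir]

    def compile(self, tgsi: List[Instr]) -> List[str]:
        return self.emit(self.transform(self.lower(tgsi)))


if __name__ == "__main__":
    for line in ToyDriver().compile(frontend_to_tgsi("some shader source")):
        print(line)
```

Note that the transformation runs on the driver's own IR rather than on the incoming stream, which is the division of labor the diagram describes.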

The idea behind it was great in both its simplicity and overabundance of misplaced optimism. All graphics companies had Direct3D drivers; they all had working code that compiled from Direct3D assembly to their respective GPUs. "If TGSI will be a lot like Direct3D assembly then TGSI will work with every GPU that works on Windows, plus wouldn't it be wonderful if all those companies could basically just take that Windows code and end up with a working shader compiler for GNU/Linux?!", we thought. Logically that would be a nice thing. Sadly, companies do not like to release parts of their Windows driver code as Free Software; sometimes it's not even possible. Sometimes Windows and Linux teams never talk to each other. Sometimes they just don't care. Either way, our lofty goal of making the IR so much easier and quicker to adopt took a pretty severe beating. It's especially disappointing since if you look at some of the documentation, e.g. for AMD Intermediate Language, you'll notice that this stuff is essentially Direct3D assembly, which is essentially TGSI (and most of the parts that are in AMD IL and not in TGSI are parts that will be added to TGSI). So they have this code. In the case of AMD it's even sadder because the crucial code that we need for OpenCL right now is an OpenCL C -> TGSI LLVM backend, which AMD already has for their IL. Some poor schmuck will have to sit down and write more or less the same code. Of course if it's going to be me it's "poor, insanely handsome and definitely not a schmuck".

So we're left with Free Software developers who don't have access to the Direct3D functional spec and who are being confused by the IR which is unlike anything they've seen (pre-declared registers, typeless...) which on top of it is not easily transformable. TGSI is very readable, simple and pretty easy to debug though so it's not all negative. It's also great if you never have to optimize or transform its structure which unfortunately is rather rare.

If we abandon the hope of having the code from Windows drivers injected in the GNU/Linux drivers it becomes pretty clear that we could do better than TGSI. Personally I just abhor the idea of rolling out our own IR. IR in the true sense of that word. Crazy as it may sound I'd really like my compiler stuff to be written by compiler experts. It's the main reason why I really like the idea of using LLVM IR as our IR.

Ultimately it's all kind of taking the "science" out of "computer science" because it's very speculative. We know AMD and NVIDIA use it to some extent (and there's an open PTX backend for LLVM), we like it, we use it in some places (llvmpipe), and the people behind LLVM are great and know their stuff. But how hard is it to use LLVM IR as the main IR in a graphics framework, and how hard is it to code-generate directly from it for GPUs? We don't really know. It seems like a really good idea, good enough for folks from LunarG to give it a try, which I think is really what we need: a proof that it is possible and doesn't require sacrificing any farm animals. Which, as a vegetarian, I'd be firmly against.

@Bryan: Right after Tungsten Graphics and all of its assets have been acquired by another company :) Ah, and thanks!

@Christoph: Yea, it's its similarity to TGSI that interests me in this context rather than its relevance as a direct GPU code-generator.

@knue: Ultimately removing TGSI would be the goal, but for now we have a number of drivers which use TGSI, and keeping them working is quite frankly a must. So rather than writing an LLVM code-generator for every driver, a single LLVM code-generator for TGSI would keep all of the drivers working.

But if you use LLVM completely, things like register allocation can be done by LLVM. Besides that, first translating to TGSI can introduce non-optimal code. Is it possible to support both approaches? Newer drivers could just use LLVM and the older ones could stick to TGSI.

@knue: I think you're misunderstanding something. We can't just use "LLVM completely" because it implies that we would have to fork our entire code-base for at least a year to write an LLVM backend for every piece of hardware that we support right now. Clearly that's ridiculous; we need to keep things working. At this stage LLVM doesn't help with register allocation because every GPU has different registers and we have no backends for any of them. Also, there is no first translation. It's Source lang -> LLVM IR -> TGSI -> drivers, and obviously drivers that are able to use LLVM IR will use it directly. Once all of them have an LLVM backend, then and only then will we be able to simply remove TGSI from the equation.

@Zack: It's not that ridiculous. Basically all of the major players have to provide an instruction set that interfaces with LLVM, since they all need to provide graphics for OSX at some point or another. OSX uses LLVM for loads of things, including GPGPU tasks. If you outline such a project (and draw lines carefully), then they might agree to each fund one developer for one year to come up with a clean-roomish LLVM implementation, and then the open-source community could work on stitching things together.

@Christopher Friedt: I think you're missing Zack's point. What he's saying is that LLVM backends for drivers don't pop up overnight, so it's not practical to wait for direct LLVM code-generators to appear for every supported driver.

Instead, an LLVM to TGSI translator would provide compatibility with existing drivers. This allows existing drivers to use the LLVM infrastructure indirectly and then switch to a direct LLVM code-generator when it's ready.
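
A minimal sketch of the arrangement discussed in these comments, purely as an illustration: `Driver`, `llvm_to_tgsi` and `submit` are hypothetical names made up for this example, and shaders are represented as plain strings. Drivers that accept LLVM IR get it directly, while everything else goes through a single shared translator to TGSI.

```python
from dataclasses import dataclass


def llvm_to_tgsi(llvm_ir: str) -> str:
    """Placeholder for the one shared LLVM IR -> TGSI translator."""
    return "TGSI<" + llvm_ir + ">"


@dataclass
class Driver:
    name: str
    accepts_llvm: bool  # True once the driver has its own LLVM backend

    def compile(self, shader: str) -> str:
        # Stand-in for the driver's real shader compiler.
        return self.name + " compiled " + shader


def submit(driver: Driver, llvm_ir: str) -> str:
    """Hand the shader to the driver in whatever form it understands."""
    if driver.accepts_llvm:
        return driver.compile(llvm_ir)
    # Existing drivers keep working unchanged through the TGSI path.
    return driver.compile(llvm_to_tgsi(llvm_ir))


if __name__ == "__main__":
    print(submit(Driver("old_driver", False), "shader.ll"))
    print(submit(Driver("new_driver", True), "shader.ll"))
```

Under this sketch, removing TGSI amounts to deleting the fallback branch once no remaining driver needs it.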