The R300-R500 shader compiler

This document provides some information about the hardware-specific shader compiler for the R300-R500 family of chips. It is used in both the classic Mesa driver and the r300g Gallium3D driver. The source code is located in src/mesa/drivers/dri/r300/compiler in the Mesa repository.

The compiler is for R300-R500 only. At some point we might want to investigate if it makes sense to extend it to R600+. Note that the instruction format of R600+ is quite different, so this may or may not make sense.

Building and dependency situation

Despite the fact that the source resides inside the r300 driver directory, the compiler is automatically built for both the classic and the r300g driver. For this reason, no code in the compiler may depend on any headers that are part of Mesa or of Gallium3D. All data structures are specific to the compiler. This causes some unfortunate minor code duplication, but it also allows us to be more flexible in changing our data structures.

The compiler is written in plain C99 and uses no additional external libraries. It might rely on some GNU C extensions; this has not been verified.

High-level role of the shader compiler

One core thing to understand about the compiler is that it doesn't actually know about GLSL, or any of the ARB program extensions. There are many different flows of shader program code, and the following list contains only some examples, where (data formats) are in parentheses and [program modules] are in brackets:

If you use some other state tracker with r300g, such as st/xorg: [state tracker] -> (TGSI) -> [r300_tgsi_to_rc.c] -> (rc_program) -> [r300/compiler] -> (binary opcodes and register settings)

There are some small simplifications here, but this should give you the big picture about what happens to the shader code. Note that the ARB program parser and the GLSL compiler are actually hardware independent; they both reside below the src/mesa/shader directory in the Mesa repository.

As you can see, shaders are always eventually translated into the rc_program format, which is the only format that is ever seen by our shader compiler. The compiler then translates this into a hardware-executable program. Some of the paths shown above may seem a bit convoluted (and they are!), but keep in mind that they are only executed when the shader is first compiled and linked, which usually happens only during application startup. Once your application is running and all shaders are compiled into their binary, hardware-understandable representation, they only need to be loaded into the hardware registers for rendering, and all these long paths disappear entirely.

You should also understand the role of the compiler-produced binary opcodes in relation to the hardware. Rather simplified, the hardware pipeline contains the following steps:

Vertex Fetch loads vertex information and writes into the PVS' Input Vertex Memory (IVM).
The PVS runs the vertex shader itself and writes result values into its Output Vertex Memory (OVM).
Primitive Assembly pulls vertex data out of the OVM to assemble primitives, clip them, and send appropriate information to the Rasterizer.
The Rasterizer determines which pixels (fragments) are covered by primitives.
The Rasterizer invokes the Pixel Shader and initializes the Pixel Shader's temporary registers using interpolated varying values.
The Pixel Shader runs the fragment shader itself and outputs result values to the Graphics Backend, which is responsible for alpha blending, testing, Z buffer, etc.

The binary code produced by the compiler configures only the PVS (vertex shader) and US (Pixel Shader) stages. In particular, the compiler does not decide how input registers and output registers should be laid out, or how the RS (Rasterizer) should be programmed, etc. This is the job of the rest of the driver, and there may be subtle differences between classic Mesa and r300g in that respect.

The compiler provides interfaces with which drivers can communicate how input and output registers should be laid out. Consult r300_vs.c and r300_fs.c for examples of how these interfaces are used.

How to learn about and hack the compiler

One useful way to learn about the compiler is to set the RADEON_DEBUG environment variable, which makes the driver print out intermediate stages of the vertex and fragment programs as they are being compiled. The debug flags for r300g are vp and fp (e.g. set RADEON_DEBUG=vp); for classic Mesa they are verts and pixels. You should also learn how the hardware actually works by referring to AMD's documentation.

Hacking on the compiler is a rather subtle undertaking, because it's easy to forget about a certain special case or quirk in the hardware, or to miss some subtle invariant about what the intermediate states of programs look like. This can make hacking the compiler seem daunting at first. Unfortunately, the best piece of advice that I can give you is: be daring, and use piglit. That is, after you have written your first compiler patch and think it's looking good, test it using piglit. If your patch does not cause regressions there, there's a pretty good chance that it's correct and does not break anybody's applications.

Program representation in the compiler

The structures representing programs in the compiler are mostly defined in radeon_program.h, with additional definitions in other headers that you can just browse through in the source. Most important are radeon_program_pair.h (only relevant for late stage fragment programs) and radeon_opcodes.h.

Programs are represented by an rc_program structure, which at the time of this writing consists essentially of a doubly linked list of rc_instructions, and that's really almost all there is to it. You can remove, insert, and modify instructions in a relatively care-free way. Some helper functions are provided, which you should make use of instead of rolling your own (they are not going to be documented here; for these details, the source code is the documentation).
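
To make the "doubly linked list of instructions" idea concrete, here is a minimal sketch of such a list with a sentinel node. All names (sketch_program, sketch_instruction, sketch_insert_after, and so on) are hypothetical stand-ins invented for this illustration; they are not the compiler's actual structures or helper functions, which you should look up in radeon_program.h.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real definitions live in radeon_program.h. */
struct sketch_instruction {
    struct sketch_instruction *prev;
    struct sketch_instruction *next;
    int opcode; /* placeholder for the real instruction payload */
};

struct sketch_program {
    /* Sentinel node: in an empty program it points at itself. */
    struct sketch_instruction head;
};

void sketch_program_init(struct sketch_program *p)
{
    p->head.prev = &p->head;
    p->head.next = &p->head;
    p->head.opcode = -1;
}

/* Allocate a new instruction and link it in directly after `after`. */
struct sketch_instruction *
sketch_insert_after(struct sketch_instruction *after, int opcode)
{
    struct sketch_instruction *inst = calloc(1, sizeof(*inst));
    if (!inst)
        return NULL;
    inst->opcode = opcode;
    inst->prev = after;
    inst->next = after->next;
    after->next->prev = inst;
    after->next = inst;
    return inst;
}

/* Unlink an instruction and free it. The sentinel must never be removed. */
void sketch_remove(struct sketch_program *p, struct sketch_instruction *inst)
{
    assert(inst != &p->head);
    inst->prev->next = inst->next;
    inst->next->prev = inst->prev;
    free(inst);
}

/* Count instructions by walking from the sentinel back to the sentinel. */
int sketch_count(const struct sketch_program *p)
{
    int n = 0;
    for (const struct sketch_instruction *i = p->head.next; i != &p->head; i = i->next)
        n++;
    return n;
}
```

With a sentinel, inserting at the start of the program is just sketch_insert_after(&p.head, ...), and neither insertion nor removal needs a special case for the first or last instruction. That is one reason editing such a list can feel carefree.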

Note that there are some subtle and ill-defined invariants at different stages of the compilation. For example, after the initial stages, programs should be free of opcodes that cannot be implemented directly in the hardware. Use common sense.
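
As an illustration of what checking such an invariant could look like, the sketch below walks a program and reports the first instruction whose opcode is not marked as native. The opcode names, the sketch_is_native table, and the simplified singly linked instruction type are all assumptions made up for this example; the real opcode list is in radeon_opcodes.h, and the compiler does not necessarily contain a check of this form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes, invented for this sketch only. */
enum sketch_opcode {
    SKETCH_OP_NATIVE_A,
    SKETCH_OP_NATIVE_B,
    SKETCH_OP_COMPLEX, /* assumed to need lowering by an early pass */
    SKETCH_OP_COUNT
};

/* Which opcodes the (imaginary) hardware can execute directly. */
static const bool sketch_is_native[SKETCH_OP_COUNT] = {
    [SKETCH_OP_NATIVE_A] = true,
    [SKETCH_OP_NATIVE_B] = true,
    [SKETCH_OP_COMPLEX]  = false,
};

/* Simplified instruction: only the fields this check needs. */
struct sketch_inst {
    enum sketch_opcode opcode;
    struct sketch_inst *next; /* NULL terminates the program */
};

/* Return the first instruction violating the invariant, or NULL if none. */
const struct sketch_inst *sketch_find_non_native(const struct sketch_inst *first)
{
    for (const struct sketch_inst *i = first; i; i = i->next)
        if (!sketch_is_native[i->opcode])
            return i;
    return NULL;
}

/* Debug helper: abort if a late-stage program still has non-native opcodes. */
void sketch_assert_all_native(const struct sketch_inst *first)
{
    assert(sketch_find_non_native(first) == NULL);
}
```

A helper like sketch_assert_all_native could be called at the start of a late pass while debugging, so that a broken invariant is caught where it is introduced instead of surfacing later as bad hardware code.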

Also note that this very free-form program representation makes some operations rather time-consuming. For example, looking for all accesses to a temporary register requires scanning all instructions in the program (there are helper functions for this, too!). The only way to avoid this would be to create additional support structures with corresponding invariants that must be preserved at every modification of the program. Maintaining such invariants creates code that is more difficult to follow, which is why it isn't done yet. However, once we really know which kinds of support structures would be useful to speed up the compiler, it is possible that such structures and corresponding invariants will be introduced in the future.
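
The cost described above can be seen in a small sketch: finding every access to one temporary register means visiting every instruction and every operand. The types and the function sketch_scan_temp are hypothetical and greatly simplified (one destination, up to three sources, no swizzles or write masks); the compiler's own helper functions should be used in real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register files, invented for this sketch. */
enum sketch_file { SKETCH_FILE_NONE, SKETCH_FILE_TEMP, SKETCH_FILE_INPUT };

struct sketch_reg {
    enum sketch_file file;
    int index;
};

struct sketch_inst {
    struct sketch_reg dst;
    struct sketch_reg src[3];
    struct sketch_inst *next; /* NULL terminates the program */
};

struct sketch_usage {
    int reads;
    int writes;
};

static int sketch_is_temp(struct sketch_reg r, int index)
{
    return r.file == SKETCH_FILE_TEMP && r.index == index;
}

/* Linear scan: cost grows with the total number of instructions. */
struct sketch_usage sketch_scan_temp(const struct sketch_inst *first, int index)
{
    struct sketch_usage u = { 0, 0 };
    assert(index >= 0);
    for (const struct sketch_inst *i = first; i; i = i->next) {
        if (sketch_is_temp(i->dst, index))
            u.writes++;
        for (int s = 0; s < 3; s++)
            if (sketch_is_temp(i->src[s], index))
                u.reads++;
    }
    return u;
}
```

A per-register use list would answer the same question without a scan, but every insertion, removal, or operand change would then have to update that list, which is exactly the invariant-maintenance burden discussed above.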

Compiler passes

To keep the compiler understandable and maintainable, compilation is broken up into many passes that each do one specific thing. As a general rule of thumb, each pass should go into its own source file.
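
One way to picture the pass structure is a table of named functions that are run in order over the program. This is only a sketch under assumptions: the pass names, the sketch_pass structure, the two flag fields standing in for real program state, and the ordering requirement between the two passes are all invented for illustration and do not describe the compiler's actual pass list or driver loop.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the real program; flags record what has been done to it. */
struct sketch_program {
    int native_only;       /* set once non-native opcodes are lowered */
    int dead_code_removed; /* set once unused instructions are dropped */
};

typedef void (*sketch_pass_fn)(struct sketch_program *);

struct sketch_pass {
    const char *name;   /* useful for debug output between passes */
    sketch_pass_fn run;
};

/* Each pass does one thing; in a real tree each would be its own file. */
void sketch_lower_opcodes(struct sketch_program *p)
{
    p->native_only = 1;
}

void sketch_remove_dead_code(struct sketch_program *p)
{
    /* Assumed ordering for this sketch: lowering must have run first. */
    assert(p->native_only);
    p->dead_code_removed = 1;
}

/* Table of passes in execution order, terminated by a NULL entry. */
const struct sketch_pass sketch_passes[] = {
    { "lower opcodes",    sketch_lower_opcodes },
    { "remove dead code", sketch_remove_dead_code },
    { NULL, NULL }
};

/* Run every pass in order; returns how many passes were executed. */
int sketch_run_passes(struct sketch_program *p, const struct sketch_pass *passes)
{
    int n = 0;
    for (; passes->run; passes++) {
        passes->run(p);
        n++;
    }
    return n;
}
```

Keeping the order in one table makes dependencies between passes visible in a single place, and a loop like this is a natural spot to dump the program after each pass when debugging.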

The purpose of each pass should be described in the source code, and additional high-level information may be put here.