Thursday, November 24, 2011

Advanced HLSL using closures and function pointers

Shader languages like HLSL, Cg or GLSL are nowadays driving the most powerful processors in the world, but if you are developing with them, you may have been already a little bit frustrated by one of their expressiveness limitations: the common problem of abstraction and code reuse. In order to overcome this problem, solutions so far were mostly using a glue combination of #define/#include preprocessors directives in order to generate combinations of code, permutation of shaders, so called UberShaders. Recently, this problem has been addressed, for HLSL (new in Direct3D11), by providing the concept of Dynamic Linking, and for GLSL, the concept of SubRoutines, For Direct3D11, the new mechanism has been only available for Shader Model 5.0, meaning that even if this could greatly simplified the problem of abstraction, It is unfortunately only available for Direct3D11 class graphics card, which is of course a huge limitation...

But, here is the good news: While the classic usage of dynamic linking is not really possible from earlier version (like SM4.0 or SM3.0), I have found an interesting hack to bring some kind of closures and functions pointers to HLSL(!). This solution doesn't involve any kind of preprocessing directive and is able to work with SM3.0 and SM4.0, so It might be interesting for folks like me that like to abstract and reuse the code as often as possible! But let's see how It can be achieved...

A simple problem of abstraction and code reuse in HLSL

I have been working recently at my work on a GPU implementation of a versatile perlin/simplex/fbm/turbulence noise in HLSL. While some of the individual algorithm are pretty simples, it is often common to use several permutations of those functions in order to produce some nice noise and turbulences functions (like the worm-lava texture I did for Ergon 4k intro). Thus, they are an ideal candidate to demonstrate the use of closures and functions pointers. I won't explain here the basic principle of perlin and fbm noise generation to focus on the problem of code reuse in HLSL.

Here is a simplified version of a Turbulence Noise implemented in a Pixel Shader:

The problem with the previous code is that if we want to change the code behind AbsNoise called from FBMNoise (for example, apply cos/sin on the coordinates, or use of a simplex noise instead of the old Perlin Noise), we would have to duplicate the FBMNoise function to call the other function. Of course, we could use the preprocessor to inline the code, but It would end up in something less readable, less debuggable, error prone...etc.

Another example: Ken Perlin introduced some really cool functions to modify the noise, like the famous marble effect:

But wait! The MarbleNoise function could even be used in place of the AbsNoise function, in order to get another noise effect. So we could have a marble function calling a FBM... but we could also have a marble function called by a FBM... or both... ugh... so as we can see, It is possible to permute those functions to generate interesting patterns, but unfortunately, the shading language doesn't provide us a way to make those functions pluggable!... Almost! In fact, there is a small breach in the HLSL language and we are going to use it!

Introduction to Dynamic Linking in HLSL

So as I said in the introduction, Direct3D11 has introduced the concept of dynamic linking. I suggest the reader to go to an explanation on msdn "Interfaces and classes". Basically, the main feature introduced in the HLSL language is a bit of Object Oriented Programming (OOP) in order to address the problem of abstraction: Now HLSL has the class and interface keyword. But they were mainly introduced for dynamic linking of a shader, and as I said, dynamic linking is only available with SM5.0 profile.

As we can see, we need to declare the interface and classes variable globally, so that they can be accessed by the C++ program. This is the standard way to use dynamic linking in HLSL... but what If we want to use this differently?

Hacking function pointers in HLSL

The principle is very simple: Instead of using interface and classes as global variables, we can in fact use them as function parameters and even local variables from method. The way to use it is then straightforward:

The previous example could be compiled flawlessly with ps_4_0 (Shader Model 4) or ps_3_0 (with some minor changes for the pixel shader), and It would compile just fine!
So basically, the interface ICalculator is acting as a function pointer, that has two implementations available through the ClassicCalculator and ComplexCalculator classes. MyFunctionUsingICalculator doesn't have to change its signature to adapt to the underlying function, so as we can see, we have a suitable solution for developing function pointers in HLSL.

Now, lets try to see if we could use this model to build our flexible noise functions. Replace ICalculator by a INoise interface. We are seeing that an implementation would have to call another INoise interface. In fact, ideally, we would like to code something like this:

Unfortunately, HLSL doesn't permit the use of interface as variable members!. This limitation was quite annoying, as It excludes a whole range of combination, like aggregation, composition... making these function pointers useful only for a very limited set of cases...
I have tried to overcome this problem using abstract class instead of interface, as classes can be declared as variable members of classes... but, again, there is a huge limitation: The class variable is in fact acting a a final or const variable that cannot be changed, thus making its usage almost useless...
But I knew that HLSL permits lots of unusual constructions, and this is where closures are going to resolve this.

Hacking Closures in HLSL

So we know that interfaces can be used as function pointers, but their usage is limited as we cannot use anykind of composition. An interesting fact is that we can declare local variables in methods as being class or interfaces... The trick is to use a quite uncommon feature of HLSL: It is possible to declare local classes inside a method, that can access local parameters!
Therefore, It is possible to use a kind of deferred composition/aggregation using this technique. Let's rewrite our noise functions using this new closure technique:

1. Declare a INoise interface that is able to compute the noise by using a next INoise implementation.

// It is possible to compile this code under ps_4_0 and ps_3_0
// Declare our INoise interface
interface INoise {
// Here an interesting hack: We can declare a method that is returning a INoise
// interface. This method will be implemented by the pixel shaders.
INoise Next();
// The compute method of a Noise
float Compute(float2 pos);
};

2. Declare NoiseBase as an abstract implementation of INoise that is implementing the methods. If we had the keyword abstract in hlsl we wouldn't have to implement methods of this class.

// We are creating an abstract class from INoise in order
// to implement both methods
class NoiseBase : INoise {
INoise Next() {
// This code will never be used. It is only
// used to declare this class
NoiseBase base;
return base;
}
float Compute(float2 pos) {
// This code will never be used. It is only
// used to declare this class
return Next().Compute(pos);
}
};

3. Use NoiseBase to implement final INoise functions. If you look at AbsNoise, FbmNoise or MarbleNoise, they are using the INoise::Next() method to get an instance of the INoise interface they rely on.
This is where functions pointers are extremely useful here.

Et voila! As you can see, we are able to declare local classes from a pixel shader that are acting as closures. It is for example even possible to declare local classes that have a specific code in their Compute() methods.
Behind the scene, when chaining the INoise::Next() methods, the fxc HLSL compiler is seeing all thoses classes as "INoise*".
It is then possible to perform a fbm(marble(abs(perlin_noise()))) as well as a marble(fbm(abs(perlin_noise()))).

In the end, It is effectively possible to implement closures in HLSL that can be used in SM4.0 as well as SM3.0!

Improving closures chaining

From the previous example, we can extend the concept by
1. Adding static local constructors to each Noise function :

Further Considerations

This is a very exciting technique that could open lots of abstraction opportunities while developing in HLSL. Though, in order to use this technique, there are a couple of advantages and things to take into account:

An interface cannot inherit from another interface (that would be really interesting)

An interface can only have method members.

A class can inherit from another class and from several interfaces.

Unlike in C/C++, we cannot pre-declare an interface, but we can use a declaration being declared (See the example of the method INoise::Next, returning a INoise).

The compiler has a limitation against the reuse of an implementation in a call chain and will complain about a recursive call (even if there is no recursive call at all): For example, It is not possible to reuse twice the sample type of class closure in a call chain, meaning that it is not possible to make a call chain like this one: Marble => FBM => Marble => Abs => Perlin. The fxc compiler would complain about the second "Marble" as It would see it as a kind of recursive call. In order to reuse a function, we need to duplicate it, that's probably the only point that is annoying here.

Generated compiled asm output from closures are exactly the same as using standard inlining methods.

Before going to local class-closure, I have tried several techniques that were sometimes crashing fxc compiler.

Thus, as it is a way of hacking the usage HLSL, It is not guarantee that this will be supported in the future. But at least, if it is working for SM5.0, SM4.0 and 3.0, we can expect that we are safe for a while!

Also, the compilation time under vs_3_0/ps_3_0 profile seems to take more time, not sure if its the language construction or a regular behavior of 3.0 profiles.

Let me know if you are able to use this technique and If you are finding other interesting constructions or problems. That would be very interesting to dig a little more into the opportunities it opens. Lastly, I have done a small google search about this kind of technique, but didn't found anything... but It could have been used already by someone else, thus this whole technique is a new hypothetical discovery, but I enjoyed a lot to discover it!

You can declare a new class within a function!? Very nice indeed! I didn't know HLSL allowed this. I have been working on a different approach for using higher order functions. A library I am working on translates F# to HLSL. Check it out: https://github.com/rookboom/SharpShaders/wiki/Higher-order-functions