¯(°_o)/¯

Main menu

Category Archives: glsl

Some time ago I devised an original algorithm to convert from RGB to HSV using very few CPU instructions and I wrote a small article about it.

When looking for a GLSL or HLSL conversion routine, I have found implementations of my own algorithm. However they were almost all straightforward, failing to take full advantage of the GPU’s advanced swizzling features.

The lightweight version for coarse levels of detail

If I wanted a more lightweight fragment shader, for instance when implementing variable levels of shader complexity, I would want to do only one texture lookup, and use the vector with the largest weight:

But if you are familiar with SSE or AltiVec-style SIMD programming on the CPU, you will know this is not the usual way to do. Rather than 4 vectors of 3 elements, SIMD programming prefers to work in parallel on 3 vectors of 4 X, Y and Z components:

vec4 getColorShuffled(vec4 allx, vec4 ally, vec4 allz){/* Now what do we do here? */}

One nice thing to realise is that the equivalent of our previous step(a.z, c.z) and step(b.z, d.z) tests can be done in parallel:

Wow, that’s a bit complex. And even if we’re doing two calls to step() instead of three, there are now five calls to mix() instead of three. Fortunately, thanks to swizzling, we can combine most of these calls to mix():

That’s it! Only three mix() and two step() instructions. Quite a few swizzles, but these are extremely cheap on modern GPUs.

Afterthoughts

The above transformation was at the “cost” of a big data layout change known as array of structures to structure of arrays. When working in parallel on similar data, it is very often a good idea, and the GPU was no exception here.

This was actually a life saver when trying to get a fallback version of a shader to work on an i915 card, where mix and step must be emulated using ALU instructions, up to a maximum of 64. The result can be seen in this NaCl plugin.