There are a few places on the net that I visit pretty regularly. One such place is the forums at Beyond3D where several cool developers hang around and post good stuff. Some of my favourite Beyond3D contributors are DeanA (Dean Ashton, a colleague over at SCEE ATG, with a never-updated blog), and DeanoC (Dean Calver) and nAo (Marco Salvi) who happen to be lead and graphics programmers, respectively, on Heavenly Sword over at Ninja Theory. All fellow PS3 programmers in other words.

Of note in recent times is Dean C talking about the Atomic Cache facility of the SPEs both on his blog and on the forums. Though here I’m going to talk about something both Dean C and Marco have talked a lot about in the past, namely how they use LogLuv encoding as part of their HDR solution. (Lots of funny speculation ensued on the forums as they didn’t quite give enough info for people to connect the dots, even to the extent where websites felt they needed to conduct interviews on the topic.)

Like most (good) developers, when respected people talk about a piece of their tech that you are not currently using, you investigate. So, back then, I decided to look into (amongst other things) what it would take to encode RGB into LogLuv in a pixel shader. Here’s what I found.

The pertinent information on converting from [R,G,B] to [Le,Ue,Ve] (i.e. LogLuv) is spread over multiple sections of Ward’s paper, so to save you some work, I’ll summarize. The conversion is done as follows:
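Written out as I understand Ward's 32-bit formulation, the steps are as below. Treat this as a reconstruction rather than a quotation: in particular, I am assuming the colour is a row vector multiplied on the left of M, which is what the matrix products later in this post imply, so check the orientation against the paper before relying on it.

```latex
\begin{aligned}
[X, Y, Z] &= [R, G, B]\, M \\
x &= \frac{X}{X + Y + Z}, \qquad y = \frac{Y}{X + Y + Z} \\
u' &= \frac{4x}{-2x + 12y + 3}, \qquad v' = \frac{9y}{-2x + 12y + 3} \\
L_e &= \lfloor 256\,(\log_2 Y + 64) \rfloor \\
u_e &= \lfloor 410\, u' \rfloor, \qquad v_e = \lfloor 410\, v' \rfloor
\end{aligned}
```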

where M is the 3×3 matrix [0.497,0.339,0.164; 0.256,0.678,0.066; 0.023,0.113,0.864].

To explain the “magic” constants in Ward’s math, we note that we support Y in the range (5.4×10^-20, 1.8×10^19), because log2() of these values gives the range (-64.0, 64.0), which for Ward’s Le calculation brings Le into the desired integer range [0, 2^15-1] (a 15-bit integer).
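As a quick sanity check of those bounds, here is a small Python sketch (Python rather than Cg purely so it can be run off-line). It assumes Ward's Le is floor(256 * (log2(Y) + 64)); `ward_le` is a hypothetical helper name of mine, not something from the paper.

```python
import math

def ward_le(Y):
    """Assumed form of Ward's log-luminance index: floor(256 * (log2(Y) + 64))."""
    return math.floor(256.0 * (math.log2(Y) + 64.0))

y_min = 2.0 ** -64   # roughly 5.4e-20, the lower bound quoted in the text
y_max = 2.0 ** 64    # roughly 1.8e19, the (open) upper bound

le_lowest = ward_le(y_min)
# The upper bound is exclusive, so probe a value just below it.
le_highest = ward_le(y_max * (1.0 - 2.0 ** -20))

print(y_min, y_max)
print(le_lowest, le_highest)
```

The two indices come out as 0 and 32767, i.e. the ends of the 15-bit range.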

Ward states the gamut of perceivable u and v values lies in the range [0, 0.62] and he therefore scales the u and v values by 410 to result in an integer [0, 255]. For a fragment shader we need Le, Ue, and Ve to lie in the [0, 1] range, as the hardware will automatically turn floats in that range into a [0, 255] integer (clamped). However, we will in the end be splitting Le over two such integers, so we’ll turn Le into a float of range [0,256). Making the appropriate changes turns the math into:
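If I follow the description correctly, the rescaling amounts to dividing u' and v' by 0.62 instead of multiplying by 410, and scaling the log term by 2 instead of 256. This is my reading of the text, with the floor operations dropped because the hardware does the quantization:

```latex
\begin{aligned}
L_e &= 2\,(\log_2 Y + 64) \;\in\; [0, 256) \\
u_e &= \frac{u'}{0.62} \;\in\; [0, 1], \qquad v_e = \frac{v'}{0.62} \;\in\; [0, 1]
\end{aligned}
```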

There are quite a few optimizations we can do at this point. In an attempt at being educational, I’ll apply them one by one. First, substitute the expressions for x and y in the expressions for u’ and v’ and simplify, to obtain this calculation:
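Carrying out that substitution by hand (my own working, so do double-check it): with x = X/(X+Y+Z) and y = Y/(X+Y+Z), multiplying numerator and denominator by X+Y+Z turns -2x + 12y + 3 into X + 15Y + 3Z, which gives

```latex
\begin{aligned}
u_e &= \frac{1}{0.62} \cdot \frac{4X}{X + 15Y + 3Z} \\
v_e &= \frac{1}{0.62} \cdot \frac{9Y}{X + 15Y + 3Z} \\
L_e &= 2\log_2 Y + 128
\end{aligned}
```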

Here we note that it is possible to fold the dot product dot([1,15,3], [X,Y,Z]) into the vector-matrix multiplication so that it ends up in the Z component of the result (which I’ll call XYZ). The new math is then
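In my reconstruction, with the scale factors 4/9 and 0.62/9 also absorbed into the matrix, this becomes the following. The primes are a naming choice of mine: X' stands for 4X/9 and XYZ' for 0.62 (X + 15Y + 3Z)/9, and the colour is again assumed to be a row vector.

```latex
\begin{aligned}
[X', Y, \mathit{XYZ}'] &= [R, G, B]\, M' \\
u_e &= \frac{X'}{\mathit{XYZ}'}, \qquad v_e = \frac{Y}{\mathit{XYZ}'} \\
L_e &= 2\log_2 Y + 128
\end{aligned}
```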

The new matrix is M’ = M * [1,0,1; 0,1,15; 0,0,3] * [4/9,0,0; 0,1,0; 0,0,0.62/9]. At this point, there’s hardly any math left and no(?) optimizations left to apply, so now it’s time to code. However, turning this into production code we have two potential problem sources:

Division by zero.

log2() arguments less than or equal to zero.

To avoid visible glitches both issues must be handled, which we can do by strategically adding in some small epsilons to force values to be strictly positive where it matters. When all that is done, we get the following code (Cg code, of course) as a result:

Running this code through NVShaderPerf gives (from memory) 5 cycles for 9 instructions. When inserted at the end of a longer shader where there is plenty of room for instruction pairing, the total overhead for the LogLuv conversion will be less than this, perhaps around 3 cycles. I haven’t checked with Marco to see how this compares to what he’s doing, but it matches the cycle numbers he mentioned in various posts so it’ll be pretty close.
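For checking the math off-line, here is a minimal Python sketch that evaluates M' from the product given above and compares the folded encoding against the unfolded formulas. It is not the Cg listing. Assumptions of mine: colours are row vectors, the epsilon is 1e-6, and I clamp with max() rather than adding an epsilon. All function names are hypothetical.

```python
import math

# Assumption: row-vector convention, [X, Y, Z] = [R, G, B] * M with M as printed.
M = [[0.497, 0.339, 0.164],
     [0.256, 0.678, 0.066],
     [0.023, 0.113, 0.864]]
FOLD = [[1.0, 0.0, 1.0],     # third column produces X + 15Y + 3Z
        [0.0, 1.0, 15.0],
        [0.0, 0.0, 3.0]]
SCALE = [[4.0 / 9.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 0.62 / 9.0]]

def matmul(a, b):
    """3x3 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def vecmat(v, m):
    """Row vector times 3x3 matrix."""
    return [sum(v[k] * m[k][j] for k in range(3)) for j in range(3)]

M_PRIME = matmul(matmul(M, FOLD), SCALE)
EPS = 1e-6  # hypothetical epsilon; guards both the division and log2()

def encode_folded(rgb):
    """Folded form: one vector-matrix multiply, two divides, one log2."""
    xp, y, xyzp = (max(c, EPS) for c in vecmat(rgb, M_PRIME))
    return xp / xyzp, y / xyzp, 2.0 * math.log2(y) + 128.0

def encode_unfolded(rgb):
    """Unfolded reference: x, y, then u', v', then the rescaling (no guards)."""
    X, Y, Z = vecmat(rgb, M)
    total = X + Y + Z
    x, y = X / total, Y / total
    denom = -2.0 * x + 12.0 * y + 3.0
    return ((4.0 * x / denom) / 0.62,
            (9.0 * y / denom) / 0.62,
            2.0 * (math.log2(Y) + 64.0))

for row in M_PRIME:
    print([round(c, 4) for c in row])
print(encode_folded([0.5, 0.25, 1.0]))
print(encode_unfolded([0.5, 0.25, 1.0]))
print(encode_folded([0.0, 0.0, 0.0]))  # black no longer divides by zero
```

The two encodings should agree to within floating-point rounding for any colour whose components stay above the epsilon. The clamp only changes the result for (near-)black input.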

As Marco discusses on e.g. Dean’s blog, you might want to adjust this representation a little to avoid getting carry problems during interpolation, which I haven’t done here but have left as an exercise for the reader. Another exercise is to do the conversion from LogLuv back to RGB. Enjoy!

nAo said,

Well done Christer!
Your implementation is actually faster than mine, as I did not fold the dot product into the matrix multiplication. I also had to split the log luminance in a more convoluted way because, using the same transformation you used, I was not able to get back to an RGB colour exactly without losing a tiny bit of intensity.
BTW..do you know any game that is making use of the same base technique?
The only one I’m aware of is Heavenly Sword..

Hi Marco, good to see you here! Back when I looked at this I only worked out how to optimize the RGB->LogLuv code as per above, not the other way around, so I never looked into the precision issues but I did scribble in my notes that splitting the luminance value into two bytes could be an issue and that you might have to do something else. A possible option could be to write it this way:

Although this might not work so well either; I can’t remember if the pack/unpack instructions were hosed or not. (I’m not the one doing the shader coding.) I recall the unpack approach produced worse code too, but this was with a pretty old Cg compiler, so who knows.

I don’t know of any other games using LogLuv at this point. I considered it for our engine based on your posts about your approach, but we haven’t committed to how to deal with HDR yet so the ball is still in the air. (As you know quite well, there are several possible options and which is best depends a lot on what other choices you’ve made – and we haven’t made all our choices yet.)

BTW, I don’t think it’s a secret to mention publicly that I’ve seen builds of Heavenly Sword and the graphics are absolutely gorgeous. Kudos to you and the team!

nAo said,

You’re right, there are many different options when it comes to rendering HDR images.
Since we can’t really do alpha blending in this color space, if a game really needs to blend in an HDR color space then this technique only makes sense if used in conjunction with multisampling, writing a custom AA resolve filter that downsamples a LogLuv image to an FP16 image where we can do HDR blending.
(Though I think that HDR blending is overrated; we can live without it by just blending on an RGBA8 render target and tone mapping in the pixel shaders of our alpha blending pass, even better if we do it using the exposure computed in the previous frame and read back with the CPU, so that we can avoid a texture read in our pixel shaders and set the exposure as a pixel shader constant.)
It’s also worth noting that the vast majority of games probably don’t use the full FP16 range but rather a narrower one. In this case we can drop the logarithm and just store a linear luminance scaled to fit the luminance range we want to support; even in this case I doubt we can tell the difference, can we? :)
BTW..the code I used to encode luminance is really ugly, but I did not have much time to spend on it and it was the only code that did the job (perfect LogLuv -> RGB/FP64 conversion). When I (sneakily!) introduced it, the game already had a ton of content developed using FP16, and I did not want the artists/art director to scream in pain because our images were slightly darker (!!):

Just to round out the above blog post and its comments, I thought I’d mention that users MJP and remigius over at gamedev.net incorporated Marco’s packing code with my code snippet and also worked out the details of the matching LogLuv_Decode() function. So, for completeness, and with credits to MJP and remigius, here’s the full Encode/Decode pair (in HLSL):
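As an off-line illustration of what such a pair does, here is a minimal Python sketch of an encode/decode round trip through four bytes. It is not the HLSL from the gamedev.net thread and it does not use Marco's packing. Assumptions of mine: a plain 8.8 fixed-point split of Le, M' rounded to four decimals from the product given earlier, row-vector colours, and a hypothetical epsilon of 1e-6.

```python
import math

# M' evaluated from the product in the post, rounded to four decimals
# (row-vector convention: [X', Y, XYZ'] = [R, G, B] * M_PRIME).
M_PRIME = [[0.2209, 0.3390, 0.4184],
           [0.1138, 0.6780, 0.7319],
           [0.0102, 0.1130, 0.2969]]
EPS = 1e-6  # hypothetical guard against division by zero and log2(0)

def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

M_INV = inverse3(M_PRIME)

def vecmat(v, m):
    """Row vector times 3x3 matrix."""
    return [sum(v[k] * m[k][j] for k in range(3)) for j in range(3)]

def logluv_encode(rgb):
    """RGB -> four bytes: (ue, ve, Le integer part, Le fractional part)."""
    xp, y, xyzp = (max(c, EPS) for c in vecmat(rgb, M_PRIME))
    ue = min(xp / xyzp, 1.0)
    ve = min(y / xyzp, 1.0)
    le = 2.0 * math.log2(y) + 128.0
    le16 = min(max(int(le * 256.0), 0), 65535)  # Le as 8.8 fixed point
    return (round(ue * 255.0), round(ve * 255.0), le16 >> 8, le16 & 0xFF)

def logluv_decode(packed):
    """Four bytes -> RGB, undoing logluv_encode up to quantization error."""
    ue8, ve8, hi, lo = packed
    le = (hi * 256 + lo + 0.5) / 256.0         # centre of the quantization bin
    y = 2.0 ** ((le - 128.0) / 2.0)
    xyzp = y / max(ve8 / 255.0, EPS)           # ve = Y / XYZ'
    xp = (ue8 / 255.0) * xyzp                  # ue = X' / XYZ'
    return [max(c, 0.0) for c in vecmat([xp, y, xyzp], M_INV)]

colour = [0.5, 0.25, 1.0]
packed = logluv_encode(colour)
print(packed)
print(logluv_decode(packed))
```

The round trip is only approximate: chromaticity is quantized to 8 bits per channel and luminance to steps of 1/256 in Le, so expect small errors in the decoded components rather than an exact match.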

I hope people who visit this post (and according to the stats, it’s a fairly popular post) will find this info useful. Make sure to visit the gamedev.net thread too (as linked above). Rim van Wersch (remigius) also posted a simple test project that you might want to check out.
