Wednesday, May 16, 2012

Demosaicing the Fuji X-Pro1 and its X-Trans CMOS sensor

Looks like it's the time for oddball sensors. Or for me to write about them anyway. I've just finished updating PhotoRaw for the Fuji X-Pro1, and I thought it was worthwhile to document the journey, and what it means for the X-Pro1's X-Trans sensor. Specifically, whether it will deliver on the claims that Fuji has made for it.

BTW, the new version of PhotoRaw will be out early next week, assuming there are no glitches on Apple's side.

This is my personal viewpoint. It's based on spending a lot of time writing code, and more time pixel peeping X-Pro1 images than any sane person ought to do, but it's still just my personal view.

If you happen to live anywhere near a lab where people who develop software for decoding raw images hang out, you've probably been hearing a whole lot of swearing and cursing recently. You've probably also been hearing a lot of jokes with punchlines about "there's a reason why Fuji is a four letter word starting with f". Here's why:

Fuji introduced the X-Pro1 a while ago, with a novel "X-Trans" sensor. A conventional Bayer sensor uses a two-by-two arrangement, but the X-Trans has a six-by-six layout, intended to mimic the random grain structure of film. The layout is in the image below.
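For readers without the image handy, the six-by-six tile can also be written out in code. This is the widely published representation of the pattern; note that where the tile "starts" depends on which origin you pick on the sensor, so other sources may show a shifted version of the same grid.

```python
# One common representation of the 6x6 X-Trans colour filter array.
# Other sources may show this grid cyclically shifted, depending on
# where the origin (0,0) is taken on the sensor.
XTRANS = [
    "GGRGGB",
    "GGBGGR",
    "BRGRBG",
    "GGBGGR",
    "GGRGGB",
    "RBGBRG",
]

# A conventional Bayer array repeats on a 2x2 tile instead.
BAYER = [
    "RG",
    "GB",
]

def color_at(pattern, row, col):
    """Colour of the photosite at (row, col) for a repeating pattern."""
    return pattern[row % len(pattern)][col % len(pattern[0])]
```

Note that out of every 36 photosites, 20 are green and 8 each are red and blue - roughly the same green bias as Bayer, just distributed far less regularly.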

Fuji has a long history of innovative sensor design, for example the Super-CCD sensors with a mix of high and low sensitivity cells, as well as sensors with the Bayer array rotated at 45 degrees, etc. However, none of these have made a lasting impact on the market, with the major manufacturers resolutely sticking to conventional Bayer arrays.

The advantages of the X-Trans sensor

The big advantage of the X-Trans layout is moire suppression - because the layout isn't as regular as the conventional layout, moire (nasty colored pixels on sharp edges) isn't nearly as much of a problem as it would otherwise be. Fuji claim that moire is eliminated - that's probably optimistic, but I'd guess that moire would be suppressed by three to four times. That's a LOT. In turn, this means that while a sensor with a conventional layout needs an anti-aliasing (AA) filter to avoid moire, a camera built with the X-Trans sensor doesn't need an AA filter, or at least can use one that's a lot weaker. An AA filter is basically a filter that slightly blurs the image. No AA filter (or a weaker AA filter) means less blurring, and higher effective resolution.

The end result is that the Fuji X-Pro1, which has a 16 MP sensor, should out-resolve cameras with larger full-frame sensors.

This is all sound theory, and there have been a fair number of rave reviews of the X-Pro1 in the press. So why all the unhappy developers? And does the sensor really deliver?

The problems

The downside of the X-Trans sensor is that it's a bit of a pig to demosaic. Fuji openly say that it will take much more processing power, and that the X-Pro1 has a beefed-up processor to cope. Demosaicing involves interpolating the pixel values that are missing from the sensor array - e.g., in the case of a green pixel, the red and blue values need to be interpolated.

The easiest way to see the challenge that the X-Trans sensor poses is to take a look at the blue pixel in the first column of the diagram above. To interpolate effectively, the ideal situation is to have pixels that you can interpolate from close by. In addition, the best kind of pixels to interpolate from have their centers aligned - in other words, you can draw a line between a pair of pixels that passes through the center of the pixel you want to interpolate. So, e.g., let's say you want to interpolate the red value of that blue pixel. Vertically, the closest red pixel is three down, and three up - a gap of two pixels. In the conventional array, it's two up and two down, a gap of one. Take a look at where the closest same-color pixel, another blue pixel, is and it gets worse - it's six pixels away vertically, versus two for the conventional Bayer array. If you take a look at green pixels, in the conventional Bayer array, any green pixel has four other green pixels at a 45 degree angle, and a gap of one in any direction vertically or horizontally. In the X-Trans sensor, the spacing is uneven - one pixel in one direction, two in another.
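These distances can be checked mechanically. Here's a small sketch (my illustration, using the commonly published tile representations) that searches vertically for the nearest photosite of a given colour:

```python
# Repeating tiles: the commonly published X-Trans representation and
# a conventional Bayer tile.
XTRANS = [
    "GGRGGB",
    "GGBGGR",
    "BRGRBG",
    "GGBGGR",
    "GGRGGB",
    "RBGBRG",
]
BAYER = ["RG", "GB"]

def nearest_vertical(pattern, row, col, want):
    """Distance in pixels to the nearest photosite of colour `want`
    directly above or below (row, col)."""
    n = len(pattern)
    for d in range(1, 2 * n + 1):
        for r in (row - d, row + d):
            if pattern[r % n][col % len(pattern[0])] == want:
                return d
    return None

# The blue pixel in the first column of the X-Trans tile sits at (2, 0):
print(nearest_vertical(XTRANS, 2, 0, "R"))  # 3 - red is three up/down
print(nearest_vertical(XTRANS, 2, 0, "B"))  # 6 - next blue is six away
print(nearest_vertical(BAYER, 1, 1, "B"))   # 2 - Bayer blue is two away
```

The numbers match the argument above: the nearest vertical red is three pixels away (versus two for Bayer), and the nearest vertical blue is six pixels away (versus two).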

In a real demosaicing engine, things are more complex than what I laid out in the previous paragraph - a modern engine uses information from all the pixels near the target, not just one color, and also takes decisions on the fly about which pixels to use to get the best result. But the basic rule is still that close, aligned pixels give better results, and the X-Trans layout is short on close, aligned pixels.

To add insult to injury, many interpolation implementations optimize based on the sensor pattern repeating on a power of two - two, four or eight. That allows you to use bit masks rather than division/remainder operations. The X-Trans sensor repeats on a six pixel basis, not a power of two.
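To make that optimization concrete, here's a minimal sketch of the two ways of finding a pixel's position within the repeating tile. The function names are mine, just for illustration:

```python
def tile_pos_pow2(row, col, size):
    """Position within a repeating tile whose size is a power of two.
    A bit mask replaces the modulo: x % 2**k == x & (2**k - 1)."""
    return row & (size - 1), col & (size - 1)

def tile_pos_general(row, col, size):
    """General case: needs a real division/remainder (or a lookup
    table to avoid it). This is what a 6-pixel tile forces on you."""
    return row % size, col % size

# For a 2x2 Bayer tile the mask and the modulo agree:
print(tile_pos_pow2(1234, 567, 2) == tile_pos_general(1234, 567, 2))  # True
# For the 6x6 X-Trans tile, only the modulo form works:
print(tile_pos_general(1234, 567, 6))  # (4, 3)
```

On a desktop CPU the difference is minor; in a tight inner loop on a mobile device, replacing two divisions per pixel with two AND operations is a meaningful saving.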

Finally, Fuji sprang the X-Pro1 on the market without providing pre-launch information (or indeed post-launch information!) to raw developers. Take a look on forums frequented by Adobe staff, and those close to Adobe, and you'll find a number of negative comments about Fuji's handling of the X-Pro1's launch.

Bottom line is, the X-Trans sensor trades freedom from moire for a considerably more complex interpolation problem. And Fuji openly admit this. The question is whether the tradeoff is worth it - will the better sensor-level resolution from dropping the AA filter actually translate into a real advantage once the image is demosaiced? Which brings us to PhotoRaw's demosaicing engine.

Implementing X-Trans interpolation in PhotoRaw

Previous generations of PhotoRaw mainly used a modified AHD interpolation engine. It wasn't limited to two by two patterns, but it did make some assumptions about where it could find pixels. I say "mainly used" because not every sensor went through the AHD engine - some special cases were processed separately. Unhappily, the X-Trans sensor breaks the old engine's assumptions about pixel locations.

So PhotoRaw needed a new way of processing X-Trans images.

Attempt one - modified VNG: My first try at a solution was to find a quick and easy fix. Looking at the X-Trans sensor, it looked like it would be well suited to a gradients-type approach. The uneven pixel distances could be handled via weighting without much change to the underlying algorithm. Broadly speaking, calculating a gradient can be done similarly regardless of pixel distance. So I modified a VNG interpolation algorithm - VNG is threshold-based Variable Number of Gradients interpolation. The good news was, the solution worked, and worked a lot better than bilinear interpolation, which is the fallback when all else fails. The bad news was, it still didn't work well enough. Specifically, it fell over badly on large sharp transitions - in fact, it looked little better than bilinear interpolation there - blocky edges, etc.
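To make the distance-weighting idea concrete, here's a toy sketch of a VNG-style step. This is my illustration of the general technique, not PhotoRaw's actual code: gradients across opposing neighbour pairs are normalised by distance, and only pairs whose gradient is under a threshold contribute to the estimate.

```python
def vng_like(pairs, threshold_scale=1.5):
    """Toy VNG-style interpolation step (illustrative only).

    pairs: list of (value_a, value_b, distance) for opposing neighbours
    of the pixel being interpolated. Dividing each gradient by distance
    is the 'weighting' mentioned above: far-apart pairs give weaker
    evidence of an edge. Pairs with small gradients (i.e. no edge in
    that direction) are averaged into the estimate."""
    grads = [abs(a - b) / d for a, b, d in pairs]
    threshold = min(grads) * threshold_scale + 1e-9
    chosen = [(a + b) / 2 for (a, b, _), g in zip(pairs, grads) if g <= threshold]
    return sum(chosen) / len(chosen)

# A flat pair (10, 10) and a pair straddling an edge (10, 100):
# only the flat direction survives the threshold.
print(vng_like([(10, 10, 1), (10, 100, 1)]))  # 10.0
```

You can see from the sketch why large sharp transitions are the weak spot: when every direction crosses the edge, the threshold admits whichever gradient happens to be smallest, and the average degenerates toward plain bilinear behaviour.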

Attempt two - generalized AHD: When the easy approach didn't work well enough, I had two choices. Either continue working the VNG solution, or try something new. I chose to try to implement an AHD (Adaptive Homogeneity-Directed interpolation) solution instead. This was for a number of reasons. Firstly, I know AHD better than I know VNG, so there would be less work on understanding how to modify the algorithm. Secondly, PhotoRaw already has a well proven AHD interpolator that's multithreaded, etc, etc. And I mean really multithreaded as in multiple threads working on one image, as opposed to thread safe. Finally, I prefer AHD to other interpolation methods - I've tried most of the others, and my experience is that AHD is more stable.

What I ended up implementing is a generalized AHD engine. It's table driven, so the way that it works is that initially PhotoRaw analyses the sensor pattern, and sets up a number of tables that describe how to interpolate the sensor. The advantage of this is that the analysis algorithm can be quite complex, and not necessarily efficient, because it only runs once. Then the multi-threaded core can just run through the tables very efficiently. The way it's implemented, the analysis algorithm is general - it can analyze pretty much any sensor pattern and come up with an optimized set of interpolation tables. In essence, the analysis portion looks for optimal pixel combinations - how close the pixels are, whether the pixels are aligned, etc - and chooses the best combinations it can.
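As a sketch of the table-driven idea (my reading of the description above, not PhotoRaw's actual code): a one-off analysis pass scores candidate neighbour offsets for every position in the repeating tile, preferring close neighbours (a real version would also score alignment, as described above), and the fast interpolation core then just follows the precomputed tables.

```python
def build_tables(pattern):
    """One-off analysis pass: for each tile position and each missing
    colour, record the best few neighbour offsets to interpolate from.
    Runs once per sensor pattern, so it can afford to be slow."""
    n, m = len(pattern), len(pattern[0])
    tables = {}
    for r in range(n):
        for c in range(m):
            for want in "RGB":
                if pattern[r][c] == want:
                    continue  # this colour is sampled directly here
                # Score every offset within one tile radius: smaller
                # Manhattan distance is better.
                candidates = sorted(
                    (abs(dr) + abs(dc), (dr, dc))
                    for dr in range(-n, n + 1)
                    for dc in range(-m, m + 1)
                    if (dr, dc) != (0, 0)
                    and pattern[(r + dr) % n][(c + dc) % m] == want
                )
                tables[(r, c, want)] = [off for _, off in candidates[:4]]
    return tables

# For a Bayer tile, a red site gets its four adjacent greens:
tables = build_tables(["RG", "GB"])
print(tables[(0, 0, "G")])
```

The interpolation core then only does table lookups and weighted sums per pixel, which is what makes it practical to keep one general engine for every non-Bayer pattern.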

And that's what will be running Fuji X-Pro1 conversions in future. It won't be running the majority of conversions, however - the way PhotoRaw now operates is that it uses a hard-coded, very efficient engine for two-by-two patterns (95% plus of cameras), and goes to the general engine for more complex patterns.

The results

Here are some crops of the end results versus Adobe Camera Raw (ACR), SILKYPIX and the Fuji in-camera JPEG renditions of a sample image. The SILKYPIX conversion is from Ario Arioldi (thanks Ario!!!)

Important note: these are 400% crops, and they are of an area on an image that I know will be really troublesome with raw converters - so bear in mind that we're looking at imperfections you probably won't ever see in practice. Also, the ACR crops below are a beta version of ACR.

Also bear in mind that PhotoRaw runs on the iPhone or iPad - it operates in an environment with about one twentieth the memory of a desktop machine, and several orders of magnitude less processing power than either a desktop machine, or the dedicated image processing chip in the Fuji X-Pro1.

This is PhotoRaw - good resolution, clean colors, very clean and well defined edges. However, some pixel noise in transitions from highlights, as seen in the reflections from the paperclips. That could be dealt with, e.g., by median filtering. The problem, however, is that a median filter, or similar algorithm, would add a lot more processing, and PhotoRaw really doesn't have scope for more processing. Maybe when the iPad 10 comes out.
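For illustration, here's the kind of median pass mentioned above, as a plain-Python sketch (mine, not PhotoRaw's). It also shows why the cost matters: nine samples sorted per pixel, per channel, is exactly the sort of extra pass a mobile device struggles to afford.

```python
def median3x3(img):
    """Apply a 3x3 median filter to a 2D list of samples (one channel).
    Border pixels are left unchanged for simplicity. A median filter
    knocks out isolated hot pixels while preserving edges better than
    a blur would."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[yy][xx]
                            for yy in (y - 1, y, y + 1)
                            for xx in (x - 1, x, x + 1))
            out[y][x] = window[4]  # median of the 9 samples
    return out

# An isolated noise spike gets flattened to its surroundings:
print(median3x3([[0, 0, 0], [0, 255, 0], [0, 0, 0]])[1][1])  # 0
```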

Adobe Camera Raw V7.1 beta

ACR beta 7.1 is a bit different. At first glance, it's doing better than PhotoRaw. The paperclips are entirely clean - really impressively so. However, take a look at the "Product of Italy" text, especially the insides of the "A" and "Y". Lots of chroma smearing, and the letters are quite desaturated. This is an understandable trade-off by Adobe - ACR runs on the desktop, and so can implement complex filtering, and the resulting chroma smearing is less noticeable than PhotoRaw's pixel noise.

Fuji X-Pro1 in-camera JPEG

Thirdly, the Fuji in-camera JPEG - no problems with the paperclips, and while there is still some chroma smearing, it's less than the ACR beta. But the in-camera JPEG loses a little resolution to both the PhotoRaw and ACR renditions. E.g., take a look at the marks between the letters - in the PhotoRaw and ACR versions they're clearer.

SILKYPIX conversion

Finally, SILKYPIX. SILKYPIX is currently the only officially supported raw converter for the X-Pro1. Again, no problems with the paperclips, and while there is still some chroma smearing, it's less again than the in-camera JPEG, and a lot less than ACR. Resolution appears slightly lower than PhotoRaw or ACR, but the difference is slight, and may be due to SILKYPIX sharpening less by default. Really, the only issue here is that slight remaining chroma smear on the "A" and the "Y", and a slight loss of saturation in the red of the lettering.

If any readers know exactly what processing is being used by Adobe, SILKYPIX and Fuji that gives that chroma smearing effect, I'd be interested in knowing. Update: this mystery is solved in part three.

Conclusions

I mentioned above that Fuji X-Pro1 images are a pig to interpolate. They are. Whatever you may think of PhotoRaw, ACR is a very, very good raw converter, and here it's having trouble with the Fuji X-Pro1 images. That may be because it's a beta, but the hard truth is that any interpolation engine is going to have trouble with pixels that are spread out, not aligned, and unevenly spaced. SILKYPIX, the official raw developer for the X-Pro1, does a much more credible job, but still shows slight chroma smearing.

It's worth going to the DPReview site and using their image comparison tool to compare the ACR conversion for the Nikon D7000 versus the Fuji X-Pro1 conversion, and the SILKYPIX conversion shown here. The Nikon and the Fuji X-Pro1 have virtually the same number of pixels (16.2 M versus 16.3 M), and approximately the same sized sensor. The Nikon D7000 conversion doesn't have the chroma smearing you see above.

So my conclusion is, sorry to say, that the Fuji X-Pro1 X-Trans sensor doesn't deliver the Fuji promise of outperforming similarly sized sensors. In fact, it underperforms similar DX sensored cameras - with the official SILKYPIX raw developer, the underperformance is too slight to be noticeable under normal circumstances, but is still there if you look closely.

This is not to say that the Fuji X-Pro1 is a bad camera - far from it - the camera has great lenses, a really attractive viewfinder and design, and in many ways the sensor is great, with low noise and clean data, in the class of the recent Sony and Nikon sensors. If you use it with the official raw converter, it's within a whisker of the competition. But in my opinion, it would have been a better camera with a conventional two-by-two sensor layout.

Maybe tomorrow Fuji will send me an email that explains how to interpolate Fuji X-Pro1 images such that they deliver on the promise. And I'm certainly interested to see how a final version of ACR performs, and also how Dave Coffin's DCRaw handles the Fuji X-Pro1, once Dave updates DCRaw to handle it. (Update: part 2 of this post has results from a DCRaw beta.) But right now, I don't see the performance that's been claimed for the X-Pro1. (Second update: part three shows how a modified form of PhotoRaw gets very, very close to the performance of SILKYPIX.)

NOTE: This post has been updated with SILKYPIX images since it was first published.

No, I don't have a copy of SILKYPIX. If someone who does wants to download the raw from DPReview, and post a JPEG somewhere I can access it (converted at default settings), I'll add a crop to the post.

Thanks Sandy. This was a great read to someone like me who doesn't know much about the back end of raw conversions. However, I'm not so sure about basing performance on DPreview images. As 'scientific' as they are, some of their studio images make some really good cameras look terrible. Comparing NEX7 with M9 for example, makes NEX7 look better. Also, Fuji's lens seems to have a different DOF so that there are a lot more out of focus areas, which certainly makes it look like the XP1 has less resolving power.

The actual filter array pattern is: http://www.flickr.com/photos/kevin_g_purcell/6669978383/ - the pattern you posted looks like it's offset partially from its origin. I'm not saying it isn't more complicated to process - I'm just not sure why you chose an offset pattern?

Steve, that's the pattern as PhotoRaw views it starting from 0,0. But how any sensor appears depends on where you start from - the physical edge of the sensor, the start of any masked black level pixels, the start of the active area, etc. You can use any of those - or something else - as long as the raw converter is internally consistent.

There are a number of reasons why you often get blue or yellow/orange pixels - firstly, there are usually a lot more green photosites on the array, so it's usually possible to do a better job of interpolating green. Secondly, the sensor is usually more sensitive to red, and least sensitive to blue. Depending on the interpolation method and where in the processing pipeline color conversion and white balance get done, that can result in fringing on highlights where the most sensitive channel (red) saturates, but the least sensitive (blue) doesn't.

While not directly on the topic of X-Trans sensor demosaicking, I have a question related to colour 'leaking' at edges. Below is a link to a discussion with examples.

http://www.dl-c.com/board/viewtopic.php?f=4&t=453

I entertained different theories, including colour-selective diffraction, but after some more experiments it appears to me that it is a demosaicking problem. In my layman's terms: at the sharp edge of a reddish or bluish bright area (where the B or R channel is nearly blown), the algorithm doesn't stop abruptly at the edge, but rather overshoots it and starts filling the adjacent pixels - into the darker green foliage - with a blue or red 'halo', whichever seems to be the brightest...

Does such an explanation make any sense? And what would be a proper course of action to reduce it?

Looking at the image, I'd agree that it seems to be at least partially a demosaicing issue. Many demosaicing mechanisms have problems with transitions between colors; they tend to use transitions in luminance to effectively "detect" chroma transitions. Which is generally a good idea, but can result in what you're seeing - effectively, the demosaicer "misses" the transition, because it's largely in chroma. The worst case for this is a blue <-> red transition. It does seem quite extreme in this case.

There's not actually much you can do, other than trying a different raw converter. One with a "classic" ahd implementation, e.g., dcraw, would probably be a good test. (or PhotoRaw Lite, if you have an iPad)

Just an idea: I'm incapable of testing it, but the word is that X-trans in-camera processed JPGs are good and don't suffer from the colour bleeding that computer RAW processors produce. Do you know of anyone trying to reverse-engineer what Fuji is doing, by e.g. replacing RAF data with known patterns, reprocessing it in the camera, and trying to guess what they do differently as compared with standard demosaicking?