That test is with a pure grayscale subject, so that CFA pixels of any color are near perfect proxies for luminance measurements, making it an unnaturally easy case for retaining resolution in demosaicing.

Only a bit easier, and luminance is the most important aspect of visual resolution. Besides, don't forget that even the pathological case of a Blue/Red target (colors from opposite sides of the spectrum placed next to each other, extremely unlikely to occur in nature) has a significant luminance contrast difference. Only the even more unlikely situation of a Red tint and a Blue tint with approx. equal luminance contribution would create a worst case scenario for resolution. To satisfy a few masochists, I've tried to create such a target. A link can be found in the first P.S. of this post.

Quote

One real-world issue is that sharp luminance boundaries are likely to go with shifts in color, causing more color moiré issues, and so need a stronger OLPF or stronger post-processing to avoid aliasing artifacts. I prefer the rather consistent subjective observation with natural subject matter that it takes about twice as many Bayer CFA photosites as X3 photosites to get comparable perceived sharpness.

That would presumably include using an OLPF on the Bayer CFA and no OLPF on the Foveon, and relatively large sensels, and no access to more than 16MP sensors. Hardly a realistic assumption when you look, e.g., at what a D800 produces compared to a D800E. Visibly more false color artifacts (but not impossible to correct in post-processing) and 1% more luminance (extinction) resolution for the AA-less version.

Limiting visual (extinction) resolution: both sensor types would resolve close to the physical limit of the Nyquist frequency.

Quote

The whole game will shift, of course, if we get to the regime of oversampling, with resolution limited almost entirely by the lens, not the sensor. Then the main potential advantage of X3 over CFA is the effect on low-light performance of counting most of the received photons versus only about 40% of them.

As explained before, that's not how the math works out, and the proof is in the differences in high ISO performance where photons are a limited commodity.

P. S. As Erik pointed out, the actual human fovea uses single-color photodetectors more like a CFA sensor: cones, each of which gives a signal for one of red, blue, or green ... along with a few pure luminosity signals from rods. Not quite the RGBW tried by Kodak and Sony at times, but closer to that than to Foveon X3.

I am not sure that this line of reasoning has much relevance.

Our human senses may have all kinds of limitations, defects, nonlinearities etc. The goal of a camera/sound recorder/... is usually not to _mimic_ those senses, but to recreate reality in such a way as to _fool our senses_. Often, those two goals overlap (if our hearing only extends to 20kHz, then a recording system does not need to recreate anything > 20kHz), but at times they may not (e.g. we may perceive a very dark scene, viewed physically, as "noisy", but if a camera/display system adds noise on top of that, the total perceived noise may be unrealistic).

Quite right, except my comment was not a line of reasoning at all; it was just a comment on the ironic inaccuracy of the brand name "Foveon". I see no reason to think that functioning more or less like the human eye makes a sensor better or worse.

Bart, which math are you referring to? Your measurement of a mere 6% difference in extinction resolution with a pure gray scale target is not of much interest to me, though at least gray scale is far less bad than the other extreme of pure red/blue resolution charts. Do you have comparisons based on MTF50 or similar?

And please, let us separate the poor low light performance of Foveon's particular approach from the more general potential of other approaches of "X3" detection.

Hi,

The math is about the fact that sampling only a single channel per sensel does not mean a loss of sensitivity that would require a longer exposure to compensate. The channels that were not sampled are added by interpolation.

There is nothing sacred about MTF50, it's just the spot in a system MTF curve at 50% modulation. The level of detail at that position depends on the sensor (sensel pitch and number of sensels) and the subject contrast, and thus varies in absolute cycles/mm.

The simplest way of comparing resolution is by inspection of the limiting visual resolution of a sinusoidal star target. The sinusoidal grating is important to avoid aliasing artifacts in discrete sampling devices such as our sensor arrays. When we can no longer see the smaller detail, then its contrast has been reduced to zero. BTW, contrast (a function of lens MTF and sensor MTF) can be boosted in post-processing, e.g. by wavelet decomposition or high-pass sharpening for specific levels of detail, provided that there is some contrast (Signal > Noise) left to process.
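Such a sinusoidal star target can be synthesized in a few lines. A minimal numpy sketch (the image size and spoke count here are arbitrary choices, not the parameters of the actual target linked above):

```python
import numpy as np

def siemens_star(size=512, cycles=72):
    """Sinusoidal Siemens star: spokes get finer toward the center,
    so the tangential spatial frequency rises as the radius shrinks."""
    # Coordinates centered on the middle of the image.
    y, x = np.mgrid[0:size, 0:size] - (size - 1) / 2.0
    theta = np.arctan2(y, x)
    # A sinusoidal (not square-wave) profile avoids the harmonics of a
    # bar target, which would alias in a discretely sampled image.
    return 0.5 + 0.5 * np.sin(cycles * theta)

star = siemens_star()
```

At radius r the tangential frequency is cycles / (2*pi*r) cycles per pixel, so with 72 spokes the pattern crosses Nyquist (0.5 cy/px) at a radius of roughly 23 pixels; detail inside that circle cannot be faithfully sampled.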

Let's have a look at the following example, based on the central areas of the star targets from the above link:

A central crop of the regular RGB star target, and one of the special '~same luminance Red/Blue' version:

In Photoshop, I applied a Gaussian blur with a radius of 0.70 (which simulates a very good lens at its optimum aperture) to each, and I added a Multiply blending layer with an RGGB pattern, which produces these Bayerized versions:
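The Bayerizing step (a Multiply layer with an RGGB pattern) amounts to zeroing the two unsampled channels at each photosite. A minimal numpy sketch; the RGGB phase, with R at the even rows and columns, is an assumption for illustration:

```python
import numpy as np

def bayerize(rgb):
    """Keep one channel per photosite (RGGB layout), zero the rest,
    mimicking a Multiply blend with an RGGB pattern.
    rgb: (H, W, 3) float array."""
    h, w, _ = rgb.shape
    mask = np.zeros((h, w, 3))
    mask[0::2, 0::2, 0] = 1  # R at even rows, even cols
    mask[0::2, 1::2, 1] = 1  # G at even rows, odd cols
    mask[1::2, 0::2, 1] = 1  # G at odd rows, even cols
    mask[1::2, 1::2, 2] = 1  # B at odd rows, odd cols
    return rgb * mask
```

The `mask` is the digital equivalent of the CFA: each photosite keeps exactly one channel and discards the other two.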

I then used a program (PixInsight) that allows demosaicing a Bayer CFA image with the generic VNG demosaicing algorithm (much better than bilinear or bicubic interpolation, but maybe not the best that is possible), which produced the following:
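For illustration only, here is a naive bilinear demosaic, far cruder than the VNG algorithm used above, but enough to show the principle of estimating each missing channel from the sampled neighbors:

```python
import numpy as np

def rggb_mask(h, w):
    """1 where a channel is actually sampled in an RGGB layout."""
    mask = np.zeros((h, w, 3))
    mask[0::2, 0::2, 0] = 1  # R
    mask[0::2, 1::2, 1] = 1  # G
    mask[1::2, 0::2, 1] = 1  # G
    mask[1::2, 1::2, 2] = 1  # B
    return mask

def conv3x3(img, k):
    """'Same'-size 3x3 convolution with edge replication."""
    p = np.pad(img, 1, mode='edge')
    h, w = img.shape
    out = np.zeros_like(img)
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def demosaic_bilinear(mosaic):
    """Fill unsampled channels with a weighted average of the sampled
    neighbors. Edge-directed methods like VNG do much better near
    detail; bilinear is shown only because it fits in a few lines."""
    h, w, _ = mosaic.shape
    mask = rggb_mask(h, w)
    k = np.array([[0.25, 0.5, 0.25],
                  [0.5,  1.0, 0.5],
                  [0.25, 0.5, 0.25]])
    out = np.empty_like(mosaic)
    for c in range(3):
        num = conv3x3(mosaic[:, :, c], k)   # weighted sum of samples
        den = conv3x3(mask[:, :, c], k)     # sum of weights actually used
        out[:, :, c] = num / den
    return out
```

On a uniform gray input this reconstructs the original exactly; on fine detail it blurs and produces the false-color artifacts that better algorithms suppress.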

I added a Red circle to mark the Nyquist frequency (at a 92 pixel diameter), and a Green circle to mark the visual limiting resolution. The limiting resolution as I saw it for the RGB demosaiced version was at a 96 pixel blur diameter, which is 4.2% (92/96) below Nyquist. Some might prefer to draw the limit at a 98 pixel blur diameter (it's a bit arbitrary, visually), which would be 6.1% below Nyquist. That's pretty consistent with my findings some 9 years ago, based on experiments with another demosaicing algorithm.

The pathologically unlikely case of the adjacent colors from both opposite ends of the visual spectrum with about equal luminance to put the Bayer CFA at its most disadvantage, indeed produces the expected result. I've put the visual resolution blur limit at a diameter of 184 pixels (green circle), which is half (92/184) of the Nyquist frequency. That's exactly consistent with half the sampling density of the Red / Blue sensels compared to the Green sensels.

That demonstrates the potential effects of demosaicing only(!). A Foveon-like sensor also cannot exceed the Nyquist frequency. The constructed 'R/B'-only Bayer CFA version has zero luminance contribution in the green channel, which is also extremely unlikely because of the CFA filter characteristics (they are not perfect bandpass filters) and the effect of a possible OLPF (which spills some signal to adjacent sensels). It's also useful to realise that demosaicing algorithms in general use luminance differences to boost the Red and Blue resolution to virtually the same level as the Green resolution.

That is interesting if a bit vague. Are you saying that the last picture with the large cyan circle is a picture from a shot you took? Sorry, I have a bit of trouble following the post. I may need more coffee.

What I think I can interpret is that in de-Bayering most detail comes from luminance information. The circle from a black/white star pattern is very close to Nyquist, while the circle from your red/blue is about double. So the ability to get a color right is much worse than getting luminance right, so color accuracy on fine color detail is suspect. That is the point of the thread, right? To say getting real color data at a point is much better than guesstimating.

Bayer is great when you have sections of similar tones. Any fine random or fractal type pattern would look poor. I think that matches with our experience using the cameras. You need a few similar pixels for it to guess the color.

Another experiment we can all easily do is take a picture of a color checker or even better, IT8 target, then take another when the patches are far enough away to be about the size of 1 or 2 pixels. Will the measurement of the colors be the same? Take several, 1 pixel, 2x2, 3x3, 4x4. Then you know in the real world what length of lens you need to get proper color on the detail of a shot.

No cameras involved, only mosaicing and demosaicing, to show the influence of only sampling part of the color info versus all info (the originals at the top). Coffee might help to see what's the actual loss and what is not.

Quote

What I think I can interpret is that in de-bayering most detail comes from luminance information.

Correct, and color also carries a luminance component, so luminance is not only supplied by the Green filtered sensels.

Quote

The circle from a black white star pattern is very close to nyquist while the circle from your red blue is about double. So the ability to get a color right is much worse than getting luminance right, so color accuracy on fine color detail is suspect.

Not exactly. Only colors with virtually no luminance contrast will have reduced resolution, all others will not suffer as much from the partial color sampling per sensel. There may be some false color artifacts depending on the demosaicing algorithm, due to the different sampling densities between Blue/Red and Green (and thus different Nyquist limits and aliasing artifacts).

Quote

That is the point of the thread right? To say getting real color data at a point is much better than guesstimating.

The thread is about the different trade-offs, IMO.

Quote

Bayer is great when you have sections of similar tones. Any fine random or fractal type pattern would look poor. I think that matches with our experience using the cameras. You need a few similar pixels for it to guess the color.

Bart, thanks for the details, though unfortunately they have the familiar limitations of a purely mathematical attempt to quantify a subject that involves the complications of the human visual system. All we get for sure is that a Bayer CFA sensor can retain some slight resolution at extremely low MTF (so that features might often be invisible in practice) almost up to Nyquist, while color details might be limited to as low as half that; an X3 sensor can be close to Nyquist regardless of color issues, though the significance of "color resolution" is unclear, due to the way our eyes work.

Perhaps the easiest way out is to accept that sensor resolution is heading towards being abundant ("oversampling is coming"), so that other comparisons like low light handling, dynamic range, and color accuracy are far more important.

The Noise Ninja Calibration target is a good one to use. Colors are fairly randomly spread out.

How's about someone with a Canon, Nikon, Sony, Pentax, Olympus each try shots with these colors near 1x1 to 5x5 pixels. You can take pictures of your screen. The absolute color doesn't matter, the color difference from size to size does.

It would also be interesting to see the differences from one raw converter to another if any.

Exactly, although the worst case scenario is very unlikely to occur. We can get an idea of the relative importance of color for resolution by looking at the 'ab' channels of a 'Lab' colorspace image. Color resolution doesn't fluctuate as rapidly as luminance does. That's why the Bayer CFA works as well as it does.

Quote

Perhaps the easiest way out is to accept that sensor resolution is heading towards being abundant ("oversampling is coming"), so that other comparisons like low light handling, dynamic range, and color accuracy are far more important.

Yes, oversampling will make color resolution a moot issue and it will increase luminance resolution even further, and other issues like dynamic range are easier to solve with a Bayer CFA because of the larger silicon real estate for deep wells at a given sampling density. Only a single band of colors needs to be stored in a well, instead of 3 wells that are 1/3rd the size.

Not quite 'Lab', but perhaps sufficient for this discussion: http://en.wikipedia.org/wiki/Y%27CbCr ("A color image and its Y, CB and CR components. The Y image is essentially a greyscale copy of the main image.")
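The Y'CbCr split is just a matrix transform. A minimal sketch using the full-range BT.601 coefficients (one common variant; exact coefficients differ between standards):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range BT.601 RGB -> Y'CbCr for inputs in [0, 1].
    Y is the luma (essentially a grayscale copy of the image);
    Cb/Cr are blue- and red-difference channels, centered on 0.5."""
    m = np.array([[ 0.299,     0.587,     0.114],
                  [-0.168736, -0.331264,  0.5],
                  [ 0.5,      -0.418688, -0.081312]])
    ycc = rgb @ m.T
    ycc[..., 1:] += 0.5  # center the chroma channels
    return ycc
```

For any neutral gray input, Cb and Cr sit exactly at the 0.5 midpoint, which is why the chroma planes of most real images are so low in modulation.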

Hi,

Exactly, and that BTW is also why such colorspaces compress so efficiently (even at high quality settings): 2/3rds of the image holds low-frequency, low-modulation data, which requires fewer bits to encode the per-pixel differences.

Cheers, Bart

It's actually the nature of scalar quantization in the usual compression formats that makes dependence on a color space important. Effectively, the color channels in the various color spaces are the directions they point in. Some directions compress more with scalar quantization. However, if vector quantization is used, which typically offers more compression than scalar quantization, then directionality doesn't help, and effectively RGB, YCbCr, or another matrix-derived color space should compress the same.

I'd suggest that YCbCr combined with spatial downsampling of the Cb/Cr channels (e.g. "4:2:2", "4:2:0", etc.) is used because it:
1. Is computationally low-cost
2. Performs bandwidth reduction before a codec (that may have a high processing cost per input pixel)
3. Maps reasonably well to perceptual correlates, meaning that you can reduce precision (quantize) with fairly low complexity while still having reasonable rate/distortion trade-offs
4. Maps reasonably well to redundancy in real-world images, meaning that the bandwidth that is dropped often contains little information

I imagine that all of this can be done with vector quantization, but I imagine that you'd spend lots of cycles and complexity achieving what can, in practice, be had a lot cheaper. It has been a while since I looked at vector quantization, but as I remember it, it was essentially the solution to every problem - given that you could afford to build sufficiently large vectors, something that often is not realistic?
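The chroma downsampling step in the list above can be sketched in a few lines; the 2x2 box average is one simple choice of downsampling filter, and the frame size is arbitrary:

```python
import numpy as np

def subsample_420(chroma):
    """4:2:0: keep one chroma sample per 2x2 block (here a box
    average), applied to Cb and Cr but not to Y."""
    h, w = chroma.shape
    return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# Raw-sample bookkeeping for a 640x480 frame:
h, w = 480, 640
full = 3 * h * w                       # 4:4:4: Y, Cb, Cr at every pixel
sub = h * w + 2 * (h // 2) * (w // 2)  # 4:2:0: full Y, quarter Cb and Cr
# sub == full / 2: half the samples before the codec does anything
```

This is the cheap "bandwidth reduction before the codec" of point 2: a reshape and a mean, no transform or entropy coding involved.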

I imagine that all of this can be done with vector quantization

Yes. Especially if you do VQ on each color channel separately. In that case, like scalar quantization, the directions in which the color channels point are important, and different color spaces will compress differently. YCbCr is just one such set of directions, and not the optimal one; the optimum depends on image content. However, if we want to pick a fixed transformation for all images, much like YCbCr, there exist transforms that result in more compression. YCbCr is not bad, but there are better choices, and such choices will work better even with scalar quantization.

Astro images: as long as you have accurate tracking, you will be able to shoot the same field of stars, and the deep sky objects (stars and nebulae) don't move in relation to each other. Just align the R, G, and B (or other filter) shots and process!

Landscapes: features move with relation to each other. In the case of a triad of R, G, and B capture, fusing it with any intervening motion is going to be problematic and require interpolations - similar to the Bayer demosaicing process (not computationally, but spatially).

...YCbCr is just one set of such directions. Not the optimal. Optimal depends upon image content. However, if we want to pick a fixed transformation for all images, much like YCbCr, there exists better transforms than YCbCr that result in more compression. YCbCr is not bad, but there are better choices....

Given that the (perhaps) main feature of "4:2:0" is that it lays the ground for 10:1 or 100:1 lossy compression in codecs that seem to have been tuned to the characteristics of YCbCr (BT.601 or 709), trading visual errors for bandwidth reduction in a way that can only be reliably measured using largish panels of viewers, how would you go about making a replacement, and confirming that it improves the end-to-end characteristics by significantly more than the measurement uncertainty?

By sensitivity I mean required exposure time at a given ISO setting. Some are suggesting that up to a stop can be gained by not filtering out 2/3rd of the spectrum at a given sampling position, which is not true because the 2/3rds are added through interpolation instead of being sampled directly.

I am having a little trouble understanding how 2/3rds of the information would be made up digitally in post. Aren't we confusing exposure with brightness, like when an underexposed image's brightness is increased through Compensation in post? The difference can be found in the noise - even just in the luminance channel.

Simplifying, assuming natural daylight and 3 contiguous sensels of the same size from the two sensors above (RGB for the one with CFA and L1L2L3 for the one without), with the same exposure so that the Ls are below saturation, all other things being equal (aotbe) the raw count in the RGB sensels is going to be about 1/3 that of the L pixels, as can be seen in the histograms in joofa's post. Sure, one can increase their values digitally in post (let's call it 'interpolation' :-) so that they are very similar, but their noise (SNR) would still be 1.6 stops worse.

If on the other hand one wanted similar results off the bat in terms of raw values and SNR performance, one would have to increase the exposure time of the sensor with CFA by three times aotbe... that would mean that the non-CFA sensor performs at ISO 300 how the CFA sensor does at ISO 100, a significant 1.6 stop noise advantage :-)
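The 1.6-stop figure above is log2(3). A minimal sketch of the arithmetic; the per-site photon counts are made up purely for illustration, and real CFA/X3 spectral responses are messier than a clean 1/3 split:

```python
import math

# Illustrative (made-up) per-site photon counts: a CFA site passes
# roughly one spectral band; an idealized X3 site counts all three.
photons_cfa = 1000
photons_x3 = 3 * photons_cfa

# Shot noise is Poisson, so SNR = N / sqrt(N) = sqrt(N).
snr_ratio = math.sqrt(photons_x3) / math.sqrt(photons_cfa)  # sqrt(3), ~1.73x
stops = math.log2(photons_x3 / photons_cfa)                 # ~1.585 stops
```

Whether that idealized factor of 3 survives in a real device is exactly what the two posters disagree about.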

Hi Jack,

No we are not confusing exposure with brightness, at least I am not.

Let's zoom in on one single pixel, and for the sake of simplicity let's assume it records values from 0 to 255 for each channel, and let's disregard gamma. From an RGB Foveon-type sensor we may get a raw recorded data reading of [128,128,128] because 3 channels are sampled, and from a Bayer CFA filtered sensor we may get [128,0,0], or [0,128,0], or [0,0,128], depending on the color of the filter. So for the Bayer CFA we have 2/3rds missing, but the corresponding channel does record something.

Now we do a Bayer CFA demosaicing, which has nothing to do with amplification! The Bayer CFA demosaicing (a very clever interpolation) will use the surrounding sensel positions to estimate the most likely value for the missing channels. When we happen to be watching a uniform patch of gray, then the interpolation between surrounding Green filtered sensels will read 128 all around our pixel, so the interpolation decides that the missing Green channel should probably also be 128. Thus, after one interpolation, we get either [128,128,0], or [0,128,0], or [0,128,128] depending on the color data that was actually sampled. Likewise the interpolation from neighboring pixels suggests that the Red channels that were not sampled are probably 128, which gives us [128,128,0], or [128,128,0], or [128,128,128]. And after interpolating Blue from the surrounding Blue filtered sensels, we'd get [128,128,128], or [128,128,128], or [128,128,128] depending on the filter color.

As you can see, the interpolation guessed right regardless of the filter color (because a uniform patch is simple to interpolate) and regardless of the amount of light that was recorded through the filter and the colors that were absorbed by it. The missing channel data and the level were interpolated/reconstructed from the surrounding pixels.

So despite only really sampling 1/3rd of the light at each pixel position, reconstruction by interpolation gives the same RGB output brightness for both types of sensor.
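The uniform-patch walk-through above can be checked numerically. A toy sketch where a naive 3x3 neighbor average stands in for a real demosaicer (the RGGB phase is assumed):

```python
import numpy as np

# A uniform gray patch: every photosite records 128 through its filter.
h, w = 6, 6
mosaic = np.zeros((h, w, 3))
mosaic[0::2, 0::2, 0] = 128  # R sites
mosaic[0::2, 1::2, 1] = 128  # G sites (even rows)
mosaic[1::2, 0::2, 1] = 128  # G sites (odd rows)
mosaic[1::2, 1::2, 2] = 128  # B sites

# Estimate each missing channel from the sampled values in the 3x3
# neighborhood; on a uniform patch every sampled neighbor reads 128,
# so every estimate comes out 128 -- no amplification involved.
out = mosaic.copy()
for yy in range(h):
    for xx in range(w):
        for c in range(3):
            if mosaic[yy, xx, c] == 0:  # channel not sampled here
                win = mosaic[max(yy - 1, 0):yy + 2,
                             max(xx - 1, 0):xx + 2, c]
                out[yy, xx, c] = win[win > 0].mean()
```

Every output pixel ends up at [128,128,128], matching the Foveon-style reading, even though each photosite sampled only one channel.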

Cheers, Bart

P.S. When you zoom in on the earlier synthesized CFA images in the middle row, you'll see exactly what I described (single-channel colors, either R, G, or B), but there the images have more detail. The original brightness of the images in the first row has been reconstructed by interpolation.