Demosaicing and bit dimensions

OK, if we have a sensor that has, say, 16 megapixels, we expect that the final image produced also has 16 megapixels — one pixel per sensel, right?

While this seems common-sensical, are there practical mathematical reasons for a demosaicing algorithm to deliver another size, not mapping sensels to pixels 1:1? Not simply to resize, but because it delivers superior results in one way or another?

If your subject was entirely red or entirely blue it might make sense to simply discard the other three pixels in each group.

For example, if you were photographing a poster printed in black ink on red paper.

In a typical scene, most colours are not highly saturated. The assumption that if a green sensel is bright, a red sensel in the same place would also have been bright works quite well. This is the basis of demosaicing algorithms.

So we can generally treat each single-colour sensel as recording all colours, and a 16-megapixel camera as having 16 million full-colour pixels rather than 4 million. But the result will never be quite as sharp for fine colour detail as from a Foveon sensor with the same number of pixels.
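Here is a toy sketch of that idea, with made-up data and plain bilinear interpolation (not any specific camera's algorithm): each sensel records one colour, and the two missing colours at each position are estimated from the nearest same-colour neighbours, giving one full-colour output pixel per sensel.

```python
import numpy as np
from scipy.ndimage import convolve

# Toy scene and an RGGB Bayer mosaic of it: one colour per sensel.
rng = np.random.default_rng(0)
full = rng.random((8, 8, 3))              # pretend full-colour scene

r_mask = np.zeros((8, 8)); r_mask[0::2, 0::2] = 1
b_mask = np.zeros((8, 8)); b_mask[1::2, 1::2] = 1
g_mask = 1 - r_mask - b_mask
masks = [r_mask, g_mask, b_mask]
mosaic = sum(full[..., c] * masks[c] for c in range(3))

# Bilinear demosaic: fill each channel's gaps by averaging the
# nearest sensels of the same colour.
kernel = np.array([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]])
demosaiced = np.stack(
    [convolve(mosaic * m, kernel, mode='mirror') /
     convolve(m, kernel, mode='mirror') for m in masks], axis=-1)

# One output pixel per sensel: 8x8 sensels -> 8x8 full-colour pixels.
assert demosaiced.shape == full.shape
```

Note that at each red sensel the red channel passes through unchanged; only the two missing channels are interpolated.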

I've thought for about a decade now that increasing the pixel resolution of the render would be beneficial with under-sampled lenses. For example, any geometric distortion corrections or CA corrections need more resolution to place the weighted edges in the right locations. Doing CA correction at an under-sampling resolution requires either sporadic softening or increased aliasing to shift the color planes.

Of course, future sensors with high pixel densities would not benefit as much, or at all, nor would current pixel densities at very high f-stops or with pinholes. In general, though, most current digital photography under-samples lenses.
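A small numerical sketch of that argument, with hypothetical numbers and linear interpolation standing in for the shift filter: applying a sub-pixel CA shift on the native grid softens fine detail far more than applying the same physical shift on an (idealised) 3x render.

```python
import numpy as np

def linear_shift(y, d):
    """Shift a circular sampled signal by d pixels, linear interpolation."""
    i = int(np.floor(d))
    a = d - i
    return (1 - a) * np.roll(y, i) + a * np.roll(y, i + 1)

# Fine detail at 0.4 cycles/pixel on the native grid.
m = np.arange(300)
native = np.cos(2 * np.pi * 0.4 * m)

# The same scene rendered on an idealised 3x grid: 0.1333 cycles/pixel.
m3 = np.arange(900)
fine = np.cos(2 * np.pi * 0.4 * m3 / 3)

# Apply the same physical CA correction, 0.4 native pixels, on each grid,
# and measure how much of the detail contrast survives the interpolation.
kept_native = np.std(linear_shift(native, 0.4)) / np.std(native)
kept_fine = np.std(linear_shift(fine, 1.2)) / np.std(fine)
# kept_native ~ 0.36: heavy softening.  kept_fine ~ 0.95: mild.
```

The 3x grid is idealised here (we assumed a perfect fine-grid render of the same scene); the point is only that the interpolation loss of a fractional shift shrinks as the grid gets finer relative to the detail.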

I've thought for about a decade now that increasing the pixel resolution of the render would be beneficial with under-sampled lenses.

Interesting. When I was working on my first major photo project, back in 2008, about 40% of my photos were taken with low-resolution point-and-shoot cameras — the rest with a contemporary DSLR.

After lots of trial-and-error, I found out that I got the best results with the PnS cameras when I increased the resolution from the raw files from their native 5 and 6 MP to something larger — 11 MP if I remember, which also happens to be about the size needed for the images. As you mentioned, this makes it easier to correct for lens defects, and it made the images more docile to the extra sharpening that they needed. But I wonder if this was simply due to my lack of good upsizing algorithms — I only had Bicubic Smoother in Photoshop.

But this goes against the trend in smartphone cameras (and Nikon Dx cameras) of oversampling images and demosaicing them to pixel dimensions smaller than the sensor dimensions, or ‘binning’.

Existing cameras of all sensor sizes do not have enough pixels to oversample at typical f-stops. You need to use a pinhole on a DSLR, or a telephoto with stacked TCs to get f-stops that cause over-sampling. Most P&S and cellphone cameras do not allow their apertures to get small enough to cause over-sampling.

Over-sampling, to me, is when you have more than 6 pixels in a line-pair cycle, or it takes more than 4 pixels inclusive to render the sharpest possible optical transient at a useable contrast, and these figures double for independent color channels. Proper sampling is when the discrete capture is virtually analog. We're very far from that, with typical aperture sizes relative to our pixel sizes.

I do believe the best you can do with a Bayer grid is to rotate it 45 degrees, and then use the green detectors as basis for the output. Then a 10 MP sensor becomes a 5 MP sensor, but a superior one. If you want to get as much as possible out of that rotated sensor, you might also make a 20 MP image.

Oops, Fuji has already done this! And they got heavily attacked when they tried the 20 MP trick. So they were forced to output 10 MP from the 10 MP sensor, which, of course, is a very bad idea. So they stopped making those sensors, as they were commercially impossible to sell.

Ah, I defined oversampling as collecting more pixels at the sensor than the lens can sharply deliver.

Oh, I think you need some more to make it interesting. You really want to be able to apply some digital filtering to improve the quality of the image. Then you need at least 4× more than the lens can deliver, I assume. Or maybe 2× or 3× is enough?

I do believe the best you can do with a Bayer grid is to rotate it 45 degrees, and then use the green detectors as basis for the output. Then a 10 MP sensor becomes a 5 MP sensor, but a superior one. If you want to get as much as possible out of that rotated sensor, you might also make a 20 MP image.

Oops, Fuji has already done this! And they got heavily attacked when they tried the 20 MP trick. So they were forced to output 10 MP from the 10 MP sensor, which, of course, is a very bad idea. So they stopped making those sensors, as they were commercially impossible to sell.

With the Pentax Q I get working oversampling with good FF/FX lenses. That's a 1.5µm pixel structure. With the Nikon 1 V2, which has ~3µm pixel structures, some good lenses are still undersampled to such a degree that you get aliasing patterns at their best apertures. But that's probably right on the edge for any normal to very good FF lens after you've included losses for diffraction and the filter plate structures - the filter plates induce quite a lot of point spread, even when you don't have the birefringent AA filter structures included.

Working that 45º example out to a 36x24mm FF area, that's 96MP, to get a non-aliased 48MP image.

Comparing that to, say, a 48MP normal 0/90º Bayer image, I doubt you get ANY improvement at all - and I can bet quite a lot that the process of making the pixel structure √2 smaller involves quite a bit of loss in both angle sensitivity and overall sensitivity. This will balance out against the Bayer interpolation inaccuracies to give you a net sum of nothing - a 1:1 ratio of improvement, if you put it that way. So: pay more to get nothing.

Not really, since we're still firmly in undersampled territory with 16MP captures - both with APS and FF sensors. No matter what you do, the image is undersampled, meaning that you have large voids in the data map that's supposed to be a correct rendering of the object space.

And - as soon as we get pixels small enough to give correct sampling, any enlargement of the output image is pointless. The only way to improve a correctly sampled image is to downsample it.

But what you're suggesting has already been done... Though in a quite different scenario.

Using very simple lenses with known aberrations, aberrations that give effective point spreads several pixels wide, you can arrive at a very correct (and VERY sharp!) image that's about half the original size if you do the Bayer interpolation for each position on the sensor by solving backwards for the optical PSF result. This is one way to sharpen an oversampled original data map, or if you put that in another way; get the optimal, correctly sampled resolution from a system.

Unfortunately, this of course only works with "known" lenses with PSF functions that are neither too big nor too small over the entire image surface - which limits the flexibility of the system.
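A minimal 1-D sketch of "solving backwards" for a known PSF — here a plain Wiener-style frequency-domain inversion with made-up numbers, not whatever any particular system actually implements:

```python
import numpy as np

# Scene band-limited well inside the PSF's passband.
n = np.arange(256)
scene = (np.sin(2 * np.pi * 5 * n / 256) +
         0.5 * np.sin(2 * np.pi * 15 * n / 256))

# Known Gaussian PSF, a few pixels wide, laid out as a circular kernel.
psf = np.exp(-0.5 * (np.arange(-8, 9) / 2.0) ** 2)
psf /= psf.sum()
P = np.fft.rfft(np.roll(np.pad(psf, (0, 256 - psf.size)), -8))

blurred = np.fft.irfft(np.fft.rfft(scene) * P, 256)

# Invert the known PSF; the small regulariser keeps the near-zero
# high frequencies from exploding.
eps = 1e-3
recovered = np.fft.irfft(np.fft.rfft(blurred) * np.conj(P) /
                         (np.abs(P) ** 2 + eps), 256)

err_blur = np.abs(blurred - scene).max()
err_rec = np.abs(recovered - scene).max()
# err_rec is more than an order of magnitude smaller than err_blur
```

With real lenses the PSF varies across the frame and the data is noisy, which is exactly why this only works well with "known", well-behaved PSFs.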

Ah, I defined oversampling as collecting more pixels at the sensor than the lens can sharply deliver.

Oh, I think you need some more to make it interesting. You really want to be able to apply some digital filtering to improve the quality of the image. Then you need at least 4× more than the lens can deliver, I assume. Or maybe 2× or 3× is enough?

Several years back I did some simulations with blurred B&W edges, box filtering them at various sizes, and upsampling them back to original size and also pixelating them, and it really wasn't until it took 4 pixels inclusive (or 3 exclusive) that the luck of alignment had no significant distortive effect on the shape of the edges. That's my standard: virtual analog. Anything less is under-sampling, IMO. I don't believe the common "wisdom" of Nyquist sampling and reconstruction. It doesn't work. It doesn't even work perfectly well for audio; we only get away with it there because audio is experienced after the fact, we cannot do the auditory equivalent of staring at and studying the waveform, and real-world sounds rarely have perfectly stable pitches - they have some frequency modulation, so artifacts are very short in occurrence. For imaging, even the slightest distortion is visible, especially with video, where it shimmers with slight camera/subject registration changes.

I know that with current computers and storage media, virtually-analog imaging is somewhat impractical (unless one is using very small apertures, focus is not achieved, or there is circular camera blur), but that is where ideal imaging lies.
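A miniature of that edge experiment can be sketched as follows (hypothetical numbers; a tanh ramp stands in for the blurred B&W edge, and the rendered edge steepness is used as the shape measure):

```python
import numpy as np

# A blurred B&W edge, sampled very finely (~1000 samples per unit).
x = np.linspace(-4, 4, 8000)
edge = 0.5 * (1 + np.tanh(x))

def pixelate(signal, px, phase):
    """Box-filter into px-sample pixels, offset by a sub-pixel phase."""
    s = signal[int(phase * px):]
    usable = (len(s) // px) * px
    return s[:usable].reshape(-1, px).mean(axis=1)

# Rendered edge steepness: the biggest adjacent-pixel jump.
steepest = lambda bins: np.diff(bins).max()

# Coarse pixels (~2 across the edge): the rendered shape depends
# heavily on where the pixel grid happens to fall.
var_coarse = abs(steepest(pixelate(edge, 2000, 0.0)) -
                 steepest(pixelate(edge, 2000, 0.5)))

# 4x finer pixels: alignment hardly matters any more.
var_fine = abs(steepest(pixelate(edge, 500, 0.0)) -
               steepest(pixelate(edge, 500, 0.5)))
```

With the coarse pixels the steepness swings substantially between the two alignments; with the finer pixels the two renderings are nearly identical, which is the "luck of alignment" effect described above.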

I do believe the best you can do with a Bayer grid is to rotate it 45 degrees, and then use the green detectors as basis for the output. Then a 10 MP sensor becomes a 5 MP sensor, but a superior one.

It's going to be more aliased, of course, without the contribution of the intervening sensels.

If you want to get as much as possible out of that rotated sensor, you might also make a 20 MP image.

Oops, Fuji has already done this! And they got heavily attacked when they tried the 20 MP trick. So they were forced to output 10 MP from the 10 MP sensor, which, of course, is a very bad idea. So they stopped making those sensors, as they were commercially impossible to sell.

Well, doubling the pixel count in the output is absolutely necessary if you don't want a lot of artifacts, with a diagonal-grid sensor. The idea behind it, IIRC, was that it resolved horizontal and vertical lines better than the traditional orientation, and these edges are more prevalent in most photos.

There you have my total agreement. This is also one of my own favourite subjects in digital imaging: The discrepancy between capture resolution you need to get a correctly sampled image, and the amount of "unnecessary" data this entails. Data redundancy, if you put it that way.

But you're wrong at one point - Bayer saves only one data-point per image point - so it carries a 3:1 compression ratio just by being what it is. A fully populated file has three data points per image point, making it three times "heavier" per image point than the Bayer-coded image.

I want a camera with a ~50-60MP+ capture. For pure image quality reasons. But what I DON'T want is a 60MP = 100MB+ (?) raw file...

In the end, I'd be very satisfied with a ~15-20MP raw format scale of image data, data that has not yet been color-transformed or treated to any other kind of non-linear transform - and I want it in a good tone resolution format. And I want each and every single data point in the raw file I save to have an objective meaning - any surplus weight is dead weight, just megabytes of crap to suffer through in file handling.

We had this discussion before: 20MP of a linear-format three-layer data plane, saved in a gamma 2.0 data representation at 10-bit depth, gives ~4 bytes per pixel. Without compression, that's 80MB. With any reasonably close-to-lossless compression, it's a 30MB raw file. Where every bit carries real image information.
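For the record, the arithmetic behind those figures (the ~4 bytes per pixel presumably rounds the 30 bits up to a 32-bit word):

```python
pixels = 20_000_000                           # ~20 MP
bits_per_pixel = 3 * 10                       # three planes, 10-bit depth
exact_mb = pixels * bits_per_pixel / 8 / 1e6  # 75.0 MB, bit-packed
padded_mb = pixels * 4 / 1e6                  # 80.0 MB at 4 bytes/pixel
```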

Since the data is already in three-plane form, it would also speed up the raw conversion by quite a lot - it's the Bayer-interpolation that takes up most of the overhead when reviewing and developing the raw files we have today.

....................

So: oversampling image capture, downsampling before image save. IMO that's the optimal way to go, until we finally get a full-sampled sensor (three colors per image point) working well. And I'm sorry - the Foveon isn't even close. It has huge color problems, making it useless for any kind of application that demands accurate color reproduction. And it has extremely low light efficiency, making it fairly useless for any low-light scenario. It's a one-trick pony, only able to give good resolution in good light, as long as you don't care about color accuracy. Only a very small part of the photography going on today is covered by that performance envelope.

Using very simple lenses with known aberrations, aberrations that give effective point spreads several pixels wide, you can arrive at a very correct (and VERY sharp!) image that's about half the original size if you do the Bayer interpolation for each position on the sensor by solving backwards for the optical PSF result. This is one way to sharpen an oversampled original data map, or if you put that in another way; get the optimal, correctly sampled resolution from a system.

Unfortunately, this of course only works with "known" lenses with PSF functions that are neither too big nor too small over the entire image surface - which limits the flexibility of the system.

This seems to be the problem with Photoshop’s new Shake Reduction feature. It hasn’t worked with any image I’ve thrown at it yet. Even where I have really obvious camera motion, evidenced by multiple point sources of light, the algorithm never comes close to nailing it. At best it seems to work as a sharpening function.

Ah, I defined oversampling as collecting more pixels at the sensor than the lens can sharply deliver.

Oh, I think you need some more to make it interesting. You really want to be able to apply some digital filtering to improve the quality of the image. Then you need at least 4× more than the lens can deliver, I assume. Or maybe 2× or 3× is enough?

Several years back I did some simulations with blurred B&W edges, box filtering them

A box filter is a horrible choice. If you choose a decent filter like Lanczos then you don't need much oversampling.
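For reference, a minimal Lanczos-3 resampler (a sinc windowed by a wider sinc), interpolating a well-sampled signal at a half-pixel offset; the numbers are made up for illustration:

```python
import numpy as np

def lanczos(x, a=3):
    """Lanczos kernel: sinc windowed by a wider sinc, zero outside |x| < a."""
    x = np.asarray(x, float)
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)

# A signal sampled at 10 samples per period, interpolated between samples.
m = np.arange(64)
y = np.cos(2 * np.pi * 0.1 * m)

t = 31.5                                   # halfway between two samples
taps = np.arange(int(t) - 2, int(t) + 4)   # the 6 nearest samples
estimate = np.sum(y[taps] * lanczos(t - taps))
# estimate lands within ~2e-3 of the true value cos(2*pi*0.1*t)
```

A box filter at the same offset simply averages neighbours regardless of frequency content; the windowed-sinc taps approximate the ideal band-limited interpolator, which is why it needs far less oversampling.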

at various sizes, and upsampling them back to original size and also pixelating them, and it really wasn't until it took 4 pixels inclusive (or 3 exclusive) that the luck of alignment had no significant distortive effect on the shape of the edges. That's my standard: virtual analog. Anything less is under-sampling, IMO.

I don't believe the common "wisdom" of Nyquist sampling and reconstruction. It doesn't work.

As I have shown you before, it most certainly does work. Remember this thread?

It doesn't even work perfectly well for audio; we only get away with it there because audio is experienced after the fact, we cannot do the auditory equivalent of staring at and studying the waveform, and real-world sounds rarely have perfectly stable pitches - they have some frequency modulation, so artifacts are very short in occurrence. For imaging, even the slightest distortion is visible, especially with video, where it shimmers with slight camera/subject registration changes.

Not if you do the processing properly.

You do need to be a bit above Nyquist for reconstruction, e.g. 2.5 samples per period. There is also a consideration of the effect on noise of a pixel's area integration versus extreme oversampling followed by optimal filtering, but the difference becomes negligible above 4 samples per period (2× oversampled versus Nyquist). So the extreme oversampling that you recommend is just not warranted.
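That claim is easy to check numerically: sample a sine at just 2.5 samples per period and rebuild it by Whittaker-Shannon (sinc) interpolation. The sample run is finite, so we evaluate mid-window where truncation effects are small; the numbers are otherwise arbitrary.

```python
import numpy as np

rate = 2.5                              # samples per period
n = np.arange(400)
samples = np.sin(2 * np.pi * n / rate)

# Whittaker-Shannon reconstruction on a dense grid, away from the
# edges of the finite sample run.
t = np.linspace(150, 250, 1001)
recon = np.array([np.sum(samples * np.sinc(ti - n)) for ti in t])
truth = np.sin(2 * np.pi * t / rate)
max_err = np.abs(recon - truth).max()
# max_err stays small despite only 2.5 samples per period
```

The reconstruction recovers the waveform between the samples, which is the Nyquist argument in action; the practical caveats are the band-limiting assumption and the finite interpolation window.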