I tried out the Yaroslavsky/Happonen sinc interpolation routines on a few images. It turns out that his sliding window method performs as advertised and is very effective at avoiding ripple artifacts. Unfortunately there is a trade-off in that the sliding window method rolls off the higher frequencies enough that, in the end, it was no better in my tests than conventional methods like bicubic. (The sliding window has other attractive features such as optional noise reduction, but I did not test that.)

Thanks for your experiments. They are really instructive. Do you know if the sliding window method as implemented applies a window (Kaiser, etc.) to the blocks/segments of data? It is not surprising that windowing would give a better visual output. Some of the drawbacks of using the full image data for interpolation are well known. It is one of the reasons that interpolation schemes are limited to, say, cubic (x^3); you won't normally see x^100, especially if noise is a concern. Polynomial interpolation on short segments is more useful than on the whole data set.

DCT can certainly help as compared to DFT since its energy compaction is greater. That is why you see DCT in compression stuff such as MPEG, etc., but not DFT.
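A quick way to see the compaction difference is to count how many transform coefficients are needed to capture, say, 99% of the energy of a smooth signal. This is just a toy numpy/scipy sketch; the ramp signal and the 99% threshold are arbitrary choices of mine:

```python
import numpy as np
from scipy.fft import dct, fft

def coeffs_for_energy(mags, frac=0.99):
    """Number of largest-magnitude coefficients needed to capture `frac` of total energy."""
    e = np.sort(mags ** 2)[::-1]
    return int(np.searchsorted(np.cumsum(e), frac * e.sum()) + 1)

# A smooth ramp, typical of image rows. The DFT's implicit periodic extension
# introduces a jump at the boundary that spreads energy across many harmonics;
# the DCT's implicit even extension has no such jump.
x = np.linspace(0.0, 1.0, 64)

n_dct = coeffs_for_energy(np.abs(dct(x, norm='ortho')))
n_dft = coeffs_for_energy(np.abs(fft(x)) / np.sqrt(len(x)))

print(n_dct, n_dft)  # the DCT needs far fewer coefficients
```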

As far as ringing is concerned, it is a product of both the reconstruction filter kernel and also the method used to measure it. I mentioned before that l_1 space is better for ringing suppression, but it appears people are more interested in l_2.

Thanks for your experiments. They are really instructive. Do you know if the sliding window method as implemented applies a window (Kaiser, etc.) to the blocks/segments of data? It is not surprising that windowing would give a better visual output. Some of the drawbacks of using the full image data for interpolation are well known. It is one of the reasons that interpolation schemes are limited to, say, cubic (x^3); you won't normally see x^100, especially if noise is a concern. Polynomial interpolation on short segments is more useful than on the whole data set.

He shows a windowed sinc but I haven't found any reference to the type of window. See the figure on page 48 here. The answer might be in Happonen's code.

Quote

As far as ringing is concerned, it is a product of both the reconstruction filter kernel and also the method used to measure it. I mentioned before that l_1 space is better for ringing suppression, but it appears people are more interested in l_2.

In reconstruction by approximation, both the filter kernel shape and how the error between the original and the reconstruction is measured are important. If you keep the error measure fixed, the reconstruction kernel determines the amount of ringing; conversely, the error measure can also be varied to reduce ringing. l_2 is least-squares approximation; it is popular because it is straightforward and differentiable. l_1 is absolute error. The reconstruction error is ||f - sum_i(c_i*phi_i)||_p, where f is the original function, the c_i are coefficients we need to determine, the phi_i are the reconstruction filter kernels, p >= 1 is the l_p space parameter, ||.|| is the norm, and sum_i(c_i*phi_i) is the reconstructed signal. If the c_i are determined by measuring the error in l_1, it is observed that ringing at a signal discontinuity is somewhat reduced; in the classical treatment, to about 1/3 of the amount in the l_2 case.
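A tiny toy example of the different character of the two norms at a discontinuity: fit a single constant to samples that straddle a jump. The l_2 fit (the mean) smears the jump into the result; the l_1 fit (the median) sticks with the majority side. This is only a one-coefficient caricature of the reconstruction problem above, not a full l_p fit:

```python
import numpy as np

# Samples straddling a step discontinuity: mostly 0, one sample at 1.
samples = np.array([0.0, 0.0, 0.0, 1.0])

# Best constant fit under each norm:
c_l2 = samples.mean()       # minimizes sum (x - c)^2 -> 0.25, smears the jump
c_l1 = np.median(samples)   # minimizes sum |x - c|   -> 0.0, ignores the jump

print(c_l2, c_l1)
```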

There are workarounds in l_2: things such as windowing the sinc function (-> Lanczos filter), the number of negative lobes of the reconstruction kernel, etc.
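For reference, the Lanczos kernel mentioned above is just the sinc windowed by the central lobe of a wider sinc; a minimal sketch:

```python
import numpy as np

def lanczos(x, a=3):
    """Lanczos-a kernel: sinc(x) windowed by the central lobe of sinc(x/a).
    np.sinc is the normalized sinc, sin(pi*x)/(pi*x)."""
    x = np.asarray(x, dtype=float)
    out = np.sinc(x) * np.sinc(x / a)
    return np.where(np.abs(x) < a, out, 0.0)

# Interpolation property: 1 at x = 0, 0 at every other integer, 0 outside [-a, a].
print(lanczos(0.0), lanczos(1.0), lanczos(3.0))
```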

I love this discussion from a theoretical standpoint, but I think it misses the mark a bit from a practical perspective.

Using sinc() interpolation supports nifty features such as "I can rotate the image umpteen times and get back to the original image"*. If this is important to you, then yes: modeling the image in terms of a set of basis functions that can be fully reconstructed from its samples is a great idea.

In practice, since you also want to be spatially invariant, the simplest model is to use sinusoids** as your basis functions, which after doing a ton of math leads to sinc() interpolation. That is, you model the image as having a limited bandwidth.

However... are you sure this is actually what you want?

You give up a lot of things if you restrict your images to have a limited bandwidth:

* Pixel-level sharpness (you cannot have a pixel-sharp edge)
* Halos are inherent (you cannot avoid an overshoot; this is a fundamental result)
* Images will, in the general case, contain negative values (otherwise you get a bias)
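The "halos are inherent" point is easy to verify numerically: brick-wall low-pass filtering a step must produce overshoot (the Gibbs phenomenon). A small numpy sketch; the signal length and cut-off are arbitrary:

```python
import numpy as np

# A 0/1 step, then an ideal ("brick-wall") low-pass: keep only the lowest
# frequency bins and zero the rest. The band-limited result must overshoot.
n = 512
step = np.zeros(n)
step[n // 2:] = 1.0

spec = np.fft.fft(step)
keep = 32                         # number of positive-frequency bins to keep
spec[keep + 1 : n - keep] = 0.0   # symmetric, so the result stays real
band_limited = np.fft.ifft(spec).real

overshoot = band_limited.max() - 1.0
print(f"overshoot: {overshoot:.3f}")  # close to the classical ~9% Gibbs figure
```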

Perhaps we will eventually accept that we want soft images at 100% because it gives us more latitude and we can apply the same sampling theory we use so successfully everywhere else in engineering. While a CD (sampled at 44100 Hz) can theoretically represent signals up to 22050 Hz, that is not what we do. We use a nice cut-off filter so the upper frequencies are effectively zero. That is, we actually do not want steep sample-to-sample variations.

However, for now what I see around me is that people compare 100% views and want to see really crisp images. A lit window on a distant office building at dusk should show up as a nicely rendered bright pixel. If this is what we want we have to work around the limitations of the sinc() interpolation.

Perhaps we simply need to figure out what we really want. Given that everyone and their dog have their own private ways to resize images (in particular uprezzing!) I'm not convinced that we really know what goals we are trying to attain.

Regards,

Esben Høgh-Rasmussen Myosotis

*: Up to rounding errors and after the resolution has been limited to the resolution of the horizontal/vertical direction.

**: This is a theoretical result: the sinusoids are the only real-valued eigenfunctions of a linear, spatially invariant operation. Those fortunate enough not to know what the heck this geek is talking about can ignore this. However, this is the dirty mathematical nitty-gritty reason that sinusoids are so special that most of our theory is built on top of them.

Perhaps we will eventually accept that we want soft images at 100% because it gives us more latitude and we can apply the same sampling theory we use so successfully everywhere else in engineering. While a CD (sampled at 44100 Hz) can theoretically represent signals up to 22050 Hz, that is not what we do. We use a nice cut-off filter so the upper frequencies are effectively zero. That is, we actually do not want steep sample-to-sample variations.

However, for now what I see around me is that people compare 100% views and want to see really crisp images. A lit window on a distant office building at dusk should show up as a nicely rendered bright pixel. If this is what we want we have to work around the limitations of the sinc() interpolation.

People don't want soft images at 100%, nor do they want images with jagged edges at 100%. How about sharp, jaggie-free images at 50%?

I'm not advocating the sinc, with its drawbacks - there are many other kinds of interpolation that can do the job. The point is that the representation of pixels as jagged little squares is not the only option.

Quote

Perhaps we simply need to figure out what we really want. Given that everyone and their dog have their own private ways to resize images (in particular uprezzing!) I'm not convinced that we really know what goals we are trying to attain.

Forgive me while I continue to beat this horse. Here is something practical to try.

Start with an image that you want to sharpen.

1. Up-res by 2x using bicubic (or bicubic smoother).
2. Sharpen the image with 2x the radius you would normally use.
3. Down-res by 2x using bicubic (or bicubic sharper).

If you are sharpening for print, you can preferably skip step 3 and just print at 2x the pixels/inch (for example, 720ppi instead of 360ppi).
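For anyone who wants to experiment outside Photoshop, here is a rough scipy sketch of the three steps, with cubic-spline zoom standing in for bicubic and a Gaussian unsharp mask standing in for USM. The radius and amount values are illustrative only, not a recommendation:

```python
import numpy as np
from scipy import ndimage

def unsharp_mask(img, radius, amount):
    """Simple Gaussian unsharp mask (a stand-in for Photoshop's USM)."""
    blurred = ndimage.gaussian_filter(img, sigma=radius)
    return np.clip(img + amount * (img - blurred), 0.0, 1.0)

def upres_sharpen_downres(img, radius=0.8, amount=1.5):
    # 1. Up-res by 2x (cubic spline interpolation as a stand-in for bicubic).
    big = ndimage.zoom(img, 2, order=3)
    # 2. Sharpen with 2x the radius you would normally use.
    big = unsharp_mask(big, radius=2 * radius, amount=amount)
    # 3. Down-res by 2x (skip this step for print and send 2x the ppi instead).
    return np.clip(ndimage.zoom(big, 0.5, order=3), 0.0, 1.0)

# Toy image: a soft diagonal edge.
y, x = np.mgrid[0:64, 0:64]
img = ndimage.gaussian_filter((x > y).astype(float), 1.0)

out = upres_sharpen_downres(img)
print(img.shape, out.shape)
```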

By interpolating ("reconstructing") before sharpening, we can minimize the sharpening of spectral replicas that cause jaggies. The real image data gets sharpened, but not the jaggies. Potentially a higher amount of sharpening can be achieved before the image starts to fall apart.

Here is a 100% crop from the Atkinson "Lab Test Page.tif".

[attachment=19798:upres_sharpen.jpg]

The left side was sharpened with USM amount 500%, radius 0.8, threshold 0. The right side was interpolated 2x, sharpened with USM amount 500%, radius 1.6, threshold 0, then down-sampled by 1/2.

The difference is clearly visible in prints: jaggies are reduced, and there is less accentuation of noise. Smoother and cleaner looking, while just as sharp.

Undoubtedly, there are other image processing operations that will benefit from interpolation in this way.

I'm confused about your use of the term "spectral replicas". The upsampled image is generated (I would think) with the property that the spectral content is band-limited, with no information beyond the Nyquist frequency of the original image. So what is one gaining by upsampling? And furthermore, given that the upsampling, the USM, and the downsampling are all linear filters, why isn't the combined sharpening equivalent to a single linear filter with a slightly different kernel than USM?

Forgive me while I continue to beat this horse. Here is something practical to try.

There's a difference in trying to beat some sense into people, and beating a dead horse. This subject should not be seen as a dead horse. Which category the readers fit in is up to themselves. I welcome a good discussion, and thank you for your contributions.

Quote

By interpolating ("reconstructing") before sharpening, we can minimize the sharpening of spectral replicas that cause jaggies. The real image data gets sharpened, but not the jaggies. Potentially a higher amount of sharpening can be achieved before the image starts to fall apart.

Indeed, a good point to stress, and it is another practical reason why one should interpolate to the native resolution of a printer with a good algorithm, instead of an unknown (most likely sub-optimal bilinear) filter in firmware, and then sharpen. BTW it's one of the reasons why people like the Qimage output better than the same print directly from Photoshop.

I have made very sharp (virtually artifact free) enlarged output by deconvolution sharpening of interpolated image data. The results are much better than sharpening at the native capture size and then letting the printer do the interpolation, as is advocated by some 'authorities'.

I'm confused about your use of the term "spectral replicas". The upsampled image is generated (I would think) with the property that the spectral content is band-limited, with no information beyond the Nyquist frequency of the original image. So what is one gaining by upsampling? And furthermore, given that the upsampling, the USM, and the downsampling are all linear filters, why isn't the combined sharpening equivalent to a single linear filter with a slightly different kernel than USM?

The following illustration, from Pratt's "Digital Image Processing", shows the 1-dimensional case:

[attachment=19803:Pratt_Di...ng_pg118.png]

The original image is (a), with the base-band spectrum around w = 0 and the spectral replicas at +/- ws, +/- 2ws, etc.

By interpolating, we're trying to make the spectrum more like (c), the sinc-filtered case, but with bicubic the spectrum will be somewhere between (c) and (e). What I'm saying is that when sharpening we should sharpen (c), the base-band image, rather than (a), which has all the spectral replicas that appear as jaggies.
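The replica story is easy to see in 1-D with numpy: upsampling by zero insertion duplicates the spectrum exactly, and the interpolation filter's job is to suppress the copy. A crude filter (linear interpolation) only attenuates it, and that residue is where the jaggies live:

```python
import numpy as np

# A band-limited 1-D "image row".
n = 64
t = np.arange(n)
x = np.sin(2 * np.pi * 3 * t / n) + 0.5 * np.sin(2 * np.pi * 5 * t / n)

# Upsample 2x by inserting zeros between samples.
up = np.zeros(2 * n)
up[::2] = x

# Zero insertion duplicates the spectrum exactly: bins n..2n-1 are a copy of
# bins 0..n-1 -- the spectral replica around the old sampling frequency.
spec = np.abs(np.fft.fft(up))
print(np.allclose(spec[:n], spec[n:]))

# A crude interpolation filter (linear interpolation = triangle kernel)
# attenuates the replica band but does not remove it.
tri = np.array([0.5, 1.0, 0.5])
spec_i = np.abs(np.fft.fft(np.convolve(up, tri, mode='same')))
print(spec_i[n // 2 : n].sum() < spec[n // 2 : n].sum())
```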

You're right that equivalent sharpening could be done with a linear filter, but to do it in Photoshop you would have to have some way to design it and have it fit within the limits of the 5x5 Custom Filter.

Indeed, a good point to stress, and it is another practical reason why one should interpolate to the native resolution of a printer with a good algorithm, instead of an unknown (most likely sub-optimal bilinear) filter in firmware, and then sharpen. BTW it's one of the reasons why people like the Qimage output better than the same print directly from Photoshop.

I have made very sharp (virtually artifact free) enlarged output by deconvolution sharpening of interpolated image data. The results are much better than sharpening at the native capture size and then letting the printer do the interpolation, as is advocated by some 'authorities'.

Thank you for your comments, Bart.

Yes, that is the best way to sharpen for enlarged print output. Less obvious I think, is that interpolation can improve the results of sharpening even in the case where the image is not being enlarged.

Once I get PixelClarity's deconvolution capability fully functional (I'm currently finishing the PSF generation code, the database code needed to store and retrieve the PSF data, and the code needed to merge five dimensions' worth of data down to the three dimensions needed for a given image), I'm going to experiment with upsizing before deconvolution. I also have a few interesting theoretical thoughts regarding using a point spread function for interpolation vs. sinc:

Imagine you have a row of 100 point light sources, a lens, and a sensor with 100 photosites in a row, arranged so that if the lens had no aberrations, the light from each point source would be focused exactly on one and only one photosite on the sensor. In such an ideal arrangement, altering the intensity of any given point source would affect the output of one and only one photosite, completely independent of all the others. With a real-world lens and diffraction, this ideal is not achieved; altering the intensity of one point source will affect the outputs of several photosites.

The issue I see with sinc, bicubic, and other interpolation algorithms is this: increasing a single value in a data series can cause neighboring interpolated values to decrease. But this is contrary to what happens in real life; increasing the intensity of a single point light source will increase the outputs of its associated photosite and some of its neighbors, but can never cause any photosite output value to decrease. Conversely, decreasing the intensity of a single point source will cause the outputs of its associated photosite and some of its neighbors to decrease, but can never cause any photosite output value to increase. Therefore, sinc and other common interpolation algorithms work in a way that at times is opposite of the behavior of the real world (because increasing one output value can cause some interpolated values to decrease, and vice versa), and alternative methods should be investigated.

Given these principles, the thought that occurred to me is this: if one could devise a way to interpolate using a curve derived from the appropriate point spread function instead of the sinc function, one could simultaneously perform reconstruction/upsizing AND correct for lens blur, while reducing or completely eliminating clipping/ringing artifacts. Thoughts?
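As a reference point for the "no decrease" behavior (this is not the PSF idea itself, just an existing comparison): shape-preserving interpolants already exist, e.g. monotone piecewise cubics (PCHIP), which never overshoot the local data range, at the cost of smoothness and resolution. A quick scipy comparison on the spike example:

```python
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator

# A row of constant samples with a single bright one, as in the thought experiment.
x = np.arange(9)
y = np.full(9, 0.2)
y[4] = 0.9

fine = np.linspace(0, 8, 801)
spline = CubicSpline(x, y, bc_type='natural')(fine)  # rings around the spike
pchip = PchipInterpolator(x, y)(fine)                # monotone between knots

print(spline.min())  # undershoots below 0.2
print(pchip.min())   # stays within [0.2, 0.9] -- shape-preserving
```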

Imagine you have a row of 100 point light sources, a lens, and a sensor with 100 photosites in a row, arranged so that if the lens had no aberrations, the light from each point source would be focused exactly on one and only one photosite on the sensor. In such an ideal arrangement, altering the intensity of any given point source would affect the output of one and only one photosite, completely independent of all the others. With a real-world lens and diffraction, this ideal is not achieved; altering the intensity of one point source will affect the outputs of several photosites.

The issue I see with sinc, bicubic, and other interpolation algorithms is this: increasing a single value in a data series can cause neighboring interpolated values to decrease. But this is contrary to what happens in real life; increasing the intensity of a single point light source will increase the outputs of its associated photosite and some of its neighbors, but can never cause any photosite output value to decrease. Conversely, decreasing the intensity of a single point source will cause the outputs of its associated photosite and some of its neighbors to decrease, but can never cause any photosite output value to increase. Therefore, sinc and other common interpolation algorithms work in a way that at times is opposite of the behavior of the real world (because increasing one output value can cause some interpolated values to decrease, and vice versa), and alternative methods should be investigated.

Jonathan,

I don't follow the part about how increasing a photosite value causes a decrease in neighboring photosites. If the input to your hypothetical system is band-limited (as required for sampling), interpolation should work as expected. A point source won't light up a single pixel in a band-limited system. After passing through an optical system, a focused point source will be an Airy disk (jinc^2), spread around among local pixels.

I don't follow the part about how increasing a photosite value causes a decrease in neighboring photosites. If the input to your hypothetical system is band-limited (as required for sampling), interpolation should work as expected. A point source won't light up a single pixel in a band-limited system. After passing through an optical system, a focused point source will be an Airy disk (jinc^2), spread around among local pixels.

Increasing a value doesn't decrease neighboring sampled values, but depending on the interpolation algorithm, it can cause interpolated values to decrease. The general point I'm making is that increasing the intensity of one of the point light sources cannot result in the decrease of any sampled value, for obvious reasons. It shouldn't be allowed to decrease any interpolated values between samples, either.

If you have a natural cubic spline through a row of data points of 0.2 with one data point of 0.5 in the middle, the interpolated spline value will drop to about 0.16 at ±1.5 samples from the 0.5 data point. If you increase the peak data value to 0.9, the interpolated spline values will drop to about 0.11 at ±1.5 samples from the 0.9 data point. If the interpolation algorithm accurately reflected the realities of optics, the interpolated values ±1.5 samples from the peak should be ≥0.2 in both cases, with the interpolated value near the 0.9 data point being greater than the interpolated value near the 0.5 data point.
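For anyone who wants to reproduce these numbers, a short scipy sketch; the minima I get are in the same ballpark as the ~0.16 and ~0.11 values quoted above:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def spline_min(peak):
    """Minimum of a natural cubic spline through a row of 0.2 samples with one peak."""
    x = np.arange(9)
    y = np.full(9, 0.2)
    y[4] = peak
    fine = np.linspace(0, 8, 8001)
    return CubicSpline(x, y, bc_type='natural')(fine).min()

print(round(spline_min(0.5), 3))  # undershoot near the 0.5 peak, roughly 0.16
print(round(spline_min(0.9), 3))  # raising the peak to 0.9 deepens the dip
```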

Increasing a value doesn't decrease neighboring sampled values, but depending on the interpolation algorithm, it can cause interpolated values to decrease. The general point I'm making is that increasing the intensity of one of the point light sources cannot result in the decrease of any sampled value, for obvious reasons. It shouldn't be allowed to decrease any interpolated values between samples, either.

If you have a natural cubic spline through a row of data points of 0.2 with one data point of 0.5 in the middle, the interpolated spline value will drop to about 0.16 at ±1.5 samples from the 0.5 data point. If you increase the peak data value to 0.9, the interpolated spline values will drop to about 0.11 at ±1.5 samples from the 0.9 data point. If the interpolation algorithm accurately reflected the realities of optics, the interpolated values ±1.5 samples from the peak should be ≥0.2 in both cases, with the interpolated value near the 0.9 data point being greater than the interpolated value near the 0.5 data point.

Ok, I think what's happening is that changing the middle value while keeping the other samples constant does not model an optical system without aliasing. If you were to convolve your samples with a realistic psf/otf before interpolating, it should work in a more realistic way. The blur of the optical system will cause a point source to affect more than a single pixel, so if the intensity of a point source changes by factor x, all the pixels within the influence of the psf should change by factor x.

The mtf of an Airy disk is cone shaped in 2D. To be un-aliased, the width of the cone should not exceed half the sampling frequency. This corresponds to an Airy disk a few pixels wide.
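A quick 1-D check of this: pre-blurring the spike with a plausible PSF before interpolating greatly reduces the undershoot. I use a Gaussian about a pixel wide as a stand-in for an Airy pattern a few pixels wide:

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.ndimage import gaussian_filter1d

x = np.arange(9)
y = np.full(9, 0.2)
y[4] = 0.9                       # a "point source" hitting one photosite

fine = np.linspace(0, 8, 4001)
raw = CubicSpline(x, y, bc_type='natural')(fine)

# Pre-blur the samples with the PSF before interpolating, so the sampled data
# itself is (approximately) un-aliased.
y_blur = gaussian_filter1d(y, sigma=1.0, mode='nearest')
blurred = CubicSpline(x, y_blur, bc_type='natural')(fine)

print(raw.min(), blurred.min())  # the pre-blurred spline undershoots far less
```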

If you were to convolve your samples with a realistic psf/otf before interpolating, it should work in a more realistic way.

That's where I'm going with this line of thought, though I don't have a clear idea of exactly how to do this. I'd like to combine the deconvolution-with-PSF and upsampling if possible, with the goals of preserving the greatest possible amount of true image detail, avoiding jaggies and reconstruction artifacts, and minimizing interpolation errors and distortions like clipping and ringing. Has there been any research published along those lines, or is this a new idea?

That's where I'm going with this line of thought, though I don't have a clear idea of exactly how to do this. I'd like to combine the deconvolution-with-PSF and upsampling if possible, with the goals of preserving the greatest possible amount of true image detail, avoiding jaggies and reconstruction artifacts, and minimizing interpolation errors and distortions like clipping and ringing. Has there been any research published along those lines, or is this a new idea?

I'd be surprised if it's a new idea. There is a vast amount of scientific literature on image reconstruction. For example a quick Google found this. I hope you have a university library nearby so you don't have to pay to download a lot of papers! Or if you're lucky you belong to an alumni association that provides free online access to journals.

You might want to take another look at Yaroslavsky's sliding window which, along with interpolation, also allows modification of the DCT coefficients in each window for filtering and de-noising.

That's where I'm going with this line of thought, though I don't have a clear idea of exactly how to do this. I'd like to combine the deconvolution-with-PSF and upsampling if possible, with the goals of preserving the greatest possible amount of true image detail, avoiding jaggies and reconstruction artifacts, and minimizing interpolation errors and distortions like clipping and ringing. Has there been any research published along those lines, or is this a new idea?

[ One of the best features of Matlab is that they often include references in the documentation, so check out the bottom. ]

To do this you give up a smooth 2nd derivative; in that case you would have an overshoot. That should be fine, I would think. You also give up the ability to put a peak between pixels, which is more problematic. I think it should be possible to modify the method so it only suppresses undershoots but leaves the spline behavior at a peak. This should fit well into your spline-based interpolation, as far as I understand your algorithm.

However, over- and undershooting in the interpolated samples is not a bad thing per se. If you assume, crudely, that the value of a pixel represents the integral over its area (i.e. the number of photons per second over a very long exposure), and that the light is a continuous function, then it follows that the continuous signal will have values above and below that pixel value (well, it could be constant). Now, if the neighbor has an increased value, then a continuous interpolating function should increase near it also. To keep the integral constant, the undershoot would need to decrease.

The alternative is a discontinuous interpolation, where nearest neighbor is the simplest example. I think this approach has a lot going for it but it is hard - try to Google for "weak membrane". The idea is to have a piecewise smooth interpolation that can "break" at discontinuous edges.

The alternative is a discontinuous interpolation, where nearest neighbor is the simplest example. I think this approach has a lot going for it but it is hard - try to Google for "weak membrane". The idea is to have a piecewise smooth interpolation that can "break" at discontinuous edges.

I'm already doing something along those lines with my modified 2-D spline interpolation code. I use the standard method to calculate the z coefficients for each spline knot, where z is the parameter used in this equation:

Once z has been calculated for a given spline knot, I clip it to ±((MaxValue - MinValue) / N), where N is chosen to minimize ringing artifacts without affecting the spline's behavior away from high-contrast edges:

So far, an N value of 8 seems to be about optimal. Over/undershoot near high-contrast edges is effectively limited without altering the behavior of the spline away from such high-contrast edges. And the effect of clipping this parameter is paradoxically gradual; if the clipped value is only slightly different from the unclipped value, the effect on the spline is slight.
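For discussion purposes, here is my own minimal 1-D reconstruction of that scheme. This is a sketch built from the description above, not PixelClarity's actual code, and I take MaxValue/MinValue to be the data's max and min, which may differ from Jonathan's definition:

```python
import numpy as np

def natural_spline_z(y):
    """Second-derivative parameters z for a natural cubic spline on a unit grid
    (standard tridiagonal system z[i-1] + 4 z[i] + z[i+1] = 6 d2y[i]; z ends = 0)."""
    n = len(y)
    A = np.zeros((n - 2, n - 2))
    np.fill_diagonal(A, 4.0)
    np.fill_diagonal(A[1:], 1.0)      # sub-diagonal
    np.fill_diagonal(A[:, 1:], 1.0)   # super-diagonal
    rhs = 6.0 * (y[:-2] - 2.0 * y[1:-1] + y[2:])
    z = np.zeros(n)
    z[1:-1] = np.linalg.solve(A, rhs)
    return z

def eval_spline(y, z, t):
    """Evaluate the cubic spline at points t (knots at 0..len(y)-1)."""
    i = np.clip(np.floor(t).astype(int), 0, len(y) - 2)
    u = t - i
    return (y[i] * (1 - u) + y[i + 1] * u
            + ((1 - u) ** 3 - (1 - u)) * z[i] / 6.0
            + (u ** 3 - u) * z[i + 1] / 6.0)

# A step edge, which makes an ordinary natural spline ring.
y = np.array([0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8])
t = np.linspace(0, 7, 701)

z = natural_spline_z(y)
plain = eval_spline(y, z, t)

# Clip z to +/-((MaxValue - MinValue) / N) with N = 8, as described above.
limit = (y.max() - y.min()) / 8.0
clipped = eval_spline(y, np.clip(z, -limit, limit), t)

print(plain.min(), clipped.min())  # the clipped spline undershoots much less
```

Note that clipping z leaves the values at the knots untouched (the spline still interpolates the samples); it only limits the curvature between them, which is what bounds the ringing.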

Indeed, a good point to stress, and it is another practical reason why one should interpolate to the native resolution of a printer with a good algorithm, instead of an unknown (most likely sub-optimal bilinear) filter in firmware, and then sharpen. BTW it's one of the reasons why people like the Qimage output better than the same print directly from Photoshop.

I have made very sharp (virtually artifact free) enlarged output by deconvolution sharpening of interpolated image data. The results are much better than sharpening at the native capture size and then letting the printer do the interpolation, as is advocated by some 'authorities'.

Bart,

Your posts are always informative and thought provoking, and your statement about sending the file to the printer at the native resolution of the device makes sense, notwithstanding the advice of certain "authorities". The native resolution of the printer can be difficult to determine in the case of an inkjet using diffusion dither. The "native resolution" of Epson printers is often stated to be 360 dpi or multiples thereof; however, Rags Gardner has done some experiments using interference patterns with the Epson 2200 and found that the "native resolution" was 288 dpi. He also reported that the hardware resolution may differ from that used in the software for resampling. Have you seen this?

Sometimes, a little blurring of the image is desirable. Some time ago I had some images printed on the Fuji Pictrography device, which was a high resolution contone device. I sized the images to the native resolution of the printer and every little defect in my image stood out in great detail.

Also, what deconvolution algorithm do you use for image restoration, and how do you determine the PSF? Jonathan Wienke, who is also contributing to this thread, has used FocusMagic with good results for capture sharpening. I have read that deconvolution can correct, to some extent, the softening due to the blur filter.

Your posts are always informative and thought provoking, and your statement about sending the file to the printer at the native resolution of the device makes sense, notwithstanding the advice of certain "authorities". The native resolution of the printer can be difficult to determine in the case of an inkjet using diffusion dither. The "native resolution" of Epson printers is often stated to be 360 dpi or multiples thereof; however, Rags Gardner has done some experiments using interference patterns with the Epson 2200 and found that the "native resolution" was 288 dpi. He also reported that the hardware resolution may differ from that used in the software for resampling. Have you seen this?

Hi Bill,

Thanks for the kind words. I hadn't seen Rags' article but I just read it. What he demonstrated is IMHO more due to (limitations in) the dithering algorithm than to the native resolution of the printer itself. A different test pattern (parallel lines) would probably have revealed something else, as in this test. The native resolution is apparently reported back after a function call interrogates the printer driver, and the reported number can differ depending on the paper settings used in the driver (e.g. matte paper may cause it to report a lower resolution than glossy paper). The significance of a lines target is the human eye's vernier acuity, which is perhaps 4x higher than its angular resolution capability would suggest, and inkjet printers are better able to offer such resolutions.

The way I understand it, the dithering algorithm assumes a certain PPI input (e.g. 720 PPI for Epson, and 600 PPI for Canon/HP, if glossy paper is selected). Very large format printers may use lower PPIs (lower resolution requirements and increased print speed). If the image size requested and the number of pixels supplied do not match that native resolution, the data supplied will be (bilinearly) interpolated/decimated, after which the dithering engine can do its stuff: mix colors and droplet sizes, and use error diffusion to hide color inaccuracies and abrupt boundaries between head passes.

A program like Qimage does such interrogation of the printer driver, and displays it in the user interface. It also resamples the incoming pixels on the fly to match that native resolution, and sharpens at that resolution (to compensate for printer/paper/ink losses of micro-contrast)!

Quote

Sometimes, a little blurring of the image is desirable. Some time ago I had some images printed on the Fuji Pictrography device, which was a high resolution contone device. I sized the images to the native resolution of the printer and every little defect in my image stood out in great detail.

Sometimes we get more than we bargained for ...

Quote

Also, what deconvolution algorithm do you use for image restoration, and how do you determine the PSF? Jonathan Wienke, who is also contributing to this thread, has used FocusMagic with good results for capture sharpening. I have read that deconvolution can correct, to some extent, the softening due to the blur filter.

Yes, I also use FocusMagic, but it is not ready for 64-bit computer platforms (yet), although a friend of mine has it running on a 64-bit Win7 machine, albeit as a plug-in of a 32-bit version of Photoshop CS4. I occasionally also use an implementation of adaptive Richardson-Lucy deconvolution, which allows one to balance noise amplification against detail restoration. Both are capable of restoring resolution from blur instead of merely boosting acutance or edge contrast. How successful that restoration is depends on the quality of the image (RL uses a maximum-likelihood approach, so for an ill-posed problem multiple solutions are possible, and the one chosen is not necessarily the best).
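For anyone curious what Richardson-Lucy looks like mechanically, here is a minimal, unregularized 1-D version; the adaptive variants mentioned above add noise control on top of this, and the test signal and PSF here are made up:

```python
import numpy as np

def richardson_lucy(observed, psf, iterations=50):
    """Basic 1-D Richardson-Lucy deconvolution (no noise regularization)."""
    psf = psf / psf.sum()
    psf_mirror = psf[::-1]
    estimate = np.full_like(observed, observed.mean())  # flat positive start
    for _ in range(iterations):
        conv = np.convolve(estimate, psf, mode='same')
        ratio = observed / np.maximum(conv, 1e-12)      # avoid divide-by-zero
        estimate = estimate * np.convolve(ratio, psf_mirror, mode='same')
    return estimate

# Toy example: a nonnegative "scene" blurred by a Gaussian PSF.
x = np.linspace(-1, 1, 200)
scene = np.exp(-((x - 0.3) ** 2) / 0.002) + 0.5 * np.exp(-((x + 0.2) ** 2) / 0.005)
psf = np.exp(-np.linspace(-3, 3, 25) ** 2)
blurred = np.convolve(scene, psf / psf.sum(), mode='same')

restored = richardson_lucy(blurred, psf)
err_blur = np.mean((blurred - scene) ** 2)
err_rest = np.mean((restored - scene) ** 2)
print(err_blur, err_rest)  # RL recovers much of the lost detail
```

The multiplicative update is what keeps the estimate nonnegative, which is one reason RL behaves better than naive inverse filtering on photographic data.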

I approximate the PSFs of my lens + OLPF + sensel pitch/aperture of my camera (at various lens apertures) with the use of Imatest's SFR (=MTF) determination, and a proprietary method I devised, but I must admit that FocusMagic does an amazing job with just a single radius parameter input (plus an amount setting). It deals with defocus blur, and also reasonably well with diffraction blur, despite the different shapes of their PSFs. The OLPF/AA filter is just part of the total system MTF. If only FocusMagic would be recoded for 64-bit hardware.