
Resizing an image requires a basic image processing technology called a resampler. This is one of the elementary operations in any image processing toolkit. Yet, I've seen many, many cases where people get resampling algorithms subtly wrong, slightly wrong, or even blatantly wrong. Some of these were even in supposedly professional image processing applications! It isn't that hard to create a quality resampler. It does take some careful thought and a little ingenuity to implement, but it doesn't cost a lot of performance and it makes the resampler act a lot more consistently.

I will freely confess to having committed violations of all the rules I'm about to outline in various older versions of VirtualDub, where sometimes you would see a line of pink pixels on the right, or the image came out a tiny bit larger or smaller than it was supposed to be, etc. All of these should be worked out in modern versions, and the resize filter should obey all of the rules. If you think there is a violation, feel free to ask about it.

0. Definitions.

The resampler takes a source image of size (sw, sh) and converts it to a destination image of size (dw, dh), where sw/sh/dw/dh are in pixels.

The coordinate system for the source image is from (0,0)-(sw,sh), and for the destination image it is (0,0)-(dw,dh). The right and bottom edges of each rectangle are not included in the image (this is a top-left convention).

Each pixel occupies a square from (x,y)-(x+1, y+1), where x and y are integers. Again, the right and bottom borders are not included. The pixel center is located at (x+0.5, y+0.5) and indicates the location where the pixel's color is authoritative. For the destination image, the color at this point represents the color for the whole pixel.

The image is processed via a reverse mapping that correlates each point in the destination image to a single point in the source image. The destination image is produced by iterating over every pixel in the destination, converting the location of each destination pixel's center to a source location, and taking the color of the source image at that point. This results in exact coverage of the destination image, with no overdraw or dropouts.

Point sampling is the simplest resampling algorithm with the lowest quality. It simply chooses the color of the source pixel whose center is closest to the desired source point.

1. Resampling the image by N% shall scale image features by N%.

If I scale my portrait by 2x, I want my face to be twice the size and my nose to be twice the size. Not 1.97x, not 2.03x, exactly 2x. This means that a span of M pixels in the source must correspond to M*N% pixels in the destination. So, if my nose is 20 pixels wide originally, I expect it to be 40 pixels wide afterward.

In a point-sampled resampler, where pixels can only be duplicated or removed, it isn't possible to do this exactly at every point in the source or output image. The source pixels chosen, though, should still be as close as possible to the ideal location, and on average the image features should land in the right spots.

Violating this rule causes chains of resample operations to be inconsistent. If you do a 2x enlargement three times, you would expect the result to correspond to a single 8x enlargement. If the resampler gives you 1.97x in features when the frame doubles in size, though, then three consecutive 2x operations would give you 1.97^3 = 7.65x, whereas a single 8x operation would give 7.88x.
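The arithmetic here is worth checking directly. A quick illustration in Python, using the hypothetical buggy 1.97x-per-2x resampler from above:

```python
# A buggy resampler that delivers only 1.97x feature scaling per requested 2x step.
per_step = 1.97

three_steps = per_step ** 3        # three consecutive "2x" enlargements
single_8x = 8 * (per_step / 2)     # one "8x" request with the same relative error

print(round(three_steps, 2))   # → 7.65
print(round(single_8x, 2))     # → 7.88
```

The two results disagree, so chained operations no longer compose the way a user would expect.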

2. There shall be no image shift.

This is desirable for similar reasons to rule (1). If you enlarge an image by 200% and then reduce it by 50%, you'd expect to get the same size image, just with some possible degradation from the resampling. It shouldn't be shifted three pixels to the left overall.

This rule is simple to devise and tricky to get right. You can reduce it to either matching the centers or matching the corners. A continuous reverse mapping from destination to source that satisfies this criterion might look like this:

dst(x, y) = src(x * (sw/dw), y * (sh/dh))

The catch is that this is a continuous mapping. You want to sample with (x,y) being on half-pixel coordinates, so that the first pixel is (0.5, 0.5), the second (1.5, 0.5), etc. This places the destination points exactly on pixel centers. For point sampling, the source pixel is then chosen by flooring the coordinates to the nearest equal or lower integer. If you use integer coordinates for this instead, you will get a half-pixel shift. This can be fixed by mapping the corners instead, which results in ((sw-1)/(dw-1)) and ((sh-1)/(dh-1)) for the ratios, but that then violates rule (1).
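Here's a sketch of a point sampler built on this mapping (illustrative Python, not VirtualDub's actual code):

```python
def point_resample(src, dw, dh):
    """Point-sample a row-major image to (dw, dh) pixels.

    Each destination pixel center (x + 0.5, y + 0.5) is mapped through the
    continuous ratios sw/dw and sh/dh, then floored (int() truncation is
    safe here because the coordinates are never negative) to pick a pixel.
    """
    sh, sw = len(src), len(src[0])
    return [[src[int((y + 0.5) * sh / dh)][int((x + 0.5) * sw / dw)]
             for x in range(dw)]
            for y in range(dh)]

# A 300% enlargement turns each source pixel into a regular 3x3 block,
# with no runt rows or columns at the borders:
checker = [[0, 255], [255, 0]]
big = point_resample(checker, 6, 6)
```

Note that the mapped point (x + 0.5) * sw / dw for the last destination pixel is always strictly less than sw, so the floor never reads outside the source image.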

When point sampling, an integer enlarging factor should result in a regular pattern throughout the image. For instance, 300% enlargement will make each source pixel into a 3x3 block. There should be no runt columns or rows of 1 or 2 pixels on the borders.

A good stress test for a resampler is to do a big series of forward and inverse transforms, such as 200% followed by 50% ten times. Not only does this expose subtle subpixel shifts in a resampler, but it also shows how good the resampling filter is.
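For the point-sampled case, that stress test can be automated; a correctly center-mapped point sampler survives it bit-exactly. A minimal sketch:

```python
def point_resample(src, dw, dh):
    """Center-mapped point sampling of a row-major image to (dw, dh)."""
    sh, sw = len(src), len(src[0])
    return [[src[int((y + 0.5) * sh / dh)][int((x + 0.5) * sw / dw)]
             for x in range(dw)]
            for y in range(dh)]

# A 16x12 test pattern, stretched 200% and shrunk 50%, ten times over.
img = [[(13 * x + 7 * y) % 256 for x in range(16)] for y in range(12)]
cur = img
for _ in range(10):
    cur = point_resample(cur, 32, 24)   # 200% enlargement
    cur = point_resample(cur, 16, 12)   # 50% reduction
assert cur == img   # ten round trips: no drift, no shift
```

A sampler with a subpixel bias fails this assert quickly, because the error compounds on every pass.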

3. When doing an identity transform, the image shall remain exactly the same.

This one is common sense. If I ask for a 320x240 image to be resampled to 320x240, there shouldn't be any change. If you got rules (1) and (2) right, this should easily follow.

What is less obvious is that this applies to a single axis as well. That is, resampling a 320xN image to 320xM should result in only a vertical resampling: there should be no crosstalk between columns, which should be entirely independent in their processing. This follows because the mapping equations treat the horizontal and vertical axes independently, so a change in one doesn't affect the other. Most resampling filters are separable and thus implemented as separate row and column passes, which automatically guarantees this property.

It's easy to get this right with simple point sampling, but it's often broken when filtering is involved. An interpolation filter should return exactly one pixel's value when asked to sample exactly on top of a source pixel center; it shouldn't blend in any other adjacent pixels. Otherwise, you'll get a subpixel shift in the image, which violates rule (2).

Sometimes it is advantageous to choose a filter that applies mild blurring in order to reduce aliasing artifacts. In this case, the result won't be exactly the same, but at least it should be unbiased. One way to check is to flip the image on input and output and look for differences in the result.

Bilinear interpolation improves the quality of a resampler by doing crude linear interpolation between the 2x2 block of pixels closest to the desired source point. The closer the source point is to one of the pixels, the more that pixel contributes to the output, and if the source point is exactly on top of the pixel, the result is just that pixel. For pixels A-D in book order within the 2x2 block, and fractional offsets x and y in range 0-1 from A, the result is lerp(lerp(A, B, x), lerp(C, D, x), y), where lerp(E, F, r) = E*(1-r) + F*r = E + (F-E)*r.
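In code, the 2x2 fetch-and-lerp might look like this (a grayscale sketch with border clamping thrown in; the half-pixel shift converts center coordinates to integer pixel indices):

```python
import math

def lerp(e, f, r):
    return e + (f - e) * r

def bilinear_sample(src, u, v):
    """Sample a row-major grayscale image at the continuous point (u, v),
    where pixel centers sit at (i + 0.5, j + 0.5).  Out-of-bounds taps
    are clamped to the nearest border pixel."""
    sh, sw = len(src), len(src[0])
    fx, fy = u - 0.5, v - 0.5            # shift so pixel centers land on integers
    x0, y0 = math.floor(fx), math.floor(fy)
    rx, ry = fx - x0, fy - y0            # fractional offsets in 0..1

    def at(x, y):                        # "clamp" rule for border taps
        return src[min(max(y, 0), sh - 1)][min(max(x, 0), sw - 1)]

    a, b = at(x0, y0), at(x0 + 1, y0)
    c, d = at(x0, y0 + 1), at(x0 + 1, y0 + 1)
    return lerp(lerp(a, b, rx), lerp(c, d, rx), ry)
```

Sampling exactly on a pixel center gives rx = ry = 0, so the result is exactly that pixel's value, as rule (3) demands.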

4. Pixels outside the source image shall be handled consistently.

If you stretch an image in a way consistent with the above rules, some of the sample points will fall on the outer edges of the border pixels of the image. The source sampling point will never fall outside of the source image, but it will get closer than 1/2 pixel to the border. Problem is, if you're filtering, the filter window then requires pixels outside of the image, even for the itty-bitty 2x2 bilinear window. Forgetting this results in junk in the image or a crash in the resampler. All too often, I see people fix this by just shrinking the source bounds until the problem goes away. Don't do this! It not only breaks rule (1), but it also requires an adjustment that depends on the size of the filter kernel, which makes no sense.

The way you solve this is by introducing a rule that defines the source pixels that fall outside the source rectangle. Some useful rules are:

Clamp: Choose the nearest pixel along the border. This takes the border pixels and extends them out to infinity. I use this one, because it's the fastest rule.

Mirror: Take the source point and reflect it across the border. This extends out the entire image by alternately flipping it out to infinity. This one avoids streaking, but can still look odd with a "bounce" effect at the border. One bug that I often see when this rule is implemented is only mirroring the source point coordinate once. If the filter kernel is large or the scaling factor is very high, the coordinate may be sufficiently far out that mirroring it across one border results in it still being out of bounds past the opposite one. Bouncing it back and forth across both borders eventually puts it in-bounds.

Wrap/tile: Pretend the source image is tiled infinitely and wrap the coordinates to the opposite side when a border is crossed. This rule isn't very useful unless the source image is naturally repeating, such as a texture that's meant to be tiled. Wrapping the source coordinates when resampling such an image, however, prevents seams from developing when the resampled image is tiled.
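Minimal index-remapping versions of the three rules look like this (an illustrative sketch; note that the mirror version bounces repeatedly, which is exactly the bug fix described above):

```python
def clamp_index(i, n):
    """Clamp: extend the border pixels out to infinity."""
    return min(max(i, 0), n - 1)

def mirror_index(i, n):
    """Mirror: reflect across the borders, bouncing back and forth as
    many times as needed (a single reflection is not always enough)."""
    while i < 0 or i >= n:
        if i < 0:
            i = -1 - i            # reflect across the left/top edge
        else:
            i = 2 * n - 1 - i     # reflect across the right/bottom edge
    return i

def wrap_index(i, n):
    """Wrap/tile: pretend the image repeats infinitely.  Python's %
    already maps negative indices to the opposite side."""
    return i % n
```

Whichever rule is chosen, it applies to the filter's tap indices only; the mapped sample point itself stays in bounds per the rules above.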

The size of the border that requires this handling is half a source pixel. A 300% enlargement results in a 1.5-pixel-wide clamped/wrapped/mirrored border in the output. This means that for most usual factors, the border is hardly noticeable.

I test resamplers for mistakes in border handling by creating a 2x2 black-and-white checkerboard image and then stretching it to 1000x1000. One popular image editing program I tried this on gave me a giant green blotch in the output, almost certainly the result of reading memory outside of the source image. Oops.

5. Resampling a solid color should give a solid color.

If I stretch or shrink a solid red image, it should remain solid red. Doesn't matter if I pick point sampling, bilinear, bicubic, Lanczos3, or 256-tap windowed sinc. Obviously some concessions can be made for limited computing precision under extreme conditions, but in general, the smaller the difference, the better, and ideally it is zero.

What this means is that any resampling filter used should have all of the weights in its kernel sum to exactly one. This is called unity gain. If it doesn't, the sum is multiplied into all colors in the output. It also means that if there are weights smaller than zero or larger than one, then intermediate results can't be clamped to 0-1, only the final result can be, or else artifacts will show up in the image where the clamping occurs. Even worse, these artifacts will be position-dependent.
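As an example, here's a Catmull-Rom cubic kernel evaluated at a fractional phase, with the four weights renormalized so they sum to exactly one (a sketch; the renormalization mops up floating-point rounding):

```python
def catmull_rom(t):
    """Catmull-Rom cubic interpolation kernel (cardinal spline, a = -0.5)."""
    t = abs(t)
    if t < 1:
        return 1.5 * t**3 - 2.5 * t**2 + 1
    if t < 2:
        return -0.5 * t**3 + 2.5 * t**2 - 4 * t + 2
    return 0.0

def taps(phase):
    """Weights for the four source pixels at offsets -1..2 around a
    fractional sample position (0 <= phase < 1), rescaled to unity gain."""
    w = [catmull_rom(phase - o) for o in (-1, 0, 1, 2)]
    s = sum(w)
    return [x / s for x in w]
```

At phase 0 the weights degenerate to [0, 1, 0, 0], so the filter returns the source pixel exactly, which also satisfies rule (3). At phase 0.5 the outer two weights are -0.0625, negative, which is precisely why intermediate clamping causes position-dependent artifacts.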

Those of you with 3D experience may realize at this point that there is no dependence on integer sizes in the above mapping. Indeed, if you have constructed the destination-to-source mapping correctly and are careful with fill conventions, there is no reason you cannot resample a 320.4x760.5 region to 480.6x360.2. VirtualDub's internal resampler allows this, and this is how the resize filter supports fractional target sizes.

In a similar vein, the 3D analogy also implies that the above rules also apply to rotation engines as well as resamplers; they shouldn't shift or distort either. You can test this case with VirtualDub's rotate2 video filter.

10 comments | Feb 02, 2006 at 02:56 | default

Comments


Now, the most important question the reader must ask themselves is this: which professional applications did you encounter this with?

/Zaro

Zaro - 02 02 06 - 06:04

Guess: Adobe Photoshop?

Murmel - 02 02 06 - 15:54

I don't want to name... oh, whatever. Paint Shop Pro 7, IIRC.

Phaeron - 03 02 06 - 02:25

what about the interpolation function? should it be "linear", FIR-based (the real linear), bicubic or other? if FIR, which window function to use? should negative values be allowed in tap coefficients?

are there different answers for stretching and shrinking? is it better/faster/etc to do this multi-step (x2 -> x4 -> x8) or in one big step?

what about extremes - what is the recommended technique for 1/100x (thumbnails)? most thumbnails i see have various artifacts/aliasing even when using "high quality" interpolation

User - 04 02 06 - 08:42

also, most, if not all, interpolation functions forget to convert the pixel values to linear scale (alpha=1, aka "the number of photons hitting the camera pixel") before doing the interpolation. this error causes darker midtones/shadows, noticeable in large shrinking...

User - 04 02 06 - 08:47

Both triangle ("linear") and cubic kernels are usually implemented as FIR kernels, which are linear by definition. Most small filter kernels have windowing included as part of their definition; if you're using something infinite or otherwise too-long, such as windowed-sinc, then you get to choose the window. Using a larger filter kernel can give better results, but only to a limited extent; rippling out from edges can become excessive, and beyond a certain point the tails round off enough to be insignificant anyway. The difference between bilinear and bicubic is somewhat noticeable, but it's hard to distinguish between Lanczos3 and a bicubic filter with the right A parameter.

Negative tap coefficients are almost always desired; any filter that has only positive coefficients will have a blurring effect. A 4-tap B-spline filter is considerably blurrier than a 4-tap cardinal spline filter, which is why I almost never use B-spline. Using a positive-only filter kernel like a B-spline or a Gaussian, though, can avoid edge ringing in the output, which is bad for thresholding (alpha-test) or gradient determination.

For a filter defined in source space (the desired case for reverse mapping), stretching is treated as a special case where the filter cutoff is locked at 0.5 Fs. This means that a cubic interpolation kernel, for instance, is always 4 taps in either direction. For shrinking, the filter becomes wider and is sampled more often (thus more taps). Failing to take this into account and using an interpolation kernel for shrinking results in aliasing rather quickly as the shrinking ratio increases, blunting the advantages of using a good filter.
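To illustrate the widening (my sketch, not necessarily how VirtualDub does it): for a reduction by some ratio greater than 1, the source-space kernel is stretched by that ratio and its values scaled by its reciprocal, which grows the tap count; for stretching, the scale is clamped to 1 so the kernel stays a fixed-width interpolator. Shown here with a triangle (linear) kernel:

```python
def triangle(t):
    """Triangle (linear interpolation) kernel, support [-1, 1]."""
    t = abs(t)
    return 1 - t if t < 1 else 0.0

def scaled_weights(phase, ratio):
    """Source-space weights at fractional position `phase` for a size
    change by `ratio` (>1 shrinks).  Enlargement clamps the scale to 1,
    locking the cutoff at 0.5 Fs as described above."""
    scale = max(ratio, 1.0)
    support = int(scale) + 1              # half-width of the widened kernel
    offs = range(-support + 1, support + 1)
    w = [triangle((phase - o) / scale) / scale for o in offs]
    s = sum(w)
    return [x / s for x in w]             # renormalize to unity gain
```

At ratio 1 this degenerates to the ordinary two-tap linear interpolator; at ratio 3 the window covers eight source pixels, which is what suppresses the aliasing.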

Although I don't have numbers, I'd guess that in most common cases it is better to do a resampling operation in one pass. Resampling filters are usually small enough that memory traffic becomes a significant factor in performance, which multipassing exacerbates -- especially since the intermediate passes need to be stored in a wider format than the input and output pixel formats. I suppose you could get to ridiculous filters and ratios that would cause cache thrashing, but in that case there are even more exotic algorithms that could be tried than just multipassing, such as FFT convolution.

For thumbnail generation, I usually apply extreme levels of sharpening, like >radius-20 unsharp mask, and then apply a bicubic resample to shrink. You need the sharpening for the thumbnail to register, because otherwise all of the detail is lost. Aliasing matters less here, because the thumbnail isn't a moving video.

Gamma correction is another factor, but it shouldn't be considered part of the interpolator/filter. You wouldn't undo and redo the gamma correction between separable passes. You'd do it before and after the whole resampler. Thing is, you can't really do gamma correction very well with 8 bits/channel, because you'll get nasty amounts of quantization noise at the low end. You'd want something like 12 bits/channel or more. One of the things I want to implement eventually is a resampler that has support for ~12-15 bit integer and 32-bit float channels.
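To put a number on the gamma point (an illustrative sketch using the standard sRGB transfer functions):

```python
def srgb_to_linear(c):
    """Decode an sRGB value in 0..1 to linear light."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c):
    """Encode linear light back to sRGB in 0..1."""
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

# Averaging two neighbors (a crude 50% reduction) in gamma space vs. linear light:
a, b = 0.0, 1.0                       # black and white
gamma_avg = (a + b) / 2               # 0.5 -- displays too dark
linear_avg = linear_to_srgb((srgb_to_linear(a) + srgb_to_linear(b)) / 2)
print(round(linear_avg, 3))           # → 0.735
```

The gap between 0.5 and ~0.735 is the darkened-midtones error described in the comment above.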

Phaeron - 06 02 06 - 04:24

A few words...
When doing separable scaling (first vertical scaling and then horizontal, or vice versa), the result of the first scaling can be out of range (e.g. negative) and must be preserved. This happens when the interpolator uses negative coefs (bi-cubic, Lanczos, etc). Not doing so will break the mathematical equality to the 2D versions of these scalers and cause artifacts.

Up scaling and down scaling should be treated differently:
Down scaling should be low passed and then resampled.
Up scaling only need to resample.

The low pass filter for down scaling should remove (as much as possible) the high frequencies that exist in the hi-res image and don't exist in the low-res image. Some interpolators like Lanczos already have the low pass coefs in the interpolator, but others (bi-cubic, bi-linear) must have a pre-filter or nasty aliasing will occur.
The meaning of aliasing is "folding" of high frequencies onto low frequencies when a signal (image) is sampled with too few samples.

Working in 8-bit values is always bad, as the accumulated rounding that occurs after each image processing operation can add up to a noticeable error. Current image processing in hardware (pro video processors) uses 10 bits and soon 12 bits. The best way is to work (in software) in 16 bits and dither the result at the end of the processing path.
From a programming point of view, the best way is to have a polyphase scaler (like Lanczos), meaning that you keep an array of filter coefs for each sampling phase. Bi-cubic, bi-linear (not optimal) and many other scalers can be implemented by using different coefs for each phase.

So all you have to do is find which pixels fall into the interpolation window and find the phase (0-0.999, where 0 is the pixel left of the center of the window and 1.0 is the one right of it).

If you want to scale the image from size N to M, the distance between sampling points will be (N-1)/(M-1). So a factor of 2 will not keep the original pixel for every 2nd sample, but will cause a phase shift along the image, which might be unpleasing if the image has fine details. If you want to keep the original value for every 2nd pixel, the output image will be (2N)-1 pixels. This can be solved by increasing the size of the original image by 1 pixel in one of the methods Avery mentioned.

Anyway, the "best" scalers are true 2D scalers, usualy statistical scalers that looks at the 2D neighborhood, classify it and a apply a 2D filter to find the result. These are too heavy for real time software (and very expensive in ASIC).

A good way to avoid/reduce ringing (and lower noise) is to analyze the window: if the window (or part of the window) is smooth (pixels are very similar), then fall back to bilinear. Such a method should have a soft decision (not true/false) and a range of values that switches between the sharp scaler (Lanczos) and the smooth scaler (linear).

Cheers,
Technium

Technium - 18 03 06 - 18:48

Good points. A few replies to your points:

Regarding premature clamping of negative results between separable passes, that's true. In practice, you can get away with it, as the negative lobe is usually quite small, and it avoids having to code the dot product differently between row and column passes. Given a decision between clamping and having to go from 8 bits to 7 bits for intermediate results, I think I would choose to clamp.

An interpolation filter is still a low-pass filter; it just happens to be at the Nyquist limit (half the sampling rate).

As I noted above, you really want to use N/M as the source step factor instead of (N-1)/(M-1). For an exact 2:1 upsample with correct centering, the phases will be 1/4 and 3/4; none of the samples will be exactly on input samples. For a 2:1 downsample, all of the samples will be exactly between source pixel centers. Getting this right is very important in some circles; this particular case is very important in 3D texture mipmap generation.
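Those phases fall out directly from the center mapping; an illustrative check:

```python
import math

def source_phases(n_src, n_dst):
    """Fractional source phase of each destination pixel center under the
    center-mapped ratio n_src/n_dst (phase = position minus its floor,
    measured from the nearest lower source pixel center)."""
    phases = []
    for x in range(n_dst):
        pos = (x + 0.5) * n_src / n_dst - 0.5   # offset from source pixel centers
        phases.append(pos - math.floor(pos))
    return phases

print(source_phases(2, 4))   # 2:1 upsample  -> phases alternate 0.75, 0.25
print(source_phases(4, 2))   # 2:1 downsample -> every phase is exactly 0.5
```

With the wrong (N-1)/(M-1) step factor, the 2:1 upsample would instead pin some samples to phase 0, exactly the original-pixel behavior discussed above.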

Phaeron - 18 03 06 - 19:20

Thanks for the post and many useful comments by all.

Can someone elaborate on how best to handle fractional source offsets with a FIR kernel?

Let's say you wanted to implement a "digital zoom" effect. e.g. 5.8x zoom on a 640x480 frame. This is equivalent to a source crop from top-left point (264.8, 198.6) to bottom-right (375.2, 281.4) then upsampling the corresponding 110.4x82.8 cut-out to a destination 640x480 frame.

For each destination pixel index we could inverse map its center to the crop region above: first add (0.5, 0.5) then divide by the 5.8 zoom factor, then offset by the top-left source point (264.8, 198.6). e.g. for destination pixel index [0, 0] the inverse map is (264.9, 198.7) in source coordinates.

Point sampling from here is straightforward. What about a FIR filter? If this inverse mapped point is the center of the FIR kernel, how do we determine the FIR weights for the pixel intensities in the source image?

Are the weights simply the FIR kernel evaluated at the points corresponding to the centers of the source pixels?

i.e. should we determine the distance from the center of the FIR kernel above to the center of each nearby source pixel and evaluate the FIR kernel at that offset? For the example above, the nearby source pixels have x-coordinates {..., 263.5, 264.5, 265.5, 266.5, ...} corresponding to source x-indices {..., 263, 264, 265, 266, ...}. The distance from the FIR filter center is then {..., -1.4, -0.4, 0.6, 1.6, ...}. Is this where the FIR kernel should be evaluated to produce the correct filtering weights?

Any insight appreciated. :-)

jpap - 16 02 12 - 21:07

Yes, you have the general idea. The FIR kernel is a continuous signal that is itself being sampled. You can look at it in either of two ways: either the destination pixel is represented by a kernel sampled at each of the source pixel locations, or each of the source pixels is represented by a kernel, all of which are being sampled at the destination point.
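Concretely, for the example numbers above, with a Catmull-Rom kernel standing in for the FIR filter (an illustrative sketch):

```python
def catmull_rom(t):
    """Catmull-Rom cubic kernel, support [-2, 2]."""
    t = abs(t)
    if t < 1:
        return 1.5 * t**3 - 2.5 * t**2 + 1
    if t < 2:
        return -0.5 * t**3 + 2.5 * t**2 - 4 * t + 2
    return 0.0

center = 264.9                      # inverse-mapped x from the example above
pixels = [263, 264, 265, 266]       # nearby source x-indices
offsets = [px + 0.5 - center for px in pixels]   # pixel centers at i + 0.5
print([round(o, 1) for o in offsets])            # → [-1.4, -0.4, 0.6, 1.6]

weights = [catmull_rom(o) for o in offsets]      # kernel sampled at those offsets
s = sum(weights)
weights = [w / s for w in weights]               # renormalize to unity gain
```

So yes: the weights are simply the continuous kernel evaluated at the signed distances from the mapped point to each source pixel center.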

Phaeron - 17 02 12 - 06:24
