3/24/2011

03-24-11 - Image filters and Gradients

A friend recently pointed me at John Costella's
supposedly superior edge detector .
It's a little bit tricky to figure out what's going on there because his writing is quite obtuse,
so I thought I'd record it for posterity.

You may recognize Costella's name as the guy who made Unblock
which is a rather interesting and outside-the-norm deblocker. He doesn't have an image science background, and in the
case of Unblock that led him to some ideas that normal research didn't find. Did he do it again with his
edge detector?

Well, no.

First of all, the edge detector is based on what he calls the magic kernel .
If you look at that page, something is clearly amiss.

The discrete 1d "magic kernel" for upsampling is [1,3,3,1] (unnormalized). Let's back up a second, we wish to
upsample an image without offseting it. That is, we replace one pixel with four and they cover the same area :

+---+ +-+-+
| | | | |
| | -> +-+-+
| | | | |
+---+ +-+-+

A 1d box upsample would be convolution with [1,1] , where the output discrete taps are half the distance apart
of the original taps, and offset by 1/4.

The [1331] filter means you take each original pixel A and add the four values A*[1331] into the output. Or
if you prefer, each output pixel is made from (3*A + 1*B)/4 , where A is the original pixel closer to the
output and B is the one farther :

but clever readers will already recongize that this is just a bilinear filter. The center of P is 1/4 of
an original pixel distance to A, and 3/4 of a pixel distance to B, so the 3,1 taps are just a linear filter.

So the "magic kernel" is just bilinear upsampling.

Costella shows that Lanczos and Bicubic create nasty grid artifacts. This is not true, he simply has a bug
in his upsamplers.

The easiest way to write your filters correctly is using only box operations and odd symmetric filters. Let
me talk about this for a moment.

In all cases I'm talking about discrete symmetric filters. Filters can be of odd width, in which case they
have a single center tap, eg. [ a,b,c,b,a ] , or even width, in which case the center tap is duplicated :
[a,b,c,c,b,a].

Any even filter can be made from an odd filter by convolution with the box , [1,1]. (However, it should
be noted that an even "Sinc" is not made by taking an odd "Sinc" and convolving with box, it changes the
function).

That means all your library needs is odd filters and box resamplers. Odd filters can be done "in place",
that is from an image to an image of the same size. Box upsample means replicate a pixel with four identical
ones, and box downsample means take four pixels are replace them with their average.

To downsample you just do : odd filter, then box downsample.
To upsample you just do : box upsample, then odd filter.

For example, the "magic kernel" (aka bilinear filter) can be done using an odd filter of [1,2,1]. You just
box upsample then convolve with 121, and that's equivalent to upsampling with 1331.

Okay, so now let's get back to edge detection. First of all let's clarify something : edge detectors
and gradients are not the same thing. Gradients are slopes in the image; eg. big planar ramps may have
large gradients. "edges" are difficult to define things, and different applications may have different
ideas of what should constitute an "edge". Sobel kernels and such are *gradient* operators not edge
detectors. The goal of the gradient operator is reasonably well defined, in the sense that if our image
is a height map, the gradient should be the slope of the terrain. So henceforth we are talking about gradients
not edges.

The basic centered difference operator is [-1,0,1] and gives you a gradient at the middle of the filter.
The "naive difference" (Costella's terminology) is [-1,1] and gives you a gradient half way between the
original pixels.

First of all note that if you take the naive difference at two adjacent pels, you get two gradients at half
pel locations; if you want the gradient at the integer pixel location between them you would combine the taps -
[-1,1,0] and [0,-1,1] - the sum is just [-1,0,1] , the central difference.

Costella basically proposes using some kind of upsampler and the naive difference. Note that the naive
difference operator and the upsampler are both just linear filters. That means you can do them in either
order, since convolution commutes, A*B = B*A, and it also means you could just make a single filter that
does both.

In particular, if you do "magic upsampler" (bilinear upsampler) , naive difference, and then box downsample
the taps that lie within an original pixel, what you get is :

-1 0 1
-6 0 6
-1 0 1

A sort of Sobel-like gradient operator (but a bad one). (this comes from 1331 and the 3's are in the same
original pixel).

So upsampling and naive difference is really just another form of linear filter. But of course anybody who's
serious about gradient detection knows this already. You don't just use the Sobel operator. For example in
the ancient/classic Canny paper, they use a Gaussian filter with the Sobel operator.

One approach to making edge detection operators is to use a Gaussian Derivative, and then find the discrete
approximation in a 3x3 or 5x5 window (the Scharr operator is pretty close to the Gaussian Derivative in a 3x3
window, though Kroon finds a slightly better one). Of course even Gaussian Derivatives are not necessarily
"optimal" in terms of getting the direction and magnitude of the gradient right, and various people (Kroon,
Scharr, etc.) have worked out better filters in recent papers.

Costella does point out something that may not be obvious, so we should appreciate that :

Gradients at the original res of the image do suffer from aliasing. For example, if your original image is
[..,0,1,0,1,0,1,..] , where's the gradient? Well, there are gradients between each pair of pixels, but if you only
look at original image pixel locations you can't place a gradient anywhere. That is, convolution with [-1,0,1]
gives you zero everywhere.

However, to address this we don't need any "magic". We can just double the resolution of our image using
whatever filter we want, and then apply any normal gradient detector at the higher resolution. If we did that
on the [0,1,0,1] example we would get gradients at all the half taps.

Now, finally, I should point out that "edge detection" is a whole other can of worms than gradient operators,
since you want to do things like suppress noise, connect lines, look for human perceptual effects in edges, etc.
There are tons and tons of papers on these topics and if you really care about visual edge detection you
should go read them. A good start is to use a bilateral or median filter before the sharpen operator (the bilateral filter
suppresses speckle noise and joins up dotted edges), and then sharpen should be some kind of laplacian of gaussian approximation.

10 comments:

The half-pel gradient thing is one of those things that should be obvious but usually isn't, since most people think grid-aligned.

An old co-worker of mine once told me that there was "more information" in a gradient image because you have a value at each edge instead of each center. He was obviously smoking crack, but there is some merit to the point that it is sometimes better to think of those differences living on half-pels. Apparently this helps for fluid solving.

"Costella shows that Lanczos and Bicubic create nasty grid artifacts. This is not true, he simply has a bug in his upsamplers."Note his description on that page. He doesn't do *one* resampling operation - to get the 8x size image, he doubles the size 3 times.

The effective upsampling kernel you get from that operation isn't a real bicubic / Lanczos resample, and has different (worse) properties. Bilinear fares better in his comparison simply because bilin_2x(bilin_2x(img)) happens to be the same as bilin_4x(img). But of course for more general upsamplers, a small upscaling filter iterated twice isn't the same as a large upscaling filter!

Also it occurs to me that for the case of 3d graphics where we have uv mapping of the texture, you could just shift the uv's.

That is, say you have some height map on a surface and you wish to make a normal map. You could do the "naive gradient" (the [-1,1] filter) which is offset by half a pel, then just shift your uv mapping to compensate.

Sure you can do it, but why would you? The cost of the central difference operator is for all practical purposes identical to forwards/backwards diffs (both for CPU and GPU calculations). And there's a real and nontrivial cost to having two different UV mappings where you used to have one - extra shader instructions, but more importantly, extra complexity. It adds a texture size-dependent offset you need to pass to the shader, so now texture changes might imply constant changes. That kind of dependency is really awkward to have.

However, it does give a normal map that's much sharper, for free. In most cases, you don't actually want that sharpness; eg. you would run some kind of blur filter to smooth out the sampling anyway. But in some cases you might.

eg. if you have a heightmap that's like a wire circuit diagram with lots of fine lines and hard steps. Using half-pel offset normals would give you much sharper lines, it's effectively double the resolution.

(BTW this is related to what I call the "sampling fallacy" ; when an artist draws a hard step in a bitmap, they do not mean that they have sampled an infinite resolution image at this sampling step and those are the results, they are actually encoding additional information - eg. they are implying that the hard step exists at higher resolution than they can draw)

Yeah that extra sampling/filtering step is a fundamental issue with the height map to normal map model.

I'm not sure if that's even used that much anymore though. If you bake normal maps from a higher-resolution model, that particular "step ambiguity" is a non-issue - artists can add extra geometry to make edges however sharp they like. And at my old company, the artists started using this baking-style workflow for all normal maps, including stuff like tiled terrain textures. Basically just sculpt your surface structure in Modo/ZBrush then bake a normal map onto a planar mesh. Way more intuitive to work with since the surface structure is explicit instead of implied by the grayscale intensities, and again it's easy to add extra detail where necessary.

Besides, height maps really suck to edit in other ways too. The implicit [0,1] clamping combined with the fact that you're only interested in height deltas not the absolute values makes for a really awkward workflow. Basically you'd need to plan out so you're at a fairly low (or high) value whenever you want to encode a sharp edge. And then you want most other height deltas to be significantly smaller (at least 2x) so the rest of the texture is really low-contrast and hard to edit. It just sucks.

Height maps are a reasonable representation for actual sampled height data where you know the min/max ahead of time and can just normalize stuff to work out, but for manual painting they just suck.

Thanks for the factorization into 1 1 nearest neighbor upsampling followed by 1 2 1 convolution. I've been looking for a decomposition like that to make it easier to explain. (It's not efficient to compute it like that of course, but conceptually it hits the spot.)

The bicubic and Lanczos upsampling was a single direct 800% upsampling, not three repeated doublings as stated in one comment. And they came straight out of GIMP, so if there's a bug then it's in GIMP. (I will say that I've seen the same artifacts using Photoshop bicubic in the past, so the result didn't surprise me.)

You're right that in principle one could upsample an image using any upsampling method and then apply any of the standard operators. That message needs to be disseminated. But the nice thing about the naive difference operator and the "magic kernel" is that they're about as efficient as you could ask for (a few bit shifts and adds/subtracts and that's it).

Even with gradient filters the task is not that well-defined. Do you care about gradient magnitude? Rotational accuracy? Rotational invariance? Localisation precision?

I tend to use Simoncelli filters for rotation invariance (Scharr should be similar) and nonlinear filters if I care about localization or magnitude. Another cool trick are "complementary linear bias" gradient filters which make sense in stuff like diffusion (http://dx.doi.org/10.1109/ICPR.2000.903474)