We apparently cannot compute the output of a bilateral filter (BF) using convolution (with the image) because the BF is a non-linear filter. In general, why can convolution only be applied to compute the output of a linear filter?

2 Answers

This is a good question and something that I remember asking myself when I first learned about impulse responses and convolution.

To understand this, it is first necessary to understand the significance of impulses and impulse responses. Referring to the image below, you can see that an impulse is an instantaneous, spike-like input and the impulse response is the decaying output that the system produces in reaction to it.

So why is the impulse input significant? It's significant because you can represent any arbitrary input signal as a sequence of scaled, time-shifted impulses! Literally any input you could ever want to feed into a filter can be thought of as an array of impulse inputs.
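In symbols, this decomposition can be sketched as follows. The discrete-time unit impulse $\delta[n]$ (equal to 1 at $n = 0$ and 0 elsewhere) and the index names $n$, $k$ are notation assumed here for illustration; they do not appear in the answer itself.

$$x[n] = \sum_{k} x[k]\,\delta[n-k]$$

For example, the three-sample signal $(2, -1, 3)$ is $2\,\delta[n] - \delta[n-1] + 3\,\delta[n-2]$: one impulse per sample, each scaled by the sample's value and shifted to the sample's position.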

Now that we understand the significance of the impulse input, what is the significance of the impulse response? Well, for linear systems we know that the output must satisfy certain properties for given inputs:

Additivity: y(a + b) = y(a) + y(b)

Homogeneity: y(cx) = c y(x)

What this tells us is that if we feed in multiple inputs (i.e. feed in an array of impulse inputs), we should be able to easily compute the output as a summation of scaled, time-shifted impulse responses. (This also assumes the system is time-invariant, so that a delayed impulse produces an equally delayed impulse response.) Convolution is simply the mathematical operation that performs this process. If you look at the diagram below, you can see that the convolution operation is simply summing together time-shifted impulse responses.
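The summation described above can be checked numerically. The following is a minimal sketch in Python with NumPy; the input `x`, the impulse response `h`, and the helper name `sum_of_shifted_responses` are made up for illustration and are not from the answer.

```python
import numpy as np

def sum_of_shifted_responses(x, h):
    """Build the output by adding one scaled, shifted copy of h per input sample."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    y = np.zeros(len(x) + len(h) - 1)
    for k, sample in enumerate(x):
        # The impulse at position k, with height x[k], contributes x[k] * h
        # starting at output position k.
        y[k:k + len(h)] += sample * h
    return y

x = [2.0, -1.0, 3.0]    # hypothetical input signal
h = [1.0, 0.5, 0.25]    # hypothetical impulse response of a linear filter

y_manual = sum_of_shifted_responses(x, h)
y_conv = np.convolve(x, h)   # NumPy's full discrete convolution

print(y_manual)
print(np.allclose(y_manual, y_conv))
```

Both routes give the same result because summing scaled, shifted copies of `h` is exactly what the convolution sum computes.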

So, to answer your question as to why convolution only works on linear filters: it comes down to the fact that convolution relies on the two linearity properties (additivity and homogeneity) in order to predict the output accurately. If these two properties don't hold for a system, then the impulse responses cannot be summed to calculate the output, thus "breaking" the usefulness of convolution.
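To see this breakdown concretely, here is a small sketch that uses a 3-tap median filter as a stand-in non-linear filter. The median filter is not mentioned in the question; it is chosen only because it is easy to write down. The helper `median3` and the zero-padding at the borders are assumptions of this example.

```python
import numpy as np

def median3(x):
    """3-tap running median with zero padding at both ends (a non-linear filter)."""
    x = np.asarray(x, dtype=float)
    padded = np.concatenate(([0.0], x, [0.0]))
    return np.array([np.median(padded[i:i + 3]) for i in range(len(x))])

# "Impulse response" of the median filter: a lone spike is always outvoted
# by its two zero neighbours, so the response is identically zero.
impulse = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
h = median3(impulse)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_true = median3(x)                       # what the filter actually outputs
y_pred = np.convolve(x, h, mode="same")   # what convolution with h would predict

print(h)
print(y_true)
print(y_pred)
```

Convolving with the measured impulse response predicts an all-zero output, while the filter itself clearly returns something else: the impulse response no longer characterises the system once superposition fails.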

Convolution is equivalent to calculating an output pixel $y[i,j]$ as a weighted sum of the nearby input pixels $x[i+k,j+l]$, with the weight being a function only of the relative spatial location $(k,l)$. In bilateral filtering the formula for an output pixel contains a factor $f(|x[i+k,j+l] - x[i,j]|)$, which is a non-linear function of the pixel values, so the weights depend on the image itself. This makes bilateral filtering a non-linear operation that can't be described by said kind of weighted sum.
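The difference can be demonstrated with a 1-D sketch. Everything specific here is an assumption for illustration: Gaussian spatial and range kernels, division by the sum of the weights, the parameter names `sigma_s`, `sigma_r`, `radius`, and the test signal `step`. The answer only specifies that the weight contains a function $f$ of the absolute intensity difference.

```python
import numpy as np

def bilateral_1d(x, sigma_s=1.0, sigma_r=0.5, radius=2):
    """Minimal 1-D bilateral filter sketch (Gaussian kernels, normalised weights)."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    for i in range(len(x)):
        lo, hi = max(0, i - radius), min(len(x), i + radius + 1)
        window = x[lo:hi]
        offsets = np.arange(lo, hi) - i
        # Spatial weight: depends only on the relative position.
        spatial = np.exp(-offsets**2 / (2 * sigma_s**2))
        # Range weight: depends on |x[i+k] - x[i]|, i.e. on the data itself.
        rng = np.exp(-np.abs(window - x[i])**2 / (2 * sigma_r**2))
        w = spatial * rng
        out[i] = np.sum(w * window) / np.sum(w)
    return out

step = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Homogeneity test: does filtering 10*x give 10 times the filtered x?
bilateral_ok = np.allclose(bilateral_1d(10 * step), 10 * bilateral_1d(step))

# Same test for a fixed-weight Gaussian blur implemented as a convolution.
kernel = np.exp(-np.arange(-2, 3)**2 / 2.0)
kernel /= kernel.sum()
linear_ok = np.allclose(np.convolve(10 * step, kernel, mode="same"),
                        10 * np.convolve(step, kernel, mode="same"))

print(bilateral_ok)  # False: scaling the input changes the weights
print(linear_ok)     # True: the weights are fixed, so scaling passes through
```

Scaling the input widens the intensity gap across the edge, which shrinks the range weights, so the bilateral filter blurs the large step less than the small one. A fixed kernel has no such dependence on the data.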

Why is that part of the BF a non-linear function of the pixels? Intuitively, I think it is because the variation of the intensity of e.g. an image (but any signal) does not depend linearly on the coordinates, that is, at pixel $(i, j)$ we might have an intensity $c$, but at pixel $(i, j + 1)$, we might have a completely different intensity than $c$.
– nbro Apr 9 at 14:19


Absolute value $f(x) = |x|$ cannot be described as a linear function $f(x) = cx$ where $c$ is constant. (These are not the $x$ and $f$ of the answer. I am just recycling symbols.)
– Olli Niemitalo Apr 9 at 14:24