Hi,
> I've noticed that SciPy implements Gaussian filtering (in the ndimage
> package) using convolution. It is implemented by loops in Python, not
> even using a vectorized numpy.convolve or an FFT. For the sake of
> speed,
> SciPy is using the worst thinkable implementation.
What makes you think this?
A brief look at the implementation in scipy/ndimage/filters.py shows
that scipy.ndimage.gaussian_filter, after doing some set-up, defers to
gaussian_filter1d to do the filtering on each dimension (as Gaussian
filtering is separable). Then gaussian_filter1d uses for-loops to do
some further set-up, like calculating the filter kernel, and then
defers to correlate1d for the actual filtering. This function does
some error-checking, and then calls _nd_image.correlate1d. The
_nd_image library is a C extension module: _nd_image.correlate1d
corresponds to Py_Correlate1D, which is defined in
scipy/ndimage/src/nd_image.c, and which calls NI_Correlate1D to do the
heavy lifting.
NI_Correlate1D, in scipy/ndimage/src/ni_filters.c, implements the
filtering in a straightforward loop, but also takes advantage where it
can of filter-kernel symmetry or anti-symmetry.
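The separability mentioned above is easy to check from Python: filtering
each axis in turn with gaussian_filter1d reproduces the full 2-D filter
(a quick sanity check, not anything the implementation needs):

```python
import numpy as np
from scipy import ndimage

# A 2-D Gaussian is separable: filtering each axis with a 1-D Gaussian
# gives the same result as the full 2-D filter.
img = np.random.rand(64, 64)
sigma = 2.0

full = ndimage.gaussian_filter(img, sigma)
separable = ndimage.gaussian_filter1d(
    ndimage.gaussian_filter1d(img, sigma, axis=0), sigma, axis=1)

print(np.allclose(full, separable))  # prints True
```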
Aside from the several thin layers of error-checking and set-up, the
code does the filtering in C in about the fastest way possible that
doesn't use recursion or FFT. (And even then, I don't know under what
circumstances the direct-convolution method might still win --
probably with small kernels.)
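For what it's worth, direct and FFT-based convolution compute the same
numbers, so the choice between them is purely about speed; here's a
small sketch (with a hand-built truncated Gaussian kernel, my own
construction) showing the two agree:

```python
import numpy as np
from scipy.signal import fftconvolve

# Build a small truncated, normalized Gaussian kernel by hand.
sigma = 2.0
radius = int(4 * sigma)
x = np.arange(-radius, radius + 1)
kernel = np.exp(-x**2 / (2 * sigma**2))
kernel /= kernel.sum()

signal = np.random.rand(1000)
# Direct convolution and FFT-based convolution give the same result
# (up to floating-point round-off); only their speed differs.
direct = np.convolve(signal, kernel, mode='same')
via_fft = fftconvolve(signal, kernel, mode='same')
print(np.allclose(direct, via_fft))  # prints True
```

With a 17-tap kernel like this one, the direct method generally wins;
the FFT pays off as the kernel grows.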
> To make it fast:
> Gaussian filtering can be implemented using recursion (Young & van Vliet,
> 1995; Signal Processing, 44: 139-151). The recursive approximation is
> very accurate and does not introduce ringing. It is anti-causal
> (forward-backward) and has zero phase response. The filtering can be
> done in C using scipy's signal.lfilter function.
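For readers without the attachment, a minimal sketch of that
forward-backward recursion via scipy.signal.lfilter might look like the
following; the coefficient fits are the ones published in the Young &
van Vliet paper, but the function name and structure here are mine, not
the poster's code:

```python
import numpy as np
from scipy.signal import lfilter

def recursive_gaussian1d(x, sigma):
    """Approximate 1-D Gaussian smoothing by forward-backward recursive
    filtering (Young & van Vliet, 1995).  Illustrative sketch only."""
    # Coefficient fits from the paper.
    if sigma >= 2.5:
        q = 0.98711 * sigma - 0.96330
    else:
        q = 3.97156 - 4.14554 * np.sqrt(1.0 - 0.26891 * sigma)
    b0 = 1.57825 + 2.44413 * q + 1.4281 * q**2 + 0.422205 * q**3
    b1 = 2.44413 * q + 2.85619 * q**2 + 1.26661 * q**3
    b2 = -(1.4281 * q**2 + 1.26661 * q**3)
    b3 = 0.422205 * q**3
    B = 1.0 - (b1 + b2 + b3) / b0
    # IIR recursion: w[n] = B*x[n] + (b1*w[n-1] + b2*w[n-2] + b3*w[n-3])/b0
    b = [B]
    a = [1.0, -b1 / b0, -b2 / b0, -b3 / b0]
    # Forward pass, then backward pass over the reversed signal,
    # giving zero phase overall.
    y = lfilter(b, a, x)
    return lfilter(b, a, y[::-1])[::-1]

# A unit impulse recovers the impulse response, which should be a
# normalized Gaussian: unit sum, peaked at the impulse position.
imp = np.zeros(201)
imp[100] = 1.0
resp = recursive_gaussian1d(imp, 3.0)
print(abs(resp.sum() - 1.0) < 1e-6, resp.argmax() == 100)
```

Note the cost per sample is fixed (three multiply-adds per pass),
independent of sigma, which is the whole appeal.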
Your implementation looks pretty straightforward. Here are some
timings (sm_gaussian is your Gaussian2D, nd_gaussian is
scipy.ndimage.gaussian_filter):
In: img = numpy.random.rand(100, 100)
In: timeit sm_gaussian(img, 2)
1000 loops, best of 3: 1.54 ms per loop
In: timeit nd_gaussian(img, 2)
1000 loops, best of 3: 1.26 ms per loop
In: timeit sm_gaussian(img, 20)
1000 loops, best of 3: 1.55 ms per loop
In: timeit nd_gaussian(img, 20)
100 loops, best of 3: 7.41 ms per loop
In: img = numpy.random.rand(1000, 1000)
In: timeit sm_gaussian(img, 2)
10 loops, best of 3: 168 ms per loop
In: timeit nd_gaussian(img, 2)
10 loops, best of 3: 92.8 ms per loop
In: timeit sm_gaussian(img, 20)
10 loops, best of 3: 167 ms per loop
In: timeit nd_gaussian(img, 20)
10 loops, best of 3: 657 ms per loop
As expected, the recursive approach (whose cost per sample is
independent of sigma) is a definite win with larger kernels, but with
small kernels the constant factors dominate and the straightforward
implementation is narrowly faster. At the very least,
you should post this implementation on the cookbook page...
Zach