Wednesday, June 15, 2016

The Savitzky-Golay Filter

In 1964, Abraham Savitzky and Marcel Golay published a paper "Smoothing and differentiation of data by simplified least squares procedures" in Analytical Chemistry, which has been heralded as one of the 10 most influential papers in the journal's history.

The Savitzky-Golay filter (SGF) is a digital filter used to smooth noisy data. The basic idea is to chop the dataset into subsets, and then use a low order polynomial to fit successive subsets.

Implementations are available in Octave/Matlab and in recent versions (>0.16) of scipy for python.

Here is a potential use case. I did a Lennard-Jones melt simulation using LAMMPS, and obtained the following pair correlation function g(r) [click to enlarge].

The second argument (15) has to be an odd number and is the window or subset size, and the third (4) argument is the degree of polynomial to regress. When I plot the smoothed curve:

plt.plot(rFG,grFG,'.')plt.plot(rFG,gsm1,label='SG')

If you look closely, again:

One can experiment with the window size and the degree of polynomial. In general, a larger window, and a higher degree polynomial make the curve smoother. The figure below shows a window of size 7 and 31 with a degree 4 polynomial.