Finding outliers in oscillating data

I would like to describe a method for cleaning up oscillating data and finding its outliers using the Fast Fourier Transform (FFT). FFTs are really cool when you are trying to find the periodicity of data. I elected to use FFTs rather than moving averages because they lend themselves nicely to seasonal data, although the principle would be similar either way: linearize, then find the outliers.

The first thing we need to do is move from the time domain to the frequency domain. Fortunately, R has a really nice library that takes the hard math out of this. We then calculate the magnitude of each frequency component. The phase is not important at this stage, because we want to know how big the waves are, not their angle.
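To make this step concrete, here is a minimal sketch in Python rather than R, using only the standard library. The naive `dft` helper, the `magnitudes` function, and the 64-sample test signal are all assumptions made for illustration. In practice you would call an optimized FFT routine instead of this O(n^2) loop.

```python
import cmath
import math


def dft(signal):
    """Naive O(n^2) discrete Fourier transform (illustrative, not optimized)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def magnitudes(spectrum):
    """Size of each frequency component; the phase (angle) is discarded."""
    return [abs(c) for c in spectrum]


# Hypothetical test signal: 64 samples holding exactly 4 cycles of a sine wave.
n = 64
signal = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
mags = magnitudes(dft(signal))

# Only the lower half of the bins is inspected; for real input the upper
# half mirrors it.
peak_bin = max(range(n // 2), key=lambda k: mags[k])
print(peak_bin, round(mags[peak_bin], 3))
```

The index of the strongest bin tells you how many cycles fit into the window, which is the periodicity the magnitude plot is meant to reveal.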

As we’d expect, the data contains a few strong components at very low frequencies and lots of noise at the higher frequencies.

Now we do something that is possible numerically but not really with physical filters (some of the artifacts of this will be visible later). We filter out the top half of the spectrum by setting it to zero rather than deleting it, because otherwise the inverse FFT breaks, and we keep only the 10 lowest frequencies, which should give us a good match to the base waveform in the data. Enter the perfect low-pass filter, which exists in the frequency domain only. This is a little trick I’ve seen in industry, where you manipulate the frequency domain directly rather than trying to work the filter design back into the time domain.
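The filtering step can be sketched as follows, again in standard-library Python rather than R. Note one deliberate difference from the description above: instead of zeroing the entire top half of the spectrum, this sketch also keeps the mirrored (conjugate) bins of the retained frequencies, so the inverse transform comes back as a real signal at full amplitude. The outlier rule (a residual more than 3 standard deviations from the baseline), the function names, and the test signal are assumptions added for illustration and are not taken from the original analysis.

```python
import cmath
import math


def dft(x):
    """Naive forward transform (illustrative stand-in for an FFT routine)."""
    n = len(x)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(x))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse transform."""
    n = len(spectrum)
    return [sum(c * cmath.exp(2j * cmath.pi * k * t / n) for k, c in enumerate(spectrum)) / n
            for t in range(n)]


def low_pass(signal, keep):
    """Brick-wall low-pass: zero (do not delete) every bin except the `keep`
    lowest frequencies and their mirrored counterparts."""
    n = len(signal)
    spectrum = dft(signal)
    filtered = [c if (k < keep or k > n - keep) else 0 for k, c in enumerate(spectrum)]
    return [v.real for v in idft(filtered)]


def find_outliers(signal, keep=10, threshold=3.0):
    """Flag points whose residual from the smooth baseline is unusually large.
    The 3-sigma threshold is an assumed rule of thumb."""
    baseline = low_pass(signal, keep)
    residuals = [s - b for s, b in zip(signal, baseline)]
    mean = sum(residuals) / len(residuals)
    std = math.sqrt(sum((r - mean) ** 2 for r in residuals) / len(residuals))
    return [i for i, r in enumerate(residuals) if abs(r - mean) > threshold * std]


# Hypothetical data: a slow wave, a small fast ripple standing in for noise,
# and one injected spike at index 40.
n = 128
signal = [10 * math.sin(2 * math.pi * 3 * t / n) + 0.3 * math.sin(2 * math.pi * 40 * t / n)
          for t in range(n)]
signal[40] += 8.0

outliers = find_outliers(signal)
print(outliers)
```

Because the spike spreads its energy across every frequency bin, a little of it leaks into the baseline as ringing around index 40. That is the kind of artifact a perfect frequency-domain filter produces, so the threshold has to sit above the ringing.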

As further work, you can try combining the linear and the FFT versions for trending data. There is also nothing stopping you from using Laplace transforms and z-transforms to filter noisy streaming data and find outliers. In fact, if I were deploying this in a streaming environment, that is most likely what I’d use.
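As one possible reading of the streaming idea, here is a minimal sketch of a filter designed in the z-domain: a one-pole low-pass (an exponential moving average) that tracks the signal level and flags samples that stray too far from it. The class name, the parameters `alpha`, `threshold`, and `warmup`, and the test stream are all hypothetical; this is not something the original analysis implemented.

```python
import math


class StreamingOutlierDetector:
    """One-pole IIR low-pass with a running estimate of the residual scale.

    Difference equation: y[n] = alpha * x[n] + (1 - alpha) * y[n-1]
    Transfer function:   H(z) = alpha / (1 - (1 - alpha) * z**-1)
    """

    def __init__(self, alpha=0.2, threshold=4.0, warmup=10):
        self.alpha = alpha          # smoothing factor, 0 < alpha <= 1
        self.threshold = threshold  # allowed deviation, in residual std units
        self.warmup = warmup        # samples to observe before flagging
        self.level = None           # filtered signal y[n-1]
        self.var = 0.0              # exponentially weighted squared residual
        self.count = 0

    def update(self, x):
        """Consume one sample; return True if it looks like an outlier."""
        if self.level is None:
            self.level = x
            self.count = 1
            return False
        residual = x - self.level
        std = math.sqrt(self.var)
        is_outlier = (self.count >= self.warmup and std > 0
                      and abs(residual) > self.threshold * std)
        if not is_outlier:
            # Only normal samples update the filter state, so a spike
            # does not drag the baseline along with it.
            self.level += self.alpha * residual
            self.var = (1 - self.alpha) * self.var + self.alpha * residual ** 2
        self.count += 1
        return is_outlier


# Hypothetical stream: a level of 10 with a +/-0.5 ripple and one spike.
stream = [10 + (0.5 if t % 2 == 0 else -0.5) for t in range(100)]
stream[50] = 20.0

detector = StreamingOutlierDetector()
flagged = [t for t, x in enumerate(stream) if detector.update(x)]
print(flagged)
```

Unlike the FFT approach, this needs only the previous filter state rather than the whole series, which is what makes it suitable for streaming data.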