I have time-series data of brain cell spiking. It is basically a baseline of random noise with large spikes interspersed. I want to algorithmically separate the spike portions of the series from the baseline noise. How can I do this (preferably with a solution in R)? K-means definitely does not work.

Several solutions are offered to the similar question at stats.stackexchange.com/questions/1142/…. (That one asks for an online algorithm, which imposes special considerations on how the most recent data are analyzed and raises an ongoing multiple-comparisons type of problem.)
– whuber, Jul 2 '13 at 16:37

2 Answers

This does not sound like clustering to me. In particular, clustering usually works on many multi-dimensional instances and commonly has no awareness of time, whereas you have a single one-dimensional time series, AFAICT.

It sounds more as if you just want to remove the parts of the series that lie outside some k-sigma range around the trend.
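
A minimal sketch of that idea, in plain Python so it is easy to port to R. Everything here is an assumption for illustration rather than something from the question: the trend is estimated with a centered rolling median, sigma is estimated robustly from the median absolute deviation (MAD), and the names `rolling_median`, `flag_spikes`, `window` and `k` are made up.

```python
import random
from statistics import median

def rolling_median(x, window):
    """Centered rolling median; the window is truncated at the edges."""
    half = window // 2
    return [median(x[max(0, i - half): i + half + 1]) for i in range(len(x))]

def flag_spikes(x, window=51, k=5.0):
    """Return indices whose detrended value lies outside k robust sigmas.

    window and k are illustrative defaults, not recommended values.
    """
    trend = rolling_median(x, window)
    resid = [xi - ti for xi, ti in zip(x, trend)]
    center = median(resid)
    mad = median([abs(r - center) for r in resid])
    sigma = 1.4826 * mad  # scales the MAD to the SD under Gaussian noise
    return [i for i, r in enumerate(resid) if abs(r - center) > k * sigma]

# Synthetic stand-in for the recording: Gaussian baseline plus three spikes.
random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(1000)]
for pos in (200, 500, 800):
    signal[pos] += 15.0

spike_idx = flag_spikes(signal)
print(spike_idx)
```

The median and MAD are used instead of the mean and standard deviation because the spikes themselves would otherwise inflate the estimate of the baseline spread. The window has to be much longer than a spike for the rolling median to ignore it.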

In Spike Sorting (Scholarpedia), clustering is the step you do after you have extracted the interesting spikes. You seem to be stuck at the first two steps: filtering and spike extraction.
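
The extraction step could look roughly like the sketch below. It assumes the trace has already been filtered so that the baseline sits near zero, and that a threshold has been chosen by some other means. `extract_events`, `threshold` and `half_width` are hypothetical names, and this is not the procedure from the Scholarpedia article, only one simple way to turn threshold crossings into spike events.

```python
def extract_events(x, threshold, half_width=2):
    """Group consecutive supra-threshold samples into events.

    For each event, record its extent, the index of its largest
    absolute value, and a short waveform window around that peak.
    """
    events = []
    i, n = 0, len(x)
    while i < n:
        if abs(x[i]) > threshold:
            j = i
            # Extend the event while the next sample is also above threshold.
            while j + 1 < n and abs(x[j + 1]) > threshold:
                j += 1
            peak = max(range(i, j + 1), key=lambda t: abs(x[t]))
            lo = max(0, peak - half_width)
            hi = min(n, peak + half_width + 1)
            events.append(
                {"start": i, "end": j, "peak": peak, "waveform": x[lo:hi]}
            )
            i = j + 1
        else:
            i += 1
    return events

# Toy trace with one three-sample positive spike and one negative spike.
trace = [0.1, -0.2, 0.0, 4.0, 9.0, 5.0, 0.2, -0.1, 0.3, -7.0, 0.1, 0.0]
events = extract_events(trace, threshold=3.0)
for ev in events:
    print(ev)
```

The extracted waveforms are what a later clustering step would operate on, since each one is a multi-dimensional instance.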

Normally the term clustering is restricted to the analysis of multiple characteristics. I have coined the term "single dimension cluster analysis" to describe the OP's problem. The tool used to isolate the "unusual" is called Intervention Detection. It requires identifying the ARIMA structure that characterizes the "standard data" while simultaneously applying procedures that identify Pulses, Level Shifts, Seasonal Pulses and Local Time Trends, which characterize the so-called "non-standard data". To learn more, see http://www.unc.edu/~jbhill/tsay.pdf and the free software at http://www.autobox.com/abox.exe. I am one of the authors of AUTOBOX.
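
To give a flavor of the pulse part of this idea, here is a heavily simplified sketch. It is not the procedure from the linked paper and not what AUTOBOX does: it assumes a fixed AR(1) model instead of identifying an ARIMA structure, handles only additive pulses (no level shifts, seasonal pulses or time trends), and uses a MAD-based threshold. `fit_ar1`, `detect_pulses`, `k` and `max_iter` are hypothetical names.

```python
import random
from statistics import median

def fit_ar1(x):
    """Least-squares estimate of (c, phi) in x[t] = c + phi * x[t-1] + e[t]."""
    prev, curr = x[:-1], x[1:]
    mp, mc = sum(prev) / len(prev), sum(curr) / len(curr)
    sxx = sum((p - mp) ** 2 for p in prev)
    sxy = sum((p - mp) * (c - mc) for p, c in zip(prev, curr))
    phi = sxy / sxx
    return mc - phi * mp, phi

def detect_pulses(x, k=5.0, max_iter=10):
    """Alternate between fitting the model and removing the largest pulse."""
    x = list(x)  # work on a copy
    pulses = []
    for _ in range(max_iter):
        c, phi = fit_ar1(x)
        resid = [x[t] - c - phi * x[t - 1] for t in range(1, len(x))]
        med = median(resid)
        sigma = 1.4826 * median([abs(r - med) for r in resid])
        worst = max(range(len(resid)), key=lambda i: abs(resid[i] - med))
        if abs(resid[worst] - med) <= k * sigma:
            break  # nothing left that looks like a pulse
        t = worst + 1  # resid[i] belongs to time index i + 1
        pulses.append(t)
        x[t] = c + phi * x[t - 1]  # replace the pulse by its one-step forecast
    return sorted(pulses)

# Synthetic AR(1) series (phi = 0.6) with three additive pulses.
random.seed(1)
series = [0.0]
for _ in range(499):
    series.append(0.6 * series[-1] + random.gauss(0.0, 1.0))
for pos in (100, 250, 400):
    series[pos] += 12.0

pulses = detect_pulses(series)
print(pulses)
```

Refitting after each removal matters because an uncorrected pulse distorts both the estimated autoregressive coefficient and the residual at the following time step.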