Where there are unlikely to be improvements: spike sorting and spiking models.

Where there are likely to be dramatic improvements: non-stationarity of recorded waveforms, limitations of linear mappings between neural activity and movement kinematics, and the low signal-to-noise ratio of the neural data.

A frequent task in the lab is to sort spikes (extracellular neural action potentials) from background noise. In the lab we are working on doing this wirelessly; to minimize power consumption, spike sorting is done before the radio. In this way only times of spikes need be transmitted, saving bandwidth and power. (This necessitates a bidirectional radio protocol, but this is a worthy sacrifice).

In most sorting programs (e.g. Plexon), the raw signal is first thresholded, then waveform snippets (typically 32 samples long) are compared to a template to accept or reject them, or to sort them into different units. The comparison metric is usually the mean-squared error (MSE), i.e. the squared L2 norm of the difference between snippet and template, scaled by the snippet length. This makes sense if the spike shapes are stereotyped (they may very well not be) and the noise is white / uncorrelated (another debatable assumption).
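The threshold-then-compare scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 8-sample pre-trigger window, the toy template, and the acceptance cutoff of 0.5 are all hypothetical, and this is not how Plexon implements it.

```python
def extract_snippets(signal, threshold, length=32, pre=8):
    """Return (start_index, snippet) pairs for upward threshold crossings."""
    snippets = []
    i = max(pre, 1)
    while i < len(signal) - (length - pre):
        if signal[i - 1] < threshold <= signal[i]:
            start = i - pre
            snippets.append((start, signal[start:start + length]))
            i = start + length  # skip past this snippet (simple dead time)
        else:
            i += 1
    return snippets


def mse(snippet, template):
    """Mean-squared error between a snippet and a template of equal length."""
    return sum((s - t) ** 2 for s, t in zip(snippet, template)) / len(template)


# Hypothetical 32-sample template: a small bump that crosses threshold at sample 8.
template = [0.0] * 32
template[8:11] = [3.0, 5.0, 3.0]

# Toy recording: silence, one copy of the template, more silence.
signal = [0.0] * 50 + template + [0.0] * 50

snips = extract_snippets(signal, threshold=2.0)
errors = [mse(s, template) for _, s in snips]
# Accept snippets whose MSE to the template is below an assumed cutoff.
accepted = [start for (start, _), e in zip(snips, errors) if e < 0.5]
```

Note that the snippet only lines up with the template because the pre-trigger window matches the sample at which the template itself crosses threshold; any misalignment shows up directly as MSE.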

On the headstage we are working with for wireless neural recording, jumps and memory moves are expensive operations, hence we've elected to do no waveform extraction and instead match continuously. By using the built-in MPEG compression opcodes, we can compute the L1 norm at a rate of 4 samples / clock -- very efficient. However, this was motivated more by hardware considerations than by actual spike sorting practice. The literature suggests that for isolating a fixed-pattern signal embedded in noise, the best solution is instead a matched filter.
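For reference, continuous L1 matching (no snippet extraction, a distance computed at every sample offset) looks roughly like the Python sketch below. It only illustrates the arithmetic; the names, the toy template, and the distance cutoff are assumptions, and it says nothing about how the opcodes are actually scheduled on the headstage.

```python
def sliding_l1(signal, template):
    """L1 distance between the template and every full window of the signal."""
    n = len(template)
    return [
        sum(abs(signal[i + k] - template[k]) for k in range(n))
        for i in range(len(signal) - n + 1)
    ]


def detect(distances, max_distance):
    """Indices of local minima of the distance trace at or below max_distance."""
    hits = []
    for i, d in enumerate(distances):
        left = distances[i - 1] if i > 0 else float("inf")
        right = distances[i + 1] if i + 1 < len(distances) else float("inf")
        if d <= max_distance and d <= left and d < right:
            hits.append(i)
    return hits


# Hypothetical template, one full-size spike, and one half-amplitude spike.
template = [0.0, 2.0, 5.0, -3.0, -1.0, 0.0]
half_size = [0.5 * v for v in template]
signal = [0.0] * 10 + template + [0.0] * 10 + half_size + [0.0] * 10

dist = sliding_l1(signal, template)
hits = detect(dist, max_distance=1.0)
```

In this toy case the full-size spike gives a distance of zero at its onset, while the half-amplitude copy is rejected: the L1 (and L2) distance penalizes amplitude changes as well as shape changes.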

Hence, a careful study of spike sorting was attempted in MATLAB, given the following assumptions: fixed spike shape (this was extracted from real data), and uncorrelated band-limited noise. The latter was just white noise passed through a bandpass filter, e.g.

cheby1(3, 2, [500/15e3 7.5/15])

Where the passband edges are 500 Hz and 7.5 kHz, normalized to the 15 kHz Nyquist frequency of a 30 kHz sampling rate. (Actual rate is 31.25 kHz.) Since the spike times are known, we can rigorously compare the Receiver Operating Characteristic (ROC) and the area under the curve (AUC) for different sorting algorithms. Four were tried: L1 (as mentioned above, motivated by the MPEG opcodes), L2 (Plexon), FIR matched filter, and IIR matched filter.

The latter was very much an experiment -- IIR filters are efficiently implemented on the Blackfin processor, and they generally require fewer taps than an equivalent FIR implementation. To find an IIR equivalent to a given FIR matched filter (whose impulse response closely resembles the actual waveshape, just time-reversed), the filter parameters were simply optimized so that the two impulse responses match. To facilitate the search, the denominator was specified in terms of complex-conjugate pole locations (thereby constraining the form of the filter), while the numerator coefficients were individually optimized. Note that this does not optimize sorting quality directly -- rather, it makes the IIR filter impulse response as close as possible to that of the FIR matched filter, while remaining computationally light.

And yet: the IIR filter outperforms the FIR matched filter, even though the IIR filter has 1/3 the coefficients (10 vs 32)! Below is the AUC quality metric for the four methods.

And here are representative ROC curves at varying spike SNRs.

The remarkable thing is that even at very low SNR, the matched IIR filter can reliably sort cells from noise. (Note that the acceptable false positive here should be weighted more highly; in the present analysis true positive and false positive are weighted equally, which is decidedly non-Bayesian given most of the time there is no spike.) The matched IIR filter is far superior to the normal MSE to template / L2 norm method -- seems we've been doing it wrong all along?

As for reliably finding spikes / templates / filters when the SNR < 0 dB: the tests above -- which assume an equal number of spike samples and non-spike samples -- are highly biased; spikes are not normally sortable when the SNR < 0 dB.

Upon looking at the code again, I realized three important things:

1. The false positive rate needs to be integrated over all time where there is no spike, just as the true positive rate is integrated over all time where there is a spike.

2. All methods need to be tested with 'distractors', or other spikes with a different shape.

3. The FIR matched filter was backwards!

Including #1 above dramatically increased the false positive rate, as expected; this is how the filters will be used in the real world. #2 did not dramatically impact any of the discriminators, which is good. #3 closed the gap between the IIR and FIR filters, and indeed the FIR matched filter performance now slightly exceeds that of the IIR matched filter.
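For concreteness, here is a minimal Python sketch of the ROC / AUC bookkeeping when every time sample is scored, so that all non-spike samples count toward the false positive rate. The actual analysis was done in MATLAB; the function names and the toy scores and labels below are made up for illustration.

```python
def roc_points(scores, labels):
    """Sweep a threshold from high to low; return (FPR, TPR) points."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        current = scores[order[i]]
        # Samples with tied scores cross the threshold together.
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical detector output: one score per time sample,
# label 1 where a spike is truly present and 0 elsewhere.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
area = auc(roc_points(scores, labels))
```

For this toy input the area comes out to 0.75: 12 of the 16 spike / non-spike pairs are ranked in the correct order. With real data the labels would be overwhelmingly 0, which is what drives the false positive count up.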

Below, AUC metric for 4 methods.

And corresponding ROC curves for 6 different SNRs (note the SNRs sampled are slightly different, due to the higher false positive rate).

One thing to note: as implemented, the IIR filter requires careful matching of poles and zeros, and may not work with 1.15 fixed-point math on the Blackfin. The method really deserves to be tested in vivo, which I shall do shortly.

More updates:

See www.aicit.org/jcit/ppl/JCIT0509_05.pdf -- they add an 'adjustment' function to the matched filter due to variance in the amplitude of spikes, which adds a little performance at low SNRs.

$$F(t) = \left[ \frac{x(t)}{k\sigma} \, e^{\,1 - \frac{x(t)}{k\sigma}} \right]^{n}$$

Sigma is the standard deviation of x(t); n and k determine 'zoom intensity and zoom center'. The paper is not particularly well written -- there are some typos, and their idea seems unjustified. Still the references are interesting:

They use a real matched filter to detect extracellular action potentials.
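To get a feel for the 'adjustment' function from that paper, here is a small Python sketch. It assumes the function has the form F = [u * exp(1 - u)]^n with u = x(t) / (k * sigma), which is my reading of the formula and may not match the paper exactly; the function name and the test values are hypothetical. Under that reading the output is 1 when x(t) equals k * sigma (the 'zoom center') and falls off on either side, more sharply for larger n (the 'zoom intensity').

```python
import math
import statistics


def adjust(x, k, n):
    """Apply the assumed adjustment F = (u * exp(1 - u)) ** n, u = x / (k * sigma).

    sigma is the population standard deviation of x. Intended for
    non-negative inputs (e.g. rectified matched-filter output) and integer n.
    """
    sigma = statistics.pstdev(x)
    out = []
    for v in x:
        u = v / (k * sigma)
        out.append((u * math.exp(1.0 - u)) ** n)
    return out


# Toy input whose standard deviation is exactly 1, so k * sigma = 2.
x = [0.0, 2.0, 0.0, 2.0]
adjusted = adjust(x, k=2.0, n=2)


def zoom(u, n):
    """The same curve as a function of the normalized amplitude u."""
    return (u * math.exp(1.0 - u)) ** n
```

Samples at the zoom center pass through at unit gain, and zeros stay zero. Negative inputs are not handled sensibly by this form, which may be part of why the idea feels unjustified.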

Update: It is not too difficult to convert FIR filters to IIR filters using simple numerical optimization. Within my client program, this is done using simulated annealing; I have also tested it using fminsearch in MATLAB. To investigate the IIR-filter fitting problem more fully, I sliced the 10-dimensional optimization space along pairs of dimensions about the optimum point as found using fminsearch.

The parameters are as follows:

Two complex-conjugate pole pairs, stored as four values (a real and an imaginary part for each pair). These are expanded to denominator coefficients before evaluating the IIR filter.

Five numerator coefficients.

One delay coefficient (to match the left/right shift).
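The 10-parameter layout above maps onto an objective function roughly as in the following Python sketch. The parameter ordering, the function names, the non-negative integer delay, and the toy target are assumptions for illustration; the real client program and the MATLAB tests may differ in the details.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def denominator(re1, im1, re2, im2):
    """Expand two complex-conjugate pole pairs into 5 denominator coefficients."""
    s1 = [1.0, -2.0 * re1, re1 * re1 + im1 * im1]
    s2 = [1.0, -2.0 * re2, re2 * re2 + im2 * im2]
    return poly_mul(s1, s2)


def impulse_response(b, a, length):
    """Impulse response of the difference equation with a[0] == 1."""
    y = []
    for n in range(length):
        acc = b[n] if n < len(b) else 0.0
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


def model_response(params, length):
    """params = [re1, im1, re2, im2, b0..b4, delay] (assumed ordering)."""
    re1, im1, re2, im2 = params[0:4]
    b = list(params[4:9])
    delay = min(max(int(round(params[9])), 0), length)  # discrete shift
    h = impulse_response(b, denominator(re1, im1, re2, im2), length)
    return [0.0] * delay + h[:length - delay]


def objective(params, target):
    """Sum of squared differences between the IIR and target impulse responses."""
    model = model_response(params, len(target))
    return sum((m - t) ** 2 for m, t in zip(model, target))


# Stand-in target generated from known parameters, so the optimum is known;
# in practice the target would be the 32 taps of the FIR matched filter.
true_params = [0.5, 0.3, -0.2, 0.4, 1.0, 0.5, -0.3, 0.2, 0.1, 2]
target = model_response(true_params, 32)
perturbed = list(true_params)
perturbed[0] += 0.05
```

An optimizer such as fminsearch or simulated annealing would then minimize objective(params, target) over the 10 values. Because the delay is rounded to an integer, the objective is piecewise constant along that axis.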

The figure below plots the objective over ±1 around the optimum for each axis pair. Note that the last parameter is discrete, hence the steps in the objective function. Also note that the problem is perfectly quadratic in the numerator coefficients, as expected, which is why LMS works so well.
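The quadratic dependence on the numerator can be made concrete: with the denominator held fixed, the IIR impulse response is a linear combination of delayed copies of the all-pole response, so the best numerator is an ordinary least-squares solution. The Python sketch below solves the normal equations directly rather than iterating LMS; the names and the toy second-order filter are assumptions.

```python
def all_pole_response(a, length):
    """Impulse response of 1 / A(z) with a[0] == 1."""
    g = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * g[n - k]
        g.append(acc)
    return g


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_numerator(a, target, n_taps):
    """Least-squares numerator for a fixed denominator a."""
    length = len(target)
    g = all_pole_response(a, length)
    # Column k is the all-pole response delayed by k samples.
    cols = [[0.0] * k + g[:length - k] for k in range(n_taps)]
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(u * t for u, t in zip(ci, target)) for ci in cols]
    return solve(gram, rhs)


# Toy check: build a target from a known numerator, then recover it.
a = [1.0, -0.9, 0.5]
true_b = [1.0, -0.5, 0.25]
g = all_pole_response(a, 40)
target = [sum(true_b[k] * g[n - k] for k in range(3) if n - k >= 0)
          for n in range(40)]
fitted_b = fit_numerator(a, target, 3)
```

This suggests one way to shrink the search: optimize only the pole locations and the delay numerically, and solve for the numerator in closed form at each step.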

Note that for the denominator pole locations, the volume of the optimum is small, and there are interesting features beyond this. Some spaces have multiple optima.

The next figure plots ±0.1 around the optimum for each axis vs. every other one. It shows that, at least on a small scale, the problem becomes very nearly quadratic in all axes, hence amenable to line or conjugate gradient search.

Moving away from planes that pass through a found optimum, what does the space look like? E.g., from a naive start, how hard is it to find at least one workable solution? To test this, I perturbed the found optimum with white noise (std 0.2) in the parameters, and plotted the objective function as before, albeit at higher resolution (600 x 600 points for each slice).

These figures show that there can be several optima in the denominator, but again it appears that a very rough exploration followed by gradient descent should arrive at an optimum.

Glass-coated tungsten microelectrodes have high capacitance; they compensate for this by spraying colloidal silver over the outside sheath of the glass, insulating that with varnish, and driving the shield in a positive-feedback way (stabilized in some way?). This negates the capacitance. 'low impedance capacitance compensated'.

Capacitance compensation really matters!!

Were able to record from single units for 40-100um range (average: 50um) with SNRs 2:1 to 7:1.

Some units had SNRs that could reach 15:1 (!!!), these could be recorded for 600 um of descent.

More than 3 units could usually be recognized at each recording point by visual inspection of the oscilloscope, and in some cases up to 6 units could be distinguished.

Is there some clever RF way of neutralizing the capacitance of everything but the electrode tip? Hmm. Might as well try to minimize it.

Bandpass 300 Hz - 10 kHz.

When the signal crossed the threshold level, it was retained and assumed to be a spike if the duration of the first component was between 70 and 1000 us.

This 70 us lower limit was determined in a preliminary study as a fairly good rise time threshold for separation of fiber spikes from somatic or dendritic spikes.
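A rough Python sketch of this threshold-plus-duration criterion follows. It assumes that the 'duration of the first component' can be approximated by the time the signal stays at or above threshold after the crossing, which is my guess at the definition; the 50 kHz sampling rate and the function names are also hypothetical.

```python
def samples_above(signal, start, threshold):
    """Number of consecutive samples at or above threshold from index start."""
    n = 0
    while start + n < len(signal) and signal[start + n] >= threshold:
        n += 1
    return n


def detect_spikes(signal, threshold, fs_hz, min_us=70.0, max_us=1000.0):
    """Accept threshold crossings whose first component lasts 70-1000 us."""
    accepted = []
    i = 1
    while i < len(signal):
        if signal[i - 1] < threshold <= signal[i]:
            n = samples_above(signal, i, threshold)
            duration_us = n * 1e6 / fs_hz
            if min_us <= duration_us <= max_us:
                accepted.append(i)
            i += max(1, n)  # resume after this component
        else:
            i += 1
    return accepted


# Toy trace at an assumed 50 kHz (20 us per sample): a 40 us event
# (too brief, fiber-like), a 200 us event, and a 1200 us event (too long).
fs_hz = 50000.0
signal = ([0.0] * 5 + [1.0] * 2 + [0.0] * 5 + [1.0] * 10
          + [0.0] * 5 + [1.0] * 60 + [0.0] * 5)
spikes = detect_spikes(signal, threshold=0.5, fs_hz=fs_hz)
```

Only the middle event survives. The check is a counter and two comparisons per crossing, so the arithmetic itself is cheap.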

I really need to do some single electrode recordings. Platt?

Would it be possible to implement this algorithm in realtime on the DSP?

Describe clustering based on PCA.

Programming this computer (PDP-12) must have been crazy!

They analyzed 20k spikes. Mango gives billions.

First principal component (F1) represented 60-65% of total information and was based mostly on amplitude.

Second principal component represented 15-20% of total information and mainly reflected time parameters.
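To make the 'percent of total information' figures concrete, here is a toy Python sketch that finds the first principal component of a set of waveforms by power iteration and reports the fraction of total variance it explains. The four-sample 'waveforms' are synthetic and chosen so that the answer is known; nothing here reproduces the paper's data or its PDP-12 code.

```python
def covariance(rows):
    """Sample covariance matrix of the columns of rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_component(cov, iterations=100):
    """Largest eigenvalue and eigenvector of cov by power iteration."""
    d = len(cov)
    v = [float(i + 1) for i in range(d)]  # arbitrary non-degenerate start
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j]
                     for i in range(d) for j in range(d))
    return eigenvalue, v


# Synthetic waveforms: a mean shape plus a large 'amplitude' deviation and
# a smaller, uncorrelated 'timing' deviation along an orthogonal direction.
mean_waveform = [5.0, 5.0, 0.0, 0.0]
amplitude_shape = [1.0, 1.0, 0.0, 0.0]
timing_shape = [0.0, 0.0, 1.0, -1.0]
amplitude_dev = [2.0, -2.0, 2.0, -2.0]
timing_dev = [1.0, 1.0, -1.0, -1.0]
waveforms = [
    [m + a * s + t * u
     for m, s, u in zip(mean_waveform, amplitude_shape, timing_shape)]
    for a, t in zip(amplitude_dev, timing_dev)
]

cov = covariance(waveforms)
eigenvalue, pc1 = leading_component(cov)
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
```

Here the first component lines up with the amplitude direction and explains 80% of the variance, with the remaining 20% along the timing direction.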

These are all useful features, though template matching seems the standard now.

Gerstein and Clark 1964 -- stored spikes on tape, then sampled the tape until a threshold was exceeded. 32 samples of the waveform around threshold crossing were stored for analysis on the computer; up to 7000 points could be saved.

also looked at cross-correlation of a spike with a template -- back in 1968 on a LINC-8!
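Cross-correlating the signal with a template is the same operation as running it through an FIR filter whose taps are the time-reversed template, i.e. a matched filter. The Python sketch below shows why the orientation matters; the asymmetric toy template and the function name are made up.

```python
def fir_filter(signal, taps):
    """Causal FIR filter: y[n] = sum over k of taps[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


# Hypothetical asymmetric spike shape embedded in silence.
template = [1.0, 4.0, -3.0, -1.0]
signal = [0.0] * 5 + template + [0.0] * 5

# Matched filter: taps are the template reversed in time.
matched_out = fir_filter(signal, template[::-1])

# Using the template un-reversed convolves the spike with itself instead.
unreversed_out = fir_filter(signal, template)

energy = sum(v * v for v in template)
peak_index = matched_out.index(max(matched_out))
```

With the reversed taps the output peaks at the template energy on the last sample of the spike. With un-reversed taps the maximum is smaller and the output shape is quite different, so a filter loaded backwards will silently underperform.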

Reviews a good number of other very clever spike sorting techniques for using the limited hardware available.

Talks about template realignment and resampling (Mambrito and De Luca 1983).

When the units on a single channel are similarly tuned, you don't lose much information by grouping all spikes as coming from one source. The opposite is true when you have very differently tuned neurons on the same channel -- the information becomes more ambiguous.

PMID-8768391[0] Temporal precision of spike trains in extrastriate cortex of the behaving macaque monkey

This temporal modulation is stimulus dependent, being present for highly dynamic random motion but absent when the stimulus translates rigidly -- that is, the response is markedly reproducible and precise to a few milliseconds.

PMID-16339894[1] Neurons of the cerebral cortex exhibit precise interspike timing in correspondence to behavior.

dist=pdist(psi); %Euclidean distances between all pairs of points (waveforms) in psi.
%dist is a row vector of length m(m-1)/2, where m is the number of waveforms
%in psi. It could be converted into a distance matrix via the squareform
%function, but that is computationally inefficient.
link=linkage(dist); %Nearest-neighbor (single) linkage on the distances; returns
%a matrix of size (m-1)x3. Cols 1 and 2 contain the indices of the objects
%that were linked in pairs to form a new cluster. The new cluster formed at
%row i is assigned the index value m+i. There are m-1 higher clusters that
%correspond to the interior nodes of the hierarchical cluster tree. Col 3
%contains the linkage distance between the objects paired at each row i.
[H,T]=dendrogram(link,0); %Creates a dendrogram; 0 instructs the function to plot all nodes in
%the tree. H is a vector of line handles, and T a vector of the cluster
%number assignment for each waveform in psi.

It looks really nice in theory, and computes very quickly on 2000 x 32 waveform data (provided you don't want to plot) -- however, I'm not sure if it works properly on synthetic data. The commands above are the ones that I tried.
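One way to sanity-check nearest-neighbor linkage on synthetic data, independent of the MATLAB toolbox, is the small Python sketch below. It relies on the fact that cutting a single-linkage tree at a given distance gives the connected components of the graph that joins every pair of points closer than that distance. The two unit shapes, the noise level, the cutoff, and the function name are all assumptions chosen for the test.

```python
import math
import random


def single_linkage_labels(points, cutoff):
    """Cluster labels from cutting a single-linkage tree at distance cutoff."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]


# Two hypothetical 32-sample unit shapes plus Gaussian noise (std 0.1).
rng = random.Random(0)
unit_a = [5.0 * math.sin(2.0 * math.pi * k / 32.0) for k in range(32)]
unit_b = [2.5 * math.sin(4.0 * math.pi * k / 32.0) for k in range(32)]
waveforms = (
    [[v + rng.gauss(0.0, 0.1) for v in unit_a] for _ in range(20)]
    + [[v + rng.gauss(0.0, 0.1) for v in unit_b] for _ in range(20)]
)

labels = single_linkage_labels(waveforms, cutoff=2.0)
```

With well-separated units this recovers the two groups. Single linkage is known to chain clusters together through intermediate points, so overlapping units or noisy outliers that bridge the gap can merge everything into one cluster, which may be what goes wrong on harder synthetic data.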

quote: " when a cortical neuron is repeatedly injected with the same fluctuating current stimulus, the timing of the spikes is highly precise from trial to trial and the spike pattern appears to be unique"

though: I'd imagine that somebody has characterized the actual transfer function of this.

but: we conclude that the prestimulus history of a neuron may influence the precise timing of the spikes in response to a stimulus over a wide range of time scales.

In vivo, it is hard to find patterns because neurons may jump between patterns, and there is a large amount of neuronal noise in there too. Or there may be neural "attractors".

Uses a Bayesian model to fit nonstationary data. Spike data is broken up into overlapping regions with sufficient data to fit mixtures of Gaussians to PCA clusters. Clusters are penalized for moving, using a divergence-based measure.