Hi Annemarie, John, Martin
(My apologies for the length of this)
Great discussion!
The Steinschneider paper is extremely useful because they presented their
data in systematic form, so one has a fairly good sense of what the
neural ensembles
at the sites that they recorded from are doing. I think this highlights
the utility of
averaged local field potentials and current source density analysis.
In interpreting their data, however, one needs to remember that their averaged
potentials are reflections of the synchronized component of the
population response.
What they see is very reminiscent of what one sees in the auditory nerve
if one
takes responses from all CF's and sums them together to form a
population PST.
The fundamentals of click trains below about 150 Hz can be seen, but
above this F0 cochlear delays smear out the PST and one sees little
temporal structure.
(Of course if one looks at the interspike intervals across the
population, one will
see time structure up to 5kHz, so that this temporal limit of the
synchronized
component of the ensemble response is not necessarily the temporal limit of
all the information available in spike trains.)
If one high-pass filters the click trains or looks only in high CF
regions, the F0 limit
where one can see F0-related time structure in the ensemble PST
increases to
300 or 400 Hz. This is due to the smaller range of cochlear delays present
for fibers with CF's above 2kHz. We presented some of these population PST's
in our analysis of the Flanagan-Guttman "click rate" (buzz) pitch; see
Cariani, P. A. and Delgutte, B. (1996). Neural correlates of the pitch
of complex tones. I. Pitch and pitch salience. II. Pitch shift, pitch
ambiguity, phase-invariance, pitch circularity, and the dominance region
for pitch. J. Neurophysiol. 76(3), 1698-1734.
[I believe that the buzziness of high-pass filtered click trains has to do
with this high degree of interneural synchrony on a population-wide scale,
and the softer tonal quality of low-pass (<1kHz) filtered trains has to do
with the lack of interneural synchrony in the lower CF regions that are
primarily activated.]
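[A toy calculation makes the delay-smearing point concrete. The numbers
below are illustrative assumptions of mine, not fitted values: roughly a
5 ms spread of cochlear delays across the full CF range, and roughly a
0.5 ms spread for CFs above 2 kHz. The synchronized F0 component of a
summed PST is then just the vector average of the per-channel phase
factors:

```python
import numpy as np

def sync_of_summed_pst(f0, delays):
    """Vector strength of the F0 component in a PST summed across
    channels whose responses are identical except for a delay d_i:
    |mean_i exp(-2*pi*j*f0*d_i)|."""
    return abs(np.mean(np.exp(-2j * np.pi * f0 * np.asarray(delays))))

# Hypothetical cochlear delays (seconds): the full CF range spans
# several ms; channels with CF > 2 kHz span well under 1 ms.
all_cf_delays  = np.linspace(0.0, 5e-3, 50)    # full CF range
high_cf_delays = np.linspace(0.0, 0.5e-3, 50)  # CF > 2 kHz only

for f0 in (100.0, 400.0):
    print(f0, sync_of_summed_pst(f0, all_cf_delays),
              sync_of_summed_pst(f0, high_cf_delays))
```

With these assumed delays, the summed response retains F0 synchrony at
100 Hz over the whole CF range, loses it at 400 Hz, and recovers it at
400 Hz when only the small-delay (high-CF) channels are summed.]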
Now this general situation with population PSTs in the auditory nerve
is very similar to what Steinschneider et al. saw in their MUA and
CSD data. What it means is that
1) one should not take the upper limit of response periodicities seen in
averaged potentials as the upper limit of timing information that is
available in a neural population, and
2) a LOWER bound on the periodicity information available at the input
layers of the awake auditory cortex is 300-400 Hz.
I disagree with the interpretation that this means (in accord with
Schouten's old idea of the "residue") that temporal cues are primarily
for unresolved harmonics
(due to the interactions of partials) that mainly occur in higher CF regions
(higher harmonic numbers). Yes, there is more interneural synchrony in high
CF regions, but most of the temporal information in the AN lies in
intraneural interval
patterns rather than interneural ones.
If we look at the situation in the low-frequency recording sites,
Steinschneider et
al presented systematically shifted partials (so one sees how well
individual partials
can be distinguished on the basis of activation patterns). My
recollection is that
F0's needed to be on the order of 300 Hz for an 800 Hz BF recording site
in order
for there to be separation of adjacent harmonics. This is also very
similar to what
we saw in the auditory nerve when we looked at rate-profiles for vowels
with F0's
of 150 and 350 Hz (no resolution of harmonics with 150 Hz spacings, but
a crude
resolution of harmonics with 350 Hz spacings):
Hirahara, T. et al. (1996). Representation of low-frequency vowel
formants in the auditory nerve. In Proceedings
European Speech Communication Association (ESCA)
Research Workshop on The Auditory Basis of Speech Perception,
Keele University, United Kingdom, July 15 - 19, 1996 (pp. 4).
The acid test for a neural representation is how well one can predict the
percept from the neural data. In my opinion, the two biggest problems that
we have in the auditory system are
1) to account for the precision of perceptual discriminations (on the
order of fractions of a percent of frequency in the case of pitch
perception, on the order of tens of usecs in ITD in the case of binaural
localization), and
2) to simultaneously account for the extremely robust nature of these precisions
It's hard to tell whether, even for 300 Hz harmonic separations, one
could predict the frequencies of the partials with anything like the
requisite precisions
(or even within an order of magnitude) by using only the spatial
activation profiles.
Of course, one can never rule out the possibility that some other local
group of
neurons has better information (there is always more information, be it
rate or temporal,
in the individual spike trains than in averaged population responses).
So their findings
don't give a great deal of support to the notion of a spectral pattern
representation in
the auditory cortex, but, as usual, nothing can thus far be ruled out entirely.
On the issue of a time-to-place transformation in the midbrain, the
critical issues
have to do with whether the observed MTF's can actually support a neural representation
of pitch that is both sufficiently precise and robust. Unless one wants
to postulate various
complicated ad-hoc mechanisms that subserve pitch perception at
different sound pressure levels,
one wants a single, unified mechanism that does it all in seamless fashion.
We want to see a neural representation that is capable of representing
periodicities
to within a precision of less than 1 percent over an SPL range of
40-100+ dB SPL.
Suresh Krishna's recent excellent paper bears directly on these issues
(as does earlier work by Palmer, Rees, and Moller):
Krishna, B. S. and Semple, M. N. (2000). Auditory temporal processing:
responses to sinusoidally amplitude-modulated tones in the inferior
colliculus. J. Neurophysiol. 84(1), 255-273.
As I interpret their data, IC MTF's are relatively broad to begin with and
tend to broaden further at higher intensities. This would not be so
problematic if we also saw more sharply tuned MTF's at higher stations,
but their conspicuous absence generates a great deal of cognitive
dissonance in my mind: there is a discrepancy of about two orders of
magnitude between MTF tunings and the precisions of the percepts. The
MTF's look to me to be consequences of the recovery characteristics of
neurons (taking into account relative balances of excitation and
inhibition) rather than dedicated "pitch detectors". Spontaneous rates
and best MTF's decline together as one ascends the pathway.
The other difficulty is that MTF's are not the right way of encoding
pitch. The pitches
produced by harmonics below about 1.5 kHz follow an autocorrelation-like
pattern
(e.g. de Boer's rule) rather than one based on waveform envelope or
adjacent peaks
in filtered waveforms. In interval terms, the low pitches produced by
these complexes follow all-order intervals rather than first-order
intervals (which, like MTF's, are associated with renewal processes
rather than correlations). If one intersperses random clicks between the
regular clicks of an isochronous click train composed of harmonics
< 1.5 kHz, one still hears the pitch of the click train. This basic
observation seems inconsistent with an MTF-based representation of
pitch. For higher-frequency stimuli, above 2 kHz, pitches are affected
by envelopes, and such masking does occur and is quite striking (see
Kaernbach, C. and Demany, L. (1998). "Psychophysical evidence against
the autocorrelation theory of auditory temporal processing." J. Acoust.
Soc. Am., 104, 2298-2306.) The psychophysics of pitch resembles an
autocorrelation-based pattern for lower
harmonics (some would argue lower harmonic numbers or both), while it
resembles an envelope-based
pattern analysis for higher ones. A population-interval representation
of pitch is consistent with this
picture if temporal discharge patterns of auditory nerve fibers reflect
individual partials for low
harmonics and interactions of partials for high ones.
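The random-click observation can be sketched in a few lines of numpy
(the 200 Hz train, the 3:1 ratio of random to regular clicks, and the
0.1 ms bins are arbitrary choices of mine): interspersed random spikes
destroy the mode of the first-order (successive) interval distribution,
but the all-order interval histogram keeps its peak at the 5 ms period.

```python
import numpy as np

rng = np.random.default_rng(0)

period = 5.0e-3                                # 200 Hz click train
regular = np.arange(0.0, 1.0, period)          # 200 isochronous spikes
random_ = np.sort(rng.uniform(0.0, 1.0, 600))  # 3x as many random spikes
spikes = np.sort(np.concatenate([regular, random_]))

bin_w = 0.1e-3
edges = np.arange(-bin_w / 2, 20e-3, bin_w)    # bin centers at k * bin_w

# First-order intervals: successive spikes only (a renewal-type
# statistic, the kind of measure an MTF reflects).
first_order, _ = np.histogram(np.diff(spikes), edges)

# All-order intervals: every pairwise interval up to 20 ms
# (the autocorrelation-like measure).
d = spikes[None, :] - spikes[:, None]
all_order, _ = np.histogram(d[(d > 0) & (d < 20e-3)], edges)

k = round(period / bin_w)                      # bin centered on 5 ms
print(all_order[k], int(np.median(all_order))) # period peak vs. background
print(edges[np.argmax(first_order)] + bin_w / 2)
```

The all-order count in the 5 ms bin stands far above the smooth
background, while the first-order mode collapses to short lags.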
But by far the strongest pitches are those produced by lower harmonics.
In other words, to account for pitch shifts of inharmonic complex tones
and the
relative transparency of pitch representations (we can hear two pitches
when we listen to
double vowels with different F0's -- they don't obliterate each other),
we need something more
like an autocorrelator rather than an envelope or MTF-based analyzer.
This would mean
either comb filter rate tunings or all-order intervals. All-order
intervals related to pitch
are everywhere, and comb filter rate tunings are almost nowhere to be
found.
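The pitch-shift behavior falls directly out of an autocorrelation
analysis. As a sketch (the components at 1850, 2050, and 2250 Hz are my
arbitrary example), a complex with 200 Hz spacing shifted upward by 50
Hz still has an envelope repeating at exactly 200 Hz, but its
autocorrelation peak near 5 ms sits at roughly 1/205 s, matching the
first-effect pitch shift (about fc/n = 2050/10 = 205 Hz):

```python
import numpy as np

fs = 48000
t = np.arange(0, 0.2, 1 / fs)
freqs = [1850.0, 2050.0, 2250.0]  # 200 Hz spacing, shifted by +50 Hz
x = sum(np.cos(2 * np.pi * f * t) for f in freqs)

# Autocorrelation for non-negative lags.
ac = np.correlate(x, x, mode="full")[x.size - 1:]

# Find the ACF peak near the 5 ms envelope period.
lo, hi = int(0.004 * fs), int(0.006 * fs)
peak_lag = lo + np.argmax(ac[lo:hi])
pitch = fs / peak_lag
print(pitch)  # near 205 Hz, not the 200 Hz envelope rate
```

An envelope- or MTF-based analyzer locked to the 200 Hz modulation rate
would miss this shift entirely.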
The big, big advantage of the intervals is that they faithfully and
precisely represent stimulus periodicities at all relevant sound
intensities. Converting to a place code in the midbrain re-introduces
all of the problems and complexities of traditional rate-place codes,
albeit at a higher station (of course, we can always pass the gnarly
processing buck up to some more central, more omniscient
processors......).
I have just one point about binaural pitches.
John Culling wrote:
>It depends a little on your theoretical position about what auditory
> processing gives rise to dichotic pitches, but, if you believe that they
> are produced by a mechanism that detects interaural decorrelation (or
> more precisely "incoherence"), then they are purely spectral pitches.
> They only occur below about 1500 Hz, because (in part) the process of
> analysing the correlation is dependent upon phase locking. The result
> of this analysis, however, is a channel-by-channel coherence measurement.
> So, a "purely spectral" pitch may be achieved by, for instance, replacing
> one sub-band of a diotic noise at one ear with an independently generated
> band of noise. The result is a noise which is diotic at most frequencies,
> but uncorrelated in one sub-band. The stimulus at each ear is (and sounds
> like) white noise. When both earphones are used, however, a distinct
> whistling sound is heard above this noise. The pitch of this whistling
> sound corresponds with the centre-frequency of the manipulated band.
Of course, the binaural temporal cancellation models (yours and Alain de Cheveigne's)
best account for these decorrelation-based binaurally-created pitches.
And the
usual assumption, following Jeffress, is that the binaural correlator uses
coincidence counters that integrate the rates of coincidence detections,
and that
then the result is read out in a rate-place profile, a central spectrum.
But the output of the coincidence detectors also has time structure that
is related
to the pitch. If one runs these kinds of stimuli through a filterbank
and through a binaural coincidence net (e.g. Huggins pitch, Bilsen
multiple-phase-delay pitch, other interaurally decorrelated signals; I
did this several years ago),
and one looks at the summary autocorrelation of the output of the
binaural temporal cross-correlator,
one finds that there are dips in the interval distribution at the pitch
period and
its multiples. I constructed some monaural stimuli with flat
autocorrelations except for one dip at tau0, and these also create noisy
pitches (odd as they are) at tau0. This led me to believe that the
auditory system has the means of analyzing temporal correlations, both
positive and negative. (There exist arrays of spatial binocular
anticorrelation units in the visual system.)
A temporal anticoincidence process would yield positive peaks at the
pitch period and its multiples. A central autocorrelation analysis could therefore
also potentially account for these pitches if there exist
anticoincidence detectors
in the pathway (EI units). As much as I like your cancellation models,
they don't
exhaust the realm of possibilities.
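Such a dip-at-tau0 stimulus is easy to construct (the 5 ms tau0 and the
unit gain on the delayed copy are arbitrary choices of mine):
subtracting a delayed copy of white noise from itself yields an
autocorrelation that is flat everywhere except for a single negative
value of -0.5 at tau0.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 20000
tau0 = 0.005                     # 5 ms dip -> noisy pitch near 200 Hz
D = int(tau0 * fs)

n = rng.standard_normal(fs * 2)  # 2 s of white noise
y = n[D:] - n[:-D]               # anticorrelated copy at lag tau0

def acf(x, lag):
    """Normalized autocorrelation of x at a single positive lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# Dip of about -0.5 at tau0; near zero at other lags.
print(acf(y, D), acf(y, D // 2), acf(y, 2 * D))
```

An anticoincidence (EI-like) analysis would register that dip as a
positive peak at the pitch period and its multiples.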
I think therefore that we can't say it MUST be spectral because the only
models that
come immediately to mind are of the Jeffress-type time-to-place networks.
The arguments for spectral pattern analysis that are based on the
existence of binaural pitches are IMHO overstated; they rely on
conventional assumptions rather than any kind of logical necessity.
(The older arguments along these lines erected a false dichotomy between
temporal models for pitch in which harmonics needed to interact in the
same cochlea (e.g. "residue" temporal models) and spectral pattern
models.
The existence of binaurally created pitches rules out temporal
interactions between harmonics
as the only mechanism, but it does not rule out in any way other kinds
of temporal mechanisms
that rely on summation of intervals rather than interacting harmonics.)
-- Peter Cariani
Martin Braun wrote:
>
> Annemarie wrote:
>
> "I had in mind the study by Steinschneider et al., JASA, 104 (5), 1998,
> 2935 ff. who found that at the level of the primary auditory cortex
> phase locked responses occurred only at sites with high best frequencies
> up to about 200 Hz (stimuli: alternating polarity click trains),
> ............
>
> Does that mean that the temporal code might not play a role at all in
> the low frequency channels or is it more likely that phase locking had
> been transformed into a rate-place code before the A1 (perhaps in the
> midbrain)?..........."
>
> Answers:
>
> 1) As soon as a harmonic is resolved in the cochlea, spectral coding takes
> place and then runs along the complete auditory pathway.
>
> 2) If the spectral information is poor in the cochlea, as with click
> stimuli, it is also poor anywhere else in the auditory system.
>
> 3) Current evidence indicates that f0 in the main speech and music range is
> transcoded from a temporal to a place code in the central nucleus of the
> inferior colliculus (ICC). In other words, time-locking in this f0 range
> disappears above the ICC, and the extracted f0-pitch is coded at its
> frequency place by discharge rate, as most other information that is
> transported into and around the cortex. (See references below)
>
> 4) Phase-locking to acoustic frequencies recorded in the cortex possibly is
> not related to pitch extraction at all. It may be a by-product of other
> functions of the auditory system, e.g. orientation in space.
>
> In conclusion:
>
> A) In the cortex, f0-pitch in the main speech and music range is coded
> purely spectrally. (No phase-locking in pitch coding)
>
> B) Up to the ICC, f0-pitch in the main speech and music range can be coded
> purely temporally, but for all natural, i.e. non-laboratory, complex tones
> it is coded spectrally and temporally. (Phase-locking necessary for pitch
> coding)
>
> Langner, G., 1992. Periodicity coding in the auditory system. Hear. Res. 60,
> 115-142.
>
> Schreiner, C.E., Langner, G., 1997. Laminar fine structure of frequency
> organization in auditory midbrain. Nature 388, 383-386.
>
> Langner, G., Schreiner, C.E., Biebel, U.W., 1998. Functional implications of
> frequency and periodicity coding in auditory midbrain. In: Palmer, A.R.,
> Rees, A., Summerfield, A.Q., Meddis, R. (Eds.), Psychophysical and
> Physiological Advances in Hearing. Whurr, London, pp. 277-285.
>
> Braun, M., 1999. Auditory midbrain laminar structure appears adapted to f0
> extraction: further evidence and implications of the double critical
> bandwidth. Hear. Res. 129, 71-82.
>
> Braun, M., 2000. Inferior colliculus as candidate for pitch extraction:
> multiple support from statistics of bilateral spontaneous otoacoustic
> emissions. Hear. Res. 145, 130-140.
>
> Martin
>
> Martin Braun
> Neuroscience of Music
> Gansbyn 14
> S-671 95 Klässbol
> Sweden
> nombraun@post.netlink.se