Here is a message sent to me to be forwarded to the list.
- Al
--------------------------------------------------------------
Date: Mon, 19 May 1997 15:03:21 +0000
From: Peter Cariani <peter@epl.meei.harvard.edu>
To: AUDITORY@MCGILL1.BITNET
Subject: Re: An Auditory Illusion
Christian Kaernbach wrote:
> Dear Al,
> > But consider the following two cases:
> > 1. One sees a red ball on a blue table.
> > 2. One sees a blue ball on a red table.
> > According to a "node" theory, in both cases the nodes representing
> > red, blue, ball, on and table are activated. What then is the
> > difference? The two cases require a different arrangement of the
> > same ideas, but node-based theories cannot express this.
>
> The (now) "classical" solution to this problem is synchronicity. The
> cell assembly for RED and the cell assembly for BALL fire in
> synchrony if they are to express that it is the ball that is red.
> This idea dates back to the 70s (Christoph von der Malsburg, "A
> correlation theory of brain function", he actually cites someone from
> last century who had the same idea) but was confirmed only recently
> (1991 or so, Gray, Konig, Engel, Singer, in Nature: binding by
> motion).
I'm sympathetic to the notion that synchrony
could play a role in the binding of local features,
but I think that there is a
great deal of uncertainty (and/or skepticism)
in the vision community about how robust the effects are
(whether they are seen in unanesthetized preps, for example).
I wouldn't go so far as to say that the hypothesis is
"confirmed". It would also seem that, by changing the
relative timing of the presentation of different visual
objects, one should be able to get them to fuse in different
ways (I don't know of any perspicuous demonstrations of this
kind -- does anyone know of any?).
One of the main classical arguments
against "scanning models" of perception (e.g. based on phases
re: alpha-waves or on absolute synchrony, e.g. models by
Pitts & McCulloch and Walter in the 1950's) was that cortical
discharge patterns could be driven by many different kinds of
stimuli (clicks, flashes) that don't necessarily interfere with
perceptual integration (see the discussion of McCulloch's
"Why the Mind is in the Head" in Jeffress, Cerebral Mechanisms
in Behavior (the Hixon Symposium), 1951).
On the other hand, Chistovitch's experiments with trains of
alternating single-formant vowels (JASA 77(3):785-809, 1985)
seem to indicate that spectral components need to occur
within a common 10-15 ms window in order for spectral
integration to occur (for the 1-formant vowels to fuse
together into a 2-formant percept). But this can't be the
whole story, because we also can (better) separate
2-formant vowels with different F0's, such that multiple
auditory objects don't necessarily fuse even though they
overlap in time.
Somehow one needs a neural mechanism for binding
together multiple independent perceptual attributes
of a stimulus. The strategy of using "feature detectors"
(nodes, "place-coded" representational systems)
runs into problems for objects with multiple attributes.
Either one needs to have tuned neural arrays
that encompass all possible combinations
(ensembles of "combination-detectors"), or one needs
some other means of encoding the conjunction of the
different kinds of neural information.
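
To put rough numbers on the "combination-detector" problem, here is a
minimal sketch. The attribute dimensions and their sizes are made up
purely for illustration; nothing here is a claim about real neural
arrays. One node per attribute value grows as a sum, while one node
per conjunction grows as a product.

```python
from math import prod

def detector_counts(values_per_attribute):
    """Return (feature_detectors, combination_detectors) for a list of
    attribute dimensions, each given as its number of distinguishable values."""
    feature_detectors = sum(values_per_attribute)        # one node per value
    combination_detectors = prod(values_per_attribute)   # one node per conjunction
    return feature_detectors, combination_detectors

# Hypothetical dimensions: 10 colors, 20 shapes, 15 textures, 30 positions
print(detector_counts([10, 20, 15, 30]))   # (75, 90000)
```

Adding one more dimension adds its size to the first count but
multiplies the second, which is why some other way of encoding
conjunctions is attractive.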
Synchrony (common time of activation) is one way
of doing this, but then one needs a separate time-slot for each
independent object. Lisman and others who work on "phase-codes"
in the hippocampus have proposed a division of the hippocampal
theta wave based on these kinds of notions (with 7 +/- 2 slots).
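
As a toy illustration of the time-slot idea (not a model of the
hippocampus, and not Lisman's actual proposal), one can assign each
object to one slot of a repeating cycle and let every feature of that
object fire in that slot. The function name and the default of 7 slots
are assumptions chosen only to echo the "7 +/- 2" figure above.

```python
def firing_slots(objects, n_slots=7):
    """Map each feature to the set of slots in which it fires.
    Each object (a set of feature labels) occupies its own slot, so
    features that share a slot are read as belonging to one object."""
    if len(objects) > n_slots:
        raise ValueError("more objects than available time slots")
    slots = {}
    for slot, features in enumerate(objects):
        for feature in features:
            slots.setdefault(feature, set()).add(slot)
    return slots

scene_1 = [{"red", "ball"}, {"blue", "table"}]
scene_2 = [{"blue", "ball"}, {"red", "table"}]

# Same features are active in both scenes, but the slot assignments differ.
print(firing_slots(scene_1) == firing_slots(scene_2))   # False
```

The cost is visible in the ValueError branch: once there are more
independent objects than slots, this scheme has nowhere to put them.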
An alternative (or complement) to binding-through-synchrony
mechanisms is binding through common time (phase) structure.
When we have a harmonic complex with a mistuned component
or when we have two harmonic complexes (double vowels)
with different F0's, the time (phase) relations within
each object (complex vs. mistuned component; vowel1 vs. vowel2)
are constant from one fundamental
period to the next. The time (phase) relations across objects,
however, are constantly changing. I think that any mechanism
that groups by common time pattern from period to period should be
able to segregate out multiple objects this way (Patterson's
strobed temporal integration model, JASA 98(4):1890-4, 1995
is in the right direction, but I'm not sure how well the
triggering algorithm would handle multiple objects with
different F0's).
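
The claim about phase relations can be checked with simple arithmetic.
The sketch below is only that: the F0 values (100 and 112 Hz) and the
choice of harmonics are hypothetical, and components are idealized as
pure sinusoids starting in zero phase. It samples the phase of a
component at the start of each period of a reference fundamental,
using exact fractions to avoid rounding artifacts.

```python
from fractions import Fraction

def phases_at_period_starts(component_hz, reference_f0_hz, n_periods):
    """Phase (in cycles, 0 <= phase < 1) of a sinusoidal component, sampled
    at the start of each successive period of a reference fundamental."""
    period = Fraction(1, reference_f0_hz)
    return [(component_hz * n * period) % 1 for n in range(n_periods)]

f0_a, f0_b = 100, 112   # hypothetical F0s (Hz) of two concurrent vowels

within = phases_at_period_starts(3 * f0_a, f0_a, 6)   # harmonic of A vs. A's period
across = phases_at_period_starts(2 * f0_b, f0_a, 6)   # harmonic of B vs. A's period

print([str(p) for p in within])   # the same phase in every period
print([str(p) for p in across])   # a different phase in every period
```

A mistuned component behaves like the second case: replacing 3 * f0_a
with, say, 309 Hz makes its phase drift from one period of A to the
next, while the in-tune harmonics stay fixed.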
If one thinks about the visual example with two overhead
transparencies moved independently, a similar
situation obtains: similar spatial phase relations group
together -- it's easy to separate the two images. When
the two images move together, they fuse and it's hard
to separate them. As in audition, visual neurons
"phase-lock" to sinusoidal gratings and respond with
precise latencies to contrast transients (lines, edges).
(It is interesting that the responses of simple cells in
V1 to drifting gratings look much like the responses of units
in auditory cortex to AM tones with comparable modulation
rates (Shamma, Computation in Neural Systems 7 (1996):
439-476).) Each image-object presumably produces
a common spatio-temporal correlation structure
that encodes all the edges and shapes
(but as far as I know, there are no models of
spatial form perception that use phase-locking
to form spatio-temporal correlation patterns).
There is also a small literature on
temporal response patterns generated by light of
different wavelengths (e.g. Kozak & Reitboeck,
Vision Res. 14:1890-1894, 1974), so that
color-related time patterns could be mixed in with
spatial form and texture information in the same
channels (there is also the work of Optican, Richmond,
McClurkin et al. on multiplexing of these kinds of
information).
The point is that the problem of segmentation
and binding is generated by the assumption
that we are trying to assemble the outputs of
"nodes" or local "feature detectors",
and that these outputs themselves have little
internal structure.
Once we admit the possibility of time playing a role --
there being functionally-significant temporal
(or spatial) microstructure to our inputs,
either through synchrony or through common time pattern --
then much more flexible modes of association become
possible. Suddenly we are able to do many things that
traditional, explicitly-coded logic systems do
quite easily (but that are difficult to accomplish
using standard connectionist nets). This is because,
instead of operating on scalar signals
(1 variable per node), we are able to have signals
with higher dimensionality. This is the equivalent
of stringing together symbols that denote
different properties, as in symbolic logic.
The red ball on the blue table is different from the blue
ball on the red table because the representation of
each object carries form and color information that
are multiplexed together. This obviates the need for
an ensemble of detectors for all of the combinations,
or for the objects to be segregated and coordinated
in time.
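
The contrast can be made concrete with a small sketch. It is only an
illustration under stated assumptions: the two function names are
hypothetical, an object is reduced to a (form, color) pair, and the
"on" relation is left out for brevity. A flat node code keeps only
which labels are active; a multiplexed code keeps each object's
attributes together in one signal.

```python
def node_activation(scene):
    """Flat 'node' code: the set of feature labels active anywhere in the scene."""
    return frozenset(label for obj in scene for label in obj)

def multiplexed(scene):
    """Multiplexed code: each object's signal carries its form and color together."""
    return frozenset(scene)

scene_1 = [("ball", "red"), ("table", "blue")]
scene_2 = [("ball", "blue"), ("table", "red")]

print(node_activation(scene_1) == node_activation(scene_2))   # True: indistinguishable
print(multiplexed(scene_1) == multiplexed(scene_2))           # False: distinguishable
```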
Peter Cariani