Categorical Perception

ABSTRACT: Differences can be perceived as gradual and
quantitative, as with different shades of gray, or they can be perceived
as more abrupt and qualitative, as with different colors. The first is called
continuous perception and the second categorical perception. Categorical
perception (CP) can be inborn or can be induced by learning. Formerly thought
to be peculiar to speech and color perception, CP turns out to be far more
general, and may be related to how the neural networks in our brains detect
the features that allow us to sort the things in the world into their proper
categories, "warping" perceived similarities and differences so as to compress
some things into the same category and separate others into different categories.

Categories: categorical and continuous. A category, or kind, is
a set of things. Membership in the category may be (1) all-or-none, as with
"bird": Something either is a bird or it isn't a bird; a penguin is 100%
bird, a platypus is 100% not-bird. In this case we would call the category
"categorical." Or membership might be (2) a matter of degree, as with "big":
Some things are more big and some things are less big. In this case the category
is "continuous" (or rather, degree of membership corresponds to some point
along a continuum). There are range or context effects as well: elephants
are relatively big in the context of animals, relatively small in the context
of bodies in general, if we include planets.

Many categories, however, particularly concrete sensorimotor categories
(things we can see and touch), are a mixture of the two: categorical at an
everyday level of magnification, but continuous at a more microscopic level.
Color categories are good examples: Central reds are clearly reds, and not
shades of yellow. But in the orange region of the spectral continuum, red/yellow
is a matter of degree; context and contrast effects can also move these regions
around somewhat. Perhaps even with "bird," an artist or genetic engineer
could design intermediate cases whose "birdness" was only a matter
of degree.

Resolving the "blooming, buzzing confusion." Categories are important
because they determine how we see and act upon the world. As William James
noted, we do not see a continuum of "blooming, buzzing confusion" but an orderly
world of discrete objects. Some of these categories are "prepared" in advance
by evolution: The frog is born with a brain already able to detect "flies"; it
needs only normal exposure rather than any special learning in order to recognize
and catch them. Humans have such innate category-detectors too: The human
face itself is probably an example. So too are our basic color categories,
although according to the "Whorf Hypothesis" (Whorf 1956; also called the
"linguistic relativity" hypothesis), colors are determined by how our culture
and language happen to subdivide the spectrum (we will return to this).

But if one opens up a dictionary at random and picks out a content word,
chances are that it names a category we have learned to detect, rather than
one that our brains were prepared in advance by evolution to detect.
The generic human face may be an innate category for us, perhaps even the
various basic emotions it can express, but surely all the specific people
we know and can name are not. "Red" and "yellow" may be inborn, but "scarlet"
and "crimson"?

The motor theory of speech perception. And what about the
very building blocks of the language we use to name categories: Are our speech-sounds
-- ba, da, ga -- innate or learned? The first question
we must answer about them is whether they are categorical categories at all,
or merely arbitrary points along a continuum. It turns out that if one analyzes
the sound spectrogram of ba and pa, for example, both are found
to lie along an acoustic continuum called "voice-onset-time." With a technique
similar to the one used in "morphing" visual images continuously into one
another, it is possible to "morph" a ba gradually into a pa
and beyond by gradually increasing the voicing parameter.

Liberman et al. (1957) reported that when people listen to sounds that
vary along the voicing continuum, they hear only ba's and pa's,
nothing in between. This effect -- in which a perceived quality jumps
abruptly from one category to another at a certain point along a continuum,
instead of changing gradually -- Liberman dubbed "categorical perception" (CP).
He suggested that CP was unique to speech, that CP made speech special, and,
in what came to be called "the motor theory of speech perception," he suggested
that CP's explanation lay in the anatomy of speech production:

According to the (now abandoned) motor theory, the reason we perceive
an abrupt change between ba and pa is that the way we hear speech sounds
is influenced by the way we produce them when we speak. What is varying along
this continuum is voice-onset-time: the "b" in ba is voiced and the
"p" in pa is not. But unlike the synthetic "morphing" apparatus, our
natural vocal apparatus is not capable of producing anything in between
ba and pa. So when I hear a sound from the voicing continuum,
my brain perceives it by trying to match it with what it would have had to
do to produce it. Since the only thing I can produce is ba or pa,
I will perceive any of the synthetic stimuli along the continuum as either
ba or pa, whichever it is closer to. A similar CP effect
is found with ba/da; these too lie along a continuum acoustically,
but vocally, ba is formed with the two lips, da with the tip
of the tongue against the alveolar ridge, and our anatomy does not allow any intermediates.

The motor theory of speech perception explained how speech was special
and why speech-sounds are perceived categorically: sensory perception is mediated
by motor production. Wherever production is categorical, perception will
be categorical; where production is continuous, perception will be continuous.
And indeed vowel categories like a/u were found to be much less categorical
than ba/pa or ba/da. (Less categorical, but not altogether
continuous either: we will return to this.)

Acquired distinctiveness. If motor production mediates sensory
perception, then one assumes that this CP effect is a result of learning to
produce speech. Eimas et al. (1971), however, found that infants already
have speech CP before they begin to speak. Perhaps, then, it is an innate
effect, evolved to "prepare" us to learn to speak. But Kuhl (1987) found
that chinchillas also have "speech CP" even though they never learn to speak,
and presumably did not evolve to do so. Lane (1965) went on to show that
CP effects can be induced by learning alone, with a purely sensory (visual)
continuum in which there is no motor production discontinuity to mediate
the perceptual discontinuity. He concluded that speech CP is not special
after all, but merely a special case of Lawrence's classic demonstration
that stimuli to which you learn to make a different response become more
distinctive and stimuli to which you learn to make the same response become
more similar.

It also became clear that CP was not quite the all-or-none effect Liberman
had originally thought it was. It is not that all pa's are indistinguishable
and all ba's are indistinguishable: We can hear the differences, just
as we can see the differences between different shades of red. It is just
that the within-category differences (pa1/pa2 or red1/red2) sound/look much
smaller than the between-category differences (pa2/ba1 or red2/yellow1), even
when the sizes of the underlying physical differences (voicing, wavelength)
are actually the same.
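
The point about equal physical differences can be illustrated with a toy
calculation. The sketch below is not a model from the literature cited here:
it simply assumes that perceived quality is a logistic function of
voice-onset-time, with a hypothetical boundary at 25 ms and a hypothetical
slope, and compares three stimulus pairs that are each 10 ms apart.

```python
import math


def perceived(vot_ms, boundary_ms=25.0, slope=0.4):
    """Hypothetical perceptual map: 0 is heard as ba, 1 as pa.

    The logistic shape, the 25 ms boundary and the slope are
    illustrative assumptions, not measured values.
    """
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))


def perceived_difference(a_ms, b_ms):
    """Size of the perceived difference between two stimuli."""
    return abs(perceived(a_ms) - perceived(b_ms))


# Three pairs with the same 10 ms physical difference.
within_ba = perceived_difference(0.0, 10.0)   # both on the ba side
between = perceived_difference(20.0, 30.0)    # straddles the boundary
within_pa = perceived_difference(40.0, 50.0)  # both on the pa side

print("within ba:", within_ba)
print("between:  ", between)
print("within pa:", within_pa)
```

Under these assumed parameters the pair straddling the boundary yields a far
larger perceived difference than either within-category pair, yet the
within-category differences are not zero, which matches the observation that
listeners can still hear them.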

Within-category compression and between-category separation.
This evolved into the contemporary definition of CP, which is no longer peculiar
to speech or dependent on the motor theory: CP occurs whenever perceived within-category
differences are compressed and/or between-category differences are separated,
relative to some baseline of comparison. The baseline might be the actual
size of the physical differences involved, or, in the case of learned CP,
it might be the perceived similarity or discriminability within and between
categories before the categories were learned, compared to after.

The typical learned CP experiment would be the following: A set of stimuli
is tested (usually in pairs) for similarity or discriminability. In the case
of similarity, multidimensional scaling might be used to scale the rated
pairwise similarity of the set of stimuli. In the case of discriminability,
same/different judgments and signal detection analysis might be used to estimate
the pairwise discriminability of a set of stimuli. Then the same subjects
or a different set are trained, using trial and error and corrective feedback,
to sort the stimuli into two or more categories. After the categorization
has been learned, similarity or discriminability is tested again, and compared
against the untrained data. If there is significant within-category compression
and/or between-category separation, this is operationally defined as CP (Harnad
1987).
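
The discriminability analysis in such an experiment can be sketched as
follows. This is a minimal illustration, not the procedure of any particular
study: d' is computed with the simple yes/no formula z(hit rate) minus
z(false-alarm rate), whereas same/different designs usually call for a
model-specific correction, and the rates in the example are invented
placeholders rather than data. The function names d_prime and cp_effect are
hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = z(H) - z(FA).

    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def mean(values):
    return sum(values) / len(values)


def cp_effect(pre, post):
    """Return (compression, separation) in d' units.

    pre and post map "within" and "between" to lists of pairwise d'
    values measured before and after category training.
    compression > 0: within-category pairs became harder to tell apart.
    separation > 0: between-category pairs became easier to tell apart.
    """
    compression = mean(pre["within"]) - mean(post["within"])
    separation = mean(post["between"]) - mean(pre["between"])
    return compression, separation


# Invented placeholder (hit rate, false-alarm rate) values, one pair each.
pre = {"within": [d_prime(0.80, 0.20)], "between": [d_prime(0.80, 0.20)]}
post = {"within": [d_prime(0.70, 0.30)], "between": [d_prime(0.90, 0.10)]}

compression, separation = cp_effect(pre, post)
print("compression:", compression)
print("separation: ", separation)
```

A real analysis would also have to test whether these changes are
statistically significant across subjects, which this sketch does not do.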

The Whorf Hypothesis. We can now return both to the "Whorf Hypothesis"
and the "weaker" CP for vowels: According to the Whorf Hypothesis (of which
Lawrence's acquired similarity/distinctiveness effects would simply be a
special case), colors are perceived categorically only because they happen
to be named categorically: Our subdivisions of the spectrum are arbitrary,
learned, and vary across cultures and languages. But Berlin & Kay (1969)
showed that this was not so: Not only do most cultures and languages subdivide
and name the color spectrum the same way, but even for those who don't, the
regions of compression and separation are the same. We all see blues as more
alike and greens as more alike, with a fuzzy boundary in between, whether
or not we have named the difference. So there is no Whorfian learning effect
with colors: Or is there?

Evolved CP. First, back to vowels. The signature of CP is within-category
compression and/or between-category separation. The size of the CP effect
is merely a scaling factor; it is this compression/separation "accordion
effect" that is CP's distinctive feature. In this respect, the "weaker"
CP effect for vowels, whose motor production is continuous rather than categorical,
but whose perception is by this criterion categorical, is every bit as much
of a CP effect as the ba/pa and ba/da effects. But, as with
colors, it looks as if the effect is an innate one: Our sensory category detectors
for both color and speech sounds are born already "biased" by evolution:
Our perceived color and speech-sound spectrum is already "warped" with these
compression/separations.

Learned CP. Is that all there is to it? Apparently not. There
are still the Lane/Lawrence demonstrations, lately replicated and extended
by Goldstone (1994), that CP can be induced by learning alone. And there
are also the countless categories catalogued in our dictionaries that could
not possibly be inborn (though nativist theorists such as Fodor [1983] have
sometimes seemed to suggest that all of our categories are inborn). There
are even recent demonstrations that although the primary color and speech
categories are probably inborn, their boundaries can be modified or even
lost as a result of learning, and weaker secondary boundaries can be generated
by learning alone (Roberson et al. 2000).

Perhaps CP performs some useful function in categorization? In the case
of innate CP, our categorically biased sensory detectors pick out their prepared
color and speech-sound categories far more readily and reliably than if our
perception had been continuous. Could something similar be the case for our
repertoire of learned categories too?

Computational and neural models of CP. Computational modeling
(Tijsseling & Harnad 1997; Damper & Harnad 2000) has shown that many
types of category-learning mechanisms (e.g. both back-propagation and competitive
networks) display CP-like effects. In back-propagation nets, the hidden-unit
activation patterns that "represent" an input build up within-category compression
and between-category separation as they learn; other kinds of nets display
similar effects. CP seems to be a means to an end: Inputs that differ among
themselves are "compressed" onto similar internal representations if they
must all generate the same output; and they become more separate if they
must generate different outputs. The network's "bias" is what filters inputs
onto their correct output category. The nets accomplish this by selectively
detecting (after much trial and error, guided by error-correcting feedback)
the invariant features that are shared by the members of the same category
and that reliably distinguish them from members of different categories;
the nets learn to ignore all other variation as irrelevant to the categorization.
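
How one might look for this effect in a back-propagation net can be sketched
in a few lines. The following is a deliberately tiny illustration and not a
reconstruction of the cited simulations: the scalar input coding, the
network size, the learning rate, the seed and the names TinyNet and
adjacent_distances are all assumptions made for the example. It trains a net
to split an 8-step continuum into two categories and compares distances
between the hidden-unit patterns of neighbouring stimuli before and after
training.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyNet:
    """1 input -> n_hidden sigmoid units -> 1 sigmoid output."""

    def __init__(self, n_hidden=3, seed=0):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # input -> hidden
        self.b = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # hidden biases
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]  # hidden -> output
        self.c = 0.0                                                # output bias

    def hidden(self, x):
        return [sigmoid(w * x + b) for w, b in zip(self.w, self.b)]

    def output(self, x):
        h = self.hidden(x)
        return sigmoid(sum(v * hj for v, hj in zip(self.v, h)) + self.c)

    def train(self, data, epochs=3000, lr=0.5):
        # Plain online back-propagation with a cross-entropy error signal.
        for _ in range(epochs):
            for x, target in data:
                h = self.hidden(x)
                out = sigmoid(sum(v * hj for v, hj in zip(self.v, h)) + self.c)
                delta_out = out - target
                for j, hj in enumerate(h):
                    # Hidden error uses the output weight before it is updated.
                    delta_h = delta_out * self.v[j] * hj * (1.0 - hj)
                    self.v[j] -= lr * delta_out * hj
                    self.w[j] -= lr * delta_h * x
                    self.b[j] -= lr * delta_h
                self.c -= lr * delta_out


def adjacent_distances(net, xs):
    """Euclidean distance between hidden patterns of neighbouring stimuli."""
    reps = [net.hidden(x) for x in xs]
    return [math.dist(a, b) for a, b in zip(reps, reps[1:])]


def between_to_within_ratio(dists, boundary_index=3):
    """Distance across the category boundary relative to the mean of the rest."""
    within = [d for i, d in enumerate(dists) if i != boundary_index]
    return dists[boundary_index] / (sum(within) / len(within))


xs = [i / 7 for i in range(8)]                       # equally spaced continuum
data = [(x, 1.0 if x > 0.5 else 0.0) for x in xs]    # two categories

net = TinyNet()
before = adjacent_distances(net, xs)
net.train(data)
after = adjacent_distances(net, xs)

print("ratio before training:", between_to_within_ratio(before))
print("ratio after training: ", between_to_within_ratio(after))
```

A ratio that rises with training would indicate that the hidden
representations of stimuli on opposite sides of the boundary have moved
apart relative to those on the same side. Whether and how strongly this
happens depends on the input coding, the network size and the random seed.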

Very little is known yet about the brain mechanisms of category perception
and learning. The computational models are really causal hypotheses about
what the brain might be doing. Neural data provide correlates of CP and of
learning (Sharma & Dorman 1999). Differences between event-related potentials
recorded from the brain have been found to be correlated with differences
in the perceived category of the stimulus viewed by the subject. Neural imaging
studies have shown that these effects are localized and even lateralized
to certain brain regions in subjects who have successfully learned the category,
and are absent in subjects who have not (Seger et al. 2000).

Language-induced CP. Both innate and learned CP are sensorimotor
effects: The compression/separation biases are sensorimotor biases, and presumably
had sensorimotor origins, whether during the sensorimotor life-history of
the organism, in the case of learned CP, or the sensorimotor life-history
of the species, in the case of innate CP. The neural net I/O models are also
compatible with this fact: Their I/O biases derive from their I/O history.
But when we look at our repertoire of categories in a dictionary, it is highly
unlikely that many of them had a direct sensorimotor history during our lifetimes,
and even less likely in our ancestors' lifetimes. How many of us have seen
a unicorn in real life? We have seen pictures of them, but what had those
who first drew those pictures seen? And what about categories we cannot draw
or see (or taste or touch): What about the most abstract categories, such
as goodness and truth?

Some of our categories must originate from a source other than direct
sensorimotor experience, and here we return to language and the Whorf Hypothesis:
Can categories, and their accompanying CP, be acquired through language alone?
Again, there are some neural net simulation results suggesting that once
a set of category names has been "grounded" through direct sensorimotor experience,
they can be combined into Boolean combinations (man = male
& human) and into still higher-order combinations (bachelor
= unmarried & man) which not only pick out the more abstract,
higher-order categories much the way the direct sensorimotor detectors do,
but also inherit their CP effects, as well as generating some of their own.
Bachelor inherits the compression/separation of unmarried
and man, and adds a layer of separation/compression of its own (Cangelosi
et al. 2000, Cangelosi & Harnad 2001).
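
The Boolean composition step itself is simple to sketch. In the example
below the three base detectors are hypothetical stand-ins for categories
that would have been grounded through direct sensorimotor learning, and the
dictionary-of-features representation is an assumption made for
illustration. The sketch shows only how higher-order categories can be
defined from grounded ones; it does not model the inheritance of CP effects
reported in the simulations.

```python
from typing import Callable, Dict

Thing = Dict[str, bool]
Detector = Callable[[Thing], bool]


# Hypothetical "grounded" detectors (stand-ins for learned sensorimotor ones).
def is_male(thing: Thing) -> bool:
    return thing["male"]


def is_human(thing: Thing) -> bool:
    return thing["human"]


def is_married(thing: Thing) -> bool:
    return thing["married"]


def both(a: Detector, b: Detector) -> Detector:
    """Conjunction of two category detectors."""
    return lambda thing: a(thing) and b(thing)


def negate(a: Detector) -> Detector:
    """Complement of a category detector."""
    return lambda thing: not a(thing)


# Higher-order categories defined purely by combining existing names.
is_man = both(is_male, is_human)              # man = male & human
is_unmarried = negate(is_married)
is_bachelor = both(is_unmarried, is_man)      # bachelor = unmarried & man

unmarried_man = {"male": True, "human": True, "married": False}
married_man = {"male": True, "human": True, "married": True}
unmarried_male_dog = {"male": True, "human": False, "married": False}

print(is_bachelor(unmarried_man))
print(is_bachelor(married_man))
print(is_bachelor(unmarried_male_dog))
```

No new sensorimotor training is needed for is_bachelor: it is picked out
entirely by the combination of detectors that were already available.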

These language-induced CP-effects remain to be directly demonstrated
in human subjects; so far only learned and innate sensorimotor CP have been
demonstrated (Pevtzow & Harnad 1997; Livingston et al. 1998). The learned
effects show the Whorfian power of naming and categorization in warping our perception
of the world. That is enough to rehabilitate the Whorf Hypothesis from its
apparent failure on color terms (and perhaps also from its apparent failure
on Eskimo snow terms; Pullum 1989), but to show that it is a full-blown language
effect, and not merely a vocabulary effect, it will have to be shown that
our perception of the world can also be warped, not just by how things are
named but by what we are told about them.