ON THE NATURE AND NURTURE OF LANGUAGE

Elizabeth Bates
University of California, San Diego

Support for the work described here has been provided by NIH/NIDCD-R01-DC00216 (“Cross-linguistic studies in aphasia”), NIH-NIDCD P50 DC1289-9351 (“Origins of communication disorders”), NIH/NINDS P50 NS22343 (“Center for the Study of the Neural Bases of Language and Learning”), NIH 1-R01-AG13474 (“Aging and Bilingualism”), and by a grant from the John D. and Catherine T. MacArthur Foundation Research Network on Early Childhood Transitions. Please address all correspondence to Elizabeth Bates, Center for Research in Language 0526, University of California at San Diego, La Jolla, CA 92093-0526, or bates@crl.ucsd.edu.

Language is the crowning achievement of the human species, and it is something that all normal humans can do. The average man is neither a Shakespeare nor a Caravaggio, but he is capable of fluent speech, even if he cannot paint at all. In fact, the average speaker produces approximately 150 words per minute, each word chosen from somewhere between 20,000 and 40,000 alternatives, at error rates below 0.1%. The average child is already well on her way toward that remarkable level of performance by 5 years of age, with a vocabulary of more than 6000 words and productive control over almost every aspect of sound and grammar in her language.

Given the magnitude of this achievement, and the speed with which we attain it, some theorists have proposed that the capacity for language must be built directly into the human brain, maturing like an arm or a kidney. Others have proposed instead that we have language because we have powerful brains that can learn many things, and because we are extraordinarily social animals who value communication above everything else. Is language innate? Is it learned? Or, alternatively, does language emerge anew in every generation, because it is the best solution to the problems that we care about, problems that only humans can solve? These are the debates that have raged for centuries in the various sciences that study language. They are also variants of a broader debate about the nature of the mind and the process by which minds are constructed in human children.

The first position is called “nativism”, defined as the belief that knowledge originates in human nature. This idea goes back to Plato and Kant, but in modern times it is most clearly associated with the linguist Noam Chomsky (see photograph). Chomsky’s views on this matter are very strong indeed, starting with his first book in 1957, and repeated with great consistency for the next 40 years. Chomsky has explicated the tie between his views on the innateness of language and Plato’s original position on the nature of mind, as follows:

“How can we interpret [Plato’s] proposal in modern terms? A modern variant would be that certain aspects of our knowledge and understanding are innate, part of our biological endowment, genetically determined, on a par with the elements of our common nature that cause us to grow arms and legs rather than wings. This version of the classical doctrine is, I think, essentially correct.” (Chomsky, 1988, p. 4)

He has spent his career developing an influential theory of grammar that is supposed to describe the universal properties underlying the grammars of every language in the world. Because this Universal Grammar is so abstract, Chomsky believes that it could not be learned at all, stating that “Linguistic theory, the theory of UG [Universal Grammar]... is an innate property of the human mind.... [and].... the growth of language [is] analogous to the development of a bodily organ”.

Of course Chomsky acknowledges that French children learn French words, Chinese children learn Chinese words, and so on. But he believes that the abstract underlying principles that govern language are not learned at all, arguing that “A general learning theory ... seems to me dubious, unargued, and without any empirical support”.

Because this theory has been so influential in modern linguistics and psycholinguistics, it is important to understand exactly what Chomsky means by “innate.” Everyone would agree that there is something unique about the human brain that makes language possible. But in the absence of evidence to the contrary, that “something” could be nothing other than the fact that our brains are very large, a giant all-purpose computer with trillions of processing elements. Chomsky’s version of the theory of innateness is much stronger than the “big brain” view, and involves two logically and empirically separate claims: that language is innate, and that our brains contain a dedicated, special-purpose learning device that has evolved for language alone. The latter claim is the one that is really controversial, a doctrine that goes under various names including “domain specificity”, “autonomy” and “modularity”.

The second position is called “empiricism”, defined as the belief that knowledge originates in the environment, and comes in through the senses. This approach (also called “behaviorism” and “associationism”) is also an ancient one, going back (at least) to Aristotle, but in modern times it is closely associated with the psychologist B.F. Skinner (see photograph). According to Skinner, there are no limits to what a human being can become, given time, opportunity and the application of very general laws of learning. Humans are capable of language because we have the time, the opportunity and (perhaps) the computing power that is required to learn 50,000 words and the associations that link those words together. Much of the research that has taken place in linguistics, psycholinguistics and neurolinguistics since the 1950s has been dedicated to proving Skinner wrong, by showing that children and adults go beyond their input, creating novel sentences and (in the case of normal children and brain-damaged adults) peculiar errors that they have never heard before. Chomsky himself has been severe in his criticisms of the behaviorist approach to language, denouncing those who believe that language can be learned as “grotesquely wrong” (Gelman, 1986).

In their zealous attack on the behaviorist approach, nativists sometimes confuse Skinner’s form of empiricism with a very different approach, alternatively called “interactionism”, “constructivism,” and “emergentism.” This is a much more difficult idea than either nativism or empiricism, and its historical roots are less clear. In the 20th century, the interactionist or constructivist approach has been most closely associated with the psychologist Jean Piaget (see photograph). More recently, it has appeared in a new approach to learning and development in brains and brain-like computers alternatively called “connectionism,” “parallel distributed processing” and “neural networks” (Elman et al., 1996; Rumelhart & McClelland, 1986), and in a related theory of development inspired by the nonlinear dynamical systems of modern physics (Thelen & Smith, 1994). To understand this difficult but important idea, we need to distinguish between two kinds of interactionism: simple interactions (black and white make grey) and emergent form (black and white get together and something altogether new and different happens).

In an emergentist theory, outcomes can arise for reasons that are not obvious or predictable from any of the individual inputs to the problem. Soap bubbles are round because a sphere is the only possible solution to achieving maximum volume with minimum surface (i.e., their spherical form is not explained by the soap, the water, or the little boy who blows the bubble). The honeycomb in a beehive takes a hexagonal form because that is the stable solution to the problem of packing circles together (i.e., the hexagon is not predictable from the wax, the honey it contains, nor from the packing behavior of an individual bee; see Figure 1). Jean Piaget argued that logic and knowledge emerge in just such a fashion, from successive interactions between sensorimotor activity and a structured world. A similar argument has been made to explain the emergence of grammars, which represent the class of possible solutions to the problem of mapping a rich set of meanings onto a limited speech channel, heavily constrained by the limits of memory, perception and motor planning. Logic and grammar are not given in the world, but neither are they given in the genes. Human beings discovered the principles that comprise logic and grammar, because these principles were the best possible solution to specific problems that other species simply do not care about, and could not solve even if they did. Proponents of the emergentist view acknowledge that something is innate in the human brain that makes language possible, but that “something” may not be a special-purpose, domain-specific device that evolved for language and language alone. Instead, language may be something that we do with a large and complex brain that evolved to serve the many complex goals of human society and culture (Tomasello & Call, 1997). In other words, language is a new machine built out of old parts, reconstructed from those parts by every human child.

So the debate today in language research is not about Nature vs. Nurture, but about the “nature of Nature,” that is, whether language is something that we do with an inborn language device, or whether it is the product of (innate) abilities that are not specific to language. In the pages that follow, we will explore current knowledge about the psychology, neurology and development of language from this point of view. We will approach this problem at different levels of the system, from speech sounds to the broader communicative structures of complex discourse. Let us start by defining the different levels of the language system, and then go on to describe how each of these levels is processed by normal adults, acquired by children, and represented in the brain.

I. THE COMPONENT PARTS OF LANGUAGE

Speech as Sound: Phonetics and Phonology

The study of speech sounds can be divided into two subfields: phonetics and phonology.

Phonetics is the study of speech sounds as physical and psychological events. This includes a huge body of research on the acoustic properties of speech, and the relationship between these acoustic features and the way that speech is perceived and experienced by humans. It also includes the detailed study of speech as a motor system, with a combined emphasis on the anatomy and physiology of speech production. Within the field of phonetics, linguists work side by side with acoustical engineers, experimental psychologists, computer scientists and biomedical researchers.

Phonology is a very different discipline, focused on the abstract representations that underlie speech in both perception and production, within and across human languages. For example, a phonologist may concentrate on the rules that govern the voiced/voiceless contrast in English grammar, e.g., the contrast between the unvoiced “-s” in “cats” and the voiced “-s” in “dogs”. This contrast in plural formation bears an uncanny resemblance to the voiced/unvoiced contrast in English past tense formation, e.g., the contrast between an unvoiced “-ed” in “walked” and a voiced “-ed” in “wagged”. Phonologists seek a maximally general set of rules or principles that can explain similarities of this sort, and generalize to new cases of word formation in a particular language. Hence phonology lies at the interface between phonetics and the other regularities that constitute a human language, one step removed from sound as a physical event.

Some have argued that phonology should not exist as a separate discipline, and that the generalizations discovered by phonologists will ultimately be explained entirely in physical and psychophysical terms. This tends to be the approach taken by emergentists. Others maintain that phonology is a completely independent level of analysis, whose laws cannot be reduced to any combination of physical events. Not surprisingly, this tends to be the approach taken by nativists, especially those who believe that language has its very own dedicated neural machinery. Regardless of one’s position on this debate, it is clear that phonetics and phonology are not the same thing. If we analyze speech sounds from a phonetic point of view, based on all the different sounds that a human speech apparatus can make, we come up with approximately 600 possible sound contrasts that languages could use (even more, if we use a really fine-grained system for categorizing sounds). And yet most human languages use no more than 40 contrasts to build words.

To illustrate this point, consider the following contrast between English and French. In English, the aspirated (or “breathy”) sound signalled by the letter “h-” is used phonologically, e.g., to signal the difference between “at” and “hat”. French speakers are perfectly capable of making these sounds, but the contrast created by the presence or absence of aspiration (“h-”) is not used to mark a systematic difference between words; instead, it is just a meaningless variation that occurs now and then in fluent speech, largely ignored by listeners. Similarly, the English language has a binary contrast between the sounds signalled by “d” and “t”, used to make systematic contrasts like “tune” and “dune.” The Thai language has both of these sounds, and in addition it has a third sound somewhere in between the English “t” and “d”. English speakers are able to produce that third sound; in fact, it is the normal way to pronounce the middle consonant in a word like “butter”. The difference is that Thai uses that third contrast phonologically (to make new words), but English only uses it phonetically, as a convenient way to pronounce target phonemes while hurrying from one word to another (also called “allophonic variation”). In our review of studies that focus on the processing, development and neural bases of speech sounds, it will be useful to distinguish between the phonetic approach and the phonological or phonemic approach.

Speech as Meaning: Semantics and the Lexicon

The study of linguistic meaning takes place within a subfield of linguistics called semantics. Semantics is also a subdiscipline within philosophy, where the relationship between meaning and formal logic is emphasized. Traditionally semantics can be divided into two areas: lexical semantics, focused on the meanings associated with individual lexical items (i.e., words), and propositional or relational semantics, focused on those relational meanings that we typically express with a whole sentence.

Lexical semantics has been studied by linguists from many different schools, ranging from the heavily descriptive work of lexicographers (i.e., “dictionary writers”) to theoretical research on lexical meaning and lexical form in widely different schools of formal linguistics and generative grammar (McCawley, 1993). Some of these theorists emphasize the intimate relationship between semantics and grammar, using a combination of lexical and propositional semantics to explain the various meanings that are codified in the grammar. This is the position taken by many theorists who take an emergentist approach to language, including specific schools with names like “cognitive grammar,” “generative semantics” and/or “linguistic functionalism”. Other theorists argue instead for the structural independence of semantics and grammar, a position associated with many of those who espouse a nativist approach to language.

Propositional semantics has been dominated primarily by philosophers of language, who are interested in the relationship between the logic that underlies natural language and the range of possible logical systems that have been uncovered in the last two centuries of research on formal reasoning. A proposition is defined as a statement that can be judged true or false. The internal structure of a proposition consists of a predicate and one or more arguments of that predicate. An argument is an entity or “thing” that we would like to make some point about. A one-place predicate is a state, activity or identity that we attribute to a single entity (e.g., we attribute beauty to Mary in the sentence “Mary is beautiful”, or we attribute “engineerness” to a particular individual in the sentence “John is an engineer”); an n-place predicate is a relationship that we attribute to two or more entities or things. For example, the verb “to kiss” is a two-place predicate, which establishes an asymmetric relationship of “kissing” between two entities in the sentence “John kisses Mary.” The verb “to give” is a three-place predicate that relates three entities in a proposition expressed by the sentence “John gives Mary a book.” Philosophers tend to worry about how to determine the truth or falsity of propositions, and how we convey (or hide) truth in natural language and/or in artificial languages. Linguists worry about how to characterize or taxonomize the propositional forms that are used in natural language. Psychologists tend instead to worry about the shape and nature of the mental representations that encode propositional knowledge, with developmental psychologists emphasizing the process by which children attain the ability to express this propositional knowledge. Across fields, those who take a nativist approach to the nature of human language tend to emphasize the independence of propositional or combinatorial meaning from the rules for combining words in the grammar; by contrast, the various emergentist schools tend to emphasize both the structural similarity and the causal relationship between propositional meanings and grammatical structure, suggesting that one grows out of the other.

How Sounds and Meanings Come Together: Grammar

The subfield of linguistics that studies how individual words and other sounds are combined to express meaning is called grammar. The study of grammar is traditionally divided into two parts: morphology and syntax.
4Morphology refers to the principles governing the kissed whom, nor are there any clues to transitivityconstruction of complex words and phrases, for lexical marked on the verb "kissed". The opposite is true inand/or grammatical purposes. This field is further Hungarian, which has an extremely rich morphologicaldivided into two subtypes: derivational morpho- system but a high degree of word order variability.logy and inflectional morphology. Sentences like “John kissed a girl” can be expressed inDerivational morphology deals with the almost every possible order in Hungarian, without lossconstruction of complex content words from simpler of meaning. components, e.g., derivation of the word “government” Some linguists have argued that this kind of wordfrom the verb “to govern” and the derivational order variation is only possible in a language with richmorpheme “-ment”. Some have argued that derivational morphological marking. For example, the Hungarianmorphology actually belongs within lexical semantics, language provides case suffixes on each noun thatand should not be treated within the grammar at all. unambiguously indicate who did what to whom,However, such an alignment between derivational together with special markers on the verb that agreemorphology and semantics describes a language like with the object in definiteness. Hence the HungarianEnglish better than it does richly inflected languages translation of our English example would be equivalentlike Greenlandic Eskimo, where a whole sentence may to “John-actor indefinite-girl-receiver-of-action kissed-consist of one word with many different derivational and indefinite). However, the Chinese language poses ainflectional morphemes. 
problem for this view: has no inflectionalInflectional morphology refers to modulations of markings of any kind (e.g., no case markers, no form ofword structure that have grammatical consequences, agreement), and yet it permits extensive word ordermodulations that are achieved by inflection (e.g., variation for stylistic purposes. As a result, Chineseadding an “-ed” to a verb to form the past tense, as in listeners have to rely entirely on probabilistic cues to"walked") or by suppletion (e.g., substituting the figure out "who did what to whom", including someirregular past tense “went” for the present tense “go”). combination of word order (i.e., some orders are moreSome linguists would also include within inflectional likely than others, even though many are possible) andmorphology the study of how free-standing function the semantic content of the sentence (e.g., boys arewords (like "have", "by", or "the", for example) are more likely to eat apples than vice-versa). In short, itadded to individual verbs or nouns to build up complex now seems clear that human languages have solved thisverb or noun phrases, e.g., the process that expands a mapping problem in a variety of ways. verb like “run” into “has been running” or the process Chomsky and his followers have defined Universalthat expands a noun like “dog” into a noun phrase like Grammar as the set of possible forms that the grammar“the dog” or prepositional phrase like “by the dog”. of a natural language can take. There are two ways ofSyntax is defined as the set of principles that looking at such universals: as the intersect of all humangovern how words and other morphemes are ordered to grammars (i.e., the set of structures that every languageform a possible sentence in a given language. 
For has to have) or as the union of all human grammarsexample, the syntax of English contains principles that (i.e., the set of possible structures from which eachexplain why “John kissed Mary” is a possible sentence language must choose). Chomsky has alwayswhile “John has Mary kissed” sounds quite strange. maintained that Universal Grammar is innate, in a formNote that both these sentences would be acceptable in that is idiosyncratic to language. That is, grammar doesGerman, so to some extent these rules and constraints not “look like” or behave like any other existingare arbitrary. Syntax may also contain principles that cognitive system. However, he has changed his minddescribe the relationship between different forms of the across the years on the way in which this innatesame sentence (e.g., the active sentence “John hit Bill” knowledge is realized in specific languages like Chineseand the passive form “Bill was hit by John”), and ways or French. In the early days of generative grammar, theto nest one sentence inside another (e.g., “The boy that search for universals revolved around the idea of awas hit by John hit Bill”). universal intersect. As the huge variations that existLanguages vary a great deal in the degree to which between languages became more and more obvious, andthey rely on syntax or morphology to express basic the intersect got smaller and smaller, Chomsky beganpropositional meanings. A particularly good example to shift his focus from the intersect to the union ofis the cross-linguistic variation we find in means of possible grammars. In essence, he now assumes thatexpressing a propositional relation called transitivity children are born with a set of innate options that define(loosely defined as “who did what to whom”). English how linguistic objects like nouns and verbs can be putuses word order as a regular and reliable cue to sentence together. 
The child doesn’t really learn grammar (in themeaning (e.g., in the sentence "John kissed a girl", we sense in which the child might learn chess). Instead,immediately know that "John" is the actor and "girl" is the linguistic environment serves as a “trigger” thatthe receiver of that action). At the same time, English selects some options and causes others to wither away.makes relatively little use of inflectional morphology to This process is called “parameter setting”. Parameterindicate transitivity or (for that matter) any other setting may resemble learning, in that it helps toimportant aspect of sentence meaning. For example, explain why languages look as different as they do andthere are no markers on "John" or "girl" to tell us who how children move toward their language-specific5targets. However, Chomsky and his followers are 1976). Pragmatics is not a well-defined discipline;convinced that parameter setting (choice from a large indeed, some have called it the wastebasket of linguisticstock of innate options) is not the same thing as theory. It includes the study of speech acts (alearning (acquiring a new structure that was never there taxonomy of the socially recognized acts ofbefore learning took place), and that learning in the communication that we carry out when we declare,latter sense plays a limited and perhaps rather trivial role command, question, baptize, curse, promise, marry,in the development of grammar. etc.), presuppositions (the background informationMany theorists disagree with this approach to that is necessary for a given speech act to work, e.g.,grammar, along the lines that we have already laid out. 
the subtext that underlies a pernicious question likeEmpiricists would argue that parameter setting really is “Have you stopped beating your wife?”), andnothing other than garden-variety learning (i.e., children conversational postulates (principles governingreally are taking new things in from the environment, conversation as a social activity, e.g., the set of signalsand not just selecting among innate options). that regulate turn-taking, and tacit knowledge of whetherEmergentists take yet another approach, somewhere in we have said too much or too little to make a particularbetween parameter setting and learning. Specifically, an point).emergentist would argue that some combinations of Pragmatics also contains the study of discourse.grammatical features are more convenient to process This includes the comparative study of discourse typesthan others. These facts about processing set limits on (e.g., how to construct a paragraph, a story, or a joke),the class of possible grammars: Some combinations and the study of text cohesion, i.e., the way we usework; some don’t. To offer an analogy, why is it that a individual linguistic devices like conjunctions (“and”,sparrow can fly but an emu cannot? Does the emu lack “so”), pronouns (“he”, “she”, “that one there”), definite“innate flying knowledge,” or does it simply lack a articles (“the” versus “a”) and even whole phrases orrelationship between weight and wingspan that is clauses (e.g., “The man that I told you about....”) to tiecrucial to the flying process? The same logic can be sentences together, differentiate between old and newapplied to grammar. For example, no language has a information, and maintain the identity of individualgrammatical rule in which we turn a statement into a elements from one part of a story to another (i.e.,question by running the statement backwards, e.g., coreference relations).It should be obvious that pragmatics is aJohn hit the ball” --> Ball the hit John? 
heterogeneous domain without firm boundaries.Chomsky would argue that such a rule does notAmong other things, mastery of linguistic pragmaticsexist because it is not contained within Universalentails a great deal of sociocultural information:Grammar. It could exist, but it doesn’t. Emergentistsinformation about feelings and internal states,would argue that such a rule does not exist because itknowledge of how the discourse looks from thewould be very hard to produce or understand sentences inlistener’s point of view, and the relationships of powerreal time by a forward-backward principle. It mightand intimacy between speakers that go into calculationswork for sentences that are three or four words long, butof how polite and/or how explicit we need to be inour memories would quickly fail beyond that point e.g.,trying to make a conversational point. Imagine aThe boy that kicked the girl hit the ball that Peter Martian that lands on earth with a complete knowledgebought --> of physics and mathematics, armed with computers thatcould break any possible code. Despite these powerfulBought Peter that ball the hit girl the kicked thattools, it would be impossible for the Martian to figureboy the?out why we use language the way we do, unless thatIn other words, the backward rule for questionMartian also has extensive knowledge of human societyformation doesn’t exist because it couldn’t exist, notand human emotions. For the same reason, this is onewith the kind of memory that we have to work with.area of language where social-emotional disabilitiesBoth approaches assume that grammars are the way theycould have a devastating effect on development (e.g.,are because of the way that the human brain is built.autistic children are especially bad on pragmatic tasks).The difference lies not in Nature vs. 
Nurture, but in the “nature of Nature,” i.e., whether this ability is built out of language-specific materials or put together from more general cognitive ingredients.

Language in a Social Context: Pragmatics and Discourse

The various subdisciplines that we have reviewed so far reflect one or more aspects of linguistic form, from sound to words to grammar. Pragmatics is defined as the study of language in context, a field within linguistics and philosophy that concentrates instead on language as a form of communication, a tool that we use to accomplish certain social ends (Bates,

Nevertheless, some linguists have tried to organize aspects of pragmatics into one or more independent “modules,” each with its own innate properties (Sperber & Wilson, 1986). As we shall see later, there has also been a recent effort within neurolinguistics to identify a specific neural locus for the pragmatic aspect of linguistic knowledge.

Now that we have a road map to the component parts of language, let us take a brief tour of each level, reviewing current knowledge of how information at that level is processed by adults, acquired by children, and mediated in the human brain.

II. SPEECH SOUNDS

How Speech is Processed by Normal Adults

The study of speech processing from a psychological perspective began in earnest after World War II, when instruments became available that permitted the detailed analysis of speech as a physical event. The most important of these for research purposes was the sound spectrograph. Unlike the more familiar oscilloscope, which displays the sound waveform (amplitude) over time, the spectrograph displays changes over time in the energy contained within different frequency bands (think of the vertical axis as a car radio, while the horizontal axis displays activity on every station over time). Figure 2 provides an example of a sound spectrogram for the sentence “Is language innate?”, one of the central questions in this field.

This kind of display proved useful not only because it permitted the visual analysis of speech sounds, but also because it became possible to “paint” artificial speech sounds and play them back to determine their effects on perception by a live human being. Initially scientists hoped that this device would form the basis of speech-reading systems for the deaf. All we would have to do (or so it seemed) would be to figure out the “alphabet”, i.e., the visual pattern that corresponds to each of the major phonemes in the language. By a similar argument, it should be possible to create computer systems that understand speech, so that we could simply walk up to a banking machine and tell it our password, the amount of money we want, and so forth. Unfortunately, it wasn’t that simple. As it turns out, there is no clean, isomorphic relation between the speech sounds that native speakers hear and the visual display produced by those sounds. Specifically, the relationship between speech signals and speech perception lacks two critical properties: linearity and invariance.

Linearity refers to the way that speech unfolds in time. If the speech signal had linearity, then there would be an isomorphic relation from left to right between speech-as-signal and speech-as-experience in the speech spectrogram. For example, consider the syllable “da” displayed in the artificial spectrogram in Figure 3. If the speech signal were linear, then the first part of this sound (the “d” component) should correspond to the first part of the spectrogram, and the second part (the “a” component) should correspond to the second part of the same spectrogram. However, if we play these two components separately to a native speaker, they don’t sound anything like two halves of “da”. The vowel sound does indeed sound like a vowel “a”, but the “d” component presented alone (with no vowel context) doesn’t sound like speech at all; it sounds more like the chirp of a small bird or a squeaking wheel on a rolling chair. It would appear that our experience of speech involves a certain amount of reordering and integration of the physical signal as it comes in, to create the unified perceptual experience that is so familiar to us all.

Invariance refers to the relationship between the signal and its perception across different contexts. Even though the signal lacks linearity, scientists once hoped that the same portion of the spectrogram that elicits the “d” experience in the context of “di” would also correspond to the “d” experience in the context of “du”. Alas, that has proven not to be the case. As Figure 3 shows, the component responsible for “d” looks entirely different depending on the vowel that follows. Worse still, the “d” component of the syllable “du” looks like the “g” component of the syllable “ga”. In fact, the shape of the visual pattern that corresponds to a constant sound can even vary with the pitch of the speaker’s voice, so that the “da” produced by a small child results in a very different-looking pattern from the “da” produced by a mature adult male.

These problems can be observed in clean, artificial speech stimuli. In fluent, connected speech the problems are even worse (see word perception, below). It seems that native speakers use many different parts of the context to break the speech code. No simple “bottom-up” system of rules is sufficient to accomplish this task. That is why we still don’t have speech readers for the deaf or computers that perceive fluent speech from many different listeners, even though such machines have existed in science fiction for decades.

The problem of speech perception got “curiouser and curiouser” as Lewis Carroll would say, leading a number of speech scientists in the 1960s to propose that humans accomplish speech perception via a special-purpose device unique to the human brain. For reasons that we will come to shortly, they were also persuaded that this “speech perception device” is innate, up and running in human babies as soon as they are born. It was also suggested that humans process these speech sounds not as acoustic events, but by testing the speech input against possible “motor templates” (i.e., versions of the same speech sound that the listener can produce for himself, a kind of “analysis by synthesis”). This idea, called the Motor Theory of Speech Perception, was offered to explain why the processing of speech is nonlinear and lacks invariance from an acoustic point of view, and why only humans (or so it was believed) are able to perceive speech at all.

For a variety of reasons (some discussed below) this hypothesis has fallen on hard times. Today we find a large number of speech scientists returning to the idea that speech is an acoustic event after all, albeit a very complicated one that is hard to understand by looking at speech spectrograms like the ones in Figures 2-3. For one thing, researchers using a particular type of computational device called a “neural network” have shown that the basic units of speech can be learned after all, even by a rather stupid machine with access to nothing other than raw acoustic speech input (i.e., no “motor templates” to fit against the signal). So the ability to perceive these units does not have to be innate; it can be learned. This brings us to the next point: how speech develops.
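For readers who want a concrete picture of the spectrographic display discussed in this section (energy within different frequency bands, tracked over time), the following is a minimal digital sketch in Python. It is not the analog sound spectrograph itself, and it uses synthetic tones instead of speech. The sample rate, the frame size, and all function names (make_tone, band_energies, spectrogram, dominant_frequency) are assumptions introduced purely for illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed value for this toy example)
FRAME_SIZE = 200    # samples per analysis frame, giving bands 40 Hz wide


def make_tone(freq_hz, n_samples):
    """Return n_samples of a pure sine tone at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(n_samples)]


def band_energies(frame):
    """Energy in each frequency band of one frame (naive discrete Fourier transform)."""
    n = len(frame)
    energies = []
    for k in range(n // 2):  # bands from 0 Hz up to half the sample rate
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        energies.append(abs(coeff) ** 2)
    return energies


def spectrogram(signal):
    """Split the signal into consecutive frames; return one energy list per frame.

    Each inner list is one vertical slice of the display (the frequency axis);
    the outer list runs along the horizontal axis (time).
    """
    starts = range(0, len(signal) - FRAME_SIZE + 1, FRAME_SIZE)
    return [band_energies(signal[s:s + FRAME_SIZE]) for s in starts]


def dominant_frequency(energies):
    """Centre frequency (Hz) of the band holding the most energy."""
    k = max(range(len(energies)), key=energies.__getitem__)
    return k * SAMPLE_RATE / FRAME_SIZE


if __name__ == "__main__":
    # A 400 Hz tone followed by a 1200 Hz tone, 0.1 seconds each.
    signal = make_tone(400, 800) + make_tone(1200, 800)
    for i, frame in enumerate(spectrogram(signal)):
        print(i, dominant_frequency(frame))
```

With these two steady tones, the strongest band in each frame moves from the 400 Hz band to the 1200 Hz band halfway through the signal, a clean one-to-one mapping between signal and display. Real speech is exactly the case in which, as described above, no such clean mapping between the display and the perceived sounds holds.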