More Than Words Can Say

The Language of Gestures, as Translated by U. of C. Psycholinguist David McNeill

When Warner Brothers released the seven-minute cartoon Canary Row in 1950, it's a good bet no one realized they'd created an important tool in the study of human communication and cognition.

Sylvester the cat spies Tweety Bird in the upper-story window of a hotel. Sylvester can't just walk in, so he tries climbing up the downspout. Granny, Tweety Bird's protector, chases him down, so he tries climbing up the inside of the downspout, whereupon Tweety Bird drops a bowling ball down it. (With impeccable cartoon logic, Sylvester swallows the ball and then rolls into a bowling alley.) He masquerades as an organ-grinder's monkey and then as a bellboy, but Granny sees through his disguises and beats him with her umbrella. He swings over to Tweety Bird's window on a rope, and slams into a brick wall. Finally he tries crossing on the electric trolley wires over the street--only to be overtaken by a trolley driven by Tweety Bird and Granny, who chase him out of sight, sparks flying.

Since about 1980, Canary Row has taken on a second life in David McNeill's University of Chicago psycholinguistics laboratory. McNeill and a succession of graduate students have used it to study how and why people gesture as they talk, to tease out the hidden mental processes behind speech and gesture, and ultimately to question the popular model of the human mind as nothing more than a very complex computer.

Over the years, McNeill and his colleagues and students (among them Nancy Dray, Elena Levy, Justine Cassell, Laura Pedelty, and Karl-Erik McCullough) have shown Canary Row to graduate students, children aged 2 1/2 to 12, victims of aphasia, deaf people, "split-brain" patients, and speakers of languages as diverse as Swahili, Georgian, Chinese, and Japanese. Afterward, each viewer is asked to tell the story to a second person who has not seen the cartoon, in such a way that the second person can tell it again if he or she wants (usually, for some reason, he or she has no such desire). "We got all these narratives from adults on videotape," says McNeill, "and they were fascinating. It's taken me all this time to understand them." The University of Chicago Press will publish his results next year in a book tentatively entitled "Hand and Mind."

By examining these videotapes closely--at key points, frame by frame--McNeill has found that the gestures we spontaneously make in conversation are full of meaning. They are not "body language," unconnected with speech. They are not ethnic hangovers. They are not emotional outlets for overexcited storytellers. They are not a crutch for the inarticulate--nor a simple translation of the spoken word. McNeill contends, and his tapes show, that gestures and speech jointly form a single mode of expression stemming from the same underlying mental process. Together they are windows on the mind.

McNeill first began noticing gestures in 1962, as a young psychologist studying how children acquire language. When he moved from Berkeley to Harvard, he met two new academic colleagues whose gestures "looked to me like sculptors working in different media. One was always pounding and pushing some heavy blocklike stuff. I imagined that his medium was clay or marble. The other was drawing out and weaving some incredibly delicate spidery stuff. His medium looked like strings or spiderwebs."

But it wasn't until 1974 that McNeill began to take gestures seriously, when his four-year-old son started to use them in conversation for the first time: "Watching him got me thinking that there might be some kind of nexus between language and gestures that was actually quite deep."

That year McNeill was "surrounded by mathematicians" at Princeton's Institute for Advanced Study. "Mathematicians are interested in everything. My theory is that that's because you can't do mathematics for more than about two hours a day." At any rate, some of them became interested in the topic, and McNeill videotaped two of his mathematician friends having a technical discussion. He had the whole thing transcribed--writing down both speech and gestures and the timing of each gesture in relation to speech--and examined the tape again and again. "But I didn't know what to do with it," he says.

Spontaneous conversational gestures are vastly different from spoken language. Gestures are often images of reality--when you tell how Sylvester spots Tweety through a pair of binoculars, you might also peer through O-shaped hands in front of your eyes. Gestures are idiosyncratic--you might make the "binoculars" with your whole hand, and I might use just index finger and thumb, but neither of us is "wrong" in the way we would be if we said, "Binoculars Tweety ribbet Sylvester spies." Gestures don't combine into larger units the way words do; while language is analytic--putting the scene together out of pieces ("Sylvester," "Tweety," "binoculars," "through")--gestures are synthetic. One motion tells all.

On the other hand, under certain circumstances hand motions can be a language. The most familiar example is American Sign Language (ASL), but there are others, like the sign language used by women of the Warlpiri people in north central Australia, and sign languages invented by deaf children of parents who don't know any sign language (the subject of research by McNeill's colleague in psychology, Susan Goldin-Meadow).

Superficially ASL speakers might seem to be gesturing, but ASL is a complete language in which individual words combine into sentences according to fixed rules shared by a community of speakers. McNeill, citing other researchers, gives one striking example of how ASL differs from spontaneous conversational gestures: "The sign for 'slow' is made by moving one hand across the back of the other hand. The movement is performed slowly, and thus iconically demonstrates slowness. However, when the sign is modified to be 'very slow' it is made more rapidly." No one who described Sylvester sneaking very slowly up the stairs would spontaneously add a fast-moving gesture.

In between conversational gestures and true sign languages are various intriguing semilinguistic phenomena, among them pantomime and "emblems"--conventional gesticulations not usually accompanied by speech, such as "the finger" and the thumb-and-finger circle for "OK." Emblems have proven to be longer-lived than many languages. "While no spoken languages have lasted unchanged since Roman times," writes McNeill, "a number of emblem gestures are Roman and some are older than that. We apparently would have no difficulty making ourselves understood to a legionnaire if all we wanted was to say OK or offer a sexual insult."

McNeill describes all these other types of hand motions in order to move them out of the way so he can get on with spontaneous conversational gestures, which he classifies into four main types: Iconic gestures create concrete images that look like what they portray (binoculars, or Sylvester going up the pipe). Metaphoric gestures depict images of abstractions. One common to Western culture is the "conduit" gesture, which presents an idea or immaterial thing as a container. For instance, on McNeill's tapes, speakers typically extend both hands as if to offer the listener a large bowl while saying, "It was a Sylvester and Tweety cartoon," or "The next development in the plot is . . ." Deictic gestures point at things. And beats--physically the simplest gestures, often just the flip of a hand or jerk of a head--mark points of contrast in the narrative, or time-outs for added ("metanarrative") information.

McNeill reminds us that, as different as gesture and speech may seem, they occur together. Listeners rarely make gestures. "In about 100 hours of recorded narratives, only one gesture was made by a listener." Gesture and language express the same or similar meanings, they develop at the same time in children, and they decay together in aphasics. And because these two different ways of thinking are shoehorned together in the same process, says McNeill, they constantly play off each other and give human thought its dynamic, evolving quality.

"He's an absolutely wonderful, inspiring professor," says Justine Cassell, who was a graduate student under McNeill in the 1989-90 school year and who now teaches French, linguistics, and psychology at Penn State. "He is exceptional in his ability to create a nonhierarchical intellectual atmosphere." She was the one who nominated him for the Burlington Northern Foundation Faculty Achievement Award for graduate teaching, which he received last spring.

I asked Cassell if the study of gestures ever made her self-conscious about her own. Not really, she said, but she knows it has made others uncomfortable. At one conference where she presented a paper she authored with McNeill, every speaker who came after her prefaced his or her talk by saying, "I don't want anyone to look at my hands." McNeill contends that, contrary to conventional wisdom, people in academia make more, and more elaborate, gestures than others do. He writes wryly in his book of linguists who "create vast structures of conduit metaphoric gestures": "Often have I regretted not having video equipment with me at the talks of colleagues who so firmly believe that words, phrases, and sentences are the only substantive parts of languages." One such colleague spoke twice at the University of Chicago; on her second visit, McNeill asked if he could videotape her lecture. She declined. Says Cassell, "People still have some childhood sense that they might be giving something away."

It's not anything they need to worry about. McNeill writes about "mind reading" through gestures, but he adds, "I do not mean anything occult. . . . I mean noticing the gestures with which speakers unwittingly reveal aspects of their inner mental processes. . . . We can discover, for each person, what was highlighted, what was relevant and what not, and from this infer the imagistic side of their utterances." Different speakers, for instance, have different perspectives on the episode in which Sylvester climbs up the inside of the downspout. One speaker's hand moves upward with wiggling fingers to emphasize climbing; another contrasts this episode with the one before by making a "basket hand," fingers curled up, to emphasize that he's going up the inside this time. Evidently the first speaker was thinking of Sylvester's climbing as opposed to walking in the front door and up the stairs; the second speaker thought of his climbing up inside the pipe as opposed to outside. Nothing too embarrassing about those revelations--unless you're committed to the idea that gestures are irrelevant.

The hypothesis of "Hand and Mind," McNeill writes, "is that gesture and speech arise from a single process of utterance formation. The utterance has both an imagistic side and a linguistic side. The image arises first and is transformed into a complex structure in which both the gesture and the linguistic structure are integral parts."

Nonpsychologists may not realize the strength of the tide against which McNeill is swimming when he says this. Ever since the decline of behaviorism in the early 1960s (its hegemony had long kept scientists from thinking much about mental processes), most psychologists have worked from a computerlike model of the human mind. They believe that, though it may be difficult in practice, in theory all complex thoughts can be described in terms of simpler ones, and those in turn broken down into still simpler ones. For instance, they would break down the sentence, "And Sylvester goes up through the pipe this time," and its accompanying basket-hand gesture, into: (1) Who is the actor? Sylvester. (2) What is the action? Climbing. (3) Where is the action? Inside the drainpipe. (4) Check memory for previous climbing attempts. (5) Compare previous attempts with present one. (6) Compute appropriate gesture. Each of these six steps in turn could be broken down into still simpler operations, until we finally reached the irreducible "atoms" of speech and thought.

In this information-processing (IP) view, the way people think and speak can be described in flow charts made up of simple inputs, simple operations, and simple outputs, repeated as often as necessary. No one step stands out as more important than any other; as in a computer, there are simply "bits" of information that can be arranged in a variety of ways, depending on the use to which they're being put. This idea has appealed to philosophers and psychologists ever since ancient Greece, and it is even more seductive today, now that difficult calculations and games such as chess have been reduced by computer to enormously complex chains of on-off switches.

But McNeill, following the Russian developmental psychologist Lev Vygotsky, doesn't buy it. Instead of breaking down human utterances into ever simpler components, he prefers to break them down only into the smallest components that share the properties of the whole, or in other words, that are capable of growth from simple to complex forms. Let's take "And Sylvester goes up through the pipe this time" and its gesture once more. McNeill doesn't analyze this utterance at all in the way IP psychologists do. Instead, he seeks out the part of the utterance that offers the greatest contrast with its context, or the most new information--the point described variously by McNeill and others as the "growth point," the "psychological predicate," or the point of greatest "communicative dynamism." This is the point at which the speaker is most likely to gesture.

The "growth point" of this sentence, says McNeill, is located at the emphasized word "through." McNeill claims that as you prepare to say this sentence and make the accompanying basket-hand gesture, your mind is not assembling the elements IP-style, as though it were hooking a toy train together. He says the final utterance grows, like a living thing, from the unformed notion that Sylvester was going up through the pipe this time and not up the outside of it. This notion is a hybrid of words and images, and it germinates--at great speed and below the level of conscious thought--into a full-blown utterance in which the speaker says, "And Sylvester goes up through the pipe this time," raising her hand at "up" and making the basket-hand gesture at "through."

In McNeill's view, the comparison of inside to outside is not just one bit of information among many. The contrast is essential. "New thoughts always come in opposition to the speaker's internal mental context," he says. If Sylvester always climbed up inside the pipe and Tweety Bird threw something different down at him each time, then the "growth points" of the speakers' utterances would change.

But true devotees of information processing won't give up so easily. It seems somehow unscientific to have a hybrid gesture-and-speech unit at the heart of every utterance. Surely McNeill is just being stubborn, they might say. If we work hard enough, we can analyze a gesture or a "growth point" into smaller components. What business does McNeill, or Vygotsky, or anyone else have trying to make us stop our analysis at some arbitrary level?

McNeill's reply is as scientific as they come: what he sees on the tapes can be explained better his way than the IP way. One factor is the timing of gestures. Regardless of what language is being spoken, the word and the gesture always occur together. If they began as a single hybrid unit, this makes sense; if the IP view is true, it has to explain how and (especially) why gesture and speech are synchronized. In fact, McNeill points out, the speaker prepares her gesture (usually by raising hand and arm) a syllable or two before actually making the word-gesture utterance. If the gesture were the last thing computed, as in the simple six-step breakdown suggested above, then it would be quite odd that in actual speech it comes first--being prepared even before the relevant word comes out.

McNeill has another reason for not analyzing growth points IP-style, one that has to do with the very nature of gestures themselves. "The gesture presents a whole image at once, of Sylvester going up the inside. If it were in fact computed from elements, then its global imagery too would have to be computed." If IP were true, you would expect gestures to be linear and segmented, like language. You would have one gesture for Sylvester going up and another for being inside the pipe, just as we have separate words and phrases to express these ideas. Instead, as the tapes show, a single gesture presents the whole image at once. To McNeill this is strong evidence that the original growth point includes an irreducible image as well as a verbal component.

McNeill is no dogmatist, however. He acknowledges that information-processing theories may be accurate about how to analyze speech within some contexts. Both schools of thought might be correct on different levels. (This sounds stranger than it is: analogously you could "analyze" a human being into either a fertilized ovum or a collection of molecules, atoms, and quarks. Both analyses would be true.)

Oddly enough, says McNeill, his natural intellectual allies are not psychologists or linguists but embryologists--because they too study how entities are able to grow and change on their own. "Most psychologists are terrified of this way of thinking," says McNeill. "They like to think in static terms, not where instability is part of the essence."

And that's why the study of gestures is more than just a charming backwater of psychological research. In McNeill's view it's a strategic battleground between two vastly different explanations of how people think. If he's right, and gestures and speech do stem from a single primitive utterance with facets of both, then that primitive component looks a lot more like a Vygotskyan growth point than it does an on-off switch.

But do gestures mean anything at all? It's not obvious that they do: a Dutch psychologist in 1989 published an entire 500-page book on speech and mentioned gesture only once, in passing.

The best evidence that gestures do carry meaning comes from a "mismatched gesture" experiment devised by McNeill. He had a storyteller recount Canary Row in the normal way but add just a few gestures that were either inappropriate or just extra. For instance, the storyteller said that Sylvester "went down the street" while gesturing in an up-and-down fashion. Sure enough, when the listener retold the story from that version, she said that Sylvester "bounced" down the street! Not only did the original gesture carry a meaning beyond what was said, but in her own rendition the listener transferred that meaning from the gesture "channel" to the speech "channel."

The mismatch experiment also casts doubt on the commonsense idea that gestures simply translate into another form the sentences or phrases they accompany. McNeill records a case (one of many similar) where the speaker, telling about the time Granny whacks Sylvester with her umbrella and drives him off, says "and she chases him out again" while gesturing as if swinging an object through the air. Clearly the gesture contains information that's not in the sentence, so it is not just a translation of the words. (This example, incidentally, shows why McNeill adopted the seemingly cumbersome technique of having people retell a cartoon story: this way he already knows the story, so he can tell when a gesture represents real information and not just the speaker's personal quirk.)

Gestures also supply meanings that English doesn't provide. For instance, near the end of Canary Row, Sylvester is running along the trolley-car electric lines strung above the street. Granny and Tweety chase him in the trolley car, and every time the metal connector from the car to the overhead line hits Sylvester's foot, he gets a shock and leaps forward into the air. There is no handy English verb for this, so when one speaker said, "and he [Sylvester] steps on the part [of the trolley lines] where the streetcar's connecting," he gestured by bringing his flat open left palm into contact with his upward-extended right index finger several times. Why put it into tortuous words when you can make a picture?

The most dramatic proof that gesture doesn't simply translate speech occurs when a speaker tries to retell Canary Row under conditions of "delayed auditory feedback" (DAF). The speaker wears headphones, and through them hears his or her own voice from about 0.2 seconds earlier. Not surprisingly, this provokes involuntary hesitations and stutterings. (DAF is so effective, McNeill notes, that it can be used to determine whether someone is faking deafness.) The speaker, unable to speak as elaborately or as coherently as usual, winds up making more gestures than usual--transferring more meaning into the gesture channel--and simplifying his or her speech. In one extreme case, a speaker who normally made few gestures and spoke in long, complex sentences began to gesture far more and to speak in much simpler sentences under DAF. Obviously the gestures weren't just translating the sentences, since the gestures now contained more information.

In the late 1930s David Efron, a student of pioneer anthropologist Franz Boas, attempted the first scientific study of gestures. Efron compared the gestures of first- and second-generation Jewish and Italian immigrants, and found that the more assimilated members of the second generation wound up gesturing pretty much like other Americans and unlike the first generation.

Efron didn't try to study what gestures meant in relation to speech, or how they synchronized with it. Until videotape became common, it's hard to see how anyone could have. Gestures come and go so quickly that no sketch or diagram made on the spot can adequately display their shapes and timing. But videotapes can be replayed as often as needed and transcribed at leisure. (For his research, McNeill has his VCRs modified in a simple but expensive way so that they reproduce sound as well as pictures in slow motion.)

But when it comes to publishing his results, McNeill is no better off than Efron was half a century ago. The printed word and static pictures just don't capture the fluidity of gesture and speech in action. "I'm waiting for the day when books are all done in software form," he says, envisioning elaborate computer "windows" in which the reader can observe gestures and speech while pondering the printed commentary on them.

After making his case in "Hand and Mind" that gestures and speech are equal partners in communication, McNeill goes one step further and asks why people make gestures at all. Answer: it's part of how we think. Gestures "do not just reflect thought," he writes, "but have an impact on thought. Gestures, together with language, help constitute thought."

"I think information-processing theory has blinded people from seeing this," says McNeill. "The crucial unit combines visual-spatial imagery and sociocultural language. We can form mental units that are both. That's what makes us unique among species."

Thinking and speaking both bring together the personal, intuitive, imagistic, right-brain gesture channel with the socially standardized, rational, analytic, left-brain speech channel. "At the final stage, the gesture stroke and speech are integrated into a single performance," McNeill writes. "This is an act of communication, but also is an act of thought. Not only the listener but the speaker is affected. That is, the speaker realizes his or her meaning only at the final moment of synthesis."

Art accompanying story in printed newspaper (not available in this archive): photos/Charles Eshelman.