Infants able to identify humans as source of speech, monkeys as source of monkey calls

October 19th, 2009 in Medicine & Health / Psychology & Psychiatry

Infants as young as five months old are able to correctly identify humans as the source of speech and monkeys as the source of monkey calls, psychology researchers have found. Their finding, which appears in the latest issue of the Proceedings of the National Academy of Sciences (PNAS), provides the first evidence that human infants are able to correctly match different kinds of vocalizations to different species.

The study's co-authors were Athena Vouloumanos, an assistant professor in New York University's Department of Psychology; Madelynn Druhen, a doctoral candidate in the Department of Psychology at the University of North Carolina at Greensboro; Marc Hauser, a professor in Harvard University's Departments of Psychology and Human Evolutionary Biology; and Anouk Huizink, a researcher in McGill University's Department of Psychology. The research was conducted at the McGill Infant Development Centre and the NYU Infant Cognition and Communication Lab, under the direction of Vouloumanos.

While young children know that humans speak, monkeys grunt, and ducks quack, it's not clear when we come to know which vocalizations each of these animals produces. Although much is known about infants' abilities to match properties of human voices, such as emotion, to faces, it is unknown whether infants are able to match vocalizations to the specific species that produces them. In the PNAS study, the team of psychologists explored this question by asking whether young infants expect humans, but not other animals, to produce speech, and also whether infants can identify the sources of vocalizations produced by other species.

To do so, the researchers showed five-month-old infants from English- and French-speaking homes a sequence of individually presented pictures of human faces and rhesus monkey faces paired either with human speech or with rhesus vocalizations. They then examined whether infants preferentially attended to the human faces when human vocalizations (two single Japanese words, "nasu" and "haiiro") were presented, and whether infants preferentially attended to the rhesus faces when rhesus vocalizations (a coo and a gekker call) were presented. Previous research has revealed that when presented with audiovisual stimuli, infants tend to look longer when the sounds and images correctly match, so the researchers predicted that if infants identified the sources of vocalizations, they would look longer when the vocalizations and faces matched.

As the researchers had predicted, the results showed that the infants looked longer at the pictures of human faces when human speech was presented and looked longer at pictures of rhesus monkey faces when rhesus vocalizations were presented. Surprisingly, however, infants weren't able to match human-produced non-speech vocalizations, like laughter, to humans, suggesting that infants are especially tuned at an early age to some of the functional properties of speech. The fact that infants were able to correctly attribute even unfamiliar Japanese speech to humans bolstered the significance of the results.

However, a subsequent experiment designed to test infants' ability to identify non-human vocalizations revealed the limits of their recognition. The infants were given three acoustic stimuli (human speech, rhesus monkey calls, or duck calls) in tandem with the faces of humans and ducks. Unlike in the initial experiment on human and rhesus monkey images and sounds, the infants did not look systematically longer at the duck face when it was presented with a duck vocalization, suggesting an inability to match ducks' faces with their sounds.

Infants' expectations about the sources of vocalizations seem not to be based on a simple association between faces and voices and extend beyond their specific experiences, the researchers concluded. This ability may help infants identify their conspecifics even when they are out of view and allow them to identify the human-produced speech sounds that are relevant for language acquisition.