The fact that infants learn language without explicit instruction can seem almost miraculous. Not only do children learn to speak and understand language largely on their own, but active teaching of language skills seems to make almost no difference in their ability to talk.

One of the first difficulties when learning a language solely from listening to spoken language is determining where one word ends and the next one begins. Native speakers of a language typically leave no audible space between words at all. Even “motherese” doesn’t leave any space between words — if anything the spaces are diminished: “issntdatacutewittlebaby!”

So how do babies learn where one word ends and the next begins? A group of researchers including Luca Bonatti, Marina Nespor, Jacques Mehler, and Juan Toro believes it has identified a key pattern that works in a wide range of languages: language learners look to patterns in the consonants for information about where words start and end, and to the vowels to understand the role of words in a sentence. The first part of this explanation was explored in 2005. Their newest paper, led by Toro, considers the second part of the problem. How did they do it? They invented a “language” governed by a couple of very simple rules. See if you can figure out the rules by looking at the list of “words” below:

Get it? To be a “word” in this language, you must have three syllables, the first and last syllables must have the same vowel sound, and the consonants must follow the t-p-n pattern or the b-d-k pattern.
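Assuming the two-letter consonant-vowel syllables used in all the examples here, the three rules can be sketched as a short Python check. This is only an illustration of the rules as described, not the researchers’ materials or code:

```python
def syllables(word):
    """Split a CV-syllable word like 'tepane' into ['te', 'pa', 'ne']."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]

def is_word(w, frames=("tpn", "bdk")):
    """Apply the three rules: exactly three CV syllables, an allowed
    consonant frame, and matching first and last vowels."""
    sylls = syllables(w)
    if len(sylls) != 3 or any(len(s) != 2 for s in sylls):
        return False
    consonants = "".join(s[0] for s in sylls)
    vowels = [s[1] for s in sylls]
    return consonants in frames and vowels[0] == vowels[2]
```

By these rules, is_word("tepane") returns True, while is_word("penabe") fails the consonant check and returns False.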

Paid Italian-speaking volunteers listened to a 10-minute recording of this language, but in the recording, each word was separated by one or two syllables randomly selected from the possible syllables in the language, like this:

tapenabepobedakenotopanokadebadoka
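Under the same assumption of two-letter CV syllables, the consonant-frame idea can even segment a continuous stream like the one above mechanically. This is a toy sketch of the cue, not a model of what listeners actually do:

```python
def segment(stream, frames=("tpn", "bdk")):
    """Scan a continuous stream of two-letter CV syllables and pull out
    any three-syllable span whose consonant skeleton matches an allowed
    frame and whose first and last vowels agree."""
    sylls = [stream[i:i + 2] for i in range(0, len(stream), 2)]
    words, i = [], 0
    while i <= len(sylls) - 3:
        span = sylls[i:i + 3]
        if ("".join(s[0] for s in span) in frames
                and span[0][1] == span[2][1]):
            words.append("".join(span))
            i += 3  # jump past the word we just found
        else:
            i += 1  # no frame starts here; slide one syllable forward
    return words
```

Run on the stream above, this recovers “tapena,” “bedake,” “topano,” and “badoka,” skipping over the randomly inserted filler syllables.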

Then the listeners were tested on a new set of items to see if they had learned the 12 words in the language. They were given complete words like “tepane” or part-words like “penabe” — syllable sequences that span a boundary between words. Even though “penabe” follows the vowel pattern correctly (its first and last vowels match), it doesn’t have the correct consonant pattern, so the correct answer is “no.”

Respondents were an average of 63 percent correct on this test. They were also tested on whether they could generalize the pattern to other vowel sounds: “biduki” would be a word, but “biduku” wouldn’t. Respondents were 67 percent correct on this test — in both cases, significantly better than random chance.

In a second experiment, the roles of the consonants and vowels were reversed: words had to have an a-u-E or i-e-o vowel pattern, and the first and last syllables shared the same consonant sound. This time, listeners couldn’t accurately identify whole versus partial words, and they couldn’t generalize the rule to other consonant sounds.
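For comparison, the role-reversed rule of this second experiment can be sketched the same way. The example words “tipeto” and “tipeno” are invented illustrations of an i-e-o item, not items from the study:

```python
def is_word_reversed(w, vowel_frames=("auE", "ieo")):
    """Role-reversed rule: the vowels must follow an allowed frame,
    and the first and last syllables must share the same consonant."""
    sylls = [w[i:i + 2] for i in range(0, len(w), 2)]
    if len(sylls) != 3 or any(len(s) != 2 for s in sylls):
        return False
    vowels = "".join(s[1] for s in sylls)
    consonants = [s[0] for s in sylls]
    return vowels in vowel_frames and consonants[0] == consonants[2]
```

Here “tipeto” passes (vowels i-e-o, consonants t…t), while “tipeno” fails because its first and last consonants differ.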

But maybe the reason they couldn’t generalize was simply because they couldn’t identify the words to begin with. So in a third experiment, the researchers added short pauses after each word. Now listeners were accurate in identifying words they had heard before, but still couldn’t generalize to other consonant sounds.

In a final experiment, the consonant pattern was made even easier to recognize. All real words repeated the same consonant sound three times: “bibebo” was a word, but “binebo” was not. Again there were pauses after each word. As before, listeners could distinguish whole words from partial words, but couldn’t generalize the pattern to other consonant sounds.

The researchers conclude that listeners look to vowels and consonants for different types of information. This basic pattern can help language learners begin to understand what a word is, and eventually to assign a meaning to that word and understand its role in a phrase.

Finally, I couldn’t do better than the authors’ own explanatory figure for the study, so I’m duplicating it here:

Comments

Native speakers of a language typically leave no audible space between words at all. Even “motherese” doesn’t leave any space between words — if anything the spaces are diminished: “issntdatacutewittlebaby!”

Is this true? My subjective impression is that child-directed speech often involves explicitly segmenting words, particularly nouns. Take an instance where a parent points at a dog and says “Dog!” or “Doggy!”, often repeating. I’m not familiar with CDS corpus studies, but is there actually any evidence regarding how much of CDS is segmented versus unsegmented?

A group of researchers is attempting to teach language to iCub, a robot. Their approach is to work with language-development specialists who study how parents teach children to speak. The iCub is supposed to learn in a way that is closer to the human experience.

In your article, however, you state that children learn without the help of an adult: just by discerning vowels and consonants, a human can eventually learn a new language. So, is the ability to speak and understand a language innate? And does this mean that a robot without this built-in language skill cannot learn how to speak?

Derek,
From my experience, it depends on the purpose of the interaction. When you’re consciously trying to teach a child a word, you’ll slow down like you describe. But most of the time, people interacting with a child will get all high-pitched and squeaky and their words will bunch together.

@June: The idea that the human capacity for language is innate (or, “the Innateness Hypothesis”) isn’t exactly new. The idea is that any human child has the capacity to learn a language as long as they have some contact with that language. In fact, case studies show that two isolated children with no outside human contact will spontaneously create a language.

The biggest mysteries are how exactly this is done — what clues we’re keyed to listen for in a string of speech.