Viterbi Voice for Kids Interface Wins IEEE Signals Prize

Three-part project teaches machines to understand children's speech

Creating a system that lets children talk to computers instead of
using conventional mouse or keyboard controls won the 2005 IEEE
Signal Processing Society "Best Paper" Award for the USC Viterbi School
of Engineering's Shrikanth Narayanan and a collaborator.

Agent Chimp understands children — or at least some of the things they say

Narayanan, who holds appointments
in the Viterbi School's departments of electrical engineering and
computer science, as well as the USC college department of linguistics,
based the paper on research done in 2000-01 with his co-author,
Alexandros Potamianos of the University of Crete.

The award will be presented at the annual meeting of the 16,000-member
society, to be held this year May 14-19 in Toulouse, France.

The paper has three parts. First, it describes
at length the particular problems of creating systems that can
recognize children's speech, which is acoustically quite distinct from
that of adults. Children also have much wider variation in their
pronunciation of words than do adults, creating additional
difficulties.

The bottom line was that standard methods for Automatic Speech
Recognition (ASR) did four times worse on children's speech than on adults'.
However, special adjustments made by Narayanan and Potamianos were able
to bridge the gap and bring the error rate down into the standard adult
ranges.

But is voice control a useful and effective technique for children? The
next part of the study was a controlled "Wizard of Oz" setup in
which children played a well-known educational game (Where in the USA
is Carmen Sandiego). Half of the children used the standard mouse and keyboard
techniques. The other half spoke their commands and choices, which an
unseen human observer ("the Wizard") then executed.

When the children were quizzed afterward on how they liked playing by voice
versus mouse, an overwhelming number loved it: "Ninety-three percent rated
the interface 4 or 5."

The final element in the paper describes how the researchers built an interface for a
simple game using ASR. The prototype was a program that prompts children to play a spelling
game, while also casually interacting with them and offering praise. The character was Agent Chimp,
and while the game was elementary, it was effective in holding the attention of the eight small children (ages 8-14) who played.

Narayanan: "These
ideas will be used in some of the advanced virtual learning
environments that we are trying to create presently at USC." (photo Abigail Kaun)

"Overall, the prototype represents a successful first effort at
building a multimodal system for children with an emphasis on
conversational speech," concluded the authors at the time. "We expect
the data from such prototypes will help further conversational
human-machine interaction."

In fact, according to Narayanan, this is happening: "Some of the work
in this paper serves as a basis for a current project on automated
literacy assessment [for young children] funded by NSF, and we are
hoping that some of these ideas will be used in some of the
advanced virtual learning environments that we are trying to
create presently at USC," he said.

NSF funded the research described in the paper.

The "Best Paper" honor is only the latest distinction for Narayanan, who
was named a fellow of the Acoustical Society of America in November 2005.
A lengthening page of media stories chronicles the youthful investigator's
efforts in such fields as voice-to-voice speech translation, laughter
analysis and production, and answering systems that detect irritation
in callers' voices.