AT&T Researchers — Inventing the Science Behind the Service

Each year, AT&T Research through its AT&T Labs Fellowship Program (ALFP) offers three-year fellowships to outstanding under-represented minority and women students pursuing PhD studies in computing and communications-related fields. This year three students received fellowships, including Marcelo Worsley.

Marcelo Worsley’s interest in technology is really about how people experience technology and, more specifically, how technology can better help people learn.

Only 25, he already has years of experience helping young people learn and interact with technology.

Since high school he’s tutored and coached kids through programs such as MATHCOUNTS and Intel Computer Clubhouse. As an undergraduate, he went further, recruiting others to set up a distance-tutoring program that used video conferencing to deliver medical education to students in Africa (the program also raised more than $200,000 in medical equipment for Ethiopian schools). Education is something he cares about, and he unstintingly volunteers his time.

He’s currently pursuing an education PhD at Stanford in Learning Sciences and Technology Design. This interdisciplinary program combines technological and psychological methods to understand how people learn and to design technologies that support learning in the classroom. A key component is being able to capture clues to the learning process.

Marcelo’s summer project at AT&T Research centers on multimodal interfaces that incorporate speech and gestures to make it easier and more natural for people to interact with technology. The idea is for devices to capture and interpret people’s natural ways of communicating with one another. Someone using a multimodal interface should be able to point at a map while saying “this is where you need to be” and have the interface associate the gesture with the utterance and interpret both inputs together.

Futuristic as that seems, natural interfaces may not be that far off. Low-cost sensors for capturing speech and gestures—microphones, accelerometers, and cameras—are all around, embedded in iPhones and other mobile devices (which can do double-duty in transmitting input to speech and gesture recognition programs residing on a computer or in the cloud).

The technology exists. What’s missing is a clear idea of what a natural interface will look like. With no start button or enter key, how best to signal the start of input to a device? A hand motion, a spoken command, a long pause? What’s most natural to people while also effective for devices?

With natural interfaces allowing people to talk and gesture freely and at the same time, there’s the added difficulty of aligning two or more input streams. Most work on speech recognition or computer vision has focused on a single mode, and many questions remain regarding the most effective methods for robust integration (or fusion) of inputs from multiple modes. Gestures may precede, accompany, or follow spoken inputs, and handling groups of gestures within a single utterance can be complex.

A big challenge of multimodal interfaces is aligning different input streams
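To make the alignment problem concrete, here is a minimal sketch of one simple fusion strategy: pair each recognized gesture with the temporally nearest speech segment, tolerating gestures that slightly precede or follow the utterance. This is an illustrative assumption, not the project’s actual method; the `Event` structure, the `fuse` function, and the one-second window are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Event:
    mode: str       # "speech" or "gesture"
    start: float    # start time in seconds
    end: float      # end time in seconds
    payload: str    # recognized phrase or gesture label

def fuse(events, window=1.0):
    """Pair each gesture with the speech event whose time span is
    nearest, allowing a gesture to precede or follow the speech by
    up to `window` seconds. Gestures with no nearby speech are dropped."""
    speech = [e for e in events if e.mode == "speech"]
    gestures = [e for e in events if e.mode == "gesture"]
    pairs = []
    for g in gestures:
        best, best_gap = None, window
        for s in speech:
            # gap is 0 when the two time spans overlap
            gap = max(s.start - g.end, g.start - s.end, 0.0)
            if gap <= best_gap:
                best, best_gap = s, gap
        if best is not None:
            pairs.append((g.payload, best.payload))
    return pairs
```

In this sketch, a pointing gesture that overlaps the utterance “this is where you need to be” would be bound to it, while a stray gesture far from any speech is discarded. Real fusion systems must of course handle recognition uncertainty and multiple gestures per utterance, which a pure timestamp rule cannot.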

For these questions and more, Marcelo is developing a theoretical framework to support multimodal interfaces for use anywhere: in the home, office, commute, or classroom.

It’s a big project entailing many different technologies and systems, any one of which could by itself be the basis for many years of work. But Marcelo, according to his mentor Michael Johnston, already has an impressively broad range of technical experience and skills, thanks both to the interdisciplinary nature of his study and his many extracurricular activities. It also helps that he’s good at programming, a skill acquired on the job at Accenture Technology Labs (where he worked for two years and also learned about artificial intelligence). In the one area where he had no previous experience, speech, he’s playing rapid catch-up.

When Marcelo returns to Stanford in the fall, where he and his advisor Professor Paulo Blikstein are investigating how to characterize learning, Marcelo hopes to parlay lessons from the project—specifically the capture of speech, gestures, and actions—to find new ways to continuously evaluate student progress.

While it may be as simple as inferring increasing interest from students’ more frequent hand raising in class, Marcelo’s particularly intrigued by capturing and classifying students’ speech, watching how over time students adopt new vocabulary or better articulate a concept as they absorb a lesson.

At the same time, Marcelo will continue working on multimodal interfaces with his AT&T mentor (ALFP collaborations are typically long term in nature). AT&T Research will be steps closer to understanding what’s required of natural interfaces, and Marcelo will have a head start on building technologies for the classroom.