'Ga-Ga' for Daddy's Data

When Massachusetts Institute of Technology professor Deb Roy wanted to study the ways caregivers influence how babies learn to talk, he decided to bring his work home.

How does the speech of caregivers affect how babies learn language? To find out, MIT Prof. Deb Roy recorded 8-9 hours a day from his son's first years of life and analyzed the data. He talked with WSJ's Shirley Wang at the TED conference about his findings.

Dr. Roy wired his house with 11 cameras and 14 microphones to record most waking moments of his son's first years. The cognitive scientist and his team are analyzing tens of thousands of hours of footage to learn how his son acquired language in real life.

One early result shows that all three of the child's primary caregivers—Dr. Roy; his wife, a speech professor at Northeastern University; and their nanny—developed a pattern to simplify their sentences until the baby learned a key word, such as "water." Only then would they return to full, adult speech patterns, according to a presentation Dr. Roy gave last week at the TED conference, a meeting of leaders from the worlds of technology, business, science and art.

Although it isn't clear whether this means that simplifying word use earlier on can speed up babies' language development, the type of data and the analytic techniques employed will help answer such questions in the future, says Dr. Roy.

Future analysis may yield insight into whether parents should speak to their infants the same way they talk to adults, or whether there is a benefit to using baby talk.

Most studies of language development take place in a lab, limiting the view of how parents interact with their kids.

The goal of this work was to study natural language patterns "to understand the process of how a child learns language," says Dr. Roy.

Studying social influences on language-learning in the lab is like telling someone: "We're taking you out of your social-operating system to study your social-operating system," he says.

The experiment, which involved capturing some eight to 10 hours of life a day, yielded 90,000 hours of video recording and 140,000 hours of audio.

Advances in computing technology and software now allow such huge volumes of high-quality data to be processed efficiently, and allow data of one type, such as recorded speech, to be linked with images, location information or other contextual data. Having what Dr. Roy calls a "perfect memory store" allows researchers to find and visually present patterns that weren't visible before.

Since the data in this study came from just one household, it's difficult to generalize trends. In addition, privacy concerns may limit the ability to collect such detailed data in the real world. However, the experiment offers a glimpse into how scientists are learning to harness the power of complex real-world data to understand social influences on behavior.

To examine language development, the MIT research team first analyzed the video footage to track motion visually using "space-time worms," squiggly lines that traced each individual's movement around the house over time.

They then identified and traced the movements of Dr. Roy's son so they could know where and when to listen for language. Ultimately, they transcribed over 7 million words from the recordings to study when and how his language skills developed.

The technology allowed the researchers not only to map the transition of the baby's first word from "ga-ga" to "water," but also to look at what the caregivers were saying and doing as the baby learned the word.

Dr. Roy and his colleagues were recently awarded a federal grant to use these techniques to investigate language learning in children with autism, and are about to embark on a study that wires up and records the daily activities of six households for at least several months.

The work also has implications for analyzing publicly available social data, such as comments on Facebook or Twitter. Dr. Roy co-founded a company called Bluefin Labs in 2008 to map in real time the responses from audiences to television shows and advertisements.
