You can thank David Blei ’97 for all those personalized suggestions of
things to buy that pop up on your screen whenever you’re online. In
fact, the easiest way for this Columbia University computer science and
statistics professor to explain his field of expertise—“probabilistic
topic modeling”—is to talk about Amazon or Netflix or Etsy, the wildly
popular website that sells handmade goods and crafts. (Etsy uses a
variation of an algorithm that Blei cowrote.)

Amazon and Netflix apply advanced formulas related to probabilistic
topic modeling to your activity. To suggest music or a TV show you
might like, they may use algorithms that not only analyze your past
choices but also compare them with those of similar shoppers.
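
As a rough illustration of comparing one shopper's choices with those of similar shoppers, here is a minimal Python sketch. It is not the method Amazon, Netflix, or Etsy uses. The names `history`, `jaccard`, and `recommend` and all of the sample data are invented for this example, and a simple overlap score (Jaccard similarity) stands in for whatever those companies actually compute.

```python
# Hypothetical purchase histories; every name and item here is invented.
history = {
    "ana": {"jazz album", "crime drama", "cookbook"},
    "ben": {"jazz album", "crime drama", "nature documentary"},
    "cal": {"garden tools", "cookbook"},
}


def jaccard(a, b):
    """Overlap between two sets of choices: 0.0 (nothing shared) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, history):
    """Suggest items chosen by the most similar other shopper that `user` lacks."""
    others = [name for name in history if name != user]
    nearest = max(others, key=lambda name: jaccard(history[user], history[name]))
    return sorted(history[nearest] - history[user])


print(recommend("ana", history))  # ['nature documentary']
```

In this toy data, "ana" shares two of four distinct items with "ben" and only one of four with "cal," so the suggestion comes from "ben."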

The predictions can be amazingly—and eerily—accurate. Blei’s pioneering
doctoral work on probabilistic topic modeling at UC Berkeley first
allowed computers to summarize the content of a large collection of
data.

“It used to be difficult to handle 10,000 documents,” Blei says,
“but now we can process millions.” Because of his work, Blei was
awarded a prestigious National Science Foundation Presidential Early
Career Award for Scientists and Engineers in 2011.

The applications for Blei’s breakthrough go far beyond stores and
streaming services. Researchers can now scan visual images, like
photos, or find patterns among common ancestors by looking at genetic
data. Lawyers digging through subpoenaed e-mails, historians studying
government records, or anyone who doesn’t know what’s buried in a
mountain of data also stand to benefit. “Sure, you could hire a
thousand people to read every document and summarize them,” Blei says,
“but this does it automatically.”

At Brown, Blei concentrated in math and computer science, focusing on
artificial intelligence. At Berkeley he became interested in text data
mining, a direct precursor to his current work. With another grad
student and their adviser, he wrote the groundbreaking algorithm, known
as “latent Dirichlet allocation,” that spawned the field of
probabilistic topic modeling.

The ongoing challenge is to do more with increasingly large data sets
and to do it faster. Until now, many search engines have only hunted
for keywords that match a user’s specific search terms. A search for
“cat,” for example, would not turn up links to sites about “felines.”

Blei’s algorithms may help to correct that deficiency. His model, for
example, could show that weather is the unifying subject among pages
discussing temperature, rainfall, and wind. “We need to take advantage
of all this information,” Blei says. “Something like keyword
search can fall short.”
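
The gap between keyword matching and topic-level matching can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pages are made up, the function names are hypothetical, and the `topics` vocabulary is written by hand here, whereas a topic model such as latent Dirichlet allocation would infer word groupings from the documents themselves.

```python
# Hypothetical pages and a hand-written topic vocabulary (assumptions for
# illustration; a real topic model would learn these groupings from data).
pages = {
    "page1": "our cat sleeps all day",
    "page2": "felines are skilled hunters",
    "page3": "rainfall and wind are expected",
}

topics = {
    "cats": {"cat", "cats", "feline", "felines", "kitten"},
    "weather": {"temperature", "rainfall", "wind"},
}


def keyword_search(query, pages):
    """Return the pages that contain the exact query word."""
    return sorted(name for name, text in pages.items() if query in text.split())


def topic_search(query, pages, topics):
    """Return the pages sharing any word with a topic that contains the query."""
    related = {query}  # fall back to the plain keyword if no topic matches
    for words in topics.values():
        if query in words:
            related |= words
    return sorted(name for name, text in pages.items() if related & set(text.split()))


print(keyword_search("cat", pages))        # ['page1']
print(topic_search("cat", pages, topics))  # ['page1', 'page2']
```

The exact-match search misses the page about "felines," while the topic-aware search finds it because both words belong to the same group.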

At Princeton, where Blei previously worked, he teamed up with Ken
Norman, a neuroscientist, to study human memory. The researchers
performed brain scans on volunteers as they memorized and recited a
list of words. A computer program then examined the brain scans for
patterns that reveal how people store and retrieve language in their
minds.

The Brown Alumni Magazine is published bimonthly, in print since 1900.