Wednesday, July 3, 2013

Interview with Susan Athey on Big Data and Other Topics

Douglas Clement has a characteristically excellent interview with Susan Athey in the June 2013 issue of The Region, which is published by the Federal Reserve Bank of Minneapolis. Athey is a professor at Stanford's Business School, won the John Bates Clark medal in 2007, and has been Chief Economist at Microsoft since 2007. Here are some snippets:

On whether the arrival of "big data" means that theory is now less important:

"In fact, the need for theory is in some ways magnified by having
large amounts of data. When you have a small amount of data, you can
just look at the data and build your intuition from it. When you have
very large amounts of data, just taking an average can cost thousands
of dollars of computer time. So you’d better have an idea of what
you’re doing and why before you go out to take those averages. The
importance of theory to create conceptual frameworks to know what to
look for has never been larger ... I think what is true is that when
you have large amounts of data, if you ask it the right questions, you
have a greater ability to let the data speak, and so you can be much
less reliant on assumptions. But you still need a strong conceptual
framework to understand what’s coming out.

"And I would say in the business world, this is where there’s an
enormous scarcity of talent. I see that there are a fair number of
statisticians out there, not nearly enough, but a fair number of data
scientists out there. There’s a huge demand for them still. But among data scientists, the ones who can define a question and
introduce a new way of looking at the data—those data scientists are
rock stars. They’re pursued by every company and they move up the
hierarchy very quickly. They’re giving presentations to top executives
and are extraordinarily influential. And there are never enough of
them."

Why economics should focus more on issues of big data:

"I think that the data scientists should take a little more
economics. That would help; economics puts a lot of emphasis on the
conceptual framework. And I also think that economics should be paying a
lot more attention to the statistics of big data.
Right now, economics as a profession has very little
market share in the business analysis of this big data. It’s mostly
statisticians. We’re just not training our undergraduates to be
qualified for these jobs. Even our graduate students, even someone with a
Ph.D. from a very good economics department really doesn’t have the
right skills to analyze the kinds of data sets that big Internet firms
are creating. ... We’re a little
bit behind. Econometrics, at the undergraduate level, is not
appreciated as much as an expertise that’s extremely important for
future employment, and we certainly don’t see a lot of economics majors
going on to take extra steps beyond what’s required. ... I really think we need to make some changes in education. What
happens at the top Ph.D. programs isn’t going to really impact the
overall workforce. But what we do at the undergraduate level and
whether we start offering more advanced or master’s level courses
becomes more important—because, really, with just an undergraduate
degree it’s hard to be very successful on the technical side at any of
these firms."

How big data will generate future productivity gains:

"Companies in all sorts of different
industries are starting to generate large amounts of data. The Internet
companies were built from the ground up on that data. Other companies
are just starting to think about what they do with the data. If you think about these kind of general purpose innovations like
the computer, it took us a while to figure out what to do with the
computer. It replaced the secretary and the typewriter, but it took
another 15 years before the personal computer really changed the way we
do commerce, which you would say really comes with the Internet and
businesses being built around it."

"With the big data, of course, the Googles and the Facebooks and so
on were born on that. But if you take, say, a car manufacturer that
might be getting real-time information from monitoring devices within
the cars, there’s a first level of things you can do with that data.
Like you can look at aggregate failure rates, or something, for certain
types of things. You can identify problems."

"But there’s a whole other level of optimizations that can be done.
And I think that idea will apply across many industries. They’ll start
with just the basics of, let’s figure out how to prioritize problems.
For example, with software you can get telemetry data about, where are
the bugs? What’s causing crashes? That’s sort of the first level of
what you do with data: You use the data to identify problems and make
priorities. The more frequent the crash, the higher you prioritize in
fixing that problem. But there’s a next level, which includes real-time machine
learning, customization, personalization, optimization, where industry
as a whole is just inventing what to do with it. And there could be
some really radical breakthroughs in different industries. They’re just
very hard to anticipate as they start to use these data."

On the idea that auction design needs to focus not only on getting the highest bid, but also on attracting lots of bidders:

"So if you’re thinking about how to design an auction, or how to
design a market more generally, even though it can be tempting to focus
on what happens once the people are in the room, it can be more
important to start with designing your marketplace to get people to
come, to start with. This insight is one that I’ve brought to other settings. I think,
for example, it applies in online auctions. When a large company like
eBay or an online advertising firm is designing its marketplace, for
example, it can be more important to design your marketplace to attract
bidders and make sure they’re there to participate than it is to try
to extract every last cent out of them once they get there. If
potential bidders are not making enough profit to make it worth their
time to come, they won’t come. And thin markets can be much more
problematic."

On the difference between a monopolist search engine and a competitive search engine:

"[A] profit-maximizing search engine cares how much surplus
the advertisers get versus the search engine. As a result of that, a
monopolist search engine will tend to raise reserve prices [meaning the
lowest price they’ll accept] too high in order to extract more surplus
from the advertisers even if it means eliminating ads that the
consumers might have liked to see. In contrast, a competitive search engine—one that’s competing for advertisers and users—will be more likely to choose the welfare-maximizing point. A more realistic model would also incorporate the other content
that gets crowded off the page by the ads; such a model would be more
likely to see a monopolist search engine put up too many ads relative
to what consumers would like, but again competition would typically
push a firm closer to welfare maximization in order to keep both sides
of the market participating."
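
The reserve-price point can be illustrated with a minimal sketch that is not taken from the interview. It assumes a deliberately stylized setting: a single item sold in a second-price auction to `n` bidders whose values are drawn independently from a uniform distribution on [0, 1]. The function names, the uniform-value assumption, and the grid search are all illustrative choices, and "welfare" here means only the expected value of the winning bidder, which is a much narrower notion than the consumer welfare Athey discusses.

```python
def expected_welfare(n, r):
    """Expected value of the winner when the item sells (values ~ U[0, 1]).

    Assumption: welfare is just the winner's value; unsold items create none.
    """
    return n / (n + 1) * (1 - r ** (n + 1))


def expected_revenue(n, r):
    """Expected seller revenue in a second-price auction with reserve r."""
    # Exactly one bidder clears the reserve: the winner pays the reserve r.
    pays_reserve = r * n * r ** (n - 1) * (1 - r)
    # Two or more bidders clear the reserve: the winner pays the
    # second-highest value (closed form for uniform values).
    pays_second = (n - 1) * (1 - r ** n) - n * (n - 1) / (n + 1) * (1 - r ** (n + 1))
    return pays_reserve + pays_second


def best_reserve(objective, n, steps=100):
    """Grid-search the reserve in [0, 1] that maximizes the given objective."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda r: objective(n, r))


n_bidders = 3  # hypothetical number of advertisers
revenue_reserve = best_reserve(expected_revenue, n_bidders)
welfare_reserve = best_reserve(expected_welfare, n_bidders)
print("revenue-maximizing reserve:", revenue_reserve)
print("welfare-maximizing reserve:", welfare_reserve)
```

Under these assumptions, the grid search picks a reserve of 0.5 for the revenue-maximizing seller and 0 for the welfare-maximizing one. The gap between the two is the stylized version of the point in the quote: a seller with market power sets the reserve above the welfare-maximizing level, and so excludes some bidders whose participation would have created value.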