Forecasting Fox

In 2006, Philip E. Tetlock published a landmark book called “Expert Political Judgment.” While his findings obviously don’t apply to me, Tetlock demonstrated that pundits and experts are terrible at making predictions.

But Tetlock is also interested in how people can get better at making forecasts. His subsequent work helped prompt people at one of the government’s most creative agencies, the Intelligence Advanced Research Projects Agency, to hold a forecasting tournament to see if competition could spur better predictions.

In the fall of 2011, the agency asked a series of short-term questions about foreign affairs, such as whether certain countries would leave the euro, whether North Korea would re-enter arms talks, or whether Vladimir Putin and Dmitri Medvedev would switch jobs. It hired a consulting firm to run an experimental control group against which the competitors could be benchmarked.

Five teams entered the tournament, from places like M.I.T., Michigan and Maryland. Tetlock and his wife, the decision scientist Barbara Mellers, helped form a Penn/Berkeley team, which bested the competition and surpassed the benchmarks by 60 percent in Year 1.

How did they make such accurate predictions? In the first place, they identified better forecasters. It turns out you can give people tests that usefully measure how open-minded they are.

For example, if you spent $1.10 on a baseball glove and a ball, and the glove cost $1 more than the ball, how much did the ball cost? Most people want to say that the glove cost $1 and the ball 10 cents. But some people doubt their original answer and realize the ball actually costs 5 cents.
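The arithmetic behind that second look can be checked in a few lines. This is only an illustrative sketch: the function name and the choice to work in whole cents are assumptions made here, not part of the test Tetlock's team used.

```python
def solve_ball_price(total_cents=110, difference_cents=100):
    """Solve ball + glove = total, where glove = ball + difference.

    Substituting gives 2 * ball + difference = total,
    so ball = (total - difference) / 2.
    """
    ball = (total_cents - difference_cents) // 2
    glove = ball + difference_cents
    return ball, glove


ball, glove = solve_ball_price()
print(ball, glove)  # ball and glove prices in cents

# The intuitive answer (ball = 10, glove = 100) adds up to $1.10,
# but the glove would then cost only 90 cents more than the ball.
intuitive_gap = 100 - 10
print(intuitive_gap)
```

The intuitive answer satisfies the total but not the price gap, which is why a forecaster who rechecks both conditions lands on 5 cents.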

Image: David Brooks. Credit: Josh Haner/The New York Times

Tetlock and company gathered 3,000 participants. Some got put into teams with training, some got put into teams without. Some worked alone. Some worked in prediction markets. Some did probabilistic thinking and some did more narrative thinking. The teams with training that engaged in probabilistic thinking performed best. The training involved learning some of the lessons included in Daniel Kahneman’s great work, “Thinking, Fast and Slow.” For example, they were taught to alternate between taking the inside view and the outside view.

Suppose you’re asked to predict whether the government of Egypt will fall. You can try to learn everything you can about Egypt. That’s the inside view. Or you can ask about the category. Of all Middle Eastern authoritarian governments, what percentage fall in a given year? That outside view is essential.

Most important, participants were taught to turn hunches into probabilities. Then they had online discussions with members of their team adjusting the probabilities, as often as every day. People in the discussions wanted to avoid the embarrassment of being proved wrong.

In these discussions, hedgehogs disappeared and foxes prospered. That is, having grand theories about, say, the nature of modern China was not useful. Being able to look at a narrow question from many vantage points and quickly readjust the probabilities was tremendously useful.

The Penn/Berkeley team also came up with an algorithm to weight the best performers. Let’s say the top three forecasters all believe that the chances that Italy will stay in the euro zone are 0.7 (with 1 being a certainty it will and 0 being a certainty it won’t). If those three forecasters arrive at their judgments using different information and analysis, then the algorithm synthesizes their combined judgment into a 0.9. It makes the collective judgment more extreme.
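The column does not spell out how the algorithm works, so the following is only a sketch of one common way to make a pooled forecast more extreme: average the forecasts in log-odds space, then multiply by a factor greater than 1. The function name `extremize` and the factor of 2.5 are assumptions chosen for illustration, not the team's actual method or settings.

```python
import math


def extremize(probabilities, exponent=2.5):
    """Pool forecasts and push the result away from 0.5.

    Hypothetical sketch: averages the forecasts as log-odds, scales
    that average by `exponent`, and converts back to a probability.
    An exponent of 1 leaves the pooled forecast unchanged.
    """
    log_odds = [math.log(p / (1.0 - p)) for p in probabilities]
    mean_log_odds = sum(log_odds) / len(log_odds)
    return 1.0 / (1.0 + math.exp(-exponent * mean_log_odds))


# Three top forecasters who each say 0.7 that Italy stays in the euro zone.
combined = extremize([0.7, 0.7, 0.7])
print(round(combined, 2))
```

With these assumed settings, three forecasts of 0.7 combine to roughly 0.89, close to the 0.9 in the example. The intuition is that agreement among people working from different information is stronger evidence than any one of their estimates alone.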

This algorithm has been extremely good at predicting results. Tetlock has tried to use his own intuition to beat the algorithm but hasn’t succeeded.

In the second year of the tournament, Tetlock and collaborators skimmed off the top 2 percent of forecasters across experimental conditions, identifying 60 top performers and randomly assigning them into five teams of 12 each. These “super forecasters” also delivered a far-above-average performance in Year 2. Apparently, forecasting skill can not only be taught, it can be replicated.

Tetlock is now recruiting for Year 3. (You can match wits against the world by visiting www.goodjudgmentproject.com.) He believes that this kind of process may help depolarize politics. If you take Republicans and Democrats and ask them to make a series of narrow predictions, they’ll have to put aside their grand notions and think clearly about the imminently falsifiable.

If I were President Obama or John Kerry, I’d want the Penn/Berkeley predictions on my desk. The intelligence communities may hate it. High-status old vets have nothing to gain and much to lose by having their analysis measured against a bunch of outsiders. But this sort of work could probably help policy makers better anticipate what’s around the corner. It might induce them to think more probabilistically. It might make them better foxes.

My column on Tuesday about the shift in progressive thought put a statue of a rambunctious horse in front of the Labor Department. It’s actually in front of the Federal Trade Commission.