Trending Upward

How the intelligence community can better see into the future.

The next 15 years will witness the transformation of North Korea and resulting elimination of military tensions on the peninsula. No, this is not our rosy assessment of Northeast Asian politics or the reformist goals of Kim Jong Un. It was the verbatim prediction of the senior-most officials in the U.S. intelligence community — 15 years ago. Needless to say, the Stalinist regime, though hardly the picture of health, remains untransformed. In fact, Pyongyang has since tested nuclear weapons, and relations between North and South show little sign of improving; military tensions are high.

One suspects the analysts who wrote that line regret it. But the truth is that prediction is hard, often impossible. Academic research suggests that predicting events five years into the future is so difficult that most experts perform only marginally better than dart-throwing chimps. Now imagine trying to predict over spans of 15 to 20 years. Sisyphus arguably had it easier. But that has not deterred the intelligence community from trying; that is its job.

Starting with the 1997 release of Global Trends 2010 — the report that featured the North Korea prediction — the National Intelligence Council (NIC) has repeatedly tried to predict the trajectories of world politics over a 15-to-20-year period. These predictions run the gamut from a 1997 prediction that Saddam Hussein would no longer rule Iraq by 2010 to the more generic prediction of global multipolarity by 2025 in the most recent report. These predictions are the product of hard work by talented analysts who work under political pressures and intellectual constraints. And, in any case, we are skeptical about how much better than chance it is possible for anyone to do in forecasting 15 to 20 years into the future.

That said, when we look at these reports in light of recent research on expert judgment, we cannot help wondering whether there are not ways of doing a better job — of assigning more explicit, testable, and accurate probabilities to possible futures. Improving batting averages by even small margins means the difference between runners-up and World Series winners — and improving the accuracy of probability judgments by small margins could significantly contribute to U.S. national security.

How Have They Done So Far?

The original Global Trends report came out of a series of conferences held by the NIC and the Institute for National Strategic Studies at National Defense University. The idea was to "describe and assess major features of the political world as they will appear in the year 2010." The report self-consciously focused on what it called evolutionary changes in world politics, positing that truly disruptive changes are too rare and difficult to predict. Three more reports, Global Trends 2015, Global Trends 2020, and Global Trends 2025, followed. The NIC is now finalizing Global Trends 2030, which will be released later this year.

There are several potential grounds for criticizing Global Trends reports. The reports almost inevitably fall into the trap of treating the conventional wisdom of the present as the blueprint for the future 15 to 20 years down the road. Many things the early reports get right, such as the continued integration of Western Europe, were already unfolding in 1997. Similarly, predicting that "some states will fail to meet the basic requirements that bind citizens to their government" or that information technology will have a large impact on politics was hardly going out on a limb.

Looking carefully at the first two Global Trends reports reveals how the reports have struggled to make accurate non-obvious predictions of big-picture trends. (It is harder to assess Global Trends 2020 and 2025 because we are still so far away.) There are some things the Global Trends reports got right, like Saddam Hussein leaving office before 2010, but many others they missed. Consider how the reports treat the rise of China. Global Trends 2010 predicted that "While China has the potential to become the region’s dominant military power, it is beset by significant internal problems that in our judgment will preclude it from becoming so during this time frame." The report then goes on to predict the present (circa 1997): offering a thoughtful analysis of the large internal problems facing the Chinese government. Global Trends 2015, published in December 2000, contained similar statements. What can we make of this? On the one hand, both reports were technically correct. Yet it was not until Global Trends 2020, written in 2004, that the report fully embraced the key trend: the notion of a rising China, which by that time was simply predicting the present.

Global Trends 2010 and Global Trends 2015, the two reports written before the 9/11 attacks, also underplay the threat of terrorism. Global Trends 2015 includes a paragraph on the risk to the United States in a laundry list of threats, but neither report references al Qaeda or comes close to predicting the events of the last 11 years.

The reports also engage in extensive hedging. For every prediction, there is a caveat. The reports lean heavily on words such as "could," "possibly," and "maybe." The lead-in to Global Trends 2025 uses "could" nine times in two pages, and the report as a whole uses the word a whopping 220 times. The report also uses "maybe" 36 times. Global Trends 2020 uses "could" 110 times. Add all of the caveats and conditionals, and a harsh critic might conclude that these reports are saying no more than that there is a possibility that something could happen at some point — and it might have a big effect.

A different form of hedging arises in Global Trends 2015, which devotes a page to "significant discontinuities" that were not built into the scenarios but could disrupt trends noted elsewhere. These include things that have now occurred, such as domestic turmoil in Egypt, as well as things that may well never occur, such as a China-India-Russia alliance and an Asian Trade Organization that cuts off the United States. So many possibilities have been laid out that, no matter what happens, there is something someone could point to and say, "We predicted that."

These criticisms are not entirely fair, however. The Global Trends reports, for example, generally avoid low-hanging fruit. There are large pockets of stability in the world, such as Western Europe, and Global Trends does not pat itself on the back by "predicting" yet another decade of peace between France and Germany. The reports also mostly avoid TED-style buzz speak about disruptive change and synergies. Although qualification-ridden, the reports are well written and fill an important role: laying out the combined wisdom of the intelligence community and select outside experts at a given time on the future of world politics. The Global Trends reports are also an arguably rational bureaucratic response to an impossible political task: to signal that the intelligence community is thinking hard about the future and not simply assuming that things will continue because that is how things usually work out.

There is also value to conducting Global Trends exercises even if the results only minimally improve our knowledge of the future. The process of creating these reports links the intelligence community with smart outsiders — in academia and business — who have different perspectives on the world. The reports also force policymakers to step back from the day-to-day and think hard about big-picture trends. Producing Global Trends may — and we should treat this as a testable hypothesis — help analysts become better short- and medium-term forecasters.

So, What Can We Expect?

Intelligence agencies are relentlessly second-guessed when they get it wrong, but the second-guessers rarely ask the deeper question: What level of accuracy is it realistic to expect — and how can we help agencies reach it?

The core challenge confronting Global Trends and related exercises is that, all else being equal, our predictive accuracy falls sharply the further out we try to look. Research shows that expert predictions over five years are often no better than chance. Tetlock’s book, Expert Political Judgment, demonstrates that confident experts with deep knowledge in particular areas — the "hedgehogs" in Isaiah Berlin’s typology of intellectuals — are particularly likely to get things wrong in the long run. From this standpoint, the caveats in Global Trends reports are salutary: They acknowledge the inherent uncertainty of political life. Modesty about long-term forecasting is a useful antidote to overconfidence. The most accurate experts in Tetlock’s book were the most diffident about their skills — self-styled "foxes" who knew a bit about many things and were not wedded to one way of viewing the world. But even those forecasters, the best in show, could not see far into the future.

Given these constraints, imagine how hard it is to forecast over 15 to 20 years. You might get things right sometimes, but even when you do, it will rarely be for the right reasons. As scholars from Nobel Prize-winning psychologist Daniel Kahneman to political scientist Robert Jervis have noted, the world is vastly more complicated than our mental models of the world. The cognitive biases that distort individual judgment and the irreducible uncertainties in volatile polities make forecasting something that falls between difficult and impossible.

We see evidence in the Global Trends reports of many "best practices" recommended by thoughtful psychologists, network theorists, and management consultants. These best practices include the sophistication of the expert talent pool, the self-conscious search for diversity of perspectives, rigorous attempts to prevent groupthink and encourage constructive confrontation of clashing views, and skillful deployment of scenario analysis to facilitate divergent thinking and prevent the status quo from excessively anchoring down estimates of the potential for change. But we also see potential "process" deficiencies, which make it difficult both to assess how accurate the reports are and to test alternative methods of making them more accurate.

How to Get Better (Maybe)

Let’s compare the long-term forecasting exercises of the NIC with the much shorter-range forecasting competition sponsored by the Intelligence Advanced Research Projects Activity (IARPA). IARPA is sponsoring a political-forecasting Olympics in which research teams are competing over the next four years to come up with the most innovative ways of assigning the most accurate probability estimates to outcomes that the intelligence community cares about. The teams in the IARPA tournament consist largely of experimental psychologists, statisticians, and computer scientists who focus on extracting as much predictive value as they can from the qualitative arguments and probability estimates of thoughtful political observers of the world scene. (In the interests of full disclosure, Tetlock is a principal investigator for the team that "won" Year 1, and Horowitz is also on the team.) Whereas the Global Trends reports focus on plate tectonics shaping the political landscape over decades, the IARPA tournament focuses on more immediate issues: the likelihood of a country exiting the eurozone in the next few months or of a leader falling from power in the next six months.

IARPA requires forecasters to translate vague hunches into explicit probabilities that can be scored for accuracy. Work to date within the IARPA program has revealed that it pays off to obtain quantitative probability estimates, aggregate those opinions, and track the better forecasters and give them more weight, among other tools. IARPA’s first-year estimates suggest that the best combination of methods yields a roughly 60 percent improvement in predictive accuracy above the unweighted crowd average.
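To make "scored for accuracy" concrete, here is a minimal sketch of one common scoring rule, the Brier score (the mean squared gap between stated probabilities and what actually happened). The text above does not say which rule the tournament uses, so treat the choice of rule as an assumption; the forecasters and numbers are hypothetical.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes.

    0.0 is a perfect score; always answering 0.5 earns 0.25.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical data: three yes/no questions, 1 means the event happened.
outcomes = [1, 0, 0]
committed = [0.9, 0.1, 0.2]  # a forecaster who commits to numbers
hedger = [0.5, 0.5, 0.5]     # a forecaster who always says "could"

committed_score = brier_score(committed, outcomes)  # about 0.02
hedger_score = brier_score(hedger, outcomes)        # 0.25
```

Under this rule the perpetual hedger is never badly wrong but also never earns a good score, which is the point of asking for explicit numbers.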

Can we apply these methods to the Global Trends enterprise? One challenge is that the Global Trends reports cover decades, while the longest predictions in the IARPA tournament are approximately two years. We do think, however, that some specific steps could be taken to bridge the gap between IARPA methods and Global Trends practices. To be clear, there are no guarantees that following our recommendations will boost predictive accuracy. Fifteen to 20 years may simply be too far out. But even knowing that would be helpful. (Once you know the limits on how far you can see at night, you also know the limits on how fast you should drive at night.)

Explicit quantification: Exposing policymakers to "deep thoughts" about the future has value, but the failure to offer even broad ranges of probability estimates for outcomes is unfortunate. This allergy to quantifying uncertainty makes it difficult to gauge how much uncertainty the expert community sees — and impossible to evaluate accuracy. The Global Trends reports do not use the "estimative language" rubric the NIC has recently used to consistently convert words into probabilities (a move in the right direction but still too vague). The core problem here is that when you ask readers what "could" could mean, you get a staggering range of estimates, from 0.00000001 (an asteroid could strike in the next few hours and prevent us from finishing this manuscript) to 50 percent or higher (Obama could win reelection, given the polls up to Sept. 1). The Global Trends reports do not offer rough guidelines for translating hedging words into probability estimates, leaving it all to readers’ imaginations.

There are two possibilities here. One is that the probability ranges are not nearly as wide as the vague verbiage allows but the authors want political cover in case the unexpected occurs. The other possibility is that the probability ranges are truly that wide, in which case Global Trends should divulge which schools of thought are generating the lower and upper estimates. That sort of predictive-track-record information could prove extremely useful when policymakers weigh the recommendations of differing schools of thought.

The best way to become a better-calibrated appraiser of long-term futures is to get in the habit of making quantitative probability estimates that can be objectively scored for accuracy over long stretches of time. Explicit quantification enables explicit accuracy feedback, which enables learning. This requires extraordinary organizational patience — an investment that may span decades — but the stakes are high enough to merit a long-term investment.

Signposting the future: The authors of these reports are keenly aware of the human tendency to project the present into the future ("anchoring on the status quo") — and they fight it. The key analytical weapon against "present-ism" in the Global Trends reports has thus far been scenario planning. Scenarios frequently pop up in Global Trends reports with catchy labels such as "A New Caliphate" and "Cycle of Fear." Each scenario is a plausible extrapolation of what could happen if certain causal drivers take hold.

These scenarios are useful, and we are not recommending dropping them. Scenario generation is a great way for imaginative analysts to channel their inner social-science-fiction writers. Organizations that want to retain talent should provide such outlets. But if scenario generation is not eventually subjected to rigorous logical discipline, the net value of the exercise plummets. For instance, research shows that the more scenarios participants generate, the more support they typically find for each one — and the higher their probability estimates go. What’s more, experts rely on the same crude sense of the balance of forces for estimating both the weekly and monthly probabilities, so if they can see a 1/20 chance of the regime falling in a given week, they will see roughly the same probability in a given month, an effect that Kahneman calls scope insensitivity. When we finish unpacking all scenarios and subscenarios, it is not unusual for the probabilities to sum to closer to 2.0 than 1.0, a logical impossibility.
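The two coherence problems described above can be checked with simple arithmetic. The sketch below rests on loud assumptions: weeks are treated as independent with a constant weekly probability, a month is taken as four weeks, and the scenarios in the second part are treated as mutually exclusive and exhaustive. Scenario names and numbers are hypothetical.

```python
def prob_over_periods(p_per_period, n_periods):
    """Chance the event happens at least once across n independent periods."""
    return 1 - (1 - p_per_period) ** n_periods


def normalize(probs):
    """Rescale probabilities of exclusive, exhaustive scenarios to sum to 1."""
    total = sum(probs.values())
    return {name: p / total for name, p in probs.items()}


# Scope check: a 1-in-20 weekly chance implies a much larger monthly chance.
weekly = 1 / 20
monthly = prob_over_periods(weekly, 4)  # about 0.185, not 0.05

# Coherence check: hypothetical unpacked scenarios whose estimates sum to 2.0.
raw = {"scenario_a": 0.6, "scenario_b": 0.5, "scenario_c": 0.5, "scenario_d": 0.4}
raw_total = sum(raw.values())
fixed = normalize(raw)
```

Proportional rescaling is only the crudest repair. It restores logical consistency but says nothing about which of the inflated estimates deserved the largest cut.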

Scenarios are more valuable when they come with clear diagnostic signposts that policymakers can use to gauge whether they are moving toward or away from one scenario or another. For instance, Global Trends 2025 outlines a "BRIC’s Bust-Up" scenario — a conflict between China and India over resources. Which signposts or early warning indicators might help policymakers know whether such a future is becoming more or less likely? Falsifiable hypotheses bring high-flying scenario abstractions back to Earth.

Similarly, specifying signposts requires breaking 20 years into finer temporal segments. If we are on a historical trajectory leading to the Chinese Civil War of 2023, for example, what should we be observing in 2013, 2014, etc. to justify taking so speculative a scenario seriously? How diagnostic would these signposts have to be? One could imagine such a conflict erupting from tensions between the poor, neo-Maoist countryside and affluent technocratic coastal cities, but imagining is not enough. We need specifiable metrics.
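One way to put a number on "how diagnostic" is Bayes' rule: a signpost matters only to the extent that it is more likely to appear on the path to the scenario than off it. The sketch below is an illustration of that idea, not a method drawn from the Global Trends reports, and every probability in it is hypothetical.

```python
def update(prior, p_signpost_if_scenario, p_signpost_if_not):
    """Posterior probability of a scenario after the signpost is observed."""
    numerator = prior * p_signpost_if_scenario
    denominator = numerator + (1 - prior) * p_signpost_if_not
    return numerator / denominator


prior = 0.02  # hypothetical starting probability for a speculative scenario

# A diagnostic signpost: six times likelier if the scenario is unfolding.
after_diagnostic = update(prior, 0.6, 0.1)  # about 0.109

# A non-diagnostic signpost: equally likely either way, so nothing changes.
after_uninformative = update(prior, 0.6, 0.6)  # stays at 0.02
```

Even a strongly diagnostic signpost moves a 2 percent scenario only to about 11 percent here, which suggests that a speculative scenario needs several independent signposts over successive years before it deserves to be taken seriously.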

Leveraging aggregation: Since James Surowiecki’s influential The Wisdom of Crowds, it has been well known that the average forecast is often more accurate than the vast majority of the individual forecasts that went into computing the average. Participants in Global Trends should therefore make individual probability judgments that can later be compared to the accuracy of various averages and weighted averages of those judgments. They should also get into the habit that some of the better forecasters in the IARPA tournament have gotten into: comparing their predictions to group averages, weighted-averaging algorithms, prediction markets, and financial markets.
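A small sketch of the two kinds of averages mentioned above. The weighting scheme (weights inversely proportional to each forecaster's past squared-error score) is one simple possibility chosen for illustration, not the algorithm used in the IARPA tournament, and all figures are hypothetical.

```python
def mean(values):
    """Plain unweighted average."""
    return sum(values) / len(values)


def weighted_average(forecasts, weights):
    """Average in which better-tracked forecasters count for more."""
    return sum(f * w for f, w in zip(forecasts, weights)) / sum(weights)


# Hypothetical: four analysts estimate the same event, which then occurs.
forecasts = [0.9, 0.6, 0.3, 0.8]
outcome = 1

crowd = mean(forecasts)  # about 0.65
crowd_error = (crowd - outcome) ** 2
individual_errors = [(f - outcome) ** 2 for f in forecasts]
typical_error = mean(individual_errors)

# Hypothetical past scores (lower is better) turned into weights.
past_scores = [0.10, 0.20, 0.40, 0.25]
weights = [1 / s for s in past_scores]
weighted = weighted_average(forecasts, weights)
```

With squared-error scoring, the crowd average can never score worse than the typical individual on a given question, a consequence of the scoring rule's convexity. Whether weighting improves on the plain average depends on whether past accuracy actually predicts future accuracy, which is exactly what keeping individual records would reveal.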

Undoubtedly, some inside the intelligence community will reject these suggestions, especially explicit quantification. Recall the old Aristotelian maxim: Seek precision only insofar as the nature of things permits. But why not try? Explicit quantification of uncertainty helps in a wide range of professional domains, from finance to medicine to meteorology. We are not convinced that geopolitics falls in a category of its own when it comes to forecasting.

Perhaps the more serious objection is ultimately political. Analysts are right to worry that if they assign explicit probabilities to anything, partisan critics will jump on them whenever they appear to have either under- or overestimated anything. You should expect that when a perfectly calibrated forecasting system says 75 percent, for example, the lower-likelihood outcome will occur 25 percent of the time, and the system will, by this curious standard, appear to be wrong 25 percent of the time. But try explaining that to a congressional subcommittee looking for scapegoats.
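The calibration point can be made concrete with a simple tally: group forecasts by the probability stated, then compare that number with how often the event actually occurred. This is a minimal sketch with hypothetical forecasts and outcomes.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Map each stated probability to (number of forecasts, observed frequency)."""
    buckets = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        buckets[p].append(o)
    return {p: (len(hits), sum(hits) / len(hits)) for p, hits in sorted(buckets.items())}


# Hypothetical record: eight forecasts at 75 percent, four at 25 percent.
forecasts = [0.75] * 8 + [0.25] * 4
outcomes = [1, 1, 1, 0, 1, 1, 0, 1] + [0, 0, 1, 0]

table = calibration_table(forecasts, outcomes)
# table[0.75] is (8, 0.75): the event occurred in 6 of the 8 cases
# table[0.25] is (4, 0.25): the event occurred in 1 of the 4 cases
```

In this record the favored outcome failed to occur in 3 of 12 cases, yet the forecaster is perfectly calibrated. Judged case by case the forecaster looks wrong a quarter of the time; judged by the table, the stated probabilities were exactly right.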

We do not have a solution to the political problem except to say that it should not be a deal-killer because (a) the intelligence community has the option of keeping score in private by maintaining internal classified records of explicit probability assessments; (b) it would be tragic if democracy were incompatible with efforts to improve the accuracy of our expectations of the future; (c) following our suggestions of creating clear metrics and breaking down predictions by time frames will make it easier for Global Trends authors to update their predictions and bring previously invisible low-probability possibilities into focus. The projections will be both more accurate and more defensible.

Our key arguments might appear in tension with each other. On the one hand, we are skeptical that even the most astute analysts working with the best methodological tools can see 15 or 20 years into the future. On the other hand, we think the National Intelligence Council should aggressively experiment with rigorous methods that would allow us to assess how accurately they can see into the future and with tools that have the potential to improve foresight.

Even if we were 80 percent or 90 percent confident that there is no room for improvement — and the Global Trends reports are doing as good a job as humanly and technically possible at this juncture in history — we would still recommend that the NIC conduct our proposed experiments. When one works within a government that routinely makes multibillion-dollar decisions that often affect hundreds of millions of lives, one does not have to improve the accuracy of probability judgments by much to justify a multimillion-dollar investment in improving accuracy.

Michael C. Horowitz is a professor of political science and the author of The Diffusion of Military Power: Causes and Consequences for International Politics. Twitter: @mchorowitz