Defining and Assessing Good Judgment

My 2005 book, Expert Political Judgment: How Good Is It? How Can We Know?, traces the evolution of this project. It reports a series of relatively small-scale forecasting tournaments that I started in 1984 and wound down by 2003. A total of 284 experts participated as forecasters at various points. They came from a variety of backgrounds, including government officials, professors, and journalists, and subscribed to a variety of political-economic philosophies, from Marxists to libertarians.

Cumulatively they made 28,000 predictions bearing on a diverse array of geopolitical and economic outcomes.

The results were sobering. One widely reported finding was that forecasters were often only slightly more accurate than chance, and usually lost to simple extrapolation algorithms. Also, forecasters with the biggest news media profiles tended to lose to their lower profile colleagues, suggesting a rather perverse inverse relationship between fame and accuracy.

The expert political judgment project also compared the accuracy track records of "foxes" and "hedgehogs" (two personality types identified in Isaiah Berlin’s 1953 essay The Hedgehog and the Fox). The more theoretically single-minded hedgehogs performed worse than the more eclectic foxes, especially on long-term forecasts within their own domains of expertise.

These findings received considerable media attention and came to the attention of the Intelligence Advanced Research Projects Activity (IARPA) inside the United States intelligence community—a fact that was partly responsible for the 2011 launch of a four-year geopolitical forecasting tournament that engaged tens of thousands of forecasters and drew over one million forecasts across roughly 500 questions of relevance to U.S. national security. From 2011 to 2015, Barbara Mellers and I served as co-principal investigators of the Good Judgment Project (GJP), a research collaborative that emerged as the wide-margin winner of the IARPA tournament.

The aim of the tournament was to improve geopolitical and geoeconomic forecasting. Illustrative questions included “What is the chance that a member will withdraw from the European Union by a target date?” or “What is the likelihood of naval clashes claiming over 10 lives in the East China Sea?” or “How likely is the head of state of Venezuela to resign by a target date?” The tournament challenged GJP and its competitors at other academic institutions to come up with innovative methods of recruiting gifted forecasters, methods of training forecasters in basic principles of probabilistic reasoning, methods of forming teams that are more than the sum of their individual parts, and methods of developing aggregation algorithms that most effectively distill the wisdom of the crowd.

Among the more surprising findings from the tournament were:

1. the degree to which simple training exercises improved the accuracy of probabilistic judgments as measured by Brier scores;
2. the degree to which the best forecasters could learn to distinguish many degrees of uncertainty along the zero to 1.0 probability scale (many more distinctions than the traditional 7-point verbal scale used by the National Intelligence Council);
3. the consistency of the performance of the elite forecasters (superforecasters) across time and categories of questions;
4. the power of a log-odds extremizing aggregation algorithm to out-perform competitors; and
5. the apparent ability of GJP to generate probability estimates that were "reportedly 30% better than intelligence officers with access to actual classified information."
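Two of the measurement ideas in the list above, Brier scoring (finding 1) and log-odds extremizing aggregation (finding 4), can be sketched in a few lines. This is a minimal illustration, not GJP's actual pipeline: the extremizing exponent `a` is a placeholder parameter that GJP estimated empirically from the data.

```python
import math

def brier_score(forecasts, outcomes):
    """Mean squared error between probability forecasts and binary
    outcomes (1 = event occurred, 0 = did not); 0 is a perfect score."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def extremized_aggregate(probabilities, a=2.5):
    """Aggregate a crowd of probability forecasts: average them in
    log-odds space, then multiply by a > 1 to push the consensus away
    from 0.5, compensating for shared information across forecasters."""
    log_odds = [math.log(p / (1 - p)) for p in probabilities]
    mean_log_odds = sum(log_odds) / len(log_odds)
    return 1 / (1 + math.exp(-a * mean_log_odds))
```

With `a = 1` this reduces to a plain log-odds average; with `a > 1`, a crowd that individually says 0.7 is reported as something more confident than 0.7, which is the sense in which the algorithm "extremizes" the crowd's view.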

These and other findings are laid out in the 2015 book, “Superforecasting.” My co-author Dan Gardner and I stress that good forecasting does not require powerful computers or arcane methods. It involves gathering evidence from a variety of sources, thinking probabilistically, working in teams, keeping score, and being willing to admit error and change course. We also suggest that the public accountability of participants in the later IARPA tournament boosted performance. Apparently, “even the most opinionated hedgehogs become more circumspect” when they feel their accuracy will soon be compared to that of ideological rivals.

I see forecasting tournaments as a possible mechanism for helping intelligence agencies escape from blame-game (or accountability) ping-pong, in which agencies find themselves whipsawed between clashing critiques: that they were too slow to issue warnings (false negatives, such as 9/11) or too fast to issue them (false positives). Tournaments are ways of signaling that an organization is committed to playing a pure accuracy game: generating probability estimates that are as accurate as possible, not tilting estimates to avoid repeating the most recent “mistake.”

The Good Judgment research program continues to recruit new forecasters for new forecasting tournaments at www.goodjudgmentproject.com.


Accountability and Attributions of Responsibility

I proposed in a 1985 essay that accountability is a key concept for linking the individual level of analysis to the social-system level. Accountability binds people to collectivities by specifying who must answer to whom, for what, and under what ground rules. Some forms of accountability can make humans more thoughtful and constructively self-critical (reducing the likelihood of biases or errors), whereas other forms can make us more rigid and defensive (mobilizing mental effort to defend previous positions and to criticize critics). In a follow-up 2009 essay, I noted how little we still know about how psychologically deep the effects of accountability run—for instance, whether it is possible to check automatic or implicit association-based biases, a topic with legal implications for companies facing employment discrimination class actions.

I have also explored the political dimensions of accountability. When, for instance, do liberals and conservatives diverge in their preferences for “process accountability,” which holds people responsible for respecting rules, versus “outcome accountability,” which holds people responsible for bottom-line results? I call this line of work the “intuitive politician research program.”

Taboo Cognition and Sacred Values

I use a different “functionalist metaphor” to describe my work on how people react to threats to sacred values—and on the pains they take to structure situations so as to avoid open or transparent trade-offs involving sacred values. Real-world implications of this work are explored largely in peer-reviewed outlets such as the Journal of Consumer Research, California Management Review, and Journal of Consumer Psychology. This research argues that most people recoil from the specter of relativism: the notion that the deepest moral-political values are arbitrary inventions of mere mortals struggling to infuse moral meaning into an otherwise meaningless universe. Rather, humans prefer to believe that they have sacred values that provide firm foundations for their moral-political opinions. People can become very punitive “intuitive prosecutors” when they feel sacred values have been seriously violated, going well beyond the range of socially acceptable forms of punishment when given chances to do so covertly.

Political versus Politicized Psychology

I have a long-standing interest in the tensions between political and politicized psychology, arguing that most political psychologists tacitly assume that, relative to political science, psychology is the more basic discipline in their hybrid field. Political actors—be they voters or national leaders—are human beings whose behavior should be subject to fundamental psychological laws that cut across cultures and historical periods. In numerous articles and chapters, I also raise the contrarian possibility that reductionism can run in reverse—that psychological research is often driven by ideological agendas (of which psychologists often seem to be only partly conscious). I have developed variants of this analysis in articles on the links between cognitive styles and ideology (the fine line between rigid and principled) as well as on the challenges of assessing value-charged concepts like symbolic racism and unconscious bias (is it possible to be a “Bayesian bigot”?). I have also co-authored papers on the value of ideological diversity in behavioral and social science research. One consequence of the lack of ideological diversity in high-stakes, soft-science fields is frequent failures of turnabout tests (scientific-debate hypocrisy detectors).

Hypothetical Societies and Intuitions About Justice

In collaboration with Greg Mitchell and Linda Skitka, I have conducted research on hypothetical societies and intuitions about justice (“experimental political philosophy”). The spotlight here is on a fundamental question in political theory: who should get what from whom, when, how, and why? In real-world debates over distributive justice, however, it is virtually impossible to disentangle the factual assumptions that people are making about human beings from the value judgments people are making about end-state goals, such as equality and efficiency. Hypothetical society studies make it possible for social scientists to disentangle these otherwise hopelessly confounded influences on public policy preferences.


INTERVIEW

An interview with Phil Tetlock in which he describes the philosophy behind his most recent research on forecasting tournaments and the value they hold both for individuals and for the larger society.