Uncertainty About How Best to Convey Uncertainty

NPR News ran a series of stories this week under the header Risk and Reason, on “how well we understand and act on probabilities.” I thought the series nicely represented how uncertain we are about how best to convey forecasts to people who might want to use them. There really is no clear standard here, even though it is clear that the choices we make in presenting forecasts and other statistics on risks to their intended consumers strongly shape what they hear.

This uncertainty about how best to convey forecasts was on full display in the piece on how CIA analysts convey predictive assessments (here). Ken Pollack, a former analyst who now teaches intelligence analysis, tells NPR that, at CIA, “There was a real injunction that no one should ever use numbers to explain probability.” Asked why, he says that,

Assigning numerical probability suggests a much greater degree of certainty than you ever want to convey to a policymaker. What we are doing is inherently difficult. Some might even say it’s impossible. We’re trying to predict the future. And, you know, saying to someone that there’s a 67 percent chance that this is going to happen, that sounds really precise. And that makes it seem like we really know what’s going to happen. And the truth is that we really don’t.

In that same segment, though, Dartmouth professor Jeff Friedman, who studies decision-making about national security issues, says we should provide a numeric point estimate of an event’s likelihood, along with some information about our confidence in that estimate and how malleable it may be. (See this paper by Friedman and Richard Zeckhauser for a fuller treatment of this argument.) The U.S. Food and Drug Administration apparently agrees; according to the same NPR story, the FDA “prefers numbers and urges drug companies to give numerical values for risk—and to avoid using vague terms such as ‘rare, infrequent and frequent.'”

Instead of numbers, Pollack advocates for using words: “Almost certainly or highly likely or likely or very unlikely,” he tells NPR. As noted by one of the other stories in the series (here), however—on the use of probabilities in medical decision-making—words and phrases are ambiguous, too, and that ambiguity can be just as problematic.

Doctors, including Leigh Simmons, typically prefer words. Simmons is an internist and part of a group practice that provides primary care at Mass General. “As doctors we tend to often use words like, ‘very small risk,’ ‘very unlikely,’ ‘very rare,’ ‘very likely,’ ‘high risk,’ ” she says.

But those words can be unclear to a patient.

“People may hear ‘small risk,’ and what they hear is very different from what I’ve got in my mind,” she says. “Or what’s a very small risk to me, it’s a very big deal to you if it’s happened to a family member.”

Intelligence analysts have sometimes tried to remove that ambiguity by standardizing the language they use to convey likelihoods, most famously in Sherman Kent’s “Words of Estimative Probability.” It’s not clear to me, though, how effective this approach is. For one thing, consumers are often lazy about trying to understand just what information they’re being given, and templates like Kent’s don’t automatically solve that problem. This laziness came across most clearly in NPR’s Risk and Reason segment on meteorology (here). Many of us routinely consume probabilistic forecasts of rainfall and make decisions in response to them, but it turns out that few of us understand what those forecasts actually mean. With Kent’s words of estimative probability, I suspect that many readers of the products that use them haven’t memorized the table that spells out their meaning and don’t bother to consult it when they come across those phrases, even when it’s reproduced in the same document.

Equally important, a template that works well for some situations won’t necessarily work for all. I’m thinking in particular of forecasts on the kinds of low-probability, high-impact events that I usually analyze and that are essential to the CIA’s work, too. Here, what look like small differences in probability can sometimes be very meaningful. For example, imagine that it’s August 2001 and you have three different assessments of the risk of a major terrorist attack on U.S. soil in the next few months. One pegs the risk at 1 in 1,000; another at 1 in 100; and another at 1 in 10. Using Kent’s table, all three of those assessments would get translated into a statement that the event is “almost certainly not” going to happen, but I imagine that most U.S. decision-makers would have felt very differently about risks of 0.1%, 1%, and 10% with a threat of that kind.
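To make the compression concrete, here is a minimal sketch of a word-lexicon lookup. The cutoffs are illustrative assumptions loosely modeled on Kent’s table, not his exact published ranges, which vary by source; the point is only that very different risks can land on the same phrase:

```python
# Illustrative estimative-word lexicon loosely based on Sherman Kent's
# "Words of Estimative Probability." The cutoffs below are assumptions
# for demonstration; published versions of the table differ.
KENT_WORDS = [
    (0.87, "almost certain"),
    (0.63, "probable"),
    (0.40, "chances about even"),
    (0.20, "probably not"),
    (0.00, "almost certainly not"),
]

def estimative_word(p):
    """Map a point probability to an estimative phrase."""
    if p == 0:
        return "impossible"
    for cutoff, word in KENT_WORDS:
        if p >= cutoff:
            return word

# The three August 2001 assessments collapse to the same phrase:
for risk in (0.001, 0.01, 0.10):
    print(f"{risk:.1%} -> {estimative_word(risk)}")
```

Run as written, all three risks print the same phrase, “almost certainly not,” even though they differ by two orders of magnitude.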

There are lots of rare but important events that inhabit this corner of the probability space: nuclear accidents, extreme weather events, medical treatments, and mass atrocities, to name a few. We could create a separate lexicon for assessments in these areas, as the European Medicines Agency has done for adverse reactions to medical therapies (here, via NPR). I worry, though, that we ask too much of consumers of these and other forecasts if we expect them to remember multiple lexicons and to correctly code-switch between them. We also know that the relevant scale will differ across audiences, even on the same topic. For example, an individual patient considering a medical treatment might not care much about the difference between a mortality risk of 1 in 1,000 and 1 in 10,000, but a drug company and the regulators charged with overseeing them hopefully do.

If there’s a general lesson here, it’s that producers of probabilistic forecasts should think carefully about how best to convey their estimates to specific audiences. In practice, that means thinking about the nature of the decision processes those forecasts are meant to inform and, if possible, trying different approaches and checking to find out how each is being understood. Ideally, consumers of those forecasts should also take time to try to educate themselves on what they’re getting. I’m not optimistic that many will do that, but we should at least make it as easy as possible for them to do so.

11 Comments

Rex Brynen

Having spent some time working in an intelligence assessment unit where some divisions used numerical probabilities and others did not, let me offer a few quick thoughts.

First, I do think there are problems with using numbers in the final versions of assessments provided to decision-makers, for exactly the reasons you cite above—a false aura of confidence and precision. For this reason, the use of approved probabilistic terms (likely, very likely, etc.) is the better way to go. Much of the FVEY intelligence community does this, although the agencies do not necessarily use the same terms to mean exactly the same things. One should also be aware that the research is quite clear that not everyone interprets terms such as “likely” in similar ways, which is why many agencies include a brief statement of terms/probabilities within a report (you’ll find this with many NIEs, for example).

That being said, I do think there is great value in having analysts include probability statements in the drafting stage. It tends to make it easier to determine whether the linguistic thrust of the report matches the author’s mental probability map. It also makes it easier to identify cases where the overall chance of nested, contingent probabilities is incorrectly stated: for example, if A is likely (70%), and if that happens B is likely (70%), and if so then C is likely (70%), it is of course the case that the overall probability of C happening is unlikely (roughly 34%), a fact that analysts don’t always clearly express.

With regard to low-probability, high-impact events, that term is itself often used in intelligence reports and works fine in my view. “Game-changer” has crept in for events that could have broad, cascading effects (over-used, I think), and “black swan” sometimes makes an appearance (but shouldn’t in my view, because people do not understand the term in the same way or necessarily use it the way Taleb did).

Words have a huge psychological advantage, but in banking, where we had a culture of using words to avoid using probability, the issue that sprang up was ambiguity: what does “low risk” really mean? Some would say “low risk” is a 0.01% probability of default over the course of a loan; others might say 5%. Indeed, ratings agencies change these definitions over time. The point with numbers is that anyone can easily see that, say, a 25% probability of failure for one metal beam is worse than a 5% probability of failure for another, assuming the probability model is the same. It’s just that there is a culture of anti-knowledge out there, where the idea of knowledge and education being useful is alien to many who get into their field without really using much of their degree (if they have one).

On the meaning of numbers, maybe one solution is that, when learning probability, kids are required to play out a probabilistic event after doing the calculation. For example, a ten-sided die that rolls numbers 1–10 can show them the reality of what “p=0.6” means: require that they roll and see how often the die comes up 6 or less. It’s like a more practical (and less lethal) version of the Schrödinger’s cat experiment. Come to think of it, I never had so much as one project or class ask me to do something like that—I started doing a programming version of this when I found probability models looked a bit odd.
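A programming version of that die-rolling exercise might look like the sketch below. The function name and parameters are my own illustrations, not anything from the comment:

```python
import random

def simulate(p=0.6, sides=10, trials=10_000, seed=1):
    """Roll a ten-sided die repeatedly and count how often the result
    falls at or below the threshold implied by p (p=0.6 on a d10 means
    a roll of 1-6 counts as a hit)."""
    rng = random.Random(seed)
    threshold = round(p * sides)
    hits = sum(rng.randint(1, sides) <= threshold for _ in range(trials))
    return hits / trials

# The observed frequency hovers near 0.6 but rarely equals it exactly,
# which is the lesson: "p=0.6" describes the long run, not any one roll.
print(simulate())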

That paper’s concluding section includes this sentence: “Risk and uncertainty communication is a process that is even more complex than expected.”

A quick search on Google Scholar suggests that Karl Halvor Teigen has done quite a bit of interesting work on this topic over the past few decades, and as Jon Baron pointed out, that paper’s bibliography is a rich source of leads.

I look forward to getting from this body of research a better sense of what we do know so far about how this process works.

I remember first seeing the use of “likely,” “very likely,” or “not likely” in the 2007 Intergovernmental Panel on Climate Change (IPCC) report and thinking it was strange, but now it has become the norm. Ultimately, I think one of the benefits of using “likely” over a numerical estimate is that it avoids some of the confusing misconceptions about p-values, such as the idea that a lower p-value means something is more likely to happen. Hopefully some good standards are established for using words such as “likely” or “unlikely,” though, or other problems might arise.