Incoherence & Mattson


The following figure is from a recent paper I co-authored*:

[Figure from Karvetski et al. 2014, showing that we get more accuracy by ignoring incoherent estimates than by simple unweighted averaging. The unfortunately abbreviated 'BS' means 'Brier Score'; lower is better, with 0 being perfect.]

What implications does it have for making subjective "consensus" probability maps at the start of a search?

David Mandel has just blogged a summary of our recent Decision Analysis paper**. In laboratory conditions on general knowledge questions, we found:

- People are often incoherent: their probabilities don't add to 100%.
- We get an 18% gain in accuracy if we coherentize their estimates.
- But we get a 30% gain in accuracy if we also assign more weight to coherent estimates.

Suppose this applies to making subjective probability maps -- and we don't know that it does. Recall that the original "Mattson" consensus method asks everyone to put a probability in each region. People are bad at this. Their numbers often don't add to 100%, so they correct by getting to about 80% and then just dividing up the remainder. So we have invented decision aids to help them. The Proportional method says: use whatever numbers you like, and we normalize them. The O'Connor method uses verbal cues (A = "very likely" ... I = "very unlikely"), assigns each letter a number from 9 down to 1, and then applies the Proportional method. Etc.
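A minimal sketch of those two decision aids, as described above. The A-to-I mapping onto 9..1 follows the text; the function names and everything else are illustrative, not the fielded procedure.

```python
def proportional(raw):
    """Proportional method: take whatever nonnegative numbers people
    give for the regions and normalize them to sum to 1."""
    total = sum(raw)
    return [x / total for x in raw]

# O'Connor method: verbal cues A ("very likely") .. I ("very unlikely")
# become the numbers 9 .. 1, then the Proportional method normalizes.
OCONNOR_SCORES = {letter: 9 - i for i, letter in enumerate("ABCDEFGHI")}

def oconnor(letters):
    """Convert one verbal cue per region to a score, then normalize."""
    return proportional([OCONNOR_SCORES[c] for c in letters])

print(proportional([40, 30, 10]))  # -> [0.5, 0.375, 0.125]
print(oconnor(["A", "E", "I"]))    # A=9, E=5, I=1, then normalized
```

Note the commenter's point below about resolution: O'Connor only has nine levels, so two regions rated "A" must get equal probability, while the Proportional method can express any ratio.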

I tend to fall in the "use whatever method you like best" camp. Usually that means Proportional or O'Connor. But if our result applies to making subjective probability maps, it would suggest:


Author: ctwardy

Charles Twardy started the SARBayes project at Monash University in 2000. Work at Monash included SORAL, the Australian Lost Person Behavior Study, AGM-SAR, and Probability Mapper. At George Mason University, he added the MapScore project and related work. More generally, he works on evidence and inference with a special interest in causal models, Bayesian networks, and Bayesian search theory,
especially the analysis and prediction of lost person behavior.
From 2011-2015, Charles led the DAGGRE & SciCast combinatorial prediction market projects at George Mason University, and has recently joined NTVI Federal as a data scientist supporting the Defense Suicide Prevention Office.
Charles received a Dual Ph.D. in History & Philosophy of Science and Cognitive Science from Indiana University, followed by a postdoc in machine learning at Monash.

It's a generalization of normalization to any related set of probabilities: find the coherent set of probabilities that is least different, in squared deviation, from the ones provided to us.
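For the simplest case (mutually exclusive, exhaustive regions), that least-squares idea reduces to Euclidean projection onto the probability simplex. A sketch using the standard sort-based projection algorithm; this is my illustration, not code from the paper.

```python
def coherentize(p):
    """Return the coherent probabilities (nonnegative, summing to 1)
    closest to p in squared deviation: Euclidean projection onto the
    probability simplex. Assumes the regions are mutually exclusive
    and exhaustive."""
    u = sorted(p, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui            # cumulative sum of the i largest entries
        t = (1.0 - css) / i  # shift that would make the top i sum to 1
        if ui + t > 0:       # entry i stays positive after this shift
            theta = t
    return [max(x + theta, 0.0) for x in p]

# Estimates totaling 120%: projection subtracts 0.2/3 from each region.
print(coherentize([0.5, 0.4, 0.3]))
```

When no region would be driven negative, this is just "subtract the same amount from everyone"; the sorting only matters when some small estimates have to be clipped at zero.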

I prefer the Proportional method; O'Connor is OK, but it doesn't offer the resolution you can get with the Proportional method. As you stated, people are not good at appropriately distributing probability over a fixed range (Mattson), particularly if the distribution must be divided over many regions. I am speaking anecdotally here, but if someone were asked to divide probability over two or three decisions (regions), they could probably do a good job. Asked to divide it over seven or more, I think it gets difficult to maintain coherence. I imagine someone has done a study to determine the optimal number of hypotheses (regions, in the case of SAR) for maintaining coherence. Of course it will vary by situation and individual, but it would be interesting to see if there is a statistically preferred number of "decisions" for maintaining coherence.

In the business world and the realm of incident command, practitioners talk about "span of control". Recommended values are all over the board (http://www.economist.com/node/14301444). Within the context of Incident Command, the recommended value is 3-7, with 5 considered optimal (http://training.fema.gov/EMIWeb/is/ICSResource/assets/reviewMaterials.pdf). Maybe this would apply to maintaining coherence in decision (region) making as well. If a reviewer were asked to limit decisions to 5 as opposed to 12, would they be able to maintain coherence (as defined in the blog posting)?

With regard to SAR probability regions, when there are a large number of regions we may want to compartmentalize them so the reviewer is asked to consider no more than five at a time.

That's if we want them to be coherent. The radical idea here is that we might want to allow people to be incoherent. If worse forecasters are also less coherent (or lose coherence faster), let them show their hand, and correct for that by giving them less weight.
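One illustrative way to do that down-weighting. The 1/(eps + incoherence) weights and the uniform-shift coherentization here are my assumptions for the sketch, not the weighting scheme from the paper.

```python
def incoherence(p):
    """Squared distance from p to the nearest vector summing to 1
    (ignoring nonnegativity, for simplicity)."""
    n = len(p)
    shift = (1.0 - sum(p)) / n
    return n * shift * shift

def weighted_consensus(estimates, eps=1e-6):
    """Coherentize each judge's estimates by a uniform shift, then
    average them with weight 1/(eps + incoherence): the further a
    judge's numbers are from coherent, the less they count."""
    n = len(estimates[0])
    weights = [1.0 / (eps + incoherence(p)) for p in estimates]
    out = [0.0] * n
    for w, p in zip(weights, estimates):
        shift = (1.0 - sum(p)) / n   # judge-specific coherentizing shift
        for i in range(n):
            out[i] += w * (p[i] + shift)
    return [x / sum(weights) for x in out]

# A coherent judge dominates one whose numbers total 240%.
print(weighted_consensus([[0.5, 0.3, 0.2], [0.9, 0.8, 0.7]]))
```

The key property is that the judges reveal their coherence before we correct them, so the weights carry information that plain normalization would throw away.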

Mind you, this could be a really bad idea. I wouldn't try it in the field just yet.