Forecasting Round-Up No. 6

The latest in a very occasional series.

1. The Boston Globe ran a story a few days ago about a company that’s developing algorithms to predict which patients in cardiac intensive care units are most likely to take a turn for the worse (here). The point of this exercise is to help doctors and nurses allocate their time and resources more efficiently and, ideally, to give them more lead time to try to stop those bad turns from happening.

The story suffers from some rhetorical tics common to press reports on “predictive analytics.” For example, we never hear any specifics about the analytic techniques used or the predictive accuracy of the tool, and the descriptions of machine learning tilt toward the ingenuous (e.g., “The more data fed into the model, the more accurate the prediction becomes”). On the whole, though, I think this article does a nice job representing the promise and reality of this kind of work. The following passage especially resonated with me, because it describes a process for applying these predictions that sounds like the one I have in mind when building my own forecasting tools:

The unit’s medical director, Dr. Melvin Almodovar, uses [the prediction tool] to double-check his own clinical assessment of patients. Etiometry’s founders are careful to note that physicians will always be the ultimate bedside decision makers, using the Stability Index to confirm or inform their own diagnoses.

Butler said that an information-overload environment like the intensive care unit is ideal for a data-driven risk assessment tool, because the patients teeter between life and death. A predictive model can act as an early warning system, pointing out risky changes in multiple vital signs in a more sophisticated way than bedside alarms.

When our predictive models aren’t as accurate as we’d like or don’t yet have a clear track record, this hybrid approach—decisions are informed by the forecasts but not determined by them—is a prudent way to go. In the cardiac intensive care unit, doctors are already applying their own mental models to these data, so the idea of developing explicit algorithms to do the same isn’t a stretch (or shouldn’t be, but…). Unlike those doctors, though, statistical models won’t suffer from low blood sugar or distraction or become emotionally attached to some patients but not others. Also unlike the mental models doctors use now, statistical models will produce explicit forecasts that can be collected and assessed over time. The resulting feedback will give the stats guys many opportunities to improve their models, and the hospital staff a chance to get a feel for the models’ strengths and limitations. When you’re making such weighty decisions, why wouldn’t you want that additional information?
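To make "collected and assessed over time" a little more concrete: one common yardstick for probabilistic forecasts of yes/no events is the Brier score, the mean squared difference between the forecast probabilities and what actually happened (0 is perfect, and always saying 50 percent earns 0.25). What follows is only a minimal sketch with made-up numbers. The function name and the toy data are my own assumptions for illustration, and nothing here describes how Etiometry or the hospital actually evaluates its tool.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Lower is better: 0.0 is perfect, 0.25 is what constant 50/50 guessing earns.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("need at least one forecast to score")
    squared_errors = [(p - y) ** 2 for p, y in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical data: four forecasts of "patient deteriorates" and what happened.
model_forecasts = [0.9, 0.2, 0.7, 0.1]
observed = [1, 0, 0, 0]

model_score = brier_score(model_forecasts, observed)
coin_flip_score = brier_score([0.5] * len(observed), observed)

print(f"model: {model_score:.4f}, always-50%: {coin_flip_score:.4f}")
```

Tracking a score like this over successive batches of forecasts is one way the modelers and the hospital staff could both see whether the tool is earning its keep.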

2. Lyle Ungar recently discussed forecasting with the Machine Intelligence Research Institute (here). The whole thing deserves a read, but I especially liked this framework for thinking about when different methods work best:

I think one can roughly characterize forecasting problems into categories—each requiring different forecasting methods—based, in part, on how much historical data is available.

Some problems, like the geo-political forecasting [the Good Judgment Project is] doing, require lots of collection of information and human thought. Prediction markets and team-based forecasts both work well for sifting through the conflicting information about international events. Computer models mostly don’t work as well here—there isn’t a long enough track record of, say, elections or coups in Mali to fit a good statistical model, and it isn’t obvious what other countries are ‘similar.’

Other problems, like predicting energy usage in a given city on a given day, are well suited to statistical models (including neural nets). We know the factors that matter (day of the week, holiday or not, weather, and overall trends), and we have thousands of days of historical observation. Human intuition is not going to beat computers on that problem.

Yet other classes of problems, like economic forecasting (what will the GDP of Germany be next year? What will unemployment in California be in two years?) are somewhere in the middle. One can build big econometric models, but there is still human judgement about the factors that go into them. (What if Merkel changes her mind or Greece suddenly adopts austerity measures?) We don’t have enough historical data to accurately predict economic decisions of politicians.

The bottom line is that if you have lots of data and the world isn’t changing too much, you can use statistical methods. For questions with more uncertainty, human experts become more important.

I might disagree on the particular problem of forecasting coups in Mali, but I think the basic framework that Lyle proposes is right.

3. Speaking of the Good Judgment Project (GJP), a bevy of its researchers, including Ungar, have an article in the March 2014 issue of Psychological Science (here) that shows how certain behavioral interventions can significantly boost the accuracy of forecasts derived from subjective judgments. Here’s the abstract:

Five university-based research groups competed to recruit forecasters, elicit their predictions, and aggregate those predictions to assign the most accurate probabilities to events in a 2-year geopolitical forecasting tournament. Our group tested and found support for three psychological drivers of accuracy: training, teaming, and tracking. Probability training corrected cognitive biases, encouraged forecasters to use reference classes, and provided forecasters with heuristics, such as averaging when multiple estimates were available. Teaming allowed forecasters to share information and discuss the rationales behind their beliefs. Tracking placed the highest performers (top 2% from Year 1) in elite teams that worked together. Results showed that probability training, team collaboration, and tracking improved both calibration and resolution. Forecasting is often viewed as a statistical problem, but forecasts can be improved with behavioral interventions. Training, teaming, and tracking are psychological interventions that dramatically increased the accuracy of forecasts. Statistical algorithms (reported elsewhere) improved the accuracy of the aggregation. Putting both statistics and psychology to work produced the best forecasts 2 years in a row.

Speaking of which: If you know something about conflict or atrocities risk or a particular part of the world and are interested in volunteering as a forecaster, please send an email to ewp@ushmm.org.

4. Finally, Daniel Little writes about the partial predictability of social upheaval on his terrific blog, Understanding Society (here). The whole post deserves reading, but here’s the nub (emphasis in the original):

Take unexpected moments of popular uprising—for example, the Arab Spring uprisings or the 2013 riots in Stockholm. Are these best understood as random events, the predictable result of long-running processes, or something else? My preferred answer is something else—in particular, conjunctural intersections of independent streams of causal processes (link). So riots in London or Stockholm are neither fully predictable nor chaotic and random.

This matches my sense of the problem and helps explain why predictive models of these events will never be as accurate as we might like but are still useful, as are properly elicited and combined forecasts from people using their noggins.

2 Comments

Will

For better and for worse, our system has tracked this case pretty well. Our statistical risk assessments identified South Sudan as one of the world’s highest-risk cases in 2013. When fighting broke out in December, we posted a question to our opinion pool asking about the likelihood of an episode of mass killing there before 2015. The forecasters immediately pegged the probability at about 85 percent and it’s stuck there since, inching up a bit more over the past week.