“Kill the Quants?”: Why risk analysis fails

Catastrophes as diverse as Hurricane Katrina and the financial crisis point to a series of common mistakes in utilizing risk analysis and risk management.

By Douglas A. Samuelson

Should we “kill all the quants”?

The financial crisis, Hurricane Katrina and the BP oil spill have aroused many claims that quantitative risk analysis failed. Even fairly sober scientific reviews seriously criticized models in general [Scientific American, 2008] or called for a major change in their use [Lo, 2009]. Douglas W. Hubbard’s book on failures of risk management [2009] has been a popular hit, raising the most intriguing question of why modelers don’t usually apply quantitative measures to how well their models worked. (He and this reporter followed up on his main points in an article for OR/MS Today [Hubbard and Samuelson, 2010]). Even more popular and more trenchant is Nassim Taleb’s “The Black Swan,” based on the assertion that the most important events are the “unknown unknown” ones, beyond any modeling method’s scope [Taleb, 2007]. These critiques raise major questions about the uses and limitations of risk analyses, and about the ways in which managers expect to use them.

First of all, we can surely agree that the examples cited here, among others, do indicate major deficiencies in predicting and managing risk. The BP spill left many people asking, “Why didn’t we know what could happen?” The responsible state and federal agencies lacked in-house expertise and apparently ignored a pattern of risky practice. BP underestimated the risks and under-prepared. There were also some genuine modeling errors, along with questionable assessments of the predictions the models did make.

Katrina and its aftermath showed a similar pattern. The state and federal governments repeatedly postponed repairs to the levees around New Orleans, overruling strong recommendations from their engineers. After 9/11, the federal government shifted much of its in-house expertise from natural disasters to terrorism. The analyses that were conducted focused more on the danger from a storm surge up the Mississippi River than on the danger of a “back-side” surge from Lake Pontchartrain, on the other side of the city – neglecting the critical fact that repair ships can navigate the river but could not traverse the lake, rendering the main repair plans ineffective for the actual event. The problems were greatly compounded by uncertainty about who was responsible for what.

The financial crisis featured both models that were wrong and models that were right but were ignored by managers. Modeling errors vied with deliberate misstatements. Federal agencies deferred to the expertise and presumed non-malevolence of private firms.

In all these cases and a number of others, some common elements emerge:

unduly limiting assumptions in the analyses;

over-reliance on inappropriate theory;

underestimating probabilities and effects of rare events;

excessive trust in markets;

over-specialization and resistance among disciplines;

insufficient empiricism about assessing quantitative methods;

insufficient attention to availability and quality of data; and

unclear responsibility and accountability for failures.

1. Unduly limiting assumptions. Most creditors have relied for many years on statistical credit scores, most famously the FICO score almost universally used for residential mortgages. For obvious reasons, credit scores rely entirely on borrowers’ recent experience. Hence, in typical pre-2008 conditions, with no broad housing downturn in recent memory, a borrower with high utilization of available credit probably was someone who tended to spend too much and kept too little reserve. As the housing market spiraled downward, however, people who had used consumer credit and home equity lines as a ready reserve for small businesses got squeezed, delaying payments to other small businesses, which in turn got squeezed – and suddenly a large proportion of the populace looked high-risk to the scoring models.

This is one example of a general principle: When circumstances change substantially, models are liable to deteriorate not only because variables take on values different from the observations on which the model was based, but also because the relationships among variables change. For instance, a model of crowd behavior that works very well in normal situations may fail badly when, in a crisis, everyone wants to rush to a limited set of exits. In a crisis, usually uncorrelated behaviors become correlated – in the case of the financial system, urgent selling.
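
The impact of usually uncorrelated behaviors becoming correlated can be sketched with a toy portfolio calculation (all numbers invented for illustration): the diversification benefit that holds in normal times largely evaporates when pairwise correlations jump in a crisis.

```python
import numpy as np

# Illustrative sketch, not a calibrated model: 10 assets, each with
# 20% volatility, held in equal weights.
n = 10
vol = 0.20
weights = np.full(n, 1.0 / n)

def portfolio_vol(pairwise_corr):
    """Portfolio volatility when every asset pair shares one correlation."""
    corr = np.full((n, n), pairwise_corr)
    np.fill_diagonal(corr, 1.0)
    cov = corr * vol * vol          # all assets share the same volatility
    return float(np.sqrt(weights @ cov @ weights))

print(f"normal times (corr = 0.1): {portfolio_vol(0.1):.1%}")
print(f"crisis       (corr = 0.9): {portfolio_vol(0.9):.1%}")
```

With correlation 0.1 the portfolio is far less volatile than any single asset; with correlation 0.9 that protection is mostly gone – a risk model fitted only to normal-times data would badly understate the crisis case.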

In the BP case, it is not clear that anyone potentially responsible for problems asked whether the geologic structure was unusual or how much difference it made that the well was 5,000 feet under water. In the planning for Katrina, evidently no one considered how people without cars would heed an order to evacuate after the buses stopped running.

2. Over-reliance on inappropriate theory. Some years ago, as a relatively junior federal policy analyst, this reporter was one of several reviewers of a study estimating oil spill risks from tankers. The researchers who had done the study recommended having the Coast Guard require reports of all oil spills, no matter how small, for three years, to enable more precise estimation of the distribution. (As it was, spills under five gallons were exempted.) This reporter pointed out the difference in what causes different-sized spills and asked, “How much do small spills tell you about large spills?” The lead researcher replied in a patronizing tone, “You don’t understand statistics.” Obviously he was oblivious to the assumptions implicit in treating the data as part of one data set – and prepared to follow those assumptions to any conclusion whatsoever.

The researcher in this story was a biologist, but economists are prone to the same fallacious thinking. Alan Greenspan admitted that one cause of the financial crisis was his assumption that private firms would not sacrifice their own long-term interests and those of the country for short-term profits. It would not be surprising if federal regulatory officials made similar assumptions about BP’s likelihood of sacrificing safety for short-term profit. Litigation now in process over the West Virginia coal mine collapse early last year likewise alleges regulatory lapses that rested on the mine operators’ presumed desire to keep their mines operating long-term.

3. Excessive trust in markets. A special and important case of over-reliance on theory is over-reliance on markets. William Kahn, who was director of the risk management modeling group at FNMA (Fannie Mae) in 2008, told of how the CEO believed the market price, not the model’s, until FNMA was $50 billion “underwater” in its risk pricing of residential mortgages. Economists readily concede that markets are myopic and tend not to give sufficient weight to sketchy information. Also, the assumption of approximately equal information among parties, critical to market equilibrium theory, may not hold in practice, and it is hard to pin down whether it does.

4. Underestimating probabilities and effects of rare events. It is well known that rare events tend either to be overlooked entirely or to be under-weighted when they are considered. Some locations have had three or four “hundred-year floods” in the past century, and some financial markets have experienced “once-in-a-generation” swoons and drops every five to 10 years or so. In models heavily reliant on data, the temptation to downplay rare events is, if anything, accentuated, as the presence of some solid data discourages speculation about the unknown. Computing confidence intervals usually relies on distributional assumptions that may not be applicable and definitely relies on having accounted for all sources of variation – which usually has not happened. Confidence intervals computed only on sampling variation without other sources of randomness overstate precision and can easily mislead even the knowledgeable about the risk from rare events.
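
A small simulation can illustrate the coverage shortfall (all distributions and probabilities here are invented for illustration): an interval built from calm-period data alone delivers less than its nominal 90 percent coverage once an unmodeled source of variation enters.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical process (invented numbers): daily returns are N(0, 1%) in
# calm times, but with probability 5% an extra crisis shock N(0, 5%) hits
# -- a source of variation absent from the analyst's calm-period sample.
def true_draws(size):
    calm = rng.normal(0.0, 0.01, size)
    shocks = rng.normal(0.0, 0.05, size) * (rng.random(size) < 0.05)
    return calm + shocks

# The analyst sees only calm data and builds a normal-theory 90% interval.
calm_sample = rng.normal(0.0, 0.01, 1000)
half_width = 1.645 * calm_sample.std()   # z-value for a two-sided 90% interval
lo, hi = -half_width, half_width

# Empirical coverage against the real process falls short of 90%:
future = true_draws(100_000)
coverage = np.mean((future >= lo) & (future <= hi))
print(f"nominal 90% interval actually covers about {coverage:.0%}")
```

The gap looks modest in any one run, which is exactly the danger: the interval seems fine day to day, while nearly all of the missed mass sits in the tail events that matter most.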

Hubbard [2009], in a survey of 60 firms that used some form of quantitative models, found that most experts asked to produce 90 percent confidence intervals came nowhere near getting 90 percent of the correct values within these intervals. Training and feedback can greatly improve these experts’ assessments of uncertainty – particularly when the training includes requiring the experts to bet, with real money, on their predictions. “If you won’t bet on it at 9-to-1 odds, it’s not a 90 percent interval” has a profound educational effect.
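
The betting test is simple arithmetic. In a 9-to-1 bet you risk nine units to win one if the true value falls inside your stated interval; the bet breaks even exactly when your real hit rate is 90 percent, so refusing it reveals that your “90 percent” interval covers less than that. A minimal sketch:

```python
# Expected profit of risking 9 units to win 1 unit on a stated 90% interval,
# as a function of the interval's true coverage probability p.
def expected_profit(p, win=1.0, risk=9.0):
    """Expected profit if the interval truly contains the value with prob p."""
    return p * win - (1 - p) * risk

print(expected_profit(0.90))   # ~0: fair for a genuinely calibrated expert
print(expected_profit(0.70))   # negative: the overconfident expert loses
```

An expert who balks at those odds is implicitly admitting that the interval is narrower than advertised – which is Hubbard’s point.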

5. Insufficient empiricism about assessing quantitative methods. In his survey of users of risk analysis methods, Hubbard found that several popular techniques increase comfort far more than they improve actual results. Among the more prominent examples, he cited balanced scorecards and the analytic hierarchy process (AHP).

This is not the occasion for arguments about whether these methods were applied as designed and still produced the disappointing results he found. In any event, he urges persuasively, both producers and users of models should insist on assessing how accurate the models’ predictions were and evaluating subsequent efforts by the same modelers accordingly.

6. Insufficient attention to availability and quality of data. Good models require good data. Nevertheless, it is not at all difficult to find instances in which the data required were either unavailable or not good. One example from this reporter’s experience is hospital emergency department diagnostic codes, which tended to have little association with the patients’ eventual diagnosis and treatment.

In assessing a model, it is useful to ask how much effort was spent on data quality and whose responsibility that was. Even when good data were available for developing the model, it is also important to ascertain whether key data series are available sufficiently ahead of time to make forecasts. This reporter remembers with regret a contractor’s fine model of the price and supply of chromium that turned out to be critically dependent on the amounts purchased under a few federal contracts. These purchase figures were readily available for past years, but future such purchases were classified, as they indicated intended construction of certain high-performance military aircraft.

Modeling often demonstrates a need for more data, but additional data collection after model development is underway seems to be done rather infrequently. When it is done, modelers tend to collect more of the kinds of data they can get easily, which often means more of what they already have. Hubbard suggests that collection should be guided by the expected value of perfect information (EVPI): how much better could we do if we had this measurement precisely? Often the most useful data to collect will be a very few sketchy observations about a variable we know next to nothing about.
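
A minimal EVPI calculation, with invented numbers, shows the idea: compare the best expected payoff deciding under uncertainty with the expected payoff if the uncertain state were revealed before deciding.

```python
# Hypothetical two-action decision (all payoffs invented, in $M):
# reinforce a levee before storm season, or do nothing. Storm probability 10%.
p_storm = 0.10
payoff = {                                          # payoff[action][state]
    "reinforce":  {"storm":   -5.0, "calm": -5.0},  # fixed reinforcement cost
    "do_nothing": {"storm": -100.0, "calm":  0.0},  # flood damage if unlucky
}

def expected_value(action):
    return (p_storm * payoff[action]["storm"]
            + (1 - p_storm) * payoff[action]["calm"])

# Best we can do deciding now, under uncertainty:
ev_now = max(expected_value(a) for a in payoff)

# With perfect information we would pick the best action in each state:
ev_perfect = (p_storm * max(payoff[a]["storm"] for a in payoff)
              + (1 - p_storm) * max(payoff[a]["calm"] for a in payoff))

evpi = ev_perfect - ev_now    # upper bound on what any forecast is worth
print(f"EVPI = ${evpi:.1f}M")
```

Since no real forecast is perfect, EVPI is a ceiling: any measurement costing more than this number cannot pay for itself, and the comparison across variables shows where sketchy new data would help most.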

7. Over-specialization and resistance among disciplines. Scientific specialization has become a major and growing problem. In risk analysis, it appears to be the driving factor behind many of the other shortcomings.

In the FNMA example, for instance, the CEO trusted conventional economics over risk models. The BP crisis highlighted the long-established reluctance of economists, geologists and risk analysts to talk to each other, and the propensity to talk past each other when conversations do take place. This is not new: the Forrester-Meadows World3 model, the basis for the book “The Limits to Growth” that ignited a hot policy debate in the mid-1970s, omitted all price effects, so economists simply dismissed it, while geologists discussing resource limitations dismissed econometric models that assume prices dominate.

In the early days of credit scoring, in the 1960s, pioneering Fair, Isaac and Company (now FICO) frequently encountered resistance to letting models override loan officers’ judgment. Before that, they first tried offering, free of charge, to model risks of medical ailments, such as heart attacks, and got no takers at all from the doctors they approached. Most OR/MS practitioners have run into this kind of interdisciplinary resistance many times; some, regrettably, provide such resistance themselves.

8. Unclear responsibility and accountability for failures. Central to all these stories of shortcomings, as well, is lack of clarity about who is accountable for failures, whether of analysis, planning, reaction to changing events or carrying out needed actions. Both managers and analysts could have done better. Redesigning organizational structures, incentives and information handling to do better in crises is another topic for another occasion, but it is critical to future improvement.

Everyone has blind spots, from individuals to large organizations and entire professions. Over-specialization exacerbates the problem. In trying to anticipate “unknown unknowns,” breadth of vision and consideration becomes critical. One expert group [NRC, 2008] suggested that all models of complex social phenomena should be checked by scenario experts, domain experts, modelers and users to minimize the chance of something known to anyone available being overlooked by the other people involved in making the decisions. Strategic gaming [Samuelson, December 2009] is one of the most effective ways to elicit alternative assumptions and sketch their likely effects. More empiricism in evaluating models and the information on which they are based is also clearly warranted. Both analysts and managers need to challenge modeling assumptions much more diligently. In short, risk analyses remain useful, but we can all do better, and the best way forward is to help each other improve.

Douglas A. Samuelson (samuelsondoug@yahoo.com) is president and chief scientist of InfoLogix, Inc., an R&D and consulting company in Annandale, Va. He is a frequent contributor to Analytics and OR/MS Today.

Headlines

In the nearly 60 years between the 1937 release of Hollywood’s first full-length animated movie, “Snow White and the Seven Dwarfs,” and modern hits like “Toy Story,” “Shrek” and more, advances in animation technology have revolutionized not only animation techniques, but moviemaking as a whole. However, a new study in the INFORMS journal Organization Science found that employing the latest technology doesn’t always ensure creative success for a film. Read more →

INFORMS selected a diverse group of six finalists for the 47th annual Franz Edelman Award for Achievements in Operations Research and Management Science, the world’s most prestigious award for achievement in the practice of analytics and O.R. The 2018 finalists, who will present their work before a panel of judges at the INFORMS Conference on Analytics & Operations Research in Baltimore on April 15-17, included innovative applications in broadcasting, healthcare, communication, inventory management, vehicle fleet management and alternative energy. Read more →

On Feb. 4, more than 40 percent of U.S. households will watch the 2018 Super Bowl game on TV. Advertisers will pay up to $4 million for a 30-second spot during the telecast. Is the high cost of advertising worth it? A new study finds that the benefits from Super Bowl ads persist well into the year with increased sales during other sporting events. Further, the research finds that the gains in sales are much more substantial when the advertiser is the sole advertiser from its market category or niche in a particular event. Read more →