Machine Learning in Health Care

An NBER conference on "Machine Learning in Health Care" took place on June 4 in Cambridge. Research Associates David M. Cutler and Sendhil Mullainathan, both of Harvard University, and Ziad Obermeyer of Harvard Medical School organized the meeting.
Funding for this conference was made possible, in part, by grant P30AG012810 from the National Institute on Aging. The views expressed in written conference materials or publications and by speakers and moderators do not necessarily reflect the official policies of the Department of Health and Human Services; nor does mention of trade names, commercial practices, or organizations imply endorsement by the U.S. Government.
These researchers' papers were presented and discussed:

Susan Athey, Stanford University and NBER

The Impact of Machine Learning on Economics (Chapter in forthcoming NBER book The Economics of Artificial Intelligence: An Agenda, Ajay K. Agrawal, Joshua Gans, and Avi Goldfarb, editors. Conference held September 13-14, 2017. Forthcoming from University of Chicago Press)

Athey provides an assessment of the early contributions of machine learning to economics, as well as predictions about its future contributions. The chapter begins with a brief overview of themes from the machine learning literature, and then draws contrasts with traditional approaches to estimating the impact of counterfactual policies in economics. Next, Athey reviews some of the initial "off-the-shelf" applications of machine learning to economics, including applications in analyzing text and images. She then describes new types of questions that have been posed surrounding the application of machine learning to policy problems, including "prediction policy problems," as well as considerations of fairness and manipulability. The chapter presents some highlights from the emerging econometric literature combining machine learning and causal inference, and closes with a set of broader predictions about the future impact of machine learning on economics, including its effects on the nature of collaboration, funding, research tools, and research questions.

Sendhil Mullainathan, Harvard University and NBER, and Ziad Obermeyer, Harvard Medical School

Low-value health care — care that provides little health benefit relative to its cost — is a central concern for policymakers. Identifying exactly which care is likely to be of low-value ex ante, however, has proven challenging. Mullainathan and Obermeyer apply machine learning tools to study an iconic testing decision, often considered to epitomize low-value care: testing for heart attack (acute coronary syndromes) in the emergency setting. By comparing doctors' decisions to individualized, prospective risk estimates, the researchers show that mis-prediction of risk is a major driver of low-value care, contributing to both over- and under-testing for heart attack. They find a substantial number of patients with very low model-predicted risk ex ante, whom doctors nonetheless decide to test. These tests are low yield, i.e., few patients benefit from interventions to treat heart attack afterwards. Indeed, individualized predictions show that the conventional approach to studying low-value testing — focusing on average, rather than marginal, yield — substantially understates the extent of over-use in the lowest-risk tested patients. So far, this fits with a common view of doctor behavior: over-testing because of financial incentives. But the study also finds evidence of a different kind of mis-prediction: untested patients at high model-predicted risk. Doctors' decisions not to test these patients do not appear to reflect private information: these patients develop serious complications (or death) at remarkably high rates in the weeks after emergency visits. By isolating specific conditions under which emergency patients are quasi-randomly assigned to doctors, the researchers are able to minimize the influence of unobservables. The results suggest that both under-testing and over-testing are prevalent, and that targeting mis-prediction is an important but understudied policy priority.
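The core of the exercise described above can be sketched in a few lines: fit a risk model on observables, then compare its predictions to doctors' testing decisions. The sketch below uses synthetic data and scikit-learn's gradient boosting classifier as a stand-in for the paper's risk model; all variable names and thresholds are illustrative assumptions, not the authors' actual specification (which, among other things, must handle the fact that outcomes are only observed for tested patients).

```python
# Illustrative sketch only: synthetic data, not the paper's data or model.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 5))             # patient features (vitals, history, ...) -- synthetic
true_risk = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1] - 1.5)))
outcome = rng.binomial(1, true_risk)    # 1 = heart attack / serious complication
# Doctors test noisily: testing probability rises with risk, plus error.
tested = rng.binomial(1, np.clip(true_risk + rng.normal(0, 0.3, n), 0, 1))

# Fit a prospective risk model on observables.
model = GradientBoostingClassifier(random_state=0).fit(X, outcome)
risk = model.predict_proba(X)[:, 1]

# Over-testing: tested patients in the lowest-risk decile.
# Under-testing: untested patients in the highest-risk decile.
lo, hi = np.quantile(risk, [0.1, 0.9])
over_tested = (tested == 1) & (risk < lo)
under_tested = (tested == 0) & (risk > hi)
print(f"low-risk tested: {over_tested.sum()}, high-risk untested: {under_tested.sum()}")
```

The comparison of model-predicted risk against the actual test decision is what lets the paper separate over-use (tests on very low-risk patients) from under-use (no test for high-risk patients) within the same framework.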

David C. Chan Jr, Stanford University and NBER, and Jonathan Gruber, MIT and NBER

Triage Judgments in the Emergency Department

Triage nurses prioritize patients in the emergency department (ED), assigning each patient an emergency severity index (ESI) level and determining wait times. Chan and Gruber asked whether triage nurses affect mortality for ED patients, and through what channels. The researchers used nurse team-day variation as an instrument to construct a nurse-day value-added measure. They found that triage matters for mortality: when nurses were one standard deviation worse, mortality increased by 10%. The study could explain 75-80% of the nurse effect on mortality, but the contributing factors varied across hospitals, likely because of differences in ED organization. This research could quantify what it is that the "best" triage nurses are doing with ESI assignments and wait times, and potentially improve mortality via an algorithm that guides these triage actions.
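A toy version of a value-added measure can illustrate the idea: with synthetic data in which nurses are quasi-randomly assigned to team-days, each nurse's effect on mortality can be estimated from her mean residual mortality across the nurse-days she works. This is an assumption-laden simplification for intuition, not the paper's actual instrumental-variables estimator.

```python
# Illustrative sketch only: synthetic nurse-day data, not the paper's design.
import numpy as np

rng = np.random.default_rng(3)
n_nurses, days, patients_per_day = 20, 200, 30
nurse_quality = rng.normal(0, 0.01, n_nurses)        # true nurse effect on mortality

records = []
for d in range(days):
    team = rng.choice(n_nurses, size=3, replace=False)   # quasi-random team assignment
    for nurse in team:
        p = np.clip(0.03 + nurse_quality[nurse], 0, 1)   # baseline mortality ~3%
        deaths = rng.binomial(patients_per_day, p)
        records.append((nurse, deaths / patients_per_day))

records = np.array(records)
overall = records[:, 1].mean()
# Value-added: each nurse's average nurse-day mortality relative to the overall mean.
value_added = np.array([
    records[records[:, 0] == i, 1].mean() - overall for i in range(n_nurses)
])
# With quasi-random assignment, the estimate tracks true quality differences.
corr = np.corrcoef(value_added, nurse_quality)[0, 1]
print(round(float(corr), 2))
```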

Using Big Data and Data Science to Generate Solutions to the Opioid Crisis

The opioid epidemic is a growing problem, but opioids are also a key tool for addressing pain. Preventing first abuse is important: 80% of opioid abusers had a valid prescription prior to first abuse, 51% of the remainder had a family member with a prescription, and estimated abuse rates among those prescribed opioids are 8-12%. Hastings, Howison, Inman, and Shah used Medicaid data from Rhode Island to identify predictors of opioid or heroin poisoning, abuse, and dependence using gradient boosted trees. About 4% of patients had an adverse event within 5 years, with previous prison time a particularly strong predictor. To gauge the impact of the most important variables, the researchers simulated an average patient, changed variables one at a time in order of tree-predicted importance, and plotted the resulting risk trajectory.
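The perturbation exercise described above can be sketched as follows, using synthetic data and scikit-learn's gradient boosting classifier; the feature names and the choice of a 90th-percentile perturbation are illustrative assumptions, not the authors' specification.

```python
# Illustrative sketch only: synthetic claims-style features, not the Medicaid data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
n = 4000
features = ["prior_prison", "prior_opioid_rx", "age", "er_visits", "income"]  # hypothetical
X = rng.normal(size=(n, len(features)))
logit = 1.2 * X[:, 0] + 0.8 * X[:, 1] - 0.3 * X[:, 2] - 2.0
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))    # adverse event within 5 years (synthetic)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Start from the "average patient," then perturb one feature at a time in
# descending order of tree-predicted importance, tracking predicted risk.
avg = X.mean(axis=0)
order = np.argsort(model.feature_importances_)[::-1]
patient = avg.copy()
trajectory = [model.predict_proba(patient.reshape(1, -1))[0, 1]]
for j in order:
    patient[j] = np.quantile(X[:, j], 0.9)       # push feature to its 90th percentile
    trajectory.append(model.predict_proba(patient.reshape(1, -1))[0, 1])
print([round(float(p), 3) for p in trajectory])
```

Plotting `trajectory` against the cumulative list of perturbed features reproduces the kind of risk-trajectory figure the summary describes.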

American health care relies heavily on private provision of insurance through marketplaces. This system depends on active, informed consumers to discipline the market, but evidence of such behavior is scarce. To choose the optimal insurance plan, a consumer must synthesize information on risk, health, and predicted out-of-pocket spending. Theory has traditionally looked to expert agents to correct the market failures that arise from this complexity; machine learning and recommendation algorithms offer an alternative route to better plan choices. Gruber, Handel, Kolstad, and Kina worked with PicWell, which leverages claims data to model spending in Medicare Advantage plans using random forest regressions and combines the output with risk preferences and other tastes to recommend plans. Did skilled agents consistently make good recommendations in the absence of an algorithm? No: moving from the 25th to the 75th percentile of the agent skill distribution reduced expected consumer cost by $350. There were systematic choice errors and widespread heterogeneity; agents made the same mistakes documented for consumers in the literature (e.g., overweighting premiums relative to out-of-pocket costs). Did the introduction of a predictive plan recommendation algorithm improve plan choices? Yes: it saved about $270 per enrollee in expected cost on Medicare Advantage Prescription Drug (MAPD) plans as a result of more rational decision-making. Agent performance improved at all skill levels except the very top, and once decision support was available, recommendation quality was equivalent across agents, reflecting the elimination of choice errors, particularly among lower-skilled agents.
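A minimal sketch of such a recommendation pipeline, assuming synthetic claims features and made-up plans (this is not PicWell's actual model): a random forest predicts each enrollee's out-of-pocket spending under each plan, and predictions are combined with premiums and a simple risk-aversion penalty to rank plans.

```python
# Illustrative sketch only: synthetic data; plan names and parameters are invented.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n = 3000
X = rng.normal(size=(n, 4))                      # claims-history features (synthetic)
plans = {"A": (1200, 1.0), "B": (2400, 0.5)}     # plan -> (annual premium, cost-sharing factor)

# One spending model per plan: predict out-of-pocket (OOP) cost from claims history.
models = {}
for name, (premium, share) in plans.items():
    oop = share * np.exp(X[:, 0]) * 500 + rng.normal(0, 100, n)   # synthetic OOP spending
    models[name] = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, oop)

def recommend(x, risk_aversion=0.001):
    """Rank plans by premium + predicted OOP spend + a variance penalty (mean-variance sketch)."""
    scores = {}
    for name, (premium, _) in plans.items():
        # Per-tree predictions give a rough spread of possible OOP outcomes.
        preds = np.array([t.predict(x.reshape(1, -1))[0] for t in models[name].estimators_])
        scores[name] = premium + preds.mean() + risk_aversion * preds.var()
    return min(scores, key=scores.get)

print(recommend(np.zeros(4)))
```

The variance penalty stands in for the "risk preferences and other tastes" the summary mentions: a more risk-averse enrollee places more weight on the spread of possible out-of-pocket outcomes, not just their mean.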

Ladhania, Haviland, Sood, and Mehrotra illustrate the methodological opportunities and challenges of using large observational data and statistical machine learning methods to generate hypotheses about subgroups with heterogeneous effects. In one case study with exogenous treatment, the authors find that some of the generated hypotheses hold up and many do not.