The blog of Ashish Jha — physician, health policy researcher, and advocate for the notion that an ounce of data is worth a thousand pounds of opinion.


Austin Frakt and Aaron Carroll recently approached me about a New York Times UpShot piece aiming to rank eight healthcare systems they had chosen: Australia, Canada, France, Germany, Singapore, Switzerland, the United Kingdom, and the United States. This forced me to think about a pretty fundamental question: what do we want from a healthcare system?

What Does an Ideal Healthcare System Look Like?

I would argue that most people want a healthcare system where they can get timely access to high quality, affordable care and one that also promotes innovation of new tests and treatments. But underlying these sentiments are a lot of important issues that need unpacking. First, what does it mean to be able to access care when you need it? A simple way to think about this is being able to see a doctor (or other healthcare professional) quickly and easily and, when follow-on tests, procedures, or treatments are needed, being able to get them without much delay. This brings up one important point: while experts often discount the importance of timeliness, regular people generally don’t. Anyone who has waited weeks or months for a follow-up after an abnormal test result or to get a needed surgery knows that waiting times are not just an inconvenience. Delayed access can be stressful, agonizing and in some instances, downright harmful.

Beyond access, of course, we want care we can afford. Almost all of us need some sort of insurance that would pay for an unexpected, catastrophic healthcare expense (like spending a few weeks in an ICU). Most of us need some sort of financial coverage for other, slightly less expensive services such as an MRI or a knee replacement. And even still, some of us will struggle to pay for simpler things like doctors’ visits and need financial help there as well. There is broad consensus that we want a healthcare system where people aren’t denied the services they need because they can’t afford them.

While accessibility, timeliness, and affordability are key, there are other aspects of care that get less attention but are just as important: we want care that is safe and effective and produces the best outcomes possible. It’s great if you can have timely cardiac surgery and pay little or nothing out of pocket. But if you die unnecessarily from a preventable error, you didn’t get what you needed from the healthcare system.

Finally, we want a healthcare system that creates new knowledge so that we get better at caring for sick people. One of my earliest memories of medical school was caring for a young woman, an artist with two small children, who died of a complication of chronic myelogenous leukemia after a bone marrow transplant. Today, her disease could have been managed by a simple, daily pill that has turned CML into a chronic, yet manageable disease. A system that generates new therapies that save lives is critical and its importance is often overlooked when assessing health system performance.

Health System Organization

So what is the ideal way to organize a healthcare system to accomplish these goals? One school of thought believes that market-based systems are the solution because they rely on competition, customize care for individuals, keep prices down, and allow the highest quality providers to flourish. For others, the answer is a government-run, single-payer system where everyone has equal access, gets comparable quality, and patients don’t have to worry about costs because the government takes care of it. While either approach can be supported with selected data and facts, as I have looked at health systems from around the globe, one theme becomes obvious: systems organized very differently can achieve comparable levels of performance and no single approach consistently outperforms others.

So which countries have the best systems? As the UpShot piece outlines, we did a tournament-style competition where in each round, we had to pick winners and losers. At the end, we were also asked to rank the selected 8 countries based on our overall assessment. To do so, my approach was simple. Health systems should be judged not by how they are organized (i.e. markets or government) but what they produce. How well does it do what a healthcare system ought to do? So that’s the approach I took.

Evaluating Health Systems

That leads us to the next question: what metrics should we use? If you made it to the first day of a health policy 101 class, you learned about two metrics: per capita spending and life expectancy. If you made it to the second class, you learned that unfortunately, these are far too crude to tell you much about health system performance and do not help generate an actionable set of policy prescriptions. Health care spending is driven by many factors, including what is encompassed in spending calculations (research and development? medical education?) and prices (if one country pays its nurses half as much as another – does that mean the first is twice as efficient?). Life expectancy is even more complicated as it is driven in large part by behavior, lifestyle, and genetics of the underlying population. As Irene Papanicolas and I point out in our recent JAMA piece, drawing these boundaries when comparing healthcare systems is important.

So if we can’t just look at those metrics, what else should we examine? While one could evaluate literally hundreds of metrics, I prioritized 16 (see Table 1).

None of these are perfect but they seemed reasonable to me – a few on access, quality, cost and innovation. Ultimately, I was interested in assessing performance in areas that are clearly within the purview of the healthcare system – how many people are covered and covered for what? How quickly can you see someone when you’re sick? How good is the system at taking care of you when something terrible happens, like you have a stroke or a heart attack? Does the system generate lots of innovation so that everyone’s care gets better over time? I tried not to overly weight any one of these but tried to look at them holistically.

My Rankings

Based on these measures (for country data, see Table 2), my ranking of the selected health systems is as follows:

1. Switzerland
2. Germany
3. U.S.A.
4. U.K.
5. France
6. Australia
7. Canada
8. Singapore

A few caveats. First, these are all very good healthcare systems – and we’re generally comparing systems that are far superior to much of the rest of the world. Second, there was rarely a clear winner in head to head competitions. Switzerland and Germany both have excellent systems and reasonable people could draw a different conclusion from the same data. I struggled among the U.S., France, Australia, and the U.K., all of which had clear strengths and clear challenges. Singapore lagged behind in large part because there is so little data about their performance and lack of data means it might be better than it looks, or it could be worse. I just don’t know.

The ranking of the U.S. above places like France and the UK may be surprising. Some people will point, rightly, to the fact that the U.S. has the highest spending in the world yet still has people who are uninsured. The healthcare spending problem of the U.S. is largely a political choice – we have extraordinarily high prices on everything from physician salaries to pharmaceuticals. While some of these high prices may spur innovation (e.g. pharmaceuticals), the cost of spending nearly 20% of our GDP on healthcare means less money for everything else. We could do better with different policy choices.

On the issue of universal coverage, things are a bit more complicated. While it’s narrowly true that the U.S. is the only country here without universal coverage, that framing is too simplistic. First, 91% of Americans are now insured (thanks in part to the ACA). Some countries have universal coverage for their citizens but not necessarily for immigrants or other groups. Second, it is important to consider what is actually covered. While most Americans can get access to the latest treatments, in many countries, access to the most expensive therapies can be difficult or non-existent. I don’t know if we will get to 100% coverage but we are inching towards it and I hope that with the next set of policy reforms, we can get into the high 90s. And that would be good.

Finally, if you take a big step back and look at the data, Americans do better than average in timely access, especially to specialty services and “elective” surgery (which is often not that elective). They tend to be among the leaders in acute care quality, when healthcare means the difference between life and death, although the quality of primary care could surely be better. And America is the innovation engine of the world, pumping out new drugs and treatments that benefit the whole world. All of that earns America a high rank in my book – behind Switzerland and Germany but ahead of others. You can disagree but overall, while the U.S. healthcare system has a lot of work ahead, we should not overlook its strengths – and they are sizeable.

So here’s the big picture: when it comes time to measure health system performance, it’s important to think about boundaries (what is the responsibility of the healthcare system and what isn’t). It’s also important to consider whether the system is delivering what people need: coverage of a broad range of services, especially those that are important for the sickest among us, timely access to affordable, high quality care, and innovation that ensures care gets better over time. For most people, whether the system is market-based or government-run matters a lot less than whether it’s meeting their needs. And that’s the way it should be.

How much does it matter which hospital you go to? Of course, it matters a lot – hospitals vary enormously on quality of care, and choosing the right hospital can mean the difference between life and death. The problem is that it’s hard for most people to know how to choose. Useful data on patient outcomes remain hard to find, and even though Medicare provides data on patient mortality for select conditions on their Hospital Compare website, those mortality rates are calculated and reported in ways that make nearly every hospital look average.

Some people choose to receive their care at teaching hospitals. Studies in the 1990s and early 2000s found that teaching hospitals performed better, but there was also evidence that they were more expensive. As “quality” metrics exploded, teaching hospitals often found themselves on the wrong end of the performance stick with more hospital-acquired conditions and more readmissions. In nearly every national pay-for-performance scheme, they seemed to be doing worse than average, not better. In an era focused on high-value care, the narrative has increasingly become that teaching hospitals are not any better – just more expensive.

But is this true? On the one measure that matters most to patients when it comes to hospital care – whether you live or die – are teaching hospitals truly no better or possibly worse? About a year ago, that was the conversation I had with a brilliant junior colleague, Laura Burke. When we scoured the literature, we found that there had been no recent, broad-based examination of patient outcomes at teaching versus non-teaching hospitals. So we decided to take this on.

As we plotted how we might do this, we realized that to do it well, we would need funding. But who would fund a study examining outcomes at teaching versus non-teaching hospitals? We thought about NIH but knew that was not a realistic possibility – they are unlikely to fund such a study and even if they did, it would take years to get the funding. There are also some excellent foundations, but they are small and therefore, focus on specific areas. Next, we considered asking the Association of American Medical Colleges (AAMC). We know these colleagues well and knew they would be interested in the question. But we also knew that for some people – those who see the world through the “conflict of interest” lens – any finding funded by AAMC would be quickly dismissed, especially if we found that teaching hospitals were better.

Setting up the rules of the road

As we discussed funding with AAMC, we set up some basic rules of the road. Actually, Harvard requires these rules if we receive a grant from any agency. As with all our research, we would maintain complete editorial independence. We would decide on the analytic plan and make decisions about modeling, presentation, and writing of the manuscript. We offered to share our findings with AAMC (as we do with all funders), but we were clear that if we found that teaching hospitals were in fact no better (or worse), we would publish those results. AAMC took a leap of faith knowing that they might be funding a study that casts teaching hospitals in a bad light. The AAMC leadership told me that if teaching hospitals are not providing better care, they wanted to know – they wanted an independent assessment of their performance using meaningful metrics.

Our approach

Our approach was simple. We examined 30-day mortality (the most important measure of hospital quality) and extended our analysis to also examine 90 days (to see if differences between teaching and non-teaching hospitals persisted over time). We built our main models, but in the back of my mind, I knew that no matter which choices we made, some people would question them as biased. Thus, we ran a lot of sensitivity analyses, looking at shorter-term outcomes (7 days), models with and without transferred patients, within various hospital size categories, and with various specifications of how one even defines teaching status. Finally, we included volume in our models to see if volume of patients seen was driving differences in outcomes.

The one result that we found consistently across every model and using nearly every approach was that teaching hospitals were doing better. They had lower mortality rates overall, across medical and surgical conditions, and across nearly every single individual condition. And the findings held true all the way out to 90 days.

What our findings mean

This is the first broad, post-ACA study examining outcomes at teaching hospitals, and for the fans of teaching hospitals, this is good news. The mortality difference between teaching and non-teaching hospitals is clinically substantial: for every 67 to 84 patients who go to a major teaching hospital (as opposed to a non-teaching hospital), you save one life. That is a big effect.
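The "67 to 84 patients" figure is a number needed to treat (NNT), which is the reciprocal of the absolute difference in mortality. Here is a minimal sketch of that conversion. The function name is illustrative, and the mortality differences it prints are back-calculated from the NNT range quoted above, not taken from the paper's tables.

```python
def absolute_risk_difference(nnt: float) -> float:
    """Absolute mortality difference (as a proportion) implied by an NNT."""
    if nnt <= 0:
        raise ValueError("NNT must be positive")
    return 1.0 / nnt


# NNT range quoted in the post for major teaching vs. non-teaching hospitals.
nnt_low, nnt_high = 67, 84

# A smaller NNT corresponds to a larger mortality difference.
larger_gap = absolute_risk_difference(nnt_low) * 100    # percentage points
smaller_gap = absolute_risk_difference(nnt_high) * 100  # percentage points

print(f"Implied mortality difference: {smaller_gap:.2f} to {larger_gap:.2f} percentage points")
```

In other words, saving one life per 67 to 84 patients corresponds to a mortality gap of roughly 1.2 to 1.5 percentage points.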

Should patients only go to teaching hospitals though? That is wholly unrealistic, and these are only average effects. Many community hospitals are excellent and provide care that is as good as, if not superior to, that of teaching institutions. But lacking other information when deciding where to receive care, patients do better on average at teaching institutions.

Way forward

There are several lessons from our work that can help us move forward in a constructive way. First, given that most hospitals in the U.S. are non-teaching institutions, we need to think about how to help those hospitals improve. The follow-up work needs to delve into why teaching hospitals are doing better and how we can replicate and spread that to other hospitals. This strikes me as an important next step. Second, can we work on our transparency and public reporting programs so that hospital differences are distinguishable to patients? As I have written, we are doing transparency wrong, and one of the casualties is that it is hard for a community hospital that performs very well to stand out. Finally, we need to fix our pay-for-performance programs to emphasize what matters to patients. And for most patients, avoiding death remains near the top of the list.

Final thoughts on conflict of interest

For some people, these findings will not matter because the study was funded by “industry.” That is unfortunate. The easiest and laziest way to dismiss a study is to invoke conflict of interest. This is part of the broader trend of deciding what is real versus fake news, based on the messenger (as opposed to the message). And while conflicts of interest are real, they are also complicated. I often disagree with AAMC and have publicly battled with them. Despite that, they were bold enough to support this work, and while I will continue to disagree with them on some key policy issues, I am grateful that they took a chance on us. For those who can’t see past the funders, I would ask them to go one step further – point to the flaws in our work. Explain how one might have, untainted by funding, done the work differently. And most importantly – try to replicate the study. Because beyond the “COI,” we all want the truth on whether teaching hospitals have better outcomes or not. Ultimately, the truth does not care what motivated the study or who funded it.

Our recent paper on differences in outcomes for Medicare patients cared for by male and female physicians has created a stir. While the paper has gotten broad coverage and mostly positive responses, there have also been quite a few critiques. There is no doubt that the study raises questions that need to be aired and discussed openly and honestly. Its limitations, which are highlighted in the paper itself, are important. Given the temptation we all feel to overgeneralize, we do best when we stick with the data. It’s worth highlighting a few of the more common critiques that have been lobbed at the study to see whether they make sense and how we might move forward. Hopefully by addressing these more surface-level critiques we can shift our focus to the important questions raised by this paper.

Correlation is not causation

We all know that correlation is not causation. It’s epidemiology 101. People who carry matches are more likely to get lung cancer. Going to bed with your shoes on is associated with higher likelihood of waking up with a headache. No, matches don’t cause lung cancer any more than sleeping with your shoes on causes headaches. Correlation, not causation. Seems straightforward and it has been a consistent critique of this paper. The argument is that because we had an observational study – that is, not an experiment where we proactively, randomly assigned millions of Americans to male versus female doctors – all we have is an association study. To have a causal study, we’d need a randomized, controlled trial. In an ideal world, this would be great, but unfortunately in the real world, this is impractical…and even unnecessary. We often make causal inferences based on observational data – and here’s the kicker: sometimes, we should. Think smoking and lung cancer. Remember the RCT that assigned people to smoking (versus not) to see if it really caused lung cancer? Me neither…because it never happened. So, if you are a strict “correlation is not causation” person who thinks observational data only create hypotheses that need to be tested using RCTs, you should only feel comfortable stating that smoking is associated with lung cancer but it’s only a hypothesis for which we await an RCT. That’s silly. Smoking causes lung cancer.

Why correlation can be causation

How can we be so certain that smoking causes lung cancer based on observational data alone? Because there are several good frameworks that help us evaluate whether a correlation is likely to be causal. They include presence of a dose-response relationship, plausible mechanism, corroborating evidence, and absence of alternative explanations, among others. Let’s evaluate these in light of the gender paper. Dose-response relationship? That’s a tough one – we examine self-identified gender as a binary variable…the survey did not ask physicians how manly the men were. So that doesn’t help us either way. Plausible mechanism and corroborating evidence? Actually, there is some here – there are now over a dozen studies that have examined how men and women physicians practice, with reasonable evidence that they practice a little differently. Women tend to be somewhat more evidence-based and communicate more effectively. Given this evidence, it seems pretty reasonable to predict that women physicians may have better outcomes.

The final issue – alternative explanations – has been brought up by nearly every critic. There must be an alternative explanation! There must be confounding! But the critics have mostly failed to come up with what a plausible confounder could be. Remember, a variable, in order to be a confounder, must be correlated both with the predictor (gender) and outcome (mortality). We spent over a year working on this paper, trying to think of confounders that might explain our findings. Every time we came up with something, we tried to account for it in our models. No, our models aren’t perfect. Of course, there could still be confounders that we missed. We are imperfect researchers. But that confounder would have to be big enough to explain about a half a percentage point mortality difference, and that’s not trivial. So I ask the critics to help us identify this missing confounder that explains better outcomes for women physicians.

Statistical versus clinical significance

One more issue warrants a comment. Several critics have brought up the point that statistical significance and clinical significance are not the same thing. This too is epidemiology 101. Something can be statistically significant but clinically irrelevant. Is a 0.43 percentage point difference in mortality rate clinically important? This is not a scientific or a statistical question. This is a clinical question. A policy and public health question. And people can reasonably disagree. From a public health point of view, a 0.43 percentage point difference in mortality for Medicare beneficiaries admitted for medical conditions translates into potentially 32,000 additional deaths. You might decide that this is not clinically important. I think it is. It’s a judgment call and we can disagree.
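The step from a 0.43 percentage point gap to 32,000 deaths is a single multiplication: deaths equal the number of hospitalizations affected times the mortality difference. The post does not state how many hospitalizations that is, so the sketch below back-calculates it. The resulting figure is an inference from the two quoted numbers, not a number from the paper, and the function name is illustrative.

```python
risk_difference = 0.43 / 100  # 0.43 percentage points, as a proportion
excess_deaths = 32_000        # figure quoted in the post

# Hospitalizations that would have to be affected for the two quoted
# numbers to be consistent (an inference, not a number from the paper).
implied_hospitalizations = excess_deaths / risk_difference


def deaths_from_gap(hospitalizations: float, gap: float) -> float:
    """Expected additional deaths for a given volume and mortality gap."""
    return hospitalizations * gap


print(f"Implied hospitalizations: {implied_hospitalizations:,.0f}")
print(f"Check: {deaths_from_gap(implied_hospitalizations, risk_difference):,.0f} deaths")
```

The takeaway is that a gap that sounds small per patient becomes large once it is applied to several million hospitalizations a year.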

Ours is the first big national study to look at outcome differences between male and female physicians. I’m sure there will be more. This is one study – and the arc of science is such that no study gets it 100% right. New data will emerge that will refine our estimates and of course, it’s possible that better data may even prove our study wrong. Smarter people than me – or even my very smart co-authors – will find flaws in our study and use empirical data to help us elucidate these issues further, and that will be good. That’s how science progresses. Through facts, data, and specific critiques. “Correlation is not causation” might be epidemiology 101, but if we get stuck on epidemiology 101, we’d be unsure whether smoking causes lung cancer. We can do better. We should look at the totality of the evidence. We should think about plausibility. And if we choose to reject clear results, such as women internists have better outcomes, we should have concrete, testable, alternative hypotheses. That’s what we learn in epidemiology 102.

About a year ago, Yusuke Tsugawa – then a doctoral student in the Harvard health policy PhD program – and I were discussing the evidence around the quality of care delivered by female and male doctors. The data suggested that women practice medicine a little differently than men do. It appeared that practice patterns of female physicians were a little more evidence-based, sticking more closely to clinical guidelines. There was also some evidence that patients reported better experience when their physician was a woman. This is certainly important, but the evidence here was limited to a few specific settings or subgroups of patients. And we had no idea whether these differences translated into what patients care the most about: better outcomes. We decided to tackle this question – do female physicians achieve different outcomes than male physicians? The result of that work is out today in JAMA Internal Medicine.

Our approach

First, we examined differences in patient outcomes for female and male physicians across all medical conditions. Then, we adjusted for patient and physician characteristics. Next, we threw in a hospital “fixed-effect” – a statistical technique that ensures that we only compare male and female physicians within the same hospital. Finally, we did a series of additional analyses to check if our results held across more specific conditions.
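To see why comparing physicians only within the same hospital matters, here is a toy sketch in the spirit of a hospital fixed effect. All numbers are hypothetical and chosen for illustration. The patient-count weighting is a simplifying assumption. This is not the regression model used in the paper.

```python
from collections import defaultdict

# Hypothetical toy data: (hospital, physician_sex, patients, deaths).
# Hospital A has lower mortality overall and mostly female physicians;
# hospital B has higher mortality overall and mostly male physicians.
rows = [
    ("A", "F", 800, 80),
    ("A", "M", 200, 21),
    ("B", "F", 200, 28),
    ("B", "M", 800, 116),
]


def rate(deaths: int, patients: int) -> float:
    return deaths / patients


# Naive comparison: pool all hospitals together.
pooled = defaultdict(lambda: [0, 0])  # sex -> [patients, deaths]
for _, sex, patients, deaths in rows:
    pooled[sex][0] += patients
    pooled[sex][1] += deaths
naive_gap = rate(pooled["M"][1], pooled["M"][0]) - rate(pooled["F"][1], pooled["F"][0])

# Within-hospital comparison: male minus female mortality inside each
# hospital, then average the gaps weighted by each hospital's patient count.
by_hospital = defaultdict(dict)
for hospital, sex, patients, deaths in rows:
    by_hospital[hospital][sex] = (patients, deaths)

weighted_sum = 0.0
total_patients = 0
for groups in by_hospital.values():
    gap = rate(groups["M"][1], groups["M"][0]) - rate(groups["F"][1], groups["F"][0])
    n = groups["M"][0] + groups["F"][0]
    weighted_sum += gap * n
    total_patients += n
within_gap = weighted_sum / total_patients

print(f"Naive gap: {naive_gap * 100:.1f} percentage points")
print(f"Within-hospital gap: {within_gap * 100:.1f} percentage points")
```

In this made-up example most of the naive gap comes from where physicians work, not from how they practice. The within-hospital comparison strips that part out, which is the job the fixed effect does in the real analysis.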

We found that female physicians had lower 30-day mortality rates compared to male physicians. Holding patient, physician, and hospital characteristics constant narrowed that gap a little, but not much. After throwing everything into the model that we could, we were still left with a difference of about 0.43 percentage points (see table), a modest but clinically important difference (more on this below).

Next, we focused on the 8 most common conditions (to ensure that our findings weren’t driven by differences in a few conditions only) and found that across all 8 conditions, female physicians had better outcomes. Finally, we looked at subgroups by risk. We wondered – is the advantage of having a female physician still true if we just focus on the sickest patients? The answer is yes – in fact, the biggest gap in outcomes was among the very sickest patients. The sicker you are, the bigger the benefit of having a female physician (see figure).

Additionally, we did a variety of other “sensitivity” analyses, of which the most important focused on hospitalists. The biggest threat to any study that examines differences between physicians is selection – patients can choose their doctor (or doctors can choose their patients) in ways that make the groups of patients non-comparable. However, when patients are hospitalized for an acute illness, increasingly, they receive care from a “hospitalist” – a doctor who spends all of their clinical time in the hospital caring for whoever is admitted during their shift. This allows for “pseudo-randomization.” And the results? Again, female hospitalists had lower mortality than male hospitalists.

What does this all mean?

The first question everyone will ask is whether the size of the effect matters. I am going to reiterate what I said above – the effect size is modest, but important. If we take a public health perspective, we see why it’s important: Given our results, if male physicians had the same outcomes as female physicians, we’d have 32,000 fewer deaths in the Medicare population. That’s about how many people die in motor vehicle accidents every year. Second, imagine a new treatment that lowered 30-day mortality by about half a percentage point for hospitalized patients. Would that treatment get FDA approval for effectiveness? Yup. Would it quickly become widely adopted in the hospital wards as an important treatment we should be giving our patients? Absolutely. So while the effect size is not huge, it’s certainly not trivial.

A few things are worth noting. First, we looked at medical conditions, so we can’t tell you whether the same effects would show up if you looked at surgeons. We are working on that now. Second, with any observational study, one has to be cautious about over-calling it. The problem is that we will never have a randomized trial so this may be about as well as we can do. Further, for those who worry about “confounding” – that we may be missing some key variable that explains the difference – I wonder what that might be? If there are key missing confounders, they would have to be big enough to explain our findings. We spent a lot of time on this – and couldn’t come up with anything that would be big enough to explain what we found.

How to make sense of it all – and next steps

Our findings suggest that there’s something about the way female physicians are practicing that is different from the way male physicians are practicing – and different in ways that impact whether a patient survives his or her hospitalization. We need to figure out what that is. Is it that female physicians are more evidence-based, as a few studies suggest? Or is it that there are differences in how female and male providers communicate with patients and other providers that allow female physicians to be more effective? We don’t know, but we need to find out and learn from it.

Another important point must be addressed. There is pretty strong evidence of a substantial gender pay gap and a gender promotion gap within medicine. Several recent studies have found that women physicians are paid less than male physicians – about 10% less after accounting for all potential confounders – and are less likely to be promoted within academic medical centers. Throw in our study about better outcomes, and those differences in salary and promotion become particularly unconscionable.

The bottom line is this: When it comes to medical conditions, women physicians seem to be outperforming male physicians. The difference is small but important. If we want this study to be more than just a source of cocktail conversation, we need to learn more about why these differences exist so all patients have better outcomes, irrespective of the gender of their physician.

The link in the tweet is to a press release. The link in the press release citing more details is to another press release. There’s little in the way of analysis or data about how ACOs did in 2015. So I decided to do a quick examination of how ACOs are doing and share the results below.

Basic background on ACOs:

Simply put, an ACO is a group of providers that is responsible for the costs of caring for a population while hitting some basic quality metrics. This model is meant to save money by better coordinating care. As I’ve written before, I’m a pretty big fan of the idea – I think it sets up the right incentives and if an organization does a good job, they should be able to save money for Medicare and get some of those savings back themselves.

ACOs come in two main flavors: Pioneers and Medicare Shared Savings Program (MSSP). Pioneers were a small group of relatively large organizations that embarked on the ACO pathway early (as the name implies). The Pioneer program started with 32 organizations and only 12 remained in 2015. It remains a relatively small part of the ACO effort and for the purposes of this discussion, I won’t focus on it further. The other flavor is MSSP. As of 2016, the program has more than 400 organizations participating and as opposed to Pioneers, has been growing by leaps and bounds. It’s the dominant ACO program – and it too comes in many sub-flavors, some of which I will touch on briefly below.

A couple more quick facts: MSSP essentially started in 2012 so for those ACOs that have been there from the beginning, we now have 4 years of results. Each year, the program has added more organizations (while losing a small number). In 2015, for instance, they added an additional 89 organizations.

So last week, when CMS announced having saved more than $1B from MSSPs, it appeared to be a big deal. After struggling to find the underlying data, Aneesh Chopra (former Chief Technology Officer for the US government) tweeted the link to me:

You can download the Excel file and analyze the data on your own. I did some very simple stuff. It’s largely consistent with the CMS press release, but as you might imagine, the press release cherry-picked the findings – not a big surprise given that it’s CMS’s goal to paint the best possible picture of how ACOs are doing.

While there are dozens of interesting questions about the latest ACO results, here are 5 quick questions that I thought were worth answering:

1. How many organizations saved money and how many organizations spent more than expected?
2. How much money did the winners (those that saved money) actually save and how much money did the losers (those that lost money) actually lose?
3. How much of the difference between winners and losers was due to differences in actual spending versus differences in benchmarks (the targets that CMS has set for the organization)?
4. Given that we have to give out bonus payments to those that saved money, how did CMS (and by extension, American taxpayers) do? All in, did we come out ahead by having the ACO program in 2015 – and if yes, by how much?
5. Are ACOs that have been in the program longer doing better? This is particularly important if you believe (as Andy Slavitt has tweeted) that it takes a while to make the changes necessary to lower spending.

There are a ton of other interesting questions about ACOs that I will explore in a future blog, including looking at issues around quality of care. Right now, as a quick look, I just focused on those 5 questions.

I ran some pretty basic frequencies. Here are data for the 392 ACOs for which CMS reported results:

Question 1: How many ACOs came in under (or over) target

Question 2: How much did the winners save – and how much did the losers lose?

Table 1.

| | Number (%) | Number of Beneficiaries | Total Savings (Losses) |
|---|---|---|---|
| Winners | 203 (51.8%) | 3,572,193 | $1,568,222,249 |
| Losers | 189 (48.2%) | 3,698,040 | -$1,138,967,553 |
| Total | 392 (100%) | 7,270,233 | $429,254,696 |

I define winners as those organizations that spent less than their benchmark. Losers were organizations that spent more than their benchmarks.

Takeaway – about half the organizations lost money and about half the organizations saved money. If you are a pessimist, you’d say this is what we’d expect; by random chance alone, if the ACOs did nothing, you’d expect half to come in under their benchmark and half to come in over. However, if you are an optimist, you might argue that 51.8% is more than 48.2%, so the tilt is towards more organizations saving money, and that the winners saved more money than the losers lost.
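For readers who want to check the arithmetic behind Table 1, here is a minimal sketch. The figures are copied from the table; the variable names are my own and are not taken from the CMS file.

```python
# Figures copied from Table 1. Names are illustrative, not CMS field names.
winners = {"n": 203, "beneficiaries": 3_572_193, "savings": 1_568_222_249}
losers = {"n": 189, "beneficiaries": 3_698_040, "savings": -1_138_967_553}

total_n = winners["n"] + losers["n"]
total_beneficiaries = winners["beneficiaries"] + losers["beneficiaries"]

# Share of ACOs that came in under (winners) or over (losers) their benchmark.
winner_share = round(100 * winners["n"] / total_n, 1)
loser_share = round(100 * losers["n"] / total_n, 1)

# Net savings before any shared-savings payments back to ACOs.
net_savings = winners["savings"] + losers["savings"]

print(f"{total_n} ACOs, {total_beneficiaries:,} beneficiaries")
print(f"Winners: {winner_share}%  Losers: {loser_share}%")
print(f"Net savings vs. benchmark: ${net_savings:,}")
```

The net figure is just the winners' savings minus the losers' overspending, which is why it is so much smaller than the savings the winners generated on their own.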

Next, we go to benchmarks (or targets) versus actual performance. As a reminder, benchmarks were set based on historical spending patterns – though CMS plans to include regional spending in the formula going forward.

Question 3: Did the winners spend less than the losers – or did they just have higher benchmarks to compare themselves against?

Table 2.

| | Per Capita Benchmark | Per Capita Actual Spending | Per Capita Savings (Losses) |
|---|---|---|---|
| Winners (n=203) | $10,580 | $10,140 | $439 |
| Losers (n=189) | $9,601 | $9,909 | -$308 |
| Total (n=392) | $10,082 | $10,023 | $59 |

A few thoughts on Table 2. First, the winners actually spent more money, per capita, than the losers. They also had much higher benchmarks – maybe because they had sicker patients – or maybe because they’ve historically been high spenders. Either way, it appears that the benchmark matters a lot when it comes to saving money or losing money.
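To make the benchmark point concrete, here is a rough sketch using the rounded per capita figures from Table 2. The names are mine, and because the inputs are already rounded, the winners' result comes out a dollar off the table ($440 rather than $439).

```python
# Rounded per capita figures copied from Table 2. Names are illustrative.
table2 = {
    "winners": {"benchmark": 10_580, "actual": 10_140},
    "losers": {"benchmark": 9_601, "actual": 9_909},
}

def per_capita_savings(row):
    """Savings are the benchmark minus actual spending (negative means a loss)."""
    return row["benchmark"] - row["actual"]

winner_savings = per_capita_savings(table2["winners"])
loser_savings = per_capita_savings(table2["losers"])

# How much more the winners spent, and how much higher their targets were.
spending_gap = table2["winners"]["actual"] - table2["losers"]["actual"]
benchmark_gap = table2["winners"]["benchmark"] - table2["losers"]["benchmark"]

print(f"Winners saved ${winner_savings} per capita; losers saved ${loser_savings}")
print(f"Winners spent ${spending_gap} more per capita than losers")
print(f"Winners' benchmarks were ${benchmark_gap} higher per capita")
```

The winners spent more per person than the losers, but their targets were higher by a much larger margin, and that difference in benchmarks is what puts them on the winning side.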

Next, we tackle the question from the perspective of the U.S. taxpayer. Did CMS come out ahead or behind? Well – that should be an easy question – the program seemed to net savings. However, remember that CMS had to share some of those savings back with the provider organizations. And because almost every organization is in a 1-sided risk sharing program (i.e. they don’t share losses, just the gains), CMS pays out when organizations save money – but doesn’t get money back when organizations lose money. So to be fair, from the taxpayer perspective, we have to look at the cost of the program including the checks CMS wrote to ACOs to figure out what happened. Here’s that table:

Table 3 (these numbers are rounded).

| | Total Benchmarks | Total Actual Spending | Savings to CMS | Paid out in Shared Savings to ACOs | Net impact to CMS |
|---|---|---|---|---|---|
| Total (n=392) | $73,298 m | $72,868 m | $429 m | $645 m | -$216 m |

According to this calculation, CMS actually lost $216 million in 2015. This, of course, doesn’t take into account the cost of running the program. Because most of the MSSP participants are in a one-sided track, CMS has to pay back some of the savings – but the ACOs never share in the losses CMS suffers when they overspend. This is a bad deal for CMS – and as long as programs stay one-sided, barring dramatic improvements in how much ACOs save, CMS will continue to lose money.
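The calculation behind Table 3 is simple enough to sketch in a few lines. The figures are the rounded ones from the table, in millions of dollars, and the variable names are mine.

```python
# Rounded figures from Table 3, in millions of dollars. Names are illustrative.
total_benchmark_m = 73_298
savings_to_cms_m = 429          # benchmark minus actual spending, as reported
shared_savings_paid_m = 645     # bonus payments CMS made to ACOs

# In a one-sided track CMS pays bonuses to ACOs that save money
# but collects nothing from ACOs that overspend.
net_impact_m = savings_to_cms_m - shared_savings_paid_m

# Savings as a share of the total benchmark, before bonus payments.
pct_below_benchmark = round(100 * savings_to_cms_m / total_benchmark_m, 1)

print(f"Net impact to CMS: {net_impact_m} million dollars")
print(f"Spending came in {pct_below_benchmark}% below benchmark")
```

Note that the bonus payments are larger than the net savings because they are calculated on what the winners saved, while the net savings figure has already been reduced by what the losers overspent.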

Finally, we look at whether savings have varied by year of enrollment.

Question 5: Are ACOs that have been in the program longer doing better?

Table 4.

| Enrollment Year | Per Capita Benchmark | Per Capita Actual Spending | Per Capita Savings | Net Per Capita Savings (Including bonus payments) |
|---|---|---|---|---|
| 2012 | $10,394 | $10,197 | $197 | $46 |
| 2013 | $10,034 | $10,009 | $25 | -$60 |
| 2014 | $10,057 | $10,086 | -$29 | -$83 |
| 2015 | $9,772 | $9,752 | $19 | -$33 |

These results are straightforward – almost all the savings are coming from the 2012 cohort. A few things worth pointing out. First, the actual spending of the 2012 cohort is also the highest – they just had the highest benchmarks. The 2013-2015 cohorts look about the same. So if you are pessimistic about ACOs – you’d say that the 2012 cohort was a self-selected group of high-spending providers who got in early and because of their high benchmarks, are enjoying the savings. Their results are not generalizable. However, if you are optimistic about ACOs, you’d see these results differently – you might argue that it takes about 3 to 4 years to really retool healthcare services – which is why only the 2012 ACOs have done well. Give the later cohorts more time and we will see real gains.
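One way to read Table 4 is to back out what CMS paid each cohort in bonuses per beneficiary. The sketch below assumes that the last column equals per capita savings minus per capita bonus payments, which is what the column header suggests. The names are mine.

```python
# Rows copied from Table 4:
# (enrollment year, per capita savings, net per capita savings after bonuses).
cohorts = [
    (2012, 197, 46),
    (2013, 25, -60),
    (2014, -29, -83),
    (2015, 19, -33),
]

# Assumption: net savings = savings - bonus payments, so bonus = savings - net.
implied_bonus = {year: savings - net for year, savings, net in cohorts}

# Cohorts where CMS still came out ahead after paying bonuses.
net_positive_years = [year for year, _, net in cohorts if net > 0]

for year, bonus in implied_bonus.items():
    print(f"{year} cohort: implied bonus of ${bonus} per capita")
print(f"Cohorts with positive net savings: {net_positive_years}")
```

Under that assumption, every cohort received bonus payments, including the 2014 cohort that overspent on average, but only the 2012 cohort saved enough to cover them.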

Final Thoughts:

This is decidedly mixed news for the ACO program. I’ve been hopeful that ACOs had the right set of incentives and enough flexibility to really begin to move the needle on costs. We are now four years into the program and the results have not been a home run. For those of us who are fans of ACOs, there are three things that should sustain our hope. First, overall, the ACOs seem to be coming in under target, albeit just slightly (about 0.6% below target in 2015) and generating savings (as long as you don’t count what CMS pays back to ACOs). Second, the longer standing ACOs are doing better and maybe that portends good things for the future – or maybe it’s just a self-selected group with experience that isn’t generalizable. And finally, and this is the most important issue of all, we have to continue to move towards getting all these organizations into a two-sided model where CMS can recoup some of the losses. Right now, we have a classic “heads – ACO wins, tails – CMS loses” situation and it simply isn’t financially sustainable. Senior policymakers need to continue to push ACOs into a two-sided model, where they can share in savings but also have to pay back losses. Barring that, there is little reason to think that ACOs will bend the cost curve in a meaningful way.

Because hospitals are expensive and often cause harm, there has been a big focus on reducing hospital use. This focus has been the underpinning for numerous policy interventions, most notable of which is the Affordable Care Act’s Hospital Readmissions Reduction Program (HRRP), which penalizes hospitals for higher than expected readmission rates. The motivation behind HRRP is simple: the readmission rate, the proportion of discharged patients who return to the hospital within 30 days, had been more or less flat for years and reducing this rate would save money and potentially improve care. So it was big news when, as the HRRP penalties kicked in, government officials started reporting that the national readmission rate for Medicare patients was declining.

Rising Use of Observation Status

But during this time, another phenomenon was coming into focus: increasing use of observation status. When a patient needs hospital services, there are two options: that patient can be admitted for inpatient care or can be “admitted to observation”. When patients are “admitted to observation” they essentially still get inpatient care, but technically, they are outpatients. For a variety of reasons, we’ve seen a decline in patients admitted to “inpatient” status and a rise in those going to observation status. These two phenomena – a drop in readmissions and an increase in observation – seemed related.

I – and others – spoke publicly about our concerns that the drop in readmissions was being driven by increasing observation admissions. An analysis by David Himmelstein and Steffie Woolhandler in the Health Affairs blog suggested that most of the drop in readmissions could be accounted for both by increases in observation status and by increases in returns to the emergency department that did not lead to readmission. Two months later, a piece by Claire Noel-Miller and Keith Lund, also in the Health Affairs blog, found that the hospitals with the biggest drop in readmissions appeared to have big increases in their use of observation status. It seemed like much of the drop in readmissions was about reclassifying people as “observation” and administratively lowering readmissions without changing care.

New Data

Now comes a terrific, high quality study in the New England Journal of Medicine that takes this topic head on. The authors examine directly whether the hospitals that lowered their readmission rates were the same ones that increased their use of observation status – and find no correlation. None. If you’re ever looking for a scatter plot of two variables that are completely uncorrelated, look no further than Figure 3 of the paper. The best reading of the evidence prior to the study did not turn out to be the truth. It reminds me of the period when we were all convinced, based on excellent observational data, that hormone replacement therapy was lifesaving for women with cardiovascular disease. And that became the standard of care – until someone conducted a randomized trial, and found that HRT provided little benefit to these patients. That’s why we do research – it moves our knowledge forward.

Where are we now?

So where does this leave us? Is the ACA’s readmissions policy a home run? Here’s what we know. First, the HRRP has most likely (we have no controls) led to fewer patients being readmitted to the hospital. Second, the HRRP does not seem responsible for the increase in observation stays.

Here’s what we don’t know: is a drop in readmissions a good thing for patients? It may seem obvious that it is but if you think about it, you realize that readmission rate is a utilization measure, not a patient outcome. It’s a measure of how often patients use inpatient services within 30 days of discharge. Utilization measures, unto themselves, don’t tell you whether care is good or bad. So the real question is — has the HRRP improved the underlying quality of care? It might be that we have improved on care coordination, communications between hospitals and primary care providers, and ensuring good follow-up. That likely happened in some places. Alternatively, it might be that we have just made it much harder for that older, frail woman with heart failure sitting in the emergency room to get admitted if she was discharged in the last 30 days. That too has likely happened in some places. But how much of it is the former versus the latter? Until we can answer that question, we won’t know whether care is better or not.

Beyond understanding why readmissions have fallen, we also don’t know how HRRP has affected the other things that hospitals ought to focus on, such as mortality and infection rates. If your parent was admitted to the hospital with pneumonia, what would be your top priority? Most people would say that they would like their parent not to die. The second might be to avoid serious complications like a healthcare associated infection or a fall that leads to a hip fracture. Another might be to be treated with dignity and respect. Yes, avoiding being readmitted would be nice – but for me at least, it pales in comparison to avoiding death and disability. We know little about the potential spillover effects of the readmission penalties on the things that matter the most.

So here we are – a good news study that says readmissions are down because fewer people are being readmitted to the hospital, not because people are being admitted to observation status. That’s important. But the real challenge is in figuring out whether patients are better off. Are they more likely to be alive after hospitalization? Do they have fewer functional limitations? Less pain and suffering? Until we answer those questions, it’ll be hard to know whether this policy is making the kind of difference we want. And that’s the point of science – using data to answer those questions. Because we all can have our opinions – but ultimately, it’s the data that counts.

For more than a decade, those in health care have known that serious safety problems were present in our health systems. Many of the country’s most prominent health care leaders have done their best to make safety improvement a priority. Patients and families have shared their stories, pleading for more attention to the errors that have become so commonplace. Published research has shown dramatic improvements can occur when the culture of a hospital changes in a way that prioritizes safety and reduction of harm. Many physicians and hospitals have stepped up to these changes, demonstrating that striking improvement can occur. But much of our health care system has not made these improvements, and in many cases has not been transparent with consumers about these risks.

In early July Consumer Reports published its first Hospital Safety Ratings of more than 1,159 hospitals in 44 states across the U.S. Hospital errors and mistakes contribute to the deaths of 180,000 hospital patients a year, according to 2010 figures from the Department of Health and Human Services. And another 1.4 million patients on Medicare are seriously hurt by their hospital care.

But hospital safety information is difficult to come by; the U.S. government doesn’t track it the same way it does, say, automobile crashes, and it becomes especially difficult for consumers to know and interpret how well hospitals in their community are doing. Our safety ratings covered just 18 percent of all U.S. hospitals, but included some of the largest and best-known hospital systems. The ratings were focused on six safety-related categories: infections, readmissions, communication, CT scanning, complications, and mortality. We did not rate hospitals based on how much a consumer liked being there or not, or other hospital experiences they had, or even the benefit a consumer experienced by being at a certain hospital. We focused entirely on safety.

Our safety composite score was developed after several years of working with publicly-reported hospital quality data. Besides using data that was reported to federal or state governments, we selected measures where there was enough information to include as many hospitals as possible, areas that had the greatest effect on patient safety, measures that were focused on outcomes, instead of processes, and that were valid and reliable, as assessed by our internal statisticians and our external experts. All of the information we published was the most currently available and most has been reported for sufficient time for hospitals to implement improvement efforts. Finally, we looked at areas where consumers could take some sort of action to protect themselves. We realize that our safety composite measures some aspects of safety but not all aspects. We are working on developing more safety measures to add to the composite. We envision this work as a long term journey not a set destination.

Why did we focus entirely on safety? Because the amount of accidental harm inflicted upon patients in a hospital setting is nearly epidemic, and in many cases, almost entirely avoidable. We believe that safety should be a top priority for hospitals, and to underscore that point, we set a high bar to determine the cut-offs that defined our Ratings. We included risk adjustments in several areas. But we do not believe making more extensive adjustments in the data is appropriate, especially when an error can always be prevented. For example, in the case of hospital infections, we believe that zero errors is a reasonable goal. And we believe that is the case for other events too, such as pressure sores.

We acknowledge that this challenged hospitals and researchers. Yet given the slow rate of system-wide improvement in safety and errors, we think it is appropriate to stimulate new approaches and more critical thinking.

We think that good science is done by independent teams who are transparent in their methodology and use data accessible to all. We have had multiple interactions, conversations, and presentations involving hospitals, and we know that many hospitals would disagree with our decisions and the subsequent ratings. While we consulted multiple experts and researchers, and reviewed multiple studies, we developed our safety composite independent of any other rating effort, research team or existing strategy. We made all the final decisions.

We think the best science includes input from those who are most likely to benefit and those most at risk. We are fortunate to have input from consumers who understand the benefits and risk of health care. We have urged consumers to use all resources available, including other publications, to assess the benefits and risks of a health care intervention. Our priority was always—and remains—a focus on what can best improve the health of, and reduce the harm to, consumers.

Here are some stories from my summer reading list. In one story, a woman is stopped by security at the airport when the metal detector goes off. Security guards can find no cause for this and eventually let her board the flight. But this makes her wonder, so she calls her doctor. Later an X-ray shows a metal retractor in her abdomen, a surgical instrument the size of a crowbar somehow left in her body after recent surgery.

Here’s another story in the same book. We’ve all laughed about doctor’s handwriting, and wondered how pharmacists learn to read those scribbles. In this story, a doctor in a hospital handwrote a prescription. The prescription is misread because of the handwriting. As a direct result, the patient dies.

You won’t find this book at the beach, though it’s every bit as harrowing as anything Dean Koontz or Stephen King could dream up. I found these stories in a premier textbook on patient safety: Understanding Patient Safety, by Dr. Robert Wachter. Throw away your memories of dry, highlighter-burned textbooks from your school days; this one is a shocking page turner.

Chilling as the patient stories are, here’s what really raises the goosebumps on your arm: these stories are all true, and worse, they aren’t particularly unusual. There is one medication error per day per hospitalized patient, and more in the ICU. One in four Medicare patients admitted to a hospital suffers some form of unintended harm during their stay. An estimated 6,000 never events, like that retained retractor, happen every month to Medicare beneficiaries in the U.S.

Despite this, Dr. Wachter’s textbook is oddly reassuring, because the book bursts with 450+ pages of solutions: great ideas, excellent case studies, well researched protocols, and interesting evidence about what hospitals can successfully do to save lives. We have a wealth of evidence and a plethora of dedicated, motivated people in health care who have already demonstrated they can avoid many of these terrible errors.

But if the problems are clear and the solutions abundant, then why are we still losing so many lives? The employers and other purchaser members of Leapfrog concluded that what’s missing is market pressure. Given the many competing priorities hospitals face in this time of turbulent change, consumers and purchasers need to make clear that safety is the priority. That means consumers need to insist on the importance of safety when they talk with their doctors and nurses, and when possible they should vote with their feet to protect themselves and their families from harm in an unsafe hospital. Purchasers need to structure their contracting and benefits to favor and reward safety.

All of this rests on one critical resource: transparency. Consumers and purchasers need to have information about safety in a format that they can use. That is why Leapfrog, a national nonprofit with a membership of employers and other purchasers, launched the Hospital Safety Score.

The Leapfrog Board modeled the Score on the restaurant safety inspection policies recently enacted in Los Angeles and New York City. In those cities, health department inspectors give restaurants a letter grade rating their safety, and restaurants are required to post the grade on their front entryway. Within one year of implementation, a poll found that two-thirds of New Yorkers consulted the letter grade before choosing a restaurant. Distilling complex data into one comprehensible letter grade clearly helped the dining public, so Leapfrog hopes it might work for the hospitalized public. On June 6, 2012, we issued letter grades rating the safety of over 2600 general hospitals across the country. (www.hospitalsafetyscore.org).

To calculate the score using the best evidence, Leapfrog sought advice from a blue ribbon panel of experts that included nine of the nation’s top researchers in patient safety. Dr. Wachter served on the panel, along with three leading researchers from Harvard and others from Johns Hopkins, Vanderbilt, Michigan, Stanford, and elsewhere. These experts advised Leapfrog on which publicly available measures of safety to consider using, and offered guidance in calculating scores using those measures. Leapfrog considered this advice in calculating the grades for hospitals.

Leapfrog’s Hospital Safety Score focuses exclusively on errors, accidents, injuries, and infections in hospitals: the unintended, sometimes deadly events that no patient ever seeks out from a hospital stay. There are many other issues affecting the performance of a hospital that Leapfrog did not consider, such as mortality rates for certain procedures or patient experience reports. Other ratings in other places offer perspectives on those issues for consumers. The Hospital Safety Score rates hospitals on whether they have the procedures and protocols in place to prevent harm and death, as well as the rate of actual harm and death to patients from accidents and errors.

The good news is that hundreds of hospitals are demonstrating excellent performance in safety, and earned an A. But not all hospitals perform as well, and consumers and purchasers deserve to know which is which.

The Hospital Safety Score is one tool among many consumers should use to choose a hospital. We link to other resources on our website, including the Consumer Reports safety ratings and others. Just as consumers can consult several different reviews before making a major purchase like a car, so should consumers consider different views about different aspects of hospital performance. Personally, I will welcome the opportunity to consult a variety of reviews if and when my family faces the critical decision about admission to a hospital. But for me, safety will always come first.

I have met many people who suffered egregious, unnecessary harm in American hospitals, and without exception they tell their story publicly for one reason: to make sure what happened to them doesn’t happen to others. Beneath the Hospital Safety Score and the Consumer Reports rating are the stories of hundreds of thousands of such victims. We calculated our scores because their experience counts.