The blog of Ashish Jha — physician, health policy researcher, and advocate for the notion that an ounce of data is worth a thousand pounds of opinion.

Monthly Archives: December 2016

Our recent paper on differences in outcomes for Medicare patients cared for by male and female physicians has created a stir. While the paper has gotten broad coverage and mostly positive responses, there have also been quite a few critiques. There is no doubt that the study raises questions that need to be aired and discussed openly and honestly. Its limitations, which are highlighted in the paper itself, are important. Given the temptation we all feel to overgeneralize, we do best when we stick with the data. It’s worth highlighting a few of the more common critiques that have been lobbed at the study to see whether they make sense and how we might move forward. Hopefully by addressing these more surface-level critiques we can shift our focus to the important questions raised by this paper.

Correlation is not causation

We all know that correlation is not causation. It’s epidemiology 101. People who carry matches are more likely to get lung cancer. Going to bed with your shoes on is associated with a higher likelihood of waking up with a headache. No, matches don’t cause lung cancer any more than sleeping with your shoes on causes headaches. Correlation, not causation. Seems straightforward and it has been a consistent critique of this paper. The argument is that because we had an observational study – that is, not an experiment where we proactively, randomly assigned millions of Americans to male versus female doctors – all we have is an association study. To have a causal study, we’d need a randomized, controlled trial. In an ideal world, this would be great, but unfortunately in the real world, this is impractical…and even unnecessary. We often make causal inferences based on observational data – and here’s the kicker: sometimes, we should. Think smoking and lung cancer. Remember the RCT that assigned people to smoking (versus not) to see if it really caused lung cancer? Me neither…because it never happened. So, if you are a strict “correlation is not causation” person who thinks observational data only create hypotheses that need to be tested using RCTs, you should only feel comfortable stating that smoking is associated with lung cancer but it’s only a hypothesis for which we await an RCT. That’s silly. Smoking causes lung cancer.

Why correlation can be causation

How can we be so certain that smoking causes lung cancer based on observational data alone? Because there are several good frameworks that help us evaluate whether a correlation is likely to be causal. They include presence of a dose-response relationship, plausible mechanism, corroborating evidence, and absence of alternative explanations, among others. Let’s evaluate these in light of the gender paper. Dose-response relationship? That’s a tough one – we examine self-identified gender as a binary variable…the survey did not ask physicians how manly the men were. So that doesn’t help us either way. Plausible mechanism and corroborating evidence? Actually, there is some here – there are now over a dozen studies that have examined how men and women physicians practice, with reasonable evidence that they practice a little differently. Women tend to be somewhat more evidence-based and communicate more effectively. Given this evidence, it seems pretty reasonable to predict that women physicians may have better outcomes.

The final issue – alternative explanations – has been brought up by nearly every critic. There must be an alternative explanation! There must be confounding! But the critics have mostly failed to come up with what a plausible confounder could be. Remember, a variable, in order to be a confounder, must be correlated both with the predictor (gender) and outcome (mortality). We spent over a year working on this paper, trying to think of confounders that might explain our findings. Every time we came up with something, we tried to account for it in our models. No, our models aren’t perfect. Of course, there could still be confounders that we missed. We are imperfect researchers. But that confounder would have to be big enough to explain about half a percentage point mortality difference, and that’s not trivial. So I ask the critics to help us identify this missing confounder that explains better outcomes for women physicians.
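To make the two-link requirement concrete, here is a minimal sketch in Python. It assumes a single hypothetical binary confounder (say, a "high-risk" patient flag) and a world where the physician has no effect at all. Every number in it is invented for illustration except the 0.43 percentage point gap; this is not the model used in the paper.

```python
def crude_mortality(share_high_risk, risk_low, risk_high):
    """Overall mortality for a group, given its mix of low- and high-risk patients."""
    return (1 - share_high_risk) * risk_low + share_high_risk * risk_high

def crude_gap(share_a, share_b, risk_low, risk_high):
    """Mortality of group A minus group B when the physician has no effect at all."""
    return (crude_mortality(share_a, risk_low, risk_high)
            - crude_mortality(share_b, risk_low, risk_high))

def required_risk_difference(gap, imbalance):
    """Risk difference a binary confounder needs to produce `gap` for a given imbalance."""
    return gap / imbalance

# Hypothetical confounder linked to both predictor and outcome: a spurious gap appears.
both = crude_gap(0.3, 0.7, 0.05, 0.20)
# Linked to the predictor only (same risk in both strata): no gap.
predictor_only = crude_gap(0.3, 0.7, 0.10, 0.10)
# Linked to the outcome only (same patient mix in both groups): no gap.
outcome_only = crude_gap(0.5, 0.5, 0.05, 0.20)

for label, gap in [("both links", both),
                   ("predictor only", predictor_only),
                   ("outcome only", outcome_only)]:
    print(f"{label}: {gap * 100:+.2f} percentage points")

# How strong would a missed confounder have to be to explain 0.43 points?
# Assumed here: a 10-point imbalance in the high-risk share between groups.
needed = required_risk_difference(0.0043, 0.10)
print(f"Needed risk difference: {needed * 100:.1f} percentage points")
```

In this simplified setup the spurious gap is just the imbalance between groups multiplied by the risk difference between strata, so a confounder that is weakly tied to either gender or mortality cannot produce much of a gap.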

Statistical versus clinical significance

One more issue warrants a comment. Several critics have brought up the point that statistical significance and clinical significance are not the same thing. This too is epidemiology 101. Something can be statistically significant but clinically irrelevant. Is a 0.43 percentage point difference in mortality rate clinically important? This is not a scientific or a statistical question. This is a clinical question. A policy and public health question. And people can reasonably disagree. From a public health point of view, a 0.43 percentage point difference in mortality for Medicare beneficiaries admitted for medical conditions translates into potentially 32,000 additional deaths. You might decide that this is not clinically important. I think it is. It’s a judgment call and we can disagree.
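For readers who want to check the arithmetic, here is a rough sketch. Only the 0.43 percentage point gap and the 32,000 figure come from this post; the number of hospitalizations is back-calculated from those two and is my inference here, not a number quoted from the paper.

```python
def excess_deaths(hospitalizations, gap_pct_points):
    """Deaths implied by a mortality gap given in percentage points."""
    return hospitalizations * gap_pct_points / 100.0

def implied_hospitalizations(deaths, gap_pct_points):
    """Invert the relationship: how many hospitalizations make the two figures fit."""
    return deaths / (gap_pct_points / 100.0)

GAP_PCT_POINTS = 0.43  # from the post
DEATHS = 32_000        # from the post

# Derived value, an inference rather than a figure reported in the paper.
n = implied_hospitalizations(DEATHS, GAP_PCT_POINTS)
print(f"Implied hospitalizations: {n:,.0f}")
print(f"Deaths at that volume: {excess_deaths(n, GAP_PCT_POINTS):,.0f}")
```

The two figures fit together if the gap applies to something on the order of 7.4 million hospitalizations, which is why a difference that looks tiny per patient adds up to a large absolute number.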

Ours is the first big national study to look at outcome differences between male and female physicians. I’m sure there will be more. This is one study – and the arc of science is such that no study gets it 100% right. New data will emerge that will refine our estimates and of course, it’s possible that better data may even prove our study wrong. Smarter people than me – or even my very smart co-authors – will find flaws in our study and use empirical data to help us elucidate these issues further, and that will be good. That’s how science progresses. Through facts, data, and specific critiques. “Correlation is not causation” might be epidemiology 101, but if we get stuck on epidemiology 101, we’d be unsure whether smoking causes lung cancer. We can do better. We should look at the totality of the evidence. We should think about plausibility. And if we choose to reject clear results, such as the finding that women internists have better outcomes, we should have concrete, testable, alternative hypotheses. That’s what we learn in epidemiology 102.

About a year ago, Yusuke Tsugawa – then a doctoral student in the Harvard health policy PhD program – and I were discussing the evidence around the quality of care delivered by female and male doctors. The data suggested that women practice medicine a little differently than men do. It appeared that practice patterns of female physicians were a little more evidence-based, sticking more closely to clinical guidelines. There was also some evidence that patients reported better experience when their physician was a woman. This is certainly important, but the evidence here was limited to a few specific settings or to subgroups of patients. And we had no idea whether these differences translated into what patients care the most about: better outcomes. We decided to tackle this question – do female physicians achieve different outcomes than male physicians? The result of that work is out today in JAMA Internal Medicine.

Our approach

First, we examined differences in patient outcomes for female and male physicians across all medical conditions. Then, we adjusted for patient and physician characteristics. Next, we threw in a hospital “fixed-effect” – a statistical technique that ensures that we only compare male and female physicians within the same hospital. Finally, we did a series of additional analyses to check if our results held across more specific conditions.
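For the curious, here is a small sketch of what a hospital fixed effect does, using toy counts invented purely for illustration. It fits a simple linear model by subtracting each hospital's average from both the outcome and the physician's sex before comparing them, which is one standard way to compute a fixed-effect estimate. The paper's actual specification is not described in this post, so treat this as the idea rather than the method.

```python
from collections import defaultdict

# Toy data, invented for illustration:
# (hospital, physician_is_female, patients, deaths)
TOY_COUNTS = [
    ("A", 1, 300, 30),
    ("A", 0, 100, 11),
    ("B", 1, 100, 14),
    ("B", 0, 300, 45),
]

def expand(counts):
    """Turn aggregated counts into one (hospital, female, died) row per patient."""
    rows = []
    for hospital, female, patients, deaths in counts:
        rows += [(hospital, female, 1)] * deaths
        rows += [(hospital, female, 0)] * (patients - deaths)
    return rows

def pooled_gap(rows):
    """Female-minus-male mortality, ignoring which hospital the patient was in."""
    f = [died for _, female, died in rows if female == 1]
    m = [died for _, female, died in rows if female == 0]
    return sum(f) / len(f) - sum(m) / len(m)

def within_hospital_gap(rows):
    """Fixed-effect estimate: demean by hospital, then regress outcome on sex."""
    by_hospital = defaultdict(list)
    for hospital, female, died in rows:
        by_hospital[hospital].append((female, died))
    num = den = 0.0
    for group in by_hospital.values():
        mean_x = sum(x for x, _ in group) / len(group)
        mean_y = sum(y for _, y in group) / len(group)
        for x, y in group:
            num += (x - mean_x) * (y - mean_y)
            den += (x - mean_x) ** 2
    return num / den

rows = expand(TOY_COUNTS)
print(f"Pooled gap: {pooled_gap(rows) * 100:.2f} percentage points")
print(f"Within-hospital gap: {within_hospital_gap(rows) * 100:.2f} percentage points")
```

In the toy data, female physicians are concentrated in the hospital with lower mortality overall, so the pooled comparison overstates the gap. Comparing only within hospitals removes that hospital-level difference and leaves the part of the gap that shows up between colleagues in the same building.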

We found that female physicians had lower 30-day mortality rates compared to male physicians. Holding patient, physician, and hospital characteristics constant narrowed that gap a little, but not much. After throwing everything into the model that we could, we were still left with a difference of about 0.43 percentage points (see table), a modest but clinically important difference (more on this below).

Next, we focused on the 8 most common conditions (to ensure that our findings weren’t driven by differences in a few conditions only) and found that across all 8 conditions, female physicians had better outcomes. Finally, we looked at subgroups by risk. We wondered – is the advantage of having a female physician still true if we just focus on the sickest patients? The answer is yes – in fact, the biggest gap in outcomes was among the very sickest patients. The sicker you are, the bigger the benefit of having a female physician (see figure).

Additionally, we did a variety of other “sensitivity” analyses, of which the most important focused on hospitalists. The biggest threat to any study that examines differences between physicians is selection – patients can choose their doctor (or doctors can choose their patients) in ways that make the groups of patients non-comparable. However, when patients are hospitalized for an acute illness, increasingly, they receive care from a “hospitalist” – a doctor who spends all of their clinical time in the hospital caring for whoever is admitted during their shift. This allows for “pseudo-randomization.” And the results? Again, female hospitalists had lower mortality than male hospitalists.

What does this all mean?

The first question everyone will ask is whether the size of the effect matters. I am going to reiterate what I said above – the effect size is modest, but important. If we take a public health perspective, we see why it’s important: Given our results, if male physicians had the same outcomes as female physicians, we’d have 32,000 fewer deaths in the Medicare population. That’s about how many people die in motor vehicle accidents every year. Second, imagine a new treatment that lowered 30-day mortality by about half a percentage point for hospitalized patients. Would that treatment get FDA approval for effectiveness? Yup. Would it quickly become widely adopted in the hospital wards as an important treatment we should be giving our patients? Absolutely. So while the effect size is not huge, it’s certainly not trivial.

A few things are worth noting. First, we looked at medical conditions, so we can’t tell you whether the same effects would show up if you looked at surgeons. We are working on that now. Second, with any observational study, one has to be cautious about over-calling it. The problem is that we will never have a randomized trial so this may be about as good as we can do. Further, for those who worry about “confounding” – that we may be missing some key variable that explains the difference – I wonder what that might be? If there is a key missing confounder, it would have to be big enough to explain our findings. We spent a lot of time on this – and couldn’t come up with anything that would be big enough to explain what we found.

How to make sense of it all – and next steps

Our findings suggest that there’s something about the way female physicians are practicing that is different from the way male physicians are practicing – and different in ways that impact whether a patient survives his or her hospitalization. We need to figure out what that is. Is it that female physicians are more evidence-based, as a few studies suggest? Or is it that there are differences in how female and male providers communicate with patients and other providers that allow female physicians to be more effective? We don’t know, but we need to find out and learn from it.

Another important point must be addressed. There is pretty strong evidence of a substantial gender pay gap and a gender promotion gap within medicine. Several recent studies have found that women physicians are paid less than male physicians – about 10% less after accounting for all potential confounders – and are less likely to be promoted within academic medical centers. Throw in our study about better outcomes, and those differences in salary and promotion become particularly unconscionable.

The bottom line is this: When it comes to medical conditions, women physicians seem to be outperforming male physicians. The difference is small but important. If we want this study to be more than just a source of cocktail conversation, we need to learn more about why these differences exist so all patients have better outcomes, irrespective of the gender of their physician.