The blog of Ashish Jha — physician, health policy researcher, and advocate for the notion that an ounce of data is worth a thousand pounds of opinion.

Correlation, Causation, and Gender Differences in Patient Outcomes

Our recent paper on differences in outcomes for Medicare patients cared for by male and female physicians has created a stir. While the paper has gotten broad coverage and mostly positive responses, there have also been quite a few critiques. There is no doubt that the study raises questions that need to be aired and discussed openly and honestly. Its limitations, which are highlighted in the paper itself, are important. Given the temptation we all feel to overgeneralize, we do best when we stick with the data. It’s worth highlighting a few of the more common critiques that have been lobbed at the study to see whether they make sense and how we might move forward. Hopefully, by addressing these more surface-level critiques, we can shift our focus to the important questions raised by this paper.

Correlation is not causation

We all know that correlation is not causation. It’s epidemiology 101. People who carry matches are more likely to get lung cancer. Going to bed with your shoes on is associated with a higher likelihood of waking up with a headache. No, matches don’t cause lung cancer any more than sleeping with your shoes on causes headaches. Correlation, not causation. Seems straightforward, and it has been a consistent critique of this paper. The argument is that because we had an observational study – that is, not an experiment where we proactively, randomly assigned millions of Americans to male versus female doctors – all we have is an association study. To have a causal study, we’d need a randomized, controlled trial. In an ideal world, this would be great, but unfortunately in the real world, this is impractical…and even unnecessary. We often make causal inferences based on observational data – and here’s the kicker: sometimes, we should. Think smoking and lung cancer. Remember the RCT that assigned people to smoking (versus not) to see if it really caused lung cancer? Me neither…because it never happened. So, if you are a strict “correlation is not causation” person who thinks observational data only create hypotheses that need to be tested using RCTs, you should only feel comfortable stating that smoking is associated with lung cancer but it’s only a hypothesis for which we await an RCT. That’s silly. Smoking causes lung cancer.

Why correlation can be causation

How can we be so certain that smoking causes lung cancer based on observational data alone? Because there are several good frameworks that help us evaluate whether a correlation is likely to be causal. They include presence of a dose-response relationship, plausible mechanism, corroborating evidence, and absence of alternative explanations, among others. Let’s evaluate these in light of the gender paper. Dose-response relationship? That’s a tough one – we examine self-identified gender as a binary variable…the survey did not ask physicians how manly the men were. So that doesn’t help us either way. Plausible mechanism and corroborating evidence? Actually, there is some here – there are now over a dozen studies that have examined how men and women physicians practice, with reasonable evidence that they practice a little differently. Women tend to be somewhat more evidence-based and communicate more effectively. Given this evidence, it seems pretty reasonable to predict that women physicians may have better outcomes.

The final issue – alternative explanations – has been brought up by nearly every critic. There must be an alternative explanation! There must be confounding! But the critics have mostly failed to come up with what a plausible confounder could be. Remember, a variable, in order to be a confounder, must be correlated with both the predictor (gender) and the outcome (mortality). We spent over a year working on this paper, trying to think of confounders that might explain our findings. Every time we came up with something, we tried to account for it in our models. No, our models aren’t perfect. Of course, there could still be confounders that we missed. We are imperfect researchers. But that confounder would have to be big enough to explain about half a percentage point of mortality difference, and that’s not trivial. So I ask the critics to help us identify this missing confounder that explains better outcomes for women physicians.
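To make the two-way requirement concrete, here is a minimal sketch in Python. Every number in it is invented for illustration and none comes from the paper: Z is a hypothetical binary patient trait ("high-risk"), the function names are made up, and physician gender is assumed to have no effect of its own. The sketch only shows when such a trait can manufacture a crude mortality gap.

```python
def crude_mortality(share_high_risk, risk_low, risk_high):
    """Overall death rate for a physician group, given its patient mix."""
    return (1 - share_high_risk) * risk_low + share_high_risk * risk_high


def crude_gap(share_high_male, share_high_female, risk_low, risk_high):
    """Male-minus-female crude death rate when gender itself has no effect."""
    male = crude_mortality(share_high_male, risk_low, risk_high)
    female = crude_mortality(share_high_female, risk_low, risk_high)
    return male - female


# All inputs are hypothetical. Z = "high-risk patient".

# Z tied to the outcome only: it doubles mortality, but both physician
# groups see the same share of high-risk patients.
gap_outcome_only = crude_gap(0.40, 0.40, risk_low=0.08, risk_high=0.16)

# Z tied to the predictor only: male physicians see more Z patients,
# but Z does not change mortality.
gap_predictor_only = crude_gap(0.45, 0.40, risk_low=0.10, risk_high=0.10)

# Z tied to both: male physicians see more Z patients AND Z doubles mortality.
gap_confounded = crude_gap(0.45, 0.40, risk_low=0.08, risk_high=0.16)

for label, gap in [
    ("outcome only", gap_outcome_only),
    ("predictor only", gap_predictor_only),
    ("both", gap_confounded),
]:
    print(f"Z linked to {label}: {abs(gap) * 100:.2f} percentage points")
```

In the first two cases the gap is zero (up to floating-point rounding). Only in the third does a gap appear, and under these made-up inputs it takes a five-point imbalance in a trait that doubles the risk of death to produce a 0.40 percentage point difference. That is the scale of unmeasured confounder a critic would need to name.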

Statistical versus clinical significance

One more issue warrants a comment. Several critics have brought up the point that statistical significance and clinical significance are not the same thing. This too is epidemiology 101. Something can be statistically significant but clinically irrelevant. Is a 0.43 percentage point difference in mortality rate clinically important? This is not a scientific or a statistical question. This is a clinical question. A policy and public health question. And people can reasonably disagree. From a public health point of view, a 0.43 percentage point difference in mortality for Medicare beneficiaries admitted for medical conditions translates into potentially 32,000 additional deaths. You might decide that this is not clinically important. I think it is. It’s a judgment call and we can disagree.
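The arithmetic behind that figure can be sketched in a few lines of Python. The post gives the mortality gap (0.43 percentage points) and the resulting count (32,000) but not the number of hospitalizations or the time period behind it, so the denominator below is back-calculated from those two figures rather than taken from the paper. The variable names are illustrative.

```python
# Figures quoted in the post.
mortality_gap_pct_points = 0.43   # difference in mortality, percentage points
additional_deaths = 32_000        # deaths the post attributes to that gap

# Convert percentage points to a plain fraction (0.43 points -> 0.0043).
gap_as_fraction = mortality_gap_pct_points / 100

# Back-calculated, NOT stated in the post: the number of hospitalizations
# that would turn a 0.0043 gap into 32,000 deaths.
implied_admissions = additional_deaths / gap_as_fraction

# The same gap expressed per patient: hospitalizations per one additional death.
admissions_per_death = 1 / gap_as_fraction

print(f"Implied hospitalizations: {implied_admissions:,.0f}")
print(f"Hospitalizations per additional death: {admissions_per_death:.0f}")
```

Under these assumptions the gap corresponds to roughly 7.4 million hospitalizations, or about one additional death for every 233 admissions. Whether that is clinically important remains the judgment call described above; the code only makes the scale explicit.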

Ours is the first big national study to look at outcome differences between male and female physicians. I’m sure there will be more. This is one study – and the arc of science is such that no study gets it 100% right. New data will emerge that will refine our estimates and of course, it’s possible that better data may even prove our study wrong. Smarter people than me – or even my very smart co-authors – will find flaws in our study and use empirical data to help us elucidate these issues further, and that will be good. That’s how science progresses. Through facts, data, and specific critiques. “Correlation is not causation” might be epidemiology 101, but if we got stuck on epidemiology 101, we’d be unsure whether smoking causes lung cancer. We can do better. We should look at the totality of the evidence. We should think about plausibility. And if we choose to reject clear results, such as the finding that women internists have better outcomes, we should have concrete, testable, alternative hypotheses. That’s what we learn in epidemiology 102.