I am most curious about the source of the data. It apparently came from a website called Doximity, which collects data from physicians. Here is a link to the press release related to this compensation dataset. However, the data is not freely available. The claim is that this data comes from self-reports by 36,000 physicians.

I am not sure whether I trust this data. For example:

Do I believe that physicians in North Dakota earn the highest salaries on average in the nation? And not only that, they earn almost 30% more than the average physician in New York. Does the average physician in ND really earn over $400K a year? If you are wondering, the second highest salary number comes from South Dakota. And then Idaho. Also, these high-salary states tend to be the ones with the lowest gender wage gaps.

I suspect that sample size is an issue. They do not report sample size at the level of their analyses. They apparently published statistics at the level of MSAs. There are roughly 400 MSAs in the U.S., so at that level, they have on average only 90 respondents per MSA (36,000 divided by 400). When split by gender, the average sample size is less than 50. Then, they are comparing differences between two small groups, so we should see the standard errors of those differences. And finally, they are making hundreds of such comparisons, for which some kind of multiple-comparisons correction is needed.
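To make the back-of-envelope arithmetic concrete, here is a rough sketch in Python. The 36,000 respondents and roughly 400 MSAs come from the discussion above; the even gender split, the salary standard deviation (`ASSUMED_SD`) and the use of a Bonferroni correction are my own illustrative assumptions, not anything Doximity reported.

```python
import math

TOTAL_RESPONDENTS = 36_000  # self-reports claimed in the press release
N_MSAS = 400                # rough count of U.S. MSAs

# Average sample size per MSA, then per gender within an MSA.
per_msa = TOTAL_RESPONDENTS / N_MSAS
per_gender = per_msa / 2    # ASSUMPTION: even split between men and women

# ASSUMPTION: a hypothetical salary standard deviation, for illustration only.
ASSUMED_SD = 100_000


def se_of_difference(sd, n_men, n_women):
    """Standard error of the difference between two independent group means."""
    return math.sqrt(sd**2 / n_men + sd**2 / n_women)


se_gap = se_of_difference(ASSUMED_SD, per_gender, per_gender)
margin_95 = 1.96 * se_gap   # approximate 95% margin of error on the gap

# One simple (conservative) multiple-comparisons fix: Bonferroni.
bonferroni_alpha = 0.05 / N_MSAS

print(f"per MSA: {per_msa:.0f}, per gender: {per_gender:.0f}")
print(f"SE of gap: ${se_gap:,.0f}, 95% margin: ${margin_95:,.0f}")
print(f"Bonferroni-adjusted alpha: {bonferroni_alpha}")
```

Under these made-up inputs, the margin of error on a single MSA's wage gap runs to tens of thousands of dollars. If women make up less than half of the respondents, their group is smaller still and the standard error gets larger.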

I am pretty sure some of you are doctors, or work in health care. Do those salary numbers make sense? Are you moving to North/South Dakota?

***

Turning to the Visual corner of the Trifecta Checkup (link), I have a mixed verdict. The hover-over effect showing the precise values on both axes is a nice idea, well executed.

I don't see the point of drawing the circle inside a circle. The wage gap is already on the vertical axis, and the redundant representation in dual circles adds nothing to it. Because of this construct, the size of the bubbles is now encoding the male average salary, taking attention away from the gender gap which is the point of the chart.

I also don't think the regional analysis (conveyed by the colors of the bubbles) is producing a story line.

***

This is another instance of a dubious analysis in this "big data" era. The analyst makes no attempt to correct for self-reporting bias, and works as if the dataset is complete. There is no indication of any concern about sample sizes after the analyst drills down to finer areas of the dataset. Other variables are available, such as specialty, and still others can be merged in, such as income levels. All of these may explain at least a portion of the gender wage gap, yet no attempt has been made to incorporate them. We are stuck with a bivariate analysis that does not control for any other factors.

Last but not least, the analyst draws a bold conclusion from the overly simplistic analysis. Here, we are told: "If you want that big money, you can't be a woman." (link)

P.S. The Stat News article reports that the researchers at Doximity claimed that they controlled for "hours worked and other factors that might explain the wage gap." However, in Doximity's own report, there is no language confirming how they included the controls.

Comments

Thanks for taking a look at this. I'd guess it's a pretty hard area to collect reliable data on, and the broad message that there is a huge disparity between men and women is probably worth highlighting, even if the data quality is limited at this stage.

The design decisions are definitely questionable though. If you inspect the SVG code for the chart, you can see that the circle areas are a bit weird too. Take Missouri as an example. The wage gap there is $256k to $364k (42% higher for men). The radius of the two circles is 4.8px for women, and 13.5px for men (so 180% bigger for men). That's bad enough, but the eye processes area more naturally, so bubble charts like this are supposed to encode the value in the circle's area rather than its radius. The area of the women's circle is 72.5px², and the men's 568.7px² - or 685% bigger. Not sure why this 42% difference is represented with an object that's 685% bigger - or am I missing something?

The data is shocking enough as it is, without having to exaggerate the point!
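The Missouri arithmetic in the comment above can be checked with a few lines of Python. This is only a sketch using the rounded figures quoted in the comment (salaries of $256k and $364k, radii of 4.8px and 13.5px). The `pct_bigger` helper and the "fair" radius are illustrative, not taken from the chart's code.

```python
import math

# Figures quoted in the comment (rounded values read off the SVG).
women_salary, men_salary = 256_000, 364_000
r_women, r_men = 4.8, 13.5  # circle radii in px


def pct_bigger(larger, smaller):
    """How much bigger `larger` is than `smaller`, in percent."""
    return (larger / smaller - 1) * 100


salary_gap = pct_bigger(men_salary, women_salary)  # about 42%
radius_gap = pct_bigger(r_men, r_women)            # about 181%

area_women = math.pi * r_women**2
area_men = math.pi * r_men**2
area_gap = pct_bigger(area_men, area_women)        # about 691%

# If area were proportional to salary, the men's radius would scale with
# the square root of the salary ratio (hypothetical "fair" encoding).
r_men_fair = r_women * math.sqrt(men_salary / women_salary)

print(f"salary gap: {salary_gap:.0f}%")
print(f"radius gap: {radius_gap:.0f}%")
print(f"area gap:   {area_gap:.0f}%")
print(f"men's radius under area-proportional encoding: {r_men_fair:.2f}px")
```

With the rounded radii the area gap comes out near 691% rather than the quoted 685%. The small difference presumably comes from rounding, and the conclusion is the same either way: the drawn circles exaggerate a 42% difference many times over.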

Adrian: Good to know. I wonder why that would be the case!
Will: Thanks for sending me the graphic. And yes, the encoding sounds really suspicious. The visual form of showing the gap as a donut is very ineffective even without the encoding problem.

I'm a med student and we hear a decent amount about the pay difference in the Midwest vs. the coasts. The Midwest mostly uses a fee-for-service (FFS) payment model, and there isn't the same level of saturation as on the coasts and in the cities. So if you're doing FFS and you're the only game in town, you can charge whatever you want for any test/imaging you do.