Look what I found: two amazing charts

While doing some research for my statistics blog, I came across a beauty by Lane Kenworthy from almost a year ago (link) via this post by John Schmitt (link).

How embarrassing is the cost effectiveness of U.S. health care spending?

When a chart is executed well, no further words are necessary.

I'd only add that the other countries depicted are "wealthy nations".

***

Even more impressive is this next chart, which plots how cost effectiveness evolved over time. Note that in 1970 the U.S. started out in much the same position as the other nations.

Let's appreciate this beauty:

Let the data speak for itself. Time runs from bottom left to upper right. As more money is spent, life expectancy goes up. However, the slope of the line is much smaller for the US than for the other countries. There is no need to add colors, data labels, interactivity, animation, etc.
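To put a number on "the slope of the line", one could fit a least-squares line to each country's trajectory and read off the years of life expectancy gained per extra $1,000 spent per head. Below is a minimal Python sketch. `ols_slope` is a hypothetical helper, and the two trajectories are made-up numbers for illustration, not the OECD data behind the chart. A single straight line is also only a rough summary of a path that may curve.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs
    (here: years of life expectancy per extra US$ of spending)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    return sxy / sxx


# Hypothetical, made-up trajectories (NOT OECD data):
# (spending per capita in US$, life expectancy in years) at four time points.
trajectories = {
    "Country A": ([500, 1500, 2500, 3500], [72.0, 75.0, 78.0, 81.0]),
    "Country B": ([700, 2500, 4500, 7000], [71.0, 73.0, 75.0, 77.5]),
}

for name, (spend, life) in trajectories.items():
    # Rescale the slope to "years per extra $1,000 per head" for readability.
    print(f"{name}: {ols_slope(spend, life) * 1000:.2f} years per $1,000")
```

A flatter slope means each extra dollar buys fewer years of life expectancy, which is the comparison the chart makes visually.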

Recognize what's important, what's not. The US line is drawn in a different color, is much thicker, and is properly placed in the foreground of the chart.

Rather than clutter up the chart, the other 19 lines are anonymized. They all share the same color and thickness, and a single aggregate label. This is an example of overcoming loss aversion (see this post for more): it is OK to suppress some of the data.

The axis labeling is superb. Tufte preaches this clean style. There is no need for regularly spaced axis labels; use data-informed labels instead. Unfortunately, software is way behind on this issue. You can do this in R but that's about it.
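One way to read "data-informed labels" is Tufte's range-frame idea: put ticks only where the data actually are, such as the minimum, the maximum, and a value you want to call out. Below is a minimal Python sketch. `data_informed_ticks` is a hypothetical helper, and the values are made up.

```python
def data_informed_ticks(values, highlight=(), digits=0):
    """Pick axis ticks from the data itself (a range-frame style sketch):
    the minimum, the maximum, and any highlighted values inside that range."""
    if not values:
        raise ValueError("need at least one data value")
    lo, hi = min(values), max(values)
    # Ignore highlighted values outside the data range; a tick there would
    # stretch the axis beyond the data.
    picks = {lo, hi} | {v for v in highlight if lo <= v <= hi}
    # Round for display, then de-duplicate and sort.
    return sorted({round(v, digits) for v in picks})


# Hypothetical life-expectancy values (made up, not the chart's data).
life_expectancy = [70.9, 74.2, 76.8, 78.1, 81.5]
print(data_informed_ticks(life_expectancy, highlight=[78.1], digits=1))
# → [70.9, 78.1, 81.5]
```

If you draw with matplotlib, a list like this can be passed to `Axes.set_yticks`, and `Spine.set_bounds` can trim an axis line to the data range. Other plotting libraries will have their own equivalents.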

"Let the data speak for itself." Except that the graph itself gives no indication of whether or not the 19 other rich countries represent many, most, all, or a few cherry-picked rich countries.

"Recognize what's important, what's not." Apparently, anything other than plain-vanilla length-of-life is not -- not anything that might be contained in data related to child mortality, not health care dollars relative to medan HH income -- nothing but dollars and years.

"Rather than clutter up the chart, the other 19 lines are anonymized." See above. The flipside of 'anonymized' is 'nondescript to the point of being immune to follow-up research.'

"The axis labeling is superb" This is, frankly, false. Spending per capita per *what*? Not per lifespan -- being born costs > $7K in pretty much every 'rich' country. So the x-axis, i am not exaggerating when i say that i have no idea what it represents. I'm not playing obtuse. I don't know what that 2,000 is.

Look -- I get it, and I agree: end-of-life spending on health care in the US bears no relation to any return in lifespan. Or maybe what these charts say that I agree with is that yearly spending in the US doesn't result in higher life expectancy. The point, whatever it is, is well taken: US health spending is stupid, compared to what we get for it.

But these are *terrible* charts. They're bludgeons, built to serve an agenda other than providing transparency into the data on which they rest.

Are those *all* rich countries? Are there rates attached to the movement of the lines in the second chart? Because that matters, and it's completely invisible here. Also, are the dollars over time converted at floating, point-in-time exchange rates, or, as the note seems to imply, just converted to dollars at, I guess, today's rate? That also totally matters -- the variance in slope might be nothing more than a vestige of the dollar's devaluation.
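The conversion question matters because the choice of rate moves every point along the x-axis. Below is a minimal Python sketch with made-up numbers: the same local-currency series is converted once with a hypothetical rate for each year and once with a single fixed rate, and the horizontal distance the line travels changes. The rates and `convert_series` are assumptions for illustration. The OECD also publishes health spending in US$ purchasing power parity (PPP), which is meant to compare purchasing power rather than track market exchange-rate swings.

```python
def convert_series(spend_local, rates):
    """Convert each year's local-currency spending to US$ by dividing by that
    year's rate, expressed as local currency units per US$."""
    if len(spend_local) != len(rates):
        raise ValueError("need one rate per year")
    return [s / r for s, r in zip(spend_local, rates)]


# Hypothetical numbers for one country; NOT real OECD figures.
years = [1990, 2000, 2010]
spend_local = [1000.0, 2000.0, 3000.0]   # per-capita health spending, local currency
year_rates = [2.0, 1.6, 1.2]             # assumed rate in force each year
fixed_rates = [1.2] * len(years)         # assumed single "today's" rate for all years

by_year = convert_series(spend_local, year_rates)
fixed = convert_series(spend_local, fixed_rates)

for y, a, b in zip(years, by_year, fixed):
    print(f"{y}: {a:8.1f} US$ (year-specific rate)  {b:8.1f} US$ (fixed rate)")

# The horizontal distance covered differs between the two choices,
# and so does the apparent slope of the line.
print(by_year[-1] - by_year[0], fixed[-1] - fixed[0])
```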

How were the 19 other countries chosen? What's rich? Are there others that were left out? Why? What happens to those life-expectancy lines if we control for things that aren't related to health-care expenditure?

But mostly, these charts are the END of a data discussion, not the BEGINNING of one, and I stand firmly on the side of 'charts don't make facts, the World makes facts'. Charts are no substitute for data or its analysis, and these pictures don't help me decide what to look at next. And because of that, they're infographics at best and propaganda at worst.

Sorry. Maybe I'm cranky, but I doubt it, since I just had yakatori, which might have another 'i' in it, but since it's a transliteration, I think I'm safe. I just really, really don't like these graphs. I think they make their point very strongly, which would be fine if graphs were for proving points rather than for helping us look for the truth.

I am curious what you mean when you say "You can do this in R but that's about it." Do you mean any programming language with a decent graphics package, i.e., that you need to write code to get such a graphic? I lament this too. But if you really do mean that you need R to create these graphics, I can think of a number of languages with graphics support for such visualizations (Python, IDL, Matlab, PV-WAVE, IGOR, not to mention more than a few JavaScript charting libraries...)

You can do it in Excel, but you have to fiddle with it. Effectively, you have to treat the scales as graph series in their own right, which is not a bad philosophy, but not one that Excel normally encourages. Excel by default treats scales as a support element to graphs, not graphs in themselves.

The effort you have to go to in Excel might be considered "programming" Excel to do this, which needn't be more onerous than the programming you have to do in R to achieve the same result.

The unnecessary physical contact between the two orthogonal scales in the graphs above makes me think of Cleveland, not Tufte. Tufte would have had them separated by a gap.

Henry: One hopes you held the yakatori down; it's unfortunate that you had to look at these charts while eating your dinner.

"But these are *terrible* charts. They're bludgeons, built to serve an agenda other than providing transparency into the data on which they rest."
I'm sorry to have to disclose that there are no "objective" charts, just as there is no "objective" journalism. A chart represents data filtered through the designer's point of view. Every chart has a designer, so every chart is subjective, even the ones you make yourself. In fact, if you read my blog regularly, you will notice that I consider having a point of view one of the most crucial elements of any chart.

"The flipside of 'anonymized' is 'nondescript to the point of being immune to follow-up research."
The data set behind these charts is accessible. You can do follow-up research if you're up to it.

"Except that the graph itself gives no indication of whether or not the 19 other rich countries represent many, most, all, or a few cherry-picked rich countries."
He did cite OECD as the data source. Besides, the names of the countries can be read directly off the scatter plot. If you (or Hmm for that matter) disagree with the selection, tell us which countries you don't consider "rich" and which "rich" country should have been included.

"these pictures don't help me decide what to look at next."
Use your imagination.

Josh, Sloan: Yes, any programming language like R should be able to produce customized axes. When I wrote that, I was thinking of graphing software for the mass market. It would be interesting to compare the level of effort in R, Matlab, Python, etc.

@Henry & Hmm: Lazy trolling... Google "OECD health" and you'll find the datasets in 2 minutes (OK, I'll help you: http://stats.oecd.org/index.aspx?DataSetCode=HEALTH_STAT#). To be fair, there are 34 countries with complete data for 2007 (including Mexico, Chile, Turkey...). So I was curious too, and I replicated the first plot myself (R + ggplot2, 3 lines of code, mostly cut & paste from here: http://had.co.nz/ggplot2/geom_text.html)... And guess what? The cloud looks exactly the same, with the US as an unbelievable outlier. I used "expenditure per capita in US$ purchasing power parity", but there are many export options (% of GDP...) on the OECD website, so you can always try to hide this disturbing, stubborn pattern behind 'devaluation of the dollar'... or whatever.
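A quick way to quantify "unbelievable outlier" is to fit a line to the other countries only and measure how far the US sits from it at its own spending level. Below is a minimal Python sketch. `fit_line` and `gap_vs_peers` are hypothetical helpers, and the numbers are made up, not the OECD figures. When the target spends more than every peer, this extrapolates a straight line beyond the peers' data, and the real relationship may flatten at high spending, so the gap should be read as rough.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def gap_vs_peers(target, peers):
    """Fit a line to the peer countries only (the target is left out so it
    cannot pull the fit), then return how far the target's life expectancy
    sits above (+) or below (-) that line at the target's own spending."""
    a, b = fit_line([x for x, _ in peers.values()], [y for _, y in peers.values()])
    x, y = target
    return y - (a + b * x)


# Hypothetical (spending per capita in US$ PPP, life expectancy) pairs;
# made-up numbers, NOT the OECD data.
peers = {"A": (2500, 79.5), "B": (3000, 80.0), "C": (3500, 80.5), "D": (4000, 81.0)}
target = (7000, 78.0)
print(round(gap_vs_peers(target, peers), 2))
```

Leaving the target out of the fit matters: with only a handful of points, including a high-spending outlier in an ordinary least-squares fit can tilt the line toward it and hide how unusual it is.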

What's a good rule of thumb for when the axes should start at 0 vs. when they should start at some "reasonable" value, like the Y axis here (starting at 77)?
It seems the choice of lower bound on the Y axis is sometimes used to exaggerate differences in the data. I guess it's especially offensive when the Y axis is a percentage, but are there other good guidelines?
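Tufte's "lie factor" is one way to make this question concrete: the size of an effect as shown in the graphic divided by its size in the data. For two bars drawn up from a baseline, the shown change is measured from the baseline, so cutting the axis inflates it; algebraically the factor works out to v1 / (v1 - baseline), i.e. 78 for bars of 78 and 81 that start at 77. Below is a minimal Python sketch with made-up values; the function name is hypothetical. A common rule of thumb that follows from this: bars, whose length encodes the value, should start at zero, while line charts and scatter plots, which encode values by position, can reasonably use a non-zero baseline.

```python
def lie_factor(v1, v2, baseline=0.0):
    """Tufte's lie factor for two bars drawn up from `baseline`:
    (relative change shown in the graphic) / (relative change in the data)."""
    if v1 == v2:
        raise ValueError("values must differ")
    if v1 <= baseline:
        raise ValueError("baseline must sit below the first value")
    data_effect = (v2 - v1) / v1
    # Bar lengths are measured from the baseline, not from zero.
    shown_effect = ((v2 - baseline) - (v1 - baseline)) / (v1 - baseline)
    return shown_effect / data_effect


# Made-up life expectancies of 78 and 81 years.
print(lie_factor(78, 81))               # bars start at 0
print(lie_factor(78, 81, baseline=77))  # bars start at 77
```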

Somebody's got to say it... there are 19 "other countries" besides the U.S., making the U.S. line a one-in-twenty outlier. One in twenty: why does that sound eerily familiar? Hmm wonders how they picked the other 19, but in reality such cherry-picking wouldn't even be necessary. In all honesty, it's very unlikely that this correlation is spurious, but the limited statistical power is at least worth noting (and I haven't read the original article, where it is probably addressed).
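The "one in twenty" point can be made precise with a rank (permutation) argument. If the US were no different from the other countries, so that all of them are exchangeable, the chance that the US lands in the single most extreme position purely by luck is 1/n. Below is a minimal Python sketch. `rank_p_value` is a hypothetical helper, and the scores are made up (think of a score as how far a country falls below its peers' trend). With 20 countries, 0.05 is the smallest p-value this test can ever produce, which is the statistical-power limit the comment points at; with 34 countries it would be 1/34, about 0.029.

```python
def rank_p_value(target, others):
    """One-sided permutation p-value from ranks: the share of all countries
    (target included) whose score is at least as large as the target's.
    Assumes the countries are exchangeable under the null hypothesis."""
    at_least_as_extreme = 1 + sum(1 for s in others if s >= target)
    return at_least_as_extreme / (1 + len(others))


# Made-up scores: how far each country falls below its peers' trend line.
others = [0.1 * i for i in range(19)]   # 19 hypothetical peer countries
us_score = 6.0                          # hypothetical, larger than every peer
print(rank_p_value(us_score, others))   # smallest possible value with 20 countries
# → 0.05
```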

I think the chart is 'effective' in the sense that it communicates something clearly, but in reality it raises more questions than it answers. What other factors contribute to life expectancy? How far ahead of other countries is the US in medical research, and how does that affect the cost of our health care? And so on. Chart authors have to be wary of becoming merely persuasive without connection to the truth.

If you say "Organisation for Economic Co-operation and Development", people say "what's that?" If you say "other rich countries", they say "how did you pick them?" If you say you picked the OECD countries, they say "what's OECD?" At some point you have to conclude that the push-back is a matter of convenience, not a genuine request for clarification, and challenge the pushers-back to cite a single country whose health expenditure and life expectancy are in the ballpark of the US and whose name everyone recognises as a fair example of a rich country.

If they say there's cherry-picking, they should point to the unpicked cherry.