Felix linked to a set of charts about guns in the U.S. (and elsewhere). The original charts, by Liz Fosslien, are found here.

I like the clean style used by Fosslien. Some of the charts are thought-provoking. Many of them may raise more questions than they answer. Here are a few that caught my eye.

A simplistic interpretation would claim that banning handguns is futile, and may even have an adverse impact on the murder rate. However, this chart does not reveal the direction of causality. Did some countries ban handguns in reaction to higher violence? If so, this chart confirms that the countries with handgun bans are a self-selected group.

***

The U.S. is an outlier, both in terms of firearm ownership and firearm homicides. This makes the analysis much harder because the U.S. is really in a class of its own. It's not at all clear whether there is a positive correlation in the cluster below, and even if there is, whether we can extrapolate a straight line up to the U.S. dot is dubious.

***

Fosslien is being cheeky in denying us the identity of the other outlier, the country with few firearms but an even higher death rate from intentional homicide. By the way, these scatter plots are great for showing bivariate distributions.

***

I'd still prefer a line chart for this type of data, but this particular paired bar chart works for me as well. The content of this chart is a shock to me.

While doing some research for my statistics blog, I came across a beauty by Lane Kenworthy from almost a year ago (link) via this post by John Schmitt (link).

How embarrassing is the cost effectiveness of U.S. health care spending?

When a chart is executed well, no further words are necessary.

I'd only add that the other countries depicted are "wealthy nations".

***

Even more impressive is this next chart, which plots the evolution of cost effectiveness over time. An important point to note is that the U.S. started out in 1970 similar to the other nations.

Let's appreciate this beauty:

Let the data speak for itself. Time goes from bottom left to upper right. As more money is spent, life expectancy goes up. However, the slope of the line is much smaller for the US than for the other countries. There is no need to add colors, data labels, interactivity, animation, etc.

Recognize what's important, and what's not. The US line is in a different color, much thicker, and properly placed in the foreground of the chart.

Rather than clutter up the chart, the other 19 lines are anonymized. They all have the same color and thickness, and are given one aggregate label. This is an example of overcoming loss aversion (see this post for more): it is ok to suppress some of the data.

The axis labeling is superb. Tufte preaches this clean style. There is no need to use regularly-spaced axis labels; use data-informed labels instead. Unfortunately, software is way behind on this issue. You can do this in R but that's about it.
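For those who want to try this outside R, the idea can be sketched in a few lines of Python. The `data_informed_ticks` helper and the spending numbers below are hypothetical; they just show how tick positions can be drawn from the data itself rather than spaced at regular intervals.

```python
def data_informed_ticks(series, extra=()):
    """Pick axis ticks from the data itself: the extremes of the series
    plus any values worth calling out, instead of regular intervals."""
    return sorted({min(series), max(series), *extra})

# Hypothetical per-person health spending (in $000s) over time
spending = [0.4, 0.9, 1.8, 2.9, 4.7, 6.3, 7.4]
ticks = data_informed_ticks(spending, extra=(2.9,))  # call out a reference year
# In matplotlib, say, these positions would be passed to ax.set_xticks(ticks)
```

The reader then sees the actual data values at the ends of the axis, rather than arbitrary round numbers.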

To kick this off, I re-made Figures 2.1-2.2.8 in the report, which summarize the findings of the Gallup World Poll, covering annual samples of 1,000 people aged 15 and over from each of 150 countries. (These charts are effectively the first charts to appear in the report. There is no Figure 1 because Chapter 1 has no charts. The report also inexplicably follows the outdated academic-publishing convention of banishing all diagrams to the end of the report as if they were footnotes.)

In the report, they presented histograms of the 0-10 ratings (10 = happiest) by region of the world, two charts per page, running to five pages. Here's one such page:

If you're presenting regional data, you're expecting readers to want to compare regions. It's not very nice to make them flip back and forth and tax their memory in order to make these comparisons.

This data set is where small multiples show their power. Small multiples are a set of charts all sharing the same execution (type, axes, etc.) but each showing different subsets of the data. This sort of chart is designed for group comparisons, and is one of the key propositions by Edward Tufte in his classic book.

In the following junkart version, I plotted each region's histogram against the global average histogram (indicated in gray as background). The average rating in each region is indicated with the light blue vertical line. The countries are sorted from highest average happiness to lowest.

The same data now occupies only one page of the report. (A topic for a different post: does the higher average rating in N. America/Europe indicate greater happiness or grade inflation?)

***

Alternatively, one can stack up the line charts into a column, as shown on the right. This view is somewhat better for any pairwise comparisons. (Calling JMP developers: how do I rotate the text labels to make them horizontal?)

***

Finally, I made a chart for exploratory purposes, using a scatterplot matrix (see also this post). In this version, every pair of regions is under the microscope. Since there are 10 regions (including the global total), we have (10*9)/2 = 45 pairwise comparisons. Each of these comparisons has its own chart in the matrix, indexed by the labels on the axes.

Each individual chart is a scatter plot of the proportions selecting a particular rating. If the histograms in region A and region B are identical, then we would see all 11 dots lined up along the diagonal going from bottom left to top right.
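Here is a minimal Python sketch of this reading of the matrix. The rating proportions for the two regions are made up, not taken from the Gallup data; they only illustrate how similar histograms produce dots hugging the diagonal.

```python
# Hypothetical proportions of respondents choosing each rating 0-10
# in two regions (each list sums to 1)
cis       = [0.02, 0.03, 0.06, 0.10, 0.14, 0.20, 0.16, 0.12, 0.09, 0.05, 0.03]
east_asia = [0.02, 0.04, 0.06, 0.11, 0.13, 0.21, 0.15, 0.12, 0.08, 0.05, 0.03]

regions = 10                            # 9 regions plus the global total
n_pairs = regions * (regions - 1) // 2  # 45 pairwise charts in the matrix

# Each pairwise chart plots 11 points (p_A, p_B), one per rating level;
# identical histograms put every point on the bottom-left-to-top-right diagonal
points = list(zip(cis, east_asia))
max_gap = max(abs(a - b) for a, b in points)  # how far the dots stray from it
```

With these near-identical histograms, `max_gap` is tiny, which is exactly the "thin pink area" pattern described below.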

In addition, the pink area of the chart contains 95% of the data. So the more the pink area resembles a diagonal line, the more correlated are the histograms between the two regions being compared.

For example, the very top chart compares CIS with East Asia. The thinness of the pink area tells us that the histograms of happiness ratings in those two regions resemble each other. You can easily verify this finding by looking at the first two line charts shown in the column of line charts above.

By contrast, the chart comparing CIS and Europe has an expansive pink area, meaning the happiness ratings follow different distributions. This is also verified by looking at the line charts, which show that Europeans are generally happier than people in CIS. There is an "excess" of people with ratings around 6-8 in Europe compared to CIS. The dots corresponding to these ratings would appear above the diagonal.

This scatterplot matrix explores all possible comparisons on one page, but it is a lab exercise, not suitable for mass consumption, because it has too much detail.

***

For those curious, the small multiples of line charts were made using R. The column of line charts and the scatterplot matrix were created using JMP.

The New York Times chose to present the poll results from Super Tuesday in the following chart (link):

It took me a bit of time to take in what this chart has to offer. To save your troubles, I've drawn up a reader's guide:

The graphic is a disguised scatter plot with one axis being Romney's share minus Santorum's share and the other axis being the total share of all other candidates. This is an "uneven canvas" in the sense that the data are much more likely to fall into a small part of the chart area (the orange shaded region).

If the reader just wants to know which segments of the electorate favor Romney v. Santorum, the chart is pretty effective at pointing to the answer. It is quite challenging to learn much else about the data.

***

Here are the results for Ohio, plotted as a stacked bar chart, with three segments in each bar (Romney's share, Santorum's share and the share of all other candidates).

This more standard presentation conveys much more of the underlying information. The trade-off is that the reader has to try harder to figure out the answer for each segment of voters.

[PS: 3/13/2012]:

Thanks to several readers for your comments. I went back to look at the NYT graphic again, and can confirm that it is a ternary chart. The chart area is indeed an equilateral triangle with three equal sides.

What threw me off was the axis labels, particularly the Santorum and Romney labels which give the impression that there is a zero mid-point and some kind of share data along the east-west axis. If this were true, then the chart could not be a ternary plot because Romney and Santorum shares are not mirror images.

In a ternary plot, we must identify Romney, Santorum, and "Other candidates" as the three vertices. The way this chart is labelled, it invites readers to drop a perpendicular line to the horizontal axis to read, for example, Santorum's share. That doesn't work. Fishing the data out of a ternary plot is always challenging. You pick the vertex corresponding to the data series you want, say Romney's share. Then you take the side opposite that vertex. Now draw lines parallel to that side: as you approach the Romney vertex, Romney's share goes from 0% to 100%. The following chart shows this:

For ternary plots, it's easier to go with the hand-waving principle that the closer you are to the vertex, the greater the weight of that vertex. So with the abortion data point, we see that it is much closer to the Santorum corner than the other two corners.
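The geometry behind this hand-waving principle can be made precise. A minimal Python sketch (with hypothetical vote shares) places each point in the triangle as the share-weighted average of the three vertices, so a point gravitates toward whichever vertex carries the most weight:

```python
from math import sqrt, isclose

def ternary_to_xy(romney, santorum, other):
    """Place a voter segment inside an equilateral triangle with vertices
    R=(0,0), S=(1,0), O=(0.5, sqrt(3)/2). The point is the share-weighted
    average of the vertices: the closer it sits to a vertex, the larger
    that candidate's share."""
    assert isclose(romney + santorum + other, 1.0)
    x = santorum * 1.0 + other * 0.5
    y = other * sqrt(3) / 2
    return x, y

# Hypothetical segment: Romney 30%, Santorum 50%, others 20%
x, y = ternary_to_xy(0.30, 0.50, 0.20)
```

A 100% share lands exactly on the corresponding vertex, which is why reading shares off a perpendicular to the horizontal axis gives the wrong answer.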

The vertical line for "other candidates" is also misleading. To read the share of votes that went to other candidates, one has to follow either the OR side or the SO side of the triangle. Basic geometry will show that going up the vertical line will not produce the share of "other candidates".

***

Lastly, here is a scatter plot representation of the data, using the Romney-Santorum difference as the horizontal axis and the share of all other candidates as the vertical axis:

The pattern of dots on this chart looks very similar to the ternary chart (that is another reason why I thought the original graphic was a scatter plot). However, the two plots are distinct entities. For the scatter plot, the horizontal axis goes from -100% to +100% while the vertical axis can only go from 0% to 100%.

The problem chart presents a "net promoter score" analysis by ABB (link). The net promoter score is the difference between the number of people who will recommend a product or company and the number who won't. The chart presents the components: the number of people who gave "red cards" and the number who gave "green cards".

Unfortunately, the symmetry in the definition of the net promoter score is destroyed by this stacked bar chart. The red bars are all aligned against the vertical axis but the green bars aren't, so it's difficult to compare their lengths.

John fixed this problem by aligning both sets of bars against a vertical axis. Sensibly, he places the red bars along the negative direction. He also orders the categories by "margin of victory", which in effect is the net promoter score, with the category needing the most attention at the top.

The improved chart points out some of the complicating factors in understanding a metric that is composed of two components, each of which varies and can be missing. For example, a category like "technical support" is rated among the highest overall, but this conceals the fact that it receives many red cards. Also, consider "product/system quality" versus "health and safety": both categories end up with about the same net promoter score, but the former has many more respondents than the latter.
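To make the two-component issue concrete, here is a small Python sketch. The category names echo the chart, but the card counts are entirely hypothetical:

```python
# Hypothetical counts of green (recommend) and red (won't recommend)
# cards per survey category
cards = {
    "technical support":      (120, 60),
    "product/system quality": (150, 30),
    "health and safety":      (45, 9),
    "delivery time":          (80, 70),
}

def net_score(green, red):
    # Net promoter score as used here: green cards minus red cards,
    # as a share of all cards in the category
    return (green - red) / (green + red)

# Order categories by net score, the one needing the most attention first
ranked = sorted(cards, key=lambda k: net_score(*cards[k]))
```

In this made-up data, "product/system quality" and "health and safety" tie on net score even though one has more than three times as many respondents, precisely the ambiguity the improved chart exposes.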

***

John also tried a scatter plot. This one requires some careful reading. The best categories are going to end up in the top left corner; it may be better to flip the red card axis to a descending order so that the top right corner is the best corner.

The diagonal rays (axes) are great visual aids to help figure out which categories are better and which are worse.

I am a little turned off by the crowdedness in the bottom left corner of the chart. Those are categories with relatively low levels of red or green cards, and also relatively balanced between reds and greens... that is to say, those are categories people don't care too much about, and among those people who care, there isn't a consensus about good or bad. In other words, the survey revealed very little of use about those categories. It bothers me that much of the survey ended up collecting data on such items.

Felix Salmon linked to this whimsical chart, featuring the frequency of laughter at the Federal Reserve's FOMC meetings in the lead-up to the bubble.

The Daily Stag Hunt blog originated this chart (link), and they juxtaposed it with the Case-Shiller 20-city home price index to make the case that the Fed members were laughing all the way to, er, the McMansion.

***

There is little doubt that the underlying narrative is correct, that the Fed governors did not see the bubble, and failed to respond to it appropriately. It is both tempting and amusing to find correlations of this type that would make this point clear.

But as I have discussed elsewhere, one must be extremely careful when looking at correlations of time-series data. Consider the fact that the FOMC laughter shows a unidirectional (up) pattern throughout the period being depicted. Consider, next, that for much of this period, the U.S. (and much of the world) was riding a massive bubble. These two facts alone guarantee that we can find hundreds of data series that show very strong correlation with the FOMC laughter track. Pick any economic data series from that era, whether home sales, mortgages, retail sales, or stock market prices, and one will find a unidirectional (up) pattern.

So we must dig into the data more to understand the real connection between FOMC laughter and average large-city home prices.

***

The top chart on the right shows the expected correlation between the Case-Shiller index and laughter. Note that the Case-Shiller data is an index with Jan 2000 set to 100. I used a 4-meeting moving average to smooth out the fluctuations in the laughter data.

Because meetings are 1 to 2 months apart, one should expect the participants to be reacting to the latest data, i.e. the change in the Case-Shiller index over the previous 1 or 2 months. What the top chart shows is something different: everything is relative to Jan 2000. (It's hard to imagine someone at the Feb 2005 meeting mulling over the 80% increase since Jan 2000, rather than the increase since their last meeting.) Thus, I produced the middle chart, which is a scatter plot of the one-month changes in the Case-Shiller index against the average number of laughs.
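For the record, the two transformations used here are simple. This Python sketch applies a trailing 4-meeting moving average and a change-versus-previous-observation difference to hypothetical numbers (not the actual FOMC or Case-Shiller data):

```python
def moving_average(series, window=4):
    # Trailing moving average over `window` observations
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def changes(series, lag=1):
    # Change versus `lag` observations earlier, i.e. what meeting
    # participants plausibly react to
    return [b - a for a, b in zip(series, series[lag:])]

# Hypothetical laugh counts at eight consecutive FOMC meetings
laughs = [12, 15, 9, 16, 20, 18, 25, 22]
smoothed = moving_average(laughs)  # 4-meeting smoothed laugh counts

# Hypothetical index levels (Jan 2000 = 100) and month-over-month changes
index = [100, 102, 105, 109, 114, 118, 120, 119]
mom = changes(index, lag=1)
```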

The case for strong correlation has disappeared. It now looks like the laughs were most acute when the home prices declined versus the prior month.

The bottom chart is the same as the middle chart, except that I looked at home price changes two months apart. We observe an identical pattern.

***

This pattern shouldn't surprise us because it's actually in the original charts. During the last few meetings of 2006, the index had already stopped rising while the laughter continued to grow. We didn't pay attention because we were mesmerized by the long period of steady increases in both data series.

Reader Ron D. was not pleased to see this dual-axis chart purporting to show a cause-effect relationship between the decline in union membership and the drop in the proportion of income earned by middle-class households (defined as the middle 60% of households). Click here to read the original article. They credit CAP's David Madland and Karla Waters for this chart.

Using dual axes is a well-tested way of creating correlation where there may be none. Playing with the scales will do that for you. I wrote about this issue here.

However, the correlation in this data cannot be denied, as the scatter plot below shows. Note that the scatter plot is much better at revealing correlational patterns than a chart with multiple time-series lines. (Here's an example of two lines that display a spurious correlation.)

If one were to fit a linear regression line, one would obtain a very high R-squared indeed (over 0.9). The problem is with the interpretation of this correlation. Any two data series that trend with time will be highly correlated with each other, simply because each series is highly correlated with time. Despite what you might believe after reading Freakonomics, regression, especially on social science data, cannot prove causation.

The writers at Think Progress show no such restraint, from the title "The American middle class was built by unions and will decline without them." to the sentences "these assaults have successfully decreased union membership over time... this has had a detrimental effect on the American middle class."

Note: these statements may in fact be true; I'm just pointing out that the chart does not buttress the assertions.

***

It's often hard to elevate a correlation to a causal effect. We have to try different tests. One such test for this data set is: if a change in union membership causes a change in middle-class incomes, then we'd expect the annual changes of one to be correlated with the annual changes of the other (at least in direction, better still in magnitude).

So, in a year in which union membership declined a lot, one should expect to see middle-class incomes also drop substantially.

The next scatter plot, contrasting these annual differences, suggests that causation is probably absent. At this smaller time scale, one just doesn't see any correlation at all. Annual declines in the proportion of union membership have been around 2-4% for most of this period, but shifts in middle-class incomes have ranged widely in both direction and magnitude.

P.S. Andrew suggested connecting the lines. Here are the charts with the lines:

What appears to be a very strong correlation in the left chart does not look so well-coordinated in the right chart! (The lines connect the dots in chronological order.)

The last chart in the infographic on OECD education data asks another intriguing question: do countries that pay teachers more achieve better test scores?

This chart suffers from the same ill as the one previously discussed (here): the data are not suitable for addressing the question. It is mighty hard to see any pattern in the set of bar charts on offer. The lack of correlation can be confirmed by displaying the data in a scatter plot:

The scatter plot on the left presents the data as shown in the original, with a regression line drawn in that appears to indicate a positive correlation between higher spending and higher achievement.

Here, spending is measured by the ratio of primary teacher pay after 15 years of service to average GDP while achievement is indicated by the proportion of students who attain a "top" level of proficiency in any or all of the three test subjects.

But notice the solitary point sitting on the top right corner (labelled "1"). That point is Korea, which has both the highest achievement and the highest spending (by far). Korea is an outlier (known as a leverage point). The chart on the right is the same as the one on the left with Korea removed. What appears to be a moderate positive correlation vanishes. (The numbers plotted are the ranking of countries by the proportion of students attaining top proficiency, the metric on the vertical axis.)

So, either the message is that achievement and spending are uncorrelated (for every country except Korea), or that we have a measurement problem. I think the latter is more likely, and would defer to psychometricians to say what are acceptable measures for spending and for achievement. Do primary teachers with 15 years or more of service represent "education spending"? Do top students adequately capture general achievement in the education system?

***

The original chart contains a serious misinterpretation of the data (source: Education at a Glance 2009, OECD). It falsely assumes that the proportions of students attaining top proficiency in each subject are additive. In fact, because the same student can be top in more than one subject, such a sum double-counts students, and its base is not 100%.

In my version, the metric used is the proportion of students who attain top proficiency in 1, 2 or all 3 subjects. This metric is computed off a 100% base.
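A toy example in Python shows why the per-subject proportions are not additive. The cohort and the student sets are hypothetical:

```python
# Hypothetical cohort of 10 students; which ones are "top" in each subject
top = {
    "reading": {1, 2, 3},
    "math":    {2, 3, 4},
    "science": {3, 4, 5},
}

# Adding the per-subject proportions double-counts multi-subject stars
naive_sum = sum(len(s) for s in top.values()) / 10  # 9 mentions, not 9 students

# Proportion top in at least one subject, computed on a true 100% base
top_any = len(set.union(*top.values())) / 10
```

Here the naive sum says 90% of students are "top", when in fact only half the cohort is top in at least one subject.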

I also removed the breakdown by gender. It creates clutter, and I can't find anything of interest in the male or female data.

My coworker pointed me to a Huffington Post article carrying a Bill Gates byline that contains some highly dubious analysis and a horrific chart. We presume Gates was fed this information by some analysts, but even so, one wishes he wouldn't promote innumeracy. But then, he has a history: a few years ago, Howard Wainer demolished the analysis his foundation used to channel lots of dollars into the "small schools" movement; I wrote about that before.

***

First, the offensive chart:

Using double axes earns justified heckles, but using two sets of gridlines is a scandal! A scatter plot is the default for this type of data. (See the next section for why this particular set of data is not informative anyway.)

I can't understand the choice of scale for the score axis. The orange line, for instance, seems to have a positive slope. In any case, since these scores are "scaled", and the "standard error" is about 1 (this number is surprisingly hard to find, even on Google), it would appear that between 300 and 400 on the score axis there are 100 units of standard error. By convention, events three standard errors away from the average are considered rare. There is no conceivable way that the average score could jump by that much.

***

The analysis is also flawed. Here's the key paragraph:

Over the last four decades, the per-student cost of running our K-12 schools has more than doubled, while our student achievement has remained flat, and other countries have raced ahead. The same pattern holds for higher education. Spending has climbed, but our percentage of college graduates has dropped compared to other countries... For more than 30 years, spending has risen while performance stayed flat. Now we need to raise performance without spending a lot more.

This argument contains several statistical fallacies:

Comparing apples and oranges: a glaring piece of missing information is whether other countries have increased their per-student spending on education, and if so, how fast the growth is compared to that in the U.S. Without this, the analysis makes no sense.

Confusing correlation and causation: so spending increased while test scores stagnated. In order to conclude that there is something wrong with the spending, one must first believe that spending has a causal effect on test scores. Observe that this is not a conclusion from the data; it is an assumption going into the analysis, neither supported nor disputed by the data since the data merely show a (lack of) correlation. This is another instance of "story time": we see data, we see conclusion, we are misled into thinking that data supports conclusion but in fact, the data is an irrelevant distraction. (For other instances of "story time", see this link to my book blog.)

Fallacy #1 and fallacy #2 combined: even if you believe that spending affects test scores, it is still a stretch to say that spending in U.S. schools affects the gap in test scores between U.S. students and foreign students. In a world where foreign countries are frozen in time, maybe so; but where foreign countries are investing in education, one can't say anything about the test-score gap without first knowing what's going on overseas.

Assumption invalidating the analysis: in one breath, the analyst admits the possibility of (a) spending increases with flat scores and (b) score increases with flat spending. One model under which both of those possibilities coexist is one in which test scores are independent of spending. If so, why would one even look at a plot of these two quantities?

The dilemma of being together (a la Chapter 3 of Numbers Rule Your World): sorry to say, but spending per pupil is likely to have a highly skewed distribution across school districts. Also, average test scores are likely to vary widely across school districts. Thus, using an average for the entire country muddies the water.

Needless to say, test scores are a poor measure of the quality of education, especially in light of the frequent discovery of large-scale coordinated cheating by principals and teachers driven by perverse incentives of the high-stakes testing movement.

In the same article, Gates asserts that the quality of teaching is the most decisive factor in explaining student achievement. Which study proves this, we are not told. How one can measure such an intangible quantity as "excellent teaching", we are not told. How student achievement is defined, well, you guessed it, we are not told.

It's great that the Gates Foundation supports investment in education. Apparently they need some statistical expertise so that they don't waste more money on unproductive projects based on innumerate analyses.