Bad Graphs

Every year, Dutch Finance Minister Jeroen Dijsselbloem sends a report to Parliament on state participations - companies that are (partially) owned by the state. Recently, the minister answered questions from the Finance Committee of the Lower House. One of them questioned the use of a stacked bar chart to show dividends, «since this isn’t very clear». The minister acknowledges the problem and takes up the challenge:

In creating this bar chart we aimed at comprehensiveness by including all dividends received from all state participations. Because of the large differences in dividend, this results in sub-optimal readability. For the 2015 annual report, it will be considered whether the readability can be improved without making concessions to comprehensiveness.

I’m sure he’ll be interested in good ideas, so if you have any suggestions for improving the chart, tweet them to @j_dijsselbloem. And if you want to give it a try yourself: here’s the data for 2010–2014.

Update April 2016 - Jean Adams shows how the chart can be improved.[1]

Update October 2016 - Alas. Last month, a new issue of the participations report was sent to Parliament. Apparently, the minister hasn’t succeeded in improving the dividend chart; the chart has now been omitted. Instead, the report now has a line chart showing just the total amount of dividend received (further, the simple bar chart showing dividend received per participation for the most recent year has been replaced with a pie chart).

[1] Adams correctly points to a discrepancy between the csv and the original chart: the csv contains data on total dividend paid, whereas the original chart shows the amount received by the state (the two differ for companies of which the state owns less than 100%).

Anyone mildly interested in data visualisation must have come across examples of shamelessly deceptive Fox News charts. Truncated y-axes, distorted x-axes, messing with units - nothing's too bold when it comes to manipulating the audience. But does this kind of deception actually work? Anshul Vikram Pandey and his colleagues at New York University decided to find out (pdf). They showed subjects either control or deceptive versions of a number of charts.

The deceptive versions were: a bar chart with a truncated y-axis; a bubble chart with one bubble drawn too large relative to the other; a line chart with a more spread-out y-axis, resulting in a less steep rise than in the control version; and a chart with an inverted y-axis (inspired by Reuters' famous Gun Deaths in Florida chart - interesting discussion here). In all cases, the correct numbers were included in the chart.

Of course a truncated y-axis can sometimes be defensible and needn't be deceptive, as long as it is made clear what's going on. More problematic is the aspect-ratio example: the authors claim the chart on the right is deceptive and the one on the left is not, but how can you tell? You can't. There's no rule that says how many pixels per year the x-axis should get.

Be that as it may, the authors found substantial differences in how the deceptive charts were interpreted compared to the control charts. Note that in most cases they didn't measure whether the deceptive charts were interpreted incorrectly, just whether they were interpreted differently from the control charts. For example, participants were asked how much better access to drinking water was in Silvatown (the right-hand bar of the bar chart) relative to Willowtown (the left-hand bar), on a 5-point Likert scale ranging from «slightly better» to «substantially better». With the control bar chart the average score was 1.45; with the truncated y-axis it was 2.77.
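The size of the truncation effect is easy to quantify: with a zero baseline, the ratio of drawn bar heights equals the ratio of the underlying values, but once the axis is truncated, the drawn ratio inflates. A minimal sketch with hypothetical values (the study's actual numbers for the drinking-water chart aren't given here):

```python
def apparent_ratio(short_bar, tall_bar, baseline=0.0):
    """Ratio of drawn bar heights when the y-axis starts at `baseline`.

    With a zero baseline this equals the true ratio of the values;
    a truncated axis inflates it.
    """
    if not baseline < short_bar <= tall_bar:
        raise ValueError("baseline must lie below both bar values")
    return (tall_bar - baseline) / (short_bar - baseline)

# Hypothetical values: bars of 80 and 100.
print(apparent_ratio(80, 100))                # 1.25 - honest, zero-based axis
print(apparent_ratio(80, 100, baseline=75))   # 5.0  - axis truncated at 75
```

Drawn this way, a 25% difference in the data looks like a fivefold difference on the page - which is roughly the kind of distortion the Silvatown/Willowtown chart exploits.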

The authors also tried to find out whether factors such as education and familiarity with charts had an influence on how charts were interpreted. It appears that people who are familiar with charts are less easily fooled by a truncated y-axis. Perhaps because truncated y-axes are second on the list of phenomena chart geeks love to hate and criticise (after 3D exploding pie charts, of course).

I missed this one: in October, Dutch economics journal ESB published an article that critically reviews all the charts in a report of the CPB (the semi-official neoliberal economic institute that dominates Dutch policy debates). Authors Frank Kalshoven and Peter van Bergeijk find that, on average, as many as four of the eight aspects they score have been done badly.

The authors invented a scale to assess charts, using the following criteria: the title describes what the chart shows; abbreviations and terms are explained; axis units are clearly described; axes are aligned; the source is explicitly mentioned; charts tell a clear story; charts contain little «noise»; and there's an explicit relation between panels in a chart.

One of the charts discussed is the one shown above. Among other things, the source is missing. Further, the y-axes of the bottom panels aren’t aligned, wrongly suggesting that taxes (bottom right panel) are often higher than collective expenditures, whereas in fact expenditures are higher than taxes (note that the government also has other sources of income).

Kalshoven and Van Bergeijk’s analysis seems to be strangely unconnected to the broader universe of data visualisation critique (interestingly, one of their sources of inspiration has - somewhat harshly - been described as «a horrible example of economists not recognizing that outsiders can help them»). Some of the most popular topics of chart criticism are missing from Kalshoven and Van Bergeijk’s article: use of colour; if and when it’s ok to truncate y-axes; legends versus labels; and if and how to use the area size of bubbles or icons to represent quantity.

Using new data from Statistics Netherlands (CBS), cycling expertise centre Fietsberaad reports that cycling has declined in the Netherlands over the past three years, both in terms of the distance travelled and the number of trips per person per day. The chart to the left is from their website.

Fietsberaad does warn against reading too much into this: there have been changes in how the data are collected and analysed, and the weather may have caused short-term fluctuations in cycling (meteorological institute KNMI reports that there were 46 days with minimum temperatures below 0°C in 2011, 50 in 2012 and 64 in 2013). Keeping all this in mind, it's still interesting to note that the same period saw an increase in cycling in the four largest cities.

Be that as it may, the chart created by Fietsberaad does look worrisome. But what does it actually show? There are no values on the y-axis. Does the y-axis even start at zero? Apparently it doesn't, for otherwise the chart would have looked more like the one below, which looks slightly less dramatic.

You might think the graph above is about the effort required for climbing, with those little bicycles going up the slope, but it’s not (in fact, it shows for each bicycle type how much more power is required to cycle as speed increases). Apparently, somebody added the bicycles for «fun», without giving much thought to what the graph is supposed to communicate.

The graph is from the book Cycling Science (not to be confused with the intriguing Bicycling Science), a book full of charts that explain how cycling works. Unfortunately, it contains quite a bit of chart junk and some of the graphs raise more questions than they answer.

For example, the chapter on cycling safety has a map that suggests the Netherlands is the most unsafe country for cycling. The problem is that it shows cyclists' share of road deaths, which says more about how many people cycle than about how safe cycling is. Another graph says Chris Boardman managed to cycle more than 56 km in an hour when he assumed a super-aerodynamic position, but that he would only have managed 15 km sitting upright. Really?
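A rough plausibility check supports the skepticism. If aerodynamic drag dominates, the power required grows roughly with the cube of speed (the relationship the book's own power chart illustrates), so riding at equal power, the claimed speeds imply a drag-area ratio between the two positions of (56/15)³. A back-of-envelope sketch, assuming drag-dominated riding and ignoring rolling resistance (which in reality matters a lot at 15 km/h):

```python
# If power P ≈ k * CdA * v**3, then at equal power the claimed speeds
# imply a drag-area (CdA) ratio of (v_fast / v_slow)**3.
v_aero, v_upright = 56.0, 15.0   # km/h, the figures quoted in the book

implied_cda_ratio = (v_aero / v_upright) ** 3
print(f"implied CdA ratio: {implied_cda_ratio:.0f}x")  # prints "implied CdA ratio: 52x"
```

Published drag areas for a full aero tuck versus an upright position differ by perhaps a factor of two or three, not fifty - so the 15 km figure does indeed look hard to believe.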