Book Review: Information is Beautiful by David McCandless

Information is Beautiful is a thought-provoking labour of love by one of the first true data journalists, David McCandless. It is a simply structured collection of graphical interpretations of a variety of interesting statistics, factoids and opinions. It is compelling in its ability to provoke exclamations of surprise at the relationships between facts (e.g. the financial crisis costing us almost four times more than the expected total cost of the west’s adventurism in Iraq and Afghanistan) as well as generating respect for the creativity and design that has gone into presenting the information.

That being said, the book also illustrates the very tricky position of a data journalist (or whatever we eventually call those individuals who render ‘information’ visually). Visualization of data in the form of graphics and the expression of facts, opinions, processes, etc. in the form of information visualizations is, essentially, a new language. As consumers of this new language, we have to place a large amount of trust in the translator.

As is appropriate for a book aimed more at the coffee table than the academic library, Information is Beautiful comes with no explanation of the graphical idioms used. Nor does it come with any summary of conclusions or discussion of the implications drawn from the data or the visualization. It is more like the glossy book of fabulous beaches from around the world which contains little or no indication of where these places are or what is just out of sight, or lurking behind the scene. This is, in my opinion, a grave oversight.

For example, the first piece presents a number of types of spending (e.g. defence budgets, foreign aid payments, etc.) and compares them – via the cleverly engineered positioning of a page turn – with the cost of ‘the financial crisis’ (which I assume is the most recent such event). Here the intended implication is clear – the financial crisis cost a lot more than all that other stuff that you think is costing us a lot. But what is the scope of all the other stuff? The defence budgets for the US, China, the UK, Saudi Arabia and India are presented – are these the largest budgets? If so, what percentage of all defence budgets do they represent? Are there other events which would provide more context (e.g. other financial meltdowns, the ‘cost’ of a world war, the cost of other wars)? It is clear that the author has selected the variables being compared with the recession, but without knowledge of the selection criteria it is hard to know either the intended spin, or how meaningful the conclusion that the reader is led to might be.

The second graphic – an exploration of the values and opinions of left- and right-leaning political positions – suffers in a similar way from a lack of context. The graphic, for example, appears to make the statement ‘right leaning governments don’t interfere with [the] social lives [of their citizens]’. What are we to make of this? Is this the opinion of the author? Or is it somehow a statement derived from one of the sources quoted at the bottom of the image (Wikipedia, britannica.com, etc.)? As there are a number of sources, is this a consensus or is it an amalgam of the information in these sources?

McCandless presents some statistics on the structure of rape reports, prosecutions and convictions in England and Wales. I’ve approximated the visualization below:

Overlapping circles evoke the common concept of the Venn diagram. However, read that way, the graphic would appear to indicate that Prosecutions include rapes that were never reported, and that Convictions are obtained exclusively for non-reported rapes. I can’t make sense of that.
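To spell out the relationship I would have expected the graphic to encode, here is a minimal sketch using Python sets. The case IDs and counts are invented purely for illustration (they are not McCandless's figures), and `is_nested_funnel` is a hypothetical helper name. The assumption is simply that every conviction follows a prosecution and every prosecution follows a report.

```python
# Hypothetical case IDs, invented for illustration only.
reported = set(range(1, 11))   # cases 1..10 were reported
prosecuted = {1, 2, 3, 4}      # a subset of the reported cases
convicted = {1, 2}             # a subset of the prosecuted cases


def is_nested_funnel(reported, prosecuted, convicted):
    """True when convictions sit inside prosecutions, which sit inside reports."""
    return convicted <= prosecuted <= reported


# The expected semantics: each stage is contained in the previous one.
print(is_nested_funnel(reported, prosecuted, convicted))

# A literal Venn reading of partially overlapping circles: some prosecutions
# fall outside the reported set, and convictions fall outside it entirely.
venn_prosecuted = {9, 10, 11}
venn_convicted = {11, 12}
print(is_nested_funnel(reported, venn_prosecuted, venn_convicted))
```

Under this reading, nested (concentric) shapes would match the data, whereas partially overlapping circles assert memberships that cannot exist.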

McCandless often uses Google’s Insights tool to make observations about the relative importance of various concepts. This type of analysis requires some amount of preparation for the reader. These graphs have no vertical axis label or units. Google labels the y-axis as ‘interest over time’ and provides a reasonable amount of explanation about the graphs and how to interpret them, including:

A downward trending line means that a search term’s popularity is decreasing. It doesn’t mean that the absolute, or total, number of searches for that term is decreasing.

In other words, a peak doesn’t necessarily mean that there is more absolute interest – it could just as easily indicate that there is a reduced amount of interest in some other topic which therefore takes away mass from the denominator. Quite possibly the conclusions that can be drawn from these comparative time series are reasonable where there are differences between the trends (this may not be the case for compared series that show correlated peaks).
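A small worked example makes the denominator effect concrete. This is only a sketch under stated assumptions: the numbers are invented, `relative_interest` is a hypothetical helper, and I am assuming the plotted value is the term's share of all searches in each period, rescaled so that the largest share reads as 100. It is not a description of Google's actual computation.

```python
def relative_interest(term_counts, total_counts):
    """Share of all searches per period, rescaled so the peak share is 100.

    Assumption: this mimics a 'relative popularity' index; the real tool's
    method may differ.
    """
    shares = [term / total for term, total in zip(term_counts, total_counts)]
    peak = max(shares)
    return [round(100 * share / peak) for share in shares]


# Invented data: searches for the term are flat across four periods...
term_counts = [1000, 1000, 1000, 1000]
# ...but total search volume halves in the third period.
total_counts = [100_000, 100_000, 50_000, 100_000]

print(relative_interest(term_counts, total_counts))
# prints [50, 50, 100, 50]
```

The index doubles in the third period even though the absolute number of searches for the term never changes, which is exactly the kind of peak a reader could mistake for a surge of interest.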

In terms of colour palette, McCandless is clearly from the Wired circa 2000 school – a school which embraces challenging colour schemes (such as white characters on a yellow background – see ‘Lack of Conviction’) and where the semantics of colours often trumps the contrast (for example, using a range of similar colours in a legend to a graphic leaving the reader to guess which blob goes with which meaning – see ‘Most Successful Rock Bands’).

Overall, this is a fascinating book. It has received popular coverage in the media and I’m sure it continues to sell well all over the globe. As a community of readers, we have to become data literate so that we can consume this type of content with the same critical eye as if we were reading the statements in dry text. It is not the information that is beautiful per se, it is the presentation of the information. I feel that this book would be far more useful if it contained a preface of some sort helping the reader to understand this new language, to educate them in the skills required to draw insights, but also to question the translation.