On Friday, I'm attending and speaking at the Leaders in Software and Art Conference, organized by Isabel Draves. LISA is an amazing gathering of artists interested in technology and software. For example, there is a panel on 3D printing and hardware hacking, and one on "creative coding, art and advertising". Check out videos from past years, and click here to register. My talk is at around 3:30 in a tightly packed day of activities.

Andrew summarized the above chart thus: "Sadly, America is home to far more climate skeptics than the global average."

This conclusion may be correct but the chart is less convincing than it appears.

Let's pull out the Junk Charts Trifecta Checkup. Recall that there are three sides to the triangle. The question is well-posed, and the bar chart is an adequate choice for this data. We thank the designer for not printing the entire data set in the tight space, and for starting the vertical axis at zero.

There are a few improvements one can still make to the bar chart. Start with turning it around so that the reader doesn't have to tilt his/her head. Also, extending the axis to 100% helps the interpretation a little bit.

If you have keen eyes, you notice that Greece showed up at the top of the revamped chart. The bar for Korea is also a tad too short in the original chart; it should be at 85%.

To what extent is the set of countries "global"? Take a look:

It missed all of Scandinavia, most of Indochina, India, much of Africa, and all of Central America.

In the Trifecta checkup, we note that the data may not be complete for the posed question. Given this flaw, the map is perhaps a better choice to show us where the holes are.

Abhinav asks me to check out his blog post on a chart on global warming (I prefer the term climate change) featured on Wonkblog. The chart is sourced to a report by the World Meteorological Organization (link to PDF).

Hello, start the axis at zero whenever you are plotting columns. That's as fundamental as only plotting proportions on a pie chart.

There is a reason why the designer didn't like to start the axis at zero. It is this (Abhinav helpfully made all these charts):

The trouble is that for this data set (on global average temperature), the area below 13 is completely useless. It's like plotting body temperature on a scale of 0 - 100 Celsius when all feasible values fall into a tight range, maybe 35-38 Celsius. I recount a similar situation that led to a college president saying something stupid in Chapter 1 of my new book, Numbersense. (Information on the book is here.)

So we understand the desire to get rid of the irrelevant white space. This is accomplished by using a line chart. (I'd prefer to omit the data values, and rely on the axis.)

Abhinav then created various versions of this by compressing and expanding the vertical scales. I don't think there is anything wrong with the above scale. As I mentioned, the scale should focus on the range of values that are feasible.
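The two rules in play here (columns start at zero; lines can focus on the feasible range) can be sketched in a few lines. This is only an illustration: the function name `axis_limits`, the 10% padding and the temperature values are all made up, and the column case assumes positive values.

```python
def axis_limits(values, columns=False, pad_fraction=0.1):
    """Suggest (low, high) limits for the value axis.

    columns=True  -> start at zero, because column heights encode the values
                     (assumes all values are positive).
    columns=False -> hug the observed range, padded a little on each side.
    """
    low, high = min(values), max(values)
    if columns:
        return 0.0, high * (1 + pad_fraction)
    pad = (high - low) * pad_fraction
    return low - pad, high + pad


# Made-up global average temperatures in Celsius, for illustration only.
temps = [13.9, 14.0, 14.2, 14.4]
line_limits = axis_limits(temps)                  # tight range around the data
column_limits = axis_limits(temps, columns=True)  # forced to include zero
print(line_limits, column_limits)
```

With the column setting, nearly all of the axis is spent on the empty stretch below 13, which is the wasted space the line chart gets rid of.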

Here's a chart from one of the Italian dailies I picked up in Rome last August. It apparently plots the number of hectares of farmland that were burnt during various fires over time.

While the chart is clean and pleasing to the eye, it has a malformed time axis. In the side-by-side comparison shown below, you can see how the evenly-spaced time axis completely distorts the cadence of the data.

In fact, the data should be put into a bar chart, rather than a line chart. Lines are used primarily to denote trends, and sometimes to compare profiles. Neither of these cases applies here.

The bar chart also requires proper spacing to show the years in which no hectares were burnt by fires.
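Getting the spacing right is mostly a matter of putting the missing years back before plotting. Here is a minimal sketch, assuming that a year absent from the data means no hectares were burnt (rather than a missing measurement). The function name and the figures are invented for illustration.

```python
def fill_missing_years(burnt_by_year):
    """Return (year, hectares) pairs for every year in the observed span,
    using 0 for years that have no entry."""
    first, last = min(burnt_by_year), max(burnt_by_year)
    return [(year, burnt_by_year.get(year, 0)) for year in range(first, last + 1)]


# Made-up figures, purely for illustration.
sample = {2003: 1200, 2007: 4500, 2008: 300, 2012: 2600}
series = fill_missing_years(sample)

# Crude text bar chart: one row per year, so the gaps stay visible.
for year, hectares in series:
    print(year, "#" * (hectares // 300))
```

Every year now gets its own slot, so the quiet stretches between fires show up as empty rows instead of being squeezed out of the time axis.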

The first thing we know about kitchen cabinets is that they are not large enough. If you live in a small city apartment, you're always looking for ways to maximize your space. If your McMansion has a huge kitchen, you'll run out of space all the same, after splurging on the breadmaker, and the ice-cream maker, and the panini grill, and containers for garlic, onions, different shapes of pastas, and the peelers for apples, garlic, carrots, the egg-separator, the foam-maker, and so on.

Another thing we know is that no matter how many and how large the cabinets are, there is not enough premium space, by which we mean front-facing space within arm's reach. What has this to do with graphs and charts? We'll find out soon enough.

***

In this weekend's edition of New York Times (link 1, link 2), several climate scientists wrote about droughts in America: "widespread annual droughts, once a rare calamity, have become more frequent and are set to become the 'new normal'". What caught my eye was the following graphic showing precipitation levels, enticingly titled "21 Centuries of Rainfall in New Mexico". (See the full graphic here).

The blue lines going up represent years in which rainfall was higher than normal; the orange lines going down show years of below-normal rainfall. "Normal" is defined as the average rainfall between 1931 and 1990. Particularly useful were the annotations telling us, for certain centuries, the number of years below normal.

I immediately needed to see the following chart:

This just takes the annotations and plots them directly. (I made up the data where they were not noted.)

What we are seeing here, at the scale of centuries, is that in the most recent period (only up to 1992), New Mexico is getting wetter.

Yes, this chart doesn't seem to support the scientists' assertion. In fact, I'm not sure why the NYT decided to insert this "news analysis" next to the opinion column. It's not that the analyst doesn't see the contradiction - he stated "the bigger picture, from El Malpais, suggests that the West has endured far drier periods. Uncomfortably drier."

I have major issues with this juxtaposition. If the NYT does not think the opinion column is correct science, it should decide not to print it. If the NYT thinks there are people who might object to the science, it should counter it by citing the work of other scientists (none is cited in the sidebar). While El Malpais may provide the longest measure of rainfall conditions, this is no basis to claim it measures "the bigger picture", and specifically it is shocking, perhaps reflecting the cultural bias of the newspaper, to see that this data from New Mexico tell us something about "the West" and somehow not about "the East". Why would it generalize to the West but not the East?

***

Back to the chart, and specifically kitchen cabinets.

We have two charts from one data set. The chart of blue and orange spikes contains every data point. The stacked column chart shows only aggregated data, specifically how many above-average and below-average years in each century. The first chart makes readers work very hard to get any information out of it. The designer recognizes this and adds useful notes, generally about the proportion of below-average years. Assuming that those proportions are the key to deciphering the chart, why not plot them directly?
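The aggregation behind the stacked column chart is simple enough to sketch. The snippet below assumes the annual data arrive as a mapping from year to rainfall anomaly (negative meaning below the baseline average). That layout, the function name and the toy numbers are my assumptions, not the NYT's data.

```python
from collections import Counter


def below_normal_by_century(anomalies):
    """Map each century's starting year to (below-normal years, total years)."""
    below = Counter()
    total = Counter()
    for year, anomaly in anomalies.items():
        century = (year // 100) * 100  # e.g. 1588 -> 1500
        total[century] += 1
        if anomaly < 0:
            below[century] += 1
    return {c: (below[c], total[c]) for c in sorted(total)}


# Toy anomalies, invented for illustration only.
toy = {1580: -1.2, 1585: -0.8, 1590: -2.0, 1599: 0.3,
       1950: -0.4, 1988: 2.9, 1991: 3.1}
summary = below_normal_by_century(toy)
print(summary)
```

Each pair is one stacked column: the below-normal count on the bottom, and the remainder of the century on top.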

The objection is that much information is lost by not including the rest of the data. One cannot deny it. For example, just looking at the stacked column chart, you cannot know that the late 1980s and early 1990s were extremely wet years in New Mexico, over three standard deviations above the norm, nor can you know that there was a mega-drought in the late 16th century lasting decades.

However, chart designers should realize that there is a shortage of front-facing, accessible space in the kitchen cabinets. Putting more into a chart means some of the information will be pushed to the back, or out of arm's reach. If you need a ladder to get to that cabinet, what would you put in it? Would you rather leave it empty? I know I would.

The following chart by the Financial Times reminds me of the famous Napoleon Russian Campaign map:

I also love it when geographical data, in this case average house price data by region, are plotted without a map. If plotted on a map, the relative prices are typically differentiated by color. On this chart, they are encoded in the heights of the columns. Our brains are just not wired to translate color differences into numeric differences so every time we can avoid color scales, we should.

Like Minard's chart, multiple dimensions are comfortably accommodated. The location along the river bank. The north/south orientation of the location. The "width" of a neighborhood.

A minor quibble is with the choice of data series. I wonder if price per square foot would be a better metric. One can also try a relative scale (indexed to the average).
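Both alternatives are one-line transformations of the data. A rough sketch follows, with invented neighbourhoods and figures. The index uses a simple unweighted average across neighbourhoods, which is only one of several reasonable choices.

```python
def price_per_square_foot(prices, areas):
    """Divide each neighbourhood's average price by its average floor area."""
    return {name: prices[name] / areas[name] for name in prices}


def index_to_average(values):
    """Express each value relative to the unweighted mean (mean = 100)."""
    mean = sum(values.values()) / len(values)
    return {name: round(100 * v / mean, 1) for name, v in values.items()}


# Hypothetical neighbourhoods and numbers, for illustration only.
prices = {"A": 600_000, "B": 900_000, "C": 450_000}
areas = {"A": 1_000, "B": 1_000, "C": 500}

per_sq_ft = price_per_square_foot(prices, areas)
indexed = index_to_average(per_sq_ft)
print(per_sq_ft, indexed)
```

In this toy example, C has the lowest average price but ties with B once floor area is taken into account, which is the kind of reordering the per-square-foot metric can reveal.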

Reader Dave S. was disturbed by the graphics in the inaugural World Happiness Report, published by Jeffrey Sachs's Earth Institute (link). It's a 200-page document with lots of graphs, many of which require rework.

Here's a pie chart showing (purportedly) what "happy" people in Bhutan are happy about:

I'm really curious how these domains add up to 100% exactly. Since the data came from some kind of survey, you typically would allow each respondent to pick more than one domain in which he or she is happy. If that is the case, then it would not make sense to add up responses, nor would the total (100%) signify anything.

If, on the other hand, respondents are forced to pick only one domain, it is very suspicious that all 9 domains would essentially receive the same number of votes. Nor would it make sense to ask survey-takers to select only one domain if all 9 domains contribute to someone's happiness.

Pie charts are perhaps the most abused chart type. There are just endless examples of poorly executed pie charts (just browse my last few posts). The prevalence of abuse may be reason enough to ban them.

Compare the captions. What's the difference between "In which domains do happy people enjoy sufficiency?" and "Indicators in which happy people enjoy sufficiency"? The categories are related but not identical (Education vs. Schooling, Health vs. Self reported health status, etc.). However, in Figure 5, the distribution is uniform as in Figure 4. Is the data contradictory? Or are the captions misleading?

This column chart would be better presented as a horizontal bar chart so that readers don't have to break their necks trying to read the category names.

The designer should also perform the routine task of getting rid of the 120% tick mark on the proportion axis that comes from Excel.

Reader Jim S. was rightfully mystified by the following map that appeared on the Ars Technica blog (link), and purported to demonstrate that high temperatures of March 2012 across most of the U.S. were of historical significance.

I must say the production values of this map, produced by the people at NOAA, are superb. I love, love, love the caption that the Ars Technica editors added to the map. I wish they had blown it up to 20-point font, and made it shiny :) Besides that, the colors are well-chosen, and it doesn't feel cluttered despite having 48 numbers printed on it.

Like Jim, I'm hypnotized by the drumbeat of 118, 118, 118, ... all over the red area. What could the numbers mean? They could be temperatures in Fahrenheit (although 118 degrees in March surely would have been newsworthy). The legend does lend support to this interpretation (see right), what with the extra-large font announcing "Temperature". Jim commented: "But it seems odd that such a large area would have precisely the same high."

***

Not so soon, Jim. The NOAA also made the chart shown on the right (link). So indeed, the entire country could be given one value of 118.

If not Fahrenheit, what could the numbers mean? They could be some kind of index in which case the average value would seem to be 50 (the white patch). That would be one strange index.

Too bad this map is produced by specialists for specialists, leaving us commoners guessing. The only clue we got is in the title, "Statewide Ranks".

But this isn't very helpful either. The 118s are still ringing in my ear. If the numbers are ranks, then 118 would likely be the maximum rank, given that there are so many 118s. But I can't figure out which metric has 118 levels.

I finally found my way to this page, which explains what NOAA calls "climatological ranking". The page also has a chart (below), which can serve as a sort of legend for the maps, but is almost as difficult to read.

Apparently there are 118 years' worth of recorded temperatures, going back to 1895. And within each state, the annual temperatures for the past 118 years were ranked from lowest to highest, meaning that 118 is the hottest on record.

Given that there is lopsided attention to hotter temperatures (global warming), it would be much better to reverse the ranking so that 1 is the hottest year!

The chart also explains that the years are grouped into three equal buckets to indicate "below normal", "near normal" and "above normal".

Too bad this chart gives us three or five levels of ranking while in the map they use seven colors (levels).

They really ought to include on the map (a) the definition of the ranking and (b) the range of ranks corresponding to each color.
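To make the ranking described above concrete, here is a minimal sketch of it, along with the reversed ranking and the three equal buckets. The source does not say how NOAA handles ties or bucket boundaries, so those choices, the function names and the temperatures below are my assumptions.

```python
def climatological_ranks(temps_by_year):
    """Rank years from coolest (1) to warmest (n); ties go to the earlier year."""
    ordered = sorted(temps_by_year, key=lambda y: (temps_by_year[y], y))
    return {year: rank for rank, year in enumerate(ordered, start=1)}


def reversed_ranks(ranks):
    """Flip the ranking so that 1 is the warmest year."""
    n = len(ranks)
    return {year: n + 1 - rank for year, rank in ranks.items()}


def bucket(rank, n):
    """Assign a coolest-first rank to one of three roughly equal buckets."""
    if rank <= n / 3:
        return "below normal"
    if rank <= 2 * n / 3:
        return "near normal"
    return "above normal"


# Made-up temperatures for six years, for illustration only.
temps = {2007: 50.1, 2008: 48.3, 2009: 49.0, 2010: 51.2, 2011: 47.5, 2012: 55.9}
ranks = climatological_ranks(temps)
flipped = reversed_ranks(ranks)
labels = {year: bucket(rank, len(ranks)) for year, rank in ranks.items()}
print(ranks, flipped, labels)
```

With 118 years of data, these cut-offs put ranks 1 to 39 in the bottom bucket, 40 to 78 in the middle and 79 to 118 at the top. That is exactly the sort of rank-to-color key the map is missing.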

***

While researching this post, I found this wonderful page of NOAA maps (link). This is a beautiful illustration of the process of statistical aggregation. Notice the trade-off between simplicity and loss of information. The art in statistics is to figure out the right balance between the two.

***

I always like to explore doing away with the unofficial rule that says spatial data must be plotted on maps. Conceptually I'd like to see the following heatmap, where a concentration of red cells at the top of the chart would indicate extraordinarily hot temperatures across the states.

I couldn't make this chart because the NOAA website has this insane interface where I can only grab the rank for one state and one year at a time. But you get the gist of the concept.

***

Did I tell you I love, love, love the caption? Go right ahead, and make a slogan for your chart today!

[PS: Reader Mark Bulling (see his comment below) contributes a realization of my heatmap suggestion above. One of the benefits of this chart is its economy, as a small version of it shows: