Stan at Mashable praised "5 Amazing Infographics for the Health Conscious". They belong to the class of "pretty things" that are touted all over the Web but from a statistical graphics perspective, they are dull.

Reader Mike L. poked me about the snake oil chart (right) while I was writing up this post. The snake oil chart is by David McCandless whose Twitter chart I liked quite a bit.

This one, not very much.

If the location and cluster membership of the substances depicted have some meaning, I might even feel ok about the effervescence. But I don't think so.

I continue to love his pithy text labels though; the "worth it line", truly.

The data (if verified) is pretty useful though since there are so many health supplements out there, and as a consumer, it's impossible to know which ones are sham. (Ben Goldacre's site may help.)

***

Now, let's run through the lowlights of the rest:

I'm still trying to figure out what plus-minus means in the Dirty Water graphic.

The fact that the four buildings are not considered one complete unit also trips me up. The Truckee Meadows is depicted as 7 buildings, not divisible by 4. In addition, if 2 short buildings + 1 tall + 1 medium = 200,000 people, how many people live in 2 tall + 1 medium + 4 short buildings?

The obesity charts are pinatas.

The cost of health care chart is boring, just a prettied up data table. Why are life expectancy statistics expressed in 2 decimal places, and not in years and months?

Why 78.11 years and not 78 years (or 78 years, 1 month)?

The scatter chart relating survival rates of people with various ailments and the survival rates of viruses/bacteria left outside our bodies is alright, but do we care about this correlation?

***

I hate to be so negative but I can't believe these are examples of good infographics.

Reader Jeff G. sent us to this post from Floating Sheep, which walks through an analysis showing which states have the highest beer consumption in the United States. Jeff is not amused by several of their maps.

The first one utilizes overlapping bubbles, which is generally a bad idea but especially bad when the data is as dense as depicted here:

This is a great example to illustrate why the default use of maps for geographical data is sometimes misplaced. The greatest feature of this map (and many others) is the scarcity of data in the middle and the density around major city centers. This just tells us about the overall population density!

When we plot data on maps, we usually want to highlight something other than population density.

The second map, called the "beer belly of America", has circulated a bit on the Web. This is a case of throwing out too much data. It appears that the original data set contains the number of times bars and grocery stores were searched by location of Google Map users (two numbers per location). The plotted data consists only of whether bars or grocery stores were searched more, thus one bit (binary datum) per location.

Because of excessive data reduction, it appears that most of the country is a vast expanse of yellow. I'm sure if one goes back to the frequency of bar or grocery store searches, one will find that yellow comes in many shades.
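To make the point about data reduction concrete, here is a minimal sketch. The search counts are invented purely for illustration (they are not from the Floating Sheep data), and the function names are my own. It shows how two very different locations collapse into the same "yellow" once only the winner is kept, while a simple share keeps the shades apart.

```python
# Hypothetical search counts per location: (bar searches, grocery searches).
# These numbers are invented for illustration; they are not from the original data.
locations = {
    "A": (510, 490),   # bars barely ahead
    "B": (900, 100),   # bars far ahead
    "C": (100, 900),   # groceries far ahead
}

def binary_code(bars, groceries):
    """One bit per location: which category was searched more."""
    return "bars" if bars > groceries else "groceries"

def bar_share(bars, groceries):
    """Continuous alternative: fraction of the two search types that were for bars."""
    return bars / (bars + groceries)

for name, (bars, groceries) in locations.items():
    print(name, binary_code(bars, groceries), round(bar_share(bars, groceries), 2))
```

Locations A and B get the same binary label even though their shares are 0.51 and 0.9.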

While the maps are quite ugly, I like the way the website walks through their analysis process. I would say that their maps are intended less for final presentation than to aid exploration during the analysis. Indeed, at one point, they computed the number of bars per 10,000 residents (starting with North Dakota, 6.54), which is really the best way to summarize this information. It would be interesting to see this data plotted at the state level.

One technical note: the "beer belly" map contains a hidden assumption that the distribution of Google Map searches is the same as the distribution of population. If not, what we are looking at is the combined effect of the popularity of Google Map searches and beer consumption.

The graphs in this BBC article comparing several recent earthquakes hit us like aftershocks.

This chart tries to tell us that the quake in China was by far the largest. (The Richter scale is logarithmic: each whole-number step corresponds to a tenfold increase in amplitude.)

The spirals feel like the Austin Powers time machine, disorienting, and also distracting because the bubble chart uses the entire area of the circles to represent magnitude. Try to guess what the relative amplitudes are before I disclose them below. (The red spiral for Italy was arbitrarily chosen as the index, with relative amplitude 1.) Bubbles are just horrible constructs, and for such a simple chart, they are worse than printing the data.

Amazingly, this is a double-axes bubble chart! The spirals hide the fact that the three gray circles are of different sizes, presumably color-coded to fit the "Strength" of the quakes. The other axis is "Relative Amplitude" represented by the red circles. Even though the two metrics are on hugely different scales, both the gray circles and the red spirals were anchored off the Italy red spiral (area = 1).

The following junkart version, which places the three quakes relative to the underlying relationship between strength and amplitude, is more informative with less fuss.

In the next chart, the Italians are shown to have no math skills (when in fact they have a strong tradition in math). How is it that 295 and 2000 have equal-sized bars? That's because the selected scale does not fit the data.

It's a mystery why Deaths and Injuries make friends while they ostracize the Homeless. The three series (deaths, injuries and homeless) can be displayed separately.

A simple data table, with appropriate highlighting, gets this information across without the confusion.

This next chart is decent. It is more effective if they make the Italy and Haiti blocks 20 across (same as the China blocks), stacking them one over the other. By doing so, the chart reduces to one dimension and we do not need to judge areas.

I think there is a calculation error with the Italy numbers. If 1 in every 190 affected died, then the number of affected is 190 x deaths, which from the above bar chart, equals 56,000. If only 56,000 were affected, how could 1.5 million be left homeless? (Wikipedia said 65,000 were made homeless.)
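The consistency check can be written out in a few lines. This is only a sketch: I am assuming the 295 on the bar chart is the Italy death count, and the function name is my own.

```python
def implied_affected(deaths, one_in):
    """If 1 in every `one_in` affected people died, back out the number affected."""
    return deaths * one_in

italy_deaths = 295            # assumed: the Italy figure read off the bar chart
claimed_homeless = 1_500_000  # the homeless figure questioned in the text

affected = implied_affected(italy_deaths, 190)
print(affected)                     # 56050, i.e. roughly 56,000
print(claimed_homeless > affected)  # True: more homeless than people affected
```

The homeless count exceeding the implied number of affected people is what signals the error.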

Overlapping non-concentric bubbles are also in need of rescue. Bubbles encode data in areas, and areas are a square function of radii (the radius being the distance from the center to the circumference). When circles are not concentric, the centers do not coincide. This makes judging radii harder, which makes judging areas harder.
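A small sketch of the area-radius relationship, assuming the data value is meant to be proportional to the circle's area (the function name and the reference bubble are my own choices):

```python
import math

def radius_for_value(value, ref_value=1.0, ref_radius=1.0):
    """Radius that makes a circle's area proportional to `value`.

    Area grows with the square of the radius, so the radius grows
    only with the square root of the value.
    """
    return ref_radius * math.sqrt(value / ref_value)

for value in (1, 5, 40):
    print(value, round(radius_for_value(value), 2))
```

A value 40 times larger gets a radius only about 6.3 times larger, which is part of why readers misjudge bubbles.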

Look at Haiti vs. Italy. According to the printed data, the light gray area is about 60, which would be 40% of the dark gray circle. Who would have guessed? (I checked the areas, and indeed the Haiti area was 40% larger than the Italy area.)

By the way, in the first chart, the relative amplitudes were 40, 1 and 5. Who would have guessed?
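For the curious, those relative amplitudes follow from the tenfold-per-unit rule of the magnitude scale. The sketch below assumes magnitudes of 7.9 (China), 6.3 (Italy) and 7.0 (Haiti). These are the commonly reported figures, not numbers taken from the BBC chart, so treat them as my assumption.

```python
def relative_amplitude(magnitude, reference_magnitude):
    """Amplitude relative to a reference quake: tenfold per unit of magnitude."""
    return 10 ** (magnitude - reference_magnitude)

# Assumed magnitudes (not printed in the text above); Italy is the index.
magnitudes = {"China": 7.9, "Italy": 6.3, "Haiti": 7.0}

for place, m in magnitudes.items():
    print(place, round(relative_amplitude(m, magnitudes["Italy"])))
```

Under these assumptions the rounded results are 40, 1 and 5.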

Andrew Gelman recently talked about good graphics being hard. Graphs are easy to make but hard to perfect. These examples show the need for care.

Reference: "Why did so many people die in Haiti's quake?", BBC News, 14 February, 2010.

Here are some things I have been reading while I'm traveling (the posting schedule will be erratic):

Does the vaccine matter? Shannon Brownlee and Jeanne Lenzer investigate for The Atlantic. About 100 million Americans get the flu shot each year; what benefit does it confer? This is an excellent article.

Some provocative quotes:

Flu comes and goes with the seasons, and often it does not kill people directly, but rather contributes to death by making the body more susceptible to secondary infections like pneumonia or bronchitis. For this reason, researchers studying the impact of flu vaccination typically look at deaths from all causes during flu season, and compare the vaccinated and unvaccinated populations.

The estimate of 50 percent mortality reduction is based on “cohort studies,” which compare death rates in large groups, or cohorts, of people who choose to be vaccinated, against death rates in groups who don’t. But people who choose to be vaccinated may differ in many important respects from people who go unvaccinated—and those differences can influence the chance of death during flu season. [Ed: people who can afford the flu shot vs. those who can't; people who are more health-conscious vs. those who aren't, etc.]

“For a vaccine to reduce mortality by 50 percent and up to 90 percent in some studies means it has to prevent deaths not just from influenza, but also from falls, fires, heart disease, strokes, and car accidents. That’s not a vaccine, that’s a miracle.”

In the flu-vaccine world, Jefferson’s call for placebo-controlled studies is considered so radical that even some of his fellow skeptics oppose it. ... “It is considered unethical to do trials in populations that are recommended to have vaccine,” a stance that is shared by everybody from the CDC’s Nancy Cox to Anthony Fauci at the NIH. They feel strongly that vaccine has been shown to be effective and that a sham vaccine would put test subjects at unnecessary risk of getting a serious case of the flu.

Clean Water Act Violations, New York Times. Can we trust tap water? As usual, a set of small bars would work better than concentric circles.

How does your state compare to California? (via Pew and Mother Jones) This is a nice illustration that often it is better to plot data derived from the raw data, as opposed to the raw data itself. Since the designer decided to hide the information, let's figure out what the cut-off points for the color categories were. If the size of each category is not the same, the designer needs to explain the scale. Also, the two shades of light blue are hard to tell apart. But all in all, a good effort here.

What they did was to print the ranks for every country, except the top four in Europe for which the ranks are placed next to the country name (in small font), and the actual amounts are placed in the middle of the bubbles. The ranks, of course, are pretty useless, and they obliterate the scale of the differences between countries.

Besides, the bigger the polluter, the smaller the rank but the larger the bubble. This built-in disconnect can also be disorienting.

Every bubble chart typically contains lots of data labels, and the reason is that the bubble form lacks self-sufficiency. Without the data labels, the reader has trouble comparing the areas.

I finally checked the Junk Charts mailbox again, and I found an uprising against bubble charts and pie charts. It appears that despite their shortcomings amply demonstrated here and elsewhere, editors everywhere continue to believe that the public has a lovefest with these creatures.

I will start off the parade with this one from the Wall Street Journal, purportedly showing that the Bank of England has continued to inject cash into the economy, and at ever increasing rates. The headline said Bank of England to expand bond-buy plan.

This chart has a variety of problems, in addition to the use of overlapping bubbles. As has been documented, it is almost impossible to gauge the relative sizes of circular areas, especially when they are overlapping.

If we remove all but one of the data labels, the chart is non-functional. This is what we mean by not self-sufficient: the interpretation of this chart requires, indeed demands, that all the underlying data be printed on the same chart. The only way readers can understand what is going on is by reading the data itself!

The horizontal axis (indicating time) is also nonsensical. The separation from month to month is variable. Besides, and this is the key flaw of the chart, the projected number is a three-month cumulative total being treated like a monthly figure.

Since the Bank is projected to inject 50 billion extra pounds in the next three months, that would work out to be roughly 16 billion per month. That would turn the story upside down: one would conclude that the Bank is gradually slowing the rate of injection. The following bar chart points this out with little fuss:

When bars are used, there is no need to print every single data point. The relative lengths of the bars can be estimated easily. The months are equally spaced.
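The conversion from a three-month total to a monthly rate is simple enough to sketch, assuming the projected figure is 50 billion pounds spread evenly over three months (the even spread and the function name are my assumptions):

```python
def monthly_rate(total, months):
    """Average per-month amount when a cumulative total covers `months` months."""
    return total / months

projected_total = 50  # billions of pounds over the next three months (assumed figure)
print(round(monthly_rate(projected_total, 3), 1))  # 16.7
```

Plotting this monthly rate next to the earlier monthly figures, rather than the three-month total, is what keeps the comparison fair.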

One final point: the exchange rate cited is not very helpful. What would have been more useful for readers would be the scale of the cash injection with respect to each nation's GDP.

So said a reader, Stephen B., of the following graphic (note: pdf) in the London Times concerning Andy Murray's recent tennis triumphs.

How can we disagree? Shocking? Yes. Failure? Definitely. Failing to communicate? No doubt.

Let's first start with the five tennis balls at the bottom. It fails the self-sufficiency test. It makes no difference whether the balls (bubbles) are the same size, or different sizes. Readers will look at the data and ignore the bubbles.

Amazingly, the caption said that "Murray has one of the best returns of serve in the game." And yet, the graphic showed the five players who were better than Murray, and nobody worse! For those unfamiliar with tennis statistics, it does not provide any helpful statistics like averages, medians, etc. to help us understand the data.

So we're told: the 75% of first-serve points won in the fourth round was 25.6% of the sum of the percentages of first-serve points won from first to fourth rounds (75%+70%+71%+76%). What does this mean? Why should we care?

The challenge with these two statistics is that they are correlated and have to be interpreted together. If a first-serve is won, then there would be no second serve, etc. Here's one attempt at it, using statistics from the Soderling-Federer match. It's clear that Federer was better on both serves.

IA said "The general idea is that the history of subway ridership tells a story about the history of a neighborhood that is much richer than the overall trend."

Okay but what about these sparklines would clarify that history? From what I can tell, this is a case of making the chart and then making sense of it.

The chart designer did make a memorable comment in his blog entry: "Hammer in hand, I of course saw this spreadsheet as a bucket of nails." The hammer is a piece of software he created; the nails, the data of trips taken.

Nathan at FlowingData gave a reluctant passing grade to this Wall Street Journal bubbles chart illustrating the recent U.S. bank "stress" test.

One should fight grade inflation with an iron fist. (Hat tip to Dean Malkiel at Princeton.) A simple profile chart would work nicely since the focus is primarily on ranks. The bubbles, as usual, add nothing to the chart, especially where one can create any kind of dramatic effect by scaling them differently.

Nathan also pointed to the maps of the seven sins, which garnered some national attention. This set of maps is a great illustration of the weakness of maps for studying the spatial distribution of anything that is highly correlated with population distribution. Do cows have envy too? See related discussion at the Gelman blog.