Flowing Data has been doing some fine work on the baby names data. The Name Voyager is a successful project by Martin Wattenberg that has received praise from many corners. It's one of those projects that have taken on a commercial life, as you can see from the link.

The typical insight one takes from this chart is that the name "Michael" (as a boy's name) reached a peak in the 1970s and has not been as popular lately. The data is organized as a series of trend lines, one for each combination of name and gender.

Speaking of area charts, I have never understood their appeal. If I were to click on Michael in the above chart, the design responds by restricting itself to all names starting with "Michael", meaning it includes Michael given to a girl, and Michaela, for example. See below.

What is curious is that the peak has a red lining. At first thought, one expects to find hiding behind the blue Michael a girl's name that is almost as popular. But this is a stacked area chart so in fact, the girl's name (Michael given to a girl, if you mouse over it) is much less popular than the boy Michael (20,000 to 500 roughly).

***

Nathan decides to dig a layer deeper. Is there more information beyond the popularity of baby names over time?

In this post, Nathan zeroes in on the subset of names that are "unisex," that is to say, have been used to name both boys and girls. He selects the top 35 names based on a mean-square-error criterion and exposes the gender bias for each name. The metric being plotted is no longer pure popularity but gender popularity. The larger the red area, the greater the proportion of girls being given that name.

You can readily see some interesting trends. Kim (#34) has become almost exclusively female since the 1960s. On the other hand, Robbie (#18) used to be predominantly female but is now mostly a boy's name.

One useful tip when performing this analysis is to pay attention to the popularity of each name (the original metric) even though you've decided to switch to the new metric of gender bias. This is because the relative proportions are unstable and difficult to interpret for less popular names. For example, the Name Voyager shows no values for Gale (#29) after the 1970s, which probably explains the massive gyrations in the 1990s and beyond.
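Here is a minimal sketch of that tip in Python. The counts and the threshold of 100 babies are made up for illustration; neither comes from Nathan's post or from the baby names data. The idea is simply to refuse to report a gender share when the name is too rare for the share to mean anything.

```python
def female_share(girls, boys, min_total=100):
    """Share of babies with this name who are girls.

    Returns None when fewer than min_total babies received the name,
    because the proportion is too unstable to interpret.
    min_total=100 is an arbitrary, hypothetical cutoff.
    """
    total = girls + boys
    if total < min_total:
        return None
    return girls / total


# Hypothetical (girls, boys) counts per decade for one name.
counts_by_decade = {
    1960: (4000, 1000),  # popular name: the share is meaningful
    1990: (3, 1),        # four babies in total
    2000: (1, 3),        # four babies in total
}

# Unguarded share: swings wildly once the name becomes rare.
raw = {d: g / (g + b) for d, (g, b) in counts_by_decade.items()}

# Guarded share: rare decades are left blank instead of plotted.
guarded = {d: female_share(g, b) for d, (g, b) in counts_by_decade.items()}
```

In this toy example the unguarded share drops from 75% to 25% on the strength of four babies per decade, which is the kind of gyration to be wary of. The guarded version leaves those decades empty.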

This example transformed a grouped bar chart into a line chart, something that I have long advocated. I'm still waiting for the day when market research companies start to switch from bars to lines.

***

Jorge Camoes, also a long-time reader, produced a redesign of a chart on military spending first printed in Time magazine. (link)

Dual-axis plots have been pilloried here often, especially when the two axes have different and incompatible units, as they do here. As usual, transforming to a scatter plot is a good first step, which is what Jorge has done. He then connected the dots to indicate the time evolution of the relationship. This is a smart move because the pattern is so stark.

The chart now illustrates an "inflexion point" in 2000. Prior to 2000, troop size was decreasing while the budget was stable. After 2000, budget increased sharply while troop size remained relatively stable.
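To make the construction concrete, here is a rough sketch of how two time series become a connected scatter plot. The numbers are invented for illustration and are not the figures from Time or from Jorge's chart; the function names are my own.

```python
def connected_scatter_path(x_by_year, y_by_year):
    """Pair two series by year and return (year, x, y) points in time order.

    Plotting x against y and joining the points in this order gives the
    connected scatter plot; the year becomes a label, not an axis.
    """
    years = sorted(set(x_by_year) & set(y_by_year))
    return [(year, x_by_year[year], y_by_year[year]) for year in years]


def step_changes(path):
    """Change in x and y between consecutive points on the path."""
    return [
        (later[0], later[1] - earlier[1], later[2] - earlier[2])
        for earlier, later in zip(path, path[1:])
    ]


# Hypothetical data: troops in thousands, budget in billions of dollars.
troops = {1990: 2000, 2000: 1400, 2010: 1400}
budget = {1990: 400, 2000: 400, 2010: 700}

path = connected_scatter_path(troops, budget)
changes = step_changes(path)
```

With these made-up numbers, the first step moves only along the troop axis and the second only along the budget axis, so the path bends sharply at 2000. That bend is what the two separate lines of a dual-axis plot fail to show.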

Now peer back at the original chart. You can discern the sharp decrease in troop size over time, and the sharp increase in budget over time, but separately. The chart teases a cross-over point around 1995 which turned out to be misleading. This is a great illustration of why dual-axis plots are dangerous.

The Giants QB Eli Manning is in the news for the wrong reasons this season. His hometown paper, the New York Times, looked the other way, focusing on one metric that he still excels at, which is longevity. Think of Cal Ripken in baseball. The graphic (link) is fun to look at while managing to put Eli's streak in context. It is a great illustration of how to handle foreground/background issues. (I had to snip the bottom of the chart.)

After playing around with this graphic, please go read Kevin Quealy's behind-the-scenes description of the various looks that were discarded (link). He showed 19 sketches of the data. Sketching cannot be stressed enough. If you don't have discarded sketches, you don't have a great chart.

Pay attention to tradeoffs that are being made along the way. For example, one of the sketches showed the proportion of possible games started:

I like this chart quite a bit. The final selection arranges the data by team rather than by player so necessarily, the information about proportion of possible games started fell by the wayside.

(Disclosure: I'm on Team Philip. Good to see that he is right there with Eli even on this metric.)

Notice the inspired touch of the black circles to trace the outline of Blackberry's market share. They are a guide to experiencing the chart.

I wish they had put the Palm section above Blackberry. In an area chart, the only clean section is the bottom section in which the market share is not cumulated. Given the focus on Blackberry, it's a pity readers have to perform subtractions to tease out the shares.
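The subtraction the reader is forced to do looks like this. It is a sketch with hypothetical numbers, not the shares shown in the chart.

```python
def layer_values(upper_edges):
    """Recover each layer's own value from a stacked area chart.

    upper_edges lists the cumulative top edge of every layer, from the
    bottom layer to the top layer, at a single point in time.
    """
    values = []
    previous = 0
    for edge in upper_edges:
        values.append(edge - previous)
        previous = edge
    return values


# Hypothetical top edges (percent) for three layers, bottom to top.
edges = [15, 55, 100]
shares = layer_values(edges)  # [15, 40, 45]
```

Only the bottom layer can be read straight off the axis (15). The middle layer's share of 40 has to be worked out as 55 minus 15, which is why the layer you care about belongs at the bottom.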

I also wonder if the black circles should contain Blackberry's market share rather than the year labels.

You should not vary the width of the bars (unless you are introducing another dimension), and

You should space the bars unevenly if your measurement times are unevenly spaced.

I mean, how is it in the year 2013, the BBC shows viewers this? (tip from UK reader Clarke C.)

The chart is absurd on its face. Men did not double in height between 1871 and 1971. This chart was broadcast on the show "Breakfast", which apparently is the BBC's version of Good Morning America.

I'd just use a line chart. The figurine construct is cute but too much trouble because you have to grow the width while growing the height. If you encode data in the area, then the height is no longer proportional to the real height.
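The arithmetic behind this dilemma is short. In the sketch below, the figurine is assumed to scale in proportion (width grows with height), and the ratios are illustrative rather than taken from the BBC chart.

```python
import math


def area_ratio_if_height_encodes(value_ratio):
    """Height is scaled by the data; a proportional figurine's width
    scales too, so its area grows by the square of the data ratio."""
    return value_ratio ** 2


def height_ratio_if_area_encodes(value_ratio):
    """Area is scaled by the data; the height then grows only by the
    square root of the data ratio."""
    return math.sqrt(value_ratio)


# A hypothetical doubling of the value being charted.
print(area_ratio_if_height_encodes(2))  # 4
print(height_ratio_if_area_encodes(4))  # 2.0
```

Either way something is off: scale the height and the figurine looks four times as large for a doubling, or scale the area and the height no longer tracks the real height.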

Years ago, we featured something similar: how penguins evolved into humans (link). Curiously, also a gift from British media.

One piece of advice I give to those wanting to get into data visualization is to trash the defaults (see the last part of this interview with me). Jon Schwabish, an economist with the government, gives a detailed example of how this is done in a guest blog on the Why Axis.

Here are the highlights of his piece.

***

He starts with a basic chart, published by the Bureau of Labor Statistics. You can see the hallmarks of the Excel chart using the Excel defaults. The blue, red, green color scheme is most telling.

Just by making small changes, like using tints as opposed to different colors, using columns instead of bars, reordering the industry categories, and placing the legend text next to the columns, Schwabish made the chart more visually appealing and more effective.

The final version uses lines instead of columns, which will outrage some readers. It is usually true that a grouped bar chart should be replaced by overlaid line charts, and this should not be limited to so-called discrete data.

Schwabish included several bells and whistles. The three data points are not evenly spaced in time. The year-on-year difference is separately plotted as a bar chart on the same canvas. I'd consider using a line chart here as well... and lose the vertical axis since all the data are printed on the chart (or else, lose the data labels).

This version is considerably cleaner than the original.

***

I noticed that the first person to comment on the Why Axis post said that internal BLS readers resist more innovative charts, claiming "they don't understand it". This is always a consideration when departing from standard chart types.

Another reader likes the "alphabetical order" (so to speak) of the industries. He raises another key consideration: who is your audience? If the chart is only intended for specialist readers who expect to find certain things in certain places, then the designer's freedom is curtailed. If the chart is used as a data store, then the designer might as well recuse him/herself.