When I look at this chart (from Business Insider), I try to understand the decisions made by its designer - which things are important to her/him, and which things are less important.

The chart shows average salaries in the top 2 percent of income earners. The data are split by gender and by state.

First, I notice that the designer chooses to use the map form. This decision suggests that the spatial pattern of top incomes is of top interest to the designer because she/he is willing to accept the map's constraints - namely, the designer loses control of the x and y dimensions, as well as the area and shape of the data containers. For the U.S. state map, there is no elegant solution to the large number of small states problem in the Northeast.

Second, I notice the color choice. The designer provides actual values on the visualization but also groups all state-average incomes into five categories. It's not clear how she/he determines the boundaries of these income brackets. There are many more dark blue states than there are light blue states in the map for men. Because women's incomes are everywhere lower than men's, the map at the bottom fits all states into two large buckets, plus Connecticut. Women's incomes are lower than men's but there is no need to break the data down by gender to convey this message.

Third, the use of two maps indicates that the designer does not care much about gender comparisons within each state. These comparisons are difficult to accomplish on the chart - one must involuntarily bob one's head up and down to make the comparisons. The head bobbing isn't even enough: then you must pull out your calculator and compute the ratio of women's to men's average income. If the designer wants to highlight state-level comparisons, she/he could have plotted the gender ratio on a single map, like this:

***

So far, I infer that the key questions are (a) the gender gap in aggregate, (b) the variability of incomes within each gender, or the spatial clustering, and (c) the gender gap within each state.

(a) is better conveyed in more aggregate form. Goal (b) is defeated by the lack of clear clustering. (c) is not helped by the top-bottom split.

In making the above chart, I discover a pattern - that women fare better in the smaller states like Montana, Iowa, North & South Dakota. Meanwhile, the disparity in New York is of the same degree as Oklahoma and Wyoming.
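The ratio behind such a single-map view is a one-line computation per state. Here is a minimal sketch; the state names and salary figures are made up for illustration and are not the Business Insider data.

```python
# Hypothetical average top-2% salaries by state (illustrative values only,
# not taken from the original chart).
salaries = {
    "State A": {"men": 400_000, "women": 300_000},
    "State B": {"men": 500_000, "women": 300_000},
}


def gender_ratio(row):
    """Women's average income as a fraction of men's average income."""
    return row["women"] / row["men"]


# One number per state replaces the two numbers spread over two maps.
ratios = {state: gender_ratio(row) for state, row in salaries.items()}

# Rank states from the smallest gap (highest ratio) to the largest gap.
ranked = sorted(ratios, key=ratios.get, reverse=True)

for state in ranked:
    print(f"{state}: women earn {ratios[state]:.0%} of men's average")
```

Coloring a single map by `ratios` puts the within-state comparison directly in front of the reader, with no head bobbing and no calculator.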

This chart tells readers a bit more about the underlying data, without having to print the entire dataset on the page.

The only reason the IEEE Spectrum magazine editors chose this chart form is that they think they need to deliver precise salary figures to readers.

This chart is just so... sad.

The color scheme is all wrong, the black suggesting a funeral. The printed data occupying at least half of the width of each bar frustrate any attempt to compare lengths. We enter an unusual place where higher numbers appear under smaller numbers. The job titles are regrettably dressed in the same cloth as the median salary bars. It's not clear how the regions are ordered but in any case, it's hard to figure out regional disparities. In reality, no one is getting precisely the listed salaries - rounding up those numbers makes them easier to grasp.

This is a chart that repels rather than attracts readers.

***

A test of sufficiency immediately nails the problem. When the data set is removed, there is almost nothing to see:

This chart (shown right), published by Zillow in a report on housing in 2012, looks quite standard, apparently avoiding the worst of Excel defaults.

In real estate, it’s all about location. In dataviz, it’s all about details.

What are some details that caught my eye on this chart?

Readers have to get over the hurdle that “negative equity” is the same as “underwater homes.” This is not readily understood unless one reads the surrounding text. For example, the first row for the U.S. average proclaims that 31% of U.S. homes are “underwater” and among these underwater homes, 10% of the mortgages are delinquent. The former is concerned with the valuation of the property while the latter deals with payments or lack thereof.

According to the legend, the blue segments stand for the proportions of underwater homes in different metro areas but it’s not quite true – the blue part represents underwater but not delinquent mortgages while the red and blue combined represents all underwater mortgages. This is a common problem in stacked bar charts.

The metro areas are in alphabetical order by city, which means an opportunity is missed to help readers discern patterns. Patterns related to the alphabetical order of city names are not of interest to most (except certain econometrics journal editors). Try arranging by region, or by decreasing level of negative equity, or some other meaningful variable.

The designer tried to do something clever with the horizontal axis labels and I don't think it succeeds. To see what is going on, read the note below the chart. The trick is to let readers look at the number of underwater and delinquent mortgages in two ways, as a proportion of underwater mortgages (through the white data labels) and as a proportion of all mortgages (through the axis labels). That's a mess, sorry to say.
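The two readings differ only in the base used for the same count. A minimal sketch of the arithmetic, using the U.S. figures quoted earlier (31% underwater, 10% of those delinquent) and assuming, for simplicity, that homes and mortgages share the same base:

```python
def mortgage_slices(underwater_share, delinquent_given_underwater):
    """Split all mortgages into three non-overlapping groups.

    underwater_share: fraction of all mortgages that are underwater.
    delinquent_given_underwater: fraction of underwater mortgages
        that are delinquent.
    """
    underwater_delinquent = underwater_share * delinquent_given_underwater
    return {
        "not underwater": 1.0 - underwater_share,
        "underwater, not delinquent": underwater_share - underwater_delinquent,
        "underwater and delinquent": underwater_delinquent,
    }


# U.S. row: 31% underwater, and 10% of those are delinquent.
us = mortgage_slices(0.31, 0.10)

for label, share in us.items():
    print(f"{label}: {share:.1%}")
```

The same group is 10% when the base is underwater mortgages and about 3.1% when the base is all mortgages, which is why mixing the two on one axis is confusing.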

Finally, I like the horizontal axis to extend to 100% because underlying the proportions shown in blue and on the horizontal axis is the population of all mortgages.

***

Perhaps a shock to many readers: the task of showing underwater delinquent mortgages simultaneously as a proportion of underwater mortgages and as a proportion of all mortgages is solved using ... pie charts.

I just created a couple of examples here:

The deep orange sector can be compared to the entire circle, or to the larger orange sector. Readers usually don't have a problem with pies with only three slices.

Twitter follower @ashwink_s didn't see eye-to-eye with the following charts that appeared in an Indian publication.

There is the infamous racetrack chart:

In the racetrack chart, the designer has embedded data in the angles at the center of the concentric circles but the visual cues point to the arc lengths. If the same proportion of people voted Yes as voted No, the two arcs should look like this:

The length of the red arc is much larger than the length of the gray arc, even though they encode the same value. There is no reason to double over, just pull them back straight pronto!
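The distortion follows from the arc-length formula: an arc's length is its radius times its angle, so equal angles on different tracks give unequal lengths. A small sketch, with made-up radii for the inner and outer tracks:

```python
import math


def arc_length(radius, share):
    """Length of an arc covering `share` of a full circle at `radius`."""
    return 2 * math.pi * radius * share


# Two tracks encoding the SAME value (50% each). The radii are hypothetical.
inner = arc_length(radius=1.0, share=0.5)
outer = arc_length(radius=2.0, share=0.5)

print(f"inner arc: {inner:.2f}, outer arc: {outer:.2f}")
print(f"outer is {outer / inner:.1f}x as long for the same value")
```

With straight bars, length is proportional to the value alone, so this problem disappears.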

***

Next, we have a busy chart:

We are starstruck.

All those stars are redundant as they just illustrate the rating numbers printed to their left. The story here is that the government received a 7.5 rating, with no one rating it below 4, and the majority giving a 7 or 8. (It's curious that no one at all rated the government below 4. In most rating polls that I've come across, primarily in the U.S., there are extreme views.)

After the makeover:

P.S. Thanks to Matt F. who noticed the switched bars in the original post, and messaged me. The chart has now been fixed.

Twitter friend Jimmy A. asked if I can help Elon Musk make this chart "more readable".

Let's start with a couple of things he did right. Placing SpaceX, his firm's data, at the bottom of the chart is perfect, as the bottom part of a stacked column chart is the only part that is immediately readable. Combining all of Europe into one category and Other U.S. into one group reduces the number of necessary colors.

Why is this chart unreadable? Here is a line-up of the culprits:

Red Russia is stealing the thunder

SpaceX is sharing the blues with Japan/China/Other U.S.

The legend is sorted in the opposite order to the column segments (courtesy of Excel defaults)

Axis labels given to two decimal places for market share split only a small number of ways

It's unclear what "market share" means: is it share of the number of launches or the revenues generated by those launches? Is the "base" of the market share changing over time?

The last two columns are speculative and these are the two years in which SpaceX has a noticeable advantage (unless they are talking about contracts already concluded)

According to the underlying data, there are some very big changes afoot. The following small-multiples chart shows what is going on:

Twitter friend Janie H. asked how I would visualize a hypothetical third column of this chart that contains the change from 2016 to 2017:

This table records the results from a survey question by eMarketer, asking respondents ("marketers") to identify their top 5 technology priorities in the next 12 months.

I suggested the following:

A hype-chasing phenomenon is clearly at play. Internet of Things and wearable technology are so last year. This year, it's all about A.I. Interestingly, something like "Big data" has been able to sustain the hype for another year.

A design decision I made is to encode the magnitude of the change in the bar lengths while encoding the direction of the change in the colors. One can of course follow the more canonical design of placing the negative bars on the left side of the data labels. My decision is a subtle way of imposing the hierarchy - first I care about magnitude, then I care about direction.
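The encoding described above can be sketched as a small data transformation. The technology names and percentages below are invented for illustration; they are not eMarketer's figures.

```python
# Hypothetical survey results: (percent in 2016, percent in 2017).
survey = {
    "Technology A": (30, 45),
    "Technology B": (40, 20),
    "Technology C": (20, 20),
}


def encode_change(pct_2016, pct_2017):
    """Map a year-over-year change to a bar length and a color key."""
    change = pct_2017 - pct_2016
    if change > 0:
        direction = "increase"
    elif change < 0:
        direction = "decrease"
    else:
        direction = "no change"
    # Length carries magnitude only; color carries direction only.
    return {"bar_length": abs(change), "color_key": direction}


encoded = {name: encode_change(*pcts) for name, pcts in survey.items()}

# Sorting by magnitude makes the biggest movers, up or down, lead the chart.
order = sorted(encoded, key=lambda name: encoded[name]["bar_length"], reverse=True)

for name in order:
    print(name, encoded[name])
```

All bars grow in the same direction, so the eye compares magnitudes first and reads the direction from the color second.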

Here is a third way:

This design imposes a different hierarchy. Your eyes are drawn to the top/bottom of the chart.

Any of these designs beats the data table by a mile. It's just too much work for the reader to figure out the value of the changes from the table.

In addition to Xan's "packed bars" (which I discussed here), there are some related efforts to improve upon the treemap. To recap, the treemap is a design to show parts against the whole, and it works by packing rectangles into the bounding box. Frequently, this leads to odd-shaped rectangles, e.g. really thin and really tall ones, and it asks readers to estimate relative areas of differently-scaled boxes. We often make mistakes in this task.

The packed bar chart approaches this challenge by allowing only the width of the box to vary with the data. The height of every box is identical, so readers only have to compare lengths.

Via Twitter, Adil pointed me to this article by him and his collaborators that describes a few alternatives.

One of the options is the "wrapped bar chart" introduced by Stephen Few. Like Xan, he also restricts the variation to the lengths of bars while keeping the heights fixed. But he goes further, and abandons packing completely. Instead of packing, Few wraps the bars. Start with a large bar chart with many categories filling up a tall plotting area. He then divides the bars into different blocks and places them side by side. Here is an example showing 50 states, ranked by total electoral votes:

You can see the white space because there is no packing. This version makes it easier to see the relative importance of the different blocks of states but it is tough to tell how much the first block of 13 states accounts for. The wrapped bar chart is organized like a small-multiples chart, except that the scale in each panel is allowed to vary.
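The wrapping step itself is simple to sketch: rank the values, cut the ranked list into blocks, and give each block its own scale. The code below is my own illustration of that idea, not Few's implementation, and the 50 values are placeholders rather than electoral votes.

```python
def wrap_bars(values, block_size):
    """Rank values from largest to smallest and cut them into blocks."""
    ranked = sorted(values, reverse=True)
    return [ranked[i:i + block_size] for i in range(0, len(ranked), block_size)]


# 50 placeholder values standing in for the 50 states.
values = list(range(1, 51))

blocks = wrap_bars(values, block_size=13)

# Each panel is scaled to its own largest bar, as in the wrapped design.
block_scales = [max(block) for block in blocks]

# What the chart does not show directly: each block's share of the total.
total = sum(values)
block_shares = [sum(block) / total for block in blocks]

for i, (block, share) in enumerate(zip(blocks, block_shares), start=1):
    print(f"block {i}: {len(block)} bars, {share:.1%} of total")
```

Printing `block_shares` next to each panel would answer the question the wrapped layout leaves open, namely how much the first block accounts for.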

Another option is the "piled bars." This option, presented by Yalçın, Elmqvist, and Bederson, brings packing back. But unlike the packed bars or the treemap, the outside envelope no longer represents the total amount. In the "piled bars" design, the top X categories act as the canvas, and the smaller categories are packed inside these bars rather than around them. Take a look at this example, which plots GDP growth of different countries:

The inset on the left column is instructive. The green (smallest) and red (medium) bars are packed inside the blue (largest) bars. In this example, it doesn't make sense to add up GDP growth rates, so it doesn't matter that the outer envelope does not equal the total. It would not work as well with the electoral vote data in the previous example.

I wonder whether a piled dot plot works better than a piled bar chart. This piled bar chart shares a problem with the stacked area chart, which is that other than the first piece, all the other pieces represent the differences between the respective data and the next lower category, rather than the value of the data point. Readers are led to compare the green, red and blue pieces but the corresponding values are not truly comparable, or of primary interest.

This problem goes away if the bars are represented by dots.
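To see why the visible pieces mislead, here is a small sketch that computes what the eye actually sees when bars share a baseline and are piled from largest to smallest. The three values are hypothetical.

```python
def visible_pieces(values):
    """Lengths of the visible segments when bars are piled on one baseline.

    The smallest bar is fully visible; every larger bar only shows the
    part that sticks out beyond the next smaller bar.
    """
    ordered = sorted(values)
    pieces = [ordered[0]]
    for lower, higher in zip(ordered, ordered[1:]):
        pieces.append(higher - lower)
    return pieces


# Hypothetical values for the green (smallest), red and blue (largest) bars.
values = [2, 5, 9]
pieces = visible_pieces(values)

print("data values:   ", sorted(values))
print("visible pieces:", pieces)
```

The visible pieces (2, 3, 4) look nearly equal even though the data (2, 5, 9) are not. A dot placed at the end of each bar encodes the value itself, by position, and leaves no piece to misread.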

***

What strikes me as the key paragraph in the article by Yalçın et al. is the following:

To understand graphical perception performance, we studied three basic tasks:

1) How accurately can we estimate the difference between two data points?
2) How accurately can we estimate the rank of a data point among all the rest?
3) How accurately can we guess the distribution characteristic of the whole dataset?

As chart designers, we have to prioritize these tasks. There is unlikely to be a single chart form that will prevail on all three tasks. So if the designer starts with the question that he or she wants to address, that leads to the key task that the visualization should enable, which leads to the chart form that facilitates that task the best.

Xan Gregg - my partner in the #onelesspie campaign to replace terrible Wikipedia pie charts one at a time - has come up with a new chart form that he calls "packed bars". It's a combination of bar charts and the treemap.

Here is an example of a packed barchart, in which the top 10 companies on the S&P500 index are displayed:

What he's doing is adding context to help interpret the data. So frequently these days, we encounter data analyses of the "Top X" or "Bottom Y" type. Such analyses are extremely limited in utility as they ignore the bulk of the data. The extreme values have little to nothing to say about the rest of the data. This problem is particularly acute in skewed data.

Compare the two versions:

The left chart is a Top 10 analysis. The reader knows nothing about the market cap of the other 490 companies. The right chart provides the context. We can see that the Top 10 companies have a combined market cap that is roughly a quarter of the total market cap in the S&P 500. We also learn about the size of the next 10 versus the Top 10, etc.
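The context that the right chart adds boils down to the share of the total held by each block of ranked items. A minimal sketch, using a short made-up list of skewed values in place of the S&P 500 market caps:

```python
def ranked_block_shares(values, block_size):
    """Share of the total held by the top block, the next block, and so on."""
    ranked = sorted(values, reverse=True)
    total = sum(ranked)
    return [
        sum(ranked[i:i + block_size]) / total
        for i in range(0, len(ranked), block_size)
    ]


# Hypothetical skewed data (not real market caps).
market_caps = [40, 30, 20, 5, 3, 2]

shares = ranked_block_shares(market_caps, block_size=2)

for rank, share in enumerate(shares, start=1):
    print(f"block {rank}: {share:.0%} of the total")
```

A plain Top X chart shows only the bars in the first block; it hides both that block's share of the total and how it compares with the next block.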

As with any chart form, a nice dataset can really surface its power. I really like what the packed barchart reveals about the election data by county:

(Thanks to Xan for providing me this image.)

Notice the preponderance of red on the right side and the gradual shift from blue/purple to pink/red moving left to right. This is very effective at showing one of the most important patterns in American politics - the small counties are mostly deep red while the Democratic base is to be found primarily in large metropolitan areas. I have previously featured a number of interesting election graphics here. Washington Post's nation of peaks is another way to surface this pattern.

Xan would love to get feedback about this chart type. He has put up a blog post here with more details. I also love this animation he created to show how the packing occurs.

This is one of those innocent-looking charts that could have been a poster child for artistic embellishment. The straightforward time-series chart is deemed too boring. The designer shows admirable restraint in inserting “information-free” content, such as the dense gridlines (graph paper) and the 3D effect (ticker).

They seem harmless, but not really.

Here I turn off the color.

After the 3D effect is applied, the reader no longer knows whether to look at the top or bottom edge of the ticker.