This table indeed presents the insight clearly. Those fund sectors in which Vanguard does not compete have much higher costs than the fund sectors in which Vanguard is a player. The author calls this the "Vanguard effect."

This is a case where finding a visual design to beat this table is hard.

For a certain type of audience, namely financial, the spreadsheet is like rice or pasta; you simply can't live without it. The Bloomberg spreadsheet does one better: the bands of blue contrast with the white cells, which neatly divides those funds into two groups.

If you use spreadsheets a lot, you should definitely look into in-cell charts. Perhaps Tufte's sparkline is the most famous but use your imagination. I also wish vendors would support in-cell charts more eagerly.
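For readers who want to try the idea without vendor support, here is a minimal sketch of a text-only sparkline that can sit in a single cell. The function name `sparkline` and the sample rows are my own illustration, not taken from the Bloomberg table, and each row is scaled to its own minimum and maximum.

```python
BARS = "▁▂▃▄▅▆▇█"  # eight Unicode block characters, shortest to tallest

def sparkline(values):
    """Return one block character per value, scaled to the row's own min and max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series: draw every point at the lowest level.
        return BARS[0] * len(values)
    top = len(BARS) - 1
    return "".join(BARS[round((v - lo) / span * top)] for v in values)

# Hypothetical rows (made-up numbers), one sparkline per spreadsheet row.
rows = {
    "Fund A": [1.2, 1.1, 0.9, 0.8, 0.6],
    "Fund B": [0.3, 0.3, 0.4, 0.3, 0.2],
}
for name, series in rows.items():
    print(f"{name:8s} {sparkline(series)}")
```

Scaling each row separately shows the shape of the trend but hides differences in level between rows; a shared scale would do the reverse.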

Here is a vision of what in-cell technology can do with the above spreadsheet. (The chart is generated in R.)

Consider the following two charts that illustrate the same data. (I deliberately took out the header text to make a point. The original chart came from the Wall Street Journal.)

To me, the line chart gets to the point more quickly: that Burberry stores are more numerous in those places shown on the left and fewer in those places shown on the right, relative to comparable luxury brands (Prada and Louis Vuitton).

The reason why the tiled bar chart is tougher to decipher is its inefficient use of space. Within each country group, the three places are plotted on two levels, one on the upper level, and two on the lower level. Then the two groups of countries are placed top and bottom. Readers have to first size up the individual group of three countries, then make a comparison between the two groups.

***

From a Trifecta checkup perspective, the bigger issue here is the data. The full story seems to be that those two country groups have different currency experiences... Japan and the continental European countries have weakening currencies, which tends to make their goods cheaper for Chinese consumers. This crucial part of the story is not anywhere on the chart.

In addition, the number of stores is not a telling statistic, because stores may have different areas, and certainly the revenues generated by these stores differ, potentially by country. A measure such as change in same-store sales in each country is more informative.

It is also not true that the distribution of stores is purely a matter of business strategy, as Burberry is a British brand, Prada is Italian and Louis Vuitton is French. They each have more stores in their home countries, which seems very logical.

The credit for today's headline goes to Andrew Gelman, who said something like that when I presented the following chart at his Statistical Graphics class yesterday:

With this chart (which appeared in a large ad in the NY Times), Fidelity Investments wants to tell potential customers to move money into the consumer staples category because of "greater return" and "lower risk". You just might wonder what a "consumer staple" is. Toothbrushes, you see.

There are too many issues with the chart to fit into one blog post. My biggest problem concerns the visual trickery used to illustrate "greater" and "lower". The designer wants to focus readers on the two orange brushes: return for consumer staples is higher, and risk is lower, you see.

The "greater" (i.e. right-facing) toothbrush is associated with longer brushes and higher elevation; the "lower" (left-facing) toothbrush, with shorter brushes and lower elevation.

But looking carefully at the scales reveals that the return ranges from 6% to 14% and the risk ranges from 10% to 25%. So larger numbers are depicted by shorter brushes and lower elevation, exactly the opposite of one's expectation. The orange brushes happen to represent the same value of 14.3% but the one on the right is at least four times as large as the one on the left. As the dentist says, time to rinse out!

The vertical axis represents ranking of the investment categories in terms of decreasing return and/or risk so on both toothbrushes, the axis should run from 1 to 10.

***

How would the dentist fix this?

The first step is to visit the Q corner of the Trifecta Checkup. The purpose of this chart is for investors to realize that (using the chosen metrics) consumer staples have the best combination of risk and return. In finance, risk is measured as the volatility of return. So, in effect, all the investors care about is the probability of getting a certain level of return.

The trouble with any chart that shows both risk and return is that readers have no way of going from the pair of numbers to the probability of getting a certain level of return.

The fix is to plot the probability of returns directly.

In the above sketch, I just assumed a normal probability model, which is incorrect; but it is not hard to replace this with an empirical distribution, if you obtain the raw data.

Unlike in the original chart, consumer staples does not appear to be a clear-cut winner here.
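To make the step from a (return, risk) pair to a probability concrete, here is a minimal sketch under the same normal assumption, which, as noted, is not correct for real returns. The function name `prob_return_at_least` and the second category are hypothetical; the consumer staples figures reuse the 14.3% value quoted above for both return and risk.

```python
import math

def prob_return_at_least(threshold, mean, sd):
    """P(return >= threshold) under a normal model with the given mean and sd."""
    z = (threshold - mean) / sd
    # Upper-tail probability of a standard normal: 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2))

# (mean return %, risk %) pairs. The second entry is made up for contrast.
categories = {
    "consumer staples": (14.3, 14.3),
    "hypothetical sector": (11.0, 22.0),
}

for name, (mean, sd) in categories.items():
    for threshold in (0, 10, 20):
        p = prob_return_at_least(threshold, mean, sd)
        print(f"{name}: P(return >= {threshold}%) = {p:.2f}")
```

With raw data in hand, the same question can be answered by counting the share of observed periods at or above the threshold, which avoids the normal assumption altogether.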

I like to use declarative titles for charts. This chart below, found in an investment magazine published by Charles Schwab, wants to tell us that emerging markets "perform differently."

That is a nice concise message. Now, what does the chart say?

Readers have to jump through some hoops. First, the axes are flipped from their normal posture. Time typically is shown running horizontally. And market returns which range widely from positive to negative values are frequently displayed vertically. But not here.

Second, this chart treats all three categories of equity returns (domestic, international developed markets, international emerging markets) equally when the title draws attention to emerging markets. In fact, emerging markets is placed last in the legend. Try blocking the top section, just staring at the grouped bar chart -- the emerging markets do not jump out.

Third, we are asking ourselves what the designer/analyst means by "performing differently." The most obvious difference is the blue spike corresponding to the 79% return in 2009. But in many other years, the blue bar is not obviously different.

One way to interpret "perform differently" is that the emerging market returns exhibit low correlation with the returns in either domestic or international-developed markets. (Such a finding would be helpful to investors looking for diversification.) The scatter plot can be used to examine correlations.

The pattern is surprising. The chart on the left shows that emerging market returns are highly correlated in a linear way with international developed-market returns. The chart on the right shows that domestic returns are less correlated with emerging market returns but the correlation is still pretty strong.

There were two unusual years, one (2009) in which emerging markets did quite a bit better and another (2013) in which emerging markets did quite a bit worse.
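The correlation reading can also be checked numerically. Below is a minimal sketch of the Pearson correlation coefficient; the annual returns are placeholder numbers I made up, not the Schwab data, and the variable names are my own.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical annual returns in percent (illustrative only).
emerging = [35.0, -50.0, 75.0, 20.0, -18.0, 18.0, -3.0]
developed = [12.0, -43.0, 32.0, 8.0, -12.0, 17.0, 23.0]

print(f"correlation = {pearson(emerging, developed):.2f}")
```

A value near 1 would mean the two series move together, which is the opposite of "performing differently" in the diversification sense.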

These observations imply that the data do not really support the title of the original chart.

I found this chart in a Munich publication called Süddeutsche Zeitung. This appeared during the most recent Greek/Euro crisis.

The bags of money were financial obligations that were coming due from June 2015 to December 2015. There were three creditors, indicated by red, blue and gray.

This graphic answers one question well: individual debt obligations for a given month and given creditor. However, by privileging these details, the chart fails to convey cumulative totals well - readers have to make calculations in their heads.

In the revision, I wanted to convey two key messages: the total amount of debt that was coming due in those seven months, and the relative proportion of debt owed to the three creditors. An area chart brings this out better.

Conversely, it is much harder to figure out individual debt obligations by month and creditor from this version.

This points to the importance of determining your key message(s) before choosing a form.
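The two key messages (cumulative total and creditor shares) are simple aggregations of the monthly figures. Here is a minimal sketch; the creditor labels and amounts are hypothetical stand-ins, since I am not reproducing the actual figures from the graphic.

```python
from itertools import accumulate

# Hypothetical obligations in billions of euros, June through December (7 months).
monthly = {
    "creditor_red": [1.5, 0.5, 0.2, 1.6, 0.4, 0.2, 1.2],
    "creditor_blue": [0.0, 3.5, 3.2, 0.0, 0.0, 0.0, 0.0],
    "creditor_gray": [2.0, 2.0, 1.0, 2.4, 1.4, 2.0, 1.0],
}

def cumulative_totals(by_creditor):
    """Running total of debt coming due across all creditors, month by month."""
    per_month = [sum(amounts) for amounts in zip(*by_creditor.values())]
    return list(accumulate(per_month))

def creditor_shares(by_creditor):
    """Each creditor's share of the total debt coming due over the whole period."""
    total = sum(sum(amounts) for amounts in by_creditor.values())
    return {name: sum(amounts) / total for name, amounts in by_creditor.items()}

print(cumulative_totals(monthly))
for name, share in creditor_shares(monthly).items():
    print(f"{name}: {share:.0%}")
```

The original graphic shows the inputs to these functions; the area chart shows the outputs. Which one to draw depends on which question matters more.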

A reader Alex V. nominated this chart as one of the most incomprehensible ever:

This comes from the Annual Report 2014 of Allison Transmission.

I applaud the fact that they obviously spent time making the charts. This is not something that comes straight out of Excel.

And someone really tried here--but you'd hope someone else came to the rescue and let them know this is impossible to understand.

The use of leader lines to point to the actual data doesn't work, not least because there are only two margins to fit three lists of numbers. It's like the two little kids being forced to share one seat on the left margin.

The rightmost column adds to 100%. The largest three sections appear to say Allison used cash to pay dividends, to buy back shares and to repay debt. The three uses accounted for almost 95% of the positive change in cash (really not sure what the base is of these percentages). The gap of 5% is split into two parts which are explained by labels that are quite uninformative ("Other, net", "Change in cash, Net").

Even this interpretation is flawed because the blue section is net change in cash, which presumably was positive in 2014 (and 2013). However, dividends, share repurchase and debt repayment all cause a negative shift in cash so how could they point in the same direction as a positive net change in cash?

Things fall apart if I apply this interpretation to 2012. The -28% blue section seems to indicate that Allison had a cash deficit that year. This is weird because that would imply cash increased 100% exactly in 2013 and in 2014.

Further, someone is trying to hide bad news. Compare the -28% section in blue for 2012 and the 25% section in red for 2013. The blue section is slightly smaller than it should be. Part of the trick is to draw the horizontal axis in the same blue as the blue block. The top edge of the blue block is really not part of the block!

***

Now you might argue that the distortion is so small it could be accidental. But then this happens again on a different chart on the same page:

The highest sales number appears to have been achieved in 2014 (the blue column). But no! The number in 2012 is 2,142, which is larger than the sales in 2014.

This chart, which I found flipping through Stern magazine in Germany, accomplishes one important goal. It makes me stop flipping, and look.

The chart presents a point of view that is refreshing. The Airbus A320 is a true collaborative effort. The chart presents a good amount of information efficiently. Reminds me of diagrams in instruction manuals for building airplane models.

It is in essence a map. And as with maps, it has a built-in bias. The size of a part is not proportional to its importance or value. So, one issue with this diagram is it draws attention to large parts with uncomplicated shapes.

One way to address this is to use an informative legend. Notice that the map up top takes up a lot of space while serving little purpose. Instead, one can use a bar chart with a colored bar for each country. This bar chart allows one to add an extra measure. For example, the proportion of value accounted for by each country.

European readers: I wonder if there is a standard color scheme for different countries. What do you think of their choice of color?

This wonderful data visualization made me stop in my tracks at a train station somewhere in Bavaria.

It conveys so much information in such an efficient manner.

At a glance, the diagram tells passengers the configuration of the train they will be getting on, how many carriages, what types of carriage and crucially at which location the train will come to a stop at the current station.

The most important item is the curvy red line running vertically. This tells you where you are standing in relationship to the entire platform. I was standing right near the middle. If someone is standing on the sides, there are many trains they will not be able to get on.

The entire chart is in German but I didn't need to know German. This is what great data visualization accomplishes.

***

Would you be willing to miss a train just to admire this work? I would.

PS. I was a bit overexcited when I wrote the above. I hope my German readers will tell me what the red, yellow, green colors signify. Also why do some trains appear to have two or three disconnected carriages?

Ted Ballachine wrote me about his website Pension360 pointing me to a recent attempt at visualizing pension benefits in various retirement systems in the state of Illinois. The link to the blog post is here.

One of the things they did right is to start with an extended guide to reading the chart. This type of thing should be done more often. Here is the top part of this section.

It turns out that the reading guide is vital for this visualization! The reason is that they made some decisions that shake up our expectations.

Similarly, a person's service increases as you go down the vertical axis, not up.

I have recommended that they switch those since there doesn't seem to be a strong reason to change those conventions.

***

This display facilitates comparing the structure of different retirement systems. For example, I have placed next to each other the images for the Illinois Teacher's Retirement System (blue), and the Chicago Teacher's Pension Fund (black).

It is immediately clear that the Chicago system is miserly. The light gray parts extend only to half of the width compared to the blue cells in the top chart. The fact that the annual payout grows somewhat linearly as the years of service increase makes sense.

What doesn't make sense to me, in the blue chart, is the extreme variance in the annual payout for the beneficiary with "average" tenure of about 35 years. If you look at all of the charts, there are several examples of retirement systems in which employees with similar tenure have payouts that differ by an order of magnitude. Can someone explain that?

***

One consideration for those who make heatmaps using conditional formatting in Excel.

These charts code the count of people in the shades of colors. The reference population is the entire table. This is actually not the only way to code the data. This way of coding it prevents us from understanding the "sparsely populated" regions of the heatmap.

Look at any of the pension charts. Darkness reigns at the bottom of each one, in the rows for people with 50 or 60 years of service. This is because there are few such employees (relative to the total population). An alternative is to color code each row separately. Then you have surfaced the distribution of benefits within each tenure group. (The trade-off is the revised chart no longer tells the reader how service years are distributed.)
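The two coding choices differ only in the reference population used to scale each cell. Here is a minimal sketch with a made-up table of counts (rows are tenure groups, columns are payout bands); the function names are my own.

```python
def scale_by_table(counts):
    """Shade each cell relative to the largest count in the entire table."""
    peak = max(max(row) for row in counts)
    return [[c / peak if peak else 0.0 for c in row] for row in counts]

def scale_by_row(counts):
    """Shade each cell relative to the largest count in its own row."""
    scaled = []
    for row in counts:
        peak = max(row)
        scaled.append([c / peak if peak else 0.0 for c in row])
    return scaled

# Hypothetical counts: the last row is a sparsely populated tenure group.
counts = [
    [400, 900, 300],
    [250, 600, 150],
    [2, 5, 9],
]

print(scale_by_table(counts)[-1])  # every value is tiny, so the row looks uniformly dark
print(scale_by_row(counts)[-1])    # the within-row distribution becomes visible
```

Row scaling gives up the comparison across rows, which is the trade-off mentioned above.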

Excel's conditional formatting procedure is terrible. It does not remember how you code the colors. It is almost guaranteed that the next time you go back and look at your heatmap, you can't recall whether you did this row by row, column by column, or the entire table at once. And if you coded it cell by cell, my condolences.

It's very frustrating to read the mainstream articles about the recent unemployment report. For example, the New York Times said "U.S. Jobless Claims Hit 15-year Low." (link)

At this point, everyone should be aware of how employment statistics, in particular the unemployment rate, are computed. Certainly, the editors at the Times have heard of U3 and U6 metrics, and the employment-population ratio. Any report that does not provide all of these metrics is a report that you can't trust.

***

Here is a brief version of the story in a few charts that anyone can easily generate from the FRED site. (link)

If we only report the headline unemployment rate, the picture looks rosy.

The unemployment rate has been in steady decline since peaking in 2010. The current level hasn't been seen since mid-2008, and we may soon see it return to pre-recession levels, i.e. the level of the boom years!

Isn't that a surprise? That's what the mainstream media are reporting.

We are facing a far less rosy picture if we consider a different metric of unemployment.

It turns out the headline statistic uses a very liberal view of who's employed. This second chart is a more "common-sense" count of who's unemployed. Even though the first unemployment metric says we are almost back to pre-recession performance, the second metric says we are still about 2 percentage points above what it used to be in 2008. That is a much less happy picture.

There are two major distinctions between the metrics. If you have a part-time job for even one hour during the period when the government conducts its survey, you are considered "employed" on the first chart but not on the second. Besides, if you are too discouraged to even look for a job, you are not considered unemployed in the first chart but you are unemployed in the second.

***

The most important chart, though, is the employment-population ratio. You might think that an unemployment rate of 5.5% means that 5.5% of the nation's population are unemployed. Not true. Perhaps it means 5.5% of the working-age population (excluding kids and elderly) are unemployed? Still not true.

As a result of a bipartisan effort, the base of that proportion is the number of people whom the government deems to be "wanting a job".

Before the latest recession, the proportion of people who "want a job" had been around 63% for a very long time. During the recession, this proportion plunged to below 59%. Currently, it has moved above 59% but this is about 4 percentage points below the mid-2008 level. An extra four percent of the population has decided that they "don't want a job", and they are not counted at all in the unemployment rate in the first chart above.
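The effect of the denominator is easy to see with arithmetic. The sketch below uses the standard definitions (unemployment rate = unemployed / labor force, where labor force = employed + unemployed); the counts are hypothetical round numbers, not FRED data, and `labor_metrics` is a name I made up.

```python
def labor_metrics(employed, unemployed, population):
    """Compute three headline ratios from raw counts (all in the same units)."""
    labor_force = employed + unemployed  # only people counted as working or looking
    return {
        "unemployment_rate": unemployed / labor_force,
        "participation_rate": labor_force / population,
        "employment_population_ratio": employed / population,
    }

# Hypothetical counts in millions.
before = labor_metrics(employed=140, unemployed=14, population=240)
# Same number of jobs, but 5 million people stop looking and leave the labor force.
after = labor_metrics(employed=140, unemployed=9, population=240)

for label, m in (("before", before), ("after", after)):
    print(label, {k: f"{v:.1%}" for k, v in m.items()})
```

In this made-up scenario the unemployment rate falls even though not a single job was added, while the employment-population ratio does not move. That is why the ratio is the more revealing chart.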

***

This series of charts illustrates why looking at a single metric is dangerous. By the first metric, the job market is the same as in mid-2008. When we look at the other two metrics, we immediately see that it's the same but not really the same.

I have a whole chapter in Numbersense (link) on employment statistics. In the chapter, I mentioned John Crudele's columns at the New York Post. As usual, he is one who will peel back the onion. His take on the latest statistics is here. While his views can be a bit extreme, reading his take on these statistics is more beneficial to your health than those of the usual sources.