This sort of chart is, unfortunately, quite common in business circles. Just about the only thing one can read readily from this chart is the overall growth in the plug-in vehicle market (the heights of the columns).

To fix this chart, start subtracting. First, we can condense the monthly data to quarterly:
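The condensation step is mechanical. Here is a minimal sketch in Python, using made-up monthly counts for a single make (the real dataset has one such series per make):

```python
from collections import defaultdict

# Hypothetical monthly unit sales for a single make, keyed by (year, month);
# these numbers are illustrative only, not the actual plug-in sales data
monthly_sales = {
    (2012, 1): 150, (2012, 2): 160, (2012, 3): 170,
    (2012, 4): 180, (2012, 5): 190, (2012, 6): 200,
}

def to_quarterly(monthly):
    """Condense monthly counts to quarterly totals by summing within each quarter."""
    quarterly = defaultdict(int)
    for (year, month), units in monthly.items():
        quarter = (month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, etc.
        quarterly[(year, quarter)] += units
    return dict(quarterly)

print(to_quarterly(monthly_sales))  # → {(2012, 1): 480, (2012, 2): 570}
```

The same grouping, applied across all makes, cuts the number of columns by two-thirds without losing the trend.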

This version is a bit less busy, but there are still too many colors and too many things to look at.

Next, we can condense the makes of the vehicles and focus on the manufacturers:

This version is even less busy and more readable. We can now see that Chevrolet, Nissan, Toyota, Ford and Tesla are the five biggest manufacturers in this category. All the smaller brands have been aggregated into the "Others" category. The stacked column chart still makes it hard to know what's going on with each individual brand's share, other than the one brand situated at the bottom of the stack.

This shows the growth in the overall market, as well as several interesting developments:

- The growth in the number of competitors in the market, especially since 2012

- The fragmentation of the market: before mid-2012, Chevrolet dominated the market; since then, five or six brands have been splitting it

- The first-to-market brands have not been able to sustain their advantage

A smoothed version of the line chart is even more readable:
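The smoothing itself can be as simple as a trailing moving average. A sketch, with made-up quarterly shares standing in for the real series:

```python
def moving_average(values, window=3):
    """Smooth a series with a simple trailing moving average.

    Early points average over whatever history exists, so the output
    has the same length as the input.
    """
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical quarterly market shares for one brand (not the actual data)
shares = [0.50, 0.45, 0.30, 0.25, 0.20, 0.22]
smoothed = moving_average(shares)
```

Plotting `smoothed` instead of `shares` trades quarter-to-quarter noise for a cleaner view of each brand's trajectory.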

Graphics is a discipline that often rewards subtracting. Less is more.

***

In the above discussion, I focused on the Visual aspect of the Trifecta Checkup. This dataset is really difficult to interpret, and I wouldn't want to visualize it directly.

The real question we are after is to assess which manufacturer is leading the pack in plug-in vehicles.

There are a number of obstacles in our path. Different makes were launched at different times, and it takes many months for a new make to establish itself in the market. Thus, comparing one make that just launched with another that has been in the market for twelve months is a problem.

Also, makes are of different vehicle types: compacts, SUVs, sedans, etc. More expensive vehicles will have fewer sales whether they are plug-ins or not.

Thirdly, population grows over time. The analyst would need to establish growth that is above the level of population growth.

Announcement: I'm giving a free public lecture on telling and finding stories via data visualization at NYU on 7/15/2014. More information and registration here.

***

The Economist states the obvious: the current World Cup is atypically high-scoring (or poorly defended, for anyone who's never been bothered by the goal count). They dubiously dub it the Brazil effect (link).

Perhaps in a sly vote of dissent, the graphic designer came up with this effort:

(Thanks to Arati for the tip.)

The list of problems with this chart is long, but let's start with the absence of the host country and the absence of the current tournament, both conspiring against our ability to answer the posed question: did Brazil make them do it?

***

Turns out that without 2014 on the chart, the only other year in which Brazil hosted a tournament was 1950. But 1950 is not even comparable to the modern era. In 1950, there was no knock-out stage. The group stage had four groups, divided into two groups of four, one group of three and one group of two. Then, four teams were selected to play a round-robin final stage. This format is so different from today's that I find it silly to place them on the same chart.

These data simply provide no clue as to whether there is a Brazil effect.

***

The chosen design is a homework assignment for the fastidious reader. The histogram plots the absolute number of drawn matches. The number of matches played has tripled from 16 to 48 over those years, so the absolute counts are highly misleading. It's worse than nothing, because the accompanying article wants to make the point that we are seeing fewer draws this World Cup compared to the past. The visual presents exactly the opposite message! (Hint: Trifecta Checkup)

That is, unless you realize this is a homework assignment: take the row of numbers listed below the Cup years and compute the proportion of draws yourself. BYOC (Bring Your Own Calculator). Now, pay attention, because you want to use the numbers in parentheses (the number of matches), not the first number (the number of teams).

Further, don't get too distracted by the typos: in both 1982 and 1994, there were 24 teams playing, not 16 or 32. The number of matches (52 in each case) is correctly stated.

***

Wait, the designer provides the proportions at the bottom of the chart, via this device:

I find the legend challenging as well. The presentation should be flipped: look at the proportion of ties within each round, instead of looking at the overall proportion of ties and then breaking those ties down by round.

The so-called "knockout round" has had many formats over the years. In early years, there were often two round-robin stages, followed by a smaller knockout round. Presumably the second round-robin stage has been classified as "knockout stage".

Also notice the footnote, stating that third-place games are excluded from the histogram. This is exactly how I would do it too, because the third-place match is a dead rubber, in which no rational team would want to play extra time and a penalty shootout.

The trouble is inconsistency. The number of matches shown underneath the chart includes the third-place match, so the homework assignment above actually has a further wrinkle: subtract one from the numbers in parentheses. The designer gets caught in this booby trap: the computed proportion of draws displayed at the bottom of the chart includes the third-place match, at odds with the histogram.
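The homework, with its wrinkle, can be spelled out. A sketch in Python; only the 52-match figure comes from the chart, while the draw count below is a made-up illustration:

```python
def draw_proportion(draws, matches_listed, include_third_place=False):
    """Proportion of drawn matches in one tournament.

    The match counts printed under the chart include the third-place
    game, while the histogram excludes it, so subtract one by default.
    """
    matches = matches_listed if include_third_place else matches_listed - 1
    return draws / matches

# 1982: 52 matches listed (and 24 teams, not the 16 shown in the typo).
# draws=13 is a hypothetical count, not the actual figure.
print(round(draw_proportion(draws=13, matches_listed=52), 3))  # → 0.255
```

Dividing by 51 rather than 52 is exactly the adjustment the designer forgot when computing the percentages shown at the bottom of the chart.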

***

Here is a revised version of the chart:

A few observations are in order:

- The proportion of ties has been slowly declining over the last few Cups.

- The drop in the proportion of ties in 2014 is not drastic.

- While the proportion of ties has dropped in the 2014 World Cup, the proportion of 0-0 ties has increased. (The gap between the two lines shows the ties with goals.)

- In later rounds, since the 1980s, the proportion of ties has been fairly stable, between 20 and 35 percent.

Another reason for separate treatment is that the knockout stage had not yet started in 2014 when this chart was published. Instead of removing all of 2014, as the Economist did, I can include the group stage for 2014 but exclude 2014 from the knockout-round analysis.

In the Trifecta Checkup, this is Type DV. The data do not address the question being posed, and the visual conveys the wrong impression.

***

Finally, there is one glaring gap in all of this. Some time ago (the football fans can fill in the exact timing), FIFA decided to award three points for a win instead of two. This was a deliberate effort to increase the point differential between winning and drawing, supposedly to reduce the chance of ties. Any time-series exploration of the frequency of ties would clearly have to look into this issue.

Darin Myers at PGi was kind enough to send over an analysis of a chart using the Trifecta Checkup framework. I'm reproducing the critique in full, with a comment at the end.

***

At first glance this looks like a valid question, with good data, presented poorly (Type V). Checking the fine print (glad it’s included), the data falls apart.

QUESTION

It’s a good question…What device are we using the most? With so much digital entertainment being published every day, it pays to know what your audience is using to access your content. The problem is this data doesn’t really answer that question conclusively.

DATA

This was based on survey data asking respondents, "Roughly how long did you spend yesterday…watching television (not online) / using the internet on a laptop or PC / on a smartphone / on a tablet?" Survey respondents were limited to those who owned or had access to a TV and a smartphone and/or tablet.

What about feature phones?

Did they ask everyone on the same day, on random days, or are some days overrepresented here?

This is self-reported, not tracked…who accurately remembers their screen time on each device a day later? I imagine the vast majority of answers were round numbers (30 or 45 minutes, or 2 hours). Yet the data shows accuracy to the minute, a precision the users never really provided.

In fact the Council for Research Excellence found that self-reported screen time does not correlate with actual screen time. “Some media tend to be over-reported whereas others tend to be under-reported – sometimes to an alarming extent.” -Mike Bloxham, director of insight and research for Ball State

VISUAL

The visual has the usual problems with stacked bar charts where it is easy to see the first bar and the total, but not to judge the other values. This may not be an issue based on the question, but the presentation is focusing on an individual piece of tech (smartphones), so the design should focus on smartphones. At the very least, smartphones should be the first column in the chart and it should be sorted by smartphone usage.

My implementation is simply to compare the smartphone usage to the usage of the next highest device. Overall 53% of the time people are using a smartphone compared to something else. I went back and forth on whether I should keep the Tablet category in the Key though it was not the first or second used device. In the end, I decided to keep it to parallel the source visual.

Despite the data problems, I was really interested in seeing the breakdowns in each country by device, so I built the chart below with rank added (in bold). I also built some simple interaction to sort by column when you click the header [Ed: I did not attach the interactive excel sheet that came with the submission]. As a final touch, I displayed the color corresponding to the highest usage as a box to the left of the country name. It’s easy to see that the vast majority of countries use smartphones the most.

***

Hope you enjoyed Darin's analysis and revamp of the chart. The diagnosis is spot on. I like the second revision of the chart, especially for analysts who really want to know the exact numbers. The first redo has the benefit of greater simplicity, though it can be a tough sell to an audience, especially the use of color to indicate the second most popular device while disassociating the color from the length of the bar.

The biggest problem in the original treatment is the misalignment of the data with the question being asked. In addition to the points made by Darin, the glaring issue relates to the respondent population. The analysis only includes people who have at least a smartphone or a tablet. But many people in less developed countries have neither device. In those countries, TV screen time has likely been strongly underestimated: people who watch TV but do not own a smartphone or tablet are simply dropped from consideration.

For this same reason, the other footnoted comment, claiming that the sampling frame accounts for ~70 percent of the global population, is irrelevant.

New York Times columnist Floyd Norris published a set of charts purportedly showing that the housing market in the U.S. is on the mend. Not so quick, Floyd.

His theory - originating from an economist at Hanley Wood, a real estate research firm - is that in a recovering market, the share of new home sales by home builders should be higher than the share by banks, as the bank share is associated with foreclosed houses. The data offered are both in aggregate and by regions. I'm particularly interested in the regional chart from a design perspective.

The published chart is the one shown on the left below. I am not a fan of nested bar charts. I don't think there is any justification for treating two data series (here, share by banks and share by builders) differently. Which of the two series should one assign to the fatter bars?

If we slim the fat bars down, we retrieve the more conventional paired-bars chart, shown on the right. Of the two, I prefer the paired version.

***

There is a weakness with both versions. The theory rests on the relative share, which is clearer in a stacked presentation as shown on the right.

This presentation also shines the light on a dark corner of Norris's analysis. In every city but Detroit, an unmentioned group of sellers accounts for the majority of home sales! Nowhere in the article did Norris tell readers who those sellers are, and why they are ignored.

In all these charts, I have kept the original order of cities. Before reading further, see if you can tease out the criterion for sorting the cities.

With some effort, you'll learn that the cities are arranged in the order of degree of housing recovery, which is measured by the difference in share: the cities at the top (Houston, Dallas, etc.) have a higher share of builders selling than banks selling.
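The sorting criterion is simple to compute. A sketch with made-up shares (not the article's figures); note that the two shares need not sum to 100 percent, the remainder being the unmentioned group of sellers:

```python
# Made-up (share by builders, share by banks) pairs -- not the article's data.
# The shares don't sum to 1.0: the gap is the unidentified third group of sellers.
cities = {
    "Houston": (0.25, 0.10),
    "Dallas": (0.22, 0.12),
    "Detroit": (0.05, 0.60),
}

# Degree of recovery = share by builders minus share by banks,
# sorted from most recovered to least
ranked = sorted(cities, key=lambda c: cities[c][0] - cities[c][1], reverse=True)
print(ranked)  # → ['Houston', 'Dallas', 'Detroit']
```

The metric behind the ordering is a single subtraction per city, which makes it all the more ironic that the nested bars render it so hard to read.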

Ironically, the difference in share is the least emphasized data in a nested bar chart. In fact, how you compute the difference depends on the relative share! When the olive bar is longer than the blue bar, the reader sizes up the white space between the edges of the bars; when the blue bar is longer, though, the reader must look inside the blue area, and compute the interior distance.

The reader can use some help here. Possible fixes include using a footnote, or adding a note informing readers that up implies stronger recovery, or creating a visual separation between those cities in which the share by builders exceeds that by banks, and vice versa.

Here is a dotplot with annotations. The separation between the dots is easily estimated.

***

Recall the theory that in recovering markets, banks account for a lesser share of home sales. The analyst turned this into a metric by subtracting the share of sales by banks from the share by builders.

This metric is highly problematic. The first problem, already discussed, is that there exist more than these two types of sellers, and it is absolutely not the case that if the share by banks goes down, the share by builders goes up.

Another issue is that the structure of the housing market in different cities is probably different. The chart promotes the view that there is a general trend that extends to all markets. In fact, the variation over time within one city should be more telling than the variation across twenty cities at a point in time.

And there is the third strike.

This is a confusion between forward and reverse causation (see Andrew's post here for a general discussion of this important practical issue). The Floyd Norris/Hanley Wood theory expresses a forward causation: if a housing market is recovering, then banks will work through their inventory of foreclosed homes, and account for a decreasing share of home sales.

The analysis addresses the reverse of this relationship. The analyst observes that banks (in some cities) are selling fewer homes, and concludes that the housing market is recovering. Notice that this is a problem of reverse causation: instead of cause -> effect, we have effect -> cause. The rub is that any given outcome has many possible causes. Banks sell fewer homes for many possible reasons, only one of which is a recovering market.

Here are some other possibilities. The banks expect prices to rise in the future, so they hold on to their inventory. The economy is sputtering, and banks are tightening mortgage lending, making it harder to sell homes. Instead of selling the homes, the banks decide to destroy them to reduce supply and raise prices. The mysterious third group of sellers has put a lot of homes on the market. And so on.

In making claims based on observational data, one must conduct side investigations to rule out other causes.

***

From a Trifecta Checkup perspective, this chart addresses an interesting Question. The Visual design has hiccups. The biggest problem is that the Data provide an unsatisfactory answer to the question at hand. (Type DV)

It's here! Many readers have requested a reference to the Junk Charts Trifecta Checkup. I finally found time to write this up. Here is the introduction:

The Junk Charts Trifecta Checkup is a general framework for data visualization criticism. It captures how I like to organize the thinking behind my critique pieces.

The need for such a framework is clear. Opinion pieces on specific data graphics frequently come across as stream of consciousness. Proclaiming a chart "mind-blowing" or "worst of the century" isn't worth much if the author cannot articulate why. The state of dataviz criticism has not progressed much beyond assembling a set of "rules of thumb".

In putting this framework together, I aimed to make it simple to use and broadly applicable.

The Trifecta Checkup framework allows me to classify all dataviz critiques into eight types. A visualization of the eight types is as follows.

Please click here to read the entire post. Also, link to that permanent page for reference.

As usual, clicking on the "Trifecta Checkup" tag brings up all prior posts that apply the concept.

Josh Katz, who did the dialect maps I featured recently, is at it again. He's one of the co-authors of a series of maps (link) published by the New York Times about the fan territories of major league baseball teams.

Similar to the dialect maps, these are very pleasing to look at, and also statistically interesting. The authors correctly point out that the primary points of interest are at the boundaries, and provide fourteen insets on particular regions. This small gesture represents a major shift from years past, when designers would have just printed an interactive map, letting readers figure out where the interesting stuff is.

The other interesting areas are the "no-man's-lands", the areas in which there are no local teams. The map uses the same kind of spatial-averaging technology that blends the colors. The challenge here would be the larger number of colors.

I'd have preferred that they give distinct colors to teams with broader appeal, like the Yankees and the Red Sox. Maybe the Yankees are the only national team they discovered, since that team does get a unique gray color, which is very subtle.

I also think it is smart to hide the political boundaries of state, zip, etc. in the maps (unless you click on them).

I'd like to see a separate series of maps: small multiples by team, showing the geographical extent of each team. This is a solution to the domination issue to be addressed below.

***

The issue of co-dominant groups I discussed in the dialect maps also shows up here. Notably, in New York, the Mets are invisible, and in the Bay Area, the Oakland A's similarly do not appear on the map.

Recall that each zip code is represented by the team with the highest absolute proportion of fans. It may be true that the Mets are perennial #2 in all relevant zip codes. Zooming into the Yankee territory, I didn't see any zip code in which Mets fans are more numerous. So this may be the perfect example of what falls through the cracks when the algorithm just drops everything but the top level.

***

Now, in the Trifecta Checkup, we want to understand what the data is saying. I have to say this is a bit challenging. The core dataset contains Facebook Likes (aggregated to the zip-code level). It is not even clear what the base of those proportions is. Is it the total population in a zip code? The total Facebook users? The total potential baseball fans?

As I have said elsewhere, Facebook data is often taken to be "N=All". This is an assumption, not a fact of the data. Different baseball teams may have different social-media/Facebook strategies. Different teams may have different types of fans, who are more/less likely to be on Facebook. This is particularly true of cross-town rivals.

Apart from the obvious problem with brands buying or otherwise managing Likes, "Like" is a binary metric that doesn't measure fan fervor. It is a static measure, as I don't believe Facebook users manage their lists of Likes actively (please correct me if I am wrong about this behavior).

We are not provided any real numbers, and none of the maps have scales. Unless we see some absolute counts, it is hard to know if the data make sense relative to other measures of fandom, like merchandise and ticket sales. With Facebook data, it is sometimes possible to have too much--in other words, you might find there are more team fans than potential baseball fans or even population in a specific zip code.

It is very likely that Facebook, which is the source of the aggregated data, did not want raw counts published. This is par for the course for the Internet giants, and also something I find completely baffling. Here are the evangelizers of "privacy is dead", and they stockpile our data, yet they lock it up in their data centers, away from our reach. Does that make any sense?

Carl Bialik used to be the Numbers Guy at the Wall Street Journal - he's now with FiveThirtyEight. Apparently, he left a huge void. John Eppley pointed me to this set of charts via Twitter.

This chart about Citibike is very disappointing.

Using the Trifecta Checkup, I first notice that it addresses a stale question and produces a stale answer. The caption below the chart says "the peak times ... seem to be around 9 am and 6 pm." What a shock!

I sense a degree of meekness in using "seem to be". There is not much to inspire confidence in the data: rather than the full statistics, which you'd think someone at Citibike has, the chart is based on "a two-day sample last autumn". The number of days is less concerning than the question of whether those two autumn days are representative of the year. Curious readers might want to know what data was collected, how it was collected, and the sample size.

Finally, the graph makes a mess of the data. While the black line appears to be data-rich, it is not. In fact, the blue dots might as well be randomly scattered and connected. As you can see from the annotations below, the scale of the chart makes no sense.

Plus, the execution is sloppy, with a missing data label.

***

The next chart is not much better.

The biggest howler is the choice of pie charts to illustrate three numbers that are not that different.

But I have to say the chart raises more questions than it answers. I am not an expert in pregnancy but doesn't a pregnant woman's weight include the weight of the baby she's carrying? So the more weight the woman gains, on average, the heavier is her baby. What a shock!

***

The last, and maybe the least, is this chart about basketball players in the playoffs.

It's the dreaded bubble chart. The players are arranged in a perplexing order. I wonder if there is a natural numbering system for basketball positions (center = #1, etc.), like there is in soccer. Even if there is such a natural numbering system, I still question the decision to confound that system with a complicated ranking of current-year playoff players against all-time players.

Above all, the question being asked is uninteresting, and so the chart is uninformative. A more interesting question to me is whether the best players are playing in this year's playoff. To answer this question, the designer should be comparing only currently active players, and showing the all-time ranks of those players who are playing in the playoffs versus those who aren't.

It's one of those charts that has conceptual appeal but does not do the data justice. As the name implies, the designer has a strong message: that Arctic sea ice volume has dramatically declined over time. This message is there in the chart, but the reader has to work hard to find it.

Why doesn't this spider chart work? We can be more precise.

A big problem is the lack of scalability. This chart looks different every year. If you add an extra year to the chart, you either have to increase the density of the years or you have to drop the earliest year.

Years are not circular or periodic so the metaphor doesn't quite work.

Axis labeling is also awkward. Because of the polar coordinates, the axes are radiating so the numbers run up toward the top but run down toward the bottom.

This specific instance of the spider chart benefits from well-behaved data: the between-year variability is much lower than the within-year variability. As a result, the lines don't cross each other much. Had the variability from year to year fluctuated more, we would have seen a bunch of noodles.

This is a pity because the designer did very well in aligning two corners of the Trifecta Checkup, namely what is the question and what does the data show? It is a great idea to control for month of year, and look at year to year changes. (A more typical view would be to look at month to month changes and plot one line per year.)

This is an example of a chart that does well on one side of the checkup but fails because the graph isn't in tune with the data or the question being addressed.

Whenever I see a spider chart, I want to unroll the spiral and see if a line chart is better. Thus:

The dramatic decrease in Arctic ice volume (no matter the month) is clear as day. You can actually read off the magnitude of the drop. (Try doing that in the spider chart, say between 1978 and 1995.)
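The unrolling is just a re-indexing of the data: instead of one ring per year with months around the circle, draw one line per month across the years. A sketch with made-up volumes (the real series runs from 1978 on):

```python
from collections import defaultdict

# Made-up (year, month) -> ice volume readings, illustrative only
records = {
    (1979, 1): 28.0, (1979, 7): 20.0,
    (1995, 1): 25.0, (1995, 7): 16.0,
    (2011, 1): 21.0, (2011, 7): 9.0,
}

# One line per month: x = year, y = volume, so the year-on-year
# decline reads left to right instead of spiraling inward
lines = defaultdict(list)
for (year, month), volume in sorted(records.items()):
    lines[month].append((year, volume))
```

Each entry in `lines` becomes one line on the chart, which is why the drop between any two years can be read off directly.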

This chart still has issues, namely too many colors. One can color the lines by season of the year, like this:

Or switch to a small-multiples set up with three lines per chart and one chart per season.

The seasonal arrangement is not arbitrary. You can see the effect of season by looking at side-by-side boxplots:

The pattern is UP-DOWN-DOWN-UP.

In fact, a side-by-side boxplot of the data provides a very informative look:

The monthly series is obscured in this view, built into the vertical variability, which we can see is quite stable. The idea of controlling for month is to make it irrelevant. This view emphasizes the year on year decline of the entire distribution.

If you're worried about dropping too much information, the data can be grouped by season, as before, in a small-multiples setup like this:

Regardless of season, the trend is down.

PS. Alberto reminds me of his post about one example of a spider chart (radar chart) that works. Here's the link. It works because the graphical element is more in tune with the data. While the ice cap data has a linear trend over time, the voting data is all about differences in distribution. Also, the designer is expecting readers to care about the high-level pattern, not about the specifics.

If you are like me, that is, you have knowledge in your head of time-series line charts, you probably experienced that moment where the bottom fell out and you didn't know which way was up.

This is the double edge of novelty in charts. There should be a very high bar against running counter to convention. Readers do bring their "baggage" to the chart, and the designer should take that into consideration.

Some commentators are complaining about trickery. That may be true. But it's also possible the designer actually thought reversing the direction of the vertical axis made the chart better.

Don't forget we have another convention: up is good and down is bad. Fewer murders is good and more murders is bad. So why not make it such that a rising line indicates goodness (fewer murders)?

***

Going back to the Trifecta Checkup. This chart has dual problems. We just talked about the syncing between the data and the graphical element.

The other issue is that the data is insufficient to draw conclusions about the underlying question: what explains the shift in number of murders since the late 2000s? This is a complex problem--the chapter in Freakonomics about abortion and crime rate is still instructive, not for the disputed conclusion but for the process of testing various hypotheses. The reduction of the complex causal structure to a single factor is dissatisfying.

The question on the table is motivated by the extraordinary performance of a young baseball player Mike Trout. The early success can be interpreted either as evidence of future potential or as evidence of a future drought. As an analogy, someone wins a lottery. You can argue that the odds are so low that winning again is impossible. Or you can argue that winning once indicates that this person is "lucky" and lucky people might win again.

The chart shows the proportion of players who performed even better after the initial success, given the age at which they first broke out. One way to read this chart is to mentally replace the bubbles with dots (or columns), and then interpret the size of the bubbles as the statistical significance of the corresponding probability estimate. The legend says number of players, which is the sample size, which governs the error bar associated with that particular number.

This bubble chart is no different from others: it is impossible to judge the relative sizes of bubbles. Even though the legend provides us two reference points (a nice enough idea on its own), it is still impossible to know, for example, what proportion of players did better later in life when they first peaked at age 24. The bubble for age 23 looks like it's exactly five players but I still cannot figure out how many players the adjacent bubble represents.

The designer should have just replaced each bubble with an error bar, and the chart is instantly more readable. (I have another version of this at the end of the post.)
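One way such error bars could be sized, assuming each bubble is a simple binomial proportion (the counts below are hypothetical, since the chart's actual numbers are unreadable):

```python
import math

def proportion_with_error(successes, n):
    """Point estimate and approximate standard error of a binomial proportion.

    The standard error shrinks with sample size, which is exactly the
    information the bubble sizes try, and fail, to convey.
    """
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se

# Hypothetical bubble: 2 of 5 players improved on their early peak at some age
p, se = proportion_with_error(2, 5)  # p = 0.4, se ≈ 0.22
```

With samples this small the bars would be wide, which makes the author's closing caveat about insufficient data visible at a glance.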

The rest of the design elements are clean and well done, particularly the use of notes to point out interesting aspects of the data.

***

From a Trifecta Checkup perspective, I am uncertain about the nature of the data used to investigate the interesting question posed above.

Readers should note that the concepts of "early success" and "later success" are not universally defined. The author here selects two proxies. Reaching an early peak is equated to "batters first posting 15+ WAR over two seasons". Next, reversion to the mean is defined as not having a better two-year span subsequent to the aforementioned early peak.

Why two seasons? Why WAR and not a different metric? Why 15 as the cutoff? These are all design decisions made while working with the data.

One can make reasonable arguments to justify those design decisions. A bigger head-scratcher relates to the horizontal axis, which identifies the first time a player reaches his "early peak," as defined above. The way the chart is set up, it is almost preordained to exhibit a negative slope: the older a player is when he reaches his first peak, the fewer years remain in his playing career to emulate or surpass that feat.

This last point is nicely illustrated in the next chart of the article:

This chart is excellent on many levels. It's not clear, though, whether it says anything other than aging.

***

Near the end of the post, the author rightly pointed out that "there’s not really enough data to demonstrate this effect". Going back to the first chart, it appears that no single bubble contains a double-digit count of players. So every sample size is between one and, say, seven. We should be wary of conclusions based on so little data.

It's always fun to find examples of the Law of Small Numbers, courtesy of Kahneman & Tversky.

***

Here is a sketch of how I might re-make the first chart (I made up data; see the note below).

While making this chart, I realized another issue with the original bubble chart. When the proportion of players improving on their early peak is zero percent, the number of players who failed to improve is hidden. In the revised chart, this data is clearly seen (look at age 22).

Note: I wonder if I totally missed the point of the original chart.... I actually had trouble eyeballing the data so I ended up making up numbers. The bubble at age 22 looks like it should stand for 5 players and yet it sits at precisely 50%, which would map to 2.5 players. If I assume the 22 bubble to be 4 players, then I don't know what the 26 bubble is. If it is 4 players also, then the minimum non-zero proportion should have been 1/4, but the bubble clearly lies below 25%. If it is 3 players, the minimum non-zero proportion is 1/3, which should be at 33%.