Alberto links to a nice ProPublica chart on average annual ambulance spend per dialysis patient by state. (link to chart and article)

It's a nice small-multiples setup with two tabs, one showing the states in descending order of spend and the other in alphabetical order.

In the article itself, they excerpt the top of the chart containing the states that have suspiciously high per-patient spend.

Several types of comparisons are facilitated: comparison over time within each state, comparison of each state against the national average, comparison of trend across states, and comparison of state to state given the year.

The first comparison is simple, as it happens within each chart component.

The second type of comparison is enabled by the orange line being replicated on every component. (I'd have removed the columns from the first component as they are both redundant and potentially confusing, although I suspect that the designer may have needed them for technical reasons.)

The third type of comparison is also relatively easy. Just look at the shape of the columns from one component to the next.

The fourth type of comparison is where the challenge lies for any small-multiples construction. It is also where this chart hides its secret. If you mouse over any year on any component, every component highlights that particular year's data, so that one can easily make state-by-state comparisons. Like this for 2008:

You see that every chart now marks 2008 on the horizontal axis, and the data label shows the amount for 2008. The respective columns are given a different color. Of course, if this were the most important comparison, the dimensions should be switched around so that this particular set of comparisons occurs within a chart component--but evidently this is a minor comparison, so it gets minor billing.
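To make the mechanics concrete, here is a minimal static sketch of the small-multiples idea, with one year highlighted across every panel and the reference line replicated on each; all state names and spend figures below are invented.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
years = np.arange(2004, 2012)
states = ["NJ", "PA", "CA", "TX", "NY", "OH"]  # invented
spend = {s: 2000 + rng.integers(0, 4000, len(years)) for s in states}
national_avg = np.mean(list(spend.values()), axis=0)

highlight = 2008
fig, axes = plt.subplots(2, 3, figsize=(9, 5), sharey=True)
for ax, state in zip(axes.flat, states):
    # highlight the chosen year in every panel at once
    colors = ["crimson" if y == highlight else "lightgray" for y in years]
    ax.bar(years, spend[state], color=colors)
    # the reference line, replicated on every component
    ax.plot(years, national_avg, color="orange")
    ax.set_title(state)
plt.tight_layout()
plt.show()
```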

***

I love to see this type of thoughtfulness! This is an example of using interactivity in a smart way, to enhance the user experience.

The Boston subway charts I featured before also introduce interactivity in a smart way. Make sure you read that post.

Also, I have a few comments about the data analysis on the sister blog.

I was traveling quite a lot recently, and last week, I read the Wall Street Journal cover to cover for the first time in a while. I am happy to report that there are many more data graphics than I remember from past editions.

The following chart illustrating findings of an FCC report on broadband speeds has a number of issues (a related blog post containing this chart can be found here):

The biggest problem with the visual elements is the lack of linkage between the two components. The two charts should be connected: the one on the right presents ISP averages by the broadband technology while the one on the left presents individual ISP results. Evidently, the designer treats the two parts as separate.

If that were the intention, two decisions still create confusion for readers. First, the charts use two different but related scales: just add 100% to the scale of the left chart and you get the scale of the right chart. There really is no need for two different scales.

Second, orange and blue are used in both charts but for different purposes. In the left chart, orange denotes all ISPs whose actual speeds were below their advertised speeds. In the right chart, orange denotes ISPs using DSL technology.

I also do not understand why some ISP names are bolded. The bolded companies include several cable providers (but not all), several DSL providers (but not all), one fiber provider and no satellite.

Lastly, I'd prefer they pick one of "advertised" and "promised" and stick with it. I do like the axis labels, which say "faster than" and "slower".

***

One challenge of the data is that the FCC report (here) does not provide a mathematical linkage between the technology averages and the ISP data. We know that 91% for DSL is the average of the ISPs that use DSL as shown on the left of the chart, but we don't know the weights (relative popularity) of each ISP so we can't check the computation.
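For concreteness, here is a minimal sketch of what that check would look like if the weights were published. Every ISP name, value, and weight below is invented; only the idea of verifying the 91% DSL average comes from the report.

```python
# Hypothetical ISP results (actual/advertised speed) and subscriber weights.
isp_results = {"ISP A": 0.85, "ISP B": 0.95, "ISP C": 0.90}
weights     = {"ISP A": 0.50, "ISP B": 0.30, "ISP C": 0.20}  # must sum to 1

# The technology average is just the weighted mean of its ISPs' results.
dsl_average = sum(isp_results[isp] * weights[isp] for isp in isp_results)
print(f"weighted DSL average: {dsl_average:.0%}")
# With the true weights, this figure should reproduce the reported 91%.
```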

But if we think of the average by technology as a reference point to measure individual ISPs, we can still use the data, and more efficiently, such as in the following dot plot where the vertical lines indicate the appropriate technology average:

(The cable section should have come before the DSL section but you get the idea.)
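For readers who want to reproduce this kind of display, here is a minimal matplotlib sketch of the grouped dot plot, with the vertical reference line drawn per technology group; all numbers are invented.

```python
import matplotlib.pyplot as plt

# Invented actual/advertised ratios, grouped by technology.
groups = {
    "Cable": {"ISP D": 1.05, "ISP E": 0.98, "ISP F": 1.12},
    "DSL":   {"ISP A": 0.82, "ISP B": 0.95, "ISP C": 0.96},
}

fig, ax = plt.subplots(figsize=(6, 3))
y = 0
labels = []
for tech, isps in groups.items():
    avg = sum(isps.values()) / len(isps)
    ys = list(range(y, y + len(isps)))
    # vertical line marking the technology average for this group
    ax.vlines(avg, ys[0] - 0.4, ys[-1] + 0.4, color="gray")
    ax.scatter(list(isps.values()), ys)
    labels += [f"{name} ({tech})" for name in isps]
    y += len(isps)
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)
ax.axvline(1.0, color="black", linewidth=0.5)  # actual equals advertised
ax.set_xlabel("actual / advertised speed")
plt.tight_layout()
plt.show()
```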

The key message of the chart, in my mind, is that DSL providers as a class over-promise and under-deliver.

Reader Joe D. tipped me about a nice visualization project by a pair of grad students at WPI (link). They displayed data about the Boston subway system (i.e. the T).

The project has many components, one of which is the visualization of the location of every train in the Boston T system on a given day. This results in a very tall chart, the top of which I clipped:

I recall that Tufte praised this type of chart in one of his books. It is indeed an exquisite design, attributed to Marey. It provides data on both time and space dimensions in a compact manner. The slope of each line is positively correlated with the velocity of the train (I say correlated rather than proportional because the distances between stations are not drawn to scale in this chart). The authors acknowledge the influence of Tufte in their credits, and I recognize a couple of signatures:

For once, I like how they hide the names of the intermediate stations along each line while retaining the names of the key stations. Too often, modern charts banish all labels to hover-overs, which is a practice I dislike. When you move the mouse horizontally across the chart, you will see the names of the unnamed stations.

The text annotations on the right column are crucial to generating interest in this tall, busy chart. Without those hints, readers may get confused and lost in the tapestry of schedules. If you scroll to the middle, you find an instance of train delay caused by a disabled train. Even with the hints, I find that it takes time to comprehend what the notes are saying. This is definitely a chart that rewards patience.

Clicking on a particular schedule highlights that train, pushing all the other lines into the background. The side panel provides a different visual of the same data, using a schematic subway map.

Notice that my mouse is hovering over the 6:11 am moment (represented by the horizontal guide on the right side). This generates a snapshot of the entire T system shown on the left. This map shows the momentary location of every train in the system at 6:11 am. The circled dot is the particular Red Line train I have clicked on before.

This is a master class in linking multiple charts and using interactivity wisely.

***

You may feel that the chart using the subway map is more intuitive and much easier to comprehend. It also becomes very attractive when the dots (i.e., trains) are animated and shown moving through the system. That is the image that the project designers have blessed with the top position of their GitHub page.

However, the image above allows us to see why the Marey diagram is the far superior representation of the data.
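For the curious, the construction of a Marey diagram is simple enough to sketch in a few lines: stations run across, time runs down the page, and each polyline is one train. Stations and times below are invented.

```python
import matplotlib.pyplot as plt

stations = ["Alewife", "Davis", "Porter", "Harvard", "Central", "Park St"]
xpos = {s: i for i, s in enumerate(stations)}

# departure times (minutes after 6:00 am) at each station, one list per train
trains = {
    "train 1": [0, 4, 7, 11, 15, 22],
    "train 2": [10, 14, 18, 23, 27, 34],  # slightly slower: steeper line
    "train 3": [20, 24, 27, 31, 35, 42],
}

fig, ax = plt.subplots(figsize=(6, 4))
for name, times in trains.items():
    ax.plot([xpos[s] for s in stations], times, marker=".", label=name)
ax.set_xticks(range(len(stations)))
ax.set_xticklabels(stations, rotation=45, ha="right")
ax.invert_yaxis()  # earliest time at the top, as in the project
ax.set_ylabel("minutes after 6:00 am")
ax.legend()
plt.tight_layout()
plt.show()
```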

What are some of the questions you might want to answer with this dataset? (The Q of our Trifecta Checkup)

Perhaps to figure out which trains were behind schedule on a given day. We can define "behind schedule" as slower than the average train on the same route.
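That definition is easy to operationalize. Here is a minimal pandas sketch, assuming a trip-level table; the column names and values are invented.

```python
import pandas as pd

trips = pd.DataFrame({
    "route":        ["Red", "Red", "Red", "Orange", "Orange"],
    "train_id":     [101, 102, 103, 201, 202],
    "duration_min": [52, 48, 61, 35, 39],
})
# average trip duration on each route
trips["route_avg"] = trips.groupby("route")["duration_min"].transform("mean")
# a trip is behind schedule if it is slower than its route's average
trips["behind_schedule"] = trips["duration_min"] > trips["route_avg"]
print(trips)
```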

It is impossible to figure this out on the subway map. The static version presents a snapshot while the dynamic version has moving dots, from which readers are challenged to estimate their velocities. The Marey diagram shows all of the other schedules, making it easier to find the late trains.

Another question you might ask is how a delay in one train propagates to other trains. Again, the subway map doesn't show this at all but the Marey diagram does - although here one can nitpick and say even the Marey diagram suffers from overcrowding.

***

On that last question, the project designers offer up an alternative Marey. Think of this as an indexed view: each trip is indexed to its own starting point. The following setting shows the morning rush hour compared to the rest of the day:

I think they could make better use of this display by showing hourly averages instead of every single schedule. Rather than letting readers play with the time scale, they should pre-compute the most interesting periods, which, according to the text, are the morning rush, afternoon rush, midday lull and evening lull.

The trouble with showing every line is that the density of lines is affected by the frequency of trains. The rush hours have more trains, causing the lines to be denser. The density gradient competes with the steepness of the lines for our attention, and completely overwhelms it.

***

There really is a lot to savor in this project. You should definitely spend some time reviewing it. Click here.

Also, there is still time to sign up for my NYU chart-making workshop, starting on Saturday. For more information, see here.

Darin Myers at PGi was kind enough to send over an analysis of a chart using the Trifecta Checkup framework. I'm reproducing the critique in full, with a comment at the end.

***

At first glance this looks like a valid question, with good data, presented poorly (Type V). Checking the fine print (glad it’s included), the data falls apart.

QUESTION

It’s a good question…What device are we using the most? With so much digital entertainment being published every day, it pays to know what your audience is using to access your content. The problem is this data doesn’t really answer that question conclusively.

DATA

This was based on survey data asking respondents, "Roughly how long did you spend yesterday... watching television (not online) / using the internet on a laptop or PC / on a smartphone / on a tablet?" Survey respondents were limited to those who owned or had access to a TV and a smartphone and/or tablet.

What about feature phones?

Did they ask everyone on the same day, on random days, or are some days overrepresented here?

This is self-reported, not tracked... who accurately remembers their screen time on each device a day later? I imagine the vast majority of answers were round numbers (30 or 45 minutes, 2 hours). Yet this data claims precision to the minute that the users did not really provide.

In fact, the Council for Research Excellence found that self-reported screen time does not correlate with actual screen time. "Some media tend to be over-reported whereas others tend to be under-reported--sometimes to an alarming extent." --Mike Bloxham, director of insight and research for Ball State

VISUAL

The visual has the usual problem with stacked bar charts: it is easy to read the first segment and the total, but hard to judge the values in between. This may not matter given the question, but the presentation focuses on an individual piece of tech (smartphones), so the design should focus on smartphones. At the very least, smartphones should be the first segment in each bar, and the chart should be sorted by smartphone usage.

My implementation simply compares smartphone usage to the usage of the next most-used device. Overall, 53% of the time people are using a smartphone compared to something else. I went back and forth on whether to keep the Tablet category in the key even though it was never the first or second most-used device. In the end, I decided to keep it to parallel the source visual.

Despite the data problems, I was really interested in seeing the breakdowns by device in each country, so I built the chart below with ranks added (in bold). I also built some simple interaction to sort by column when you click the header [Ed: I did not attach the interactive Excel sheet that came with the submission]. As a final touch, I displayed the color corresponding to the most-used device as a box to the left of each country name. It's easy to see that the vast majority of countries use smartphones the most.

***

Hope you enjoyed Darin's analysis and revamp of the chart. The diagnosis is spot on. I like the second revision of the chart, especially for analysts who really want to know the exact numbers. The first redo has the benefit of greater simplicity, though it can be a tough sell to an audience, especially since the color indicates the second most popular device while being disassociated from the length of the bar.

The biggest problem in the original treatment is the misalignment of the data with the question being asked. In addition to the points made by Darin, the glaring issue relates to the respondent population. The analysis only includes people who have at least a smartphone or a tablet. But many people in less-developed countries do not have either device. In those countries, TV screen time has likely been strongly underestimated: people who watch TV but do not own a smartphone or tablet are simply dropped from consideration.

For this same reason, the other footnoted comment, claiming that the sampling frame accounts for ~70 percent of the global population, is irrelevant.

New York Times columnist Floyd Norris published a set of charts purporting to show that the housing market in the U.S. is on the mend. Not so quick, Floyd.

His theory--originating from an economist at Hanley Wood, a real estate research firm--is that in a recovering market, the share of new home sales by home builders should be higher than the share by banks, as the bank share is associated with foreclosed houses. The data offered are both in aggregate and by region. I'm particularly interested in the regional chart from a design perspective.

The published chart is the one shown on the left below. I am not a fan of nested bar charts. I don't think there is any justification for treating two data series (here, share by banks and share by builders) differently. Which of the two series should one assign to the fatter bars?

If we slim the fat bars down, we recover the more conventional paired-bars chart, shown on the right. Of the two, I prefer the paired version.

***

There is a weakness in both versions. The theory rests on the relative share, which is clearer in a stacked presentation, as shown on the right.

This presentation also shines a light on a dark corner of Norris's analysis. In every city but Detroit, an unmentioned group of sellers accounts for the majority of home sales! Nowhere in the article does Norris tell readers who those sellers are, or why they are ignored.

In all these charts, I have kept the original order of cities. Before reading further, see if you can tease out the criterion for sorting the cities.

With some effort, you'll figure out that the cities are arranged in order of the degree of housing recovery, as measured by the difference in shares: the cities at the top (Houston, Dallas, etc.) have a higher share of sales by builders than by banks.

Ironically, the difference in shares is the least emphasized data in a nested bar chart. In fact, how you read off the difference depends on the relative shares! When the olive bar is longer than the blue bar, the reader sizes up the white space between the edges of the bars; when the blue bar is longer, though, the reader must look inside the blue area and estimate the interior distance.

The reader can use some help here. Possible fixes include using a footnote, or adding a note informing readers that up implies stronger recovery, or creating a visual separation between those cities in which the share by builders exceeds that by banks, and vice versa.

Here is a dotplot with annotations. The separation between the dots is easily estimated.

***

Recall the theory that in recovering markets, banks account for a smaller share of home sales. The analyst turned this into a metric by subtracting the share of sales by banks from the share by builders.

This metric is highly problematic. The first problem, already discussed, is that there exist more than these two types of sellers, and it is absolutely not the case that if the share by banks goes down, the share by builders goes up.

Another issue is that the structure of the housing market probably differs from city to city. The chart promotes the view that there is a general trend extending across all markets. In fact, the variation over time within one city should be more telling than the variation across twenty cities at a single point in time.

And there is the third strike.

This is a confusion between forward and reverse causation (see Andrew's post here for a general discussion of this important practical issue). The Floyd Norris/Hanley Wood theory expresses a forward causation: if a housing market is recovering, then banks will work through their inventory of foreclosed homes, and account for a decreasing share of home sales.

The analysis addresses the reverse of this relationship. The analyst observes that banks (in some cities) are selling fewer homes, and concludes that the housing market is recovering. Notice that this is a problem of reverse causation: instead of cause -> effect, we have effect -> cause. The rub is that any given outcome has many possible causes. Banks sell fewer homes for many possible reasons, only one of which is a recovering market.

Here are some other possibilities. The banks expect prices to rise in the future, and are holding on to their inventory. The economy is sputtering and banks are tightening mortgage lending, making it harder to sell homes. Instead of selling the homes, the banks decide to destroy them to reduce supply and raise prices. The mysterious third group of sellers has put a lot of homes on the market. And so on.

In making claims based on observational data, one must conduct side investigations to rule out other causes.

***

From a Trifecta Checkup perspective, this chart addresses an interesting Question. The Visual design has hiccups. The biggest problem is that the Data provide an unsatisfactory answer to the question at hand. (Type DV)

Through Twitter, Antonio Rinaldi sent the following chart, which accompanied a New York Times piece about the CPI (inflation index). The article concerns a very important topic--that many middle- to lower-income households have barely any savings after spending on necessities--and only touches upon the issue raised by this chart, which is that the official CPI is an average of prices of a basket of goods, and there is much variability in the price changes of different categories of goods.

I cover this subject in much greater detail in Chapter 7 of Numbersense (link). There are many reasons why the official inflation rate seems to diverge from our own experiences. One of them is that we tend to notice and worry about price increases, but we fail to notice, or take for granted, price decreases. In the book, I cover the fascinating subject of the psychology of remembering prices. Obviously, this is a subject of utmost importance if we are to use surveys to understand perceived prices.

The price of an unbranded T-shirt has remained the same, or may even have declined, over the last few decades. The chart also reveals that phones and accessories, computers and televisions have all enjoyed deflation over the last decade. Actually, much of that "deflation" is due to a controversial adjustment known as "hedonics," which attributes part of any price change to product or technology improvements. So if you pay the same price today for an HDTV as you did in the past for a standard-definition TV, then in real terms the price you paid today is lower than the price in the past.
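Here is that hedonic logic as stylized arithmetic; every number below is invented.

```python
# The sticker price of a TV is unchanged, but 30% of its value is
# attributed (hypothetically) to quality improvement, so the
# quality-adjusted price is recorded as having fallen.
sticker_price_then = 500.0
sticker_price_now = 500.0
quality_improvement = 0.30  # hypothetical hedonic attribution

adjusted_price_now = sticker_price_now / (1 + quality_improvement)
change = adjusted_price_now / sticker_price_then - 1
print(f"Quality-adjusted price change: {change:.0%}")  # about -23%
```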

That adjustment is reasonable only up to a point. For instance, my cell phone company stuffs my plan with hundreds of unused and unusable minutes, so on a per-minute basis, I am sure prices have come down substantially; on a per-used-minute basis, I'm not so sure.

***

Let's get to what we care about on this blog... the visual. There is one big puzzle embedded in this chart. Look at the line for televisions: it dipped below -100 percent! Like Antonio, many readers should be scratching their heads--did the price of televisions go negative? Did the hedonic adjustment go bonkers?

As an aside, I don't like the current NYT convention of hiding too many axis labels. What period of time is this chart depicting? You'd only find out by reading the label of the vertical axis! I mentioned something similar the other day.

The key to understanding a chart like this is to learn exactly what is being plotted. The first instinct is to assume it is the change in prices over time. A quick glance at the vertical axis label corrects that misunderstanding. It says "Change in prices relative to a 23% increase in price for all items, 2005-2014".

This label is doing a lot of work--probably too much for its inconspicuous location and unbolded, uncolored status.

Readers have to know that the official CPI is a weighted average of changes in prices of a specified basket of goods. Some but not all of the components are being graphed.

Then readers have to understand that this is an index of an index. The prices of each "item" (i.e., category or component of the CPI) are indexed to 1984 levels. The television index is then re-indexed to 2005 as the baseline. This establishes a growth trajectory for televisions. But even this is not what is being depicted.

Here is what the chart would have looked like if we plotted the growth of the television index (red), the apparel index and the all-items index (blue).

The blue line reflects the 23% average increase in prices over that 10-year period. Notice that the red line does not exhibit any weirdness--television prices have gone down by 90 percent; nothing has gone negative.

What the designer tried to do is to index this data another time. Think of pulling the blue line down to the horizontal axis, and then see what happens to the gray and red lines.

***

Now, even this index on an index should not present a mathematical curiosity. If all items moved to 1.23 while apparel moved to 1.10, you might compute 110%/123%, which is roughly 0.9. You'd say the apparel index went 90% of the way to where the all-items index went. Similarly for TVs, you would compute 10%/123%, which is 0.08. That would be saying the TV index ended up at 8% of where the all-items index landed.

That still doesn't yield -100%. The clue here is that the baseline is zero percent, not 100%, not 1.0, etc. If an item moved in sync with all items, its trajectory would be horizontal at zero percent. That means the second index is not a division but a subtraction. So for TV, it's -90% - 23% = -113%. For apparel, it's +10% - 23% = -13%.
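Here is the reverse-engineered arithmetic in a few lines of Python, using the approximate values read off the chart:

```python
# Growth factors from a 2005 baseline of 1.0 (approximate readings).
all_items = 1.23  # all-items CPI rose 23%
apparel = 1.10    # apparel rose about 10%
tv = 0.10         # televisions fell about 90%

for name, item in [("apparel", apparel), ("TV", tv)]:
    subtraction = (item - 1) - (all_items - 1)  # what the chart appears to do
    ratio = item / all_items                    # the division alternative
    print(f"{name}: subtraction {subtraction:+.0%}, ratio {ratio:.2f}")
# apparel: subtraction -13%, ratio 0.89
# TV: subtraction -113%, ratio 0.08
```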

Even though I reverse-engineered the chart, I don't understand the reason for using subtraction rather than division for the second layer of indexing. It seems strange to subtract two indices that have different baseline quantities.

Here is the same chart but using division:

I usually avoid telescoping indices. They are more trouble than they're worth. Here is an old post on the same subject.

Carl Bialik used to be the Numbers Guy at the Wall Street Journal--he's now with FiveThirtyEight. Apparently, he left a huge void. John Eppley pointed me to this set of charts via Twitter.

This chart about Citibike is very disappointing.

Using the Trifecta Checkup, I first notice that the chart addresses a stale question and produces a stale answer. The caption below the chart says "the peak times ... seem to be around 9 am and 6 pm." What a shock!

I sense a degree of meekness in using "seem to be". There is not much to inspire confidence in the data: rather than the full statistics, which you'd think someone at Citibike has, the chart is based on "a two-day sample last autumn". The number of days is less concerning than whether those two autumn days are representative of the year. Curious readers might want to know what data were collected, how they were collected, and the sample size.

Finally, the graph makes a mess of the data. While the black line appears to be data-rich, it is not. In fact, the blue dots might as well be randomly scattered and connected. As you can see from the annotations below, the scale of the chart makes no sense.

Plus, the execution is sloppy, with a missing data label.

***

The next chart is not much better.

The biggest howler is the choice of pie charts to illustrate three numbers that are not that different.

But I have to say the chart raises more questions than it answers. I am not an expert in pregnancy, but doesn't a pregnant woman's weight include the weight of the baby she's carrying? So the more weight the woman gains, on average, the heavier her baby. What a shock!

***

The last, and maybe the least, is this chart about basketball players in the playoffs.

It's the dreaded bubble chart. The players are arranged in a perplexing order. I wonder if there is a natural numbering system for basketball positions (center = #1, etc.), like there is in soccer. Even if there is such a natural numbering system, I still question the decision to confound that system with a complicated ranking of current-year playoff players against all-time players.

Above all, the question being asked is uninteresting, and so the chart is uninformative. A more interesting question to me is whether the best players are playing in this year's playoff. To answer this question, the designer should be comparing only currently active players, and showing the all-time ranks of those players who are playing in the playoffs versus those who aren't.

I highlighted the columns for 1993 and 1996. Visually, the height of one column is twice that of the other column. And yet the axis labels tell us that the difference is 65% versus 62.5%.

***

The reason for the start-at-zero rule is to avoid exaggerating meaningless differences.

To judge whether a change is meaningful in time-series data like this, we have to use history to understand the general variability in college enrollment rates. Based on what we can see in this data (about 20 years), the college enrollment rate hovers between 60 and 70 percent. There are no data between 0 and 60 percent; those are irrelevant values for this series. This is why starting at zero is counterproductive.

Here is the line chart starting at zero:

This display has the unintended effect of squashing meaningful changes over time by inserting a lot of empty space below the line.
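The trade-off is easy to demonstrate for yourself. Here is a minimal matplotlib sketch, with an invented enrollment series hovering in the 60-70 percent range, plotted under both axis choices:

```python
import matplotlib.pyplot as plt

years = list(range(1994, 2014))
rate = [62, 63, 65, 64, 66, 67, 65, 66, 68, 67,
        66, 69, 68, 67, 70, 68, 68, 66, 65, 66]  # invented, percent

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3))
for ax, ylim, title in [(ax1, (0, 100), "start at zero"),
                        (ax2, (58, 72), "zoomed to data")]:
    ax.plot(years, rate)
    ax.set_ylim(*ylim)  # the only difference between the two panels
    ax.set_title(title)
plt.tight_layout()
plt.show()
```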

The question on the table is motivated by the extraordinary performance of the young baseball player Mike Trout. The early success can be interpreted either as evidence of future potential or as evidence of a future drought. As an analogy, consider someone who wins a lottery. You can argue that the odds are so low that winning again is impossible. Or you can argue that winning once indicates that this person is "lucky," and lucky people might win again.

The chart shows the proportion of players who performed even better after their initial success, given the age at which they first broke out. One way to read this chart is to mentally replace the bubbles with dots (or columns), and then interpret the size of each bubble as the statistical significance of the corresponding probability estimate. The legend gives the number of players, which is the sample size, which in turn governs the error bar associated with that particular estimate.

This bubble chart is no different from others: it is impossible to judge the relative sizes of the bubbles. Even though the legend provides two reference points (a nice enough idea on its own), it is still impossible to know, for example, what proportion of players did better later in life when they first peaked at age 24. The bubble for age 23 looks like exactly five players, but I still cannot figure out how many players the adjacent bubble represents.

The designer should have just replaced each bubble with an error bar, and the chart would instantly be more readable. (I have another version of this at the end of the post.)
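As a sketch of that fix: size each error bar by the standard error of a proportion, sqrt(p(1-p)/n), so the bars shrink as the sample sizes grow. All numbers below are invented.

```python
import math
import matplotlib.pyplot as plt

ages = [20, 21, 22, 23, 24, 25, 26]
n = [2, 4, 4, 5, 3, 6, 4]                    # players peaking at each age
p = [1.0, 0.75, 0.5, 0.6, 0.33, 0.17, 0.25]  # share who later did better

# standard error of each proportion, driven by the sample size
se = [math.sqrt(q * (1 - q) / m) for q, m in zip(p, n)]
plt.errorbar(ages, p, yerr=se, fmt="o", capsize=3)
plt.xlabel("age at early peak")
plt.ylabel("proportion who later did better")
plt.ylim(-0.1, 1.1)
plt.show()
```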

The rest of the design elements are clean and well done, particularly the use of notes to point out interesting aspects of the data.

***

From a Trifecta Checkup perspective, I am uncertain about the nature of the data used to investigate the interesting question posed above.

Readers should note that the concepts of "early success" and "later success" are not universally defined. The author selects two proxies: reaching an early peak is equated with "batters first posting 15+ WAR over two seasons," and reversion to the mean is defined as not having a better two-year span subsequent to that early peak.

Why two seasons? Why WAR and not a different metric? Why 15 as the cutoff? These are all design decisions made while working with the data.

One can make reasonable arguments to justify those design decisions. A bigger head-scratcher relates to the horizontal axis, which identifies the first time a player reaches his "early peak," as defined above. The way the chart is set up, it is almost preordained to exhibit a negative slope: the older the player is when he reaches his first peak, the fewer years are left in his playing career to emulate or surpass that feat.

This last point is nicely illustrated in the next chart of the article:

This chart is excellent on many levels. It's not clear, though, whether it says anything other than aging.

***

Near the end of the post, the author rightly points out that "there's not really enough data to demonstrate this effect". Going back to the first chart, it appears that no single bubble represents a double-digit count of players. So every sample size is between one and, say, seven. We should be wary of conclusions based on so little data.

It's always fun to find examples of the Law of Small Numbers, courtesy of Kahneman & Tversky.

***

Here is a sketch of how I might remake the first chart (I made up the data; see the note below).

While making this chart, I realized another issue with the original bubble chart: when the proportion of players improving on their early peak is zero percent, the number of players who did not make it is hidden. In the revised chart, this is clearly visible (look at age 22).

Note: I wonder if I totally missed the point of the original chart... I actually had trouble eyeballing the data, so I ended up making up numbers. The bubble at age 22 looks like it should stand for 5 players, and yet it sits at precisely 50%, which would map to 2.5 players. If I assume the 22 bubble represents 4 players, then I don't know what the 26 bubble is. If it is also 4 players, then the minimum non-zero proportion should be 1/4, but that bubble clearly lies below 25%. If it is 3 players, the minimum non-zero proportion is 1/3, which should sit at 33%.
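Incidentally, the consistency check in this note can be automated: a displayed proportion p can only arise from a sample size n for which p times n is a whole number. A small sketch:

```python
# Which sample sizes are consistent with a proportion read off the chart?
def feasible_sample_sizes(p, max_n=10, tol=1e-9):
    return [n for n in range(1, max_n + 1)
            if abs(p * n - round(p * n)) < tol]

print(feasible_sample_sizes(0.50))  # [2, 4, 6, 8, 10] -- so not 5 players
print(feasible_sample_sizes(0.25))  # [4, 8]
```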