Remember David Wilkins, former US ambassador to Canada? Well, if you do a Google search on him, this map from whitepages.com comes up near the top, showing the distribution of telephone directory listings matching his name:

Since they apparently generate these automatically for almost any name, I thought of looking up my own. But I figured I would take another opportunity to increase the fame and internet profile of Mr. Wilkins. Can’t pass that up.

The colors are certainly less than ideal – as with so many of the maps seen here, there’s a mismatch between an orderable data set (number of listings) and an un-orderable symbology (the colors chosen to represent those numbers). I suppose one can see a weak progression in the colors, depending on your perspective, but it’s still far from a good match to the data. Colors running from a light to a dark blue would be perfect, and would also be friendlier to people with color vision impairments.
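To make the light-to-dark idea concrete, here is a minimal Python sketch that builds a five-class sequential ramp by interpolating between a light and a dark blue. The endpoint colors and the function names are arbitrary choices for illustration, not anything whitepages.com uses, and plain RGB interpolation only approximates a perceptually even ramp; hand-tuned sequential schemes such as ColorBrewer’s Blues are a safer bet in practice.

```python
def hex_to_rgb(hex_color: str) -> tuple[int, int, int]:
    """Convert '#rrggbb' to an (r, g, b) tuple of 0-255 ints."""
    h = hex_color.lstrip("#")
    return (int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16))


def rgb_to_hex(rgb: tuple[int, int, int]) -> str:
    """Convert an (r, g, b) tuple back to '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def sequential_ramp(light: str, dark: str, n_classes: int) -> list[str]:
    """Return n_classes colors stepping evenly from `light` to `dark`.

    The lowest class gets the lightest color and the highest class the
    darkest, so the ordering of the colors matches the ordering of the data.
    """
    if n_classes < 2:
        raise ValueError("a ramp needs at least two classes")
    lo, hi = hex_to_rgb(light), hex_to_rgb(dark)
    ramp = []
    for k in range(n_classes):
        t = k / (n_classes - 1)  # 0.0 for the first class, 1.0 for the last
        rgb = tuple(round(a + (b - a) * t) for a, b in zip(lo, hi))
        ramp.append(rgb_to_hex(rgb))
    return ramp


# Five classes, matching the five classes in the whitepages.com legend.
# The two endpoint colors are arbitrary illustrative choices.
print(sequential_ramp("#deebf7", "#08306b", 5))
```

Because every channel moves in one direction, each class is darker than the one below it, which is what lets a reader order the colors without checking the legend.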

It would also be nice if I didn’t have to assume that white means zero listings, since it could also reasonably mean “no data available.” Also troubling is that some of the small states are filled in with white on the main map, but on the inset, where they are enlarged, they are given a color. The inset needs to be consistent with the main map; otherwise, it’s harder to understand that the inset is, in fact, a zoomed-in version of the main map.
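One way to remove the ambiguity is to give “zero listings” and “no data” their own symbols and their own legend entries. The Python sketch below assumes class upper bounds of 11, 21, 32, 42 and 53 listings (how the legend appears to read); the colors and the name fill_for are hypothetical.

```python
from typing import Optional

NO_DATA_COLOR = "#bdbdbd"  # hypothetical: neutral gray, legend entry "no data"
ZERO_COLOR = "#ffffff"     # hypothetical: white, legend entry "0 listings"

# Upper bound of each class, as the legend appears to read (assumption).
CLASS_UPPER_BOUNDS = [11, 21, 32, 42, 53]
# Placeholder light-to-dark blues, one per class.
CLASS_COLORS = ["#deebf7", "#9ecae1", "#6baed6", "#3182bd", "#08519c"]


def fill_for(count: Optional[int]) -> str:
    """Pick a fill color for one state's listing count (None means no data)."""
    if count is None:
        return NO_DATA_COLOR
    if count == 0:
        return ZERO_COLOR
    # First class whose upper bound the count does not exceed.
    for upper, color in zip(CLASS_UPPER_BOUNDS, CLASS_COLORS):
        if count <= upper:
            return color
    raise ValueError(f"{count} is above the top class bound")


for count in (None, 0, 7, 47):
    print(count, fill_for(count))
```

The same function can color both the main map and the inset, which keeps the two consistent by construction.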

A sacrifice made with a classed choropleth map like this is that you lose some precision when reading numbers off of it. Look at the states in light blue – they all have anywhere from 1 to 11 listings for “David Wilkins.” Grouping states like this is perfectly reasonable: it reduces the number of colors used on the map and makes it easier for someone to pick out one distinct color and match it to the legend. Some ambiguity is a necessary part of this process. But look at Texas, the only state colored in dark red. It apparently has anywhere from 43 to 53 listings. Since it’s the only state in its class, why is the exact number not specified?

The classification scheme in general is a bit odd. There are a few big goals to aim for when deciding how to group your states. One is to minimize intra-class differences – that is, keep the range of values within each class narrow. You don’t want one class that runs from 1 to 11 listings and another that runs from 12 to 500; the second is far too broad. Another is to make each class hold roughly the same number of states, which is where this map has a problem: there’s one state in the dark red class, two in the orange class, and twenty-five in the light blue class. A third goal is to keep the class intervals roughly equal in width. As an astute reader points out below, the intervals here differ in width by one listing, but they’re close enough to even that I think they hold up pretty well. There are a few other goals, but I’ll leave it at that. As you might expect, it’s hard to fulfill all the goals at once, but the gap between 1 red state and 25 light blue ones is still pretty severe. The two lowest classes cover most of the country, and the two upper classes cover only three states. That makes those three states stand out more than they should. To my mind, there isn’t a large, unusual, worth-pointing-out difference between the states at the upper and lower ends.
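The “same number of states per class” goal corresponds to quantile (equal-count) classification. The Python sketch below contrasts it with equal intervals on a made-up list of fifteen listing counts; the numbers are invented for illustration and are not the real whitepages.com data.

```python
def quantile_breaks(values: list[int], n_classes: int) -> list[int]:
    """Upper bound of each class so every class holds about the same number of values."""
    if not 0 < n_classes <= len(values):
        raise ValueError("need between 1 and len(values) classes")
    ordered = sorted(values)
    n = len(ordered)
    # The last value in class k sits at position round(k * n / n_classes) - 1.
    return [ordered[round(k * n / n_classes) - 1] for k in range(1, n_classes + 1)]


def class_sizes(values: list[int], upper_bounds: list[int]) -> list[int]:
    """Count how many values fall into each class (first bound not exceeded)."""
    sizes = [0] * len(upper_bounds)
    for v in values:
        for i, upper in enumerate(upper_bounds):
            if v <= upper:
                sizes[i] += 1
                break
    return sizes


# Made-up listing counts for fifteen imaginary states.
counts = [1, 2, 2, 3, 4, 5, 6, 8, 9, 11, 14, 18, 23, 30, 47]

quantile = quantile_breaks(counts, 5)
# Equal intervals over 1..47: boundaries at 1 + k * 9.2, rounded down.
equal_interval = [10, 19, 28, 37, 47]

print("quantile:      ", quantile, class_sizes(counts, quantile))
print("equal interval:", equal_interval, class_sizes(counts, equal_interval))
```

On these invented numbers, quantiles put three states in every class but make the class widths very uneven (1 to 2 versus 19 to 47), while equal intervals reproduce the lopsided 9-3-1-1-1 membership; that is the trade-off between goals described above. With real data, ties (many states with exactly 1 listing, say) can also produce duplicate quantile breaks.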

These data should probably be normalized as well. Consider Texas again: a lot of people named David Wilkins live there. This is probably because a lot of people live there in the first place – it’s one of the most populous states, and more populous places will probably have more people named David Wilkins. Likewise, you can’t find anyone named David Wilkins in places like Wyoming or South Dakota, because approximately no one lives in those states. The pattern shown by this map is highly correlated with the population distribution of the United States. It does not show whether people from Texas are more likely than people from Wisconsin to be named David Wilkins. Instead of mapping how many telephone listings there are in each state for David Wilkins, the author(s) should plot how many listings there are per million inhabitants of the state. Then you would find that Delaware has 8.1 listings for David Wilkins per million inhabitants, vs. only 2.2 for Texas. The name is also particularly popular in South Carolina, the state the Ambassador calls home.
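The per-million rate is just listings divided by population, times one million. A minimal Python sketch; the two states below are hypothetical, chosen only to show how a small state with few listings can outrank a large state with many.

```python
def per_million(listings: int, population: int) -> float:
    """Telephone listings per million inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return listings / population * 1_000_000


# Hypothetical states: a big one with many listings, a small one with few.
big_state = per_million(50, 25_000_000)   # about 2.0 per million
small_state = per_million(7, 875_000)     # about 8.0 per million
print(f"big: {big_state:.1f}, small: {small_state:.1f}")
```

The raw counts favor the big state roughly seven to one, while the rates favor the small state four to one.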

I find it a bit odd that they have region names listed for New England and the Mid Atlantic, but not the rest of the country. Also, I was under the impression that Maine was part of New England.

One Nice Thing: Those inset maps to the right sure are handy.

With that, I will leave off today’s effort to make this blog the #1 item on a Google search for David Wilkins.


8 Responses to “Where Does David Wilkins Live?”

Can you comment at all on the inconsistencies in the numerical ranges given? I’m looking at the second-place digits (the “ones” value) for the numbers at the lower end of the ranges, and that column reads 1 2 2 3 3. The “ones” values for the numbers at the upper end of the ranges read 1 1 2 2 3. But the number of possible values in those ranges is not consistent: the red offers the possibility of 11 values, whereas the teal offers 10. (Assuming that no states have a partial David Wilkins.) So would the next range be 54-63 (in keeping with the numerical pattern) or 54-64, in keeping with the number of possible values in the ranges denoted by apple green and pale blue? And how long will it take for the pattern to roll over – that is, how many more colors/ranges would we need until we’re back to a “ones” value of 1 in both the upper and lower limits of the ranges?

I’m inclined to dissent on a couple of the supposed cartastrophic aspects of this map. The only thing I fault it for is its use of white as you mentioned: the unlabeled white=0 and white on the main map for states that are not actually white in the insets. Oh, and the not-so-great color scheme.

The classification scheme I don’t have a problem with, apart from perhaps the poor labeling of the single-member highest class. It’s just an equal interval classification. The class ranges aren’t even because the 53 possible values from 1 to 53 don’t divide evenly into 5 classes, hence three classes of 11 and two of 10 (11-10-11-10-11). Granted, equal intervals don’t provide a clear picture of the subtleties of a phenomenon’s spatial distribution, but they do give a pretty unbiased look at it. There is some value in seeing that the value is very low in a lot of places and pretty high in just a couple of spots. The trickiness of finding a balance is well illustrated by trying to map world countries by population: you can either have 80% of countries in the lowest class, or you can distribute them more evenly but entirely lose the fact that China and India each have hundreds of millions more people than any other country. (That’s oversimplifying the options, I acknowledge.) How you classify it depends on what you’re trying to convey.
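Assuming the site uses plain equal-interval breaks over the observed range of 1 to 53 listings, and assigns each whole-number count to the first class whose upper boundary it does not exceed, the 11-10-11-10-11 pattern falls out directly. A minimal Python sketch (the function name is hypothetical; exact fractions avoid floating-point surprises at the boundaries):

```python
import math
from fractions import Fraction


def equal_interval_classes(lo: int, hi: int, n_classes: int) -> list[tuple[int, int]]:
    """Split the whole numbers lo..hi into n_classes equal-width classes.

    The boundaries sit at lo + k * (hi - lo) / n_classes; each whole number
    goes into the first class whose upper boundary it does not exceed.
    """
    width = Fraction(hi - lo, n_classes)
    classes = []
    start = lo
    for k in range(1, n_classes + 1):
        end = math.floor(lo + k * width)  # largest whole number not above the boundary
        classes.append((start, end))
        start = end + 1
    return classes


classes = equal_interval_classes(1, 53, 5)
print(classes)                                      # [(1, 11), (12, 21), (22, 32), (33, 42), (43, 53)]
print([end - start + 1 for start, end in classes])  # [11, 10, 11, 10, 11]
```

The class width is 52/5 = 10.4 listings, so the boundaries fall at 11.4, 21.8, 32.2, 42.6 and 53; rounding those down to whole listings is what makes the classes alternate between 11 and 10 values.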

Normalizing the data is another point I would dispute, though doing so makes me a bit more of a carto-rebel. Normalizing data, as you know, is a pretty strong rule in choropleth mapping, but I have come to agree with our pal Zach Johnson’s challenge to that notion, namely that normalizing by area is the only normalization that makes sense as a hard rule. Beyond area, it’s fair to toss the normalization rule out the window if we want. Without normalization, the map here does deliver on its promise to show the “Distribution of David Wilkins Listings across the United States,” and in fact, normalizing it by anything other than area would actually show something else. But this is opening a whole different can of worms. Talk about thinking too much!

All that said, I’d bet large sums of money that the developers of these maps didn’t give a second of thought to those issues. Maybe it worked out for them… or maybe it didn’t.

You do make some very good points. Maybe I won’t criticize maps when I’m sick anymore, since it seems to impair my thought processes and encourage me to say dissent-generating things.

I actually usually disagree with the whole normalize-by-population thing, as you and Zach do. But in this case I was willing to repeat the party line, though I did toy with the notion of arguing that they should show the population density of David Wilkinses. Maybe it’s because I just found the map boring, which is an unfair criticism. What I think might be better, actually, would be to put a normalized and an un-normalized map side by side on the page.

I believe it should remain un-normalized. The end user of this particular image isn’t looking for the likelihood of a person in a state being named “X,” under any averaging scheme, nor for the density. The end user is saying, “OK, I’m looking for X. Which listing will I pull up that shows the most X’s?” Now, if you wanted to break the map (and related data) apart into further subgroups to facilitate such an end-user request, that is another story.

Remember that this is from the Whitepages, so users aren’t there to see how many David Wilkinses there are relative to the population of the state. They just want to know where David Wilkins might be, so I think the un-normalized map works in this case. I’m not sure the population of the state is really that relevant, is it?

And also remember that this map is automatically produced based on your query, so the slightly irregular ranges and a range for a single value can be forgiven, since no human is involved in producing each map. A scheme that works really well for one query (John Smith, maybe?) may not work so well for another. I guess this really is the difference between a rapidly produced map for quick information and a cartographic work of art :) Or am I being too kind?

I looked up additional names, which turned out to be really fun and addictive. But I found that DC is simply marked as a red dot on every map. That’s another problem, especially since red is one of the colors used for classification.