Some U.S. demographic data at the zip code level, conveniently in R

I chuckled when I read your recent “R Sucks” post. Some of the comments were a bit … heated … so I thought I’d send you an email instead.

I agree with your point that some of the datasets in R are not particularly relevant. The way that I’ve addressed that is by adding more interesting datasets to my packages. For an example, see my blog post “choroplethr v3.1.0: Better Summary Demographic Data.” By typing just a few characters you can now view eight demographic statistics (race, income, etc.) for each state, county, and zip code in the US. Additionally, mapping the data is trivial.
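
For readers who want to try it, here is a minimal sketch of what those few characters might look like for the state-level data. It assumes the choroplethr and choroplethrMaps packages are installed, and that the bundled data frame df_state_demographics has a region column and a per_capita_income column, as in the v3.1.0 release the post describes. Mapping income is just an illustrative choice, and the name df is a placeholder.

```r
# Sketch only: assumes choroplethr (>= 3.1.0) and choroplethrMaps are installed.
library(choroplethr)

# Load the state-level demographic summary bundled with the package.
data(df_state_demographics)

# Look at the available statistics (race, income, rent, age, ...).
head(df_state_demographics)

# state_choropleth() maps a column named "value" keyed by "region",
# so copy the statistic of interest into "value".
df <- df_state_demographics
df$value <- df$per_capita_income

# Build the map (a ggplot object) and draw it.
map <- state_choropleth(df,
                        title  = "Per capita income by state",
                        legend = "Dollars")
print(map)
```

The county-level data should work the same way with df_county_demographics and county_choropleth(). As far as I know, the zip code data and maps live in a separate package, choroplethrZip, which is installed from GitHub rather than CRAN, so treat that part as an assumption to check against the blog post.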

I haven’t tried this myself, but assuming it works . . . it’s great to be able to make maps of American Community Survey data at the zip code level!

When we looked at databases for purchase, we saw that all the zip code level material was interpolated or extrapolated, and that the numbers were misleadingly precise. I think that’s a huge problem in lots of data analysis: the appearance of precision hides the whole process that generated the numbers, which actually include substantial error. That error stays buried in the results until you test something that needs the precision to be more than apparent. I like to use this idea to point out that “black swans” are everywhere, and that what we label as such is just a way of saying there was hidden error: hidden variability that only pops out when something tests that area. But then I loved it when I’d get a projection that repeated a set increase in insurance premiums, because then I could ask “how much of your net depends on that stability?” It was really easy to hide stuff, pathetically simple even, but what bothered me was the stuff being hidden from me! Talk about forking paths!

Many of your readers will probably know this, but I’d feel remiss if someone didn’t point out that technically the Census releases data for “Zip Code Tabulation Areas” (ZCTAs), which are areal approximations of actual USPS zip codes. The zip codes themselves are assigned only to points.