
On Salutary Obfuscation

Last week, a map which I made about swearing on Twitter gained its fifteen minutes of Internet fame. I heard a lot of comments on the design, and one of the things that many of the more negative commenters (on sites other than mine) were displeased by was the color scheme. It was, as they said, very hard to distinguish between the fifteen different shades of red used to indicate the profanity rate. This complaint was probably a good thing, because I did not particularly want readers to tell the shades of red apart and trace them back to a specific number.

In designing the map, I took a couple of steps which made it more difficult for people to get specific data off of it. Before I can explain why I would want to do this, first you need a quick, general background on how the map was made.

This is a map based on a very limited sample of tweets. Twitter will give you a live feed of tweets people are making, but they will only give you about 10% of them, chosen randomly. On top of that, I could only use tweets that are geocoded, which means the user had to have a smartphone that could add a GPS reading to the tweet. A third limitation was that I could only use tweets which were flagged as being in English, since I don’t know any curse words in other languages besides Latin. Finally, there were occasional technical glitches in collecting the data, which caused the program my colleagues and I were using to stop listening to the live feed from time to time. If you add those four limitations up, it means that I made use of somewhere between about 0.5% and 1% of all tweets in the US during the time period analyzed. Possibly not a strongly representative sample, but still a large one at 1.5 million data points.
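The filtering steps above can be sketched in a few lines. This is purely illustrative, and the tweet structure shown is an assumption on my part, not the actual Streaming API payload:

```python
# Hypothetical sketch of the sampling filters described above: keep only
# tweets that are geocoded AND flagged as English. The field names here
# are assumptions, not the real Twitter Streaming API schema.
def keep_tweet(tweet):
    """True if the tweet is English-flagged and carries GPS coordinates."""
    return tweet.get("lang") == "en" and tweet.get("coordinates") is not None

# A toy sample standing in for the live feed.
sample = [
    {"lang": "en", "coordinates": (43.07, -89.40), "text": "hello"},
    {"lang": "en", "coordinates": None, "text": "no gps here"},
    {"lang": "fr", "coordinates": (48.85, 2.35), "text": "bonjour"},
]

usable = [t for t in sample if keep_tweet(t)]
```

Only the first toy tweet survives both filters, which gives a feel for how quickly these restrictions shrink the sample.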

In that limited sample, I searched for profanities. This is based on my subjective assessment of what may be a profanity (as many readers sought to remind me), and the simple word searches I did may have missed more creative uses of language. Once I had the number and location of profanities, I could start to do some spatial analysis. I didn’t want to make a map of simply the number of profanities, because that just shows where most people live, not how likely they are to be swearing. So, I set up some calculations in my software so that each isoline gives the number of profanities in the nearest 500 tweets, giving a rate of profanity instead of a raw total. Unfortunately, for places that are really sparsely populated, like Utah, the algorithm had to search pretty far, sometimes 100 miles, to get 500 tweets, meaning the lines you see there are based partially on swearing people did (or, didn’t) in places far away. If I hadn’t done this, then there would be too few data points in Utah and similar places to get a good, robust rate (counting the # of profanities in 10 tweets is probably not the most representative sample, we need something much bigger to be stable). Maybe I should have just put “no data” in those low areas, but that’s another debate.
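The nearest-500-tweets rate described above can be sketched as a simple k-nearest-neighbors calculation. This is not my actual ArcMap workflow, just a minimal stand-in using plain Euclidean distance (a real analysis would work in projected coordinates):

```python
import math

# Illustrative sketch of the rate calculation described above: estimate a
# profanity rate at a location from the k tweets nearest to it, rather than
# from a raw count. Not the actual ArcMap implementation.
def profanity_rate(point, tweets, k=500):
    """Profanities per 100 tweets among the k tweets nearest to `point`.

    `tweets` is a list of (x, y, is_profane) tuples. Euclidean distance is
    used for simplicity.
    """
    nearest = sorted(tweets, key=lambda t: math.dist(point, (t[0], t[1])))[:k]
    return 100.0 * sum(t[2] for t in nearest) / len(nearest)
```

In sparse areas the k nearest tweets can be very far away, which is exactly the Utah problem described above: the rate at a point ends up borrowing data from distant places.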

So, the map is based on a limited sample of tweets, and the analysis requires some subjective judgments of what’s a swear word, and then some heavy smoothing and borrowing of data from areas nearby in order to get a good number. What all that means is: you shouldn’t take this as a really precise assessment of the swearing rate in your city. If I had chosen to look for different words, or if the Twitter feed had sent a different random 10% of tweets, or if I had chosen to search profanities in the nearest 300, rather than 500 tweets, then the map would end up looking different. Peaks would drift around some and change shape. But my feeling is that the big picture would not change significantly. Under different conditions, you’d still see a general trend of more profanity in the southeast, a low area around Utah, etc. The finer details of the distribution are the most shaky.

Okay, back to my main point about trying to make it difficult to get specific numbers. What I wanted readers to do is focus on that big picture, which I think is much more stable and reliable. And so I made some decisions in the design that were intended to gently push them toward my desired reading. First off is that color scheme, which has only small changes between each level of swearing, which makes it hard to look at your home and tell if it’s in the zone for 12 or 13 or 14 profanities per 100 tweets. What’s important is that you know your home is at the high end. Whether it measured at 12 or 14 doesn’t matter, because that number is based on a lot of assumptions and limitations, and is likely to go up or down some if I collected on a different day. The color scheme makes the overall patterns pretty clear — bright areas, dark areas, medium areas, which is where I want the reader to focus. It’s weaker in showing the details I would rather they avoid.
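A low-contrast ramp like the one described above can be sketched by interpolating between two nearby shades. The endpoint colors here are my own guesses, not the ones actually used on the map:

```python
# Sketch of a sequential color ramp with deliberately small steps between
# classes, as described above. The light/dark endpoints are assumptions,
# not the map's actual colors.
def red_ramp(n, light=(255, 220, 210), dark=(80, 0, 10)):
    """Return n RGB triples stepping evenly from `light` to `dark`."""
    ramp = []
    for i in range(n):
        t = i / (n - 1)  # 0.0 at the light end, 1.0 at the dark end
        ramp.append(tuple(round(l + t * (d - l)) for l, d in zip(light, dark)))
    return ramp
```

With fifteen classes squeezed between two reds, adjacent steps differ only slightly, so individual classes are hard to pick out while the overall light-to-dark trend stays obvious.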

The other thing I did was to smooth out the isolines manually. The isolines I got from my software had a very precise look to them. Lots of little detailed bends and jogs, which makes it look like I knew exactly where the line between 8 and 9 profanities per 100 tweets was. It lends an impression of precision which is at odds with the reality of the data set, so I generalized them to look broader and more sweeping. The line’s exact location is not entirely reflective of reality, so there’s no harm in moving it a bit, since it would shift around quite a bit on its own if the sample had been different.
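The smoothing itself was done by eye in Illustrator, but as a rough programmatic stand-in, one pass or two of Chaikin's corner-cutting algorithm produces a similar effect of rounding off the small jogs in a polyline:

```python
# Rough programmatic analogue of the manual smoothing described above
# (the actual smoothing was done by hand in Illustrator). Chaikin's
# corner-cutting algorithm replaces each corner with two points that sit
# a quarter of the way along the adjacent segments.
def chaikin(points, passes=2):
    """Smooth an open polyline of (x, y) tuples by cutting its corners."""
    for _ in range(passes):
        out = [points[0]]  # keep the original start point
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            out.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))
            out.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))
        out.append(points[-1])  # keep the original end point
        points = out
    return points
```

Each pass trades detail for sweep, which is the point: the result looks softer and makes fewer claims about exactly where the line sits.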

Original digital isolines

Manually smoothed in Illustrator

This is a subtler change, but I hope it helped make the map feel a bit less like 100% truth and more like a general idea of swearing. Readers have a rather frightening propensity for assuming what they see in a map is true (myself included), and I’d rather they not take the little details as though they were fact.

If I had it to do over again, I probably would have made it smaller (it’s 18″ x 24″). Doing it at 8.5″ x 11″ would have taken the small details even further out of focus and perhaps kept people thinking about regional, rather than local, patterns. Maybe I shouldn’t have used isolines at all, but rather a continuous-tone raster surface. There are many ways to second-guess how I made the map.

Anyway, the point I mostly want to make about all of this is that it’s sometimes preferable to make the design focus on the big picture, and to do so you may need to obfuscate the little things. Certainly, though, a number of people were unhappy that I impaired their preferred reading of the map. People like precision, and getting specific information about places. But I didn’t feel like I had the data to support that, and I would be misleading readers if I allowed them to do so easily.

I think you make a very fair point, and it’s something I’ve thought about doing since then. I think, at the time, I was still interested in fooling around with filled isolines as a form of practice. But, if I had known mass numbers of people were going to see the map, I might have reconsidered.

Thanks for the kind words. I tend to think of cartography as simply a particular variety of graphic design, and I try to teach my students to think of themselves as artists.

The analysis and projection were done in ArcMap and then sent over to Adobe Illustrator for colors, labeling, and other visual elements. This is the standard workflow that we mostly use at UW-Madison — it’s what I was taught as a student here, so it’s what I know best.

The last cartography class I took was about 10 years ago, with ArcView on ancient computers and no art software in sight.

I am very familiar with the layout tools in ArcGIS 9.3, but I can’t create something as nice as your map. I especially hate that I cannot give lovely faded shadows to borders. I am not very good with Illustrator, but I am trying to expand my skills. So I will give Illustrator a go and get a book or two.

Thank you for the information, but especially for your blogs. They help with design ideas and with learning which mistakes to avoid. Although I have to be honest that I got a bit depressed after discovering your Cartastrophe blog. Made me rethink my ‘already published/cannot be changed’ maps and cringe.

I think the map demonstrates the usefulness of social data in the spatial realm, and that is absolutely important. However, I have to disagree – we have no way of knowing whether or not the big picture would change significantly. The interpretation is like taking the “means of means” and then the means again. This is such a limited sample and subjective use that it’s like listening to a Lipitor commercial and its adverse effects – you tune out after the third or fourth symptom. Nevertheless, I admire your spatial prowess and hope you continue forcing the brain to think!

What an excellent — and telling — map. I am curious about your choice of color gradient. On my screen at least, it all comes across as quite dark. I wonder if a gray scale would serve the eye better, or perhaps a larger “range” of red.