Suppression: The Mystery of the Vanishing Data

I am a reformed database administrator who now works in the Human Dimensions program, doing GIS work with data about people. Although I work with data about people, I'm not normally allowed to talk TO people, since I am a former database administrator.
I started doing GIS work at a small college in the hinterlands of upstate New York before moving to the warmer climate of South Carolina, where I earned a master's degree in environmental science at the College of Charleston. The College of Knowledge is where I really got my GIS chops, working on everything from mapping bobcats on a barrier island to creating an interactive campus map. I was lucky enough to land an internship at the NOAA Office for Coastal Management straight out of grad school, and thus began the circuitous path from spatial analyst to general tech geek to database administrator and back to spatial analyst. Now I spend my days clicking buttons and looking at maps about things like ocean economics and social vulnerability.

Jobs, GDP, and other economic indicators are all the rage these days. You can’t turn on the TV or radio without hearing about some sort of jobs report or analysis of how fast (or slow) the economy is growing. With this huge focus on economics, more and more people are requesting economic data for their counties and states, only to find nasty little codes where the numbers should be. These codes signify that the data are suppressed. Suppression. Just the sound of the word makes me feel deeply…annoyed. But what is suppression? And why are we talking about economic data on a geospatial blog? Why, thank you for asking!

In simple terms, “suppression” means that even though the data exist, federal and state laws prevent us from reporting them. These laws are designed to protect the privacy of individual businesses. Say you work for Acme Co., and I work for Widgets, Inc. If I know how much you pay your employees, I could have a competitive advantage when we are both trying to hire the same person. To avoid giving anyone an unfair advantage, the law requires government agencies to hide, or “suppress,” any data that could reveal information about a specific business, for example when a statistic represents only a few businesses or is dominated by one or two very large businesses. Suppression usually occurs in small areas, like counties, or in narrowly defined classes of business, like seafood processing plants. The three most common reasons for suppressing data are:

a statistic represents a very small number of businesses

a statistic is dominated by data from one or two very large businesses

a statistic could be used to compute the value of another suppressed statistic
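For readers who like to see the logic spelled out, here is a minimal Python sketch of how the first two rules might be checked for a single statistic. The cutoffs (fewer than three businesses, or the top two holding more than 80% of the total) are made-up values for illustration; real agencies set their own thresholds and often don't publish them. The Cell class and needs_primary_suppression function are hypothetical names, not part of any agency's software.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds for illustration only; real agencies use their own
# rules and often do not publish the exact values.
MIN_BUSINESSES = 3       # assumed cutoff for "a very small number of businesses"
DOMINANCE_TOP_N = 2      # look at the largest one or two businesses...
DOMINANCE_SHARE = 0.80   # ...and suppress if they hold more than 80% of the total


@dataclass
class Cell:
    """One statistic, e.g. total wages for one sector in one county (hypothetical)."""
    name: str
    business_values: List[float]  # each business's contribution to the statistic


def needs_primary_suppression(cell: Cell) -> bool:
    """Return True if the cell trips either of the first two suppression rules."""
    values = cell.business_values
    # Rule 1: the statistic represents very few businesses.
    if len(values) < MIN_BUSINESSES:
        return True
    # Rule 2: the statistic is dominated by one or two very large businesses.
    total = sum(values)
    if total <= 0:
        return False
    top = sum(sorted(values, reverse=True)[:DOMINANCE_TOP_N])
    return top / total > DOMINANCE_SHARE


# Made-up example data.
cells = [
    Cell("Seafood Processing", [1_200_000.0, 40_000.0]),                    # only two businesses
    Cell("Ship Building", [9_000_000.0, 300_000.0, 150_000.0, 100_000.0]),  # one giant yard
    Cell("Tourism & Recreation", [50_000.0] * 40),                          # many similar businesses
]
for cell in cells:
    print(cell.name, needs_primary_suppression(cell))
# Seafood Processing True
# Ship Building True
# Tourism & Recreation False
```

The third rule can't be checked one cell at a time, because it depends on which other cells and totals get published alongside it.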

Still confused? It’s like this: let’s say you’re watching an NFL game on TV. In a crowd like that, it’s hard to pick out any one person. But if the camera zooms in on someone in the crowd who is dressed like a chicken, you might be shocked to see your daughter standing next to that guy, holding his hand. When she gets home, your daughter will probably wish they had used that little fuzzy circle to blur out her face on the telecast. Suppression is like that fuzzy circle: where you expect to find data, you’ll see a character like “D” (indicating a “disclosure” issue) or an impossible number (like -9999 employees).
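If you pull these numbers into your own tables or GIS layers, it pays to convert those markers to true blanks before doing any math, so a -9999 doesn't quietly get summed into a total or drawn as a giant negative value on a map. Here is a minimal Python sketch. The markers it checks for ("D" and -9999) come from the examples above and are assumptions, since each data source documents its own codes; the sample table and function names are hypothetical.

```python
import csv
import io
from typing import List, Optional

# Assumed markers, taken from the examples in the text; check the documentation
# for your data source, since each one uses its own codes.
SUPPRESSION_MARKERS = {"D"}
PLACEHOLDER_VALUES = {-9999.0}


def parse_value(raw: str) -> Optional[float]:
    """Convert a raw table cell to a number, or None if the value is suppressed."""
    text = raw.strip()
    if text in SUPPRESSION_MARKERS:
        return None
    value = float(text)
    # Treat impossible placeholder numbers (e.g. -9999 employees) as missing.
    if value in PLACEHOLDER_VALUES:
        return None
    return value


def find_suppressed(csv_text: str, label_column: str, value_column: str) -> List[str]:
    """List the rows (e.g. sectors) whose value is suppressed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[label_column] for row in reader if parse_value(row[value_column]) is None]


# Made-up sample table for one county.
sample = """sector,employment
Living Resources,D
Marine Construction,-9999
Tourism & Recreation,1520
"""
print(find_suppressed(sample, "sector", "employment"))
# ['Living Resources', 'Marine Construction']
```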

The take-home message is clear: federal and state laws do not allow the publication of statistics that could be used to harm businesses. Federal agencies are very serious about this. You get to play a lot of basketball in jail, but the food is terrible.

We do the same thing with maps sometimes. In Google Maps, many military installations and other “sensitive” locations are not available in Google Street View, and some have imagery that is heavily pixelated so it cannot be seen clearly when zoomed in. Those data could be considered “suppressed.” You know the data are there, but you just can’t get them at the same scale as the other data. An example of this can be seen in the image below.

So, after learning all of this, you’re probably thinking that using economic data is a lost cause, but take heart! Totals for a smaller geography like a county often include the suppressed data (although there is some tricky math involved, so you can’t just subtract from the total to get the number for the suppressed sector). In addition, the larger the geography, the less suppression is needed. States have much less suppression than counties, and so on up the chain. Below is an example from the ENOW Explorer:
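Here is a small Python sketch of why a plain subtraction doesn't give you a suppressed sector, and what it can still tell you. It assumes the county total is the exact sum of its sectors, which may not hold if an agency rounds or otherwise adjusts the published numbers. If only one sector were hidden, the total minus the published sectors would reveal it, which is the third suppression reason in the list above; hiding a second sector (often called complementary suppression) closes that gap, so the subtraction only gives the combined value of the hidden sectors. All numbers and function names are made up for illustration.

```python
from typing import Dict, Optional

Sectors = Dict[str, Optional[float]]  # None marks a suppressed sector


def exposes_single_cell(sectors: Sectors) -> bool:
    """True if exactly one sector is hidden, so subtracting from the total would reveal it."""
    return sum(1 for v in sectors.values() if v is None) == 1


def suppressed_residual(total: float, sectors: Sectors) -> float:
    """Combined value of all suppressed sectors, assuming the total is an exact sum."""
    published = sum(v for v in sectors.values() if v is not None)
    return total - published


# Made-up county data: one sector suppressed for disclosure, and a second
# suppressed so the first can't be recovered by subtraction.
county_total = 5000.0
sectors: Sectors = {
    "Living Resources": None,
    "Marine Construction": None,
    "Tourism & Recreation": 3200.0,
    "Marine Transportation": 1100.0,
}
print(exposes_single_cell(sectors))                # False
print(suppressed_residual(county_total, sectors))  # 700.0
```

So the residual (700 in this made-up example) tells you how much is hidden in total, but not how it splits between the two suppressed sectors.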

So when you see suppression in the data, don’t fret; dig a little deeper to find out what may be going on. You should be able to find out which sectors are being suppressed and, with a little investigative work, maybe figure out what is causing the suppression. Often, local knowledge will point to the answer: the two really big shipping companies, or the only shipyard in the area. For a more in-depth look at the different types of suppression, check out this article: http://www.incontext.indiana.edu/2008/july-august/2.asp