Location Data Is Not Anonymous

They know where you live.

It is not much consolation if they do not yet know your name, is it?

It made headlines this month when a fitness-tracking company's intentional data release disclosed a long list of national security secrets, along with the exercise habits of thousands of specific, named individuals.

The released data was supposedly anonymous. Unfortunately, as we are now learning, location data is not effectively anonymous if it is sufficiently precise, or if it accumulates even a little history about each person.

To understand how direct the link between location and identity can be, consider this hypothetical question: who is that person sitting at Rick’s desk day after day? It is not much of a riddle, is it? This simple question points to the ease with which location data can be converted to a specific identity. To make matters worse, the question is not quite as hypothetical as it might seem. GPS data, when averaged over a short time span, may be precise enough to separate one desk from the next. Mobile advertisers who know your GPS location may not need any more information to guess who you are.

As researchers pointed out years ago, it becomes much easier to identify people once you know two of their locations. For almost all of us, the places where we spend the most time are home, work, school, the homes of close friends and family members, and favorite businesses. Someone who merely knows which building a person is in at night may have all but identified that person. Given two or three regular locations, you are almost certainly talking about one specific person. Years ago, researchers were able to de-anonymize most of the data records they studied knowing only a person's year of birth and home and work ZIP codes. A ZIP code, though, is only a broad measure of location. Substitute a precise GPS location, and anonymity disappears completely.
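To make the two-location point concrete, here is a minimal Python sketch run on a handful of invented records. It counts how many people are the only one sharing a given value, the idea behind k-anonymity, where a record with a group size of 1 is effectively identified. The street and workplace names, the `people` list, and the `unique_count` helper are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy records: (home_area, work_area) for each person.
# These are invented for illustration; they are not real data.
people = [
    ("Maple St", "Hospital"),
    ("Maple St", "Hospital"),
    ("Maple St", "City Hall"),
    ("Maple St", "Warehouse"),
    ("Oak Ave", "Hospital"),
    ("Oak Ave", "School"),
    ("Oak Ave", "School"),
    ("Pine Rd", "City Hall"),
]

def unique_count(records, key):
    """Count records whose value under key() is shared by no other record."""
    counts = Counter(key(r) for r in records)
    return sum(1 for r in records if counts[key(r)] == 1)

# Knowing only where someone lives.
home_only = unique_count(people, lambda r: r[0])
# Knowing where someone lives AND where they work.
home_and_work = unique_count(people, lambda r: r)

print(f"unique by home alone:     {home_only} of {len(people)}")
print(f"unique by home plus work: {home_and_work} of {len(people)}")
```

With home alone, only the single Pine Rd resident stands out; adding the workplace makes half of the eight invented people unique. Adding a third regular location, or swapping the street names for precise GPS coordinates, would shrink the groups further still.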

I have seen this effect myself when commuting by train. When I board the train in the morning, a hundred people might be boarding the same train at the same station. When I change trains, though, perhaps only ten people from that first station change with me. By the time I reach my final station, I am probably the only passenger from the original station who has made that exact trip. Even in the limited domain of a rail network, then, place is identity.

Problems arise because location is not treated as the sensitive data that it is. The exercise map that showed which houses people live in, their favorite jogging trails, and the top-secret military bases where they work became possible only because no one gave any thought to the security implications of compiling and releasing that data. We will need to arrive at a better collective understanding of what location data can reveal and of the steps needed to protect it.