Can Cellphone Metadata Hurt Your Privacy and Save America?

By Brian Fung

May 13, 2013

When Stone Librande took breaks from playtesting the new SimCity, he noticed something mesmerizing. Pausing from the construction frenzy that defines the rest of the game, SimCity’s lead designer discovered that his citizens’ schedules often created beautiful, shifting patterns of motion.

“These flows of Sims build up at certain times and then the buses and streets are empty and then they build back up again,” he told Venue. “There’s something really hypnotic about that when you play the game. I find myself not doing anything but just watching in this mesmerized state—almost hypnotized—where I just want to watch people drive and move around in these flows. At that point, you’re not looking at any one person; you’re looking at the aggregate of them all. It’s like watching waves flow back and forth like on a beach.”

Increasingly in the real world, researchers want to be able to emulate that same experience. There’s a lot to be learned by examining the activities of whole urban populations, and one way we’re beginning to do that is by grabbing metadata off of people’s mobile phones. That includes information about nearby cell towers, the timestamps of calls and text messages, and other details bundled into the “call detail records” collected by wireless carriers. In fact, with those (anonymized) records, it’s now possible to single out one user’s daily movements from everyone else’s using no more than four time- and location-specific data points.

That technique doesn’t expose personally identifiable information such as names and addresses—at least, not on its own. But cross-referenced with enough outside information, those records could still lead a determined sleuth to specific individuals.
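To see why a handful of spatio-temporal points is so identifying, here is a toy sketch, with entirely invented users, towers, and timestamps, not the researchers’ actual method or data. Each user’s trace is a set of (hour, tower) sightings; a sleuth who learns just a few such sightings from outside sources can filter an “anonymized” dataset down to one person:

```python
def matching_users(records, observed_points):
    """Return the ids of users whose traces contain every observed (hour, tower) point."""
    return {uid for uid, trace in records.items() if set(observed_points) <= trace}

# Hypothetical anonymized traces: user id -> set of (hour, cell-tower) sightings.
records = {
    "user_a": {(8, "T1"), (12, "T4"), (18, "T2"), (22, "T7")},
    "user_b": {(8, "T1"), (12, "T4"), (18, "T9"), (23, "T7")},
}

# Two known points still leave ambiguity...
matching_users(records, [(8, "T1"), (12, "T4")])
# ...but four points narrow the match to a single user.
matching_users(records, [(8, "T1"), (12, "T4"), (18, "T2"), (22, "T7")])
```

With realistic populations the same logic applies: because people’s movement patterns are highly distinctive, a few externally learned sightings usually match only one trace.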

Luckily, scientists have been working on a way to prevent that. If the problem with anonymized data is that it can be combined with still more data to reveal a person’s identity, then the solution is to change the data deliberately: not so much that it loses its research value, but just enough that no one can confidently connect the anonymized records to outside information.

This approach produces what are called “synthetic” records, and it relies on a process known as “differential privacy”:

Injecting noise means deliberately altering the aggregated home and work locations to reduce the influence of any one individual’s data. Likewise, the aggregated call times are perturbed to mask any individual’s contribution. Taken together, such tweaks would frustrate any effort to align the anonymized database with outside records.
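The standard way to inject such noise is the Laplace mechanism from the differential-privacy literature. A minimal sketch follows; the hourly call counts and the privacy parameter epsilon are illustrative assumptions, not values from any real deployment:

```python
import math
import random

def laplace_noise(scale):
    """Draw one sample from a Laplace(0, scale) distribution via inverse-CDF sampling."""
    u = random.random() - 0.5          # uniform in [-0.5, 0.5)
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1 - 2 * abs(u))

def noisy_count(true_count, epsilon):
    """Publish a count with Laplace noise calibrated to a sensitivity of 1.

    A single person joining or leaving changes a count by at most 1, so
    noise of scale 1/epsilon hides any individual's contribution."""
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical aggregate: calls handled by one tower, per hour.
calls_per_hour = {"08:00": 131, "09:00": 240}
published = {hour: round(noisy_count(n, epsilon=0.5)) for hour, n in calls_per_hour.items()}
```

Smaller epsilon means more noise and stronger privacy; the published counts stay close enough to the truth for population-level research while no longer reflecting any one subscriber exactly.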