Universal Identifier

At birth, your data trail began. You were given a name, your height and weight were recorded, and you were issued a Social Security number, created merely to keep track of your earnings. A few years later, you were enrolled in day care, you received your first birthday party invitation, and you were recorded in a census. Today, you have bank accounts and credit cards, and a smartphone that always knows where you are.

Perhaps you post family pictures on Facebook; tweet about politics; and reveal your changing interests, worries, and desires in thousands of Google searches. Sometimes you share data intentionally, with friends, strangers, companies, and governments. But vast amounts of information about you are collected with only perfunctory consent—or none at all. Soon, your entire genome may be sequenced and shared by researchers around the world along with your medical records, flying cameras may hover over your neighborhood, and sophisticated software may recognize your face as you enter a store or an airport.

As the Internet has developed, the concept of privacy has been changing, if not eroding, quickly. In order to “exist” on the Internet, you have to share data about yourself, your friends, your family and more. Privacy as we once knew it is gone. And in this world of information and surveillance, it may turn out that George Orwell’s vision of the future was far more benign than we ever imagined. In his dystopia, there was one Big Brother; in our world, we have many such “Brothers,” and we don’t know who most of them are.

Massive Data Collection

Massive data collection by businesses and governments calls into question the traditional methods for protecting privacy, which are underpinned by two core principles: notice, meaning that there should be no data collection system whose existence is secret, and consent, meaning that data collected for one purpose should not be used for another without the user's permission. But notice, designated as a fundamental privacy principle in a different era, makes little sense when collection consists of countless small pieces of information, and consent is no longer realistic given the number and complexity of the decisions that must be made.

The news that recently rocked much of the privacy world comes from a study conducted by data scientists from around the world. Most privacy laws in the U.S. encourage anonymization as a key means of privacy protection. However, in a study appearing in Science, part of the journal’s “Privacy in a Data-Driven World” special issue, data scientists showed they can identify a person with more than 90 percent accuracy by looking at just four purchases, and only three purchases if the price is included. As an example, the researchers described looking at data from September 23 and 24 for someone who went to a bakery one day and a restaurant the other. Searching through the data set, they found there could be only one person who fits the bill; they called him Scott. The study states: “and we now know all of his other transactions, such as the fact that he went shopping for shoes and groceries on 23 September, and how much he spent.”

They were able to accomplish this feat even after companies had “anonymized” the transaction records, that is, after they said they had wiped away names and other personal details. Using both the credit card and transaction information, the researchers identified 90 percent of the individuals in the data set. This study blew away the notion that anonymizing data creates some semblance of privacy. The research found that adding just a glimmer of information about a person from an outside source was enough to identify him or her in the trove of financial transactions the researchers studied.
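To make the idea concrete, here is a minimal sketch of this kind of re-identification. It is not the researchers' code or data: the record layout, the pseudonyms (u1, u2, u3), the shops, and the prices are all invented for illustration. The sketch assumes the "anonymized" records still carry a consistent pseudonym per cardholder, and that an outsider knows only two facts about the target: a bakery visit one day and a restaurant visit the next.

```python
from collections import namedtuple

# Hypothetical "anonymized" record: the name is gone, but a pseudonym
# still links all of one person's purchases together.
Transaction = namedtuple("Transaction", ["pseudonym", "date", "shop", "price"])

def matching_pseudonyms(transactions, known_points):
    """Return the pseudonyms whose records contain every known (date, shop) point."""
    seen = {}
    for t in transactions:
        seen.setdefault(t.pseudonym, set()).add((t.date, t.shop))
    required = set(known_points)
    return sorted(p for p, points in seen.items() if required <= points)

# Invented toy data set.
transactions = [
    Transaction("u1", "09-23", "bakery", 4.50),
    Transaction("u1", "09-24", "restaurant", 32.00),
    Transaction("u1", "09-23", "shoes", 79.99),
    Transaction("u1", "09-23", "groceries", 41.20),
    Transaction("u2", "09-23", "bakery", 3.00),
    Transaction("u2", "09-24", "cinema", 12.00),
    Transaction("u3", "09-24", "restaurant", 18.75),
]

# Outside knowledge about the target: just two observations.
known = [("09-23", "bakery"), ("09-24", "restaurant")]
candidates = matching_pseudonyms(transactions, known)

# If exactly one pseudonym fits, every other purchase under it is exposed.
exposed = []
if len(candidates) == 1:
    exposed = [t for t in transactions if t.pseudonym == candidates[0]]
```

In this toy data only one pseudonym matches both observations, so the shoe and grocery purchases filed under it become attributable to the target, much as in the Scott example.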

Correlation Attacks

This study substantiates privacy advocates’ concerns about “correlation attacks.” The study points out that such hacks of personal data became famous in 2014, when the New York City Taxi and Limousine Commission released a data set of the times, routes, and cab fares for 173 million rides. Passenger names were not included. But armed with time-stamped photos of celebrities getting in and out of taxis (there are websites devoted to celebrity spotting), bloggers, after deciphering taxi driver medallion numbers, easily figured out which celebrities paid which fares. The MIT scientists were able to demonstrate how relatively simple it is for knowledgeable data crunchers to duplicate personal data hacks like the New York City taxi hack.
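A correlation attack of this kind amounts to joining two data sets on what they share. The sketch below is only an illustration under stated assumptions: the field names, medallion numbers, fares, and the two-minute matching window are hypothetical, and the medallion numbers are assumed to have been deciphered already. It does not reproduce the actual released taxi data or the bloggers' method.

```python
from datetime import datetime, timedelta

def correlate(trips, sightings, window=timedelta(minutes=2)):
    """Pair each sighting with trips in the same cab that start within the window."""
    matches = []
    for s in sightings:
        for trip in trips:
            same_cab = trip["medallion"] == s["medallion"]
            close_in_time = abs(trip["pickup"] - s["time"]) <= window
            if same_cab and close_in_time:
                matches.append((s["who"], trip["fare"]))
    return matches

# Invented "anonymous" trip records: no passenger names.
trips = [
    {"medallion": "7X21", "pickup": datetime(2013, 7, 8, 18, 30), "fare": 10.50},
    {"medallion": "7X21", "pickup": datetime(2013, 7, 8, 21, 5), "fare": 23.00},
    {"medallion": "4B09", "pickup": datetime(2013, 7, 8, 18, 31), "fare": 7.25},
]

# Invented outside information: a time-stamped photo showing the cab's medallion.
sightings = [
    {"who": "Celebrity A", "medallion": "7X21", "time": datetime(2013, 7, 8, 18, 31)},
]

matches = correlate(trips, sightings)
```

Neither data set names a passenger and a fare together, yet the join links the person in the photo to one specific fare.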

Thanks to revelations about National Security Agency programs from Edward Snowden, the former NSA contractor who stole an estimated 1.7 million secret files, Congress, the courts, President Barack Obama and the public are wrestling with the question of how much information the government should be allowed to collect. How big does a pile of mundane data about ordinary citizens have to be before the people who are supposedly keeping us safe turn into an even bigger Big Brother?

This and other important discussions about privacy have begun, and some excellent resources are now available for people trying to get their arms around the changing face of privacy.