Terms and Conditions: Personal Privacy in the Age of Big Data

Every time we tap into the free and seemingly limitless knowledge offered by the Internet, our digital footprint is aggregated in a distant cloud, where it is either monetized as valuable information for corporations or studied to build a deeper understanding of our social world. Every time we download a new app, we press “I Agree” as soon as the user agreement appears. Agreeing to the terms gives us connectivity, convenience, and value. Yet researchers and developers are now showing that our personal data is not as private or anonymous as users expect. To protect online privacy, developers across the globe are launching initiatives that promote individual ownership of personal data.

We are on the verge of what many call a “Big Data Revolution.” Although the concept of big data has been around for a while, what makes it valuable today is technologists’ growing ability to mine reality from it, that is, to use it to better understand our social world.

Big data “revolutionaries” are primarily concerned with mitigating the risks and shortcomings of user privacy agreements. Researchers have recently launched new initiatives to safeguard our online privacy. Such initiatives are critical to the future of this “revolution” because they promise privacy to individual users while also easing the legal burden of being accountable for other people’s personal data.

MIT Media Lab researchers Sandy Pentland, a pioneer in the field of big data, and Yves-Alexandre de Montjoye, the founder of openPDS (an open personal data store), are helping to shape the future of personal data protection. Researchers at the lab have shown how far user agreements fall short of guaranteeing that our digital identities stay anonymous. Montjoye conducted a study to quantify how difficult it is to re-identify individuals in a supposedly anonymized database. “We wanted to characterize the amount of steps needed if I wanted to access an anonymous database and re-identify you,” Montjoye explained in a recent interview with the Tufts Observer. “When it comes to location data, it is surprisingly low… Four approximated places and times where you were is enough for someone to re-identify 95 percent of the people in the anonymous database.”
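To make the idea of re-identifying someone from a handful of places and times concrete, here is a minimal toy sketch of how one might measure that risk. It is not the study’s code or data: it assumes each person’s trace is a set of (place, hour) points, that an attacker knows k points drawn at random from a target’s trace, and that the target is re-identified when no other trace contains all k points. The function name `unicity`, the sampling scheme, and the three example traces are illustrative assumptions.

```python
import random


def unicity(traces: dict[str, set[tuple[str, int]]], k: int, seed: int = 0) -> float:
    """Estimate the share of users singled out by k points from their own trace.

    Toy model (an assumption, not the published method): for every user with at
    least k points, draw k of them at random as the attacker's outside knowledge,
    then check whether that user is the only one whose trace contains all k.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    eligible = unique = 0
    for user, points in traces.items():
        if len(points) < k:
            continue  # too few points to draw k of them
        eligible += 1
        known = set(rng.sample(sorted(points), k))
        # Every user whose trace is consistent with what the attacker knows.
        matches = [u for u, p in traces.items() if known <= p]
        if matches == [user]:
            unique += 1
    return unique / eligible if eligible else 0.0


# Hypothetical toy data: three people with overlapping daily routines.
traces = {
    "alice": {("cafe", 8), ("office", 9), ("gym", 18), ("home", 22)},
    "bob": {("cafe", 8), ("office", 9), ("park", 17), ("home", 22)},
    "carol": {("cafe", 8), ("library", 10), ("gym", 18), ("home", 21)},
}
print(unicity(traces, 4))  # → 1.0: each full four-point trace matches one person
```

In this toy model, a shared point such as (“cafe”, 8) matches everyone, while a rarer one such as (“park”, 17) already points to a single person; real mobility traces contain many such rare combinations, which is why so few points can be enough.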

How should individuals manage their personal data if they have no concrete idea what data is being collected, where it is going, and who can access it? Researchers like Montjoye have responded by building a more efficient and transparent system for exchanging data. Montjoye’s project, openPDS, provides a platform through which we can limit how much data we expose through user agreements. SafeAnswers, an extension of openPDS, keeps personal data in a store where individuals decide where their data goes and how it is used. “The idea of [openPDS/SafeAnswers] is you connect to your data, it exists in your own data store, and when you give access to a third party, the third party can use the data to compute something and then only the result of the computation is retained by the third party,” said Montjoye. “If an app wants to know if you are in a certain city or not, it will query openPDS, it will compute the data and only receive an answer that relates to the question.”
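The pattern Montjoye describes, where the question travels to the data and only the answer travels back, can be sketched in a few lines. This is a hypothetical illustration, not the openPDS or SafeAnswers API: the class `PersonalDataStore`, its methods, and the `currently_in` question are invented names, and the sketch assumes the owner approves each question in advance and that every approved question returns only a yes or no.

```python
from typing import Callable

# Hypothetical type: a question receives the raw records plus one argument
# and must reduce them to a single yes/no answer.
Question = Callable[[list[tuple[int, str]], str], bool]


class PersonalDataStore:
    """Toy store: raw location records stay inside; only answers leave."""

    def __init__(self) -> None:
        self._locations: list[tuple[int, str]] = []  # (unix_time, city), never returned
        self._approved: dict[str, Question] = {}  # questions the owner allows

    def record(self, timestamp: int, city: str) -> None:
        self._locations.append((timestamp, city))

    def approve(self, name: str, question: Question) -> None:
        self._approved[name] = question

    def ask(self, name: str, arg: str) -> bool:
        """Run an approved question inside the store and return only its answer."""
        if name not in self._approved:
            raise PermissionError(f"question {name!r} has not been approved")
        # Pass a copy so a question cannot modify the stored records.
        return bool(self._approved[name](list(self._locations), arg))


def currently_in(locations: list[tuple[int, str]], city: str) -> bool:
    """True if the most recently recorded location is the given city."""
    if not locations:
        return False
    latest = max(locations)  # tuples compare by timestamp first
    return latest[1] == city


store = PersonalDataStore()
store.record(1_700_000_000, "Boston")
store.record(1_700_003_600, "Cambridge")
store.approve("currently_in", currently_in)
print(store.ask("currently_in", "Cambridge"))  # → True
print(store.ask("currently_in", "Boston"))  # → False
```

The app learns a single bit (whether you are in the city) rather than your location history. The leading underscore is only a Python convention, so in a real deployment the store would need to run on infrastructure the user controls, with apps sending questions over a network.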

In short, openPDS is an open personal data store that acts as a filter between the digital data you generate and the third parties who access it. It matches the data most relevant to each app using what is called metadata. Metadata is data about data: the implicit record gathered around users’ personal data rather than its content. “Metadata is like breadcrumbs,” explains Montjoye. “It is not the content, rather the trace you leave behind.” Relying on metadata keeps details of your specific identity out of the data that is collected: when you use a credit card to make a purchase, for example, the database records only when, where, and how the purchase took place, not who you are. Yet this protection has limits. “They say metadata is not content,” says Montjoye. “Metadata is extremely useful for research at a societal level but it also allows you to infer a lot about individuals. And this is why giving metadata to the individual and making sure who uses it and for what is so important.”

Big data has the potential to transform not only the future of apps but also healthcare, city planning, education, banking, and other fields that deliver value back to us. Although its value depends largely on how data miners use it, the technology world is increasingly focused on using data to deepen our understanding of ourselves. Geospatial data, for example, allows researchers to map global mobility flows and track the spread of epidemics such as malaria. HealthMap, a website dedicated to informing the public about disease outbreaks, collects and filters online content to monitor epidemics; it was among the few health data miners to detect and report the spread of Ebola, doing so days before the World Health Organization.

Researchers like Montjoye argue that as we build a society dependent on the flow of personal data, we must develop trust-building mechanisms: platforms for data protection, user privacy, and data accountability. Integrating openPDS into the revolution can facilitate efficient and honest communication between individuals and third parties. With initiatives such as openPDS, control of personal data returns to where it should have been all along: in the hands of individual users themselves.