Data Snatchers! The Booming Market for Your Online Identity

Big Data Puts Privacy in a New Light

In the Target case, future parents were served with highly relevant ads and offers, and the retailer found a new way to reach its customers and pump up sales. No problem, right?

Wrong, say privacy advocates. The warehousing and analysis of so much data, and so many types of data, might lead the curators of the databases to infer things about us that we never intended to share with anybody. The data might even predict our future behaviors—things even we don’t yet know we’re going to do.

The “predictive analysis” of Big Data is often called “inductive analysis” in academic and research circles because it induces large meanings from small sets of facts or markers.

“Inductive analysis concerns itself with singular things that can seem to be innocuous, but that when combined with other innocuous data points—like your favorite soda—can create meaningful predictors of behaviors,” says Solon Barocas, a New York University graduate student who is working on a dissertation about inductive analysis.

Target, for instance, didn’t even need to know the names of the women it ended up sending pregnancy ads to. It simply delivered targeted ads to a group of addresses with the right demographics and a common pattern of past purchases. Using a process so cold and machinelike to predict something so human and so personal as pregnancy strikes many people as creepy.

In the next ten years, marketers and advertisers will spend more and more on Big Data science, focusing on finding analysts who can discern patterns in large pools of data. Big Data analysis positions are the new hot jobs, and the people who will fill them are a new breed, with new skills. “These people need traditional statistics and computer-science backgrounds, but also some coding and basic hacking skills,” Barocas says.

Big Data analysts don’t just help target ads for products. A political campaign might do a survey of 10,000 people to learn about their demographics and political choices. It might buy more data about those people from one of the large data sellers, like Acxiom or Experian, then search for unique markers in the data that would predict future political leanings.

But those predictors may bear no obvious relation to what they predict, Barocas says. “For instance, the analysts might find that something odd—like what fashion-magazine subscription people hold—is a strong predictor of the kind of candidate they’re likely to vote for.”
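The marker hunt Barocas describes can be sketched as a simple per-feature screen: for each purchased data point, measure how strongly it tracks the surveyed outcome, and keep the strongest. The sketch below is a toy illustration of that idea—every person, field name, and value is invented, not drawn from any real data broker’s files:

```python
# Toy marker screen: which innocuous data point best predicts a known outcome?
# All data below is invented for illustration.

def correlation(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

# Each row: purchased data about one surveyed person (1 = yes, 0 = no),
# plus the survey answer the campaign wants to predict ("votes_a").
people = [
    {"fashion_mag": 1, "owns_truck": 0, "diet_soda": 1, "votes_a": 1},
    {"fashion_mag": 1, "owns_truck": 0, "diet_soda": 0, "votes_a": 1},
    {"fashion_mag": 0, "owns_truck": 1, "diet_soda": 1, "votes_a": 0},
    {"fashion_mag": 0, "owns_truck": 1, "diet_soda": 0, "votes_a": 0},
    {"fashion_mag": 1, "owns_truck": 1, "diet_soda": 1, "votes_a": 1},
    {"fashion_mag": 0, "owns_truck": 0, "diet_soda": 0, "votes_a": 1},
]

outcome = [p["votes_a"] for p in people]
for feature in ("fashion_mag", "owns_truck", "diet_soda"):
    xs = [p[feature] for p in people]
    print(feature, round(correlation(xs, outcome), 2))
```

In this toy data the fashion-magazine field screens out as the strongest marker—the same kind of surprising, seemingly unrelated predictor Barocas describes. Real analysts would run such screens over thousands of fields, with proper validation to weed out spurious correlations.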

In future elections and ballot initiatives, billions will be spent on making inferences about voters, and about the issues, candidates, and political ad content that they might be sympathetic to. The campaign with the best personal data and the best analysts may win. That seems like a very undemocratic way to choose our policies and leaders.

Experts say that in the future, predictive analysis will advance to the point where it can tease out information about people’s lives and preferences using far more, and far more subtle, data points than were used in the Target case. The inductive models that some companies already use are huge, containing up to 10,000 different variables—each with an assigned weight based on its ability to predict.
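A model of that shape reduces, at prediction time, to a weighted sum of the variables a person matches, often pushed through a logistic curve to yield a probability. Here is a stripped-down sketch of that mechanism; the purchase categories, weights, and bias are invented for illustration and have nothing to do with Target’s actual model:

```python
import math

# Hypothetical weights, as if learned from past purchase histories.
# A positive weight means the purchase raises the predicted probability.
weights = {
    "unscented_lotion": 1.9,
    "large_cotton_balls": 1.3,
    "calcium_supplement": 1.1,
    "beer": -0.8,
}
bias = -2.5  # baseline: with no markers observed, probability is low

def predict_probability(purchases):
    """Logistic model: weighted sum of observed markers -> probability."""
    score = bias + sum(weights.get(item, 0.0) for item in purchases)
    return 1.0 / (1.0 + math.exp(-score))

p = predict_probability(["unscented_lotion", "large_cotton_balls",
                         "calcium_supplement"])
print(f"predicted probability: {p:.2f}")
```

A production model would have thousands of such weights rather than four, but the arithmetic is the same: each innocuous purchase nudges a single score up or down, and no one weight is meaningful on its own.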

But Big Data analysis may have a built-in public relations problem, because its way of predicting human behavior seems to have little to do with human behavior. Unlike traditional analysis, which seeks to predict future preferences or behaviors based on past ones, the field’s inductive analysis concerns itself only with patterns in the numbers.

Will They Stop?

Browsers like Firefox 11 and Internet Explorer 9 and 10 let you tell websites, via an HTTP request header, that you don’t want to be tracked. Unfortunately, not all websites and online advertisers have committed to honoring that signal. And many sites merely stop serving targeted ads to the browsers of users who send a “do not track” request, but continue to collect those users’ personal data.

After Target “targeted” baby ads at women it believed were pregnant, the women and their families criticized the company’s tactics. The ads creeped them out because Target’s inference about them could not be mapped to any piece of data that they had knowingly provided. Even though Target’s inferences were correct, it simply wasn’t intuitive that the purchase of cotton balls and lotion would predict that the buyer was pregnant and would soon be buying diapers.
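The “do not track” signal is the `DNT: 1` request header. Genuinely honoring it means skipping data collection as well as swapping out the targeted ad—the gap described above. A minimal sketch of server-side logic that respects both halves (the header is real; the ad-choosing logic is invented):

```python
# Sketch of server-side handling for the Do Not Track request header.
# The "DNT: 1" header is real; the decision logic here is illustrative.

def choose_response(headers):
    """Return (ad_type, collect_data) for one request.

    Browsers with tracking protection enabled send the header "DNT: 1".
    Respecting it should disable profiling, not just ad targeting.
    """
    do_not_track = headers.get("DNT") == "1"
    ad = "generic ad" if do_not_track else "targeted ad"
    collect_data = not do_not_track  # the part many sites skip
    return ad, collect_data

print(choose_response({"DNT": "1"}))  # user opted out of tracking
print(choose_response({}))            # no preference expressed
```

The sites the article criticizes effectively hard-code `collect_data = True` regardless of the header, which is why privacy advocates consider the mechanism toothless without enforcement.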

More than anything else, this new, mathematical method of analysis may force us to look at our privacy and the way we manage our personal data in a whole new light. After all, it’s unsettling to know that hundreds of unrelated bits of our data can be pulled together from a hundred different sources (perhaps verified by fingerprinting technology like BlueCava’s) and analyzed to reveal numeric patterns in our behavior and preferences.

“Even the smallest, most trivial piece of information might be strung together with other pieces of information in a pattern that is sufficient enough to infer something about you, and that’s a challenging world to live in because it upsets our basic intuitions about discretion,” Barocas says.

Transparency, Inclusion Might Help Everyone

When Target realized its baby-products ads were getting a negative response, it didn’t pull the ads; instead, it elected to hide them among unrelated and less-targeted ads when showing them to pregnant women. Rather than asking female customers if they were interested in special offers for baby products, the company chose to infer the answer in secret.

And that lack of transparency may be the single biggest objection to consumer tracking and targeting today. Advertisers are spending millions to combine, transmit, and analyze personal data to help them infer things about consumers that they would not ask directly. Their practices with regard to personal data remain hidden, and they’re tolerated only because people don’t know about them.

Such tracking and targeting also feels arrogant. Consumers may not mind being marketed to, but they don’t want to be treated as if they were faceless numbers to be manipulated by uncaring marketers. Even the term “targeting” betrays a not-so-friendly attitude toward consumers.

Ironically, advertisers might be far more successful if they pulled back the curtain and included consumers in the process. It’s well known that the personal data in the databases of marketers and advertisers is far from completely accurate.

Maybe, as several people I talked to for this story pointed out, the best way to collect accurate data about consumers is to just ask them. And if an advertiser is hesitant to ask for a certain piece of personal data, the advertiser shouldn’t infer it.

“What our organization is trying to work out is whether or not there’s a way to [collect personal data] where the user knows what’s happening and companies [get] their data not by stalking [users] but by asking them,” says Kaliya Hamlin of the Personal Data Ecosystem Consortium.

"We’re saying there’s a tremendous opportunity for businesses to tap into all that data by doing it in a way that involves and empowers the consumer." --Kaliya Hamlin, Personal Data Ecosystem ConsortiumIt might sound something like this, Hamlin says: “You tell us your income and your age and some of your interests, and we promise to use this information to present you with relevant content, [such as] an ad that matches your interests.”

Internet Needs to Grow Up

Still, many people—on both the privacy and advertising sides of the fence—believe there is room both for consumer privacy and for Web advertisements and content targeting using personal data. But the veil of secrecy around the use of personal data would have to be lifted.

For that to happen, many believe, everybody in the personal data economy must be more realistic about the economics of the Internet. Advertising, in one form or another, pays the bill for all things free online. Everything that website publishers, content creators, and app developers give away online is paid for with advertising—advertising that is targeted by using consumers’ personal data.

Consumers are complicit in the growth of the personal data economy because we have come to expect lots of free services online. From the Internet’s earliest days, we’ve always expected a level of anonymity—but the more free services we use, the more personal data we must give away, and the less privacy and control over our data we have. It’s up to us to find our own comfort zone between those two ideals, but we need information and transparency to make that choice.

The online advertising industry needs to become much more transparent about the ways it collects and uses our personal data. If it did so, we might be more inclined to believe its claim that carefully targeted ads actually help us by making Web content more relevant and less spammy.

If a website publisher or social network is offering a “free” service in exchange for the user’s personal data, the site should be very clear about that exchange. The online advertising industry should give people options—a choice between “free and tracked” or “paid and not tracked,” for instance. That idea is nothing new; it’s very similar to the free, ad-based services that also offer an ad-free premium service.

It’s not a zero-sum game, where either privacy or targeting wins outright. Advertisers won’t stop using personal data to target ads. And few consumers will quit using Facebook or other sites that collect personal data after they read this article. We can’t expect complete privacy and anonymity online, but advertisers and marketers must understand where we expect privacy.

The challenge now is for everyone involved—consumers, advertisers, Internet companies, and regulators—to understand how the personal data economy really works.

Only then can we start getting busy developing some rules of the road that balance the business needs of advertisers with the privacy needs of consumers.