When Social Media Mining Gets It Wrong

In their experiment, the team was able to match about one-third of subjects to the correct profiles. From there, they made other predictions. Seventy-five percent of the time, they correctly predicted subjects’ interests. They correctly predicted the first five digits of volunteers’ social security numbers about 16 percent of the time given two tries. (Accuracy increased with more attempts.)

But this means the team failed to identify subjects correctly about two-thirds of the time. And even for those who were correctly identified, the predicted interests were wrong 25 percent of the time, and the predicted social security digits were wrong more than 80 percent of the time.
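To see how these error rates compound, here is a rough back-of-the-envelope sketch in Python. It assumes, as the paragraph above implies, that the 75 percent and 16 percent figures apply only to subjects who were correctly identified, and it treats "about one-third" as exactly 1/3. The variable and function names are illustrative, not the researchers' own.

```python
from fractions import Fraction

# Figures reported in the article ("about one-third" is treated as exactly 1/3).
P_MATCH = Fraction(1, 3)          # subject matched to the correct profile
P_INTERESTS = Fraction(75, 100)   # interests predicted correctly
P_SSN = Fraction(16, 100)         # first five SSN digits correct within two tries


def joint(p_match, p_given_match):
    """Chance that both the match and the follow-on prediction are right.

    Assumption: the second rate is conditional on a correct match.
    """
    return p_match * p_given_match


match_and_interests = joint(P_MATCH, P_INTERESTS)
match_and_ssn = joint(P_MATCH, P_SSN)

print(f"correct match and correct interests: {float(match_and_interests):.1%}")
print(f"correct match and correct SSN digits: {float(match_and_ssn):.1%}")
```

Under those assumptions, only about a quarter of all subjects would be both correctly identified and correctly matched to their interests, and roughly one in twenty would be both correctly identified and matched to the right social security digits.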

Acquisti expects facial recognition technology to continue improving in coming years, and he asks what will happen once it is considered good enough to be trusted most of the time. It could be nightmarish for those who are misidentified. “There’s nothing that we, as individuals, can control,” he says.

Other researchers are exploring the reliability of mining social data. At Defcon, a hacking conference in Las Vegas last weekend, a group called the Online Privacy Foundation presented results of its “Big Five Experiment,” a study that aimed to match volunteers’ personality traits to qualities on Facebook profiles. After administering a personality test to volunteers, they mined profiles to identify key characteristics.

The Online Privacy Foundation researchers found that people whose personalities tended toward openness were somewhat more likely to have Facebook profiles loaded with information: longer lists of interests, longer bios, and more discussion of money, religion, death, and negative emotions. They also found a positive correlation between agreeableness (defined as “being compassionate, cooperative, having the ability to forgive and be pragmatic”) and Facebook statuses that were written in longer sentences, that discussed positive emotions, or that came from profiles with relatively more comments, friends, and photos. In both cases, however, the correlations were relatively weak.

The researchers conclude that a Facebook profile is hardly a reliable source of information. “The key point is to remember that this is a bet,” says the foundation’s cofounder Chris Sumner. “The message is that, yes, there is a link, but don’t use it on its own for critical decisions.”

Acquisti and Sumner say that new government policies may be needed to protect individuals from excessive data mining and from the misuse of their information. This could involve setting standards of accuracy for organizations to abide by. “The defining question of our time,” Acquisti says, “is how do we, as a society, deal with big data?”