Facebook users who click "like" on a variety of cultural subjects reveal a surprisingly large amount of information about themselves even if they've taken steps to tighten up their privacy settings.

A recently published study by researchers at Cambridge University in the UK and Microsoft Research used an automated analysis of 58,000 volunteers' Facebook "likes" to make highly accurate predictions about a person's private and very sensitive personal attributes.

The researchers developed a model that correctly predicted whether a man was homosexual 88 percent of the time, and whether a woman was 75 percent of the time. It also predicted ethnic origin (95 percent), gender (93 percent), religion (82 percent), political affiliation (85 percent), use of addictive substances (75 percent), and relationship status (67 percent).

Margaret Weigel, writing on Journalist's Resource, noted that "Likes" for popular subjects such as "Britney Spears" or "Desperate Housewives" were among the signs of a homosexual orientation.

The model was less accurate when attempting to predict the length of the parents' marriage (60 percent). "Individuals with parents who separated have a higher probability of liking statements preoccupied with relationships, such as 'If I'm with you then I'm with you. I don't want anyone else.'"

Foremski's take

Predictive models such as the ones used by the researchers in this study become even more accurate as more data is collected.

Easy access to such highly sensitive information could be used by employers, landlords, government agencies, educational institutes, and private organizations in ways that discriminate against and punish individuals. And there's no way to fight it.

Big data is growing into a massive threat to individual well-being in society. There is no difference between big data and Big Brother when it comes to commercial interests.

Big data is the Stasi of our online worlds

There are many "silent listeners" in social networks that collect people's "Likes" and other online behaviors so that the information can be sold discreetly to third parties. Facebook, Google, and all other social networks also collect such behavioral information.

While the companies say that their behavioral big data is stripped of users' names, it is possible to combine it with other databases, such as electoral records, demographic information, and location data, to identify individuals by name.
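To make the re-identification point concrete, here is a minimal sketch of how a name-stripped dataset can be joined to a named one on shared attributes. Everything in it is an assumption for illustration: the records are invented, the field names (`zip`, `birth_year`, `gender`) are hypothetical, and real linkage would be messier and fuzzier than an exact match.

```python
# Hypothetical toy data: every record below is invented for illustration.
anonymized_likes = [
    {"zip": "94110", "birth_year": 1980, "gender": "F", "likes": ["Page A", "Page B"]},
    {"zip": "94110", "birth_year": 1975, "gender": "M", "likes": ["Page C"]},
    {"zip": "10001", "birth_year": 1980, "gender": "F", "likes": ["Page D"]},
]

electoral_roll = [
    {"name": "Alice Example", "zip": "94110", "birth_year": 1980, "gender": "F"},
    {"name": "Bob Example", "zip": "94110", "birth_year": 1975, "gender": "M"},
    {"name": "Carol Example", "zip": "10001", "birth_year": 1980, "gender": "F"},
    {"name": "Dana Example", "zip": "10001", "birth_year": 1980, "gender": "F"},
]

# Attributes present in both datasets (an assumed choice for this sketch).
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def key(record):
    """Build the join key from the shared attributes."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymous_records, named_records):
    """Attach a name to each anonymous record that matches exactly one person."""
    names_by_key = {}
    for person in named_records:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = []
    for record in anonymous_records:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:  # a unique match means the record is no longer anonymous
            matches.append((candidates[0], record["likes"]))
    return matches


reidentified = reidentify(anonymized_likes, electoral_roll)
for name, likes in reidentified:
    print(name, likes)
```

In this toy data, two of the three "anonymous" records map to exactly one named person. The third stays ambiguous only because two people happen to share the same attributes, and each additional database narrows that ambiguity.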

It's essentially a secret dossier on more than 1 billion social network users.

While this dossier is fragmented at the moment, sophisticated new technologies will soon make it trivial to pull together a massive amount of sensitive private data on every individual who interacts with the internet in any way.

Your phone records, or information about events and parties you attend, could implicate you in the future if they show a connection with people who are later identified as drug dealers, criminals, terrorists, or maybe even paedophiles.

Big data gradually accumulates a cloud of suspicion around you simply by association.

Deleting bad links

Google, for example, already assesses the quality of a website by who links to it. If you have lots of what Google considers to be low-quality, spammy site back links, it will downgrade your website's all-important PageRank and bury it deep within its index.

Think about how such methodology could be applied to determine the "TrustRank" of an individual in Google's world. People can't erase past links to friends and associates now considered "low quality" or possibly even criminal.
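For readers who want to see how a link-based score behaves, here is a minimal sketch of the textbook PageRank calculation applied to a made-up graph of people. It is not Google's production ranking, and "TrustRank" for individuals is speculation on my part. The names, the graph, and the `pagerank` function are all hypothetical. Note that in this simplified form a low-ranked link simply passes on little credit rather than subtracting any, and penalties are not modeled.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a dict of node -> outbound links."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the same small baseline score.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A node splits its damped score evenly among the nodes it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outbound links spreads its score over everyone.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: alice and bob each have exactly one inbound link.
links = {
    "a": ["popular"],
    "b": ["popular"],
    "c": ["popular"],
    "popular": ["alice"],  # alice is linked from a well-connected node
    "obscure": ["bob"],    # bob is linked from a node nobody links to
    "alice": [],
    "bob": [],
}

scores = pagerank(links)
print("alice:", round(scores["alice"], 3))
print("bob:  ", round(scores["bob"], 3))
```

Alice and Bob each have a single connection, yet Alice ends up with the higher score purely because of who that connection is. Your number depends on the company you keep, not on anything you did.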

Welcome to the future obligations of your present life. Will people start purging their online social circles of unsavory characters? Or people they think might turn to criminal activities?

Big data knows you better than you know you

Big data technologies currently under development will be able to make highly accurate predictions about nearly every important aspect of your existence: Your health, your lifespan, even your sanity.

Big business loves big data because it helps companies manage their risks, which is what corporations do best.

I was at a recent Cisco event featuring their futurists. One of them talked about people's FICO scores, which determine their ability to get a mortgage, and about developing a type of healthcare "FICO" score for each person as a way of determining their ability to get health insurance.

I pointed out that it was a brilliant idea for cutting healthcare costs, since it would likely result in mass hospital closures as companies only insured people with a low risk of needing their services.

The future benefits of big data are stacked firmly against the individual.