March 11, 2013

This is a graphic from the You Are What You Like Facebook app. Credit: David Stillwell, University of Cambridge

New research, published today in the journal PNAS, shows that surprisingly accurate estimates of Facebook users' race, age, IQ, sexuality, personality, substance use and political views can be inferred from automated analysis of only their Facebook Likes - information currently publicly available by default.

In the study, researchers describe Facebook Likes as a "generic class" of digital record - similar to web search queries and browsing histories - and suggest that such techniques could be used to extract sensitive information for almost anyone regularly online.

Researchers at Cambridge's Psychometrics Centre, in collaboration with Microsoft Research Cambridge, analysed a dataset of over 58,000 US Facebook users, who volunteered their Likes, demographic profiles and psychometric testing results through the myPersonality application. Users opted in to provide data and gave consent to have profile information recorded for analysis.

Facebook Likes were fed into algorithms and corroborated with information from profiles and personality tests. Researchers created statistical models able to predict personal details using Facebook Likes alone.

Models proved 88% accurate for determining male sexuality, 95% accurate distinguishing African-American from Caucasian American and 85% accurate differentiating Republican from Democrat. Christians and Muslims were correctly classified in 82% of cases, and good prediction accuracy was achieved for relationship status and substance abuse – between 65 and 73%.

But few users clicked Likes explicitly revealing these attributes. For example, fewer than 5% of gay users clicked obvious Likes such as Gay Marriage. Accurate predictions relied on 'inference' - aggregating huge amounts of less informative but more popular Likes such as music and TV shows to produce incisive personal profiles.
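How many weak signals can add up to a confident guess can be illustrated with a toy calculation. The sketch below is not the researchers' model, which this article does not describe; it assumes a simple naive-Bayes-style combination in which each Like nudges the odds slightly, and the Like names and likelihood ratios are invented purely for illustration.

```python
import math

# Hypothetical likelihood ratios: P(Like | trait) / P(Like | no trait).
# Values close to 1 mean the Like on its own says very little.
LIKELIHOOD_RATIOS = {
    "tv_show_a": 1.3,
    "band_b": 1.2,
    "film_c": 1.25,
    "brand_d": 0.8,
}


def trait_probability(likes, prior=0.5):
    """Estimate P(trait | likes) by summing log-odds contributions.

    Assumes Likes are independent given the trait (the naive Bayes
    assumption). Likes missing from the table are treated as uninformative.
    """
    log_odds = math.log(prior / (1 - prior))
    for like in likes:
        log_odds += math.log(LIKELIHOOD_RATIOS.get(like, 1.0))
    return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    # One weak Like barely moves the estimate; several together move it more.
    print(trait_probability(["tv_show_a"]))
    print(trait_probability(["tv_show_a", "band_b", "film_c"]))
```

Under these assumed numbers, no single Like is decisive, but each one multiplies the odds a little, so a profile with dozens of such Likes can end up far from the starting 50/50 estimate.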

Even seemingly opaque personal details, such as whether users' parents separated before the user reached the age of 21, were predicted with 60% accuracy, enough to make the information "worthwhile for advertisers", suggest the researchers.

While they highlight the potential for personalised marketing to improve online services using predictive models, the researchers also warn of the threats posed to users' privacy.

They argue that many online consumers might feel such levels of digital exposure exceed acceptable limits - as corporations, governments, and even individuals could use predictive software to accurately infer highly sensitive information from Facebook Likes and other digital 'traces'.

The researchers also tested for personality traits including intelligence, emotional stability, openness and extraversion.

While such latent traits are far more difficult to gauge, the accuracy of the analysis was striking. Study of the openness trait – the spectrum of those who dislike change to those who welcome it – revealed that observation of Likes alone is roughly as informative as using an individual's actual personality test score.

Some Likes had a strong but seemingly incongruous or random link with a personal attribute, such as Curly Fries with high IQ, or That Spider is More Scared Than U Are with non-smokers.

When taken as a whole, researchers believe that the varying estimations of personal attributes and personality traits gleaned from Facebook Like analysis alone can form surprisingly accurate personal portraits of potentially millions of users worldwide.

They say the results suggest a possible revolution in psychological assessment which – based on this research – could be carried out at an unprecedented scale without costly assessment centres and questionnaires.

"We believe that our results, while based on Facebook Likes, apply to a wider range of online behaviours," said Michal Kosinski, Operations Director at the Psychometrics Centre, who conducted the research with his Cambridge colleague David Stillwell and Thore Graepel from Microsoft Research.

"Similar predictions could be made from all manner of digital data, with this kind of secondary 'inference' made with remarkable accuracy - statistically predicting sensitive information people might not want revealed. Given the variety of digital traces people leave behind, it's becoming increasingly difficult for individuals to control.

"I am a great fan and active user of new amazing technologies, including Facebook. I appreciate automated book recommendations, or Facebook selecting the most relevant stories for my newsfeed," said Kosinski. "However, I can imagine situations in which the same data and technology is used to predict political views or sexual orientation, posing threats to freedom or even life."

"Just the possibility of this happening could deter people from using digital technologies and diminish trust between individuals and institutions – hampering technological and economic progress. Users need to be provided with transparency and control over their information."

Thore Graepel from Microsoft Research said he hoped the research would contribute to the on-going discussions about user privacy:

"Consumers rightly expect strong privacy protection to be built into the products and services they use and this research may well serve as a reminder for consumers to take a careful approach to sharing information online, utilising privacy controls and never sharing content with unfamiliar parties."

David Stillwell from Cambridge University added: "I have used Facebook since 2005, and I will continue to do so. But I might be more careful to use the privacy settings that Facebook provides."


22 comments

I never "like" anything. But I really don't like analysts who perceive all internet users as consumers who need to be directed to products that they might buy via commercial advertising intruding in on the relevant content that brought them to a site in the first place. The internet exists for the vast majority, if not for all of us, as a resource for information and research. It is not of necessity advertisement driven, as much as they like us to believe that it is. We pay our ISP's for the delivery of the information on sites that we would all be much happier to view without any commercial content.

Furthermore, people who browse sites on their cell phones are often paying up front for limited bandwidth. The bulk of the bytes that are sent to a mobile device constitute advertisement and that is just plain unjust.

See how many people would actually choose to visit sites that existed solely for their advertising content and nothing else. NOBODY!

"But I really don't like analysts who perceive all internet users as consumers who need to be directed to products that they might buy via commercial advertising intruding in on the relevant content that brought them to a site in the first place."

Who do you think pays for all the sites that you visit? Does Facebook, the news sites and others that you visit, let you use their server just out of love for you? These sites have to make a living, and that comes either from advertising or subscriptions.

"Furthermore, people who browse sites on their cell phones are often paying up front for limited bandwidth. The bulk of the bytes that are sent to a mobile device constitute advertisement and that is just plain unjust."

I like this point, but you do have a choice on whether or not to use the site. I for one find that sigalert is quite useful and accept that ads go with it. I have unlimited data though and never even get near the "alert" limit set before throttling begins. I don't like how Google gets to auto-update all the crud I don't want, which uses up bytes, but it's forced on you with a smartphone.

This is exceptionally convenient to help people in many ways. The kind of knowledge from these studies could potentially lead to ways to help people understand what they really like and don't like and what they are suited for and who they are suited for. Many people don't realize what work they would really like to be doing, and this might help people know what kinds of jobs are out there that they didn't know about before. Facebook profiles could be linked to companies looking for employees. And I tell ya, trying to find material for further education now that I am out of school is a little difficult. If this kind of data analysis could be used to connect me with adult ongoing education materials that I never knew I liked or had trouble conveniently accessing, I would be better off. Especially if it could lead to some kind of standardized certificate. That could help me and many other people get better jobs.

This technology might also help to uncover patterns of diseases that could be solved simply by knowing what to look for. I don't know which diseases, but maybe looking for associated things such as whatever allows it to predict drug usage. Maybe something like this could help those drug users connect with more effective diversions which then lead towards something constructive which they can use to get a better education or a better job and stay more stable so that drug dependency is gradually minimized or phased out. Combining with technologies such as the Netflix challenge recommendation software, people could lead much more comfortable lives. That is to say, when people want or need something, even if they don't know that they want or need it, it can be delivered to them quite quickly and accurately.

The analysts will also be able to peg masochists who love to read what their "friends" had for breakfast and sit quietly while watching an internet version of a slide show of someone else's vacation. They will also discover a new personality disorder manifested by the collecting of friends like sea shells- the more you have, the higher your status.

"Who do you think pays for all the sites that you visit? Does Facebook, the news sites and others that you visit, let you use their server just out of love for you? These sites have to make a living, and that comes either from advertising or subscriptions."

Facebook is a social networking site. It, and others like it, should be funded by subscription, not by advertising. Nobody reads Facebook for research or information gathering purposes unless they are doing an anthropological study.

The big ISP's are making money hand over fist from subscribers, to the tune of billions, especially from the mobile market. Advertising is overkill and hogs bandwidth and detracts from the quality of the internet experience. Sites that get visited a lot, such as news and information sites like this one, could get a share of that money.

"Facebook is a social networking site. It, and others like it, should be funded by subscription, not by advertising. Nobody reads Facebook for research or information gathering purposes unless they are doing an anthropological study."- baudrunner

Many users of Facebook have a commercial agenda- promoting their businesses of services and products at no cost for advertising. Oddly, fb users are more likely to trust a product in that atmosphere of "we're all friends here." The paying advertisers on fb and particularly Google are what make those corporations the juggernauts that they are, and why they're in business in the first place.

"The kind of knowledge from these studies could potentially lead to ways to help people understand what they really like and dont like and what they are suited for and who they are suited for..."

Umm... What?? You're actually saying it's better if people don't have to think for themselves?... "Helping" people understand what they really like and don't like? That's too close to blatant mind control for me.

"Umm... What?? You're actually saying it's better if people don't have to think for themselves?... 'Helping' people understand what they really like and don't like? That's too close to blatant mind control for me."

No. I mean it can help introduce people to things they hadn't encountered before. Additionally, maybe people will realize what they liked about something else was not direct, it was because they didn't realize what they were missing.

I wonder if this technology would have identified those people who committed mass shootings last year. If there was a way to keep it from being abused, it could help identify people who need help before they make national news.

"I wonder if this technology would have identified those people who committed mass shootings last year. If there was a way to keep it from being abused, it could help identify people who need help before they make national news."

Definitely this could be used for it. But then, there are all kinds of moral, legal, and practical issues involved. And if word gets out that it's done, just guess if all those who "suspect they are suspect" will skip FB, just to be sure.

But yes, it could easily be done, and given enough data (i.e. that from a few actual mass shooters) it could be pretty accurate.
