On Facebook, smart people like The Colbert Report and curly fries

Hello Kitty fans scored low on emotional stability.

Facebook's tendency to leak private information and photos has gotten the company in hot water, but the controversies may be missing a larger point. The service is all about sharing personal tastes and interests in a public forum, and the collective public musings can tell you a lot about the service's users.

How much? A new study has paired personality profiles with data mining of people's "likes" on Facebook and has found that the likes collectively tell us some remarkably specific things about political views, personality traits, happiness, drug use, and so on. On its own, the study doesn't tell us anything shocking, but it provides some amusement value when the authors dive into their numbers and find out which items were specifically correlated with which traits. Which is how we find out that fans of curly fries probably outscored Sephora users on their SATs.

The work comes out of the myPersonality project, which has created a Facebook app that gives users a basic test of their psychological traits. If the users permit it, the researchers also get access to their Facebook profile and history of using the service to formally "like" something. At the time that the analysis in this study was done, the authors had data on nearly 60,000 users.

The study was remarkably simple: take a list of all the users' likes and start doing regressions to see if they could be correlated with a variety of specific traits, from basic demographic information like age and sexual orientation to personality traits and drug use. The end result was a score that reflected a simple test of accuracy: given two random members of the group that are on opposite sides of a score (say, gay and straight), how well could the algorithm do at predicting both correctly?
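To make that accuracy measure concrete, here is a minimal sketch of one way to compute it: take every pairing of one member from each group and count how often the model scores the first group's member higher. This matches the usual interpretation of the area under the ROC curve, but the article does not name the statistic, so treat that as an assumption. The function name and the scores are hypothetical, not data from the study.

```python
from itertools import product


def pairwise_accuracy(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs in which the positive
    member receives the higher score. Ties count as half a win."""
    pairs = list(product(scores_pos, scores_neg))
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)


# Hypothetical model scores for members of two groups (illustrative only).
group_a = [0.9, 0.8, 0.4]
group_b = [0.7, 0.3, 0.2, 0.1]

# 11 of the 12 possible pairs are ordered correctly here.
print(pairwise_accuracy(group_a, group_b))
```

A value of 0.5 would mean the scores are no better than a coin flip at telling the two groups apart, while 1.0 would mean every pair is ordered correctly.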

For the gay or straight question, remarkably well: 88 percent of the time, it would get them right (it did worse with lesbians). That's about the same as its prediction of political persuasion (Democrat vs. Republican) and religious persuasion (they only compared Christianity and Islam). The statistics got gender right 93 percent of the time, and they picked Caucasians and African Americans with 95 percent accuracy. From there, things dropped a bit; cigarette and alcohol use were down to around 70 percent accuracy, while the study got drug use and "being in a relationship" correct only about two-thirds of the time.

People use the "like" button with very different frequencies, so the authors tracked whether their predictions got better when a user was a bit twitchier with the mouse. For every value they checked, accuracy went up with the number of likes available to analyze, although the accuracy for predicting age tailed off at about 300 likes.

The authors also tested whether likes could be used to predict some personality traits that had been scored by their tests. There's some variability if a person takes the same test twice—their scores won't typically have a perfect 1.0 (full correlation) match, but rather a correlation in the 0.6 to 0.8 range for most tests. So they compared their like-based predictions to the inter-test variation. For the most part, the likes didn't do especially well, although they did manage to do much better than random chance. Intelligence and extroversion each have a between-test correlation of about 0.75; the like-based scores only correlated with the tests a bit more than half that. The one exception was openness; its inter-test correlation is 0.55, and the like-based predictions managed to capture about 80 percent of that.

The authors make some valid points about how we tend to focus on the leaking of specific information, like social security numbers and embarrassing pictures. But the more interactions we have in public forums like Facebook, the more details of our lives tend to leak. In some countries, details like sexual orientation or religion can put people's lives at risk.

That sober warning aside, the details of some of the correlations they found are simply hilarious. When they looked at the best predictors of high intelligence, they came up with science (naturally), thunderstorms (oddly), The Colbert Report (OK, I can kind of see it) and curly fries (huh?). On the low end of the intelligence scale, you wind up with Sepahora-using, Harley-riding, Lady Antebellum fans who "like being a mom."

Self-identified gay users were unlikely to spend the time liking obviously gay-friendly groups like the No H8 Campaign or Gay Marriage, so the authors had to sharpen their predictions using things such as Wicked The Musical, Britney Spears, and Desperate Housewives. Straight users were picked out based on liking the Wu-Tang Clan, Shaq, and (bizarrely) “Being Confused After Waking Up From Naps.”

Some individual likes also said a lot about a person. For example, the authors found that Hello Kitty fans tended to score high on openness but lower on things like conscientiousness, agreeableness, and emotional stability. They also tended to be Democrats, for what it's worth.

75 Reader Comments

I wonder how those who never opted for a Facebook page out of privacy concerns rank? Seriously scary the amount of personal information users put out about themselves that can be later correlated into some pretty good guesses as to who you might be, even behind the persona.

Fold that data in with your credit/debit card transactions and things really take off. Wait for it.....

This is kind of amusing in the context of Facebook, but the same kind of analysis is going on with other data collected about users elsewhere. For example, the records of web pages you read, as recorded by advertising networks, are very similar in practice to the records of Likes. So, the same advertisers who do targeted advertising are also extracting and extrapolating information about you that you are not deliberately sharing. Much was made of Target's ability to infer when women are pregnant by this same method, which shows it's not just picking up trivial information like movie preferences but also information about medical conditions, life planning, etc. And then this combination of facts and inferences about you is both used to serve ads and also sold to other companies.

I was going to comment on how that "tomato sauce" bottle was a blatant ripoff of Heinz, but then I did my due diligence and found that they are now owned by Heinz, so they can rip themselves off I suppose.

It seems that their personality test is the weak link, given the relatively low correlation when using their personality test and the relatively high accuracy of predictions using 'likes' alone. Improving their personality test would likely yield more accurate results when drawing conclusions about personality based on likes.

From the article: "Some individual likes also said a lot about a person. For example, the authors found that Hello Kitty fans tended to score high on openness but lower on things like conscientiousness, agreeableness, and emotional stability. They also tended to be Democrats, for what it's worth."

I'm highly skeptical that they don't already know how to combine the data. There have been a few articles on the advanced metrics Target utilizes that can predict pregnant women before they even realize they're pregnant. I have no doubt these places know exactly who likes them on Facebook and are able to paint an even better picture of their customers.

I never cared about my privacy on Facebook. There is hardly anything on there that some company couldn't glean from their already invasive tracking policies. The only thing I'd have to worry about is stuff I'd find personally embarrassing, and that's my own damn fault for sharing something I probably shouldn't have.

[EDIT: Okay... guess I'm the only one who thinks it's funny that Shaq is a spokesperson for online child protection and has a bizarre correlation with straightness (according to the article)? Gotta work on my jokes...]

The reason this is not the privacy scare it seems to be is the same reason data mining is terrible at spotting terrorists.

1 out of 1000000 people will be a terrorist. The algorithm may get a prediction accuracy of 99% which means you get 990000 people who are most likely correctly cleared but 10000 people who are identified as terrorists. 9999 wrongly. Which is more or less useless. And you never get 99%.

You can do awesome statistical predictions but it is pretty much impossible to make an accurate actionable statement about a single person without hitting a shit load of false negatives/positives. So it is good for companies to target marketing etc. But the privacy implications for every single person are more or less negligible.
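The base-rate arithmetic in the comment above can be sketched in a few lines. This is only an illustration under stated assumptions: "99% accuracy" is read here as a detector that is right 99 percent of the time for both groups (99 percent sensitivity and 99 percent specificity), which the commenter does not spell out, and the function name is hypothetical.

```python
def flagged_breakdown(population, base_rate, sensitivity, specificity):
    """Return expected true positives, false positives, and the share of
    flagged people who are real matches (precision)."""
    actual_pos = population * base_rate
    actual_neg = population - actual_pos
    true_pos = actual_pos * sensitivity          # real matches that get flagged
    false_pos = actual_neg * (1 - specificity)   # innocent people that get flagged
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision


# Assumed figures taken from the comment: 1 in 1,000,000, "99% accuracy".
tp, fp, precision = flagged_breakdown(1_000_000, 1e-6, 0.99, 0.99)
print(f"expected true positives:  {tp:.2f}")
print(f"expected false positives: {fp:.2f}")
print(f"chance a flagged person is a real match: {precision:.4%}")
```

Under these assumptions roughly ten thousand people are flagged for every real match, which is the commenter's point: a rare trait swamps even a very accurate classifier with false positives.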

Debatable. The reason it is harder to spot a terrorist is that there are relatively so few that it is difficult to pick up a discernible pattern for identification. They talk about this in SuperFreakonomics. They said certain flags were better than others (lack of life insurance, large number of international calls and money wires, and abnormal spending habits for day to day spending).

I think the record shows people are willing to beat up gays even when they have less than 99% certainty. Just to pick one example. A rumor can ruin your life even if it's not true.

New stats show that dumb people who want to look smart are now liking the Colbert Report and curly fries....

The people I know who are into Colbert and Stewart are not dumb, but they do tend to be intellectually lazy when it comes to world affairs, looking for the simple answers to deeply and chaotically complicated situations. Yes, yes, anecdotal sample and all that. Also, that description fits most people in the world.

"On the low end of the intelligence scale, you wind up with Sepahora-using, Harley-riding..... "

1.) That's a rather ironic statement, given that you spelled Sephora incorrectly. :-P

2.) It has been well established that riding a Harley is correlated with low intelligence. It's not a causation==correlation issue though. Buying and riding a Harley doesn't make you dumb, you're dumb because you bought a Harley to ride. Intelligent people use their hard-earned disposable income to buy motorcycles on the leading edge of the technology curve; the kind of bikes that have evolved combustion efficiencies to pull every minuscule increment of horsepower out of every drop of fuel consumed. Less intelligent people would rather overpay for motorcycles where the only thing they are good for is converting fuel into loud noises. Nothing against all Harley owners, just the ones who think their Sporty/HDFC123/HDFUWHATEVA is the best produced motorcycle in the world because a.) Harley marketing said so or b.) because it's loud.

Actually that is exactly why data mining does scare me. There are too many people in this world that don't understand statistics and see no problem with acting on information that has a 1% false positive rate. A surprisingly large number of these people somehow end up in management or governmental roles.

This kind of data analysis isn't too surprising. It's the sort of thing that underscores the privacy concerns most Arsians can see that the rest of the world doesn't. I don't care if Facebook knows that I like Chobani, Brownberry's bread, Kashi and Wikileaks. It's the fact that some 3rd party of a 3rd party that I've never heard of may try to use that data down the road as "applicant screening info". For example they may try to mine statistics out of it which could incorrectly correlate that I'm some anti-HFCS, "conspiracy nut" which wouldn't be a good thing if I were trying to get a job in the I.T. dept at a company like Cargill, ADM, Kraft, or wherever.

Are curly fries the US equivalent of the Canadian onion ring? I remember in our last election there was a Facebook group whose purpose was to get more people to like a picture of onion rings than our Prime Minister Harper. It then blog rolled a bunch of "facts" about the conservative government. It did quite well. Or are people just straight up liking curly fries?

No!!! Curly fries are just fries that are cut in a spiral shape. See the picture in the article. Curly fries may have some seasoning in addition to salt, but they are not onion rings.

Edit: upon further reading your comment, I'm not sure if you are confused about what curly fries are, or if there is some weird link between curly fries and politics.

Ugh. The source "information" has sooo many meanings I can't see how it is supposed to be an indicator of anything.

For example, suppose I'm not a fan of Colbert but I got to see a meme image of him talking about some controversial issue (you know, guns, abortion, religion, whatever). If I "like" it, does it mean I like Colbert or I like whatever it says on his meme image?

It seems as if most people drank the Facebook kool-aid and took the value of such "information" for granted.

Perhaps I'm missing something, but I don't see that anything is being "predicted" based on facebook likes. The authors found correlation between different data values, that I suspect is relatively sparse. This article says that authors developed an algorithm to use some data values to predict other data values from within the group used to develop the algorithm. That doesn't mean it will work for members not in the original group. (Which would be some good research to try, and might have been done, but even the paper's abstract doesn't say that the authors have done that.)

You're right, I meant Facebook and "information" from it should be taken not so seriously.

So... if I liked all of those apparently gay indicators did they register me as a gay male? I'm intrigued! It's okay if they do, I likes the mens (and the wominks; I'd skew their data all up). Obviously they'd figure it out quickly enough by looking at my gender in my profile. I've always wondered why people put so much information in their profiles; aren't things like political party/religion/sexual orientation a bad idea to be on there in case recruiters come a-knockin'?