Facebook Graph Search Privacy Woes

Facebook's new search engine, 'Graph Search', is heading into a kind of privacy trouble that we haven't seen before. The problem is not the data itself, but the implied relationships within it: relationships we never intended to create when we liked a page.

Let me give you an example. Take a search query like 'Single men who like leather and George Michael and Elton John'.

When you read that search query, what was the first thing that came to mind? Some of you would probably think these men are gay (not that there is anything wrong with that).

And while a few of the people Facebook finds will fit that description, most won't. Most people also like 200 other musicians. And besides liking leather they also like bacon, Ferrari, Coca Cola, Starbucks, mountain biking, photography, Alicia Keys, and Mitt Romney.

So the same person would show up regardless of whether you search for 'Single men who like leather and George Michael and Elton John' ...or... 'Single men who like Coca Cola and Alicia Keys'.

See the problem here?

Because of the way Facebook Graph Search is designed, we can imply a relationship between data points that simply doesn't exist.
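To make that concrete, here is a minimal sketch of how an AND-style filter over likes behaves. It is not how Facebook actually implements Graph Search; the `profile` dictionary and the `matches` function are hypothetical names I made up for illustration, and the likes are simply the ones from the example above.

```python
# Hypothetical profile built from the likes mentioned in the example above.
# The structure and field names are assumptions, not Facebook's data model.
profile = {
    "relationship": "single",
    "gender": "male",
    "likes": {
        "leather", "George Michael", "Elton John", "bacon", "Ferrari",
        "Coca Cola", "Starbucks", "mountain biking", "photography",
        "Alicia Keys", "Mitt Romney",
    },
}


def matches(person, required_likes):
    """Return True when a single man has liked every page in required_likes."""
    return (
        person["relationship"] == "single"
        and person["gender"] == "male"
        and set(required_likes) <= person["likes"]
    )


query_a = ["leather", "George Michael", "Elton John"]
query_b = ["Coca Cola", "Alicia Keys"]

# The same person satisfies both queries, because each query only checks
# the few likes it names and ignores everything else on the profile.
print(matches(profile, query_a))
print(matches(profile, query_b))
```

Both queries match the same person. The filter never sees the other likes, so the result page shows only the combination the searcher happened to ask for.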

This, of course, is nothing new. We actually have a word for it. It's called 'contextomy', or the fallacy of quoting people out of context. We often hear about this in the news media. A journalist will, in order to get the angle that he wants, quote a person completely out of context. And the person ends up being quoted as saying exactly the opposite of what he meant.

And while this practice is generally frowned upon by most respectable media companies, it is standard procedure for some of the less scrupulous tabloids.

With Facebook Graph Search, this is about to become a mainstream problem.

I don't, even for a second, think that Facebook wanted this to happen, but that's what Graph Search does. Graph Search creates a relationship between data points that have no relationship to begin with. Graph Search is specifically designed to 'quote out of context'.

For an analyst, correlation and causation are the two most important elements to get right. They mark the difference between valuable and worthless. Right versus wrong. Correct versus incorrect, and misleading versus guiding.

There is no correlation between being a single man and liking George Michael's Facebook page. Those two data points weren't created within the same context.

Or what about:

Single women who live nearby and are interested in men and like getting drunk

And just look at how this data is presented in the search result page. All the other likes are filtered out, causing people to believe that there is a correlation here. That one thing automatically leads to another.

Or what about this one: "Current employers of people who like racism" (and just look at that FB result page; if that is not an implied relationship, I don't know what is):

Sounds pretty bad, right? But does that mean that Target and McDonalds support racism? No, of course not. There is no correlation, and thus no causation.

McDonalds has 400,000 employees, so whatever you search for, you will always be able to find some who have liked a page of a questionable nature.
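The arithmetic behind this is simple, and a rough sketch shows it. The 400,000 figure is the one quoted above; the like rate of 1 in 10,000 is purely an assumption I picked for illustration, not a measured number, and the function names are my own.

```python
def expected_matches(group_size, like_rate):
    """Expected number of people in the group who liked the page."""
    return group_size * like_rate


def prob_at_least_one(group_size, like_rate):
    """Chance that at least one person in the group liked the page,
    assuming each person likes it independently at the same rate."""
    return 1 - (1 - like_rate) ** group_size


employees = 400_000         # employee count quoted in the text
assumed_rate = 1 / 10_000   # hypothetical: 1 in 10,000 people liked the page

print(expected_matches(employees, assumed_rate))
print(prob_at_least_one(employees, assumed_rate))

# For comparison, a hypothetical 50-person company at the same rate.
print(prob_at_least_one(50, assumed_rate))
```

Under that assumed rate you would expect about 40 matching employees at a company of this size, and the chance of finding at least one is practically certain. For the 50-person company the chance is around half a percent. The big employer shows up because it is big, not because of anything it did.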

And a like isn't even necessarily an endorsement. For instance, how many people who work at BP do you think also like Greenpeace, just to stay up-to-date with their activities?

And how many in the fishing industry in Japan do you think like Sea Shepherd's Facebook page?

And how many of your own employees do you think 'like' one of your competitors? If you search for 'Employees working at Apple who like Android', what do you think you will find?

Again, there is no correlation here and no causation. The relationship between search terms is entirely created by you. One data point does not lead to another. But do you see how dangerous such data is in the hands of people who don't realize this?

As analysts and (partly) journalists, we know the importance of context. And we know how critical it is to validate the relationship between data points. In fact, that is 90% of what we do. We find the true story in the sea of sources.
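One simple way to do that kind of validation is to compare how often a group likes a page with how often everyone likes it. The sketch below is only one possible check, and every number in it is hypothetical; `lift` is a name I chose, not something Graph Search provides.

```python
def lift(group_likers, group_size, total_likers, population):
    """Ratio of the like rate inside a group to the like rate overall.

    A value near 1 means the group likes the page about as often as
    everyone else, so the search result says nothing about the group.
    """
    group_rate = group_likers / group_size
    base_rate = total_likers / population
    return group_rate / base_rate


# Hypothetical numbers: 40 of 400,000 employees liked the page,
# and 100,000 of 1,000,000,000 users liked it overall.
print(lift(40, 400_000, 100_000, 1_000_000_000))
```

With these made-up numbers the two rates are identical, so the ratio comes out at 1: no association at all. And even a ratio well above 1 would only be a correlation. It still wouldn't tell you why those people liked the page, or whether the like was an endorsement.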

But I'm worried about Graph Search. This uncritical use of data, in the hands of people (i.e. everyone) who don't normally think in terms of correlation and causation, is problematic.

My guess is that, in 2013, we will see a ton of stories based on Graph Search results, putting people and brands into situations that they never intended to be in, simply because they were 'quoted out of context'.

I'm pretty sure McDonalds doesn't like being in the top three results when people search for "Employers of people who like racism".

So what should McDonalds do? Fire all those employees? Remember, there is no confirmed correlation here. A like is not necessarily the same as an endorsement.

Should Apple fire all employees who like Android? Even though many of them only like the page to stay up-to-date with the movements of a competitor?

Should a brand that makes barbecue grills fire all employees who are vegetarians?

But more to the point, is this really the kind of world that we want to have? A world where people are judged and devalued based on what they like on Facebook? Isn't this just another form of discrimination? We have tried discriminating against people based on race, religion, political and cultural views ... all of which are now illegal. But isn't discriminating against a person based on a like the same thing?

Racism is defined as views, practices and actions reflecting the belief that humanity is divided into distinct groups and that members of a certain group share certain attributes which make that group as a whole less desirable and inferior.

Should people be afraid of what pages they like in order to fit into society? Should a like not be treated the same as freedom of speech?

Now don't get me wrong. I think Facebook means well, and 99.99% of Graph Search will probably be used for great things. As an analyst, I love the concept of having access to all this data, and when it is analyzed critically, it has many potential uses.

When people search for "photos taken by people who live in Paris" the result is amazing. Similarly with most other results.

The problem is people who search without critically evaluating the result. One day they will see a result that causes them to draw a connection between two completely separate data points. And that's when the social backlash happens.

Of course, this also opens up an opportunity for the media. If everyone can search for everything, it's up to journalists and analysts to be the 'reality checkers' and to position themselves as analysts of truth.

But in these days of page view whoring, will the newspapers focus on the right story? ...or the implied scandal?

How tempting would it be for them to write a story titled: "Massive problem with racism at McDonalds" ... combined with 15 follow-up articles of isolated cases of people being mistreated and McDonalds' management denying the whole thing?

All started because of this:

In the long term your readers will stick with those they can trust. Are you going to be a pundit or an analyst?