Knowing the score: How Facebook’s Graph Search knows what you want

Search happiness algorithms and a herd of Unicorns power Facebook search results.

With the introduction of its Graph Search feature, Facebook is trying to turn the vast store of data about relationships between people, places, and things into something useful for its users: a search engine for the real world. While Bing and Google mine the Web for information about the nouns of Earth ("entities" such as individuals, products, locations, and concepts), Facebook is instead tapping into the collective knowledge of one billion users to answer questions about where to eat, what to buy, and whom to ask for advice.

That collection of entity data is vast: there are hundreds of billions of entities with trillions of attributes and relationships defined. This gives Facebook a substantial advantage over Google and Bing, which have built up their own "entity" search databases. These search engines mine the searches performed by users and apply language processing technology to Web content, identifying entities and building a schema for their attributes. But Facebook's entity database spans all of the languages of its users without having to do any of that, and it's also substantially larger in scope than Google's Knowledge Graph or Microsoft's Satori store.

"We have the data already. We own it, so we are complementing what we already have as data with search instead of looking for the data elsewhere," Facebook Search Infrastructure Engineering Manager Sriram Sankar told Ars. "So we have a different kind of problem to solve. What we are doing has more to do with making our Graph better, doing whatever it takes to encourage our users to do a better job of curating their information, and us looking at ourselves as a piece of the full Facebook puzzle."

The problems Facebook faced were how to turn this massive database into something that could be easily searched and how to rank the results in a useful order. The results should, in the words of Sankar, "maximize searcher happiness," all while arriving in near real time as users type their queries.

Today, Facebook is pulling back the curtain a bit on how it performs search and search rankings, which are the result of data-driven analysis and the engineering team's gut instincts. Ars spoke with Sankar about the technology used by his team to make Graph Search work not just as a new feature of the site but as an integral component of nearly every feature of Facebook's interface. (And, as he noted, there's still a lot of work left to be done.)

Understanding the Graph

To understand how Facebook's search works, you first have to understand what the "graph" is. The graph is a database that stores information about the users, pages, and other objects within the Facebook universe. It also includes the relationships between them. Each entity, or "node," within the Facebook graph is identified by a unique number called an fbid (Facebook ID) and has a set of attributes, or metadata, associated with it. The relationships between these nodes, called "edges," carry their own metadata describing the type of relationship.
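The node-and-edge structure described above can be sketched in a few lines of Python. This is only an illustrative model, not Facebook's schema: the class names, the attribute keys, and the "friend" and "likes" edge types are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    fbid: int                                  # unique numeric identifier
    attrs: dict = field(default_factory=dict)  # node metadata


@dataclass
class Edge:
    src: int                                   # fbid of the source node
    dst: int                                   # fbid of the target node
    kind: str                                  # relationship type (hypothetical names)
    attrs: dict = field(default_factory=dict)  # edge metadata


class Graph:
    def __init__(self):
        self.nodes = {}   # fbid -> Node
        self.edges = []   # flat list of Edge objects

    def add_node(self, fbid, **attrs):
        self.nodes[fbid] = Node(fbid, attrs)

    def add_edge(self, src, dst, kind, **attrs):
        self.edges.append(Edge(src, dst, kind, attrs))

    def neighbors(self, src, kind):
        # Follow every edge of the given type that leaves `src`.
        return [e.dst for e in self.edges if e.src == src and e.kind == kind]


graph = Graph()
graph.add_node(1, type="user", name="Alice")
graph.add_node(2, type="user", name="Bob")
graph.add_node(3, type="page", name="Taco Stand", category="restaurant")
graph.add_edge(1, 2, "friend")
graph.add_edge(2, 3, "likes", since=2012)

print(graph.neighbors(2, "likes"))
```

Scanning a flat edge list is fine for a toy graph; at any real scale the edges would need to be indexed, which is the problem the next section of the article takes up.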

The graph database used by Facebook is quite similar to Google's Knowledge Graph and Microsoft's Satori graph-based repository. But in many ways, the structure of Facebook's graph is simpler than Google's and Microsoft's graph schemas, because Facebook has tuned the metadata for its nodes and edges specifically for social interaction—not to store product SKUs or how many times a particular actor has been to rehab. The Facebook Graph may not be able to answer those questions, but it contains data about other things that are useful. You can learn what entities are close to a certain location, liked by certain people, or otherwise tethered to a user through the social network's path of edges.

A graphical illustration of Facebook's graph database, with entities (or "nodes") as blue balls, and relationships (or "edges") as arrows and lines.

Facebook

The problem is that there are hundreds of billions of entities in the Facebook Graph, with trillions of relationships connecting them and trillions of attributes. Sankar said that the attributes of Facebook's photos alone number in the trillions. So how do you index and search something that big?

A herd of Unicorns

The heart of Graph Search is called Unicorn, an in-memory database based on an inverted index. Inverted indexes, which have been used by a number of search technologies, associate words, fragments of words, or types of relationships with attributes of entities within Facebook's graph database. Those associations, called "marks" by Facebook, are stored in a Unicorn in-memory index, pointing to the Facebook numerical identifier for related entities.
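As a rough illustration of the idea, an inverted index maps each term to the set of entity IDs it applies to, so a query becomes a lookup plus set operations. The sketch below is a generic in-memory inverted index, not Unicorn itself; the "name:" and "friend:" term formats are invented for the example.

```python
from collections import defaultdict


class InvertedIndex:
    def __init__(self):
        # term -> set of fbids associated with that term
        self.postings = defaultdict(set)

    def add(self, term, fbid):
        self.postings[term].add(fbid)

    def lookup(self, term):
        # .get() avoids creating empty entries for unknown terms.
        return sorted(self.postings.get(term, ()))

    def lookup_all(self, *terms):
        # Entities that match every term (a logical AND).
        sets = [self.postings.get(t, set()) for t in terms]
        return sorted(set.intersection(*sets)) if sets else []


index = InvertedIndex()
index.add("name:al", 1)      # word fragment -> entity
index.add("name:al", 4)
index.add("friend:2", 1)     # relationship type -> entity
index.add("friend:2", 3)

print(index.lookup_all("name:al", "friend:2"))
```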

But no single index server is large enough to handle the index for hundreds of billions of entities. Again, Facebook photos alone have trillions of attribute values associated with them. So the Facebook search team needed a whole herd of Unicorns. Indexes for large categories of data, known in Facebook parlance as "verticals," appear as a single Unicorn instance, but these are broken up (or sharded) across multiple index servers. These index servers are front-ended by a server known as a "vertical aggregator," which broadcasts queries to them and passes the result back.
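The sharding and fan-out arrangement can be sketched as follows. This is a simplified model under stated assumptions: entities are assigned to shards by fbid modulo the shard count (the article does not say how Facebook partitions its indexes), and the class names are hypothetical.

```python
class IndexServer:
    """One shard: holds the postings for a subset of entities."""

    def __init__(self):
        self.postings = {}   # term -> set of fbids stored on this shard

    def add(self, term, fbid):
        self.postings.setdefault(term, set()).add(fbid)

    def search(self, term):
        return self.postings.get(term, set())


class VerticalAggregator:
    """Presents many shards as a single index for one vertical."""

    def __init__(self, num_shards):
        self.shards = [IndexServer() for _ in range(num_shards)]

    def add(self, term, fbid):
        # Assumed partitioning rule: shard chosen by fbid modulo shard count.
        self.shards[fbid % len(self.shards)].add(term, fbid)

    def search(self, term):
        # Broadcast the query to every shard and merge the partial results.
        results = set()
        for shard in self.shards:
            results |= shard.search(term)
        return sorted(results)


pages = VerticalAggregator(num_shards=3)
for fbid in (7, 8, 9, 10):
    pages.add("likes:pizza", fbid)

print(pages.search("likes:pizza"))
```

No single shard sees the whole posting list, but the caller only ever talks to the aggregator.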

The Facebook search architecture, with the top aggregator acting as intermediary between users and the herd of Unicorn instances.

Facebook Engineering

All of Facebook's search verticals sit behind a top-level aggregator, which handles the incoming search requests. To keep the latency of responses to search entries down, the aggregators at each level also heavily cache search results in memory. "The goal of the index is to make it super-optimized for read," Sankar said.

But for that stampede of results to make sense, it needs to be ranked. Unicorn returns search results in a static-rank order, meaning that there's no inherent weighting of what it sends back based on relevance to the search. So before Facebook's Graph Search passes back its results to the user or application making the query, it has to perform one more step: scoring the results.

Points for style

Scoring happens at both the vertical aggregator for each category and at the top-level aggregator. At both stages, a set of algorithms tries to determine the relevance of results based on a "forward index," an index that stores metadata about each entity that is not stored in Unicorn's inverted index.
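To make the forward-index step concrete, here is a minimal sketch: the inverted index yields a list of matching fbids, and a separate per-entity metadata table is consulted to score them. The features (likes from friends, distance) and the weights are invented for illustration; the article does not describe Facebook's actual scoring formula.

```python
# Forward index: fbid -> metadata used only for scoring (all values made up).
forward_index = {
    101: {"name": "Taco Stand", "likes_from_friends": 3, "distance_km": 2.0},
    102: {"name": "Noodle Bar", "likes_from_friends": 0, "distance_km": 0.5},
    103: {"name": "Pizza Place", "likes_from_friends": 5, "distance_km": 8.0},
}


def score(fbid):
    # Hypothetical relevance formula: reward friends' likes, penalize distance.
    meta = forward_index[fbid]
    return 2.0 * meta["likes_from_friends"] - 0.5 * meta["distance_km"]


def rank(fbids):
    # Re-order the static-rank results returned by the index by relevance.
    return sorted(fbids, key=score, reverse=True)


ranked = rank([101, 102, 103])
print([forward_index[fbid]["name"] for fbid in ranked])
```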

But just scoring on relevance can create some issues for results. In a blog entry Sankar posted today, he explained that relevance scoring works on just one entity at a time. This can cause the results of a search to be "very one-dimensional and offer a poor search experience (for example, 'photos of Facebook employees' may return too many photos of Mark Zuckerberg)."

So the aggregators perform a round of score filtering that looks at the entities returned by a query as a group, then picks from them a "subset of these entities that are most interesting as a set and not necessarily the highest scoring set of results," Sankar wrote.
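One common way to score results as a set is a greedy pass that discounts an item each time its subject has already been picked. The sketch below uses that approach as a stand-in; it is not the method from Sankar's post, and the penalty factor and sample data are assumptions.

```python
def pick_diverse(results, k, penalty=0.5):
    """Pick k items from (item_id, subject, score) tuples.

    Each time a subject is chosen, the effective score of that subject's
    remaining items is multiplied by `penalty` (an assumed tuning value).
    """
    chosen = []
    seen = {}                  # subject -> number of times already chosen
    remaining = list(results)
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda r: r[2] * penalty ** seen.get(r[1], 0))
        chosen.append(best[0])
        seen[best[1]] = seen.get(best[1], 0) + 1
        remaining.remove(best)
    return chosen


# Hypothetical photo results: three of one person outrank everyone else.
photos = [
    ("p1", "mark", 0.9),
    ("p2", "mark", 0.8),
    ("p3", "mark", 0.7),
    ("p4", "employee_b", 0.6),
    ("p5", "employee_c", 0.5),
]

print(pick_diverse(photos, k=3))
```

Ranking on individual scores alone would return p1, p2, and p3, all of the same person; the set-aware pass trades a little score for variety.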

The Top Aggregator has an additional scoring task to handle—blending. The results from different categories of entities also have to be presented together in the proper order, which requires more adjustments to scoring to ensure that the mixed results can be fairly compared for relevance.
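Blending is hard because each vertical may score on its own scale. A very simple way to make scores comparable, shown here purely as an assumption-laden sketch, is to normalize each vertical's scores against its own top score before merging. The article does not say which adjustment Facebook applies.

```python
def blend(results_by_vertical):
    """Merge per-vertical (fbid, score) lists into one ranked list.

    Scores are divided by the best score in their own vertical so that
    verticals with different scales can be compared. Scores are assumed
    to be positive.
    """
    blended = []
    for vertical, results in results_by_vertical.items():
        if not results:
            continue
        top = max(score for _, score in results)
        for fbid, score in results:
            blended.append((score / top, vertical, fbid))
    # sorted/sort is stable, so ties keep their original vertical order.
    blended.sort(key=lambda r: r[0], reverse=True)
    return [(vertical, fbid) for _, vertical, fbid in blended]


mixed = blend({
    "users": [(1, 40.0), (2, 10.0)],   # this vertical scores on a 0-100 scale
    "pages": [(3, 0.9), (4, 0.6)],     # this one scores on a 0-1 scale
})
print(mixed)
```

Without normalization, every user result would outrank every page result simply because the user vertical uses bigger numbers.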

Another factor that had to be addressed in the Top Aggregator's results was search queries that required joining results across multiple category verticals. For example, a search for "restaurants my friends like" would have to retrieve all the friends of a user and then use their likes as a filter for restaurant entities. So the Top Aggregator needs to both rewrite queries to be sent down to indexes based on these connections and then process the results based on the relevance to the original query.
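The "restaurants my friends like" example can be sketched as a two-step lookup: resolve the searcher's friends first, then use their likes as a filter on restaurant entities. The term names and fbids below are hypothetical, and a plain dictionary stands in for the sharded indexes.

```python
# Toy index standing in for several verticals (all IDs are made up).
index = {
    "friend:1": {2, 3},                  # friends of user 1
    "likes:2": {10, 11},                 # entities liked by user 2
    "likes:3": {11, 12},                 # entities liked by user 3
    "category:restaurant": {10, 11, 20},
}


def restaurants_friends_like(user_fbid):
    # Step 1: rewrite "my friends" into a concrete set of fbids.
    friends = index.get(f"friend:{user_fbid}", set())
    # Step 2: union everything those friends like.
    liked = set()
    for friend in friends:
        liked |= index.get(f"likes:{friend}", set())
    # Step 3: keep only the entities that are restaurants.
    return sorted(liked & index["category:restaurant"])


print(restaurants_friends_like(1))
```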

All of these scoring decisions are "static": they're hard-wired into the search code and impact every search through Facebook. But on top of that, there are additional scoring decisions made dynamically based on the user's own metadata (their location, their network relationships, and their demographics). "The dynamic rank that we build for results will be specialized first for the use case," Sankar said, referring to which part of the Facebook interface is calling for the search and whether the user is on the Web or mobile, for example, "and then second on the actual searcher."

In front of all this is a natural language processor module that passes the query to the Top Aggregator as the user is typing it. As a user types, the language processor also suggests queries based on its dictionary and sends back the results for that suggestion. (This is why when I typed in "Pope" on Wednesday, it returned "Popeye" as the top result before I got more specific.)
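The suggest-as-you-type behavior can be approximated with a prefix lookup over a sorted dictionary of phrases. This is a minimal sketch, not Facebook's language processor; the phrase list and popularity numbers are invented, chosen so that "popeye" outranks "pope" as in the anecdote above.

```python
import bisect


class Typeahead:
    def __init__(self, phrases):
        # phrases: (phrase, popularity) pairs; popularity values are made up.
        self.popularity = dict(phrases)
        self.sorted_phrases = sorted(self.popularity)

    def suggest(self, prefix, limit=3):
        prefix = prefix.lower()
        # Binary search for the first phrase that could start with the prefix.
        start = bisect.bisect_left(self.sorted_phrases, prefix)
        matches = []
        for phrase in self.sorted_phrases[start:]:
            if not phrase.startswith(prefix):
                break
            matches.append(phrase)
        # Most popular completion first.
        matches.sort(key=lambda p: self.popularity[p], reverse=True)
        return matches[:limit]


typeahead = Typeahead([
    ("popeye", 90),
    ("pope", 60),
    ("pop music", 40),
    ("poland", 70),
])

print(typeahead.suggest("Pope"))
```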

Sean Gallagher

Embedded in all of the scoring and the conversion of natural language to queries is a certain set of assumptions about what Facebook users really want to see when they get search results. For example, right now, a search on "restaurants" returns a set of results that Sankar said is an attempt to match the user's location and demographics. That search also includes drill-down searches that can bring back more granular results.

Improving the behavior of search is going to take time and more data on what users choose. With each tweak to the scoring system, Sankar said, Facebook rolls out the change to a small subset of users and measures which of the offered results they pick.
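Rolling a change out to a small subset of users and measuring what they pick is usually done with deterministic bucketing. The sketch below shows one common approach, hashing the user ID into a bucket, along with a simple click-through measurement. The function names, the experiment name, and the 5 percent fraction are assumptions; the article does not describe Facebook's experiment tooling.

```python
import hashlib


def in_experiment(user_id, experiment, fraction):
    """Deterministically place roughly `fraction` of users in the test group."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000       # stable bucket in [0, 9999]
    return bucket < fraction * 10_000


def click_through_rate(events):
    """events: list of (user_id, clicked) pairs; returns the share clicked."""
    if not events:
        return 0.0
    return sum(1 for _, clicked in events if clicked) / len(events)


# Hypothetical usage: a scoring tweak shown to about 5 percent of users.
test_group = [u for u in range(10_000) if in_experiment(u, "rank_tweak", 0.05)]
sample_events = [(1, True), (2, False), (3, True), (4, False)]

print(len(test_group), click_through_rate(sample_events))
```

Hashing the experiment name together with the user ID keeps each user's assignment stable across sessions while letting different experiments draw different subsets.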

"What we have today is going to be much worse than what we have a year from now and is much better than what we had six months ago," Sankar said. As data collected from users is cast as votes on what search results make them the happiest, Sankar's team will adjust the scoring to match.

Of course, the value of the results also depends a great deal on Facebook users themselves. Since the Graph Search mines data provided by users, the quality of the result is only as good as what Facebook users put in—through likes, tags, and other metadata they create. And keeping results clean will become an ongoing battle as well. You can bet search engine optimization specialists will start attempting to reverse-engineer the scoring algorithms Sankar's team has developed.

And yet in spite of them knowing everything about me down to my GPS coordinates, facebook still shows me mobile ads for services I have no need for in cities I've never been to thousands of miles away from where I just told them I am.

All this tech is great, but everything about facebook is just so buggy. From their horrible mobile apps, to their webpage not loading pictures, to their schizophrenic advertising. They really need to get their system in order or they're not going to have a sustainable business.

I dunno... yet in spite of all of these glaring issues, you're still using the service. It's obviously "good enough" in some aspect to prevent you from swearing off of it entirely.

Google's natural language processor has more or less completely destroyed Google Search's usefulness to me for all but the most trivial queries. And as illustrated, real-time-as-you-type search results are a great way to bombard the user with results that are very rarely relevant to what they're looking for.

1 billion people (me excluded) disagree.

This post seems a bit uninformed.

Their release cycle is extremely aggressive. In fact, for their desktop product, twice a day. For their mobile product, once a month.

I want to know what product you're working on with a release once-a-month that doesn't have any bugs.

I'm not a fan of Facebook, personally. I also highly value my privacy. But if you're going to be a hater, just say you're a hater.

I know I am probably one of the few, but I mainly keep my Facebook account active so I can monitor whether or not others are violating my privacy (e.g., by posting pics of me or my family members). I know this is not a perfect solution, but better than doing nothing.

Are you arguing that by releasing more often they're excused for having significant bugs?

...how does someone like you even get into a picture if you're this against having people see pictures of you?

Facebook has 1 billion advertisers and investors?

No, those people are users. As a user I don't care if they can't find a way to advertise effectively. Makes no difference to me. Well until advertisers pull out and the company crashes. Then I suppose it will impact me.

Ideally Facebook would fix its mobile and advertising problems and so move towards a more sustainable business model. If they don't, in the long term they're going to be in trouble.

Lovely. So they roll a number around and it's an update now? What kind of bug can they fix in 12h? Better, if they have bugs that can be fixed in 12h every day, just toss all that code and start from scratch.

Who's complaining about bugs? It's the user. I referred to the user experience. I don't see advertisers complaining about Facebook anywhere, and why should they? FB has 1 billion users, some are bound to be interested in whatever you are advertising.

No. I'm arguing there's a higher potential for both introducing and resolving bugs. There will be bugs. Saying otherwise is just nonsense. But making that a point of criticism when they can fix it within the day seems a little unfair.

Just because their release cycle is 12h doesn't mean that they have bugs that can be fixed in 12h every day, just that the results of longer fixes can be released in a staggered and rapid fashion.

I got a very clean home page. I have got in the habit of deleting any post I make after a few days. Photos too! I wipe it all out! FB eventually gets around to deleting it. Not sure if it is like 3 months now versus forever before. FB can't link a search to what isn't there. Or, can they?

No, it's not great. Facebook's search isn't really usable. Try searching on "people in Phoenix, AZ." It gives up immediately and goes to Bing web results. Facebook must have thousands of people who list their home as Phoenix and who have open profiles but it didn't find any of them. But maybe people in Phoenix are very private. Let's try restaurants in Phoenix. Restaurants are sure to have public profiles. Of the top 10 results, four of them are In-N-Out Burgers, one is a Phoenix restaurant in Tulsa and another is in Budapest and one is an app for your phone. An app.

Color me unimpressed. Well, how about I try and leverage some of that mountain of data Facebook has. Let's try restaurants in phoenix that men like. No results. Men don't like restaurants in Phoenix! I guess they must suck. But maybe women like them. Nope. In both cases, it goes straight to Bing.

Don't be so sure about that. If everybody did nothing on Facebook, it would cease to exist and its servers would get recycled into paperclips or something.

If the advertiser chooses to target all of a country despite being a local service, blame the advertiser, not the delivery mechanism. Do you blame Google for their ads which do the same thing?

" Facebook is trying to turn the vast store of data about relationships between people, places, and things into something useful for its corporations, identity thieves and advertisers: an search system that is unable to find a user's own posts, yet is able to bombard its userbase with spam tailored to them from their own posted info and info gleamed from their Internet use."

If you're not a part of the solution then you are a part of the problem. Not that I blame the guy for doing what he does. That's his decision.

I wonder how much the kickback from Microsoft is. Or is it that FB doesn't see MS as ever being a threat to their social network?

At any rate, I don't have much interest in search results generated by the mass ignorance of the human population. I'm going to bet that if I use Google I'll actually find the expert knowledge I'm looking for. If I use FB search I'll get the "knowledge" of experts whose primary qualification is how much people "like" them.

People will "like" things they don't actually like simply because they think it's the popular thing to do. House is right. Everybody lies. The more social you make things like search, the worse it's going to get.