How Facebook Builds a Digital Signature for You (And Your World)

When Mitu Singh asks Facebook for Chinese restaurants where his friends eat, he expects the real thing — none of this Chinese-American chain-restaurant stuff.

You see, his girlfriend is Chinese, and that means a good number of his Facebook friends, whose activity feeds the social network’s new search engine, are Chinese too. “I get really, really authentic restaurants,” he says. “If she weren’t around, I’d probably get Panda Express.”

But one day, the system broke down. It recommended a restaurant called State Bird Provisions. State Bird is one of San Francisco’s trendiest restaurants, but it’s not Chinese. It’s New American. That’s not as bad as Panda Express, but it’s still wrong. Luckily, Singh isn’t just a Facebook user but a Facebook employee. He could look into it.

After doing a little digging, Singh noticed that the person who’d created State Bird’s Facebook page had categorized its food as dim sum. State Bird Provisions specializes in small plates, some of which are rolled around in carts, just like in traditional Chinese dim sum eateries. Facebook’s algorithms, it turns out, had learned that the term “dim sum” was associated with Chinese restaurants — the two are tied together on thousands of other pages — so the service decided that State Bird Provisions was Chinese.
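Facebook hasn’t described the algorithm involved, but a toy version of this kind of inference makes the failure mode concrete. The sketch below is a hypothetical illustration: the function names, the per-tag voting scheme, and the sample pages are all assumptions. Each tag on a page casts one vote, split across categories in proportion to how often that tag appeared alongside them on other pages, and the page inherits whichever category collects the most votes.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: (tags on a page, category assigned to that page).
TOY_PAGES = [
    ({"dim sum", "dumplings"}, "Chinese"),
    ({"dim sum", "tea"}, "Chinese"),
    ({"dim sum"}, "Chinese"),
    ({"dim sum", "small plates"}, "New American"),
    ({"small plates", "wine"}, "New American"),
]


def learn_tag_associations(pages):
    """Count how often each tag appears on pages of each category."""
    counts = defaultdict(Counter)
    for tags, category in pages:
        for tag in tags:
            counts[tag][category] += 1
    return counts


def infer_category(tags, counts):
    """Guess a page's category by letting each known tag vote.

    Each tag splits one vote across categories in proportion to how often
    it co-occurred with them; the category with the most votes wins.
    Returns None if none of the tags has been seen before.
    """
    scores = Counter()
    for tag in tags:
        tag_counts = counts.get(tag)
        if not tag_counts:
            continue  # unseen tag: no evidence either way
        total = sum(tag_counts.values())
        for category, n in tag_counts.items():
            scores[category] += n / total
    if not scores:
        return None
    return scores.most_common(1)[0][0]


if __name__ == "__main__":
    counts = learn_tag_associations(TOY_PAGES)
    # A page whose only cuisine tag is "dim sum" gets labeled Chinese.
    print(infer_category({"dim sum"}, counts))
```

In this sample data, “dim sum” sits on three Chinese pages and one New American page, so a page tagged only “dim sum” is labeled Chinese, the same kind of mistake that hit State Bird Provisions. Adding a “small plates” tag tips the vote to New American (0.75 for Chinese against 1.25 for New American), a small hint at why richer page data helps.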

Now his task is to solve the problem, not only for himself but for everyone on Facebook. Singh and company must hone the search engine until its virtual world comes much closer to matching what we experience in the real world. It’s a problem many websites face, from Amazon to Yahoo, but it’s particularly pronounced at Facebook, a service that’s supposed to span, well, our entire lives.

The trouble is that our real world, and how we describe and experience it, is constantly changing. And if Facebook doesn’t evolve with it, Singh says, “people get mad and rightly so.”

And so Singh, a Facebook product manager, spends his days working with engineers to tweak and improve the social network’s “Entities Graph,” a gargantuan map of relationships. Facebook boasts more than 1 billion users, and the graph maps out how each of them relates to things like schools, books, movies, and restaurants, not to mention how all those objects relate to each other. It provides a kind of digital signature for each Facebook user and the world he or she inhabits.
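How the Entities Graph is actually stored isn’t described here, so what follows is only a toy sketch, assuming a graph of typed, directed edges between plain-string nodes. The class and relation names (`EntityGraph`, `friend`, `checked_in`, `category`) are hypothetical. It shows how a query like Singh’s, Chinese restaurants where his friends eat, becomes a walk across edges: from a user to friends, from friends to places, and from places to a cuisine.

```python
from collections import defaultdict


class EntityGraph:
    """Toy entity graph: typed, directed edges from one node to others.

    Nodes are plain strings (users, places, cuisines, schools...), and each
    edge is stored under its (source, relation) pair.
    """

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, source, relation, target):
        self._edges[(source, relation)].add(target)

    def add_friends(self, a, b):
        # Friendship is mutual, so store the edge in both directions.
        self.add(a, "friend", b)
        self.add(b, "friend", a)

    def neighbors(self, source, relation):
        # .get avoids creating empty entries for nodes we have never seen.
        return self._edges.get((source, relation), set())

    def friends_places(self, user, category):
        """Places in `category` where any of the user's friends checked in."""
        found = set()
        for friend in self.neighbors(user, "friend"):
            for place in self.neighbors(friend, "checked_in"):
                if category in self.neighbors(place, "category"):
                    found.add(place)
        return found


if __name__ == "__main__":
    g = EntityGraph()
    g.add_friends("user_a", "user_b")
    g.add("user_b", "checked_in", "Restaurant 1")
    g.add("user_b", "checked_in", "Restaurant 2")
    g.add("Restaurant 1", "category", "Chinese")
    g.add("Restaurant 2", "category", "New American")
    print(g.friends_places("user_a", "Chinese"))  # {'Restaurant 1'}
```

Storing a cuisine as its own node, rather than as a text label on the restaurant, is what lets objects relate to other objects: every restaurant linked to “Chinese” can be found through that single node.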

Until around 2010, the information now mapped by the Entities Graph lived in users’ Facebook profiles as plain text, and these strings of text weren’t linked to anything else: nothing described, say, what a school is or which of your friends may have gone there. But then Facebook rolled out object pages, the Like button, and check-ins, making it possible for people to interact and connect with things in much the same way they did with people.

At the same time, the company invited its digital citizens to report errors, detect duplicates, and populate object pages with information such as addresses and phone numbers, and it used a combination of crowdsourcing and publicly available data to verify that user-generated information was accurate.

According to Kai Yu — the director of the Institute for Deep Learning, the research arm of Chinese search giant Baidu — you couldn’t solve the problem with algorithms alone. “The biggest challenge is the ambiguity. For each entity, there are tons of different ways to express the same meaning. So far, it’s still difficult to develop these machine-learning algorithms to handle this huge variety of expression,” he says. Enlisting the help of millions of users to label the world’s data was the obvious workaround.

In short, Facebook added some structure to information that previously had none, making it easier for the company to parse and understand its wealth of data. This let Facebook engineers better define each type of object. They could hone notions of, say, “place-y-ness” or “schooliness” in an effort to describe all those objects.

And on some level these models work. According to Singh, many people on Facebook list their school as Hogwarts, the wizardry school from the Harry Potter series. But Facebook’s graph models give Hogwarts a low “schooliness” quotient, partly because those who list it as a school come from so many different places. “We want to preserve user expression. If someone really wants to say that they went to Hogwarts, who are we to say that they didn’t go to Hogwarts?” Singh says. “But that’s not the thing that we want to show on top when we search for schools that people have gone to.”
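Singh doesn’t spell out how the “schooliness” quotient is computed, only that the geographic spread of people listing Hogwarts is one reason it scores low. As a hypothetical sketch of that single signal, the code below scores each school by how concentrated its attendees’ hometowns are, using a Herfindahl-style index (the sum of each location’s squared share). The function names, the choice of index, and the sample data are assumptions for illustration; a real model would presumably weigh many other signals.

```python
from collections import Counter


def location_concentration(locations):
    """Herfindahl-style concentration of a list of locations.

    Sums the squared share of each distinct location: 1.0 when everyone
    comes from one place, 1/n when people are spread evenly over n places.
    Returns 0.0 for an empty list.
    """
    if not locations:
        return 0.0
    total = len(locations)
    return sum((n / total) ** 2 for n in Counter(locations).values())


def rank_by_schooliness(schools):
    """Order school names from most to least geographically concentrated."""
    return sorted(
        schools,
        key=lambda name: location_concentration(schools[name]),
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up hometowns of people who list each school on their profile.
    schools = {
        "Hogwarts": ["London", "Columbus", "Manila", "Lagos", "Lima"],
        "Springfield High": ["Springfield"] * 8 + ["Shelbyville"] * 2,
    }
    for name in rank_by_schooliness(schools):
        print(name, round(location_concentration(schools[name]), 2))
```

With this sample data, Springfield High scores 0.68 and ranks first, while Hogwarts, whose five attendees all come from different cities, scores 0.2. Hogwarts stays in the data, preserving user expression, but it no longer lands on top of the results.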

The end result, according to Facebook, is that it can field richer queries and serve up more useful, personalized results than traditional search engines. “It gives you a familiarity with what that [object] is going to be before you even try it. That is very different from regular searching,” says Phil Bohannon, an engineering manager on the Entities Team. “It gives you a result that’s just not available anywhere else.”

Of course, companies like Google, Baidu, and Foursquare are also working to hone this type of search, which, among other things, lets them target ads more effectively. Google and Baidu are actively constructing knowledge graphs of their own that index webpages based on relationships between entities rather than just keywords, and Foursquare has become a recommendation engine in its own right, though with a much smaller dataset than Facebook has at its fingertips.

“[Facebook] is the biggest tool for self-expression the world has ever seen and therefore there’s a big motivation to edit the Graph. That doesn’t exist in other companies. They have to get the graph filled in other ways — mining the web, things like that. Great for them,” says Facebook’s Bohannon. “But we’re doing it because the users want to express themselves.”