A more literate search engine


Human behavior is Facebook's business.

Its success is based on understanding how people are wired: how they present themselves, what they remember, whom they trust and, now, how they seek information.

Facebook this month introduced a search tool to enable users to find answers to many kinds of questions. But before it did, it assembled an eclectic team to scrutinize what users were searching for on the site – and how. The team included two linguists, a Ph.D. in psychology and statisticians, along with the usual cadre of programmers. Their mission was ambitious but clear: teach Facebook's computers how to communicate better with people.

Kathryn Hymes, 25, who left a master's program in linguistics at Stanford to join the team in late 2011, said the goal was to create “this natural, intuitive language.” She was joined last March by Amy Campbell, who earned a doctorate in linguistics from UC Berkeley.

When the team began its work, Facebook's largely ineffective search engine understood only “robospeak,” as Hymes put it, and not how people actually talk. The machine had to be taught the building blocks of questions, a bit like how schoolchildren are taught to diagram a sentence. The code had to be restructured altogether.

Loren Cheng, 39, who led what is known as the natural language processing part of the project, said the search engine had to adjust to the demands of users, a great variety of them, considering Facebook's mass appeal.

“It used to be you had to go to the computer on the computer's terms,” Cheng said. “Now it's the user.”

The heart of the research took place in a lab at the Facebook offices in Menlo Park. Hidden behind one-way glass, team members watched users playing with different versions of a search engine and filled notebooks with observations. On occasion, the engineers tore out their hair.

They consulted dictionaries, newspapers and parliamentary proceedings to grasp the almost infinite variety of ways people posed questions. Then they trained the algorithms to understand what was meant. They tested tweaks to the search tool, as they do with every product, and measured how certain groups of people responded.

The project resembles how Facebook builds products. It studies human behavior. It tests its ideas. Its goal is to draw more and more people to the site and keep them there longer.

What it builds is not exactly a replica of how people interact offline, said Clifford I. Nass, a professor of communication at Stanford University who specializes in human-computer interaction. Rather, it reflects an “idealized view of how people communicate.”

“The psychology they are drawing on is not pure psychology of how humans communicate,” Nass said, “but the psychology of what makes people stay around, spend time on site and secondarily, what makes people click the advertisements.”

In the past, Facebook's rudimentary search engine responded to very specific queries. Say a user was trying to find Stanford students. The user had to type into the search bar: “people who attended Stanford.” The search engine did not understand “people who went to Stanford” or “studied at Stanford.”

Likewise, someone looking for friends had to type in exactly “friends of me,” and someone looking up past vacations had to type “places visited by me.”

Today, the search engine can understand 25 close synonyms for the word “student,” including “freshmen” and “pupils,” and another 25 slightly more distant words that suggest the same thing, including “academics.” That can be combined with a time reference – current students – or more detailed descriptions – psychology majors – and all told, the search engine can recognize at least 275,000 ways to ask about “students.”
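The synonym matching described above can be sketched in a few lines. The word lists below are illustrative stand-ins, not Facebook's actual lexicon, and the normalization step is a simplified assumption about how query words might be mapped to a canonical concept:

```python
# Minimal sketch of synonym expansion for query understanding.
# The variant lists are hypothetical examples, not Facebook's real data.
SYNONYMS = {
    "student": {"student", "students", "freshman", "freshmen",
                "pupil", "pupils", "academics"},
}

def normalize_term(word):
    """Map a query word to its canonical concept, if one is known."""
    w = word.lower()
    for concept, variants in SYNONYMS.items():
        if w in variants:
            return concept
    return w

def normalize_query(query):
    """Normalize every word in a whitespace-separated query."""
    return [normalize_term(w) for w in query.split()]

print(normalize_query("freshmen who studied psychology"))
# ['student', 'who', 'studied', 'psychology']
```

Combining each concept's variants with time references and descriptive modifiers is what multiplies a handful of synonym lists into the hundreds of thousands of recognizable phrasings the article cites.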

The search tool has already come under scrutiny. A recent blog post on Tumblr detailed how it could ferret out several uneasy personal details, including a list of people who “like” Falun Gong and whose relatives live in China, where Falun Gong is an illegal organization.

How aggressively the new search engine will compete with Google, which dominates the search market, is unclear, as is how quickly it can spin money for Facebook. The company went public last May, in one of the largest public offerings in history, and has been on something of a roller-coaster ride since. Its fourth quarter results will be announced Wednesday.

Much work remains for the researchers. The search engine still has difficulty understanding many kinds of sentences – for example, “photos John likes and that he commented on.” Nor can it grasp sentences that are ambiguous when written but perfectly understandable when spoken. Note how different it is to read – and hear – the sentence: “Sports fans of Lady Gaga play.”

“Computers are bad at context,” Campbell, the linguist, said. “They're bad at real world knowledge.”

Even without context, Facebook is also trying to approximate real world trust. Its search engine ranks answers to every query by an awkward construct that Facebook calls “social distance.” Its algorithms vet who among a user's Facebook “friends” the user is closest to and whose answers the user would like to see at the top of search results.

The company is betting on the principle of homophily: If it is from someone the user likes, the user may be more likely to pay attention to it – and click on the link.
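The “social distance” ranking described above can be sketched as a simple sort. The article does not say what signals feed Facebook's actual metric, so the sketch below assumes a crude proxy for closeness – a count of recent interactions per friend – purely for illustration:

```python
# Hedged sketch of ranking search results by "social distance."
# The interaction-count signal is an assumption for illustration only;
# the article does not specify what Facebook's metric actually measures.
def rank_by_social_distance(results, interactions):
    """Order results so answers from closer friends come first.

    results:      list of (friend, answer) pairs
    interactions: dict mapping friend -> recent interaction count,
                  used here as a stand-in for closeness
    """
    return sorted(results,
                  key=lambda pair: interactions.get(pair[0], 0),
                  reverse=True)

results = [("alice", "Cafe X"), ("bob", "Cafe Y"), ("carol", "Cafe Z")]
interactions = {"alice": 3, "bob": 12, "carol": 7}
print(rank_by_social_distance(results, interactions))
# [('bob', 'Cafe Y'), ('carol', 'Cafe Z'), ('alice', 'Cafe X')]
```

The homophily bet is visible in the sort order: answers from the friends a user interacts with most surface first, on the theory that those are the links the user is most likely to click.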
