Google Schools Its Algorithm

To humans, computer intelligence is a puzzle, as if the machines have split personalities. They can be so remarkably smart at times, yet so bafflingly dumb at others.

This riddle of digital deduction has been center stage recently.

Just over a week ago, for instance, the Internet search giant Google announced that it was making a major overhaul of its formula for ranking Web sites. The company said it was demoting “low quality” Web sites, designed mainly to lure traffic from Google’s search engine and attract advertising revenue. It was a move to improve the quality of search, but also an admission that Google, the trusted curator of the Web, was being outwitted.

Computers are only as smart as their algorithms — man-made software recipes for calculation, the basic building blocks of computerized thought. When running on powerful computers, a clever algorithm can perform amazing feats. Google’s algorithm handles one billion search queries a day. But algorithms are often brittle and simple-minded, doggedly following their step-by-step formulas as if with blinders. They can be amazingly good at set tasks — playing chess, scanning the Web, simulating weather patterns. But they are typically unable to process what humans effortlessly understand — nuance, background knowledge, common sense about things in the physical world.

Expanding the horizons of computer intelligence, mimicking human understanding in more realms, is one of the grand challenges in science. I.B.M.’s Watson, the question-answering computer that beat human champions on “Jeopardy!” last month, and Google’s algorithmic makeover highlight not only the accelerating pace of recent progress, but also how much remains to be accomplished.

Researchers at those two companies, at a handful of others and at several universities are leading the way. Their work often focuses on language: programming computers to recognize words and also to understand their meaning, in the machines’ own way.

Think of it as the Education of the Algorithm.

The machines, lacking the background knowledge and life experience of humans, use statistical models to reach results similar to the ones people reach, if by different means. With an analogy to flying, Frederick Jelinek, a pioneer in speech recognition, once explained, “Airplanes don’t flap their wings.”

The explosion of language on the Web, in text and audio, has given the statistical algorithms a rich training ground for improvement. Ever-faster computers help as well.

But parsing and categorizing language, with its ambiguity and subtlety, remains a formidable hurdle for computers. So the challenge facing Watson was far greater than the one I.B.M. overcame in 1997 with its Deep Blue chess-playing computer, which beat the world champion Garry Kasparov.

[Image. Credit: Stuart Bradford]

“It’s a lot more difficult for a computer to understand language at the level of an 8-year-old than to beat a grandmaster at chess,” observed Oren Etzioni, a computer scientist at the University of Washington in Seattle.

A computer, of course, cannot really understand words. Instead, its algorithms scan mountains of text seeking patterns and probabilities, like how often a certain word appears close to other words in documents. Watson, for instance, needed a “pun detector” because puns are regularly used in “Jeopardy!” clues. And using statistical pattern-matching, it recognized that the phrase “holy city” was more likely to refer to St. Paul than, say, South Bend.
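The idea of counting how often a word appears close to other words can be illustrated with a few lines of Python. This is only a minimal sketch of the general technique, not Watson's actual method: the function name `cooccurrence_counts`, the `window` parameter and the three sample sentences are all invented for illustration.

```python
from collections import Counter


def cooccurrence_counts(documents, target, window=3):
    """Count the words that appear within `window` tokens of `target`.

    Hypothetical helper: tokenization is a plain lowercase whitespace split.
    """
    counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # skip the target word itself
                    counts[tokens[j]] += 1
    return counts


# Made-up sample text, standing in for "mountains of text".
docs = [
    "the holy city welcomed pilgrims",
    "pilgrims walked to the holy shrine",
    "the city council met on monday",
]
near_holy = cooccurrence_counts(docs, "holy", window=2)
print(near_holy.most_common())
```

With far more text, counts like these become probabilities, and a system can prefer the reading whose neighboring words it has seen most often.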

Algorithms are coded short cuts to certain ends, with assumptions, goals and perhaps even values built in. People tend to think of technology, like the algorithm, as impartial. But Helen Nissenbaum, a professor of media, culture and communication at New York University, calls the interplay between technology and its consequences “values by design.”

In Internet search and commerce, the priorities are to quickly deliver information to a user, and to present a potential customer to an advertiser. Those are perfectly practical and reasonable goals, but also fairly narrow ones.

Google is constantly refining its algorithm, though rarely as significantly as the recent overhaul, expected to alter the rankings on 12 percent of searches. Its algorithm is a tightly guarded trade secret, but it relies heavily on linking search terms to noun phrases in a Web page — as well as the popularity of a site and how often other sites link to it.
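Because the real formula is a trade secret, the following is only a textbook-style sketch of one ingredient the paragraph mentions: scoring a site by how often other sites link to it. It assumes a simple iterative link-analysis scheme; the function `link_scores`, the `damping` value and the three-site toy Web are hypothetical and are not taken from Google.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages by incoming links, weighting links from high-scoring pages more.

    `links` maps each page to the list of pages it links to (assumed input format).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly among its links.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page with no outgoing links spreads its score to all pages.
                share = damping * score[page] / n
                for other in pages:
                    new_score[other] += share
        score = new_score
    return score


# Toy Web: two sites link to c.example, which links back to a.example.
toy_web = {
    "a.example": ["c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}
scores = link_scores(toy_web)
for page, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(page, round(value, 3))
```

In this toy Web the most-linked site comes out on top, and the site that nobody links to ranks last. A formula this predictable is also one that site owners can learn to exploit.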

Google can be thought of as a supercharged, automated reference librarian for the Web. Type in a few words in Google’s search box, and Google replies, in effect, “I don’t know the answer, but try these Web sites.” And it performs that service in fractions of a second, a billion times a day.

Google’s algorithm, experts say, is fast and powerful, but reasonably predictable. That has opened the door to Web entrepreneurs who tailor their sites to “suck traffic from search engines,” as Amit Singhal, a Google fellow and search expert, put it. Google did not name any of the “low quality” sites it had in mind when it retooled its algorithm. But industry analysts agree that the target seemed to be so-called content farms, often sites with listlike articles, filled with words that are frequently used as search terms.

Essortment.com is a site whose rankings dropped sharply after Google changed its algorithm, according to an analysis by a search consulting firm, Sistrix. A fairly typical article, “25 Fun Things to Do With Your Girlfriend,” includes tips like “cook together,” “run a marathon together,” “go camping” and “go shopping together.” And it makes ample use of search-magnet words like “girls,” “dating,” “marriage” and “singles” — and the “25 Fun Things” page has plenty of ads.
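One crude way to see what "search-magnet words" do to a page is to measure what share of its words are popular search terms. The sketch below is an assumption-laden illustration, not a description of how Google identifies content farms: the function `search_term_density` and both sample sentences are invented, and the term list simply reuses the four words quoted above.

```python
import re


def search_term_density(text, magnet_terms):
    """Return the fraction of words in `text` that are in `magnet_terms`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in magnet_terms)
    return hits / len(words)


# The four words quoted in the article, used here as a hypothetical term list.
magnet = {"girls", "dating", "marriage", "singles"}

# Invented sample sentences for comparison.
stuffed = "dating tips for singles dating girls before marriage"
plain = "we cooked dinner together and then went camping"

print(search_term_density(stuffed, magnet))
print(search_term_density(plain, magnet))
```

A high density alone proves nothing, since a legitimate article about dating would also score well, which is part of why a smarter algorithm needs better language understanding rather than simple word counts.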

A smarter algorithm, Mr. Singhal said, is the way forward, so there will be no easy way to game Google. “As we improve the language understanding of the algorithm,” he noted, “all the cheap tricks that people do will be recognized as cheap tricks instead of tricks that work.”

Indeed, the future of search engines like Google and Microsoft’s Bing, according to computer scientists, will be to exploit advances in machine learning and language processing to become answer machines — to take a page from Watson, but as a consumer service. Both companies are already headed in that direction.

Type into Google’s search box, “What is the height of the Empire State Building?” The top result is not a link to a Web page, but a reply: 1,250 feet.

A version of this article appears in print on March 6, 2011, on Page WK4 of the New York edition with the headline: Google Schools Its Algorithm.