Yesterday's news that the European Commission has opened a preliminary inquiry into competition complaints from three companies has generated a lot of questions about how Google's ranking works. Here, Amit Singhal, a Google Fellow responsible for ranking, who has worked in search for almost 20 years, explains the principles behind our algorithm.

Pop quiz. Get ready. You're only going to have a few milliseconds to answer this question, so look sharp. Here goes: "know the way to San Jose?" Now display the answer on a screen that’s about 14 inches wide and 12 inches tall. Find the answer from among billions and billions of documents. Wait a second - is this for directions or are we talking about the song? Too late. Just find the answer and display it. Now on to the next question. Because you'll have to answer hundreds of millions each day to do well at this test. And in case you find yourself getting too good at it, don’t worry: at least 20% of those questions you get every day you’ll have never seen before. Sound hard? Welcome to the wild world of search at Google. More specifically, welcome to the world of ranking.

Google ranking is a collection of algorithms used to seek out relevant and useful results for a user's query. There's a ton that goes into building a state-of-the-art ranking system like ours. Our algorithms use hundreds of different signals to pick the top results for any given query. Signals are indicators of relevance, and they include items as simple as the words on a webpage or more complex calculations such as the authoritativeness of other sites linking to any given page. Those signals and our algorithms are in constant flux, and are constantly being improved. On average, we make one or two changes to them every day. Lately, I’ve been reading about whether regulators should look into dictating how search engines like Google conduct their ranking. While the debate unfolds about government-regulated search, let me provide some general thinking behind our approach to ranking. Future ranking experts (inside or outside government) might find it helpful. Our philosophy has three main elements:

1. Algorithmically-generated results.
2. No query left behind.
3. Keep it simple.

After nearly two decades, I’ve lost count of how many times I've been asked why Google chooses to generate its search results algorithmically. Here's how we see it: the web is built by people. You are the ones creating pages and linking to pages. We are utilizing all this human contribution through our algorithms to order and rank our results. We think that's a much better solution than a hand-arranged one. Other search engines approach this differently -- selecting some results one at a time, manually curating what you see on the page. We believe that approach, which relies heavily on an individual's tastes and preferences, just doesn't produce the quality and relevance of ranking that our algorithms do. And given the hundreds of millions of queries we have to handle every day, it wouldn't be feasible to handle each by hand anyway.

This brings me to the next point: leaving no query behind. Usually once I've explained to people the thinking behind algorithmically-generated results, some will ask me, "But what if you do a search, and the results you see are just plain lousy? Why wouldn't you just go in there by hand and change them?" The part of this question that's valid is in terms of lousy results. It happens. It happens all the time. Every day we get the right answers for people, and every day we get stumped. And we love getting stumped. Because more often than not, a broken query is just a symptom of a potential improvement to be made to our ranking algorithm. Improving the underlying algorithm not only improves that one query, it improves an entire class of queries, and often for all languages around the world in over 100 countries. I should add, however, that we do have clear written policies for websites that are included in our results, and we do take action on sites that are in violation of our policies or for a small number of other reasons (such as legal requirements, child porn, spam, viruses/malware, etc.). But those cases are quite different from the notion of rearranging the page you see one result at a time.

Finally, simplicity. This seems pretty obvious. Isn't it the desire of all system architects to keep their systems simple? We work very hard to keep our system simple without compromising on the quality of results. This is an ongoing effort, and a worthy one. Our commitment to simplicity has allowed us to innovate quickly, and it shows.

Ultimately, search is nowhere near a solved problem. Although I've been at this for almost two decades now, I'd still guess that search isn't quite out of its infancy yet. The science is probably just about at the point where we're crawling. Soon we'll walk. I hope that in my lifetime, I'll see search enter its adolescence.

In the meantime, we're working hard at our ongoing pop quizzes. Here's one last one: "search engine." In 0.14 seconds from among a few hundred million pages, our initial results are: AltaVista, Dogpile Web Search, Bing and Ask.com. I guess I'd better get back to work.

Posted by: Amit Singhal, Google Fellow

Update 2 March, 10:30am

First of all, let me thank everyone for their kind comments and honest views in this discussion. Gary, I love search. After having done search for almost 20 years, I still come into work every morning like a kid going to a candy store. Alongside my passion for search, one fact that keeps me so excited is that what was science fiction in search research twenty years ago is now coming to fruition at Google. The semantic systems we have built are something I didn't expect to see in my lifetime. Secondly, Google has given me an environment where researchers like me can practice search in its pure algorithmic form. I can't put into words how incredibly satisfying this combination is for a search geek like me :-)

Great job Amit. I love that you clarified the definition of Google ranking being a collection of algorithms. Too often we think about it (or drive ourselves nuts about it) being based on one algorithm or formula.

My favorite attribute about the entire Google search philosophy is 3. Keep it simple. Even after the YaBing conversion, Google will always have minimalism as a sustaining value proposition - and that's the way I like it.

Amit - you forgot to mention one last thing: those results for "search engine" will vary depending on where you live, where your web host is serving from, what you've searched for and clicked on before, and what Google data center you are hitting at the time. Then, you leave the user to contemplate whether they want to see a web search result, a news result, a social network, a blog post, a book result, or a paid ad. While diversity and options are great, this may leave me, as the user, so much more confused and almost forgetting what I originally wanted to find.

What is such a shame is that the interfering government do-gooders will probably read the first line of this excellent post, consider themselves informed and then go off on a money-wasting spree similar to the UK's bailout of Northern Rock. They will set limits, goals and other useless targets and then conveniently forget what they were meant to be doing in the first place when they get called in front of a parliamentary committee or a congressional oversight committee!

However, without such imbeciles I wouldn't have anyone to rant about and this post would have just said....

Excellent and clear info, keep up the knowledge stream, we are all sponges out here!

"... Here's one last one: "search engine." In 0.14 seconds from among a few hundred million pages, our initial results are: AltaVista, Dogpile Web Search, Bing and Ask.com. I guess I'd better get back to work"

You are very funny, Google. Your statement sounds clear, and now everyone understands the algorithm, but why doesn't Google explain this:

If Company 1 has many landing pages and/or affiliates, all of which have relevant sites or perhaps short URLs, and you display 8 results from Company (a1) with effectively the same target URL in the first 10 of 5 million results, then it is not a question of how your indexing and algorithm work. It is a question of why you don't block more results with the same target URL. Or is this not true, and am I the only one who sees this issue?

To me this looks anti-competitive, and nobody can tell me that it has to do with SEO or anything else. It simply comes down to Google not blocking these target URLs and giving 8 of the first 10 results to the same company. I call this preferred listing!

It's easy to fix: when you crawl, set a limit of 2 results with the same target URL within the first 100 results, and don't display landing pages and short URLs that have effectively the same target, and nobody will cry.

[This looks for me anti competition and nobody can tell me now that have to do with SEO or someone else,...]

There are certain situations where my attempts to perform research are hampered by junk result sets.

No matter how clever an algorithm Google devises, there will be folks looking to game the system and conversely folks who inadvertently demote themselves.

It's a delicate balancing act for both the indexer and the indexed in trying to get both the quality content that is oblivious to SEO and the content leveraging every last strategy to weigh out more by their relevance than their cleverness in jousting the crawler.

Overall, I'm pretty happy with the results I get, and when I get junk it's typically pretty obvious to me the nature of the issue and I adjust my queries accordingly.

Interesting that you feel fixing one lousy result will fix a host of other issues. I would think the opposite would be true or possible. Identifying a solution specific for one set of results could cause a huge list of problems for many other websites that were probably doing nothing wrong.

This type of thinking really helps me understand why mom and pop shops that are doing nothing wrong get de-indexed or demoted all of the time in your index.

This was really clear and I appreciate the honesty behind Google's search philosophy.

If you don't mind me asking, why have you spent 20 years in search? You seem to have a huge passion for it and know the landscape for its future. Even though I seem to have answered my own question(?), are there any other reasons?

February 25, 2010 6:13 PM

---

By representing a heuristic as an algorithm, Amit has attempted a sleight of hand. An algorithm can be tied back to a body of knowledge, such as mathematics, from which the algorithm is obtained.

Search is informed by human intuition - it is heuristics that drive search ranking. Expressed as what programmers tend to call an algorithm, a heuristic embeds social and cultural assumptions in a computer program. Expression as a computer program doesn't make the heuristic and the assumptions behind it transparent, or remove the cultural and social biases. If anything, representing a value-laden heuristic as a neutral algorithm allows Google to conceal US cultural values as eternal truths - deceiving the observer into thinking that human factors play no part in choosing how to rank search results.

Google's problem with Europeans is that at every turn, Google shows it favours the US view of the world and US interests, over non-US. Why then, faced with a search engine based on human informed heuristics, should we not fear implicit US favour in the results?

Explain the technical mechanisms, and demonstrate in actions, that people outside the US are regarded as being of equal value, and you may remove the fear that Google is implicitly favouring US interests.

When you have US residents, paying US taxes in US dollars, subjected to parochial US media, under US law, why would we assume that you're thinking about us, and valuing us? Show us that you genuinely *care* about us, and you'll change opinions.
