Unofficial news and tips about Google

December 10, 2007

Google Finds Fewer Search Results

Everybody should know that when you use a search engine, the number of search results is just an estimate. You only look at the first 10 or 20 results anyway, and in some cases the search engine doesn't let you access more than a certain number of results. For example, Google only lets you see the top 1,000 results, mostly for efficiency reasons.

"When you perform a search, the results are often displayed with the information: Results 1 - 10 of about XXXX. Google's calculation of the total number of search results is an estimate. We understand that a ballpark figure is valuable, and by providing an estimate rather than an exact account, we can return quality search results faster." (Google help center)

But recently something has changed in Google's algorithm that estimates the number of results. Here's a comparison between the number of results for [Moby] in May (notice the recently-launched bar that used gradients) and today:

Searching for Moby (May 17, 2007)

Searching for Moby (December 10, 2007)

From 15 million results down to only 2 million, that's a long way. For the same query, Yahoo estimates 18,900,000 results, Microsoft finds 7,730,000 results, while Ask only finds 4,089,000 results. Notice that all three of the other major search engines show bigger numbers than Google. You might think this query is just an exception, but that's not the case: almost every query shows far fewer results in Google than in the other search engines.

And even if this estimate has never been reliable, it's strange to see such an obvious inaccuracy. If you use complicated queries (more than 3-4 keywords), the estimates become more accurate and Google starts to show more results than the other search engines.

In other related news, Google started to treat subdomains the same as directories for some queries. "For several years Google has used something called host crowding, which means that Google will show up to two results from each hostname/subdomain of a domain name. That approach works very well to show 1-2 results from a subdomain, but we did hear complaints that for some types of searches (e.g. esoteric or long-tail searches), Google could return a search page with lots of results all from one domain. In the last few weeks we changed our algorithms to make that less likely to happen in the future," explains Matt Cutts.
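The "host crowding" behavior Matt Cutts describes can be sketched in a few lines. This is a hedged illustration only, not Google's actual implementation: it just walks a ranked list of URLs and keeps at most two results per hostname, preserving rank order.

```python
from urllib.parse import urlparse

def host_crowd(results, per_host=2):
    """Keep at most `per_host` results from each hostname,
    preserving the original ranking order."""
    seen = {}
    crowded = []
    for url in results:
        host = urlparse(url).netloc
        seen[host] = seen.get(host, 0) + 1
        if seen[host] <= per_host:
            crowded.append(url)
    return crowded

# hypothetical ranked results: the third hit from forum.example.com is crowded out
ranked = [
    "http://forum.example.com/thread/1",
    "http://forum.example.com/thread/2",
    "http://forum.example.com/thread/3",
    "http://other.org/page",
]
print(host_crowd(ranked))
```

For long-tail queries where almost every result comes from one domain, a hard cap like this is exactly what produced the complaints Cutts mentions, which is presumably why the new algorithm makes the crowding less strict rather than removing it.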

I noticed a significant drop in the number of results for certain phrases I was tracking between September and October of this year -- from 4,300,000 results to 561,000 results in just one month (around the time of a large PageRank update). That query is now at 217,000 results. I noticed drops in several other queries as well.

Other background information:

I do not have personalized search enabled.

These queries were all made from the same physical location.

The queries are all multi-word queries, with phrases in quotes, and the OR operator ("a b c" OR "a bc" OR abc).

As for the lower result counts, I don't think they're necessarily too meaningful either. I wonder if the big difference is due to Google handling permanent or temporary redirects differently, so that Yahoo would see "double" where Google sees only one? A small test shows this could indeed be the case:

Google count for blog.outer-court.com (which is now redirecting to blogoscoped.com): ~2
Google count for blogoscoped.com: ~12,200

So, Yahoo counted at least 15,582 more of these sort-of "non-existing" pages than Google. Way to bloat your page count :)
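The redirect hypothesis above can be modeled with a toy counter. This is a sketch under stated assumptions, not how any engine actually works: the URLs and the `redirects` map are hypothetical, and the assumption is that one engine collapses a redirecting URL into its destination while a naive count treats them as two pages.

```python
def canonical_page_count(pages, redirects):
    """Count unique pages after collapsing each redirecting URL
    into its final destination (Google-style, per the hypothesis)."""
    canonical = {redirects.get(url, url) for url in pages}
    return len(canonical)

# hypothetical index holding both the old and the new location of a post
index = [
    "http://blog.outer-court.com/post",
    "http://blogoscoped.com/post",
    "http://blogoscoped.com/other",
]
redirects = {"http://blog.outer-court.com/post": "http://blogoscoped.com/post"}

print(len(index))                             # naive count: 3
print(canonical_page_count(index, redirects)) # collapsed count: 2
```

If Yahoo reported something like the naive count and Google the collapsed one, a redirected blog with thousands of posts would show exactly the kind of inflated difference described above.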

Also, maybe Google kicks some spam sites out of the index faster (though I think they should merely lower their rankings, not stop indexing them completely, right)?

But I guess the real question is: how likely are you to actually find "exotic" pages? That's the only use case where you'd need not just "the best" pages but also a really deep, far-reaching index that surfaces even the smallest webpage that may contain the info. For instance, I just formulated a hypothetical research query which reads [daniel gillespie clowes interview ink pen]. I was imagining that I'm looking for an interview with comic artist Daniel G. Clowes in which he details which tools he uses. I even used his middle name to only get interviews that go really deep into the subject matter:

I'll ask about the result estimates; I think this is independent of the subdomain/subdirectory change that I discussed. But Philipp is right that the only way to truly compare index size is to do queries that return <1000 results and then count the actual number of results.
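The counting approach suggested here can be sketched as a small pagination loop. The `search` callable and the mock engine below are hypothetical stand-ins (no real search API is being used); the point is simply to count what actually comes back page by page instead of trusting the estimate, stopping at the 1,000-result access cap mentioned earlier.

```python
def count_actual_results(search, query, page_size=10, cap=1000):
    """Page through results and count what is actually returned,
    rather than trusting the engine's headline estimate."""
    total = 0
    start = 0
    while start < cap:
        page = search(query, start=start, num=page_size)
        total += len(page)
        if len(page) < page_size:  # last, partial page: index exhausted
            break
        start += page_size
    return total

# mock "engine" that really has 37 results for any query
def mock_search(query, start, num):
    results = [f"result-{i}" for i in range(37)]
    return results[start:start + num]

print(count_actual_results(mock_search, "moby"))  # → 37
```

For queries with fewer than 1,000 real results, this count is exact for each engine, which makes it a fairer index-size comparison than the wildly diverging estimates discussed in the article.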

Google is now shifting its algorithm toward web 2.0 socialization, which emphasizes relevancy and consistency of content among its top page results. In other words, they have slimmed down on outdated static web pages that haven't been updated in months or years, such as abandoned domains.

Also, please note that this algorithm change is being rolled out progressively and may vary from server to server for a few days.

Just wanted to give you an update on this. There was a bug in the serving code that caused result estimates to be low by a factor of up to 40 (depending on the query and the language). This didn't affect the search results at all, just the calculation of the estimate. The fix is rolling out right now, so over the next hour or so the numbers should be back where they're supposed to be. I'll be keeping an eye on this thread in case anyone spots something else.

Try using a longer keyword, e.g. "health benefits of strawberries". The last time I used this keyword on Google (about 2 months ago) it showed me about 1,900,000 results, but now it's only showing 189,000 results.

By the way, before this bug showed up the results returned a lot of relevant content, but now the results are a little messy and kind of unrelated.