
Introduction

I noticed some strange behavior in queries about me, which seemed to begin just after I had published an article in the Huffington Post that might have offended Google. These patterns are strange enough that I examined several dozen queries on Friends of Google (FOGs), Not-Friends of Google, and Possible-Enemies-of-Google (PEGs). The data suggest two possibilities. Either (i) Google deliberately overrides some queries and hacks some queries on PEGs or (ii) the counts returned in the number of results line are pretty random.

Some History: How I Got to Be a PEG

I criticized some statements that were made at a Congressional Hearing on Antitrust, saying, “It’s time to wake up and smell the antitrust.” I then went on to describe the danger of third-party payer business models in a piece posted in Silicon Valley Insider; this was not specifically about Google, but Google is the best-known current example of where one party uses a system (for example, to search), a second party pays to be included (the hotels, airlines, or camera companies that want to be found), and a third party collects. Since the first party isn’t paying, it doesn’t much care what the fees are, taking almost all market discipline away from the prices that the third party can charge the second. I thought these were pretty good stories, and I wanted to make sure that they could be found.

So, yes, I was ego surfing. You know what ego surfing is: you key in your name in quotes and wait to see what comes back. Serious ego surfers know about how many results to expect, and about how many to expect for their friends. Sometimes Google will change an algorithm, and your counts will go up or down, but before you get too excited you check on the counts of a couple of colleagues to determine whether you surged, crashed, or just experienced an algorithm update.

Both of my articles were easy to find. Then I wanted to see where each was on a general list of my posts, so I did a search on my name in quotes, the classic ego surf query. And I had vanished. Not entirely, of course. But a search on my name returned about 11,000 results, down well over 90 percent from the previous time I checked. Had I suddenly offended so many Web sites that they pulled all references to me? I then queried “Eric K. Clemons” … not everyone who references me uses my middle initial, so these counts are usually about half as large as a query without my middle initial. “Eric K. Clemons” had not vanished, but was holding steady at about 76,000 hits. Curious.

The Search for a Pattern

Was Google hiding my antitrust work? Searching for <<“Eric Clemons” -antitrust>> continued to return about 149,000 results. So was Google hiding my antitrust work? No … the antitrust work seemed to be listed in the same places it always had been in the queries on just my name.

Was Google just hiding me in general? Were they trying to make me look less visible, or less significant in some way? What if they just had a manual hack on the basic query involving my name? So I tried a query that should have been equivalent to querying on just my name but was so silly that it would not have been hacked; this was a query that I suspected no one had ever thought to enter. I tried <<“Eric Clemons” -gerbils>>. Yes, that was about 169,000 results, pretty much what I would have expected on an unhacked search on my name. Looking for <<“Eric Clemons” -poodles>> or <<“Eric Clemons” -weimaraners>> or <<“Eric Clemons” -habanero>> produced very similar numbers. So maybe they hacked me, but only on the obvious query on my name alone. The extent of the possible hacking appeared to be enormous; the meaningless exclusion of gerbils or of other nonsense terms increased the number of results returned by a factor of 15.36, or by 1,436 percent.
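The arithmetic behind those two figures is easy to check. Below is a minimal sketch in Python; the function names are illustrative labels only, and the inputs are the rounded counts quoted above (about 11,000 for the bare name and about 169,000 with gerbils excluded).

```python
def gerbil_ratio(base_count: int, excluded_count: int) -> float:
    """Count with a nonsense term excluded, divided by the count for the bare name."""
    return excluded_count / base_count


def percent_increase(base_count: int, excluded_count: int) -> float:
    """Percentage by which the exclusion query exceeds the bare-name query."""
    return (excluded_count - base_count) / base_count * 100.0


# Rounded counts quoted in the text for the bare name and for the -gerbils query.
base = 11_000
without_gerbils = 169_000

print(f"gerbil ratio: {gerbil_ratio(base, without_gerbils):.2f}")
print(f"percent increase: {percent_increase(base, without_gerbils):.0f}")
```

A ratio of r always corresponds to an increase of (r - 1) × 100 percent, which is why 15.36 and 1,436 percent describe the same jump.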

But that’s just me. We need a second data point, on another guy they don’t like. I repeated the examination, using the name of my colleague Ben Edelman at Harvard in place of my own. I knew in advance that they don’t like Ben either. And the results were very similar to queries on my name. Ben is hardly visible when you search for him as <<“Ben Edelman”>>. He has more results when you exclude antitrust. But when you exclude gerbils, poodles, weimaraners, or habaneros he really begins to shine again. Ben has a gerbil ratio of 7.05, and the exclusion of nonsense terms increased his results returned by over 600 percent. Data on Ben, on me, and all the other queries can be found in Table 1, below; all data in Table 1 were correct as of 14-19 October, and all are backed up by screen scrapes.

Table 1

The First Simple Question: Spite or Unreliability

So now we have a simple model of what is going on here, either unreliability or spite in the estimation of the count of results returned.

1. Hypothesis I (The Unreliability / Poor Quality Control Hypothesis): This hypothesis would argue that the approximate number of search results returned by any given Google query is pretty random. After all, if the counts were reliable, subtracting stuff out should make them smaller, or at least no larger. Excluding articles and posts on antitrust, which Ben and I write about, should reduce our counts, not raise them. Subtracting out gerbils, poodles, or other things that really are unrelated to our writings should leave the counts largely unchanged.

2. Hypothesis II (The Spite / Hide Our Enemies Hypothesis): Google does not like Ben or me. This hypothesis would suggest that they dislike us enough to manually hack queries relating to us by name, somehow putting a significant override factor in place. But they don’t seem to have done this on all queries relating to us by name. Hence adding a term subtracting out something we don’t write about might lead to a query that has not been hacked. Since this generates an unhacked query, it results in unhiding the previously hidden results, and thus there is a huge increase in the reported number of results returned.

Expanding the Data Set: The Gerbil Effect

How do we decide between Hypotheses I and II? Let’s look at queries on people they like, or are at least publicly neutral about. For President Obama, whom they supported early and enthusiastically, there does not seem to be any significant query hacking. His base approximate number of results, just searching on his name, is about 60,000,000. Subtracting out gerbils, poodles, weimaraners, or habaneros still produces an increase, but only a very small percentage increase; it’s nothing like the 1,436 percent increase in results returned for me or the 605 percent increase in results returned for Ben. President Obama has a gerbil ratio of 1.14. The results for President Clinton are very similar, with a gerbil ratio of 1.09.

Let’s try J. Baxter Newgate III. I know that Google is pretty neutral about Newgate, principally because I know that he does not exist. He is the pen name for an author who writes crossword puzzle books. Since he does not exist, he does not have a private life, or indeed, a life of any kind; he has not published anything that could have pleased or displeased Google, written any letters, or signed any petitions. And the largest effect I could observe, excluding gerbils from queries, produced only a 0.76 percent increase in the number of results returned. This is a gerbil ratio of 1.0076. It might appear that someone at Google does not want you to think that Ben or I are significant public figures but does not much care what you think of J. Baxter Newgate.

So, after looking at query result figures for five different individuals we might suspect that we could argue in favor of the Spite Hypothesis. But we can’t decide so quickly.

Let’s look at Hal Varian, who I suspect was Google’s favorite academic, and who is now Google’s own chief economist. The results for Hal seem remarkably similar to the results for Ben Edelman. Friend Hal enjoys his maximum increase when subtracting out gerbils. Leaping from 113,000 results to 868,000 results, Hal’s visibility without his writings about gerbils increases enormously, by 668 percent; Hal’s gerbil ratio is 7.68. So the gerbil factor does not appear to apply only to enemies. Does it apply only to faculty members? Our data set is not yet large enough to allow us to decide.
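Putting the gerbil ratios quoted so far side by side makes the problem for the Spite Hypothesis easier to see. The sketch below is only an illustration: the dictionary keys are labels for the people discussed in the text, not the exact query strings, and the cutoff of 2.0 for a "large" ratio is an arbitrary assumption chosen for this example.

```python
# Gerbil ratios as quoted in the text: count excluding gerbils / count for the bare name.
# The keys are labels for the people discussed, not the exact query strings used.
GERBIL_RATIOS = {
    "Eric Clemons": 15.36,
    "Ben Edelman": 7.05,
    "Hal Varian": 7.68,
    "President Obama": 1.14,
    "President Clinton": 1.09,
    "J. Baxter Newgate III": 1.0076,
}

# Assumed cutoff for calling a ratio "large"; chosen only for illustration.
LARGE_RATIO = 2.0


def percent_increase(ratio: float) -> float:
    """Convert a ratio into the equivalent percentage increase."""
    return (ratio - 1.0) * 100.0


def large_ratio_names(ratios: dict, cutoff: float = LARGE_RATIO) -> list:
    """Names whose ratio is at or above the cutoff, in alphabetical order."""
    return sorted(name for name, ratio in ratios.items() if ratio >= cutoff)


# Print the ratios from largest to smallest alongside the percentage increase.
for name, ratio in sorted(GERBIL_RATIOS.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<24} ratio {ratio:8.4f}   increase {percent_increase(ratio):8.2f}%")

print("large ratios:", large_ratio_names(GERBIL_RATIOS))
```

Under that assumed cutoff, the large-ratio group contains one friend of Google alongside two possible enemies, which is exactly the pattern a pure spite explanation does not predict.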

Expanding the Data Set Further — The Weimaraner Effect and the Habanero Anomaly

Next I looked at Eric Schmidt, Google’s Chairman and presumably a trusted friend and ally of the firm. His counts are enormous, of course, and at first I thought that they were pretty stable. Yeah, they go up a little, by about 12 percent when you subtract out poodles, gerbils, and habaneros; CEO Schmidt has a gerbil ratio of 1.12, between Presidents Clinton and Obama. But when you exclude weimaraners there is an enormous surge; there is a 1,189 percent increase in results on Eric Schmidt when you subtract out weimaraners, with a weimaraner ratio of 12.89.

So, is there something special, perhaps a GreatMan-weimaraner interaction, between great people and weimaraners on Google search? It’s hard to be certain. There is the same anomalous spike in returned results when submitting queries on Mahatma Gandhi. But this is not true for all great people: there is no weimaraner effect for John Lennon, for President George W. Bush, or for Bill Gates. There is a small weimaraner effect for Karl Marx (multiplier of 1.32), but a much more pronounced weimaraner effect for his colleague Friedrich Engels (multiplier of 3.05).

Finally, there is the great habanero anomaly, which so far seems to affect only Eric Schmidt of Google. Excluding “habanero” in the singular from queries on Eric Schmidt produces no great effect, but subtracting out “habaneros” in the plural gives Schmidt a surge fully comparable to the GreatMan-weimaraner interaction.

Analyses of the Data and Conclusions

So, what can I conclude? With samples this small, using so few key words and so few individuals, nothing can be established with statistical reliability. I therefore invite all readers to pick their 10 favorite people to search on, their six favorite nonsense words, and repeat my experiment. But I cannot yet reject either hypothesis.
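For readers who take up that invitation, here is one hedged way to organize the bookkeeping. This sketch makes no calls to any search service: it only builds the query strings to type in and computes ratios from counts read off the results line by hand. The helper names are hypothetical, and the sample counts are the rounded figures quoted earlier in this article.

```python
from itertools import product


def build_queries(names, nonsense_words):
    """Return (name, word, query) triples such as '"Eric Clemons" -gerbils'."""
    return [(name, word, f'"{name}" -{word}') for name, word in product(names, nonsense_words)]


def exclusion_ratios(base_counts, excluded_counts):
    """Divide each exclusion query's count by the bare-name count for the same person."""
    return {
        (name, word): count / base_counts[name]
        for (name, word), count in excluded_counts.items()
    }


names = ["Eric Clemons", "Hal Varian"]
words = ["gerbils", "poodles", "weimaraners", "habaneros"]

# Step 1: list the queries to run by hand.
for _, _, query in build_queries(names, words):
    print(query)

# Step 2: record the reported counts by hand. These are figures quoted in the article.
base_counts = {"Eric Clemons": 11_000, "Hal Varian": 113_000}
excluded_counts = {
    ("Eric Clemons", "gerbils"): 169_000,
    ("Eric Clemons", "antitrust"): 149_000,
    ("Hal Varian", "gerbils"): 868_000,
}

# Step 3: compute the ratio for every exclusion query that was recorded.
for (name, word), ratio in exclusion_ratios(base_counts, excluded_counts).items():
    print(f"{name} without {word}: ratio {ratio:.2f}")
```

Recording the date next to each count would also help, since the reported numbers can change from day to day.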

It seems unlikely that Google would really take the time to hide search results on me, on Ben, or on a few other insignificant researchers, no matter how important we might think we are; we just don’t make that much difference to Google. Likewise, it is unlikely that somewhere in Google headquarters there is a programmer who loves weimaraners, or loves people who do not associate with weimaraners, and has carefully hacked the results to produce a GreatMan-weimaraner interaction.

However, whether the results on Ben and on me are due to malevolence or to the same unreliable reporting of result counts that appears to be prevalent in the other cases, the unreliability of the counts is striking. Until recently I had used counts of results reported as a rough measure of the significance of an individual or a piece of work, or at least of their popular visibility, which is not quite the same thing as significance. Obviously I cannot use the number of results returned as a measure of anything until I understand them further. Can I use Google Scholar, or any other Google site, and assume that results have been returned with an even hand, without fear or favor, and with care for their accuracy? I suspect not.

Late Breaking News

Google never sleeps. As of this morning the GreatMan-weimaraner interaction has been addressed, as is clear from screen scrapes from 14 October and from 21 October, although the habaneros anomaly remains. Watch for further changes. And if you want to see all of my results for any reason, remember to exclude the gerbils; as of now, the gerbil ratio remains unchanged.


Hi Eric, the answer to the headline's question "Is Google Guilty Of Deliberate Query Sabotage?" is no. We've talked about the fact that results estimates are just estimates for years, see e.g. http://video.google.com/videoplay?docid=-4814548594071648913# or http://www.youtube.com/watch?v=2ix3mHeL7hg for more details, including the fact that we only return three significant digits on our results estimates.

As to why the query [A B -C] can return more estimated results than [A B], that's easy to explain. The query [A B -C] causes us to go deeper through our posting lists looking for matches, which can lead to more accurate (and larger) results estimates. Other things can cause us to go deeper in finding matches, such as clicking deeper in search results. Results estimates can also vary based on which data centers or indices your query hits, as well as what language you're searching in. It certainly has nothing to do with whether you're a "possible enemy of Google," as you put it.

We try to be very clear that our results estimates are just that--estimates. In theory we could spend cycles on that aspect of our system, but in practice we have a lot of other things to work on, and more accurate results estimates is lower on the list than lots of other things.
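The comment's claim that going deeper through posting lists can enlarge an estimate can be illustrated with a toy model. To be clear, this is not Google's algorithm, whose details are not given here. It assumes a hypothetical estimator that scans only the first part of an index, counts the matches it finds, and scales that count up to the full index size. The index contents, the depths, and the function name are all invented for illustration.

```python
def estimate_matches(doc_matches, depth):
    """Extrapolate the total number of matches from the first `depth` documents."""
    depth = min(depth, len(doc_matches))
    if depth == 0:
        return 0
    found = sum(doc_matches[:depth])
    # Scale the matches seen so far up to the size of the whole index.
    return round(found * len(doc_matches) / depth)


# Toy index of 1,000 documents (an assumption, not real data): matches are
# sparse among the first 100 documents (1 in 10) and denser afterwards (1 in 2).
index = [(i % 10 == 0) if i < 100 else (i % 2 == 0) for i in range(1000)]

true_total = sum(index)
shallow_estimate = estimate_matches(index, 100)  # a query that stops early
deep_estimate = estimate_matches(index, 600)     # a query forced to look deeper

print("true total:", true_total)
print("shallow estimate:", shallow_estimate)
print("deep estimate:", deep_estimate)
```

In this toy setup the shallow scan badly underestimates the true total, and the deeper scan comes out both larger and closer to it. Whether real result counts behave this way depends on details the comment does not spell out.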