Thursday, February 3, 2011

The Microsoft-Google spat explained

It took me a while to understand the Google-Microsoft spat regarding Bing incorporating Google results into Bing searches. It's important for us because we own a small position in Microsoft and a position seven times bigger in Google.

My first reaction was that Microsoft has engaged in theft. That reaction has softened slightly – but only slightly. I need to walk you through it.

Google's suspicion came from a very strange search: tarsorrhaphy. As Google notes, tarsorrhaphy is a surgical procedure on eyelids. They started with an unusual misspelled query [torsorophy]. Google returned the correct spelling: tarsorrhaphy. At that time, Bing had no results for the misspelling. Later Bing started returning Google's first result (a Wikipedia page) to their users without offering the spell correction (Google posts screenshots). This was very strange. How could they return Google's first result to their users without the correct spelling? Had they known the correct spelling, they could have returned several more relevant results for the corrected query.

Google then puts out a veritable honey-pot of strange searches – for example “hiybbprqag” – and they rig their search engine to relate those searches to completely spurious pages. Lo and behold – three weeks later the same searches get linked to the same spurious pages in Bing.

Let's explain what has happened here. Lots of people (including 20 Google engineers) install Bing toolbars on their computers. The Bing toolbars report click-streams to Microsoft. Microsoft thus learns that when people search Google for the odd term “torsorophy” their next click is the Wikipedia article on tarsorrhaphy. Bing does not correct the spelling (it has not got that trick right) – it is just copying the Google user's click-stream.

If it copies enough click-stream, of course, and especially for rare searches such as torsorophy, then it will copy the results.
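To make the mechanism concrete, here is a minimal sketch in Python of how reported click pairs could turn into search results. It is purely my illustration, not Microsoft's code: the function names (`learn_associations`, `top_result`) and the sample events are made up, and a real engine would presumably treat clicks as only one input among many.

```python
from collections import Counter, defaultdict


def learn_associations(click_events):
    """Count how often each search term is followed by a click on each URL."""
    counts = defaultdict(Counter)
    for query, clicked_url in click_events:
        counts[query][clicked_url] += 1
    return counts


def top_result(counts, query):
    """Return the most-clicked URL for a query, or None if it was never seen."""
    if query not in counts:
        return None
    return counts[query].most_common(1)[0][0]


# Hypothetical toolbar reports: (term typed into Google, link clicked next).
events = [
    ("torsorophy", "https://en.wikipedia.org/wiki/Tarsorrhaphy"),
    ("torsorophy", "https://en.wikipedia.org/wiki/Tarsorrhaphy"),
    ("ice cream", "https://icecream.example/"),
]

counts = learn_associations(events)
print(top_result(counts, "torsorophy"))
print(top_result(counts, "hiybbprqag"))
```

Notice that nothing here knows how to spell tarsorrhaphy. The misspelled term maps to the Wikipedia page only because that is where the Google users went next.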

Microsoft – in their response – claimed that Google used “clickfraud” to rig the results of the test – getting enough Google engineers to install a toolbar that they could rig the Bing results. I want to analyse that response.

Firstly the term "clickfraud" here is (a) emotive and (b) (I suspect knowingly) misused. If I were to put Google adverts on this blog and then click those adverts myself so that I got revenue, that would be clickfraud. Google has ways of stopping this (with considerable but not total success). But there is no fraud by Google in their click-stream. Microsoft is just saying that clicks resulting from a Google search page are perfectly valid to incorporate in a Bing result.

Bluntly: It's one thing to collect click data, it's another to look at what one search engine returns as a result for a query and add that to your index.

So, what Bing is effectively doing is saying: "we use Google ranking algorithm as one of our signals" but only via the mechanism of the customer's click-stream.

Do they really think this is an honest way of doing business? (If they directly stole the data by entering a search into Google themselves that would be criminal theft. But they get the results anyway via the click-stream.)

The antitrust case

The Microsoft antitrust case argued that Microsoft extended their monopoly in one thing (operating systems) by bundling other things (browsers) into the operating system (and hence killing Netscape). I always thought this was a weak case because you could always download a Netscape browser if you thought it was better. You can still download a browser based on Firefox from Netscape – and the monopoly in software hardly stopped Firefox and Chrome being the browsers I use (even in those rare times I use a Windows computer).

But the antitrust case here is absolutely solid. Microsoft bundles the Bing toolbar with some versions of Windows (certainly if you click all the options once you open the included browser you will wind up with a Bing-driven machine). It then takes click-streams that say "someone searched Google for X and then clicked Y" and uses them to make a Bing search for X present result Y.

They are thus using their power in operating systems to copy (I would say steal) Google's results and hence weaken Google's business.

This is precisely what the antitrust case failed (in my view) to prove with browsers.

Google has just asked Microsoft to cease using Google search results to populate Bing searches. Microsoft is pussyfooting around on this.

If they don't cease pussyfooting then methinks it is time to reopen the antitrust case.

John

Postscript: A lot of people seem to have a different view. They argue that Microsoft received consent from the users of the toolbar - end of story.

Microsoft did not receive consent from Google.

It's Google's results that Microsoft is copying.

They are not copying them by inserting search terms themselves and copying them. (That would be criminal.)

They are copying them by watching ordinary Americans enter search terms and watching where those people then click.

29 comments:

dd
said...

I don't get this. Bing is just using data which is provided to it by its users, via their clickstream via the toolbar. If that's "stealing", then what would you call the way in which Google uses the link structure of other people's websites in order to weight its "pagerank" algorithm? It wasn't Google that decided that your blog was a better result for "financial blog" than Ben Stein's - it was a bunch of other people with websites, whose judgements were then scraped up automatically by Google. This actually was a pretty live debate in the early days of search engines.

The clickstream of what people do after searching for any particular tool is the users' data, not Google's and I think I would rather resist any attempt by Google to declare ownership over it.

Here's what actually happens (and the google article is very specific): when users use the built-in bing toolbar with google as a search provider, and then click on a link in google, the pair "search-term" x "target link" is collected as part of the opt-in data collection of the bing toolbar. This stream of events is submitted to Microsoft.

Then those google engineers installed IE + Suggested sites + Bing Toolbar, accepted the opt-in dialog to submit user data, set the search provider to Google and started entering the new terms in the toolbar to search.

As per the user agreement, the search-term from the toolbar and the link clicked results get submitted to Bing.

Sidebar: As part of the Bing search engine optimization, if Bing finds that most users who search for "ice cream" finally click on "Ben and Jerry's", this will get "Ben and Jerry's" a higher score for results. It's just one of many (thousands) of factors determining what gets scored how.

Here is where it gets interesting - the words chosen by the google engineers were unique by design. Thus, the search index of Bing had no data for them, except from those google engineers submitting "cajfnkdgjvksngkr" through the search box and then clicking on a specific link.

Given that that's the only data that Bing has on this new word, of course that page is now going to rank high on Bing.
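A toy sketch of that point, in Python. Everything in it is hypothetical: the signal names, the weights and the URLs are invented for illustration, and real ranking is far more involved. It only shows why a weak click signal wins outright when no other signal has any data for the term.

```python
# Hypothetical signals: each maps (query, url) -> strength between 0 and 1.
signals = {
    "text_match": {
        ("ice cream", "benjerry.example"): 0.9,
        ("ice cream", "othercream.example"): 0.8,
    },
    "inbound_links": {
        ("ice cream", "benjerry.example"): 0.7,
        ("ice cream", "othercream.example"): 0.9,
    },
    "clicks": {
        ("ice cream", "benjerry.example"): 0.6,
        ("cajfnkdgjvksngkr", "rigged.example"): 1.0,
    },
}

# Made-up weights: clicks count for little next to the other signals.
weights = {"text_match": 0.5, "inbound_links": 0.4, "clicks": 0.1}


def rank(query, signals, weights):
    """Return URLs for a query, ordered by weighted sum of all signals."""
    scores = {}
    for name, table in signals.items():
        for (q, url), strength in table.items():
            if q == query:
                scores[url] = scores.get(url, 0.0) + weights[name] * strength
    return sorted(scores, key=scores.get, reverse=True)


print(rank("ice cream", signals, weights))
print(rank("cajfnkdgjvksngkr", signals, weights))
```

For "ice cream" the click signal merely nudges the order. For the invented word it is the only entry in the table, so the rigged page is the only possible result.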

Let that sink in. The approach taken by Google wasn't a scientific experiment or honey pot, it was a deliberate and well-known technique for injecting associations into Bing's data corpus.

Specifically, Bing didn't copy Google results, it didn't do anything special for Google whatsoever, the data collection is disclosed on install and this technique (determine which links are clicked following a search) is one of the common methods every browser today uses to improve results. (This includes the address bar of Google's own Chrome browser.)

This technique has nothing to do with the robots.txt file - the robots.txt file is only read by webcrawlers scraping sites. In this case, all the relevant data is entered directly on users machines, the server side robots.txt never comes into play.
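For anyone who hasn't seen robots.txt in action, here is a small sketch using Python's standard `urllib.robotparser`. The rules, the bot name and the URLs are made up for illustration. The check is something a crawler runs voluntarily before fetching a page; a toolbar watching clicks in the browser never goes through this step.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A well-behaved crawler asks before fetching each URL.
private_ok = parser.can_fetch("ExampleBot", "https://example.com/private/page.html")
public_ok = parser.can_fetch("ExampleBot", "https://example.com/public/page.html")
print(private_ok, public_ok)
```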

Furthermore, the technical approach taken here (click manually on specific links in response to a search term) is *exactly* one primitive form of clickfraud to inject terms and increase relevancy.

The interesting part of this story is that this is completely obvious to anyone who understands how websites and search engines work. It is hard to understand for consumers.

This makes the way that Google presented the story interesting. While a naive consumer would be unaware that this is the natural result of "teaching" a search engine about an association through search and follow, anyone working on search engines would have been well aware of this and expected exactly this result.

Google using obscure and unclear language and accusing Bing of theft and cheating is a willful and very likely intentional misrepresentation of the facts of the case.

Bing's response is technically accurate, but suffers from the typical curse of the expert - it is meaningless to consumers, who assume that it is a denial and implicit admission of guilt.

This is nothing but a clever google PR stunt, and actually a quite aggressive and malicious slander, something I hadn't expected from Google of all places.

There is some more in-depth discussion about this in various places, but hackernews is a good place to start: http://news.ycombinator.com/item?id=2167875.

just an anecdote…i have accelerator addons to IE8 for 7 different search engines and prefer google; however, when i want to find a mauldin article by title at ritholtz’s blog, i have to use bing…it's seldom on google’s first two pages of results…

i wouldn't encumber my browser with all those search engines if i didn't see them working differently

You're arguing that inferring rules from observed human behaviour should be illegal.

For instance suppose a fund manager notices that whenever a particular company reports a strong result, over the next few days its competitors will perform strongly on market because lots of people buy the competitor. This is just like noticing that just after someone has typed "XYZ" into a search box, they click on a particular link.

Suppose you notice the company performing well. You predict over the next couple of days everyone will buy into the competitor. You buy the stock and make lots of money.

I looked up the Chrome privacy notice, you can turn off or "opt out" of providing usage statistics to Google.

Assuming you haven't "opted out" then this applies:

When you type URLs or queries in the address bar, the letters you type are sent to your default search engine so the Suggest feature can automatically recommend terms or URLs you may be looking for. If you choose Google as your search engine, Google Chrome will contact Google when it starts so as to determine the best local address to send search queries. If you choose to share usage statistics with Google and you accept a suggested query or URL, Google Chrome will send that information to Google as well. You can disable this feature as explained here

Interestingly enough, they then say, on the opt out page:

Usage statistics contain aggregated information such as preferences, button clicks, and memory usage. It does not include web page URLs or any personal information. Crash reports contain system information at the time of the crash, and may contain web page URLs or personal information, depending on what was happening at the time of the crash.

So if you search for something, they recommend a link and you click on it, they get that "click" info - but, forgive my cynicism, "they won't know the URL!" Just think about that for a second and how that compares to MSFT's clickstream usage, which has also been agreed to when you install the Bing bar.

Just my 2 cents, and reading blog posts by 3rd party people that work in the search engine industry.

I think you're off base here because it is Google which is now the subject of greatest regulatory scrutiny on the Internet. Microsoft is simply collecting clickstream data that it has been given authorisation to collect. I imagine it also collects clickstream data at vertical search engines as well. This is not stealing. They are not scraping Google's site or content, they are tracking user clicks - with authorisation and anonymously.

I sense your comments are motivated in large part by pre-conceived negative animus directed at Microsoft and its previous anti-competitive behaviour. For the record, I share those sentiments. But this issue is separate, both from an anti-trust and legal perspective.

I have written a post which I believe better explains the background issues.

If that's *all* that Bing did, it would be questionable, but it still wouldn't be stealing.

Stealing GOOG's data, or software, or algorithms is stealing.

Improving your product through the observation of clickstream data is not. Not a single software engineer out of the dozen I called about this [most are MS haters to boot] agrees with your sentiment, fwiw.

Microsoft is in no way copying Google search results. One of the many inputs they use to build their search engine index is observed user behavior from opt-in usability studies on *any* search.

This is exactly what Google does in Chrome - they observe what you type in the address box and what you eventually click on.

This is by no means the only information that informs the search index.

But if someone - like Google - sets up an artificial situation where all the other inputs are nonexistent by design, and then feeds that specific data to the Microsoft search engine, that will be the only data the search engine has, and therefore that result will of course get top billing.

I understand your concern about your business. Note that the robots.txt file has a very specific function - it instructs webcrawlers to voluntarily ignore part of your site. This is nothing either Google Chrome or the Bing Toolbar can know about, so they are not willfully infringing anything. There is an unsolved technical problem here - how would you protect links you don't want public? Note though that this problem is independent of this case - it exists for both Google and Microsoft and more generically anyone using any browser.

Now, the weird thing here is that Google knows all this, it knows that they themselves are doing exactly the same and that their "experiment" was anything but scientific and was instead designed to inject specific pairs into the Bing index. (Note also that they only succeeded for 7 out of 100 attempts, which makes it very clear that Microsoft isn't copying Google but that Google worked very hard to whip up this PR storm).

And yet they publish on a wide front an article that people who don't work in the industry will understand exactly as you have.

Best analogy I've found so far: what Google is doing is like sending your business plan in dozens of plain envelopes to the competition, paying a number of folks to shove copies underneath the door and insert some into the junk mail, and then calling the police to raid their offices, discovering - copies of your business plan! Shocking.

This is wrong because it is the user, the person behind the computer, who decides which link to click. The information for which link to click came from the user's biological brain and not some silicon chip in a google data center. MS copied information from the user's human brain.

In a way, what MS did is not much different from the way google's page rank mechanism worked. The page rank mechanism rates web pages based on links from other web sites. Ultimately, it is the web site operators (again human) who decide what links point to. Google is copying the web site operator's human brain.
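For the curious, here is a rough sketch of that link-based idea in Python: a simplified textbook version of PageRank, not Google's actual implementation. The page names and link graph are invented, and the damping factor of 0.85 is just the commonly quoted default.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: links maps each page to the pages it links to.

    Every link target must also appear as a key in links.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its rank on, split evenly across its links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web: two site operators both chose to link to "popular".
links = {
    "site_a": ["popular"],
    "site_b": ["popular", "obscure"],
    "popular": ["site_a"],
    "obscure": [],
}

ranks = pagerank(links)
print(max(ranks, key=ranks.get))
```

The score of each page comes entirely from choices other site operators made about where to link, which is the point of the comparison.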

Circling back, Google _is_ presenting to the users a set of choices to choose from. That came from a Google data center. In order to prove MS is copying Google, you have to demonstrate MS copied the set of choices google presented to the user (and not what the user chose to click on).

Granted, the list google and other search engines presented to the user does influence the user's click choice. Is that sufficient to judge MS guilty of copying from all search engines in the market place? Even when the key piece of information came from the user's brain?

By the way, I don't have a dog in this. I think both MS and Goog are too big to be much better than QQQQ. Google maybe somewhat better because of Android.

On the other hand, Fannie Mae and Freddie Mac are much more interesting! Since the beginning of 2009, I've read every one of your posts. I'd love to read a follow-up on your current thinking, maybe a reassessment of your original analysis.

After reading the documents you linked, it looks to me that some Bing users have opted to use a feature that has their current activity used to "improve" subsequent search results (perhaps for all users of Bing). This does not sound like a Google-specific feature. If I use search engine X -- not Google -- and send a letter to Steve Ballmer informing him that I searched for Y and subsequently clicked Z, I would certainly hope that Ballmer would update Bing to associate Y with Z. This is not theft, this is a service performed for the user's convenience.

Fascinating. You've gotten quite the collection of pro-Microsoft comments, which is really not the view of the tech community overall. (I wonder if you have an odd set of followers, or if you've been targeted by MS PR flacks?)

John, your understanding is exactly right - strip away the doubletalk and you find Bing was including results based on what users clicked in Google (if they had default IE settings). I have no idea if that is stealing or illegal, but it is certainly what we professionals call "LAME".

Though I guess this undercuts those of us who thought B.I.N.G. stood for "But Its Not Google" :)

Haha... kidding with that ycombinator link right? Follow it back and the original author says "If I could go back and change only one thing in my original story, I’d have made the headline "Google: Bing Is Cheating, Copying SOME Of Our Search Results."

Not wholesale copying - just keeping an eye on what users do on Google, and using that (weak) signal when Bing can't figure out another appropriate result.

But again, cut through all the doublespeak and read the experiment - the results ended up on Bing ONLY because they were on Google. The results would NOT have appeared had they NOT been on Google (since they were bogus results to prove the point). That's a form of copying, factually speaking.

I have been told by a friend (a computing academic staff member at the Uni of Southampton - I don't know if he's a professor or not) that the Google Toolbar acts in exactly the same way. Just as MS copy Google's (and indeed, everyone's) results through the clickstream, Google copy MS' (and indeed, everyone's) results through the clickstream. He also said he thought Google had been doing this for longer than MS and that he thought this move was indicative of concern on the part of Google for their position in the search market (which puzzled me, since they seem entirely secure in their dominance - perhaps I'm out of touch).

There's no need to be confused about this controversy. Quite plainly, Google does not have an exclusive right to, and hence need not be guaranteed property rights in, the fact that as a user I find Z relevant to Y (even if Z is a "honeypot") if I choose to send this information to Bing.

@Anon 4:48 PM: Regarding "because google engineers repeatedly sent information to bing" - well, the engineers searched for things in Google and clicked on the top (rigged) Google result, and Internet Explorer sent that info back to Bing... That's the only truth to that statement. But all we are doing is describing the mechanism by which Google's results got to Bing.

If you (or others) are sincerely confused (which I doubt), then you should go read the original description carefully: http://searchengineland.com/google-bing-is-cheating-copying-our-search-results-62914

BTW, that Jacques Mattheij thing is slightly amusing, but note that it does not actually show Bing search results. It just shows the Bing instructional copy.

Slight tangent, but it's been coming up in these discussions: I have no clue what info Chrome and the Google Toolbar might be gathering. Curious to hear if anything interesting shakes out on those over the next few weeks, since I'm sure people are looking now.

John, I think your use of the example of robots.txt files is entirely apropos here, but it rather underlines the legal position. No robots flags are, to say the least, not legally enforceable. The search engine companies respect them out of a general sense of courtesy, and a wish not to create badwill (which as I say, was potentially considerable in the early days of search engines), but if Google were to decide tomorrow that it was going to ignore robots.txt files there would be nothing legally preventing them.

General disclaimer

The content contained in this blog represents the opinions of Mr. Hempton. You should assume Mr. Hempton and his affiliates have positions in the securities discussed in this blog, and such beneficial ownership can create a conflict of interest regarding the objectivity of this blog. Statements in the blog are not guarantees of future performance and are subject to certain risks, uncertainties and other factors. Certain information in this blog concerning economic trends and performance is based on or derived from information provided by third-party sources. Mr. Hempton does not guarantee the accuracy of such information and has not independently verified the accuracy or completeness of such information or the assumptions on which such information is based. Such information may change after it is posted and Mr. Hempton is not obligated to, and may not, update it. The commentary in this blog in no way constitutes a solicitation of business, an offer of a security or a solicitation to purchase a security, or investment advice. In fact, it should not be relied upon in making investment decisions, ever. It is intended solely for the entertainment of the reader, and the author. In particular this blog is not directed for investment purposes at US Persons.