Google vs. Bing: Fight! #copygate

On Feb 1, 2011, Danny Sullivan published a post on Search Engine Land describing Google's allegations that Bing is copying Google's results to improve its own search results.

Google set up some honeypot pages for nonsense words, and the only way Bing could have picked those up was by watching what Internet Explorer users searched for and clicked on at Google.

Based on the discussions I've seen on this, it seems that Bing is using the Internet Explorer browser, with permission from users, to watch their browsing behavior. They're watching what people search for on Google and other search engines, as well as what they click on, to collect information for improving their own search results.

Google suspects this is much more widespread than just the obscure searches that they've used to definitively identify this behavior. With the honeypot terms, it's just really obvious that Bing is picking up search terms and search results, likely from Google employees running these searches in IE.

This list contains some of the interesting posts and tweets surrounding this scuffle.

This has been dubbed #copygate on Twitter. Check that hashtag for a finger on the pulse.

Today, Google called Bing on the carpet for using search data collected via the Internet Explorer browser to enhance its own search listings. Essentially, Bing collected information on what its IE users search for on Google, and what they click on, to add to its own search index.

Is Bing copying the hard work of Google engineers, recycling stale search engine results from Google? Or are they using legitimate signals from their users who opt in to anonymously share their clickstream information?

By choosing gibberish search terms that produced no natural results, Google effectively eliminated all other factors that could have influenced Bing's rankings. The terms were found on none of the pages in the index and certainly aren't used in any links pointing to any sites. So when Google engineers suddenly provided a data point (the click data collected via the Bing toolbar) for Bing to latch onto, it's no surprise that even a minor factor was able to influence 9% of the rankings.

There are a couple of points that are very important to establish here. First, ranking the same site #1 for any given query doesn't mean Bing is "copying" Google's results.

Second, the data being collected does NOT belong to Google. Yes, the data is being collected while the user is visiting Google. However, the users opted to provide their usage data to Bing. Bing didn't somehow hack into Google's data, and Bing didn't scrape Google's search result page. Bing used data provided by its users to influence its rankings, which is a practice it has readily admitted is in use.

... or possibly some other means to send data to Bing on what people search for on Google and the Google search results they click. Those results from Google are then more likely to show up on Bing. Put another way, some Bing results increasingly look like an incomplete, stale version of Google results: a cheap imitation.

From the BigThink conference, Bing's VP of Search had interesting responses to Matt Cutts' questions about Bing copying Google:

It’s not like we actually copy anything. It’s really about, we learn from the customers, who actually willingly opt in to share their data with us. Just like Google does. Just like other search engines do. It’s where we actually learn from the customers, from what kind of queries they type (we have query logs), what kind of clicks they do. And let’s not forget that the reason search worked, the reason web worked, is really about collective intelligence.

Matt Cutts asked Bing point blank if they are stealing keyword rankings from Google. The question of how specific honeypot results ended up in Bing was mostly sidestepped, but Bing's VP said, "We learn from customers who opt-in to share the activity they do."

So essentially, Bing watches what its users type into Google searches and apparently also looks at what they click on in the results.

Google's engineers working on the honeypot sites and searches were acting as "consenting customers" of Bing.

To verify its suspicions, Google set up a sting operation. For the first time in its history, Google crafted one-time code that would allow it to manually rank a page for a certain term (code that will soon be removed, as described further below). It then created about 100 of what it calls “synthetic” searches, queries that few people, if anyone, would ever enter into Google.
These searches returned no matches on Google or Bing — or a tiny number of poor quality matches, in a few cases — before the experiment went live. With the code enabled, Google placed a honeypot page to show up at the top of each synthetic search.
The only reason these pages appeared on Google was because Google forced them to be there. There was nothing that made them naturally relevant for these searches. If they started to appear at Bing after Google, that would mean that Bing took Google’s bait and copied its results.
This all happened in December. When the experiment was ready, about 20 Google engineers were told to run the test queries from laptops at home, using Internet Explorer, with Suggested Sites and the Bing Toolbar both enabled. They were also told to click on the top results. They started on December 17. By December 31, some of the results started appearing on Bing.
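
For anyone who wants to see the logic of the sting spelled out, here is a rough Python sketch of the check at the end of the experiment: for each synthetic query, does the planted page now sit at the top of the other engine's results? Everything in it is made up for illustration. The `Honeypot` class, the `copied_fraction` function, the placeholder queries and the example.com URLs are all assumptions, not anything Google or Bing actually ran.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Honeypot:
    """One synthetic query and the page forced to rank first for it (hypothetical)."""
    query: str
    planted_url: str


def copied_fraction(honeypots, observed_top_results):
    """Return (hits, fraction) for the honeypot check.

    observed_top_results maps a query to the top URL seen on the second
    engine; a query with no results is simply absent from the mapping.
    """
    hits = [
        h for h in honeypots
        if observed_top_results.get(h.query) == h.planted_url
    ]
    if not honeypots:
        return hits, 0.0
    return hits, len(hits) / len(honeypots)


# Placeholder data only: four invented queries, one of which "leaked".
honeypots = [
    Honeypot("synthetic-query-1", "http://example.com/page-a"),
    Honeypot("synthetic-query-2", "http://example.com/page-b"),
    Honeypot("synthetic-query-3", "http://example.com/page-c"),
    Honeypot("synthetic-query-4", "http://example.com/page-d"),
]
observed = {"synthetic-query-2": "http://example.com/page-b"}

hits, fraction = copied_fraction(honeypots, observed)
print([h.query for h in hits], fraction)
```

Because the queries matched nothing before the test, any hit is hard to explain by ordinary ranking factors, which is the whole point of using gibberish terms.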