Matt Cutts confirms AdSense media bot in natural search index

By Jenstar on April 18, 2006, 8:35 pm

Matt Cutts confirmed today that the AdSense Mediapartners bot (aka mediabot) is indexing pages for use in Google's Big Daddy index. Both Greg Boser and I have found evidence that mediabot's crawls for AdSense ad targeting have ended up being used in the natural Google search results. Shoemoney, who attended the Google-sponsored luncheon at PubCon, reports:

At the lunch sponsored by Google today, Matt Cutts confirmed the recent rumors about mediabot results getting into Big Daddy. Matt said it is a bandwidth-saving feature to have Googlebot and mediabot both contributing to Big Daddy. Matt also stated that you will gain zero advantage in search listings; however, if you are serving different content to mediabot than to Googlebot, then you could be in trouble.

It could definitely be used as a tool to detect when content is being cloaked for either the Google or AdSense bot, particularly since the Mediapartners bot has been indexing pages since at least the beginning of February.

It will be interesting to see whether other consequences arise for webmasters, such as pages excluded from Googlebot via robots.txt ending up indexed via the mediabot. But it is very nice to see official confirmation of this from Matt at Google!

JenSense posted about it a few days ago, and now Matt Cutts has confirmed it. The mediabot from Google, the one used for AdSense targeting, is also being used to add sites to the Google index.

During my last job, the company owned literally hundreds of test sites, more black hat than white. We noticed that the ones with AdSense ads on them were still getting cached and indexed, while the others without AdSense were de-indexed…

Jen, you raise a really good question. If a page is crawled via MediaBot, does Google respect the Googlebot rules in robots.txt when it considers storing the page in the index? We all need to know the answer to this one. Maybe somebody can pose the question to Cutts this week.

Thanks for posting this clarification, Jen. I was hoping to do a post about the crawl cache, but haven’t gotten a chance yet b/c of getting pulled into panels here at WMW. Lemme know if you want to chat more about it though.

Scott you asked “If a page is crawled via MediaBot, does Google respect the Googlebot rules in robots.txt when it considers storing the page in the index?”

The answer is yes: Googlebot does the correct/conservative thing; even if a page is in the crawl cache, it will respect the Googlebot rules in the robots.txt file. So we’ll obey the robots.txt rules for each crawler.
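In other words, each crawler is matched against its own user-agent group in robots.txt, regardless of which bot actually fetched the page into the crawl cache. Here's a rough sketch of that per-crawler matching using a hypothetical robots.txt and Python's standard `urllib.robotparser` — a simplification, since Google's actual parser has its own matching rules:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Googlebot from /private/,
# but let the AdSense crawler (Mediapartners-Google) fetch everything.
robots_txt = """\
User-agent: Googlebot
Disallow: /private/

User-agent: Mediapartners-Google
Disallow:
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Each crawler is checked against its own user-agent group:
print(rp.can_fetch("Googlebot", "http://example.com/private/page.html"))            # False
print(rp.can_fetch("Mediapartners-Google", "http://example.com/private/page.html")) # True
print(rp.can_fetch("Googlebot", "http://example.com/index.html"))                   # True
```

So a page the mediabot is allowed to fetch for ad targeting can still be kept out of the search index, because the index consults the Googlebot rules before storing it.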

I don’t know where that fits in, but for what it’s worth – the Mediapartners bot is also the one used for the new “Related Links” pseudo-feature. Here comes a new generation of sites with AdSense in a hidden DIV…