The Blog

A Google engineer explains how Googlebot crawls web pages

Google's Matt Cutts has confirmed that AdSense's Mediabot really can help Googlebot crawl web pages, but some people don't believe him, or don't believe he speaks for Google officially. As a faithful reader of Matt Cutts's blog, I see no need to spend many words defending his authority. What I will say is that Matt Cutts is a senior software engineer in Google's search quality group, responsible for fighting spam and malicious ranking manipulation. So believe it or not, as you like.

Actually, what Matt disclosed last time was only one part of the story. Today Matt wrote another very detailed article explaining how Google's various bots crawl web pages, and what is new about crawling in Google's latest BigDaddy infrastructure. The content is excellent, so I'm sharing it with everyone.

The first thing to introduce is Google's Crawl Caching Proxy. Matt uses an ISP and its users as an example to explain it. When a user goes online, page content always passes through the ISP first, and the ISP can cache the pages its users have visited. For example, when user A visits a page, China Telecom (or China Netcom, etc.) fetches that page, sends it to user A, and keeps a cached copy; when user B visits the same page a second later, the ISP serves user B the cached copy, which saves bandwidth.

As reported here earlier, Google's latest software upgrade (the move to BigDaddy) is nearly complete, and after the upgrade Google's capabilities will be strengthened in every respect. The improvements include smarter Googlebot crawling, better standards compliance, and better page indexing. On the crawling side, Google has also adopted bandwidth-saving measures. Googlebot itself was upgraded along with BigDaddy: the new Googlebot now officially supports gzip encoding, so if your website has gzip compression enabled, the bandwidth Googlebot consumes when crawling your pages can be reduced.
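To see why gzip support matters for crawl bandwidth, here is a minimal Python sketch (not Google's code): HTML is repetitive text, so it compresses very well, and a crawler that accepts gzip-encoded responses transfers far fewer bytes per page.

```python
import gzip

# A typical HTML page is highly repetitive (tags, boilerplate markup),
# so gzip compresses it dramatically. A crawler that sends
# "Accept-Encoding: gzip" receives the compressed form and saves
# most of this bandwidth on every fetch.
html = b"<html><body>" + b"<p>hello, Googlebot</p>" * 500 + b"</body></html>"
compressed = gzip.compress(html)

print(f"original:   {len(html)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"saved:      {100 * (1 - len(compressed) / len(html)):.0f}%")
```

On a real server you would enable this in the web server configuration (e.g. `mod_deflate` in Apache or `gzip on;` in nginx) rather than in application code.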

Besides the improved Googlebot, the upgraded Google will use the Crawl Caching Proxy described above when crawling pages, to save bandwidth even further.

As Matt's diagrams show, Google's crawling is led by Googlebot; Server A in the diagram refers to AdSense, and Server N could be Google's Blogsearch or another service. We can see that for the same website, Googlebot, AdSense's Mediabot, and the Blogsearch bot all crawl the pages, and much of that crawling is duplicated. Here is what happens once the upgraded Google uses the Crawl Caching Proxy:

Clearly, because the Crawl Caching Proxy caches the fetches made by all the bots, when Googlebot has already crawled a certain page and Mediabot or another bot then requests that same page, the Crawl Caching Proxy kicks in and returns the cached copy directly to Mediabot and the rest. This reduces the number of actual fetches and saves bandwidth.
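The mechanism described above can be sketched as a shared cache sitting in front of the fetcher. Everything below is a hypothetical illustration (the class name, `fake_fetch`, and the URL are all made up for the example), not Google's actual implementation:

```python
class CrawlCachingProxy:
    """Hypothetical sketch: a cache shared by several crawlers.

    The first bot to request a URL triggers a real fetch; later
    requests for the same URL are served from the cache, so the
    origin server is hit only once.
    """

    def __init__(self, fetcher):
        self.fetcher = fetcher      # function that actually fetches a page
        self.cache = {}             # url -> page content
        self.real_fetches = 0       # how many times the origin was hit

    def get(self, url):
        if url not in self.cache:
            self.cache[url] = self.fetcher(url)
            self.real_fetches += 1
        return self.cache[url]


def fake_fetch(url):
    # Stand-in for a real HTTP request.
    return f"<html>content of {url}</html>"


proxy = CrawlCachingProxy(fake_fetch)

# Googlebot crawls the page first...
page_for_googlebot = proxy.get("http://example.com/page1")
# ...then Mediabot and Blogsearch request the same page:
page_for_mediabot = proxy.get("http://example.com/page1")
page_for_blogsearch = proxy.get("http://example.com/page1")

print(f"origin server fetched {proxy.real_fetches} time(s) for 3 bot requests")
```

Three bot requests, one real fetch: that is the bandwidth saving Matt describes.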

From Matt's analysis we can see that Google is doing this to save bandwidth, both its own and the websites'. The advantage is that Google's various bots can crawl more pages in the same amount of time, which makes indexing easier. My take is that while the advantage is quite apparent, there are also drawbacks. For instance, suppose a website lives on its AdSense advertising revenue: it needs Mediabot to re-crawl it constantly so that AdSense can analyze its updated content and place more relevant ads. But if that website also has a good PR value, Googlebot will probably crawl it every day. In that case the Crawl Caching Proxy caches Googlebot's fetches, and when Mediabot comes to crawl again it simply receives the cached content, which reduces the frequency at which Mediabot actually crawls the site. Since the two bots do not use the same working mechanism, the site's displayed AdSense ads may become less relevant as Mediabot's effective crawl frequency drops.

Article source: SEO channel. Please cite the source with a link when reprinting.