Optimizing for Google Webcache

Last Updated March 15, 2015

The Google Webcache is a saved copy of your website, downloaded by the Googlebot indexer.
Visitors can view the cached version of your pages by clicking the link shown on the Google search results page next to each site URL.

By default, Google offers the most recently scanned version of your page to searchers through the "Cached" link on the page options menu.

Google provides this alternate copy of your page in case your web server ever goes down, and it is also useful for pages that update often. Most importantly, it gives the user an indication of what the Google engine is basing its search results on.

How Google transforms your page

So as not to confuse users, Google adds a fairly unobtrusive header to the top of cached pages indicating when the document was last fetched by Googlebot. The header even includes a link to a "Text-only" version, stripped of images and attached CSS stylesheets.

To create this header, Google Webcache makes a very simple code addition to the top of the page's HTML source:
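The exact markup Google injects is not reproduced here. As a rough sketch only, based on what this section describes (a notice block followed by a second, unclosed <div> with inline position:relative, all placed ahead of the page's own markup), the top of a cached page has a shape along these lines. The placeholder text and the sample page are illustrative assumptions, not Google's actual output:

```html
<!-- Illustrative sketch only; not Google's actual markup. -->
<!-- 1. Header block: holds the cache notice, fetch time and "Text-only" link. -->
<div>
  This is a cached copy of the page (placeholder notice text).
</div>
<!-- 2. Second block: inline position:relative, never closed. -->
<div style="position:relative">
<!-- 3. The original document follows unchanged. -->
<!DOCTYPE html>
<html>
  <head>
    <title>Example page</title>
  </head>
  <body>
    <nav>Site navigation</nav>
    <p>Page content</p>
  </body>
</html>
```

Because the two blocks arrive before the doctype, a browser creates the body early and treats them as its first two children, with the rest of the page ending up inside the unclosed second block.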

The bottom <div> tag is never closed, but browsers will auto-close it (more on that below). In fact, adding body elements outside of your defined <body> tag does not produce properly formatted HTML. However, browsers are engineered to handle these kinds of inconsistencies, and Google seems fine with the simplicity.

Can I remove the header?

Yes, there are CSS tricks you can use to hide the header completely.

If your <body> source doesn't start with a <div>, you can add the following CSS to hide the first <div> that comes immediately after the opening <body> tag. My website code traditionally begins with a <nav> element, so the following works great for me:
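The original snippet is not reproduced here. One rule that fits the description, offered as an assumption rather than the author's exact code, hides a <div> only when it is the first child of <body>. On a live page whose body starts with <nav>, the selector matches nothing; on the cached copy, assuming the injected header is parsed as the first element in the body, it matches that header:

```html
<style>
  /* Assumed selector: hides a div only if it is the very first child of body. */
  body > div:first-child {
    display: none;
  }
</style>
```

If your live page ever gains a leading <div>, this same rule would hide it for every visitor, so it is only safe while the body starts with some other element.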

Having more issues with CSS

Google's second <div> has an inline style of position:relative. Because this is hardcoded inline in the HTML, an ordinary stylesheet rule cannot override the relative positioning without resorting to !important. However, all I really need to do is add an additional rule of 100% height:
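The original rule is not reproduced here. A minimal sketch, assuming the second <div> directly follows the header <div> as the second child of <body>, could target it with an adjacent sibling selector. The selector is an assumption chosen for illustration, not necessarily the one the author used:

```html
<style>
  /* Assumed selector: the div that directly follows a first-child div of body. */
  body > div:first-child + div {
    height: 100%;
  }
</style>
```

The sibling selector still matches when the first <div> is hidden with display:none, and on a live page whose body begins with <nav> it matches nothing.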

A word of caution: Your HTML structure can change, so take care that in your attempt to fix a small problem (such as one introduced by Google hosting a cache of your page), you don't introduce a much larger problem with site functionality in the future. The worst thing you could do in this case is to hide the very important first block of your live site from actual visitors.

Don't cache the webpage at all

If you don't like the idea of search engines like Google keeping alternate copies of your site on the web, request that they don't.

In your HTML <head> place the following meta tag:

<meta name="robots" content="noarchive" />

Placing this tag is only a request that search engines not provide links to cached content. It does not force an immediate removal of cached content, nor does it stop other bots, such as the Wayback Machine, from archiving your site online.
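For placement, a minimal sketch of a page carrying the tag might look like the following. The title and body text are placeholders, and the commented-out googlebot variant is an optional alternative that addresses only Google's crawler rather than all crawlers:

```html
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Example page</title>
    <!-- Ask all compliant crawlers not to offer a cached copy. -->
    <meta name="robots" content="noarchive" />
    <!-- Optional alternative: address only Google's crawler. -->
    <!-- <meta name="googlebot" content="noarchive" /> -->
  </head>
  <body>
    <p>Page content</p>
  </body>
</html>
```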