Robots meta tag and X-Robots-Tag HTTP header specifications

Abstract

This document details the page-level
indexing settings that allow you to control how Google
makes content available through search results. You can specify these
settings by including a meta tag on (X)HTML pages or in an HTTP header.

Note: Keep in mind that these settings
can be read and followed only if crawlers are allowed to access the
pages that include these settings.

Using the robots meta tag

The robots meta tag lets you use a granular, page-specific
approach to controlling how an individual page should be indexed and served
to users in search results. Place the robots meta tag in the
<head> section of a given page, like this:
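
<meta name="robots" content="noindex" />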

The robots meta tag in the above example instructs all search
engines not to show the page in search results. The value of the
name attribute (robots) specifies
that the directive applies to all crawlers. To
address a specific crawler, replace the robots
value of the name
attribute with the name of the crawler that you are addressing. Specific
crawlers are also known as user-agents (a crawler uses its user-agent
to request a page). Google's standard web crawler has the user-agent name
Googlebot. To prevent only Googlebot from indexing your
page, update the tag as follows:

<meta name="googlebot" content="noindex" />

This tag now instructs Google (but no other search engines) not to
show this page in its web search results. Both
the name and the content attributes
are not case-sensitive.

Search engines may have different crawlers for different properties or
purposes. See the appendix for a complete list
of Google's crawlers. For example, to show a page in Google's
web search results, but not in Google News, use the following meta tag:

<meta name="googlebot-news" content="noindex" />

If you need to specify multiple crawlers individually, it's okay to
use multiple robots meta tags:
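
For example (the directive values shown here are illustrative):

<meta name="googlebot" content="noindex" />
<meta name="googlebot-news" content="nosnippet" />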

Using the X-Robots-Tag HTTP header

The X-Robots-Tag can be used as an element of the HTTP
header response for a given URL. Any directive that can be used in a robots
meta tag can also be specified as an X-Robots-Tag. Here's an
example of an HTTP response with an X-Robots-Tag instructing
crawlers not to index a page:
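
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(...)
X-Robots-Tag: noindex
(...)

(The date shown is illustrative; unrelated headers are elided.)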

Multiple X-Robots-Tag headers can be combined within the
HTTP response, or you can specify a comma-separated list of directives.
Here's an example
of an HTTP header response which has a noarchive X-Robots-Tag combined
with an unavailable_after X-Robots-Tag.
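
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(...)
X-Robots-Tag: noarchive
X-Robots-Tag: unavailable_after: 25 Jun 2010 15:00:00 PST
(...)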

The X-Robots-Tag may optionally specify a user-agent before the
directives. For instance, the following set of X-Robots-Tag HTTP
headers can be used to conditionally allow showing a page in
search results for different search engines:
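
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(...)
X-Robots-Tag: googlebot: nofollow
X-Robots-Tag: otherbot: noindex, nofollow
(...)

(Here otherbot is a placeholder user-agent name.)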

Valid indexing & serving directives

Several other directives can be used to control
indexing and serving with the robots meta tag and the
X-Robots-Tag. Each value represents a specific directive.
The following table shows all the directives that Google honors and
their meaning.
Note: it is possible that these directives may not be treated the
same by all other search engine crawlers. Multiple directives may be
combined in a comma-separated list (see below for the handling of
combined directives). These directives are not case-sensitive.

Directive

Meaning

all

There are no restrictions for indexing or serving. Note: this
directive is the default value and has no effect if explicitly
listed.

noindex

Do not show this page in search results and
do not show a "Cached" link in search results.

unavailable_after: [date/time]

Do not show this
page in search results after the specified date/time. The date/time
must be specified in the
RFC 850 format.

After the robots.txt file (or the absence of one) has given
permission to crawl a page, by default pages are treated as
crawlable, indexable, archivable, and their
content is approved for use in snippets that show up in the search results,
unless permission is specifically denied in a robots meta tag or
X-Robots-Tag.

Handling combined indexing and serving directives

You can create a multi-directive instruction by combining robots meta tag
directives with commas. Here is an example of a robots meta tag that
instructs web crawlers to not index the page and to not crawl any of the
links on the page:

<meta name="robots" content="noindex, nofollow">

For situations where multiple crawlers are specified along with different
directives, the search engine will use the sum of the negative
directives. For example:
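
<meta name="robots" content="nofollow">
<meta name="googlebot" content="noindex">

When Googlebot crawls a page with these tags, it applies the combined
noindex, nofollow directives.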

Practical implementation of X-Robots-Tag with Apache

You can add the X-Robots-Tag to a site's HTTP responses
using .htaccess and httpd.conf files that are available by default on
Apache-based web servers. The benefit of using an X-Robots-Tag
with HTTP responses is that you can
specify crawling directives that are applied globally across a site. The
support of regular expressions allows a high level of flexibility.

For example, to add a noindex, nofollow X-Robots-Tag to the HTTP
response for all .PDF files across an entire site, add the following snippet
to the site's root .htaccess file or httpd.conf file:
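
<Files ~ "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</Files>

Note: the Header directive requires Apache's mod_headers module to be
enabled.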

You can use the X-Robots-Tag for non-HTML files like
image files where the usage of robots meta tags is not possible. Here's
an example of adding a noindex X-Robots-Tag
directive for image files (.png, .jpeg, .jpg, .gif) across an
entire site:
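
<Files ~ "\.(png|jpe?g|gif)$">
  Header set X-Robots-Tag "noindex"
</Files>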

Combining crawling with indexing / serving directives

Robots meta tags and X-Robots-Tag HTTP headers are
discovered when a URL is crawled. If a page is disallowed from crawling
through the robots.txt file, then any information about indexing or serving
directives will not be found and will therefore be ignored. If indexing or
serving directives must be followed, the URLs containing those directives
cannot be disallowed from crawling.