A /robots.txt file is a text file that instructs automated web bots on how to crawl and/or index a website. Web teams use these files to indicate which site directories should or should not be crawled, how quickly content should be accessed, and which bots are welcome on the site.

What should my robots.txt file look like?

Please refer to the robots.txt protocol for detailed information on how and where to create your robots.txt. Key points to keep in mind:

The file must be located at the root of the domain, and each subdomain needs its own file.

The robots.txt protocol is case-sensitive.

It’s easy to accidentally block crawling of everything:

Disallow: / means disallow everything

Disallow: means disallow nothing, thus allowing everything

Allow: / means allow everything

Allow: means allow nothing, thus disallowing everything
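To see how little it takes to shut out every compliant bot, a complete file that blocks crawling of an entire site is just two lines:

User-agent: *
Disallow: /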

The instructions in robots.txt are guidance for bots, not binding requirements.

How can I optimize my robots.txt for Search.gov?

Crawl delay

A robots.txt file may specify a “crawl delay” directive for one or more user agents, which tells a bot how quickly it can request pages from a website. For example, a crawl delay of 10 specifies that a crawler should not request a new page more often than once every 10 seconds. We recommend a crawl delay of 2 seconds for our usasearch user agent, and a higher crawl delay for all other bots. The lower the crawl delay, the faster Search.gov will be able to index your site. In the robots.txt file, it would look like this:

User-agent: usasearch
Crawl-delay: 2
User-agent: *
Crawl-delay: 10

XML Sitemaps

Your robots.txt file should also list one or more of your XML sitemaps. For example:
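Sitemap directives take fully qualified URLs; the domain and filename below are placeholders for your own:

Sitemap: https://www.example.gov/sitemap.xml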

Allow only the content that you want searchable

Note that if you disallow a directory after it’s been indexed by a search engine, this may not trigger a removal of that content from the index. You’ll need to go into the search engine’s webmaster tools to request removal.

Also note that search engines may index individual pages within a disallowed folder if the search engine learns about the URL from a non-crawl method, like a link from another site or your sitemap. To ensure a given page is not searchable, set a robots meta tag on that page.
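For instance, to keep an individual page out of search results entirely, its HTML head could include a robots meta tag like this:

<meta name="robots" content="noindex">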

Customize settings for different bots

You can set different permissions for different bots. For example, if you want us to index your archived content but don’t want Google or Bing to index it, you can specify that:
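A sketch of that configuration, assuming the archived content lives under a hypothetical /archive/ directory:

User-agent: usasearch
Allow: /archive/

User-agent: Googlebot
Disallow: /archive/

User-agent: Bingbot
Disallow: /archive/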