We have a global brand website project for which we are only working on the LATAM portion. The site uses an installation process that lets a single website installation serve several ccTLDs, in order to reduce costs.

Because of this, the robots.txt served at www.domain.com/robots.txt is the same file as the one at www.domain.com.ar/robots.txt.

We would like to implement custom robots.txt files for each LATAM country locale (AR, CO, CL, etc.). One solution we are considering is placing a 301 redirect from www.domain.com.ar/robots.txt to www.domain.com.ar/directory/robots.txt.

This way we could have custom robots.txt files for each country locale.

Does this make sense?

Is it possible to redirect a robots.txt file to another robots.txt file?

With a redirect in place, every crawler has to make two HTTP requests: one to discover the redirect, and another to actually fetch the file.

Also, some crawlers might not handle a 301 response for robots.txt correctly. There's nothing in the original robots.txt specification about redirects, so presumably they should be treated the same way as for ordinary web pages (i.e. followed), but there's no guarantee that every one of the countless robots that might want to crawl your site will get that right.

(The 1997 Internet Draft does explicitly say that "[o]n server response indicating Redirection (HTTP Status Code 3XX) a robot should follow the redirects until a resource can be found", but since that was never turned into an official standard, there's no real requirement for any crawlers to actually follow it.)

Generally, it would be better to simply configure your web server to return different content for robots.txt depending on the domain it's requested for. For example, using Apache mod_rewrite, you could internally rewrite robots.txt to a domain-specific file like this:
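(A sketch of what those rules might look like, assuming the country-specific files are named robots_ar.txt, robots_co.txt, etc. and sit in the shared document root; adjust names and paths to your setup.)

```apache
RewriteEngine On

# Capture the ccTLD from the host name, e.g. "ar" from www.domain.com.ar;
# also matches domain.co.ar, domain.ar, etc. (www. and the 2LD are optional)
RewriteCond %{HTTP_HOST} ^(www\.)?domain\.(com?\.)?([a-z][a-z])$ [NC]

# Only rewrite if a matching country-specific file actually exists
# (file-attribute tests like -f don't reset the %N backreferences)
RewriteCond %{DOCUMENT_ROOT}/robots_%3.txt -f

# Internally rewrite /robots.txt to the country-specific file
RewriteRule ^robots\.txt$ robots_%3.txt [L]
```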

This code, placed in an .htaccess file in the shared document root of the sites, should rewrite any requests for e.g. www.domain.com.ar/robots.txt to the file robots_ar.txt, provided that it exists (that's what the second RewriteCond checks). If the file does not exist, or if the host name doesn't match the regexp, the standard robots.txt file is served by default.

(The host name regexp should be flexible enough to also match URLs without the www. prefix, and to also accept the 2LD co. instead of com. (as in domain.co.uk) or even just a plain ccTLD after domain; if necessary, you can tweak it to accept even more cases. Note that I have not tested this code, so it could have bugs / typos.)

Another possibility would be to internally rewrite requests for robots.txt to (e.g.) a PHP script, which can then generate the content of the file dynamically based on the host name and anything else you want. With mod_rewrite, this could be accomplished simply with:
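(Assuming the script is called robots.php and lives next to the .htaccess file; the name is just an example.)

```apache
RewriteEngine On

# Hand every request for /robots.txt to the PHP script
RewriteRule ^robots\.txt$ robots.php [L]
```

The script can then inspect $_SERVER['HTTP_HOST'], send a text/plain content type, and print whatever rules are appropriate for that host.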

A file cannot be named robots_ar.txt!
– Edgar Quintero, Apr 25 '14 at 19:18

@EdgarQuintero: Why on earth could it not be?
– Ilmari Karonen, Apr 25 '14 at 19:45

Crawlers will always look for the file under its standard name, robots.txt.
– Edgar Quintero, Apr 25 '14 at 20:16

@EdgarQuintero: An internal rewrite, as implemented by the rewrite rules I show above, happens entirely within the webserver. A crawler requesting the URL path /robots.txt has no way of even knowing whether the content it receives comes from a file named robots.txt (as usual) or from a file named robots_ar.txt (to which the request was rewritten) or even from a script named robots.php (or even whatever.php).
– Ilmari Karonen, Apr 25 '14 at 20:23