Robots.txt and Security

Potential security issues with using robots.txt to block indexing of content in members-only sections.

Edited: 2017-11-28 23:31

You should not rely too heavily on robots.txt to prevent bots and users from accessing parts of your site. If there is something that you do not want to be discovered by search engines, then it is best to either not host it on your site at all, or implement a decent server-side security mechanism instead.

We cannot assume that all robots will adhere to the rules in the robots.txt file. Some may choose to ignore the file entirely, while others might not understand the rules we specify. It is therefore best to use other security mechanisms on your site to prevent access to content that you do not want to be accessible.
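To illustrate why compliance is voluntary, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleBot` user agent, and the `polite_crawler_may_fetch` helper are all made up for this example. The check runs entirely inside the crawler, so a robot that never performs it is not stopped by anything on the server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
"""


def polite_crawler_may_fetch(url, user_agent="ExampleBot"):
    """Return True if a rule-following crawler would treat the URL as allowed."""
    parser = RobotFileParser()
    # Parse the rules from memory instead of fetching them over the network.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A well-behaved crawler asks before requesting a page...
print(polite_crawler_may_fetch("https://example.com/members/area.html"))
print(polite_crawler_may_fetch("https://example.com/index.html"))
# ...but a badly behaved one can simply skip this check and request the page anyway.
```

The only thing standing between a robot and the `/members/` directory here is the robot's own decision to call `can_fetch` and respect the answer.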

Robots.txt and Security

There are quite a few security issues with relying on robots.txt to prevent indexing of content, although none of them are critical. Robots.txt is mainly useful for controlling how the major known search engines access your site, not as a security mechanism.

In addition, listing secret directories in the robots.txt file could inform hackers of otherwise unknown locations on your server. It is therefore important that you have other security mechanisms in place. Simply providing members of your site with a secret URL is rarely enough to prevent access by uninvited guests, especially if you list this URL in your robots.txt file to keep it out of the search results.
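Because robots.txt is a public file, anyone can read the paths it lists. The following sketch shows how little effort that takes. The sample file, its paths, and the `disallowed_paths` helper are hypothetical and only serve as illustration.

```python
def disallowed_paths(robots_txt):
    """Collect every non-empty Disallow path from the text of a robots.txt file."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt that tries to hide a members area.
SAMPLE = """\
User-agent: *
Disallow: /secret-members-area/   # meant to stay out of search results
Disallow: /admin/
"""

print(disallowed_paths(SAMPLE))  # ['/secret-members-area/', '/admin/']
```

Each path that was meant to stay hidden ends up in a short list that a curious visitor can try directly in a browser.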

Alternatives to robots.txt

You might opt for more robust access control instead of relying on robots.txt, such as that provided by .htaccess and .htpasswd files. Alternatively, you can serve parts of your site through a PHP script with password protection. Both should effectively prevent unauthorized access, and with it indexing by search engines.
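As a rough illustration of server-side password protection, here is a minimal sketch of HTTP Basic authentication written as a Python WSGI application. It stands in for the .htaccess and PHP approaches mentioned above and is not a copy of either. The `USERS` table, the realm name, and the function names are assumptions made for this example, and a real site would store hashed passwords and serve the pages over HTTPS.

```python
import base64
import hmac

# Hypothetical credentials for illustration only; store hashed passwords in practice.
USERS = {"member": "change-me"}


def is_authorized(auth_header):
    """Check an HTTP 'Authorization: Basic ...' header value against USERS."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(auth_header[6:], validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    username, sep, password = decoded.partition(":")
    expected = USERS.get(username)
    if not sep or expected is None:
        return False
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(password.encode(), expected.encode())


def app(environ, start_response):
    """WSGI application that only serves content to authenticated users."""
    if not is_authorized(environ.get("HTTP_AUTHORIZATION")):
        start_response(
            "401 Unauthorized",
            [("WWW-Authenticate", 'Basic realm="Members"'),
             ("Content-Type", "text/plain")],
        )
        return [b"Authentication required\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Members-only content\n"]
```

The `app` callable can be run with any WSGI server, for example `wsgiref.simple_server.make_server` from the standard library. Unlike a robots.txt rule, the check is enforced by the server, so a robot that ignores the rules still receives only the 401 response.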