Ultimate Magento Robots.txt File Examples


An extremely common question in eCommerce, and for that matter in Magento SEO, is what a robots.txt file should look like and what should go in it. For this article, I decided to pool all of our knowledge and experience, some sample robots.txt files from our clients’ sites and some examples from other industry-leading Magento studios to try to work out an ultimate Magento robots.txt file.

Please note that you should never blindly take one of these generic files and use it as the robots.txt file for your specific Magento store. Every store has its own structure, and almost every store needs some of the robots.txt content modified to fit its URL structure and indexing priorities. Always ask your eCommerce consultants to adapt the robots.txt file to your specific case, and use the Google Webmaster Tools robots.txt testing tool to double-check that everything that should be indexable actually is before you deploy it live.

As you can see, the file above allows image indexing for image search while disallowing some blank image pages, as explained in a tutorial by my mate Drazen.

It blocks some folders that you usually don’t want in the index for a common Magento online store setup.

Please note that it doesn’t disallow most of the sorting and pagination parameters, as we assume you’ll handle them with a rel prev/next implementation and by adding a meta “noindex, follow” tag to the remaining sorting parameter URLs. Those URLs have to stay crawlable, because a crawler that is blocked by robots.txt never gets to see the meta tag. For more info on why meta “noindex, follow” and not “noindex, nofollow”, read this.
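A minimal sketch of that approach in Python: the helpers below decide which meta robots value a category URL should get and build rel prev/next targets for the ?p pagination parameter. The parameter names (`order`, `dir`, `limit`, `mode`, `p`) are those of a default Magento 1 category toolbar and are an assumption for your theme; the function names are hypothetical, and in a real store this logic would live in the theme’s head template rather than in a standalone script.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Assumed default Magento 1 toolbar parameters that only re-sort or re-display a listing.
SORTING_PARAMS = {"order", "dir", "limit", "mode"}


def robots_meta(url: str) -> str:
    """Hypothetical helper: value for <meta name="robots"> on a category URL."""
    params = parse_qs(urlsplit(url).query)
    if SORTING_PARAMS.intersection(params):
        # The URL must not be blocked in robots.txt, or this tag is never seen.
        return "noindex, follow"
    return "index, follow"


def pagination_links(url: str, last_page: int) -> dict:
    """Hypothetical helper: rel prev/next targets for the ?p pagination parameter."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    page = int(params.get("p", ["1"])[0])

    def url_for(n: int) -> str:
        query = {k: v for k, v in params.items() if k != "p"}
        if n > 1:  # page 1 is linked as the clean category URL
            query["p"] = [str(n)]
        return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))

    links = {}
    if page > 1:
        links["prev"] = url_for(page - 1)
    if page < last_page:
        links["next"] = url_for(page + 1)
    return links
```

For example, `pagination_links("https://example.com/shoes.html?p=2", 5)` points prev at the clean `https://example.com/shoes.html` and next at `?p=3`, while any URL carrying `order` or `dir` gets “noindex, follow”. Linking page 1 without `?p=1` is a design choice that avoids a duplicate of the category’s main URL.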

In some cases you might want to allow reviews to be indexed. In that case, remove the “Disallow: /review/” line from the robots.txt file.

UPDATE: Since a lot of people in the comments talked about JavaScript and image blocking without reading the instructions in this post carefully, I decided to edit the recommended robots.txt file. The file above now allows JavaScript and images to be crawled. You’ll also notice that the file now allows “/checkout/”. This is due to our new findings that it is beneficial to let Google see your checkout. Read more in this post.

Robots.txt examples from portfolio websites of some other top Magento agencies:

As you can see above, they allow the ?p parameter but disallow it when another parameter is used together with ?p. This approach is quite interesting, as it permits a rel prev/next implementation while disallowing lots of combinations with other attributes. I still prefer solving those issues through “noindex, follow”, but this is not bad either.
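That agency’s exact rules are not reproduced here, so the three rules below are a hypothetical reconstruction of the idea. They are checked with a small Python matcher that follows Google’s documented precedence for wildcard rules: `*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching pattern wins, and Allow wins a tie. It is a simplified sketch, not Google’s parser.

```python
import re

# Hypothetical rules illustrating "allow ?p, but not ?p combined with other parameters".
RULES = [
    ("disallow", "/*?"),      # any URL with a query string...
    ("allow", "/*?p="),       # ...except plain pagination such as ?p=2...
    ("disallow", "/*?p=*&"),  # ...unless another parameter follows ?p=
]


def to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex matched from the start."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path_and_query: str) -> bool:
    """Longest matching pattern wins; on equal length, allow beats disallow."""
    best = None  # (pattern length, is_allow)
    for kind, pattern in rules:
        if to_regex(pattern).match(path_and_query):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]
```

Under these rules `/shoes.html?p=2` stays crawlable, so rel prev/next can do its job, while `/shoes.html?p=2&order=price`, `/shoes.html?order=price&p=2` and `/shoes.html?order=price` are all blocked.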

Here is an example of a robots.txt file, very similar to what we’re using, from Groove Commerce’s portfolio:

# Groove Commerce Magento Robots.txt 05/2011
#
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these “robots” where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html

As you can see, most of the top Magento agencies take a very similar approach to robots.txt. As I said in the beginning, always check with your consultants before blindly copying and pasting any of these files onto your store.

Has the boilerplate robots.txt file in this article been updated for Magento 2.0 / 2.1? I’ve read that the robots.txt file should be different for the new version of Magento, but I’d like to get another opinion. Interestingly, the Magento 2.1 default robots.txt parameters seem very incomplete compared to the file displayed in your article…

This is a boilerplate for Magento 1.x. The Magento 2 one should be slightly different. We’ll do our best to publish a Magento 2 boilerplate soon; however, we’re still gathering data and observing new ideas and best practices, as there are very few Magento 2 websites in a live environment, so best practices are still being formed.

I would not recommend adding out-of-stock products to the robots.txt Disallow rules.

Out-of-stock products should either remain status 200 pages with correct out-of-stock schema markup or, if there are far too many of them, return 404 after some time, or 410 if you’re sure they’re never coming back into stock.
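As a rough illustration of that advice, here is a Python sketch of the decision plus the schema.org Offer markup for a page that stays live. The function names and flags (`keep_out_of_stock_pages`, `discontinued`) are hypothetical; in Magento this would be handled in the product template and by whatever process removes discontinued products, not by a standalone script.

```python
import json


def product_status(in_stock: bool, keep_out_of_stock_pages: bool, discontinued: bool) -> int:
    """Hypothetical policy: HTTP status for a product page."""
    if in_stock or keep_out_of_stock_pages:
        return 200  # page stays live; its markup says OutOfStock when needed
    # Page removed: 410 only when the product is never coming back, else 404.
    return 410 if discontinued else 404


def offer_jsonld(name: str, price: str, currency: str, in_stock: bool) -> str:
    """JSON-LD for a schema.org Product whose Offer carries the availability."""
    availability = "https://schema.org/InStock" if in_stock else "https://schema.org/OutOfStock"
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
            "availability": availability,
        },
    })
```

The JSON string would be embedded in a `<script type="application/ld+json">` element on the product page.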

I have a question: I have updated my robots.txt as described above, but when I check it in Google Webmaster Tools, it shows an error on Crawl-delay: 10. Why is that? Will it affect my ranking and indexing?

I have a small question: if you install WordPress alongside Magento, with WP in a folder called “wordpress” in the main directory, what should we do?
1. Should we disallow the whole WP directory?
Disallow: /wordpress/
2. Should we disallow specific files/folders, for example:
Disallow: /wordpress/wp-admin/
Disallow: /wordpress/wp-includes/
Allow: /wordpress/wp-content/uploads/ and so on, as for WP sites
3. Or should we add another robots.txt to the wordpress folder?
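One way to check option 2 before deploying it is to run the rules through Python’s standard `urllib.robotparser`. This parser only does plain prefix matching (no `*` or `$` wildcards) and applies the first matching rule in file order rather than the longest one, so it is only a fair stand-in for rules like these, which never overlap. The rules belong in the single robots.txt at the site root: as the Groove Commerce file above notes, a robots.txt inside a subfolder such as /wordpress/ is ignored, which rules out option 3. The WordPress paths tested here are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Option 2 from the question, as it would appear in the root robots.txt.
ROOT_ROBOTS_TXT = """\
User-agent: *
Disallow: /wordpress/wp-admin/
Disallow: /wordpress/wp-includes/
Allow: /wordpress/wp-content/uploads/
"""

parser = RobotFileParser()
parser.parse(ROOT_ROBOTS_TXT.splitlines())


def crawlable(path: str) -> bool:
    # can_fetch() takes a full URL; example.com is a placeholder host.
    return parser.can_fetch("Googlebot", "https://example.com" + path)


for path in ("/wordpress/my-first-post/",
             "/wordpress/wp-admin/admin-ajax.php",
             "/wordpress/wp-content/uploads/2016/01/banner.jpg"):
    print(path, crawlable(path))
```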

Great post, thanks! I just can’t seem to find the answer to my question anywhere, which is the following:

What exactly does the asterisk (*) mean? What does it do? For example, if I use Disallow: /*?_a=*
Does the * tell bots to disallow EVERYTHING before the * and everything after it? What if I DON’T put the * right after the slash or at the end?
Or if I’m using this: Disallow: /*product-reviews.asp?product=*
would it also work if I used this: Disallow: /*?product=*
Or even this: Disallow: *.asp$
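A rough way to answer these questions is to translate each pattern into a regular expression the way Google documents its matching: `*` matches any sequence of characters (including none), matching always starts at the beginning of the path, and a trailing `$` means the URL must end there. Without `$`, a rule matches any URL that starts with the pattern, so a trailing `*` changes nothing. The Python sketch below is a simplification for experimenting with the patterns from the question (the test paths are made up), not a reproduction of Google’s parser.

```python
import re


def matches(pattern: str, path_and_query: str) -> bool:
    """Simplified Google-style robots.txt pattern matching."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    # Escape literal characters; each "*" becomes ".*" (any run of characters).
    regex = ".*".join(re.escape(piece) for piece in core.split("*"))
    # re.match anchors at the start; "\Z" anchors the end only when "$" was used.
    return re.match(regex + (r"\Z" if anchored else ""), path_and_query) is not None


examples = [
    ("/*?_a=*", "/shoes.html?_a=1"),
    ("/*product-reviews.asp?product=*", "/product-reviews.asp?product=5"),
    ("/*product-reviews.asp?product=*", "/compare.asp?product=5"),
    ("/*?product=*", "/compare.asp?product=5"),
    ("*.asp$", "/product-reviews.asp"),
    ("*.asp$", "/product-reviews.asp?product=5"),
]
for pattern, path in examples:
    print(f"{pattern:35} {path:35} {matches(pattern, path)}")
```

So `/*?product=*` is broader than `/*product-reviews.asp?product=*`: it also blocks `/compare.asp?product=5`. `*.asp$` only blocks URLs that end in `.asp`, so it would not block the same page with `?product=5` appended (rules are normally written with a leading slash, e.g. `/*.asp$`).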

I assume disallowing /catalog/category/view and /catalog/product/view is meant to combat duplicate content, but in the Magento configuration we can add rel canonical link elements that point these to the clean URLs. Is there any other benefit of disallowing these URLs?

I just updated to CE 1.9.1; my store is created, live and indexed. I do not have a robots.txt yet.
We are a startup with very limited money, so I have to figure this out myself. I am getting a lot of duplicate content reported in my SEO reports (Moz, Google Webmaster Tools, etc.). It looks like Magento search pages and such. I’m glad I stumbled onto this blog through a Google search. Could anyone please suggest whether one of these examples will work for my store? I would be ever grateful for some direction on this.
My store: http://www.iamgreenminded.com
Magento CE 1.9.1 with theme “Ultimo – Fluid Responsive Magento Theme” 1.13
Thank you!

I uploaded the robots.txt file this morning, overwriting my old one, but the new configuration is still showing the old file. Do I need to restart the server for the new file to be picked up? … Thanks, Joe

When you open your website’s robots.txt file in a browser, do you see the new one or the old one?

If you see the new one, everything is fine: the robots.txt tester in Google Webmaster Tools has cached your old file and will switch to the new one within a few days.

If you see the old one, make sure you actually overwrote it with the new one. You might need to clear any caches you’re using, or, if the robots.txt file is served from a CDN, force the CDN to fetch the new file.
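If you want to script that first check, the Python sketch below downloads the live file and compares its rules with your local copy, ignoring comments and blank lines. The function names are hypothetical, and a CDN or proxy in between can still serve a stale copy, so if the two differ, look at those caches first.

```python
from urllib.request import urlopen


def fetch_live_robots(site: str) -> str:
    """Download robots.txt from the root of `site`, e.g. "https://www.example.com"."""
    with urlopen(site.rstrip("/") + "/robots.txt", timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def rule_lines(text: str) -> list:
    """Non-empty, non-comment lines with surrounding whitespace removed."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def same_rules(local_text: str, live_text: str) -> bool:
    return rule_lines(local_text) == rule_lines(live_text)


# Usage (needs network access):
# with open("robots.txt", encoding="utf-8") as f:
#     print(same_rules(f.read(), fetch_live_robots("https://www.example.com")))
```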

Hi Toni,
I have a concern: images are not being indexed in Google for one of my clients. All of their images are on a subdomain, for example:
site: WXYZ.com
images: media.WXYZ.com
Can you please help me find a solution for this?
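Following the rule quoted in the Groove Commerce file above (a robots.txt is only read from the root of its host), images served from media.WXYZ.com are governed by media.WXYZ.com/robots.txt, not by the main site’s file, so that is a sensible first place to check. A tiny hypothetical Python helper, using only the standard library, shows which file applies to a given URL:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(url: str) -> str:
    """Return the robots.txt URL that governs `url`: same scheme, host and port."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://media.WXYZ.com/catalog/product/a/b/ab.jpg"))
print(robots_txt_url("http://WXYZ.com/shoes.html?p=2"))
```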

Great post, but I would argue that you should not include app, lib, var (and maybe a few others) in the robots.txt from a security standpoint. The less an attacker knows about your code layout, the better. Since Google is unlikely to find or crawl those URLs anyway, there is no reason to include them in the file at all.

Doesn’t blocking the /skin and /js directories automatically disallow all CSS and JS files inside them?
I used the robots file above and ran ‘Fetch and Render as Google’ in Webmaster Tools, and none of the formatting, JS, product images, sliders etc. could be rendered.
Please correct me if I am wrong.
Thank you
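Yes: a Disallow rule is a prefix match, so `Disallow: /js/` blocks every URL whose path starts with /js/, including all the JavaScript in it, and the same goes for /skin/ with its CSS and theme images. That is why rendering failed, and why the UPDATE note in the post says the file now allows JavaScript and images. A quick way to confirm this uses Python’s standard `urllib.robotparser`; the file paths are typical Magento 1 examples, used here as assumptions.

```python
from urllib.robotparser import RobotFileParser

# The two directory rules the question is about.
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /js/",
    "Disallow: /skin/",
])

for path in ("/js/varien/js.js",
             "/skin/frontend/default/default/css/styles.css",
             "/skin/frontend/default/default/images/logo.gif",
             "/shoes.html"):
    # can_fetch() takes a full URL; example.com is a placeholder host.
    print(path, parser.can_fetch("Googlebot", "https://example.com" + path))
```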

Great article, Toni :) This has helped shed some light on the robots.txt issues I’ve recently been having with review URLs.

@James:
I think hiding /reviews/ might be a good idea if you move your reviews to the actual product page, since the default reviews page is pretty much the product page without the description. Our setup is still pretty “defaultish”, and I think I will eventually change our product description/reviews to a tab-based layout under the product. That cuts down on the number of duplicate pages and, as you mentioned, puts more SEO value into the product page itself.

One of the most common issues I have seen in various Magento stores is that they don’t prevent search engine crawlers from crawling and indexing their Subversion or Git directories. Moreover, I have also seen downloadable PDFs listed when doing a site: search in Google.

This shows that if you do not configure robots.txt carefully, you will end up seeing lots of unwanted stuff in Google too.

Excellent post! We are having some issues with Bingbot: Bing says we are blocking images. I know Google is the big search engine, but Bing and Yahoo have much better conversion in our case, and we would like to know if you have any suggestions for optimizing the robots.txt file for Yahoo/Bing. Thank you!
