What Is a Robots.txt File, and Why Do We Use It?

The robots.txt file lives in the root directory of a website. As the name suggests, it is made for robots, also called crawlers or web robots. The convention of giving instructions to web robots through this file is known as the Robots Exclusion Protocol.

Robots.txt tells crawlers what to crawl and what not to crawl. It is a plain text file placed in the web server's root folder. When a robot visits a website or webpage, it first requests robots.txt (if available) and follows its instructions before visiting the rest of the site.

Google and other well-behaved crawlers obey the instructions in robots.txt, but others may not. You can block misbehaving bots by their IP address, and if you do not want any crawler to reach your confidential directories, use password-protected directories instead: no crawler, Google included, can access or crawl a password-protected directory on the server.

Key Points for Robots.txt –

1- If your website does not have a robots.txt file, create one so that crawlers stay out of your private or confidential directories.

2- A wrong instruction in robots.txt can block your entire website from search engines, so double-check your robots.txt file before publishing it.

We can also use robots.txt to tell specific crawlers or search engines not to visit or crawl our website. Note that robots.txt identifies crawlers by their user-agent names, not by IP address.
Eg –

User-agent: BadBot
Disallow: /

To disallow a specific area or folder –

User-agent: Googlebot
Disallow: /admin/

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/

A single robots.txt file can contain many User-agent groups, and each group can contain multiple Disallow lines.

If no robots.txt file exists in your web root, just create a text file named robots.txt and place it in the root folder. For example, if your website is www.ecomspark.com, then robots.txt should be reachable at www.ecomspark.com/robots.txt. When you disallow any folder or file, use its path relative to the root; there is no need for the full URL.
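Because robots.txt must always sit at the root of the host, its location can be derived from any page URL on the site. A minimal sketch in Python (the helper name `robots_url` is just an illustration):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    # robots.txt always lives at the host root, whatever the page path is.
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://www.ecomspark.com/blog/some-post"))
# https://www.ecomspark.com/robots.txt
```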