Using “User-agent” for SEO

As you’ve seen, when you write the User-agent field, you can apply certain rules to all search engines and crawlers (with the asterisk *) or to individual robots.

Or both, when you want to handle a mix of different behaviors.

Take a look at this example from one of my websites:

Here I wanted to exclude Google Images from indexing my images after I found out that some of my artwork from this and a similar website was scraped years ago. I also wanted to keep Alexa’s web crawler from scanning my site.

I applied this SEO and reputation management decision to the robots.txt file by simply writing down Google Images’ and Alexa’s user agents and applying a Disallow rule to both of them, one per line.
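In plain text, the rules boil down to something like the sketch below. Googlebot-Image and ia_archiver are the standard user agent tokens for Google Images and Alexa’s crawler; the wildcard group is a hypothetical extra I’ve added to show how a catch-all rule can sit alongside bot-specific ones:

User-agent: *
Disallow:

User-agent: Googlebot-Image
Disallow: /

User-agent: ia_archiver
Disallow: /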

As an SEO, you know what search engines (or parts of search engines) you want to appear in, for whatever reason.

Robots.txt lets you tell web services what you allow and what you don’t, de facto determining the way your site appears (or doesn’t appear) on each platform.

You can exclude the internet archive from crawling and snapshotting your website.
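The Wayback Machine has historically honored rules aimed at the ia_archiver user agent, so a sketch like this should keep it out (note that the Internet Archive has loosened its robots.txt handling over the years, so treat this as a polite request rather than a hard guarantee):

User-agent: ia_archiver
Disallow: /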

Using “Disallow” and “Allow” Directives for SEO

The Disallow and Allow directives are powerful tools to tell search engines and web mining tools exactly what to crawl and index.

So far, you’ve seen how to use them to exclude (or include) files and folders from being scanned and indexed. If you use these directives properly, you can optimize your crawl budget to leave out duplicate pages and service pages that you don’t want to rank in the SERPs (for example, thank you pages and transactional pages).
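For instance, a typical crawl budget cleanup might look like this (the paths are hypothetical placeholders; swap in your own thank you and transactional URLs):

User-agent: *
Disallow: /thank-you/
Disallow: /checkout/
Disallow: /cart/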

Be careful with the syntax, though. One company, for example, ran into a case sensitivity issue when disallowing category folders (writing “/CATEGORY/” when the live URLs used “/Category/”, so the rule never matched), and at one point took their entire website out of search by writing “Disallow: /” instead of “Disallow:”. The version with the trailing slash blocks the whole site, while the bare directive blocks nothing.
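To make both pitfalls concrete, here’s a sketch (the folder name is a hypothetical placeholder). What they meant to write:

User-agent: *
Disallow: /Category/

What a mistake like theirs looks like, first blocking nothing, then blocking everything:

User-agent: *
Disallow: /CATEGORY/
Disallow: /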

Robots.txt Hacks for SEO and File Security

In addition to basic robots.txt usage, you can implement a few more hacks to help support and boost your SEO strategy.

Add a Sitemap Rule to Your Robots.txt File

You can add a sitemap to your robots.txt file—even more than one, actually!

The screenshot below shows how I did this for my business website:

I added three sitemaps: one for my main site and two for subsites (blogs) that I want counted as part of the main site.
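The directives follow this pattern (the URLs below are placeholders for my real ones; the Sitemap directive takes absolute URLs and isn’t tied to any User-agent group):

Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://blog-one.example.com/sitemap.xml
Sitemap: https://blog-two.example.com/sitemap.xml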

While adding a sitemap to your robots.txt file is no guarantee of better indexing, it has worked for some webmasters, so it’s worth a try!

Hide Files That You Don’t Want Search Engines or Users to See

It could be that .PDF e-book you’re selling on your blog for your most loyal readers only.

Or it might be a subscriber-only page that you don’t want mere mortals to get their hands on.

Or a legacy version of a file that you no longer want findable except through private exchange.

Whatever the reason for not wanting a file to be available to the public, you have to remember this common sense rule:

Even though search engines will skip a page or file that’s disallowed in your robots.txt file, human users will not.

As long as they’re able to load the robots.txt file in their browser, they can read your blocked URLs, copy and paste them into their browser, and get full access to them.

So when it comes to robots.txt, covering SEO and crawler behavior isn’t enough. You also have to make sure that human users keep their hands off the confidential material you’ve entrusted robots.txt to keep out of search engines!

Now the question is: How do you do it?

I’m happy to tell you it only takes three steps:

1. Create a specific folder for your secret files

2. Add index protection to that folder (so nobody browsing it can see its contents)

3. Add a Disallow rule to that folder (not to the files under it because they’ll inherit the rule)

Let’s get to putting that into practice.
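Here’s the layout the three steps produce, with a hypothetical my-ebook.pdf standing in for your secret file:

/robots.txt                   (gets the Disallow rule in step 3)
/secret-folder/
    .htaccess or index.html   (blocks directory listing in step 2)
    my-ebook.pdf              (the file you want to protect)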

Step #1: Create a specific folder for your secret files

First, log in to your website administration panel and open the file manager that comes with it (e.g. File Manager in cPanel). Alternatively, you can use a desktop-based FTP client such as FileZilla.

This is how I created the folder “/secret-folder/” in my website using cPanel’s File Manager:

Step #2: Add index protection to the folder

Second, you need to protect the folder’s index so that nobody browsing it can list its contents.

If you use WordPress, you can protect all folders by default by installing and activating the free Protect Uploads plugin from the WordPress plugin repository.

In all other cases, including if you want to protect this one folder only, you can use one of two methods (continuing with my example above):

A. .htaccess 403 Error Method

Create a new .htaccess file under “/secret-folder/” and add this line to it:

Options -Indexes

This directive tells the web server to disable directory listing for the folder, so anyone who tries to browse it gets a 403 Forbidden error instead of a list of files.

If that doesn’t work on your web server, use:

Deny from all

instead.
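One caveat worth knowing: Deny from all is Apache 2.2 syntax, and it blocks all HTTP access to the folder, including direct links to the files inside it, so it only fits cases where nobody should reach those files through the browser at all. On Apache 2.4 and newer, the equivalent directive is:

Require all denied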

B. Index.html File Method

Create an index.html (or default.html) under “/secret-folder/.”

This file should be empty or contain a small string of text to remind users who are browsing that this directory is inaccessible (e.g. “Shoo away. Private stuff here!”).
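A minimal sketch of such a file (the message is up to you):

<!DOCTYPE html>
<html>
<head><title>Private</title></head>
<body>Shoo away. Private stuff here!</body>
</html>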

Step #3: Add a Disallow rule to the folder

As the third and last action, go back to your robots.txt file at the root of your website and Disallow the entire folder.

For example:

User-agent: *
Disallow: /secret-folder/

And you’re done!

As you can see, working on your robots.txt file is not wasting your time on a minor SEO factor.

Your robots.txt file might seem as small and insignificant as the coin I used to “fix” my washing machine, but it can be just as powerful and critical to the good standing of your website in search engines.

So take good care of it!
