How to Prevent Duplicate Content with Effective Use of the Robots.txt and Robots Meta Tag

Duplicate content is one of the problems we regularly come across as part of the search engine optimization services we offer. If the search engines determine that your site contains duplicate or very similar content, this may result in penalties and even exclusion from their indexes. Fortunately, it's a problem that is easily rectified.

Your primary weapon of choice against duplicate content can be found within the Robots Exclusion Protocol, which has now been adopted by all the major search engines.

There are two ways to control how search engine spiders crawl and index your site:

1. The Robots Exclusion File, or "robots.txt"

2. The Robots <meta> Tag

The Robots Exclusion File (robots.txt)

This is a simple text file that can be created in Notepad or any other plain-text editor. Once created, you must upload the file to the root directory of your website, e.g. www.yourwebsite.com/robots.txt. Before a search engine spider crawls your website, it looks for this file, which tells it which parts of your site it may and may not crawl.
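Because spiders only look for the file at the root of the host, the robots.txt URL that governs any page can be derived from the page's own address. The Python sketch below uses only the standard library's urllib.parse and assumes full URLs that include the scheme (http:// or https://); www.yourwebsite.com is the article's placeholder domain and the page path is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    Spiders only look for the file at the root of the host, so the path,
    query string and fragment of the page URL are discarded. page_url is
    assumed to include a scheme such as http:// or https://.
    """
    parts = urlsplit(page_url)
    # netloc keeps the host name and any port number.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_txt_url("http://www.yourwebsite.com/products/widgets.html?page=2"))
```

Note that the host part includes any subdomain and port, so a site such as shop.yourwebsite.com needs its own robots.txt file.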

The robots.txt file is best suited to static HTML sites, or to excluding specific files on dynamic sites. If most of your site is dynamically generated, consider using the robots meta tag instead.
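On a dynamic site, the same content is often reachable under several query-string variations, which is one common source of duplicates. A minimal sketch of the robots tag approach, assuming a Python-based site, is a helper the page template calls to decide which robots <meta> tag to emit. The parameter names sort, sessionid and print are hypothetical examples of parameters that re-order or re-skin a page without changing its content; substitute your own.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical query parameters that only re-order or re-skin a page
# without changing its content; replace with your own site's parameters.
DUPLICATE_PARAMS = {"sort", "sessionid", "print"}

NOINDEX_TAG = '<meta name="robots" content="noindex, follow">'
INDEX_TAG = '<meta name="robots" content="index, follow">'


def robots_meta_for(url: str) -> str:
    """Return the robots <meta> tag a dynamic page template could emit for url."""
    # keep_blank_values=True so that "?print" or "?print=" still counts.
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if DUPLICATE_PARAMS.intersection(params):
        # Duplicate variant: keep it out of the index, but let spiders
        # follow its links to the canonical pages.
        return NOINDEX_TAG
    # Canonical version: indexing and link-following are allowed.
    return INDEX_TAG


if __name__ == "__main__":
    for url in (
        "http://www.yourwebsite.com/products?category=shoes",
        "http://www.yourwebsite.com/products?category=shoes&sort=price",
    ):
        print(url, "->", robots_meta_for(url))
```

Using "noindex, follow" rather than "noindex, nofollow" keeps the duplicate out of the index while still letting spiders reach the pages it links to.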

Creating your robots.txt file

Example 1

Scenario: You want the file to apply to all search engine spiders and to make the entire site available for indexing. The robots.txt file would look like this:

User-agent: *
Disallow:

Explanation: The asterisk in the "User-agent" line means this robots.txt file applies to all search engine spiders. Leaving the "Disallow" value blank means no part of the site is blocked, so all of it is available for indexing.
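One way to sanity-check a robots.txt file before uploading it is to run it through Python's standard-library parser, urllib.robotparser, which matches paths by simple prefix. The sketch below feeds it the Example 1 rules as a string; the spider name AnyBot and the page paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The rules from Example 1: every spider, nothing disallowed.
EXAMPLE_1 = """\
User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check one URL against robots.txt rules supplied as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "AnyBot" is a made-up spider name; the paths are illustrative.
    for path in ("/", "/faq/index.html", "/images/logo.gif"):
        print(path, is_allowed(EXAMPLE_1, "AnyBot", "http://www.yourwebsite.com" + path))
```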

Example 2

Scenario: You want the file to apply to all search engine spiders, and to stop them from indexing the faq, cgi-bin and images directories as well as a specific page called faqs.html in the root directory. The robots.txt file would look like this:

User-agent: *
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html

Explanation: The asterisk in the "User-agent" line means this robots.txt file applies to all search engine spiders. Each directory is blocked by naming it on its own "Disallow" line, and the specific page is referenced directly by its path. The named files and directories will no longer be crawled by any compliant search engine spider.
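The same standard-library parser can confirm which URLs the Example 2 rules block. The rule set in the sketch is an assumed reconstruction of Example 2 in which trailing slashes mark the faq, cgi-bin and images directories; the spider name and sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction of Example 2: trailing slashes mark directories.
EXAMPLE_2 = """\
User-agent: *
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html
"""


def blocked_paths(robots_txt, user_agent, paths, site="http://www.yourwebsite.com"):
    """Return the paths (in their original order) that user_agent may not crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, site + path)]


if __name__ == "__main__":
    sample = ["/faq/answers.html", "/cgi-bin/search.pl", "/images/logo.gif",
              "/faqs.html", "/about.html", "/faq"]
    print(blocked_paths(EXAMPLE_2, "AnyBot", sample))
```

Because matching is by prefix, "Disallow: /faq/" blocks everything inside the directory but not a URL of exactly /faq. Writing "Disallow: /faq" instead would cover both, but it would also match /faqs.html and any other path starting with /faq.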

Example 3

Scenario: You want the file to apply only to Google's spider, Googlebot, and to stop it from indexing the faq, cgi-bin and images directories as well as a specific HTML page called faqs.html in the root directory. The robots.txt file would look like this:

User-agent: Googlebot
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html

Explanation: Naming a particular spider in the "User-agent" line means the rules apply to that spider only. Each directory is blocked simply by naming it, and the specific page is referenced directly. The named files and directories will not be crawled by Google, while other search engine spiders are unaffected.
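Swapping the asterisk for a named spider changes who the rules apply to. In this sketch the Example 3 rules are again an assumed reconstruction, and Bingbot stands in for "any other spider": because the file has no group for other spiders and no catch-all "*" group, they are allowed everywhere.

```python
from urllib.robotparser import RobotFileParser

# Assumed reconstruction of Example 3: the same rules, for Googlebot only.
EXAMPLE_3 = """\
User-agent: Googlebot
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html
"""


def can_crawl(robots_txt: str, user_agent: str, path: str,
              site: str = "http://www.yourwebsite.com") -> bool:
    """True if user_agent may crawl site + path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, site + path)


if __name__ == "__main__":
    # Googlebot matches the named group; Bingbot matches no group at all.
    for agent in ("Googlebot", "Bingbot"):
        print(agent, can_crawl(EXAMPLE_3, agent, "/faq/answers.html"))
```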

That's all there is to it!

As mentioned earlier, the robots.txt file can be difficult to implement on dynamic sites, and in that case it is probably necessary to use a combination of the robots.txt file and the robots meta tag.

The Robots Meta Tag

This alternative way of telling the search engines what to do with your content appears in the <head> section of a web page. A simple example would be as follows:

<meta name="robots" content="noindex, nofollow">

In this example, we are telling all search engines not to index the page and not to follow any of the links it contains.
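Crawlers read these directives out of the page's HTML. As an illustration only (not how any particular search engine implements it), the sketch below uses Python's standard html.parser module to collect the directives from name="robots" meta tags, treating them as comma-separated and case-insensitive.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated and compared case-insensitively.
        self.directives.update(
            token.strip().lower() for token in content.split(",") if token.strip()
        )


def robots_directives(html: str) -> set:
    """Return the set of robots directives declared in html."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(sorted(robots_directives(page)))
```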

In this second example, I don't want Google to cache the page because the site contains time-sensitive information. This can be achieved simply by adding the "noarchive" directive.
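To aim "noarchive" at Google alone, the tag can use name="googlebot" rather than name="robots", e.g. <meta name="googlebot" content="noarchive">. The sketch below assumes a crawler reads both the generic robots tag and the tag addressed to it by name, and simply combines the two; how any real search engine resolves conflicting directives is outside this sketch, and the crawler names are illustrative.

```python
from html.parser import HTMLParser


class MetaRobotsCollector(HTMLParser):
    """Map each lower-cased <meta name=...> value to its set of directives."""

    def __init__(self):
        super().__init__()
        self.by_name = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content") or ""
        if name:
            tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
            self.by_name.setdefault(name, set()).update(tokens)


def effective_directives(html: str, crawler: str) -> set:
    """Directives one crawler would see: generic "robots" plus its own tag.

    Assumption: the crawler obeys both tags and takes their union.
    """
    collector = MetaRobotsCollector()
    collector.feed(html)
    collector.close()
    generic = collector.by_name.get("robots", set())
    specific = collector.by_name.get(crawler.lower(), set())
    return generic | specific


if __name__ == "__main__":
    page = '<html><head><meta name="googlebot" content="noarchive"></head></html>'
    for crawler in ("Googlebot", "Bingbot"):
        print(crawler, sorted(effective_directives(page, crawler)))
```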

What could be simpler!

Although there are other ways of preventing duplicate content from appearing in the search engines, this is the simplest to implement, and every website should use a robots.txt file, the robots meta tag, or a combination of the two.

Should you require further information about our search engine marketing or optimization services, please visit us at http://www.e-prominence.co.uk, the search marketing company.
