Avoiding Duplicate Content

In a previous post, we discussed the fundamentals of duplicate content (what it is, why it’s important, and so on). Now, we’re going to complete that discussion by describing various techniques for avoiding duplicate content.

Solving The WWW vs. Non-WWW Dilemma

Previously, we learned that http://www.webgnomes.org and http://webgnomes.org are distinct addresses. Consequently, if both addresses serve the same content, we have a duplicate content issue on our hands. To avoid this situation, we need to choose one of the addresses as our preferred address and then redirect the other address to it (e.g., http://webgnomes.org will redirect to http://www.webgnomes.org). To accomplish this redirection, we will use a 301 HTTP redirect, which tells browsers and search engines that the move is permanent.

If you’re using the Apache Web server, you can define a 301 HTTP redirect in your .htaccess file, and if you’re using the IIS Web server, you can define it using the administrative console.
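The redirect rule itself is small, regardless of which server applies it. As a minimal sketch in Python (not the .htaccess or IIS configuration itself), the logic might look like the following. The function name redirect_for and the two constants are illustrative assumptions, and the sketch ignores port numbers.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed for illustration: we picked the WWW address as the preferred one.
PREFERRED_HOST = "www.webgnomes.org"
ALTERNATE_HOST = "webgnomes.org"


def redirect_for(url):
    """Return (301, target) if url uses the alternate host, otherwise None."""
    parts = urlsplit(url)
    if parts.hostname != ALTERNATE_HOST:
        # Already on the preferred address (or an unrelated host): no redirect.
        return None
    # Swap only the host; keep the scheme, path, query, and fragment intact.
    target = urlunsplit(
        (parts.scheme, PREFERRED_HOST, parts.path, parts.query, parts.fragment)
    )
    return 301, target
```

Keeping the path and query string intact matters: every page on the alternate address should land on its matching page on the preferred address, not on the home page.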

Once we have chosen a preferred address (WWW or Non-WWW), we’ll also want to inform Google about our decision. To accomplish this, we log into our Google Webmaster Tools account and select our preferred domain (e.g., www.webgnomes.org). This notifies Google that we want our URLs to be displayed in their results pages using our preferred domain.

General Duplicate Content Solutions

The WWW vs. Non-WWW dilemma is the most common cause of duplicate content, but it is by no means the only potential source of problems. Instead of trying to address every possible scenario, we will focus on the most effective solutions. Then, you will be prepared for any duplicate content that comes your way!

301 For The Win

We’ve already mentioned one of the most effective tools for avoiding duplicate content: 301 HTTP redirects. They solved our WWW vs. Non-WWW problem, but the fun doesn’t end there. Any time we have multiple URLs serving the same content (e.g., http://www.domain.com, http://www.domain.net, http://www.domain.org, etc.), we follow a similar procedure. First, we select our preferred URL (e.g., http://www.domain.com), and then we redirect the other URLs to that preferred URL. Now, instead of having multiple URLs serving the same content, we have multiple pointers to a single version of the content.
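To make the "multiple pointers, single version" idea concrete, here is a hedged sketch of a tiny WSGI application in Python. It assumes www.domain.com is the preferred host and that the .net and .org hosts are the aliases. The names app, PREFERRED_HOST, and ALIAS_HOSTS are hypothetical, and a real site would more likely configure this in the web server itself.

```python
# Assumed for illustration: one preferred host and two aliases.
PREFERRED_HOST = "www.domain.com"
ALIAS_HOSTS = {"www.domain.net", "www.domain.org"}


def app(environ, start_response):
    """Serve content on the preferred host; 301-redirect every alias host."""
    host = environ.get("HTTP_HOST", "").lower()
    if host in ALIAS_HOSTS:
        path = environ.get("PATH_INFO", "/")
        query = environ.get("QUERY_STRING", "")
        # Point the alias at the same path on the preferred host.
        location = "http://" + PREFERRED_HOST + path
        if query:
            location += "?" + query
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    # Only the preferred host ever returns the content itself.
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"The single version of the content\n"]
```

For local experimentation, the application can be served with make_server from Python’s standard wsgiref.simple_server module.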

Let’s Get Canonical

301 redirects are your best friend, but unfortunately, they can’t solve every problem. There are numerous situations where duplicate content is generated (e.g., syndication, mobile-friendly pages, printer-friendly pages, etc.) and redirection is not an option because visitors still need to reach each version. But fear not: there is a simple solution for this problem.

When we have a duplicate content situation, a number of pages have very similar content. As we discussed in our previous post, search engines will resolve this situation by algorithmically choosing one of those pages as the original (this is also called the canonical page). Fortunately, we can influence this selection process by explicitly identifying the canonical page with the canonical tag.

To illustrate, let’s assume we have two very similar pages:

http://www.example.com/canonical (the canonical page)

http://www.syndicate.com/duplicate (the duplicate page)

Since http://www.syndicate.com/duplicate is the duplicate of the canonical page, we need to include the following canonical tag in the head section of the duplicate page’s HTML:

<link rel="canonical" href="http://www.example.com/canonical"/>

Now, when the search engines see this tag, they have an explicit signal that http://www.example.com/canonical is the canonical page (instead of having to select one algorithmically). To learn even more about the canonical tag, see Matt Cutts’s video on the topic.
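When auditing a site, it can help to confirm that a duplicate page really declares the canonical URL we expect. The following is a minimal sketch using Python’s standard html.parser module. The class name CanonicalFinder and the helper find_canonical are illustrative assumptions, and the sketch only inspects HTML passed in as a string (it does not fetch pages).

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag encountered."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated values, so split before checking.
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

Running find_canonical on the HTML of the duplicate page from the example should return the URL of the canonical page; a result of None means the tag is missing.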

Now that you know a few of the best techniques for avoiding duplicate content, let’s get out there and start de-duping the world!

About The Author

Steve Webb is an SEO audit specialist at Web Gnomes. He received his Ph.D. from Georgia Tech, where he published dozens of articles on Internet-related topics. Professionally, Steve has worked for Google and various other Internet startups, and he's passionate about sharing his knowledge and experiences with others. You can find him on Twitter, Google+, and LinkedIn.

Comments

I was looking for an article on how to do this and this article explained what I needed. I am in the process of setting up a 301 HTTP redirect through .htaccess and this has helped immensely. If you have a good tutorial on how to change the .htaccess for a 301 redirect let me know. Thank you for the knowledge.