Duplicate Content & URL Canonicalization

Before we get into this week’s tip, let me provide a definition for the term Canonicalization.

“It is the process of converting data that has more than one possible representation into a “standard” canonical representation.”

For SEO purposes this means it creates a definitive and unique page to represent more than one possible page on your site. You want to avoid duplicate pages on your site.

Sometimes duplicate content is the result of two pages sharing the same information in two different files. Sometimes it is caused by the technology behind the site, which lets two different urls point to the same page. Both of these problems have simple solutions. It is important to correct them because search engines want to show only unique web pages and text.

If your site has multiple pages with the same content, possibly through a Content Management System (CMS), through duplicate navigation, or because the content actually exists in multiple versions, you could be hurting your search engine rankings. We all know how important linking is to any SEO campaign, and if links point to different urls for the same information, your link value will be diluted: 2 incoming links may point to one version and 3 links to another. It would be much better for all 5 links to point to one url.
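To make the 2-versus-3 arithmetic concrete, here is a minimal Python sketch. The link list and the redirect mapping are made-up illustrations, and counting raw links is only a rough stand-in for how a search engine actually weighs them.

```python
from collections import Counter

# Hypothetical inbound links: 2 point to one version, 3 to another.
inbound = ["http://yoursite.com/"] * 2 + ["http://www.yoursite.com/"] * 3

# Before consolidation, the links are split across two urls.
before = Counter(inbound)

# Assumed 301 mapping: the duplicate url points to the canonical one.
redirects = {"http://yoursite.com/": "http://www.yoursite.com/"}

# After consolidation, every link is credited to the canonical url.
after = Counter(redirects.get(url, url) for url in inbound)

print(dict(before))
print(dict(after))
```

With the mapping in place, all 5 links count toward the single www url instead of being split 2 and 3.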

301 Re-Direct to the Rescue
The solution is to take any current duplicate pages and use a 301 re-direct to point all versions to a single, “canonical” version of the content or web page.

Most often this problem can be found on a site’s homepage. For example: Search engines view your home page as having more than one version. How? Take a look at the following urls. All point to the same page, but to the search engines they are different: http://www.yoursite.com, http://yoursite.com, http://yoursite.com/index.html and http://www.yoursite.com/index.html. The search engines may find up to four home pages that have the same content.
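If you want to check which urls in a link report are really the same home page, a small normalizing function can help. The sketch below is one possible approach, not a standard: the function name `canonicalize`, the preferred host, and the assumption that `index.html` is the default page are all illustrative choices.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize(url, preferred_host="www.yoursite.com"):
    """Map www/non-www and index.html variants to one canonical url."""
    parts = urlsplit(url)
    host = parts.netloc.lower()

    # Treat the bare domain and the www domain as the same site.
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host

    # An empty path and a trailing index.html both mean "the default page".
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]

    return urlunsplit((parts.scheme, host, path, parts.query, ""))


variants = [
    "http://www.yoursite.com",
    "http://yoursite.com",
    "http://yoursite.com/index.html",
    "http://www.yoursite.com/index.html",
]
unique = {canonicalize(u) for u in variants}
print(unique)
```

All four home page urls from the example collapse to a single entry, which is exactly what you want the search engines to see.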

While this may not cause your site to drop out of the rankings, it is certainly not helping and can easily cause poor rankings. That is a shame for something that is so easily corrected. Most often this is caused by links pointing to different versions of your site. You can’t change all the links coming into your site, but you can use the 301 re-direct to solve this by pointing all versions of your home page to the full url (http://www.yoursite.com). You can read more on how this is accomplished on our 301 re-direct blog post.
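For readers whose site runs on a Python application rather than static files, the same idea can be sketched as a tiny WSGI app. This is only an illustration under stated assumptions: `CANONICAL_HOST`, `INDEX_FILES`, and the function names are hypothetical, and on most hosts the re-direct is configured in the web server itself rather than in application code.

```python
CANONICAL_HOST = "www.yoursite.com"   # assumption: the preferred version
INDEX_FILES = {"/index.html"}         # assumption: the server's default page


def canonical_redirect(host, path):
    """Return the url to 301 to, or None if the request is already canonical."""
    path = path or "/"
    new_path = "/" if path in INDEX_FILES else path
    if host.lower() == CANONICAL_HOST and new_path == path:
        return None
    return "http://" + CANONICAL_HOST + new_path


def app(environ, start_response):
    """Minimal WSGI app: re-direct duplicates, serve the canonical page."""
    target = canonical_redirect(
        environ.get("HTTP_HOST", ""), environ.get("PATH_INFO", "/")
    )
    if target is not None:
        # 301 tells browsers and search engines the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"home page"]
```

A request for http://yoursite.com/index.html gets a 301 pointing at http://www.yoursite.com/, while a request for the canonical url is served normally.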

5 Comments

awesome tip! I think this has caused me some problems which I will now fix. It seems like this is something that should happen automatically somehow. Are there hosts that do this by default? Are there ways that http://www.samwilson3d.com and http://samwilson3d.com could even show 2 different pages? If so, why would you want this, and if not, why are they not redirected by default?

Sam,
You should contact your host to see if they can make the changes for you, but you can control it yourself by editing files such as .htaccess on your server, if you are hosting on an Apache server (the usual case on Unix systems). Your host is a good place to start if you are unfamiliar with the inner workings of your server setup.

The different urls should show the same information. After all, each one is showing the “default page” that is identified by the server. Usually this is an index.html or default.html page. The extensions can be different, but you get the picture. So there shouldn’t be any worry of different pages showing up.

Finally, you can tell Google which version you want to show up. You have to sign up for a webmaster account and verify that you are the owner of the site, but there are other nice features in Google’s webmaster tools that make it worth your while, besides telling them what your preferred domain should be.

I haven’t experienced any problems or rankings drop when using the www and non-www versions of a site at the same time, or with some links pointing to index.html instead of the root. But it is always better to add those lines of code to the .htaccess file if you can.
And I once saw 2 different sites on http://www.domain.com and domain.com. I think they were hosted on 2 servers, with a different DNS entry used for each one. Kinda funny for me..:D