Canonical URL Issues

Posted on Tuesday, October 6th, 2009 at 6:28 am by admin and is filed under SEO.

“Canonical” is derived from “canon”, which the Oxford Advanced Learner’s Dictionary defines as “a generally accepted rule, standard or principle by which something is judged”. Asked why such an odd word is used for the process of picking a single URL from among many variations, Matt Cutts of Google replied on his blog that “it’s a strange word; that’s what we call it around Google.” Canonicalisation is an effective way to stop the same content, reachable at several URLs, from being treated as duplicate content and penalised as a result.

What is canonicalisation?

For the uninitiated, canonicalisation is “choosing what single domain you want to use for your site, and what single URL should be used to request each of your pages”. Duplicate content is an issue that, in severe cases, can even result in a website being banned from search engines. Even reusing content from your own site on different pages may be considered spam. Canonicalisation offers a way out: you tell Google which page is the preferred (parent) version of the content on your website. In other words, you let Google know which URL you prefer for each of your pages.
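
To make the idea concrete, here is a minimal Python sketch of how a site might map several URL variants onto one preferred form. The preferred host www.example.com, the alias list and the index.html rule are hypothetical assumptions chosen for illustration; they are not rules Google applies.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical site preferences, chosen only for this illustration.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "www.example.com"}


def canonicalise(url: str) -> str:
    """Map one URL variant onto the site's single preferred form."""
    parts = urlsplit(url)
    # .hostname is already lower-cased and excludes any port; dropping the port
    # assumes the site is only served on the default one.
    host = parts.hostname or ""
    if host in HOST_ALIASES:
        host = PREFERRED_HOST
    path = parts.path or "/"
    # Assumed convention: /dir/index.html and /dir/ are the same page.
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    # The fragment (#...) is never sent to the server, so it is dropped.
    return urlunsplit((PREFERRED_SCHEME, host, path, parts.query, ""))


for variant in ("http://example.com/products/index.html",
                "HTTP://WWW.Example.com:80/products/#reviews",
                "http://www.example.com/products/"):
    print(canonicalise(variant))  # each prints http://www.example.com/products/
```

On a live site the same choice would normally be expressed with 301 redirects or a rel="canonical" link rather than in application code alone.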

Issues related to canonicalisation

Canonicalising URLs raises a number of issues that webmasters need to take care of. Below are answers from Joachim Kupke, an engineer on Google’s indexing team, to some common questions about canonicalisation.

Is rel="canonical" a hint or a directive?
It’s a hint that we honour strongly. We’ll take your preference into account, in conjunction with other signals, when calculating the most relevant page to display in search results.

Can I use a relative path to specify the canonical, such as <link rel="canonical" href="product.php?item=swedish-fish" />?
Yes, relative paths are recognised as expected with the <link> tag. Also, if you include a <base> element in your document, relative paths will resolve according to the base URL.
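
This resolution works like ordinary relative-URL resolution in a browser. The sketch below uses Python's urllib.parse.urljoin to show where the relative canonical from the question ends up, with and without a <base> element; resolve_canonical is a hypothetical helper and the page and base URLs are made-up examples.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_canonical(document_url: str, canonical_href: str,
                      base_href: Optional[str] = None) -> str:
    """Resolve a (possibly relative) canonical href the way a browser resolves links."""
    # A <base href>, if present, replaces the document URL as the starting point;
    # the base href may itself be relative to the document URL.
    base = urljoin(document_url, base_href) if base_href else document_url
    return urljoin(base, canonical_href)


href = "product.php?item=swedish-fish"
print(resolve_canonical("http://www.example.com/shop/list.php?page=2", href))
# http://www.example.com/shop/product.php?item=swedish-fish
print(resolve_canonical("http://www.example.com/shop/list.php", href,
                        base_href="http://www.example.com/catalog/"))
# http://www.example.com/catalog/product.php?item=swedish-fish
```

Note that the query string of the current page (?page=2) is not carried over: the relative href supplies its own path and query.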

Is it okay if the canonical is not an exact duplicate of the content?
We allow slight differences, e.g., in the sort order of a table of products. We also recognise that we may crawl the canonical and the duplicate pages at different points in time, so we may occasionally see different versions of your content. All of that is okay with us.

What if the rel="canonical" URL returns a 404?
We’ll continue to index your content and use a heuristic to find a canonical, but we recommend that you specify existing URLs as canonicals.

What if the rel="canonical" URL hasn’t yet been indexed?
As with all public content on the web, we strive to discover and crawl a designated canonical URL quickly. As soon as we index it, we’ll immediately reconsider the rel="canonical" hint.

Can rel="canonical" be a redirect?
Yes, you can specify a URL that redirects as a canonical URL. Google will then process the redirect as usual and try to index it.

What if I have contradictory rel="canonical" designations?
Our algorithm is lenient: We can follow canonical chains, but we strongly recommend that you update links to point to a single canonical page to ensure optimal canonicalisation results.
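
Following a chain of canonicals can be modelled as repeatedly looking up each page's declared canonical until a page names itself. The Python sketch below is a hypothetical site-audit helper, not Google's algorithm: it assumes you have already collected each page's declared rel="canonical" into a dictionary, resolves every chain, raises an error on loops, and lists the pages that should be updated to point straight at the final canonical.

```python
def resolve_chain(page: str, declared: dict[str, str]) -> str:
    """Follow declared canonicals from `page` until a page names itself or declares none."""
    seen = [page]
    current = page
    while current in declared and declared[current] != current:
        current = declared[current]
        if current in seen:
            raise ValueError("canonical loop: " + " -> ".join(seen + [current]))
        seen.append(current)
    return current


def pages_to_update(declared: dict[str, str]) -> dict[str, str]:
    """Return {page: final canonical} for pages that point at an intermediate hop."""
    fixes = {}
    for page, target in declared.items():
        final = resolve_chain(page, declared)
        if target != final:
            fixes[page] = final
    return fixes


# Hypothetical site: /a -> /b -> /c, and /c names itself.
declared = {"/a": "/b", "/b": "/c", "/c": "/c"}
print(pages_to_update(declared))  # {'/a': '/c'}
```

Flattening the chain this way matches the advice above: every page ends up naming the single canonical directly.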

Can this link tag be used to suggest a canonical URL on a completely different domain?
No. To migrate to a completely different domain, permanent (301) redirects are more appropriate. Google currently takes canonicalisation suggestions into account across subdomains (or within a domain), but not across domains. So site owners can suggest www.example.com vs. example.com vs. help.example.com, but not example.com vs. example-widgets.com.
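
Because this restriction applies across domains but not across subdomains, a site audit might check that each declared canonical stays on the same domain as its page. The Python sketch below is deliberately naive and hypothetical: it treats the last two labels of the host name as the domain, which is wrong for suffixes such as .co.uk, so a real check would need a public-suffix list.

```python
from urllib.parse import urlsplit


def naive_domain(url: str) -> str:
    """Last two labels of the host name, e.g. help.example.com -> example.com.

    Deliberately naive: it mis-handles suffixes such as .co.uk.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def canonical_within_domain(page_url: str, canonical_url: str) -> bool:
    """True if the canonical stays on the page's domain (subdomains allowed)."""
    return naive_domain(page_url) == naive_domain(canonical_url)


print(canonical_within_domain("http://help.example.com/faq", "http://www.example.com/faq"))  # True
print(canonical_within_domain("http://example.com/widgets", "http://example-widgets.com/"))  # False
```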

Sounds great—can I see a live example?
Yes, wikia.com helped us as a trusted tester. For example, you’ll notice that the source code of the page at http://starwars.wikia.com/wiki/Nelvana_Limited specifies its rel="canonical" as http://starwars.wikia.com/wiki/Nelvana.

The two URLs are nearly identical to each other, except that Nelvana_Limited, the first URL, contains a brief message near its heading. It’s a good example of using this feature. With rel="canonical", the properties of the two URLs are consolidated in our index and search results display wikia.com’s intended version.
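
To check which canonical a page declares, as in the Wikia example, you can parse the <link> elements in its source. The sketch below uses Python's standard html.parser module; the HTML fed to it is a hand-written stand-in for the Nelvana_Limited page, not its real source, and find_canonical is a hypothetical helper name.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; self-closing tags
        # such as <link ... /> are routed here by the default handle_startendtag.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attributes.get("href")


def find_canonical(html: str) -> Optional[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


sample = ('<html><head><title>Nelvana Limited</title>'
          '<link rel="canonical" href="http://starwars.wikia.com/wiki/Nelvana" />'
          '</head><body>...</body></html>')
print(find_canonical(sample))  # http://starwars.wikia.com/wiki/Nelvana
```

The returned href may be relative, in which case it still has to be resolved against the page URL (or its <base>) before comparing it with other URLs.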