Learn More about the Canonical Link Element

If you enjoyed that video but wanted to learn more, last week I sat down and recreated the presentation that I did at SMX West. You can watch the “director’s cut” of the video (click in the lower-right of the video to get the high-quality version). Here’s the video:

One exciting new development even since we made the video is that Ask announced that they will support the canonical tag. This means that pretty much all the major search engines will support this as an open standard. That should make life easier for site owners, developers, and webmasters.

Philipp, I was speaking in terms of global market share rather than local markets, but the nice thing is that this is a completely open standard, and the data is live on the web (not locked in to any one search engine). So Baidu or Yandex could easily add support for this tag and benefit from the standard as well. I would love it if Baidu or Yandex decided to do that.

Duplicate content can be such a bear to deal with. From the redirection issues to server stuff (www vs. non-www), it makes you want to pull your hair out.

I’ve been having issues with my WordPress blog when I place a post in two different categories, and it was only thanks to Google Webmaster Tools that I noticed the problem. Luckily I was able to fix the issue and remove the 404 pages it caused from the index via one of the Webmaster Tools.

Can I understand your presentation to mean that as long as I have submitted a sitemap, Google will automatically assign all crunchy goodness to the URL given in the sitemap, including whatever value links of the other type (relative vs absolute) might have generated?

One question I did not see answered (or asked) yet is whether the specified canonical URL itself may redirect to another page using a 302 redirect.

The use case that comes to mind is a canonical URL without a session identifier which redirects to the same URL with a session identifier, or a vanity URL that redirects to a lengthy string of unreadable garbage.

I would hope that the specified canonical URL will be used then — is that the case?

As a future tweak, how about recognizing a canonical tag outside an HTML header, like dropped into the end of a PDF document? Could help Google keep straight where the true home of the document is, since most people won’t bother hacking it out.

Curious about something: I know it is possible to have two articles whose URLs differ only in case (i.e. lowercase vs. uppercase), but it is incredibly uncommon for those URLs to serve different content. Why do search engines consider this to be duplicate content …

As Yusef Hassan Montero pointed out in his great talk at Search Congress in Barcelona, one “bad” thing about copyleft-licensed content is that it helps create duplicate content all over the internet (from every other point of view it is marvellous, I should say). This creates two problems: attributing authorship, and getting search engine algorithms to attribute relevance to the original.

Although this canonical thing can be very useful and is a nice first step, it does not work across domains, so it doesn’t help with the duplicate content/relevancy attribution problem.

Some (cross-domain) attribute that could be placed in any HTML tag to indicate the origin of the content would help more, e.g.:
<div rel="canonical" href="http://example.com/page.html"><p>bla bla bla</p></div>

You said it in your video: this tag doesn’t work across different domain names. But if I have to publish my content under different URLs, what can I do to tell Google which of them is the right one? Is it enough to put a source link inside the HTML code?

Matt, the official syntax of this element generates an error message in BBEdit’s syntax checker when inserted into an HTML 4.01 document. I think you need to document that the final slash is only for XHTML. I haven’t seen that distinction made anywhere.
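To make the distinction in the comment above concrete, here is a minimal Python sketch that emits the element in both forms. The helper name `canonical_link` and the example URL are made up for illustration. It relies only on the general rule for empty elements: HTML 4.01 writes `<link>` without a closing slash, while XHTML requires the self-closing form.

```python
from html import escape


def canonical_link(url: str, xhtml: bool = False) -> str:
    """Return a rel="canonical" link element for the given URL.

    Hypothetical helper for illustration; not part of any official tooling.
    """
    # Escape &, <, > and quotes so the URL is safe inside an attribute value.
    href = escape(url, quote=True)
    # XHTML needs the self-closing " />"; HTML 4.01 uses a bare ">".
    closing = " />" if xhtml else ">"
    return f'<link rel="canonical" href="{href}"{closing}'


print(canonical_link("http://example.com/page.html"))
print(canonical_link("http://example.com/page.html", xhtml=True))
```

The `xhtml` flag is an assumption about how a template might choose between the two spellings; a real site would normally fix one form to match its doctype.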

I just posted your YouTube video on my website and blog. You know, it might’ve been a long time before I heard about the Canonical Link Element; I’m glad I took the time to ‘play’ around online for once. I’m not sure how to go about ‘testing’ my site for duplicate URL content, but I added the link element to my blog and hopefully that will handle any major issues that exist or have arisen.
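One rough way to check that the element really made it into a page is to parse the HTML and read back the declared canonical URL. The sketch below uses Python's standard `html.parser`. The names `CanonicalFinder` and `find_canonical` and the sample page are hypothetical. As a simplification, it takes the first matching `<link>` anywhere in the document and does not check that it sits inside `<head>`.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> that is seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.canonical = attr["href"]


def find_canonical(html_text: str) -> Optional[str]:
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


# Sample page; the URL is a placeholder.
page = (
    "<html><head><title>Post</title>"
    '<link rel="canonical" href="http://example.com/post">'
    "</head><body></body></html>"
)
print(find_canonical(page))
```

This only confirms what the page declares; it says nothing about how any search engine will treat the hint.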

1. Will Google eventually (or quickly) remove all pages that carry the canonical link element and keep only the page the canonical link points to?

2. Will this hurt the PageRank of example.com/page1, since variants with different sort orders each have their own unique characteristics, such as keyword density? Will example.com/page1 be credited with all of those unique characteristics, or will Google just evaluate it based on the content at example.com/page1?

My site had a redirect in place pointing from http://mysite.com to http://www.mysite.com, and although the redirect was set up properly, Google indexed the http://mysite.com version of the site's main page but indexed inner pages in the http://www.mysite.com form. Placing a canonical link in the main page of the site corrected this. This is good because my hosting provider was stumped about how to fix it otherwise, since the .htaccess redirect was set up just fine.
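For anyone wanting to spot this kind of www vs. non-www split in a list of their own URLs, here is a small Python sketch that groups URLs differing only by a leading `www.`, host case, or a missing trailing slash on the root. The grouping rule is an assumed heuristic for illustration, not how any search engine decides what counts as a duplicate, and the function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def variant_key(url: str) -> tuple:
    """Key built from host (minus a leading "www."), path, and query.

    Scheme and port are ignored. This is an assumed heuristic only.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Treat an empty path and "/" as the same location.
    path = parts.path or "/"
    return (host, path, parts.query)


def group_variants(urls):
    """Return lists of URLs that share a key, i.e. likely duplicates."""
    groups = defaultdict(list)
    for url in urls:
        groups[variant_key(url)].append(url)
    # Only groups with more than one URL are potential duplicates.
    return [group for group in groups.values() if len(group) > 1]


urls = [
    "http://mysite.com",
    "http://www.mysite.com/",
    "http://www.mysite.com/about",
]
print(group_variants(urls))
```

Each group that comes back is a candidate for a single canonical URL, chosen by the site owner.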