Using the Link Rel Canonical Tag

The link rel canonical tag lets you solve canonical URL issues without having to resort to 301 redirects or other potentially complicated approaches. This article explains how to take advantage of the tag in your PHP-powered websites.

Background

Most websites can be reached through both the www and the non-www version of their URL. For example, if you have a website www.somewebsite.com, people can access it in the browser in two ways:

By typing your canonical www version of the URL into their browser: http://www.somewebsite.com

By typing the non-www version: http://somewebsite.com

If you have an SSL (Secure Sockets Layer) certificate, there are two additional ways that users can access your website:

By typing the https www version: https://www.somewebsite.com.

By typing the https non-www version: https://somewebsite.com.

The problem is even trickier if the domain URLs use session IDs, which is very common with e-commerce websites. In addition to helping you with general canonical URL issues, this article will provide solutions for removing session IDs.

Users may not have problems with canonical issues, but what happens to a search engine bot? If sites link to both canonical and non-canonical versions of your website, search engine bots will index several versions of your domain, resulting in duplicate content.

Why is this a serious issue? It splits and weakens your internal link strength ("link juice"). Google PageRank flows like water in a pipe, so any link juice that flows to a non-canonical version of your domain is lost, like water leaking from a pipe with holes.

This is a common problem in search engine optimization and web development. It usually means that the web development team never defined a canonical version of the URL for the site.

The common approach to solving this uses 301 redirects. However, 301 redirects can be difficult for an ordinary webmaster who prefers to copy and paste source code. Worse, they are usually implemented in .htaccess, which is prone to server incompatibility issues.

The search engine community, particularly Google, Yahoo and MSN, decided to adopt the link rel canonical element as an easy solution to duplicate content problems, including the non-www and www canonical issue.

<link rel="canonical" href="http://www.yourcanonicalurl.com" />

This element is placed in the <head> section of the template source code. The objective of this article is to implement this element so that 301 redirection will not be necessary. The scope of this article is limited to the PHP/Apache website configuration.

Theory of Implementation

To understand how to implement this, we must illustrate first how 301 redirection is used to solve this type of problem. Consider the example below, which illustrates a typical non-www and www canonical issue:

Googlebot sees a non-www URL during a crawling session -> http://somewebsite.com/myfile.php -> if non-www -> server will do a 301 redirect to the www canonical version -> http://www.somewebsite.com/myfile.php -> Googlebot indexes the www version

This means that if Googlebot sees any non-www URL, the server will do a 301 redirect so that Googlebot indexes the canonical version.
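For comparison, the redirect flow described above can also be sketched in PHP rather than .htaccess. This is only a minimal sketch for the http:// and www case; the function names www_redirect_target and redirect_to_www_if_needed are hypothetical and not part of any library.

```php
<?php
// Hypothetical helper: returns the www URL to redirect to, or null when
// the host already starts with "www." and no redirect is needed.
function www_redirect_target($host, $requestUri)
{
    if (stripos($host, 'www.') === 0) {
        return null; // already the canonical www version
    }
    return 'http://www.' . $host . $requestUri;
}

// Hypothetical wrapper for a live request. It has to be called before any
// output is sent, because header() cannot run after the body has started.
function redirect_to_www_if_needed()
{
    $target = www_redirect_target($_SERVER['SERVER_NAME'], $_SERVER['REQUEST_URI']);
    if ($target !== null) {
        header('Location: ' . $target, true, 301); // 301 = moved permanently
        exit;
    }
}

echo www_redirect_target('somewebsite.com', '/myfile.php'), "\n";
// http://www.somewebsite.com/myfile.php
```

The link rel canonical approach discussed in this article avoids this extra round trip: the page is served as requested and only carries a hint about its preferred URL.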

What should happen when we use the link rel canonical element? See the diagram below:

This means that we have to place this tag in the <head> section so that, whenever a non-www URL is crawled, Googlebot is pointed to the canonical www URL and indexes that version instead. No redirect takes place; the page is served as requested.

Implementation Steps

We will outline the flow of a PHP script that places the canonical tag in the source of any non-canonical URL, without the use of .htaccess or redirections.

Basically there are four domain canonical scenarios:

A domain using www and using the non-secure protocol (http://) as the canonical version. Most of the domains on the Internet fit into this scenario.

A domain using www and the secure protocol (https://) as the canonical version. A good example of this is PayPal.

A domain using non-www and using the non-secure protocol (http://) as the canonical version. This makes for easier typing of domain names in the browser for faster user access. A good example of this is Digg.

A domain using non-www and using the secure protocol (https://) as its canonical version. This alternative is not popular, although some good sites use this type of canonical version, such as https://launchpad.net/

Each of these URL canonical scenarios requires a different PHP script, but the flow of implementation is the same. Here is an example that illustrates a site using http:// and www as its canonical version.

Step 1: If the domain uses session IDs, strip them off first, as they are not needed as part of the canonical version.

Step 2: Check whether the requested protocol is http://. If it is not, use the http:// version instead, then continue with Step 3.

Step 3: If the requested URL does not use www, prepend www to the host name; otherwise, return it to the browser unchanged.

The final canonical URL should:

Not use a session ID.

Contain the www version.

Use the canonical protocol, which is http://
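As a rough illustration of Steps 1 to 3, here is a minimal sketch written as a single function so that it can be tried with sample values. The function name canonical_http_www is hypothetical, and the session parameter names osCsid and zenid are assumptions (they are the usual names in OsCommerce and Zen Cart); adjust them to whatever your own templates use.

```php
<?php
// Hypothetical helper sketching Steps 1 to 3 for the http:// + www scenario.
function canonical_http_www($serverName, $requestUri)
{
    // Step 1: strip the session parameter (assumed names: osCsid, zenid),
    // then tidy up any "?" or "&" left dangling at the end.
    $clean = preg_replace('/([?&])(?:osCsid|zenid)=[^&]*&?/i', '$1', $requestUri);
    $clean = rtrim($clean, '?&');

    // Step 2: always emit the canonical protocol, whatever was requested.
    $protocol = 'http://';

    // Step 3: prepend "www." only when the host does not already start with it.
    $host = (stripos($serverName, 'www.') === 0) ? $serverName : 'www.' . $serverName;

    return $protocol . $host . $clean;
}

echo canonical_http_www('somewebsite.com', '/product_info.php?products_id=5&osCsid=abc123'), "\n";
// http://www.somewebsite.com/product_info.php?products_id=5
echo canonical_http_www('www.somewebsite.com', '/index.php?osCsid=abc123'), "\n";
// http://www.somewebsite.com/index.php
```

In a live template the two arguments would come from $_SERVER["SERVER_NAME"] and $_SERVER["REQUEST_URI"]. Keeping the logic in a function that takes plain strings makes it easy to check against sample URLs before putting it into a header file.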

In the rest of this article, I’ll show you the recommended PHP canonical scripts for the four scenarios discussed above.

Implementing a Canonical Version, First Scenario

This scenario focuses on a domain using a non-secure protocol (http://) and www as the canonical version of the site. Using the steps explained earlier, the resulting script should be:

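<?php
//place this script between the <head> and </head> section of your header.php or related dynamic website template
//such as index.php, product_info.php in the OsCommerce templates
//this script is applicable when the CANONICAL PROTOCOL IS HTTP AND USING WWW VERSION.
//this script is NOT APPLICABLE to a subdomain of a main domain.
//Example: if your canonical version is www.mysite.com, you should NOT be using the script in any of its subdomains.

//NOTE: the lines from here down to the "else" are reconstructed from Steps 1 to 3 and are an assumption;
//osCsid and zenid are the usual session parameter names in OsCommerce, CRE Loaded and Zen Cart.

//Step 1: strip the session ID from the requested path and query string
$cleanrequest=preg_replace('/([?&])(?:osCsid|zenid)=[^&]*&?/i', '$1', $_SERVER["REQUEST_URI"]);
$cleanrequest=rtrim($cleanrequest, '?&');

//Step 2: always use the canonical protocol
$protocol='http://';

//Step 3: check whether the server name already starts with www
if (stripos($_SERVER["SERVER_NAME"], 'www.')===0) {
$URL=$_SERVER["SERVER_NAME"];
$canonical=$protocol.$URL.$cleanrequest;
} else {
//append the canonical www version to the server name and display the canonical www version
$URL='www.'.$_SERVER["SERVER_NAME"];
$canonical=$protocol.$URL.$cleanrequest;
}

//Final step defining the final link rel canonical element (escaped for use in an HTML attribute)
echo '<link rel="canonical" href="'.htmlspecialchars($canonical).'" />';
?>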

The good thing about this script is that it automatically removes the session IDs used by the most common e-commerce templates, such as OsCommerce, Zen Cart and CRE Loaded.

The resulting URL is SEO-friendly without the session IDs in it, while using the http:// and the www canonical version.

Implementing Canonical Version for a Secure Website

This section explains how to implement the second scenario, where the canonical version of the site includes https:// and www. It is relatively easy to revise the PHP script from the previous section if your canonical version is https and www. The only thing that needs to be revised is this:

$protocol='http://';

Change that to:

$protocol='https://';

It should then return the secure protocol version of the domain as the canonical version.

By the way, when the server executes the PHP script above (say we are using the https:// version), it should return this HTML code in the browser:
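As an illustration only, assume a hypothetical request for /myfile.php on www.somewebsite.com with no session ID. The values below stand in for what the script would compute; they are placeholders, not output captured from a real site.

```php
<?php
// Hypothetical values standing in for the protocol setting,
// $_SERVER["SERVER_NAME"] and the cleaned request path.
$protocol = 'https://';
$URL = 'www.somewebsite.com';
$cleanrequest = '/myfile.php';

$canonical = $protocol . $URL . $cleanrequest;
$tag = '<link rel="canonical" href="' . htmlspecialchars($canonical) . '" />';

echo $tag, "\n";
// <link rel="canonical" href="https://www.somewebsite.com/myfile.php" />
```

Only the protocol prefix differs from the first scenario; the host and path handling stay the same.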

If you’re trying to implement http:// and the non-www version of your website as the canonical version, you should know that this is a little bit trickier. Most domains are configured to have a www in the beginning of the server name, and if you want to strip off the www as part of the URL, then we need to approach it in a way somewhat different from the string URL manipulation we saw in the first and second scenarios.
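Before the full script, the core operation for this scenario can be sketched on its own: instead of prepending www, we remove it when it is present. This is a minimal sketch; the function name strip_leading_www is hypothetical and the host names are sample values.

```php
<?php
// Hypothetical helper: remove one leading "www." from a host name,
// case-insensitively. The pattern is anchored at the start so that a host
// such as "mywww.example.com" is left untouched.
function strip_leading_www($serverName)
{
    return preg_replace('/^www\./i', '', $serverName);
}

echo 'http://' . strip_leading_www('www.somewebsite.com') . '/myfile.php', "\n";
// http://somewebsite.com/myfile.php
echo 'http://' . strip_leading_www('somewebsite.com') . '/myfile.php', "\n";
// http://somewebsite.com/myfile.php
```

Both requests end up with the same non-www URL, which is exactly what a canonical version requires.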

<?php

//place this script between the <head> and </head> section of your header.php or related dynamic website template

//such as index.php, product_info.php in the OsCommerce templates

//this script is applicable when the CANONICAL PROTOCOL IS http:// AND USING NON-WWW VERSION.

//this script is NOT APPLICABLE to a subdomain of a main domain.

//Example: if your canonical version is http://mysite.com, you should NOT be using the script in any of its subdomain.