Hacked Canonical Tags: Coming Soon To A Website Near You?

Google recently alerted website owners to a spam trend in which websites are hacked to insert a canonical tag that points to the hacker’s site. Is your site at risk? How can you protect against it?

There’s a great discussion going on over at WebmasterWorld about this topic. To give credit where it’s due, it’s quite a clever attack that, if undetected, stands to provide more SEO value than the vast majority of spam tactics I’ve witnessed in more than a decade of experience.

To make sure we’re all on the same page, let’s quickly define the canonical tag.

What is the Canonical Tag?

For years, webmasters wrestled with the issue of duplicate content. The majority of these cases were caused by simple product duplication on a series of listings pages.

Let’s say a website sells blue widgets. They are ordered on one page by size, on another by color, and on a third by price. That’s three pages with fundamentally the same content.

Google will try to determine the best possible page to show in the results. Before the canonical tag, webmasters couldn’t define it for themselves, resulting in scenarios where less desirable pages were appearing in the SERPs.

In 2009 Google announced support for the rel=canonical tag, allowing website owners to place a tag in the head section of their pages indicating which page was to be considered the primary result. Later that year Google announced that the canonical tag would also work across domains, allowing webmasters of multiple sites with similar content to declare specific content as sourced from a different domain.
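To make the tag concrete, here’s a minimal Python sketch that pulls the canonical target out of a page’s HTML using only the standard library. The sample page, its URL, and the names CanonicalFinder and find_canonicals are all made up for illustration; this isn’t how Google reads the tag, just a way to see what’s sitting in your own markup.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, in any letter case
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonicals(html):
    """Return a list of canonical hrefs found in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


# Hypothetical listing page sorted by price, pointing at the main listing.
SAMPLE_PAGE = """
<html><head>
<title>Blue Widgets by Price</title>
<link rel="canonical" href="https://www.example.com/blue-widgets/">
</head><body>...</body></html>
"""

print(find_canonicals(SAMPLE_PAGE))
# prints ['https://www.example.com/blue-widgets/']
```

A page should normally yield zero or one result; more than one canonical on a single page is itself worth a closer look.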

The Power of the Canonical Tag

For any hack to be worth doing it first must hold value. This leads us to the question of whether or not an exploitation of the canonical tag is worth doing in the first place. I found a statement from Google’s Matt Cutts on a related topic very interesting. It takes some reading between the lines so watch the following video from April 2011 and then read on.

You’ll notice that the video discusses link loss from a 301 redirect and why that’s a necessity, which is interesting and indirectly relevant. But the statement that most closely matches what we’re asking here comes in the last few seconds, when he compares the strength of the canonical tag and the 301 redirect and states, “… but as far as the amount of PageRank that gets passed, there’s not a lot of difference.”

From one video we get two pieces of information about how much weight the rel=canonical tag can pass, and combined they lead to only one conclusion. The two pieces of information are:

There is very little strength loss on a 301 redirect.

The amounts of strength passed via a 301 and via the rel=canonical tag are virtually the same.

The conclusion then is that an exploit that inserts the rel=canonical tag onto a page can be a very effective strategy, on par with 301ing the page itself but even “better” in that it likely won’t be detected by the site owner.

Is This an Issue?

The next question we need to ask ourselves is, “Is this an issue now or just a warning?” The answer is that it is an issue right now. WebmasterWorld user goodroi claims to have seen evidence of this, and I have no reason to doubt him; he knows his stuff. But even if we want to take that claim with a grain of salt, Matt Cutts sent out the following tweet on May 13th: “A recent spam trend is hacking websites to insert rel=canonical pointing to hacker’s site. If U suspect hacking, check 4 it.”

With that, let’s assume it’s an issue, a known issue, and now discuss who’s at risk and how to contend with it.

The Hack

Sadly, there is no one hack when we’re dealing with things like this. Every environment has its own weaknesses, some more than others.

A WordPress blog, for example, has different weaknesses than a custom CMS, which in turn differs from a static site. To be sure, all are vulnerable, and where there are monetary incentives, there are people who will look to exploit the situation.

The hardest part to contend with is that the offending element isn’t visible, nor will it generate warnings about your site in the SERPs as malware will. It’ll just sit there quietly in the head of the page, passing your strength to another domain.
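Since nothing on the page will warn you, the practical check is to compare where each page’s canonical points against the domain it lives on. Here’s a rough sketch of that comparison in Python. The function names, the www-stripping rule, and the allowed_hosts parameter (for sister sites you deliberately canonicalize to) are my own assumptions, not anything Google prescribes.

```python
from urllib.parse import urljoin, urlparse


def strip_www(host):
    """Lowercase a hostname and drop a leading 'www.' so variants compare equal."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def host_of(url):
    """Normalized hostname of a URL, or '' if it has none."""
    return strip_www(urlparse(url).hostname or "")


def is_offsite_canonical(page_url, canonical_href, allowed_hosts=()):
    """True if the canonical points to a host you did not expect.

    allowed_hosts is a hypothetical whitelist of other domains you own.
    """
    # Resolve relative hrefs against the page URL before comparing hosts.
    target_host = host_of(urljoin(page_url, canonical_href))
    if target_host == host_of(page_url):
        return False
    return target_host not in {strip_www(h) for h in allowed_hosts}


# Made-up URLs for illustration only.
print(is_offsite_canonical("https://www.example.com/widgets?sort=price", "/widgets"))
print(is_offsite_canonical("https://www.example.com/widgets", "https://spam.example.net/"))
```

The first call reports False (a relative canonical on the same site) and the second True. Run something like this over every URL in your sitemap and anything flagged deserves a manual look at the page source.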

I haven’t heard any tales yet of a cloaked hack, but someone in the forum thread asked whether one is possible. I’m familiar enough with cloaking techniques to confirm that it wouldn’t be that difficult to cloak the tag, so when you view your source it’s not there but appears when Googlebot drops by.
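One rough way to test for the simplest form of cloaking is to request the same page twice, once with a browser-style User-Agent and once with Googlebot’s, and compare the canonical tags that come back. The sketch below does that with Python’s standard library. The browser User-Agent string and all function names are illustrative assumptions, and the approach has a real limit: it only catches cloaking keyed on the User-Agent header. A hack that checks for Googlebot’s IP addresses would serve your script the clean page every time.

```python
from html.parser import HTMLParser
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Illustrative browser string; any ordinary browser User-Agent would do.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"


class _CanonicalParser(HTMLParser):
    """Gathers the hrefs of <link rel="canonical"> tags into a set."""

    def __init__(self):
        super().__init__()
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens and attr_map.get("href"):
            self.hrefs.add(attr_map["href"])


def canonicals(html):
    parser = _CanonicalParser()
    parser.feed(html)
    return parser.hrefs


def cloaked_canonicals(html_for_browser, html_for_bot):
    """Canonical targets present in the bot's copy but not the browser's."""
    return canonicals(html_for_bot) - canonicals(html_for_browser)


def fetch(url, user_agent, timeout=10):
    """Download a page while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def check_url(url):
    """Fetch url both ways and return any canonicals shown only to the bot."""
    return cloaked_canonicals(fetch(url, BROWSER_UA), fetch(url, GOOGLEBOT_UA))
```

Calling check_url on one of your own pages should return an empty set. Anything else means the page is answering Googlebot differently than it answers you.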

The only security you have is your own site security and hosting environment. That means ensuring that your CMS is fully up to date (so stop ignoring that WordPress update notice) and that your hosting environment is secure (have you changed your password since the last time you gave it to a third party?).

These are best practices for defending against any exploit. The current situation is simply a reminder of one more way your potential vulnerabilities can be used.

This isn’t a new issue and as Matt Cutts puts it: “On the ‘bright’ side, if a hacker can control your website enough to insert a rel=canonical tag, they usually do far more malicious things like insert malware, hidden or malicious links/text, etc.”

It’s not new that they’ll be there; it’s just the nature of what they’re doing that is different. You may not get a malware warning. You’ll “just” notice that all the power of your page is gone.