Are Webmasters Using Canonical URL Tags, Nofollows? The Latest Linkscape Update Has the Data

It's an exciting day at SEOmoz - Linkscape's index has updated with fresh data crawled in the past 30 days. This update also gives us a chance to show off lots of interesting data points around the web's usage of search-specific tags and directives. Let's dive in!

The Canonical URL Tag Grows in Popularity

Rel Canonical is here to stay. Websites have been steadily adopting the tag since its announcement, and this index has the highest number and percentage of URLs employing it to date.

The overall numbers are still small. Canonical URL tags appear on less than half of 1% of all pages, and I suspect duplicate content is much more prevalent, which gives SEOs plenty of opportunity to help sites apply this directive. Don't forget that you CAN use the tag on the original version of the page, too.
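For anyone auditing pages for this, here's a minimal Python sketch (standard library only) that reports whether a page has no canonical tag, a self-referencing one (the "original version" case above), or one pointing elsewhere. The function and class names are hypothetical, and it compares URLs as exact strings after resolving relative hrefs; real-world normalization (trailing slashes, host case, tracking parameters) is deliberately left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attrs = dict(attrs)
        # rel may hold several space-separated tokens; valueless attributes map to None.
        rels = (attrs.get("rel") or "").lower().split()
        if "canonical" in rels and attrs.get("href"):
            self.canonical = attrs["href"]


def canonical_status(page_url, html):
    """Return 'none', 'self' or 'other' for a page's canonical tag."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return "none"
    # Resolve relative hrefs against the page URL, then compare as plain strings.
    target = urljoin(page_url, finder.canonical)
    return "self" if target == page_url else "other"


tag = '<link rel="canonical" href="http://www.example.com/shoes">'
print(canonical_status("http://www.example.com/shoes", tag))             # self
print(canonical_status("http://www.example.com/shoes?sort=price", tag))  # other
print(canonical_status("http://www.example.com/shoes", "<p>no tag</p>")) # none
```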

Usage of Nofollowed Links Falls

It would appear that the nofollow directive is falling out of favor, as evidenced by the chart below:

Nofollow use is down on both external and internal links, though it has taken more of a hit on internal links. Perhaps that's a sign that more SEOs are removing internal nofollows following Google's announcement on the topic.
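If you want to measure the internal vs. external split on your own pages, here's a rough Python sketch. It assumes "internal" means the link points to the same hostname as the page (Linkscape's own definition may differ, for example by root domain), only looks at rel="nofollow" on <a> tags, and the names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Records (absolute_url, is_nofollow) for every <a href> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        rels = (attrs.get("rel") or "").lower().split()
        self.links.append((urljoin(self.base_url, href), "nofollow" in rels))


def nofollow_share(page_url, html):
    """Return the fraction of internal and external links marked nofollow."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    host = urlparse(page_url).hostname
    counts = {"internal": [0, 0], "external": [0, 0]}  # [nofollowed, total]
    for url, nofollow in collector.links:
        # Assumption: same hostname == internal link.
        kind = "internal" if urlparse(url).hostname == host else "external"
        counts[kind][1] += 1
        counts[kind][0] += int(nofollow)
    return {kind: (nf / total if total else 0.0)
            for kind, (nf, total) in counts.items()}


html = ('<a href="/about" rel="nofollow">About</a>'
        '<a href="/blog">Blog</a>'
        '<a href="http://other.example.org/" rel="nofollow">Elsewhere</a>')
print(nofollow_share("http://www.example.com/", html))
# {'internal': 0.5, 'external': 1.0}
```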

May 2010's Linkscape Index Stats

Linkscape's index this month has the largest number of unique root domains we've ever indexed, and its quality has improved in several other ways as well. For example, some of you reported link spammers that were highly effective at gaming page/domain authority scores; those issues should now be fixed in Open Site Explorer (OSE).

Pages: 41,202,970,156 (41 Billion)

Subdomains: 289,291,281 (289 Million)

Root Domains: 85,725,739 (85 Million)

Links: 424,255,504,138 (424 Billion)

You can see a chart of growth in the number of root domains (e.g. *.domain.com) below:

This shows our progress in reaching new sites and getting a broader picture of the web. We've taken to heart the feedback that it's frustrating when we don't have any data on a site, and we're expanding our reach accordingly (these numbers may also show that many more websites are getting registered and earning links).

You'll notice that at the beginning of this year, we ramped up index size at the request of our users. Unfortunately, we found that this didn't correlate well with quality or usefulness in every case, so we've been refining our crawl selection and metrics before we attempt to scale up again. We do plan to grow the index again, but we're much more concerned with the value of the links and pages we report back, so we won't grow just for the sake of numbers. As Danny Sullivan and Google themselves have pointed out many times, size ≠ quality.

Changes to How OSE & Linkscape Define "Followed" vs. "Nofollowed"

Based on further feedback from users and API partners, we've changed how we define "followed" and "nofollowed" links in our API, and you'll see the change reflected in Open Site Explorer. Our friends pointed out that links with the rel="nofollow" attribute aren't the only ones that don't pass link juice, so we've split links into the two buckets below:

If you're using the API to pull in link data, you'll see these new delineations, which should also help resolve earlier disparities in link counts (previously, adding followed + nofollowed didn't account for some of these other types of links).
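The exact criteria behind the two buckets aren't reproduced in this text, so the Python sketch below is purely illustrative: it assumes, hypothetically, that a link counts as nofollowed if it carries rel="nofollow" or sits on a page with a meta robots nofollow directive. What it does show is the structural point: when every link lands in exactly one bucket, followed + nofollowed always equals the total, which is the kind of count disparity the new definitions address.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """One outbound link as seen by a crawler (fields are hypothetical)."""
    url: str
    rel_nofollow: bool = False        # rel="nofollow" on the <a> tag itself
    page_meta_nofollow: bool = False  # linking page has <meta name="robots" content="nofollow">


def is_followed(link):
    """Hypothetical rule: either nofollow signal puts the link in the nofollowed bucket."""
    return not (link.rel_nofollow or link.page_meta_nofollow)


def bucket_links(links):
    """Partition links into (followed, nofollowed); every link lands in exactly one."""
    followed, nofollowed = [], []
    for link in links:
        (followed if is_followed(link) else nofollowed).append(link)
    return followed, nofollowed


links = [
    Link("http://example.com/a"),
    Link("http://example.com/b", rel_nofollow=True),
    Link("http://example.com/c", page_meta_nofollow=True),
]
followed, nofollowed = bucket_links(links)
print(len(followed), len(nofollowed), len(links))  # 1 2 3
```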

Some News on the SEOmoz API

We're proud to announce the release of a Linkscape Ruby gem. This gem contains all of the code we use to access the Site Intelligence API and power Open Site Explorer. If you've been looking for a time to get started with our API, this sample code should make it even easier. For more information about the gem, check out the Ruby section of the Sample Code page.

We're also making it easy to track future updates via the Linkscape Schedule in our API wiki. If you haven't yet checked out the API, now's the time - you can build remarkable things for on-site analysis, link data extraction or anything else that requires trillions of links :-)

A Fond Farewell to Nick Gerner

Unfortunately, I've got some sad news to report as well. Nick Gerner, who helped to create Linkscape in 2008, is leaving the team next week. He's been an incredible engineer and a good friend to everyone here at SEOmoz and many of our colleagues in the community as well. We wish him well and can't wait to see what he does next (he's assured us it's something exciting in the startup world).

If you've been in touch with Nick about the API, please send those requests to Sarah Bird. Feel free to pass any direct questions about Linkscape to sitesupport, where Ben, Chas & Phil are helping to improve the index and our tools.

Looking forward to the discussion - hope this weekend post doesn't intrude on too much family time. Don't forget to have a great Mother's Day!

It's been a blast working at SEOmoz, and I'm excited to see so many new hands getting involved in Linkscape (both inside and outside SEOmoz). I'm confident it will continue to be a useful source of data and intelligence, and that it will keep improving.

What would make the "% of nofollows over time" graph more useful is if it also plotted the total number of links over time. I'd like to see whether that number went down along with nofollows. If it did, then nofollows didn't really decline, since we'd have to look at nofollows as a percentage of all links and how that changes over time. Or is that what this graph already shows?
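A quick arithmetic sketch of the distinction being asked about: a share of all links can fall even while the absolute number of nofollowed links rises, as long as total links grow faster. The numbers below are made up for illustration and are not Linkscape data; with them, the nofollow count goes up by 300 while the share drops from 3% to 2.2%.

```python
def nofollow_percent(nofollowed, total):
    """Nofollowed links as a percentage of all links."""
    return 100.0 * nofollowed / total


# Made-up snapshots for illustration only: (nofollowed links, total links).
before = (3_000, 100_000)
after = (3_300, 150_000)

for label, (nf, total) in (("before", before), ("after", after)):
    print(label, nf, f"{nofollow_percent(nf, total):.1f}%")
```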

Awesome article, Rand. In a way I think it's good to see that a lot of the web is taking the initiative to prevent canonicalisation problems and also reducing the overuse of rel=nofollow on their websites (a lot of the time it is unnecessary and is sometimes downright rude).

It's unfortunate to see Nick Gerner leaving your ranks; hopefully he'll go on to do great(er) things!

WMT (not surprisingly) was giving me duplicate warnings for listing pages (page 2, page 3, etc.), so I thought: canonical tag to the rescue. Unfortunately, I saw the same thing: a seemingly dramatic drop in Googlebot's crawl activity and a marked drop in indexed pages.

I couldn't nail it down to anything else I had changed. Removed the tag, and the trend reversed - more spidering, and more pages in the index.

The listing pages themselves aren't important of course, and don't need to rank for anything. What they link to on the other hand is another matter entirely.

We also saw a significant decrease in traffic after implementing canonical tags and found the culprit to be pagination. We (unfortunately) have extensive pagination and found that the canonical tag affects how deeply the bot crawls into it.
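For anyone checking their own templates for this, here's a minimal Python sketch that flags the pattern. It assumes pagination lives in a hypothetical `page` query parameter and that the setup in question is page 2, page 3, and so on declaring a different page (typically page 1) as canonical; the comments above don't say exactly how their tags were configured.

```python
from urllib.parse import parse_qs, urljoin, urlparse


def page_number(url, param="page"):
    """Read the pagination number from a query parameter (defaults to 1)."""
    values = parse_qs(urlparse(url).query).get(param)
    return int(values[0]) if values and values[0].isdigit() else 1


def collapses_pagination(page_url, canonical_href, param="page"):
    """True if a page beyond page 1 declares a canonical on a different page number."""
    target = urljoin(page_url, canonical_href)
    current = page_number(page_url, param)
    return current > 1 and page_number(target, param) != current


print(collapses_pagination("http://www.example.com/list?page=3", "/list"))         # True
print(collapses_pagination("http://www.example.com/list?page=3", "/list?page=3"))  # False
```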

I saw that exact trend too. I got worried when WMT threw warnings saying that "it may be crawling unnecessary pages" due to the high number of pagination pages and search-result filters on my sites.

I ended up reverting to the original structure and decided to let Googlebot figure it out and do what it wants with it. It seems to be doing better now. Besides, WMT even says that these warnings won't affect crawl rates or indexation, which to me means I can ignore them until they say otherwise.

Just wondering: where in WMT do you see the "it may be crawling unnecessary pages" warning? We've a massive site and are undergoing a redesign at the moment, so I want to make sure the crawl of the new site is as efficient as possible, which means identifying any crawling of unnecessary pages. Other than the HTML suggestions, I'm not sure how to find these in WMT.