Introducing Scrape Rate – A New Link Metric

by Jon Cooper

Drop what you’re doing and go to Boing Boing. Go to the archives page and select three different posts that were published at least a month ago. Then go to Google and search for intitle:”post title” (replacing post title with each actual post title, and keeping the quotes). Do this for each post, add the number of results together, divide by 3, and you have now calculated the scrape rate for Boing Boing.

After doing a quick check, I calculated Boing Boing’s scrape rate as 40.33. Your number will vary based on which three posts you check, but it should give you an overall feel for how often Boing Boing’s content is scraped.
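As a sketch, the calculation described above is just an average of result counts. Here’s a minimal Python version; the individual counts (45, 38, 38) are a hypothetical split, chosen only so the example matches the 40.33 figure:

```python
def scrape_rate(result_counts):
    """Average number of Google results for intitle:"<post title>"
    across several sampled month-old posts.

    result_counts: list of raw result totals, one per sampled post.
    """
    if not result_counts:
        raise ValueError("need at least one sampled post")
    return sum(result_counts) / len(result_counts)


# Hypothetical Boing Boing sample: three month-old posts returned
# 45, 38, and 38 results respectively.
print(round(scrape_rate([45, 38, 38]), 2))  # 40.33
```

The counts themselves still have to come from manual searches (or some search API), so this only automates the averaging step.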

Scrape rate is a guest blogger’s best friend. For those who guest blog, the links you get in the post itself only go so far. Having the content scraped, even though the copies are no longer original content, gives you more link equity.

Imagine if you wrote a guest post on an average blog that was scraped 100 different times. Now compare that link power to a guest post written on a more authoritative blog that only gets scraped once or twice. While the original content on the second option yields more quality & trust, you can’t beat the quantity of links the first option provides. Argue all you want, but in terms of link building, having your content scraped at that scale (as long as the links stay intact) trumps the quality of the original source in most cases.

Here’s a good real-life example. Go to Open Site Explorer and paste in the URL of a recent blog post from the SEOmoz blog. Save for a few successful posts, most don’t get an overwhelming number of high-quality links, so the majority of the link power comes from the content being scraped. The result? Most of the posts have a page authority of 60 or greater.

Note: I know a lot of that authority comes from the site it’s hosted on & the internal linking, but the point I’m making is that, all else being equal, scraped links can provide authority when found in great numbers.

When I guest posted on the SEOmoz blog in October, I had a targeted anchor text link back to a page I was trying to rank for a certain keyword. After the post was live for a few days, I saw no change in the SERPs, but after a week or two, my content was scraped by roughly 30 sites, and I saw an immediate jump in the SERPs. I went from not even in the top 50 for that keyword to the second page.

Why is scrape rate important?

If you’re guest blogging on a regular basis, you need to make sure you do your research. Guest blogging is more than taking an hour of your time to write up a post and throwing it at any blogger that’s willing to publish it; it’s about finding what resonates with the audience, interacting with the readers (i.e. via comments), and getting the most bang for your buck in terms of links. That’s where scrape rate comes in.

I’m not sold on sorting guest blogging prospects solely on domain authority and PageRank. Take it a step further. Go to Google and calculate the scrape rate (if someone creates a tool that does this automatically, let me know; I’ll happily send a few links your way). The best part about the metric is that some previously overlooked blogs that don’t get pitched as much for guest posts might actually be the ones that provide the most link power.

While everyone else in your niche is struggling to put together a post that gets published on blog X, you’re putting together a post for blog Y that you know has its content scraped and, in the end, passes more link juice.

The idea isn’t perfect, because a lot of blogs don’t get scraped at all, but this metric can help you identify, as I said, a few overlooked blogs that others have missed.

I’m not dedicated to the idea of finding 3 average month-old posts, counting the number of times scraped, and dividing by three, but I think it’s fairly accurate. Here’s why:

If you calculated it by looking at just one post, an outlier could skew the results. That’s just basic statistics.

Requiring the posts to be a month old gives them enough time to be scraped. That’s the problem with just checking the most recent post on the blog: some sites might syndicate it, but it can take a week or so after publication for that to happen.

Using the intitle: search yields pinpoint accuracy, but only if the title is unique. One problem I’ve run into is that some results come from FriendFeed, Tweetmeme, and similar social sites; I count them anyway because it’s not worth the hassle of filtering them out individually.

Granted, this is a brand-new idea, so I want to hear your thoughts on this metric. I think it has the potential to catch on in the SEO community, but I’m biased, because I’m the one who came up with it. Please leave me a comment; if you think it’s a bad idea and you see flaws, feel free to trash me. I can take it. At the same time, if you like the idea, I’d love the words of encouragement.

Re-reading my comment, it’s not to say scrape rate needs this factored in, and I think it’s a great metric that is very insightful. But the bigger question is how we apply scrape rate to the authority passed from these scrapes. For example, 5 scrapes of the same post do not equal 5 unique content posts. I saw another post somewhere complaining that a lot of companies just do PR releases and, because they get a load of articles published, assume it’s great link building. When in fact it’s the same sites scraping/publishing each release, plus duplicate content for all the links.

A unique scrape rate, or a scrape rate that somehow removed the sh*t sites that have scraped the post, may be a good enhancement?

Very, very interesting. I was analysing a competitor’s links from guest posting yesterday and noticed a trend of some posts being duplicated elsewhere. Whilst I saw the value of the extra links these scrape sites were generating, I hadn’t thought to formalise it and actually LOOK for this feature.

What I found, after trying your method on the same sites I looked at yesterday, was that not all of them republish the full page content, even if they do use the same title. However, all of the ones I found at least linked back to the article on the original blog, increasing the value of that page (and any links coming off it).

Where I think it gets really interesting is when you start to analyse the page metrics of the scrape-pages (don’t know what else to call them). A lot of them will be 1, that’s true, but I was seeing some 10s, 20s, 30s, etc. If you combine this into your scrape rate, you could come back with a metric that measures value rather than just quantity.

Using the MozBar in Firefox allows you to export results to CSV, with Page Authority included. You could set Google Search to show 100 results per page and add that to your measure – or use it as a secondary measure?
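To make the suggestion above concrete, here’s one hypothetical way the Page Authority scores of scrape-pages could be folded into the metric: instead of counting scrapes, average the total authority of the scraping pages per sampled post. This is purely a sketch of the idea, and all the numbers are made up:

```python
def weighted_scrape_rate(scrape_pages):
    """Authority-weighted variant of scrape rate.

    scrape_pages: one list per sampled post, containing the Page
    Authority scores of the pages that scraped it (e.g. exported
    from MozBar to CSV). Returns the average total scraped-link
    authority per post, rather than a raw scrape count.
    """
    if not scrape_pages:
        raise ValueError("need at least one sampled post")
    totals = [sum(pa_scores) for pa_scores in scrape_pages]
    return sum(totals) / len(scrape_pages)


# Hypothetical: post A was scraped by pages with PA 1, 1, and 20;
# post B by pages with PA 10 and 30.
print(weighted_scrape_rate([[1, 1, 20], [10, 30]]))  # 31.0
```

A post scraped many times by PA-1 pages and a post scraped twice by PA-30 pages can then be compared on authority rather than quantity alone.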

Great insights man. Thanks for taking it to the next level (analyzing scrape pages) before I had the chance to mention it.

One thing you mentioned is that not all sites scrape the whole article. Here’s the solution: if you can, get a link in the first few paragraphs. Of course, you’ll have to make it relevant, but if you can, there’s a much higher likelihood that the link will get scraped no matter how much of the article is scraped.

Thanks Patrick for the comment! Hope to see you blogging more often on the Search Engine People blog – you honestly write some great stuff 🙂

It looks like it was quite effective back in the day, and as you’ve noticed, the strategy still works. It’s amazing, really, that even though those scraped posts are duplicate content, Google clearly gives them some value. As mentioned, it’s nothing like the original, but at least you’re rewarded something.

@Patrick “analyse the page metrics of the scrape-pages” Great thought. Let us know how the test turns out. I’m theorizing that the major difference between these pages will be whether or not the scrape-page links back.

Great post, Jon. Definitely an interesting thought. I look forward to hearing how everyone’s tests go.

Maybe the benefit seen in the SERPs was not by virtue of the scraped content and the links, but more about the idea that mass republishing is still seen as some kind of low-level social proof. Just a thought!

I have more than enough metrics to keep me going for life 🙂

Hey Jon thanks for the comments. I’m a big fan of your blog so it’s nice to be able to contribute. I’ll post on here any results when I’ve had a chance to test.

Good solution to the problem of scrape sites truncating the post, although, as you say, it may be difficult to keep the link relevant. Some sites allow you to put guest-post mini bios at the top; maybe these are more valuable still?

Really smart, Jon. I would only say that I’m not sure it’s worth computing the scrape rate each time – it is, however, worth looking at the PA of each historical blog post, since the scrape rate can vary depending on site architecture and things like that. I like looking at historical blog post value to put a finger-in-the-air value on a link I might get – that analysis is important when determining how much we should invest in said link.

I think this type of link-building died a while ago. Like mentioned in the comments above, scrapers do their best to strip out or nofollow any external links. Besides that, the term “bad neighborhood” comes to mind.

Instead of intitle:”title”, have you considered searching for “sentence from the post with your link in it”? That should return fewer results, but at least they’ll be full posts as opposed to partial feeds.

I’ve had a few links from SEOmoz and a few of the other big SEO sites recently; they generate hundreds of scraped backlinks, and the ranking/visibility jump you see afterwards is almost always significant.

This should definitely catch on and be a part of every link builder’s ‘quality check’.

If this is a good example of writing a controversial link bait post then well done as I’m sure the post will get some good attention.

Scraped content is duplicate content and gets completely devalued. These days even spun content – which is much more unique than identical content – also gets devalued. Unfortunately we aren’t in 2007 and such theories do not work any more. Not sure what makes you think that Google trusts domains consisting purely of duplicate content. Does the word Panda ring a bell?

Guest posting on a selection of influencers and authorities within your niche is a great way to build a network that drives quality traffic to your site. The exposure and brand building you get from a well-placed guest post is far more valuable than a handful of duplicate-content scraped links. Shouldn’t that be your primary driving factor in selecting sites to target, rather than purely how many scrapes your article can achieve?

I can see why you have suggested the scrape rate, but I question how, if you only select a guest post venue based on the number of scrapes it can attract, it differs from using automated software to distribute a spun article (which was never a sustainable way to achieve good rankings; it always gave flaky short-term results).

Does a duplicate content link give value on a long term basis or is it just a short term boost (like article spinning)?

Though it’s not really any different from just loading up something like Article Marketing Robot and blasting out 200 article copies with your links in them (save for the authority from the original post). The value of those sites and the value of scraper sites are essentially the same.

Also – the people saying links from scraped sites and spun content links are now dead and “we aren’t in 2007” – you’re hilarious. Please keep telling people this doesn’t work anymore.

I think we can agree to disagree, but I’ve personally seen results from this. What makes this different from article marketing is that the sites scraping it are usually both relevant and contain less duplicate content. For example, an article directory is usually marked as total crap by Google, because nearly all of its content is duplicate or spun. But a standalone site that publishes its own unique content and an every-once-in-a-while scraped piece of content usually holds more value (sorry for the unforgivable amount of dashes).

Yeah, that would make sense to me. I wasn’t dissing your theory (although it does warrant more research); I was just trying to understand the relationship between that and article marketing. You don’t need to be Aaron Wall to work out that article marketing is as dead as disco.

Are you suggesting that links from duplicate content pass value and help rankings? That sounds like SEO back in 2008, when article marketing was thriving. The 1–2 week delay in the rankings may not relate to the links you got from the scrapers at all. This is just speculation, but if that really works, Google sucks!

Thanks for the comment, Lovett! I’m saying that individually they pass minimal value, but when found in great numbers, they can add up to something meaningful. Why? Because if your content is being scraped by 80 different sites just days after it was published, it’s telling Google that it’s content worth scraping.

Here is how I look at it. I suppose that anytime you are given the opportunity to publish somewhere that you know your post is going to earn a couple dozen backlinks instantaneously (whether from scraped content or not) then you should definitely take advantage of it. I just question how many opportunities (on different domains) you are going to get to do that.

Your idea of inventing a tool to measure the metric is creative… I tip my hat to you. I just think that if someone is actually going to take the time to decide whether or not to submit a guest blog based on that particular metric, then that person is guilty of analysis paralysis.

Why bother measuring it at all? If you have the opportunity to publish a guest blog post at a decent site that shares your same target audience, why in the hell would you pass it up? You would be crazy to pass it up even if the scrape rate was zero.

And, if you are guest blogging primarily for links, there are easy ways you can juice up a less juicy post on someone else’s website without crossing any black hat lines.