Duplicate Content Advice For SEO From Google

Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin…

It’s very important to understand that, as a small business owner, if you republish posts, press releases, news stories or product descriptions found on other sites, your pages are going to struggle to gain traction in Google’s SERPs (search engine results pages). If your entire site is made up entirely of republished content – Google does not want to rank it.

Mess up with duplicate content, and it might look like a penalty, as the end result is the same – you don’t rank.

A good rule of thumb is: do NOT expect to rank high in Google with content found on other, more trusted sites, and don’t expect to rank at all if all you are using is automatically generated pages with no ‘value add’. While there are exceptions to the rule (and Google certainly treats your OWN duplicate content on your own site differently), your best bet for ranking in 2015 is to have one single version of content on your site with rich, unique text content that is written specifically for that page.

Google wants to reward RICH, UNIQUE, RELEVANT, INFORMATIVE and REMARKABLE content in its organic listings – and it has really raised the quality bar over the last few years. If you want to rank high in Google for valuable key phrases, and for a long time, you had better have good, original content for a start – and lots of it.

Seriously.

Onsite Problems

If you have many pages of similar content on your site, Google might have trouble choosing the page you want to rank, and it might dilute your ability to rank for what you do want to rank for. For instance, if you have PRINT-ONLY versions of content, those can end up displaying in Google instead of your web page if you’ve not handled them properly. That’s probably going to have an impact on conversions – for instance.

Google Penalty For Duplicate Content On-Site?

Since I wrote this article back in 2009, Google has given us some explicit guidelines when it comes to managing duplication of content.

Generally speaking, Google will identify the best pages on your site if you have a decent on-site architecture. It’s usually pretty good at picking the right version to return, though how it handles specific duplicate content depends on a number of other factors.

The advice is to avoid duplicate content issues if you can, and this should be common sense. Google wants and rewards original content – it’s a great way to push up the cost of SEO and create a better user experience at the same time. Google doesn’t like it when ANY TACTIC is used to manipulate its results, and republishing content found on other websites is a common tactic of a lot of spam sites.

You don’t want to look anything like a spam site, that’s for sure – and Google WILL classify your site… as something.

The more you can make it look like a human built every page on a page-by-page basis, with content that doesn’t appear exactly in other areas of the site, the more Google will like it. Google does not like automation when it comes to building a website – that’s clear in 2015.

I don’t mind multiple copies of articles on the same site – as you find with WordPress categories or tags – but I wouldn’t have tags and categories indexable on a small site, for instance, and especially not targeting the same keyword phrases.

I prefer to avoid unnecessary repeated content on my site, and when I do have automatically generated content (like my news feed), I tell Google not to index it via meta tags or X-Robots-Tag headers. In this case, I am probably doing the safest thing, as the feed could be seen as scraper content if I intended to get it indexed.
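For illustration, here is how that noindex hint might look – a minimal sketch, with a hypothetical auto-generated feed page:

```html
<!-- In the <head> of the auto-generated feed page:
     keep it out of the index, but still let Google follow its links -->
<meta name="robots" content="noindex, follow">
```

The equivalent for non-HTML responses is the `X-Robots-Tag: noindex` HTTP header, set in your server configuration.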

Google won’t thank you, either, for spidering a calendar folder with 10,000 blank pages on it, or a blog with more categories than original content – why would they?

Offsite Problems

…in some cases, content is deliberately duplicated across domains in an attempt to manipulate search engine rankings or win more traffic. Deceptive practices like this can result in a poor user experience, when a visitor sees substantially the same content repeated within a set of search results. Google tries hard to index and show pages with distinct information. This filtering means, for instance, that if your site has a “regular” and “printer” version of each article, and neither of these is blocked with a noindex meta tag, we’ll choose one of them to list. In the rare cases in which Google perceives that duplicate content may be shown with intent to manipulate our rankings and deceive our users, we’ll also make appropriate adjustments in the indexing and ranking of the sites involved. As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results. GOOGLE

If you are trying to compete in competitive niches you need original content that’s not found on other pages in the exact same form on your site, and THIS IS EVEN MORE IMPORTANT WHEN THAT CONTENT IS FOUND ON OTHER PAGES ON OTHER WEBSITES.

Google isn’t under any obligation to rank your version of content – in the end, it depends whose site has the most domain authority or the most links coming to the page.

Don’t unnecessarily compete with these dupe pages by always rewriting your content if you think the content will appear on other sites anyway (especially if you are not the first to ‘break’ it, if it’s news).

A Dupe Content Strategy?

There are strategies where this will still work, in the short term. Opportunities are reserved for long-tail SERPs where the top ten results page is already crammed full of low-quality results and the SERPs are shabby – certainly not a strategy for competitive terms.

There’s not a lot of traffic in long-tail results unless you do it en masse, and that could invite further site-quality issues, but sometimes it’s worth exploring whether very similar content with geographic modifiers (for instance) on a site with some domain authority presents an opportunity. Very similar content can be useful across TLDs too. A bit spammy, but if the top ten results are already a bit spammy…

If low-quality pages are performing well in the top ten of an existing long-tail SERP, then it’s worth exploring. I’ve used it in the past, but I am not keen on it today. I always thought that if it improves user experience and is better than what’s there in those long-tail searches at present, who’s complaining? Unfortunately, that’s not exactly best-practice SEO in 2015, and I’d be nervous about creating any type of low-quality pages on your site these days.

Too many low-quality pages might cause you site-wide issues in the future, not just page-level issues.

Original Content Is King

Stick to original content, found on only one page on your site, for best results – especially if you have a new/young site and are building it page by page over time… and you’ll get better rankings and more traffic to your site (affiliates too!). Yes – you can be creative, and reuse and repackage content, but if I am asked to rank a page, I will always require original content on that page.

There is NO NEED to block your own Duplicate Content

There was a useful post in the Google forums a while back with advice from Google on how to handle very similar or identical content:

“We now recommend not blocking access to duplicate content on your website, whether with a robots.txt file or other methods” John Mueller

John also goes on to give some good advice about how to handle duplicate content on your own site:

Implement the rel=”canonical” link element on your pages where you can. (Note – soon we’ll be able to use the canonical tag across multiple sites/domains too.)

Use the URL parameter handling tool in Google Webmaster Tools where possible.

Webmaster guidelines on content duplication used to say:

Consider blocking pages from indexing: Rather than letting Google’s algorithms determine the “best” version of a document, you may wish to help guide us to your preferred version. For instance, if you don’t want us to index the printer versions of your site’s articles, disallow those directories or make use of regular expressions in your robots.txt file. Google
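For illustration only – this was the old, now-reversed advice – a robots.txt rule blocking hypothetical printer-version directories looked something like this (note that Google’s robots.txt matching supports `*` and `$` wildcards rather than full regular expressions):

```
User-agent: *
Disallow: /print/
Disallow: /*?print=
```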

but now Google is pretty clear they do NOT want us to block duplicate content, and that is reflected in the guidelines.

Google does not recommend blocking crawler access to duplicate content (dc) on your website, whether with a robots.txt file or other methods. If search engines can’t crawl pages with dc, they can’t automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages. A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects. In cases where DC leads to us crawling too much of your website, you can also adjust the crawl rate setting in Webmaster Tools. DC on a site is not grounds for action on that site unless it appears that the intent of the DC is to be deceptive and manipulate search engine results. If your site suffers from DC issues, and you don’t follow the advice listed above, we do a good job of choosing a version of the content to show in our search results.

Basically, you want to minimise dupe content rather than block it. Google says it really needs to detect an INTENT to manipulate Google to incur a penalty, and you should be OK if you aren’t doing this, BUT it’s easy to screw up and LOOK as if you are up to something, and it’s also easy to fail to get the benefit of proper canonicalisation and consolidation of relevant content if you don’t do basic housekeeping, for want of a better turn of phrase.
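As a sketch of the consolidation side of that housekeeping: one of the most common duplicate-content fixes is 301 redirecting the non-preferred hostname to the preferred one, so only one version of each URL gets crawled. The domain is a placeholder; on Apache, an .htaccess version might look like:

```apacheconf
# Consolidate duplicate hostnames: 301 redirect non-www to www
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```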

Advice on content spread across multiple domains:

Reporting News

Content Spread Across Multiple TLDs

Mobile SEO Advice

Canonical Link Element Best Practice

Google also recommends using the canonical link element to help minimise content duplication problems.

If your site contains multiple pages with largely identical content, there are a number of ways you can indicate your preferred URL to Google. (This is called “canonicalization”.)

Google SEO – Matt Cutts from Google shares tips on the new rel=”canonical” tag (more accurately, the canonical link element) that the three top search engines now support. Google, Yahoo!, and Microsoft have all agreed to work together in a

“joint effort to help reduce duplicate content for larger, more complex sites, and the result is the new Canonical Tag”.

You can put this link tag in the head section of the problem URLs, if you think you need it.

I add a self-referring canonical link element as standard these days – to ANY web page.
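As a sketch (the URL is hypothetical), a self-referring canonical sits in the page’s head section like this; on a duplicate page – a print version, say – the href would point at the main version instead:

```html
<head>
  <title>Blue Widgets</title>
  <!-- This page declares itself the canonical version -->
  <link rel="canonical" href="http://www.example.com/widgets/blue-widgets/">
</head>
```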

Is rel=”canonical” a hint or a directive?
It’s a hint that we honor strongly. We’ll take your preference into account, in conjunction with other signals, when calculating the most relevant page to display in search results.

Can I use a relative path to specify the canonical, such as <link rel=”canonical” href=”product.php?item=swedish-fish” />?
Yes, relative paths are recognized as expected with the <link> tag. Also, if you include a <base> link in your document, relative paths will resolve according to the base URL.
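Putting those two answers together – using Google’s own swedish-fish example with a hypothetical base URL:

```html
<head>
  <!-- With this base, the relative canonical below resolves to
       http://www.example.com/shop/product.php?item=swedish-fish -->
  <base href="http://www.example.com/shop/">
  <link rel="canonical" href="product.php?item=swedish-fish">
</head>
```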

Is it okay if the canonical is not an exact duplicate of the content?
We allow slight differences, e.g., in the sort order of a table of products. We also recognize that we may crawl the canonical and the duplicate pages at different points in time, so we may occasionally see different versions of your content. All of that is okay with us.

What if the rel=”canonical” returns a 404?
We’ll continue to index your content and use a heuristic to find a canonical, but we recommend that you specify existent URLs as canonicals.

What if the rel=”canonical” hasn’t yet been indexed?
Like all public content on the web, we strive to discover and crawl a designated canonical URL quickly. As soon as we index it, we’ll immediately reconsider the rel=”canonical” hint.

Can rel=”canonical” be a redirect?
Yes, you can specify a URL that redirects as a canonical URL. Google will then process the redirect as usual and try to index it.

What if I have contradictory rel=”canonical” designations?
Our algorithm is lenient: We can follow canonical chains, but we strongly recommend that you update links to point to a single canonical page to ensure optimal canonicalization results.

Can this link tag be used to suggest a canonical URL on a completely different domain?
Update on 12/17/2009: The answer is yes! We now support a cross-domain rel=”canonical” link element.

24 Responses

Well, I’m going to continue blocking duplicate content. First, Google may be the biggest and most important search engine to focus on, however they’re not the only ones. A while back they made a deal with Adobe to claim that Flash was now more SEO friendly. Anyone who actually used that financially motivated marketing hype as an excuse to change their anti-Flash views was a fool. And Google does not state anywhere (nor can they nor will they) that content you want kept out of the SERPs is guaranteed to be kept out by the new recommended methods. In fact, they say “In cases where duplicate content still leads to us crawling too much of your website, you can also adjust the crawl rate setting in Webmaster Tools.” Well guess what – that is the most arcane and pitiful directive I’ve seen in a long time. If I want to keep the googlebot’s grubby digital hands off of certain content, I’ll do it the intelligent, proven way and not rely on Google hacks and their guestimating algorithm…

Hi Shaun, I rewrite my articles before syndicating to article directories & other blogs, but I have to admit it’s a huge pain. What’s your view on the value of a link from a page regarded as duplicate content? So, for example, imagine I put the same article on a load of article directories and many of them end up in the “we have omitted some entries very similar…” bit of search. That’s OK (I prefer the original to rank, of course), but do the links back to my site from these pages still count? Ian

Hi, I understand your line of thought but wonder how I can write unique content on technical details of products. For example, I sell shower pumps and have added content explaining when and how to use them. These details can be found on many websites, including manufacturers’ sites. No scope for original content! Regards, George

I think that if you have a new/young site, the worst thing you can do is provide duplicate content. This practice can “kill” your SERPs. But if I create a page on which I copy another page and buy a bunch of links to that page, could I rank better than the original page?

Nice, uncomplicated post. I have one page that is ranked by Google and is number one organically for Emotional Intelligence coaching. I can see I need to put the same love and attention into the other pages to get them recognised. I downloaded your free book some time ago; I need to schedule an appointment with myself to read it. Thanks for sharing, Joseph

Just wondering, on the topic of duplicate content, whether Google treats copies of pages translated into different languages as duplicates? For example, a client designed a site in English; we’ve sorted keywords, titles, descriptions etc. He was just going to get it all translated as is into Spanish (it’s for an English painter living in Spain), including keywords, titles etc. Will this count against him?

Hi, Shaun! Wise words! But I have a question: even on HOBO, you have partial duplicate content between the main page and article pages. This happens because the beginnings of articles are also shown on the main page. The same happens on my blog. Is that a problem? I notice that Google shows results for my main page before showing article pages. If so, how can I avoid that? Thanks!

Shaun, On my website http://www.pimlico-flats.co.uk I have lots of pages where a flat is described – these pages are often identical because the flats are identical. Also I use 3 different pages (see http://www.pimlico-flats.co.uk/rent_london_flats_75_11_b.html and http://www.pimlico-flats.co.uk/rent_london_flats_75_11_a.html ) to create the picking-a-view effect. Is any of this causing a problem for me?

@Nick – Possibly. I’d consider 301ing old pages or using the canonical tag if you have a lot of dupe content pages. @Adelson – Google knows this site is a blog. It can handle that kind of dupe content no problem. @Pippa – I’m not 100% up to speed with translated duplicate content, but if you are going to the trouble of having it translated, why not rewrite it a little? @George – that’s no excuse! lol If I had your site, every product page would have unique content – even padded out by an article writer. @Ian – It’s a low-quality link at best and at worst an invitation to a keyword ranking filter, in my experience.

I am considering developing a site for a colleague who has been sending out emails based on links to articles featured elsewhere on the web. Articles will be appropriately referenced. The site aims to be a good reference source for the email base, but he would like a platform to evolve the site with respect to organic search listings. Is there a way of isolating the duplicate content from being evaluated by Google? Thanks,

Shaun – these aren’t old or redundant pages; they are pages that describe different flats – because the flats are identical, the pages are very similar. This can’t be an unusual situation; lots of companies carry similar and identical products. Great blog BTW, which I follow, but I can’t always find the “Leave a Reply” box so don’t comment as much as I’d like to. I wanted to comment yesterday about WordPress plugins but there was no reply box. Today I can’t even find the blog! I wanted to recommend LinkWithin & Zemanta as plugins.

Hi Nick – I close old posts just to keep my spam down and the focus on new discussions :) Yeah, I had a look at the pages you linked to ;) and saw that. I work on some sites with similar issues, but I try to ensure an internal navigation structure that emphasises at least one of those types of pages (even a sort of category page that would lead to those detail pages), at least to ENSURE one page has a chance to rank for the terms you are optimising for. As I say in the article, NEAR duplicate content is not always a bad thing, but you need a sensible strategy. If it’s not clear which page you ideally want to rank for a certain term, how can you possibly expect Google to pick it out? But as I say, near-dupe content can be of limited use in supplemental, or low-quality, results.

How bad are framed pages? I have a section of my homepage that is framed, and a news feed comes through it. How much text exactly should I have? I see a lot of top-ranking sites in my category with very little wording on their home page; they just have links, search boxes, pictures, etc. Thanks – these daily tips have been very helpful.

@Drew – it depends on your intent. I’ve heard of Google penalising such pages because of a LOT of content in hidden scroll-bar DIVs (like frames). I have always avoided introducing hidden text and elements to pages, as I think: if you don’t think it is important to display prominently on your page, why would Google think it’s a valuable addition to the page? I wouldn’t ever build a site with HTML FRAMES, of course. @Asif – Duplicate content ranks – it depends on how you implement it and how it is linked to. It’s not really a strategy I like to use for real sites, though.

Hey, this is my first post and I hope I’m doing it right! I worked at the Priory before starting up on my own, and I did their website. As many pages were about the same thing – depression, addictions etc – an SEO charged us a fortune just to point that out, so thanks for doing this for free!

I wonder how duplicate content checking algorithms really work, and how the search engines figure out whether words have just been replaced with their synonyms. Moreover, with duplicate content on multiple sites – is it really feasible/possible to detect it across multiple sites if the copies have been altered slightly?

Hi Guys, If I use a forwarding URL from someone like GoDaddy, which I think uses a 302 redirect, will the SEs look at the forwarded URL as a duplicate site? Ex: if I forward aaa.net to fff.com and aaa.net is just a domain, no pages or content. Thx Dave

@George… I had a client once with a clean site; he was offering products that came with manufacturer’s descriptions. The site was ranking well for many keywords, but for some reason we just couldn’t get one particular SP to rank. I ran the page through Copyscape and found that the content was flagged as duplicate. I asked the client to modify the content. He explained that because it was the manufacturer’s content, they required it to be exact. Long story short, he finally got permission to change the content and, just like that, the ban was lifted, the page was indexed and within 2 weeks it was ranking in the SERPs. I think this is a common mistake people make with shopping sites. Even if you are reselling a product, your own website content should be unique! If your manufacturer has a description, explain to them that for SEO value your content needs to be unique.

Hi, on the line-for-line note: if you have, say, one duplicate paragraph that changes only slightly on a lot of your pages, will that be better? Or will it still get penalised? Thanks, Jen Web Design Wales Graphic Design Wales

I love your post, always good to hear something that I myself have found. Young sites definitely are more affected by dup content. I get incredibly frustrated with Google regarding duplicate content. I get clients’ homepages ranked well, then because they are top ten, scrapers copy their metas and words, including the first instance of the search phrases. Since the client websites are in general new, the duplicate is enough to knock them off the ranking perch. It has even happened with a powerful page on my own SearchMasters.co.nz site – I was ranking top ten for “marketing” on Google.co.nz. Scrapers copied my content and my page dived to not in the top 1000. I made the content unique again, and the rankings came back up to top 20, now 40th. Unfortunately, Google seems to have a memory of such dup content; even after the content has been made unique, it holds a black mark against you. While you say “Don’t unnecessarily compete with these dupe pages by always rewriting your content”, I would rather be safe than sorry. And so I am often rewriting meta descriptions and first instances of search phrases on my own and client pages. Shopping cart type sites are that much easier when you have a formula-generated opening paragraph. Just change the formula, and voila, you have unique opening paras again.

I just found out that my site is being scraped constantly by someone who now has my entire site (albeit in a static version) available on a different URL. Should I be worried? I am a little, as Google doesn’t seem to be crawling my site that strongly lately, with some new pages I put up not being revisited for around 2 months now (I’ve got sitelinks, so I presume that Google considers me a decent site at least).