Duplicate content and SEO

There’s a common misconception in the SEO (Search Engine Optimization) field that duplicate content is a big scary monster, and that webmasters and bloggers need to be afraid of it and make sure they never receive a dreaded “duplicate content” penalty. This is all nonsense! This article will explain the truth about SEO and duplicate content.

What is duplicate content?

This is an important question and not as simple as you may think. Many people confuse duplicate content and copied content.

Google understands that duplicate content exists everywhere. In fact, it is estimated that around 30% of the entire web is duplicated! That’s mainly because content might be duplicated on mobile versions of sites, translated versions of sites, published press releases that include article copy, old domains that didn’t get shut down, people copying and pasting articles into forum posts, and so on.

So there are plenty of ways that content gets duplicated. Duplicate content is everywhere and Google is ok with that.

There is no such thing as a duplicate content penalty

Lots of people think there is a dreaded “duplicate content penalty” they will get slapped with. That’s simply not true. Google has stated categorically that there is no such thing as a duplicate content penalty.

What Google is good at is working out which version of a piece of content is the original, and which versions are not.

For example, if an article appeared on the Huffington Post in November 2017, by an established author who had written similar articles on that topic before, and then the same article appeared on a free WordPress.com blog by an anonymous author in March 2018, Google would obviously have no difficulty telling which was the original.

If the same article appeared in an article aggregator a month after the Huffington Post version was published, Google wouldn’t have any problem telling them apart either.

There wouldn’t be a “penalty” applied; Google simply wouldn’t rank the clearly copied article. You may have noticed that sometimes, when you get to the end of the search results, Google says something like “some entries have been excluded from these results because they are similar to the ones you’ve already seen”. That’s what’s happening.

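You can even switch this off and see the duplicates, by clicking the link to repeat the search with the omitted results included. Google isn’t pushing those articles or domains down the rankings by a few places; it just isn’t bothering to show you repeated articles in its SERPs (search engine results pages).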
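Google doesn’t publish how it decides which copies to leave out, so what follows is only a rough illustration of the idea, not Google’s method. This minimal Python sketch assumes a simple rule of our own: normalise each result’s text, hash it, and where several results share identical text, show only the earliest-published copy (mirroring the Huffington Post example above). The `Result` class, the `collapse_duplicates` function and the `.example` URLs are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    """One hypothetical search result: its URL, page text and publication date."""
    url: str
    text: str
    published: date


def fingerprint(text: str) -> str:
    """Hash the text after lower-casing it and collapsing whitespace."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def collapse_duplicates(results: list[Result]) -> tuple[list[Result], list[Result]]:
    """Split results into (shown, omitted), keeping only the earliest copy of identical text."""
    earliest: dict[str, Result] = {}
    for result in results:
        key = fingerprint(result.text)
        # Keep whichever copy was published first; ties go to the earlier result in the list.
        if key not in earliest or result.published < earliest[key].published:
            earliest[key] = result
    kept = {id(r) for r in earliest.values()}
    shown = [r for r in results if id(r) in kept]
    omitted = [r for r in results if id(r) not in kept]
    return shown, omitted


results = [
    Result("https://aggregator.example/copy", "Ten tips for  better SEO", date(2017, 12, 1)),
    Result("https://news.example/original", "Ten Tips for better SEO", date(2017, 11, 1)),
    Result("https://blog.example/other", "A different article", date(2018, 3, 1)),
]
shown, omitted = collapse_duplicates(results)
print([r.url for r in shown])    # ['https://news.example/original', 'https://blog.example/other']
print([r.url for r in omitted])  # ['https://aggregator.example/copy']
```

Nothing about the aggregator copy is “penalised” here: it is simply left out of the list, which is the behaviour described above.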

Google hates copied content

What Google really does not like to see is “copied content”. Unlike cross-posting an article or including it in a content aggregator, where the words are exactly the same, copied content is an article that is very similar, but not identical, to another.

This usually happens because someone has put an article through an article “spinner”, software that rewrites existing content so that it appears substantially different. Or someone has simply copied and pasted it and then gone through it, changing some words here and there.

Google hates this and does penalise it. So it might seem strange, but Google is more lenient on identical content than on copied content. If something is a word-for-word copy, Google will just figure out which is the original and treat the other as a duplicate. If it sees two pages as similar but not identical, it will see your page, and probably your entire site, as a shoddy content photocopier.
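To make the difference between “identical” and “merely similar” concrete, here is a minimal sketch using a common textbook technique: break each text into overlapping three-word “shingles” and compare the two sets with Jaccard similarity (shared shingles divided by all distinct shingles). This is not how Google does it (Google doesn’t say), and the shingle size of 3 and the 0.5 threshold are arbitrary assumptions chosen for illustration.

```python
def normalise(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping k-word sequences ("shingles") in the text."""
    words = normalise(text).split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(original: str, candidate: str, threshold: float = 0.5) -> str:
    """Label a candidate as 'identical', 'near-duplicate' or 'distinct' relative to the original."""
    if normalise(original) == normalise(candidate):
        return "identical"
    score = jaccard(shingles(original), shingles(candidate))
    return "near-duplicate" if score >= threshold else "distinct"


original = "The quick brown fox jumps over the lazy dog near the river bank today"
spun = "The quick brown fox leaps over the lazy dog near the river bank today"
unrelated = "A completely unrelated sentence about canonical tags and redirects"

print(classify(original, original.upper()))                   # identical
print(round(jaccard(shingles(original), shingles(spun)), 2))  # 0.6
print(classify(original, spun))                               # near-duplicate
print(classify(original, unrelated))                          # distinct
```

Swapping a single word, as a crude spinner might, still leaves 9 of the 15 distinct shingles shared, so the “spun” text is easy to flag as a near-copy even though it is no longer a word-for-word match.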

The solution is, of course, simple: write your own content! It takes more work, but in the long run it will help you build trust and authority with your audience.

So hopefully you never have to worry about a copied content penalty because you’re writing your own content. But you might sometimes have duplicate content. There are some simple ways to deal with those.

Canonical can be your friend

Google is pretty smart at figuring out which is which. But if you think it might get confused, and you want to make sure it understands, you can use a Canonical tag.

Like so-called “alt tags” (which are really alt attributes), Canonical is not actually a tag. It is a value of the “rel” attribute on a “link” tag. In the head of an HTML document, you can put <link rel="canonical" href="someURL">. This basically tells Google “this isn’t the real document; the original one is over there” (at someURL).
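The sketch below shows what a correctly written Canonical link looks like in a page’s head, with plain straight quotes (curly quotes pasted in from a word processor would end up inside the attribute value), and how a program might read it back using Python’s built-in `html.parser`. The page and the `https://myblog.example/my-post` URL are made up, and real crawlers do far more than this; Google also treats the tag as a strong hint rather than a command.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> element it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel holds a space-separated list of tokens, so check each one.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "canonical" in rel_tokens and href:
            self.canonicals.append(href)


def find_canonical(html: str) -> Optional[str]:
    """Return the first canonical URL declared in the page, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals[0] if finder.canonicals else None


page = """<!DOCTYPE html>
<html>
<head>
  <title>My post (cross-posted copy)</title>
  <link rel="canonical" href="https://myblog.example/my-post">
</head>
<body><p>The same words as the original post.</p></body>
</html>"""

print(find_canonical(page))  # https://myblog.example/my-post
```

The Canonical link lives on the copy and points at the original, so a cross-posted article like the ones described below would carry a tag like this in its head.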

This can be quite handy. For example, I have a WordPress plugin that cross-posts my blog articles to Medium. They have identical content and come out at exactly the same time as my real blog posts. I’m 90% sure Google would figure out which was which, but just in case, the Medium ones have a Canonical reference in them, pointing to my blog.

And the good news is, the Medium integration plugin does this automatically when it does the cross-post! Which is very useful, since there is no way in hell Medium is going to let me edit the HTML of my articles there by hand.

Don’t forget 301 redirect

You can also set up 301 (Moved Permanently) redirects, to bounce Google to the right place. (Not sure what a redirect is or what 301 means? I wrote an article about how websites really work, that explains a lot of this stuff in plain English).

A 301 redirect is a very effective way of dealing with the duplicate version. But keep in mind that it won’t just stop Google reading the page; it will stop anyone reading it, because every visitor is sent on to the new address. That page will effectively be erased from the web. Which might be what you want, of course. And unlike deleting the page, it won’t start throwing 404 errors.

So 301s can be a pretty extreme solution (more so than Canonical link refs), but a good solution if you use them properly.
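For the curious, here is a minimal sketch of what a 301 looks like at the HTTP level, written as a tiny Python WSGI app. The `REDIRECTS` table and the `myblog.example` URLs are hypothetical; on a real site you would more likely set redirects up in your web server configuration, your hosting control panel or a WordPress plugin.

```python
from wsgiref.simple_server import make_server

# Hypothetical table: old (duplicate) paths -> the URL each should permanently move to.
REDIRECTS = {
    "/old-post": "https://myblog.example/new-post",
    "/2017/duplicate-copy": "https://myblog.example/original-post",
}


def app(environ, start_response):
    """A tiny WSGI app that 301-redirects known old paths and serves everything else."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 Moved Permanently: browsers and crawlers follow the Location header to the new URL.
        start_response("301 Moved Permanently", [
            ("Location", target),
            ("Content-Type", "text/plain; charset=utf-8"),
        ])
        return [f"Moved permanently to {target}\n".encode("utf-8")]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"A regular page\n"]


def serve(port: int = 8000) -> None:
    """Run the app on localhost until interrupted (for local experiments only)."""
    with make_server("127.0.0.1", port, app) as server:
        server.serve_forever()
```

To try it locally, call `serve()` and then run `curl -i http://127.0.0.1:8000/old-post`; the response should show the 301 status and a Location header pointing at the new URL. Note that nobody can reach the old page’s content any more, which is exactly why a 301 is the more drastic option.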

Summary

I hope you found this guide on duplicate content and SEO helpful! Do you still have any questions about it? Please leave them in the comments below!