Duplicate Content Demystified

Most website owners have heard of duplicate content and know that it can cause Google to send you to the penalty box. But just in case you don’t know what it is, duplicate content is content that appears on more than one URL on the internet. It can be on one site, two sites, or many sites. But if it appears more than once, it’s duplicate content.
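To make that definition concrete, here is a minimal Python sketch that groups URLs whose text is identical after collapsing whitespace and ignoring case. It is an illustration only. Search engines don't publish how they detect duplicates, and real systems also catch near-duplicates, which this sketch does not. The URLs and page text are made-up examples.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash page text after collapsing whitespace and lowercasing, so trivial
    formatting differences don't hide an exact copy."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages: dict) -> list:
    """Return groups of URLs whose text is identical after normalization."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    # Only content that appears on more than one URL counts as duplicate.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


if __name__ == "__main__":
    pages = {
        "https://example.com/post": "Duplicate content   explained.",
        "https://example.org/reprint": "duplicate content explained.",
        "https://example.com/other": "Something else entirely.",
    }
    print(find_duplicates(pages))
    # → [['https://example.com/post', 'https://example.org/reprint']]
```

The two example pages sit on different sites, but they still form one duplicate group, which matches the point above: it doesn't matter whether the copies live on one site or many.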

Duplicate content is not new to the publishing world. Magazines and journals often reprint stories from other publications (with acknowledgement, of course) and online magazines and blogs often print quotes from other articles. So what’s the big deal? Why does this lead to penalties?

Let’s say you run a website and you fall in love with a post found here on E3. You love it so much, in fact, that you want to rerun it on your own website. You ask us for permission, we grant it, you rerun it. There is nothing illegal or unethical about this practice. You simply found something we wrote to be extremely useful and relevant, and you want to keep your visitors on your site rather than directing them to ours.

But the trouble starts when a search engine tries to rank the content. Which page should rank better for that article? Is it fair to rank yours higher even though the article originated with us?

Mass article syndication and content scraping were duplicate content at its worst, and they’re what led Google to devalue article marketing as a viable link building practice and to create duplicate content penalties. But because search engines are essentially robots, they can’t tell that your website and E3 deserve different treatment from XYZArticleDump.com. That’s why we have to manage duplicate content ourselves.

Dealing With Article Copies and Reprints

Let’s stick with the example of your website loving an article found on E3’s blog. You want to share this article with your audience, while giving us the credit, but you don’t want visitors leaving your website to read it.

To make sure this piece of content isn’t devalued on your own site, and to shield yourself (and potentially us) from a penalty, you’ll need a canonical tag. This tag tells Google that your site isn’t the source of the content and that it originated on our page. It signals that you added the content intentionally and that you want the weight of the article to stay with the source. You do this by adding a canonical link element to the <head> of your copy of the post, with its href pointing to the original article:

<link rel="canonical" href="[URL of the original E3 article]" />
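If you want to confirm that a reprint actually carries the tag, the check can be scripted. Below is a minimal sketch using Python’s standard html.parser module that pulls the canonical URL out of a page’s HTML. The sample markup and the original.example domain are made up for illustration; in practice you would feed it the HTML of your reprinted page.

```python
from html.parser import HTMLParser
from typing import List, Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rels = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rels and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def canonical_of(page_html: str) -> Optional[str]:
    """Return the first canonical URL declared in the page, or None."""
    finder = CanonicalFinder()
    finder.feed(page_html)
    finder.close()
    return finder.canonicals[0] if finder.canonicals else None


reprint = (
    '<html><head><title>Reprint</title>'
    '<link rel="canonical" href="https://original.example/blog/post" />'
    "</head><body><p>...</p></body></html>"
)
# The reprint should point back at the source, not at itself.
assert canonical_of(reprint) == "https://original.example/blog/post"
```

Self-closing tags like <link ... /> are routed through handle_starttag by the parser’s default handle_startendtag, so both HTML and XHTML-style markup are covered.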

This doesn’t mean you can build your own blog by curating content from other sources and simply slapping canonical tags on them. You can, but you shouldn’t. Why? This tag passes all link juice to the original source, so any inbound links you earn from reprinting our post transfer their value to us. Thanks! Copy and reprint articles sparingly so that you can build your own authority.

Duplicate Home Page: The Case of the Missing WWW

As users, we take it for granted that we can type a URL without adding the “www” to the address. But what happens when the site for your company, ABC Business, can be found at both abcbusiness.com and www.abcbusiness.com?

Google can usually sort this out. But if you’ve been running a site for longer than a week, you probably know better than to place your faith entirely in the hands of Google’s algorithm. To eliminate this potential pitfall, use 301 redirects to direct the non-www version to the www version, or vice versa. These will permanently direct all versions of your site to a single destination, known as your preferred domain.
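These redirects are usually configured in your web server or hosting control panel, but the decision logic is simple enough to sketch. The Python function below assumes the www version is the preferred domain and that the site is served over HTTPS; the host names come from the ABC Business example and are hypothetical. It keeps the path and query string, so a deep link lands on the same page instead of the home page.

```python
from typing import Optional, Tuple

# Hypothetical hosts for the ABC Business example; swap in your own.
PREFERRED_HOST = "www.abcbusiness.com"
ALTERNATE_HOSTS = {"abcbusiness.com"}


def preferred_domain_redirect(
    host: str, path: str, query: str = ""
) -> Optional[Tuple[int, str]]:
    """Return (301, Location) for requests on an alternate host, else None."""
    # Host headers may carry a port ("abcbusiness.com:443") and any letter case.
    bare_host = host.split(":", 1)[0].lower()
    if bare_host not in ALTERNATE_HOSTS:
        return None  # already on the preferred domain (or not our site): serve normally
    # Preserve the path and query so every old URL maps to its www equivalent.
    location = f"https://{PREFERRED_HOST}{path or '/'}"
    if query:
        location += "?" + query
    return 301, location


print(preferred_domain_redirect("abcbusiness.com", "/about"))
print(preferred_domain_redirect("www.abcbusiness.com", "/about"))
```

The 301 status matters here: it tells search engines the move is permanent, which is what makes one version your preferred domain rather than a temporary detour.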

You might have inbound links that point to both versions. Choose the version you want future links to point to, and make sure your own internal links always use that preferred domain.
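If your internal links live in templates or a content database, a small normalization step can keep them on the preferred domain. This is a minimal Python sketch, again assuming www.abcbusiness.com (hypothetical) is preferred and the site uses HTTPS. External links and relative links are left alone, since relative links already inherit whichever host the visitor is on.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical: the hosts that belong to this site and the one we prefer.
PREFERRED_HOST = "www.abcbusiness.com"
SITE_HOSTS = {"abcbusiness.com", "www.abcbusiness.com"}


def to_preferred(url: str) -> str:
    """Rewrite an absolute link to our own site so it uses the preferred domain."""
    parts = urlsplit(url)
    # parts.hostname is None for relative links and is always lowercased.
    if parts.hostname not in SITE_HOSTS:
        return url  # external or relative link: leave untouched
    return urlunsplit(
        ("https", PREFERRED_HOST, parts.path or "/", parts.query, parts.fragment)
    )


print(to_preferred("http://abcbusiness.com/blog?page=2"))
print(to_preferred("https://example.org/elsewhere"))
```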

Ecommerce: A Duplicate Content Extravaganza

If you run an ecommerce site, you probably get product descriptions straight from your manufacturers.

…And so does every other ecommerce website that sells the same products.

The best way to avoid a penalty is to rewrite every product description so that it is original. This takes time and energy and nobody likes this answer. But original is always better.

This issue also comes into play when a product that comes in different sizes or colors has a separate URL for each version. You’ll want to combine all versions of the product onto one page and make each size or color selectable from a drop-down menu. Finally, make sure you 301 redirect all the old URLs to the new one. Sound complicated? It can be if your CMS is outdated or clunky. Consider using an SEO-friendly ecommerce platform like Shopify.
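The redirect step is mostly bookkeeping: every retired variant URL needs to map to the consolidated product page. Here is a minimal Python sketch of that mapping. It assumes a hypothetical catalog export of (old variant path, product slug) pairs and a hypothetical /products/<slug> URL pattern for the combined pages; most ecommerce platforms provide their own bulk-redirect tools, so treat this as an illustration of the logic only.

```python
from typing import Dict, Iterable, Tuple


def build_redirect_map(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """rows: (old_variant_path, product_slug). Returns old path -> consolidated page."""
    redirects: Dict[str, str] = {}
    for old_path, slug in rows:
        new_path = f"/products/{slug}"  # hypothetical URL pattern for combined pages
        if old_path != new_path:  # never redirect a page to itself
            redirects[old_path] = new_path
    return redirects


def respond(path: str, redirects: Dict[str, str]) -> Tuple[int, str]:
    """301 retired variant URLs to the consolidated page; otherwise serve as-is."""
    if path in redirects:
        return 301, redirects[path]
    return 200, path


catalog = [
    ("/products/tee-red-s", "tee"),
    ("/products/tee-blue-l", "tee"),
    ("/products/tee", "tee"),
]
redirects = build_redirect_map(catalog)
print(respond("/products/tee-red-s", redirects))
```

Skipping self-redirects keeps the consolidated page from looping if it happens to reuse one of the old URLs.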

Ecommerce sites also run into duplicate content issues with sortable product lists. If a user can sort results by price or popularity, and each choice produces its own URL, you’re going to be up against duplicate content. Why? Because the price-sorted and popularity-sorted lists contain the same products, just in a different order. Here, canonical tags are the solution: add a canonical tag to every sorted version of a category page, pointing to the main category URL, so those versions aren’t flagged as duplicates.
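One way to generate those tags is to strip the sort parameters from the current URL and use what’s left as the canonical. The Python sketch below assumes the sort order travels in query parameters named sort and order; those names are hypothetical, so check what your platform actually puts in sorted URLs. Other parameters, such as filters or page numbers, are kept, because they can change which products appear rather than only their order.

```python
import html
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of the query parameters that only change sort order.
SORT_PARAMS = {"sort", "order"}


def canonical_listing_url(url: str) -> str:
    """Drop sort-only parameters so every ordering maps to one category URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SORT_PARAMS
    ]
    # Fragments never reach the server, so they are dropped from the canonical.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url: str) -> str:
    """Build the <link> element to place in the <head> of a sorted listing."""
    return f'<link rel="canonical" href="{html.escape(canonical_listing_url(url))}" />'


print(canonical_tag("https://www.abcbusiness.com/shoes?sort=price"))
print(canonical_tag("https://www.abcbusiness.com/shoes?sort=popularity&color=red"))
```

html.escape is applied because a canonical URL with several parameters contains &, which should be written as &amp; inside an HTML attribute.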

Don’t Ignore Duplicate Content Issues

At some point in your tenure as a webmaster, especially if you run a popular blog or an ecommerce website, you’re going to run into a duplicate content issue or two. Fixing them can be time-consuming, and you may be tempted to let them go. It’s true that one ding isn’t going to get your site banned, but it’s important to educate yourself on how best to avoid potential issues. Clean coding and vigilance can also protect you from content scrapers and other online ne’er-do-wells who would hijack your quality content for their own sites.

And when in doubt, remember that original is always better.

Have you run into a duplicate content penalty? How did you solve it? Let us know in the comments.