
How Did I Beat The Duplicate Content Game?

(A few updates have been added at the end of this article, and the title has been changed to reflect all the data. I highly recommend reading the comments as well.)

Yesterday I posted an article on quick link wins from Moz’s new Fresh Web Index. I happened to catch the announcement of the tool, tested it immediately, and wrote up a quick post about an hour later. People commented on Twitter, inbound.org, and my own blog about how quickly I produced the article.

Since it went up so quickly, I wondered this morning whether I had a shot at ranking for “Fresh Web Index.” After all, Google’s all about speed and QDF (query deserves freshness) now, right?

Unfortunately, my domain didn’t make the first page. But two sites that republished my article did. My post was the canonical version, and Google is supposed to figure that out, right? Especially since my page was indexed before the other two. Let’s dig deeper.

I get republished by Business 2 Community (B2C). They hand-pick posts from my feed that might suit their members. Yahoo is a publishing partner of B2C, so it republishes some of B2C’s posts in turn. If you look at the image above, both of those domains are ranking for my article. Authorship didn’t help me here (not that I expected it to), and the links back to my site didn’t clue Google in. Neither B2C nor Yahoo has a canonical tag in place, either. From the looks of it, I was beaten by sheer domain authority. Worse, my page appears to have been filtered out of the top 100 results entirely.
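If you want to confirm whether a republished copy declares a canonical URL, you can look for a `<link rel="canonical">` element in its HTML. What follows is a minimal sketch using only Python’s standard-library `html.parser`. The sample markup and the example.com URLs are hypothetical, and fetching the live pages is left out.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens; values can be None
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.canonical = attr_map["href"]


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there is none."""
    parser = CanonicalFinder()
    parser.feed(html)
    parser.close()
    return parser.canonical


def points_to_original(html: str, original_url: str) -> bool:
    """True if the page's canonical tag names original_url (ignoring a trailing slash)."""
    canonical = find_canonical(html)
    return canonical is not None and canonical.rstrip("/") == original_url.rstrip("/")


# Hypothetical example pages
original = "https://example.com/my-post/"
syndicated_copy = "<html><head><title>Copy</title></head><body>...</body></html>"
with_canonical = '<head><link rel="canonical" href="https://example.com/my-post"></head>'

print(find_canonical(syndicated_copy))               # None
print(points_to_original(with_canonical, original))  # True
```

Self-closing tags like `<link ... />` are also handled, because `HTMLParser` routes them through `handle_starttag` by default.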

To me, this is Google doing a poor job.

So it got me thinking: what else could I do to signal to Google that my original post should be shown in place of one of these republishers? I could ask B2C to remove my posts, citing duplicate content issues, but I like the visibility I get there (and on Yahoo).

The Long Shot

If you look at my single-post pages, my template actually leaves out the time stamp. It shows the date, but not the hour the post went live. Could that be the magic bullet to get Google to value my original post higher?

As of 10:20 a.m. (on day 2), I have coded the time stamp into my WordPress single-post template. Again, I think this is a long shot: since a time stamp is easily faked, would Google actually factor it in?

Now we wait to see if Google actually pays attention to the posted time. I’m also going to use “Fetch as Google” and submit the page to the index again, since some think that might work as an old-school ping. Can’t hurt.

Day 2

Success. Google decided to list me on the first page today (a fresh cache is listed for today, March 8th), right under a great post by Rhea at Outspoken Media. The Yahoo listing still exists, but the blended News listing (Business 2 Community) has dropped.

So other than adding the time stamp (my long shot), what changed?

Well, let’s check FWE to start. According to the tool, I picked up two new linking root domains (aside from the Yahoo and B2C links). One is the result right above me, the strong Outspoken Media. As much as I sing FWE’s praises, I know it can’t catch every link out there, so there may be more. Additionally, the Yahoo and B2C copies probably received links too (it’s still too soon to see them in OSE, Majestic or ahrefs).

Second, the news vertical dropped off, and it may have been the specific barrier keeping me out. While that algorithm runs differently from Google’s general search algorithm, I could see an IFTTT-style (“if this, then that”) rule at work. Perhaps Google’s rule says, “if three copies of the same post appear on a page, then kill the least authoritative.” Once the freshness of the news vertical times out, maybe my site gets its rightful spot back. This still doesn’t speak highly of Google’s internal canonicalization abilities.
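To make that speculated rule concrete, here is a toy Python sketch. It is not how Google works. The `Result` fields, the `content_id` fingerprint, the `authority` scores, the threshold of three, and the example URLs are all hypothetical stand-ins for the guess above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    url: str
    content_id: str   # hypothetical fingerprint shared by copies of the same article
    authority: float  # hypothetical authority score; higher means more authoritative


def drop_weakest_duplicate(results: List[Result], threshold: int = 3) -> List[Result]:
    """If `threshold` or more results share a content_id, drop the least authoritative copy."""
    groups = defaultdict(list)
    for r in results:
        groups[r.content_id].append(r)

    to_drop = set()
    for copies in groups.values():
        if len(copies) >= threshold:
            weakest = min(copies, key=lambda r: r.authority)
            to_drop.add(weakest.url)

    # Keep the original ranking order of everything that survives
    return [r for r in results if r.url not in to_drop]


# Hypothetical results page: three copies of one post plus an unrelated result
page = [
    Result("https://yahoo.example/post", "fwi-post", 95.0),
    Result("https://b2c.example/post", "fwi-post", 80.0),
    Result("https://myblog.example/post", "fwi-post", 40.0),
    Result("https://other.example/guide", "other", 60.0),
]
print([r.url for r in drop_weakest_duplicate(page)])
# ['https://yahoo.example/post', 'https://b2c.example/post', 'https://other.example/guide']

# Once the news listing (the B2C copy) times out, only two copies remain,
# so nothing is filtered and the original can show up again.
after_news = [r for r in page if "b2c" not in r.url]
print(len(drop_weakest_duplicate(after_news)))  # 3
```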

So What’s My Best Guess?

Correlation doesn’t equal causation, so I have to go with my gut until I can get more information. For now, I suspect the answer lies in one of the three explanations above: the time stamp, the new links, or the news vertical dropping off.

I’m publishing this post now, but I expect to come back to it as I think it through a little more. Would love to see your thoughts in the comments!

Update 3/28/2013: Well, it’s been about a month, and my page no longer ranks for the term. The Yahoo duplicate still does (on the first page as of this writing). It looks like the QDF boost and any internal canonicalization Google may do have worn off. Some of the pages now dominating are strong, unique pieces. Some are low quality.

Quite disappointing. FAIL… I’ve updated the title of this post accordingly.

At the very least, I hope this post helps someone in the same situation understand how Google is currently handling this issue. I urge you to read the comments, as more information is contained there.

Update 11/17/2013: Much time has passed. I’ve noticed that duplicate content issues seem less and less dangerous for some of my clients. In the past couple of months, I saw Google start getting it right for two clients in particular who had struggled with some of the same issues I noted above.

I remembered this post and decided to run the query again. Now the duplicate pages are completely out of the index, and my URL is the first (and only) version ranking. It came back. I’m quite pleased, actually.

It looks like Google may have gotten its act together a bit more in the recent months.

Ex-big agency guy, now focused on helping small and medium-sized businesses. I've been practicing SEO since 1998. I started the SEO practice at a major digital agency owned by eBay and helped develop SEO products for one of the largest ecommerce platforms. I'm a proud member of the Philadelphia SEO scene. I'm passionate about search, writing, UX, CRO, and psychology in marketing.

Comments 8

Keep testing! Better yet, make ‘em work for it. Contact both sites to get a legit author account set up. Tie that to your G+ account. Also, provide them with a byline that has a backlink to your site, and request that all republished posts link back to the original. They’ll usually oblige, because they know they’re building their business on your content, and if they don’t, you kill it or only give them a snippet of your feed instead of the full version! My ideal is to leave the content be and just make it work for you by building backlinks to your site.

Brent Nau

This has always been an issue for Google. It all comes down to the crawler and which site it hits first. Of course, higher-authority domains are going to be crawled more frequently than a personal site. Even with a link back to the original source, the crawler is going to store that URL and crawl it later. Google does support the cross-domain canonical tag, which you should try to negotiate with the publisher of your content to prevent this from happening.

Interesting…
I think it has more to do with authority than the time stamp, though.
I used to see something similar happen with the old Digg. If I Dugg a relatively fresh article and used the same title as the original, the Digg would rank much higher than the original article, since Digg was much more established than the site where the original was published. Within a day or even a few hours, Google would put the original on top. It is as if Google makes a quick judgement based on authority, then sorts it out later after it follows the links.
Haven’t seen that kind of thing in a while though. I would have expected the canonical to kick in immediately.

The time stamp was a long shot that got me thinking about the whole phenomenon. Something like that might have worked in the distant past, but I don’t think it was the time stamp that brought me back in either.

If your theory is right (and I tend to agree), it’s a flaw in Google’s logic, but usually not one you’d need to worry about if you’re willing to wait it out.

But it also tells me that I could exist as a content syndicator (or scraper) in today’s index if I just build huge domain authority (like Yahoo).

I tend to think it has to do with crawl priority and incremental indexing. You might not have been the first to be crawled depending on how quickly those two were syndicated.

Even if you were, the ability to apply those to the ‘main’ index might be delayed. Google’s gotten amazingly good at incremental indexation but I’m unsure if those increments apply the same rule set.

In short, during that first window it’s a bit wild and the site with the highest trust and authority may win. But once Google applies all of the rule sets (digests the canonicals, looks for attribution, does a timestamp analysis etc.) the results then change.

I can’t say this is how it really works (for obvious reasons) but in watching how things propagate and piecing other things together I think it’s something like this.

Bill Sebald

> In short, during that first window it’s a bit wild and the site with the highest trust and authority may win.

That was the big takeaway for me. It’s disappointing that Google, which talks so much about QDF, may not be great at getting this right immediately. Since this post went up, I’ve actually heard from a few others who had the same experience, so this certainly doesn’t seem to be isolated.
