It All Started With Caffeine

Look back a year, to when Google rolled out Caffeine, a change that was (and still is) unprecedented in search. It was this infrastructure change that enabled the dramatic algorithm improvements we’ve seen recently.

Caffeine was not an algorithm change but instead a massive improvement to the freshness of Google’s index and its ability to crawl and then index content nearly in real time.

But closely timed with these changes was the Mayday update, which specifically focused on returning quality results for long-tail queries. Ecommerce sites were impacted, as were any sites with an architecture built around item-level URLs standing on thin content and separated by several clicks from higher-authority pages (like home pages, major categories, or any URL with authority and unique content).

Then came Panda/Farmer. While Mayday appeared to hit a relatively small portion of the total query space, the latest version of Panda has a much stronger impact, hitting about 12% of all searches. As distinct from Mayday, which focused on long-tail quality and authority (penalizing shortcuts such as simply matching keywords to queries), Panda focuses on concepts such as quality, authority, trust and credibility, and also incorporates user signals.

So why does Caffeine matter so much? It seems that Caffeine, at least in part, has enabled these evolutions in the algorithm through its ability to index such a massive portion of the web. As Carrie Grimes of Google explained:

“Caffeine lets us index web pages on an enormous scale. In fact, every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles.”

To rank URLs appropriately, Google must first have them in an index (after crawling and fetching them); these are distinct processes with their own sets of algorithms. Caffeine represents a new model in search: the largest index of the web yet built, created to model the data accurately and rank pages on content and social signals as well as the PageRank signals Google has built its search engine upon.

What Does This Mean For SEO?

It has been common practice for many years to monitor overall site indexing in each of the major engines (mostly focusing on Google, naturally). Sites that weren’t being indexed deeply would need specific tactics to push that number up, and sites that were well indexed would be monitored closely to ensure that remained the case.

What’s different post-Panda is that indexing, as a metric or signal, is no longer viable, simply because Google seems to want everything it can get into its index. Inclusion in the index no longer signals anything except that Google has the URL in its databases.

We’ve seen several large sites that were impacted by Panda, and in each case indexation remained fairly flat while traffic from Google organic search plunged 50% or more.

“I asked Google for more specifics and they told me that it was a rankings change, not a crawling or indexing change, which seems to imply that sites getting less traffic still have their pages indexed, but some of those pages are no longer ranking as highly as before.”

This is precisely what we’re seeing with Panda, as well.

Recommended SEO Approach For Panda

While most of what works now has always worked, there is at least one important change.

The SEO model has changed with Panda: rather than getting as many URLs as you can indexed, you now want only your highest-quality, most important URLs indexed. Send consistent signals as to which pages are most important:

Pay special attention to number 3 above. If your properties have low-quality or significantly duplicative content, it is best to remove those URLs from the indexes. Even a site with some high-quality content and lots of thin or low-quality content could see traffic deterioration because of Panda.
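As one illustration (my own sketch, not a prescription from the article), the two most common page-level signals are a robots noindex meta tag on thin pages and a rel=canonical link on parameterized duplicates. The URLs below are hypothetical:

```html
<!-- On a thin or low-quality page you want dropped from the index -->
<meta name="robots" content="noindex, follow">

<!-- On a parameterized duplicate, pointing search engines at the preferred URL -->
<link rel="canonical" href="http://www.example.com/widgets/blue-widget">
```

The "follow" directive lets link equity continue to flow through the page even while the URL itself is kept out of the index.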

The new SEO, at least as far as Panda is concerned, is about pushing your best-quality material and completely removing low-quality or overhead pages from the indexes. That means it’s no longer easy to compete simply by producing pages at scale, unless they’re created with quality in mind. For some sites, SEO just got a whole lot harder.

About The Author

Adam Audette is SVP Organic Search at Merkle. During his 15-year career in digital marketing, Adam has held various executive management roles on both the agency and in-house sides. He founded his own agency, which was acquired by RKG (now part of Merkle Group). A well-known expert on the topics of SEO and e-commerce, Adam and his team have helped drive growth for brands including Zappos, Amazon, HSN, Walmart, Urban Outfitters, UnderArmour, Google, Intuit, Apollo Group, Symantec, Edmunds, National Geographic, Experian and many other household names.


http://www.michael-martinez.com/ Michael Martinez

I think a few people might want to know the story behind that graphic, Adam. :)

http://www.rimmkaufman.com George Michie

Great stuff as always, Adam. Question: when you say ‘remove … low-value URLs,’ what does that mean? For a retailer with tens of thousands of SKUs, does this mean nofollowing links to product pages if those pages don’t generate much organic business? Does it mean publishers should nofollow links to weak articles? I guess the question is: how much pruning are we talking about, or is the real issue focusing more love (onsite and offsite) on potential hero pages?

http://www.swansonvitamins.com Anthony D. Nelson

@George Michie - I think Adam is mainly getting at the removal of duplicate URLs that contain various parameters such as session IDs, source codes, etc. I see no value in nofollowing links to product pages if those pages contain unique content and a real product you are actually trying to sell.

I think your last sentence hits the nail on the head.

http://www.esizemore.com Everett

I have a real problem understanding why Google would say on one hand that they want to organize the world’s information, and on the other hand tell SEOs (via their aggressive algo changes) that they should work to remove content from the index. I have a site that is a database of products that have been recalled over the last five years. 99% of the content is old news, but that’s the thing about a database designed to archive something… I could noindex it but keep it searchable on the site (which is what I’ll have to do) but that doesn’t allow people to find it from Google, which is where they’ll be initiating their search.

Let’s put it this way: It would be like asking libraries to get rid of or lock up their microfiche because they’re all from old newspapers anyway and hardly anyone reads them…

http://www.audettemedia.com Adam Audette

@George — thanks for the comment. No ecommerce site (well, almost) should remove any product pages from the index. That’s not the type of “thin content” Panda is designed to de-emphasize. Instead, it’s the “made for SEO” thin-content-recipe-for-success (pre Panda):

1. Have a fairly strong authority domain (lots of ways to accomplish this, depending on goals and risk)

3. Worry less about content quality than the SEO recipe, churning out pages by the thousands. You’ll see directory lists designed to do the same thing, targeting a specific phrase (used to work great for geolocation + keyphrase type searches).

4. Build a few links into the deeper categories to push equity, but mostly just ride on the overall domain authority to get those pages to rank.

I would love to show you some specific examples. Later this month at the Summit :)

Deauville

Hi Adam

Interesting article and suggestions. They may well have an impact short term but are likely to be a temporary fix.

What we are looking at is having quality pages indexed and promoted, with possibly a significant proportion of lower-quality URLs excluded. Isn’t this falling into the trap of taking action just because search engines exist, the type of tactic they tend to rail against in the end, and surely will in this case? Site owners would effectively be asking Google to return a viable URL so they can get people to their site, where they then find the poorer pages.

Unless the entire Google staff is missing a beat, this course will have been foreseen and will be addressed in the not-too-distant future. It would not be easy, and it would again break with the past in terms of the position on unindexed pages, but there is no alternative, or updates such as Panda simply become a sham.

Doubtless ever-growing user signals will play a part, but I wouldn’t be surprised to see more direct action in the not-too-distant future. The only real answer is to have a site that offers quality throughout, rather than a veneer for indexing purposes. Once again, I’m not arguing with your analysis on “fixing” the present for those who wish to, but it seems fair to mention that this isn’t likely to serve them for long.

http://www.redmudmedia.com Ralph

I would probably have given this post a title like, “Five things we knew we should be doing, but didn’t get around to…”

Thanks for this post though, because now I have licence to tear down all the dreadful over-optimization on some of my clients’ websites AND I get to have another go at developing fresh content and organic links.

If they resist I shall simply point them in your direction for a second opinion.

http://www.audettemedia.com Adam Audette

@Michael – just saw your first comment. I’m sure they would… !

http://www.rationalfx.com Jake Holloway

I would be interested in anyone’s views about affiliate links being a significant signal in Panda. I have seen data from a site that lost around 35% of its natural Google traffic overnight on April 11th (a UK-based site). The URL class that was hit hardest has 3-5 affiliate links per page, whereas other URL classes suffered much less or not at all.

It would seem logical to me that Google could consider affiliate linking as a sign of “low-quality content.” And again, it could look at situations where Site A outranks Sites B, C and D for a keyword, but where users subsequently visit those sites (B, C, D) via affiliate links, as a sign that users ‘prefer’ the content on sites B, C and D. So Google could feel justified in ranking those sites higher as the result of this “ultimate user destination preference.” I think this is an even stronger signal if it’s only a one-page visit to the intermediary site (A).

How it looks at sites that run Adsense is another matter! :) I mean, surely (if the above is true) Google would punish Adsense the same as, say, TradeDoubler or AffiliateWindow!?

http://www.audettemedia.com Adam Audette

@Jake, you’re on to something there. Great insights. The affiliate link issue is something to chase down, and I believe it applies to ad spots as well. But rather than saying “this page has lots of aff links or ads,” I think the algo may be saying, “this page has thin content, AND the ad saturation is out of proportion” and penalizing it that way.

So it has something to do with the ratio of content to aff links/ad spaces. Thin content is bad and can hurt a site (especially en masse), but thin content with heavy ads is a recipe for disaster.
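To make that ratio idea concrete, here is a deliberately crude sketch, entirely my own illustration (Google has never published such a formula): it flags a page as thin and ad-heavy when its visible word count is low and its count of links to known affiliate domains is high. The domain list, thresholds, and function names are all hypothetical.

```python
from html.parser import HTMLParser

class PageStats(HTMLParser):
    """Counts visible words and links pointing at known affiliate domains."""

    def __init__(self, ad_domains):
        super().__init__()
        self.ad_domains = ad_domains
        self.words = 0
        self.ad_links = 0

    def handle_data(self, data):
        # Every text node contributes its whitespace-separated words.
        self.words += len(data.split())

    def handle_starttag(self, tag, attrs):
        # Count <a href="..."> links whose target matches an affiliate domain.
        if tag == "a":
            href = dict(attrs).get("href", "")
            if any(domain in href for domain in self.ad_domains):
                self.ad_links += 1

def thin_and_ad_heavy(html, ad_domains, min_words=250, max_ad_links=3):
    """Hypothetical heuristic: True only when content is thin AND ads are heavy."""
    stats = PageStats(ad_domains)
    stats.feed(html)
    return stats.words < min_words and stats.ad_links > max_ad_links
```

The point is the conjunction: thin content alone, or heavy affiliate linking alone, would not trip the check, but the combination does, which matches the "out of proportion" framing above.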

http://www.familyfriendpoems.com W.P.P.

Interesting article. Is there a story behind the screenshot? I didn’t know anyone had recovered from Panda; please do share.
A question: does Google’s new focus on site-level quality travel across subdomains? I have a large forum on a subdomain with low-quality content that has collected for four years or so. Google has never paid much attention to it before besides indexing it, has never sent any traffic to it, and it has had no effect on my main site. Do I now need to noindex anything older than, say, a year, or anything that may set off a low-quality filter for my entire site, such as terrible spelling and grammar? Also, what’s to stop scrapers from ranking for it if I am blocking Google (and Bing?) from the content?

Roy Olders

It seems that your item 5 (Build high-quality external links via social media efforts) is getting more and more important these days.

http://www.tuneyfish.com Scott Golembiewski

That’s a great point: be aware of who you’re linking to, because God, errrr, Google knows what you must be doing with all that deceptivity. And yes, I know that’s not a word.

I am very new at this, so I hope someone can clarify what is meant by #3 above.

Does this mean?
1- remove the content and deliver a 404.
2- remove the content and redirect 301 to the best content on the same topic
3- remove the content from the sitemap
4- go to Google Webmaster and enter the URL to be removed

Or something else? Thank you.

http://www.audettemedia.com Adam Audette

CH, I would test some things. Best options are probably using robots.txt or meta noindex, or even removing the content and returning a 404. Place the low-quality stuff on a new sub-domain or on an entirely new domain. Do not 301 low-quality content elsewhere.
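Adam's first two options can be sketched concretely. The paths below are hypothetical, not from the article; a robots.txt rule keeps compliant crawlers out of a whole section:

```text
# Hypothetical robots.txt for a site that has moved its
# low-quality content under /archive/ (illustrative path)
User-agent: *
Disallow: /archive/
```

Note that robots.txt and meta noindex don't combine well: if robots.txt blocks a URL, crawlers never fetch the page and so never see its noindex tag. Choose one mechanism per section and verify the result in Webmaster Tools.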

http://www.q3tech.com Chrisopher Wilson

Nice article, Adam. I want to know: if Panda is about content duplication, is it a good idea to publish the same article content on different sites?

pramodchauhan1980

Really insightful information, Adam.
Content has always been king and will remain so for a long time. With the latest Panda update, it is quite evident that smaller sites will have a great deal of work cut out for them if they want to rank anywhere close to the top. It’s not going to be that easy, and dirty black-hat tricks won’t get them anywhere.

regards,
Pramod

http://www.ianswer4u.com/ penna topology

One of my friend’s sites, which was hit by the Panda update, never recovered. He used to get 20k visitors per day, and now, no matter what he does, it never goes above 300 visitors per day. For his part, he was guilty of over-optimising and spamming.

pandavictim

That is so right. We have two sites, free computer recycling.co.uk and .com, one on Wix that was obliterated by Panda and one by a local web company, which they are trying to sort out. They were number 1 and 2, and they define our company name and what our company does; now they are in useless positions except on Bing etc.
However, as soon as we reinstate pay per click, there are no such problems with rankings, paying them £1.30 per click. As you say, paid advertising is not penalised. Completely immoral.
