The Google Panda Guide - Part 2: Machine Learning And The New Mindset

If you have been hit heavily by the Google Panda penalization, like MasterNewMedia has, one of the hardest things to do is to understand which frame of mind to adopt before starting to fix, modify or correct possible problems.

Photo credit: Eric Isselée - iStockphoto and Ashwin KA

Rushing to prune and modify content on your web site may not really get you the results you need, unless you a) understand what is really happening and b) adopt a new attitude about the way you are going to create value for your readers, before worrying about what the search engines want.

With Google Panda looming over the heads of many international webmasters, the best advice I can share on how to prepare for it and recover from it is really not about the nitty-gritty of deleting this or avoiding that (though there is a good chunk of that to do too). It is about helping you understand how deeply different this new Panda thing is from anything you have seen before, and "how" to "think" when it comes to deciding what to do with your site.

Let's start from the basics:

Google has realized that search results have been eroded by low-quality sites, MFA (made-for-AdSense) sites, spammers and scrapers, and by those who have invested a great deal of time and resources (SEO work) in testing and exploiting Google's existing search algorithms for the benefit of their sites.

To resolve this issue, Google has decided to do three things:

1) To "filter" automatically all of the sites that do not fit its new Panda algorithm.

2) To give increasingly greater credit to indicators and data that cannot be easily gamed. For example: if you look at the quartet of variables made up of Bounce rate, Time on site, Actions on page (e.g.: scroll, print, bookmark, click on ads, no-action, etc.) and Actions after leaving the page, you can get a much better idea of whether someone has derived a benefit from visiting that page or not.

3) To devise an algorithm that builds itself by continuously learning from new data and objectives it is set to go after. This way no-one at Google can really tell what specific variables determine the success or devaluation of a site, because these will be the fruit of software calculations and not of a specific rule and input provided by human beings.

These three points represent profound changes in how Google has decided to manage low-quality content, and in how Google is going to move gradually away from traditional gameable "signals" and toward reading the rich flux of user data it has been collecting through cookies, toolbars, Google Analytics and AdSense accounts, YouTube accesses and more.
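To make the quartet of variables from point 2 a little more concrete, here is a minimal sketch of how they could be computed from a log of visits. Everything in it is an assumption made for illustration: the `Visit` fields, the idea of using "returned to the search results" as a stand-in for "actions after leaving the page", and the sample data are all invented, and nothing here describes how Google actually measures or weighs these signals.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One visit to a page. All field names are hypothetical."""
    bounced: bool               # left without viewing a second page
    seconds_on_site: float      # time spent before leaving
    actions_on_page: int        # scrolls, prints, bookmarks, ad clicks...
    returned_to_results: bool   # assumed proxy for "actions after leaving"


def summarize_visits(visits):
    """Aggregate a list of visits into the four variables from point 2."""
    if not visits:
        raise ValueError("need at least one visit")
    n = len(visits)
    return {
        "bounce_rate": sum(v.bounced for v in visits) / n,
        "avg_seconds_on_site": sum(v.seconds_on_site for v in visits) / n,
        "avg_actions_on_page": sum(v.actions_on_page for v in visits) / n,
        "return_to_results_rate": sum(v.returned_to_results for v in visits) / n,
    }


# Made-up sample data: two quick bounces and two engaged visits.
sample_visits = [
    Visit(True, 5, 0, True),
    Visit(False, 120, 3, False),
    Visit(False, 60, 1, False),
    Visit(True, 15, 0, True),
]
summary = summarize_visits(sample_visits)
```

The point of the sketch is only that these four numbers come from what visitors actually do, which is why they are much harder to game than on-page signals.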

If you are curious and courageous enough to want to find out why these changes are so revolutionary and what would be the best attitude to face them, read on:

Google Panda and The Machine Learning Algorithm

Rand Fishkin - "They are using the aggregated opinions of their quality raters, in combination with machine learning algorithms, to filter and reorder the results for a better user experience.

That's a mouthful, but essentially what it means is that Google has this huge cadre of human workers who search all the time and rate what they find.

What they want to do is find ways to show things they like and suppress things they don't like. Google has previously been reticent to do this across the board and use it as a primary signal, and have historically used this data only as a quality control check on the algorithms they write."

If you read those words again carefully, you can see how much the "search game" is changing and how much more difficult it will be for anyone to be successful by using the old classic SEO approach.

By looking under the hood at what this new machine learning algorithm is all about, you can also understand why there has been so much collateral damage and why it is taking time for Google to further fine-tune Panda.

Even Google is aware of these dangers and is likely worried about them, as Rand Fishkin further reports in this interview:

"Machine learning takes a bunch of predictive metrics and uses a neural network, or some other machine learning model, to try and come up with a best fit to the desired result.

I think one reason machine learning is slowly making its way into Google's algorithmic updates is they are uncomfortable with not knowing what is in the algorithm.

It's not as if you target specific sites like Ezinearticles and eHow, but sites that the quality raters identified as fitting into the eHow profile.

The challenge is to find metrics that will push those sites down, but keep deserving sites high. The machine learning algorithm will search across all data points it can, but it may use weird derivatives, for example, the number of times the page uses the letter x may have a super high correlation to whether people didn't like its quality so the machine learning algorithm pushes down pages that use the letter x. That's not an actual example but you get my point.

[This is why, from now on] You can no longer dig into the code and figure out which engineer coded into the algorithm that the letter x in pages means lower rankings. An engineer did not do it, the machine learning system did it. So, you know they have to be careful with how they implement it."

"I got the sense from the Wired interview and other writings that even Amit and Matt were a little nervous about how this works. I think they recognized that they hit some sites unintentionally.

The most frustrating part for them is that they don't know why the algorithm hit sites they didn't want to."

If the above is really the case, which I think it is, you now understand how major this change really is, and why I think it marks a critical departure from what Google has been doing to organize web sites since its very beginning.
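Rand Fishkin's "letter x" point can be illustrated with a toy sketch. The code below is not Google's algorithm and not a neural network: it swaps in a plain Pearson correlation ranking as the simplest possible stand-in for "search across all data points and keep whatever best predicts the raters' verdict". The pages, the feature names and all the numbers are invented for the illustration.

```python
from math import sqrt

# Hypothetical pages: made-up feature values plus a human rater verdict
# (1 = the rater disliked the page, 0 = the rater liked it).
PAGES = [
    {"word_count": 300, "ad_blocks": 3, "letter_x_count": 9, "disliked": 1},
    {"word_count": 1200, "ad_blocks": 1, "letter_x_count": 1, "disliked": 0},
    {"word_count": 450, "ad_blocks": 2, "letter_x_count": 8, "disliked": 1},
    {"word_count": 900, "ad_blocks": 3, "letter_x_count": 2, "disliked": 0},
    {"word_count": 350, "ad_blocks": 1, "letter_x_count": 7, "disliked": 1},
    {"word_count": 500, "ad_blocks": 2, "letter_x_count": 0, "disliked": 0},
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def rank_features(pages, label="disliked"):
    """Sort features by how strongly they correlate with the rater label."""
    labels = [page[label] for page in pages]
    features = [name for name in pages[0] if name != label]
    scores = {
        name: pearson([page[name] for page in pages], labels)
        for name in features
    }
    # Strongest correlation first, whether positive or negative.
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


ranking = rank_features(PAGES)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

With this invented data the meaningless `letter_x_count` feature ends up at the top of the ranking, ahead of `word_count`, simply because it happens to line up with the raters' verdicts. No engineer wrote a rule about the letter x; the data did. That is exactly why nobody can point to the line of code responsible.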

Why You Need a New Frame of Mind To Face the Panda

If what I have learned in these three months is of any use, the first thing you need to do is change the way you think about optimizing your web site and about using SEO practices.

Though the reporting from relevant web sites has gradually vanished and the main thread is now overloaded with largely useless discussions, a lot of very valuable information was published in this thread over the previous weeks and months.

In particular, I was struck by one unique poster running under the nickname of Lyrical Question, who caught my attention with her numerous answers to webmasters complaining about the effects of Google Panda on their sites.

While I do not agree with several suggestions she made elsewhere in that thread, a few of her answers contained a beautiful set of strategic tips that, I thought, offered the best mindset from which to engage a Panda defense or recovery plan.

They all pointed in one specific direction: stop optimizing your site for Google. Optimize it for your readers!

Here it is in her own words:

N.B.: The text that follows was posted by Lyrical Question inside the Google Webmaster Central Forum. I am quoting it as is, with only a few grammar corrections. In the same thread I have asked Lyrical Question for permission to republish her content but have received no reply. I hope that the good work she has done can be made accessible to many others via this article without restrictions.

"Now - about building your site for your consumer and ranking well.

Why do you think algorithms happen? Or are created?

Because SEO people learned what moved sites to the top - things that really had nothing to do with the QUALITY of the site.

Things like... Oh say ---- Link exchanges... Or like Spun Blog posts... Or random comments on blogs that were spam...

etc. etc. ad infinitum.

...

The whole CONCEPT of being at the top of the line in Google is that your site is valuable - is relevant to the search and is worthy of being there.

Otherwise - in all honestly - Casinos and Porno, mesothelioma, JC Penny and Amazon would rank in the top ten thousand spots regardless of the input search request - because they paid for FORCED cheating.

Now - Google will continuously change its parameters. To continuously stop cheaters or blackhat seo practices.

Which means... one of two things....

a) You can spend all your time and your money racking your brain to figure out the steps to beat the algorithm every time it changes...

or

b) You can build your site to entice your visitors to remain, stay and enjoy your site once they land there.

The Story of Queen SEO

"Queen SEO wants to stay young and beautiful... She relishes her beauty in the mirror daily... She wants to be the ONLY young and beautiful queen ever....

And she wants to be the only one to show up in the mirror.

Queen SEO demands that MAGIC MIRROR MAKER give her the magic scroll that will make the mirror hers - and hers alone.

And she wants the magical potion to keep her young and beautiful.

The Magic Mirror Maker cannot give only one person the mirror - for that would be ludicrous. And if he should reveal the magic potion to QUEEN SEO - then the secret to the potion would be given away - and everyone would eventually be making the potion... and alas... if EVERYONE was young and beautiful then - youth and beauty would hold no meaning.

Sadly - the Magic Mirror Maker shakes his head gently and says - "My Queen - you must simply exercise, eat well and take care of yourself - allowing your beautiful personality to shine through... So that OTHERS may see your beauty... and all will flock to you..."

Unhappy that she must do work herself - QUEEN SEO screams ---- "OFF WITH HIS HEAD!"

Strangely - in your fairytale - you're demanding Google give you the recipe for the ways to rank.

If you have them - then everyone has them. Then ranking is done based on the "RANKING REQUIREMENTS" (or mirror or potion) --- instead of the quality of the site.

....

Make your website. Build it for your consumers/ viewers/ friends. Make it the best you can be.

Google is going to change the potion - on a continuous basis.

Unless you intend to sit staring in the mirror - to constantly get feedback - and recalculate the potion, then just make your site the best site there is.

If you do that - WITHOUT trying to decipher the potion content --- which IS going to change - because people are abusing it, then in truth, you should rank fairly well.

Google has listed most of the determining factors. You're not going to get the full recipe. So screaming about it isn't going to do any good.

Conclusions

Remember that most of the advice about Panda, this included, remains very speculative for now, as there are as yet very few reports of web sites re-emerging from its penalty.

Other than this, if I were asked what to do or how best to prepare for it, what you have read above would be my sincere advice... even if Google Panda didn't exist.

Whether you like this or not, if your web site largely depends on Google for its traffic, advertising or business needs, I strongly urge you to look in depth at what Panda has done so far, and to prepare adequately for its coming arrival in your language territory.

In the next part of this Guide to Google Panda, I will share the details of what I have specifically done over the course of these three months to improve the quality of MasterNewMedia and recover from the Panda penalization.

From "thin" pages to whole blocks of thousands of news articles, Panda has caused a little-seen but deep revolution inside my content, and in the next part you will learn what steps I took to find and then change, modify or get rid of all the problematic things I found.