
The Evolution of Ranking Signals: Google Is Getting Past the Link

There's always been a fair amount of discussion that Google's reliance on links is the bane of the internet. You hear it from the insanely dynamic tin-foil-hat crowd and from those that merely muse, and it covers everything from the link economy to the spam that litters social sites and blogs. The never-ending thirst for more links has turned some people I know into freaking link machines (yeah, I'm lookin' at you – BOOYAKA).

But why? You get the impression from some folks that it is some evil master plan or sheer bumbling and incompetence. I submit to you that those people have never built a search engine of the magnitude of what Google is today. That much is a certainty. And let us not forget that Yahoo and, to a lesser extent, Microsoft have also had the same apparent link addiction. But can no one tell me WHY?

The Spam Funnel Effect

Now, at first glance, we can observe one compelling reason: it can be easier to address spam. Yup, it may seem counterintuitive that the very factor many blame for the rise of spam can in itself be a weapon, but it's true. Consider that when you have a signal that outweighs the others, it becomes a magnet to spammers. In short, you need links to rank. So do the spammers.

You (as the search engine) know exactly WHERE to look for spammers because they are invariably drawn to these points of value. If there are dozens of high-value signals, you will need to have dozens of spam bots patrolling and correlating data. If you limit the scope, you limit the places you need to look. I call it a spam funnel, and it's a smart way to do things when faced with limited technology/resources.

The link-centric approach was effective for ranking/popularity signals and lent itself to a logical spam funnel as well. Once your spam bots flag an entry, further resources can be sent to look for secondary spam signals to make a final valuation of the page/site in question (more on spam signals here).
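To make the funnel idea concrete, here is a rough two-stage sketch in Python. It is purely illustrative and not how any search engine actually does it: the `Page` fields, the "same network" ratio, the thresholds and the secondary flag names are all assumptions invented for this example. The point is only the shape: a cheap check on the one dominant signal (links) comes first, and the costlier secondary checks run only on whatever that first pass catches.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    url: str
    inbound_links: int
    # Hypothetical: inbound links traced back to a single owner/network.
    links_from_same_network: int
    # Hypothetical secondary spam signals found by deeper inspection.
    secondary_flags: List[str] = field(default_factory=list)


def funnel_stage(pages, max_same_network_ratio=0.8):
    """Stage 1: cheap check on the single dominant signal (links)."""
    flagged = []
    for page in pages:
        if page.inbound_links == 0:
            continue  # nothing to inspect on the link signal
        ratio = page.links_from_same_network / page.inbound_links
        if ratio > max_same_network_ratio:
            flagged.append(page)
    return flagged


def secondary_stage(flagged, min_flags=2):
    """Stage 2: costlier checks, run only on pages the funnel caught."""
    return [p.url for p in flagged if len(p.secondary_flags) >= min_flags]


pages = [
    Page("a.example", 100, 5),
    Page("b.example", 200, 190, ["hidden_text", "keyword_stuffing"]),
    Page("c.example", 50, 45, ["hidden_text"]),
]

flagged = funnel_stage(pages)    # b.example and c.example look suspicious
spam = secondary_stage(flagged)  # only b.example has enough secondary flags
print(spam)
```

Only the flagged subset ever reaches the second stage, which is where the resource savings would come from.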

That was then, this is now

That is a logical concept. It still doesn't fully explain WHY. For that, let us look at some of the issues that led to this point. Or at least some thoughts on potential aspects leading to the addiction.

Does not compute – right away we have to remember the speed of search evolution. With link-centric algorithms such as PageRank at the core, search engines have undergone a massive evolution tied to this approach. There simply weren't other approaches that worked as well on indexing, retrieval and spam reduction levels. Even later evolutions such as Personalized PageRank did little more than adapt more dynamic signals to the linkscape. Fact: it works.

More power Cap'n – the next consideration is that the more signals you have, the more processing you need. Not only for indexing/retrieval but, once more, for spam reduction as well. Just because they CAN come up with new signals doesn't mean implementing them all is feasible in a large-scale system such as Google.

Or at least that's part of the journey we've travelled in the search world. Each year there are new methods and technologies which can be leveraged. There is every indication that we're getting closer to a world where links might actually get some competition.

Gospel of the Geek

Along the way Google has looked at speeding things up on the processing side with elements such as Personalized PageRank, which was touted as a faster, more efficient way of calculating PageRank. This was also an early glimpse of the incorporation of user data and personalization. This is important when we bear in mind the need for better processing and more signals, while reducing spam detection requirements (which personalization can do).
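For readers who haven't seen it, below is a minimal sketch of the textbook personalized PageRank calculation using plain power iteration. It is not Google's implementation and says nothing about the speed claims above. The toy graph, the `preferences` dictionary, the damping value of 0.85 and the iteration count are all assumptions chosen for illustration. The only difference from ordinary PageRank is that the "random jump" lands on pages the user prefers instead of on any page uniformly.

```python
def personalized_pagerank(graph, preferences, damping=0.85, iterations=50):
    """Power-iteration personalized PageRank.

    graph: dict mapping each page to the list of pages it links to
           (every link target must also appear as a key).
    preferences: dict of non-negative weights for the pages the user
                 favors; used as the teleport ("random jump") vector.
    """
    nodes = list(graph)
    total = sum(preferences.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("preferences must give positive weight to some page")
    teleport = {n: preferences.get(n, 0.0) / total for n in nodes}

    rank = dict(teleport)  # start from the preference distribution
    for _ in range(iterations):
        # Every page first receives its share of the random jumps.
        new_rank = {n: (1 - damping) * teleport[n] for n in nodes}
        for node, outlinks in graph.items():
            if outlinks:
                share = damping * rank[node] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: hand its rank back via the teleport vector.
                for n in nodes:
                    new_rank[n] += damping * rank[node] * teleport[n]
        rank = new_rank
    return rank


# Toy link graph (hypothetical pages).
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

# A user whose only known preference is page "a".
rank = personalized_pagerank(graph, {"a": 1.0})
print(sorted(rank, key=rank.get, reverse=True))
```

In this toy graph page "d" ends up with no rank at all: nobody links to it and the user never jumps to it. That hints at why a personalized jump vector makes it harder for pages outside a user's trusted neighborhood to buy their way in.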

Long-time readers of the trail will know my ongoing interest in, and frustration with, the world of behavioral metrics. While they often seem intuitive to the casual observer, signals gleaned from user interactions are often noisy for search engineers (bounce rates, for example). Given individual factors, it isn't always self-evident what exactly the data may mean. From the many studies done, it seems that one would need a larger number of such signals to get any type of truly meaningful data.

Add to that the problem, once again, of policing new signals in an effort to reduce web spam, and we can start to see the problems that would require greater processing.

We've also seen more in the area of query analysis, such as QDF (query deserves freshness), real-time search efforts and other temporal data elements over the last 4-5 years. These approaches, combined with the rise of social, meant the potential was there.

It simply requires better processing abilities. Just look at the myriad of ranking signals available here.

Time to wake up

What's changed? Well, for me at least, it all began in 2006 with the Big Daddy infrastructure update at Google. This is when the real depth of behavioral elements, in the form of personalization, came into its own (supposition, peeps; only the Googlers know for sure). It continued along the trail towards link liberation, which only became more feasible with the rise of social networks/media and the extra possibilities they brought.

We're now looking at the next major infrastructure update (that we know of) in the form of Caffeine. The interesting part about this one is that it has been accompanied by an apparent need for speed. To me, this further implies that Google is looking to get the most from its systems to deal with ranking/indexing methods beyond PageRank-centric approaches.

One thing we know already is that the recently implemented "personalization for all" signals a move toward behavioral and social signals (implicit and explicit user feedback) beyond anything we've seen before.

SEO is still dead, or is it?

So, what's it all mean? There is every reason to believe that we're at the dawn of a new age where links aren't the only valuable signals. Of course, I might be wrong, but it does seem to add up. In the near future I'd say that more efforts will be made to strengthen the value of other factors, primarily:

Temporal data – Historical elements can be valuable in many ways, from indexation decisions, query analysis and retrieval to spam detection. Sure, they've been around for some time, but deeper usage seems to be important in the direction we're headed.

Behavioral data – Given the ability of personalization to curtail spam (you can't spam yourself, there's no point), we can assume that this trend will continue. The stronger infrastructure may be intended, in part, to enable a wider delivery of these features.

Social data – Google has been fond of using the social graph, and if we look at the above points, it will play nicely with them. Unlike the more spammable "real-time" data, social data is far more trustworthy. Once more, you also won't keep people in your network who are obvious spammers.

And there is actually so much more. Those are just a few of the more obvious ones. I guess the main point I am trying to make is that there is every reason to believe we're going to be seeing a shift in how Google goes about (organic) search. It seems we might just have to re-align ourselves with strategies beyond mere links, links and more links. We haven't even touched on the improvements in relevance and semantic analysis computation.

It is yet one more reason to remember that as search engines evolve, so will SEO. Predictions of the demise of SEO have consistently been wrong, and I can guarantee that, if anything, it will become more challenging.

What do YOU believe the future of search (and ranking signals) holds? Sound off in the comments!