How Search Engines Rank Web Pages

Once upon a time, search engines looked at words. Over the years, the movement has been toward concepts. Or, as the new mantra goes, "things, not strings."

Yes, indeed, the words on the page do come into play, but more often than not, it is more about identifying the concepts of a page or a website. This helps search engines deliver better results in an increasingly personalized world.

The goal of this article is to give you a sense of the core concepts used in modern search. This is not a guide to Google. Nor Bing. It is a starting point to better understand the landscape so that you might venture out to discover more.

Corralling the Wild, Wild Web

While this journey is more about the elements of ranking a page or a site, one really can't get to that point without the page actually being found. The two obvious elements are:

Crawling: The ability for the search engine to get around the site.

Indexing: Actually getting pages into the search engine's index.

These days most of this can be handled by automated tools provided by the major search engines. At least to let them know you exist.

What's not quite as evident is the level and depth of the crawling and indexation given to a website. This can often be attributed to various on-site and off-site factors we'll be looking at shortly.
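To get a feel for what "level and depth" of crawling means, here is a minimal sketch of a link-following crawl over a made-up site. The site map, the `crawl` function, and the `max_depth` budget are all assumptions for illustration; real crawlers are far more involved and we don't know how any engine sets its limits.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
SITE_LINKS = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/blog/post-1": ["/blog/post-2"],
    "/blog/post-2": [],
    "/orphan": ["/"],  # no page links here, so following links never finds it
}

def crawl(links, start="/", max_depth=2):
    """Breadth-first crawl from `start`; returns {url: depth} for discovered pages."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depths[page] >= max_depth:
            continue  # depth budget spent: do not follow links from this page
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

discovered = crawl(SITE_LINKS)
print(discovered)
print("never reached:", sorted(set(SITE_LINKS) - set(discovered)))
```

In this toy setup, a page that nothing links to is never discovered, and a smaller depth budget means fewer pages are reached. That is the general idea behind crawl depth, nothing more.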

Google's Matt Cutts has a nice video on this that is worth watching.

Understanding Signals

First things first: signals. It is a term commonly bandied about in the world of SEO, but often misunderstood.

A search engine can use signals for many things, including categorization, geo-localization, behavioral and demographic profiling, and more. Not just for ranking purposes. Some might be used as signals of quality (task completion) while others are used in display elements in the search results.

Where things get interesting are the various page-level signals and site-wide signals. How a search engine "views" your site on the web. In the strictest understanding of "ranking factors", these might not always be considered, but indeed are important concepts.

In simplest terms these can include:

Site-Level Signals:

Authority/Trust

Classifications

Internal link ratios

Localization

Entities

Domain history

Thin content

Page-Level Signals:

Meta data

Classifications (and Localization)

Entities

Authority/trust (external links)

Temporal signals

Semantic signals

Linguistic indicators (language and nuances)

Prominence factors (bold, headings, italics, lists, etc.)

Off-Site Signals:

Link related signals

Temporal signals

Trust elements (known by the company you keep)

Entity/Authority; citations, co-citation, etc.

Social graph signals

Spam signals (that might incur dampening)

Semantic relevance (of the other signals)

Please do bear in mind, 'link related' doesn't mean PageRank. Links can send a variety of signals, with methods like PageRank being just one.

While we'll avoid the specifics, you can get a sense that there are a variety of signals a search engine might use to understand what a site and/or a web page is about. And of course what types of elements might be used in scoring search results.
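Since PageRank came up as one example of a link-related signal, here is a rough sketch of the simplified textbook version, computed by repeated iteration over a tiny made-up web. The `pagerank` function, the `toy_web` data, and the 0.85 damping value are assumptions for illustration. This is not what any search engine runs today, and it is only one of the many ways links might be used.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over {page: [outbound links]}.

    Assumes every link target also appears as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                # A page splits its current rank evenly among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank to all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical four-page web: page "c" is linked to by three others.
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph the most linked-to page ends up on top and the page with no inbound links ends up at the bottom. The scores always sum to 1, so they are best read as relative shares, not absolute grades.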

The Land of Graphs

Next stop in our journey is to start understanding some of the various classifications, categorizations, relations, and correlations that a search engine might employ. These days we tend to think of them as "graphs".

Some of these include:

Link graph: The one best known to SEO professionals. All the links to sites that help determine relevance, authority, and trust.

Social graph: Connections, topicality, behavioural data, etc.

Entity graph: People, places, things, events, etc. (named entities).

Knowledge graph: Information related to entities.

Term and taxonomy graphs.

Where this can play into rankings is that Google categorizes the relations and can score and re-rank web pages through these graph relations. Consider things such as co-citation, topical link graphs, social associations toward authority, and much more.
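Co-citation is one of the easier graph relations to picture: two sites are co-cited when the same page links to both of them. A minimal sketch of counting it follows. The page names, site names, and the `co_citation_counts` helper are all made up, and how a search engine actually weighs such counts is not something we know.

```python
from collections import Counter
from itertools import combinations

# Hypothetical source pages and the sites each one links out to.
outbound_links = {
    "roundup-post": ["site-a", "site-b", "site-c"],
    "resource-list": ["site-a", "site-b"],
    "news-story": ["site-b", "site-c"],
    "forum-thread": ["site-a", "site-b"],
}

def co_citation_counts(outbound):
    """Count how often each pair of targets is linked from the same source page."""
    counts = Counter()
    for targets in outbound.values():
        # sorted(set(...)) removes duplicate links and gives each pair one fixed order
        for pair in combinations(sorted(set(targets)), 2):
            counts[pair] += 1
    return counts

counts = co_citation_counts(outbound_links)
for pair, count in counts.most_common():
    print(pair, count)
```

Here "site-a" and "site-b" are mentioned together most often, which is the kind of association a graph-based approach could pick up on even though the two sites never link to each other.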

Get the idea? Yes, I know it can start to get confusing. But for the purposes of this article, we're keeping it simple and moving fast here. Just fire up the desire to learn more. That's the goal here.

What I hope is emerging is that there's a whole lot going on under the hood. There's an entire article that could dig into each of the points above. Never fall for the simple answers when trying to understand what's happening to your site and your target query spaces.

Understanding Ranking Mechanisms

So now let's move along to why we're here: to get a better sense of how a search engine might rank web pages in today's environment. Again, I am trying to get you moving down the tracks toward wanting to learn more. This is the tip of the proverbial iceberg.

Simple concepts:

Scoring: The major search engines use hundreds of factors nestled into many algorithms. Think about it like an onion and its layers. All too often, people say things like "the Google algorithm" when in fact, there are many. The scoring over all of them makes up the initial rankings.

Boosting: This is another element or signal that might raise a page's position in the rankings. One example is a statement Google made that fast mobile sites are given a boost in mobile search. Various forms of personalization also use a boosting element to re-rank results.

Dampening: Not to be confused with penalties, a dampening factor is an element that would lower the rankings of a web page after the initial scoring process. Examples are the now infamous Google Penguin and Panda algorithms. While they may seem like penalties, they are in fact dampening elements.

Now we're getting somewhere, right? We consider the various on-site signals, off-site signals, and graphs (link, social, entity, knowledge). Within each of those, there are ranking and scoring mechanisms at play that are affecting where a website appears in the results.
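To make scoring, boosting, and dampening a little more tangible, here is a toy sketch. Every name, weight, and multiplier in it (the `Page` fields, the 0.6/0.4 weights, the 1.2 boost, the 0.5 dampening) is invented for illustration. Real engines use hundreds of factors and we don't know their values.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    relevance: float              # hypothetical on-page score, 0..1
    authority: float              # hypothetical link-based score, 0..1
    fast_on_mobile: bool = False
    spam_flagged: bool = False

def initial_score(page):
    """Scoring: combine signals into a base score (weights are made up)."""
    return 0.6 * page.relevance + 0.4 * page.authority

def final_score(page, mobile_search=False):
    """Apply boosting and dampening on top of the initial score."""
    score = initial_score(page)
    if mobile_search and page.fast_on_mobile:
        score *= 1.2  # boosting: raise the page after initial scoring
    if page.spam_flagged:
        score *= 0.5  # dampening: lower the page after initial scoring
    return score

def rank(pages, mobile_search=False):
    """Return pages ordered from highest to lowest final score."""
    return sorted(pages, key=lambda p: final_score(p, mobile_search), reverse=True)

pages = [
    Page("example.com/a", relevance=0.9, authority=0.8, spam_flagged=True),
    Page("example.com/b", relevance=0.7, authority=0.6, fast_on_mobile=True),
    Page("example.com/c", relevance=0.8, authority=0.6),
]

print([p.url for p in rank(pages)])
print([p.url for p in rank(pages, mobile_search=True)])
```

Page "a" has the best initial score but drops to last once dampened, and page "b" overtakes "c" only in the mobile search, where its boost applies. Same pages, same signals, different final order.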

Personalization and the World of Flux

While the path so far may seem daunting, it's about to get a little more convoluted.

Once upon a time, life was simple. We had 10 blue links. Those links were by and large stable no matter who searched or where they were at the time. No longer.

The shift to personalized search started most notably with Google back in 2004. At the time it was in Google Labs and users could identify preferences. By 2009, they unleashed it on the world for all users. That flavor used search history and behaviour to adapt the results.

Today, we have a few different types of re-rankings and results based on the end user including:

Behavioral: Based on search history, query reformulations, last query, etc.

Social: The rise of the social graph has led to logged-in personalization.

Some elements are tied to user accounts that are logged in (social, behavioral), while others can be in any state (such as geographic). We've seen instances where searching the term [hockey] produces personalized results tailored to the region, regardless of whether the user was logged in.

This means that at any given time, the results and rankings can be drastically different from one searcher to another. It is, in large part, the reason we've seen ranking reports become less important in the world of SEO.
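A small sketch shows why two people can see different rankings for the same query. The result data, the `personalize` function, and the boost sizes are all hypothetical. The only idea borrowed from the discussion above is that a geographic adjustment can apply in any login state, while a behavioral one is tied to a logged-in account.

```python
# Hypothetical base results for [hockey]: (url, base_score, region, topic)
HOCKEY_RESULTS = [
    ("nhl.example/standings", 0.8, None, "ice hockey"),
    ("leafs.example", 0.6, "toronto", "ice hockey"),
    ("bruins.example", 0.6, "boston", "ice hockey"),
    ("fieldhockey.example", 0.5, None, "field hockey"),
]

def personalize(results, region=None, history=None, logged_in=False):
    """Re-rank base results for one searcher; returns URLs in display order."""
    history = history or set()
    rescored = []
    for url, base_score, page_region, topic in results:
        score = base_score
        if region is not None and page_region == region:
            score += 0.3  # geographic boost: applies whether logged in or not
        if logged_in and topic in history:
            score += 0.2  # behavioral boost: only for logged-in searchers
        rescored.append((score, url))
    rescored.sort(key=lambda item: item[0], reverse=True)
    return [url for _, url in rescored]

print(personalize(HOCKEY_RESULTS, region="toronto"))
print(personalize(HOCKEY_RESULTS, region="boston"))
print(personalize(HOCKEY_RESULTS, history={"field hockey"}, logged_in=True))
```

Three searchers, three different top results, and none of them is "the" ranking. That is the practical problem with a single ranking report.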

The Web Spam Connection

While we might not ultimately consider it as a ranking element, web spam does bear a mention along the ride. While it may not be something that can increase your rankings, it sure can lower or demolish them. So, for the sake of this article, we might consider it as "ways a search engine un-ranks a web page".

Web spam is the term used for attempts to manipulate the search results. As with all things in information retrieval and SEO, there are a multitude of methods search engines might use against it, and associated scoring that we don't really know. But at least Google gives us this handy page to get a sense of some things that might be in play.

The point is that we should always consider what affects the ranking of a web page from a negative as well as a positive standpoint. I prefer to leverage the many factors we looked at here today rather than manipulate them. If you do this, you probably won't ever have to worry about the web spam issue.

No Pretty Bow, I'm Afraid

And so where does this all leave us? I know you might have come to this page looking for a definitive guide as to how you can go and start making bucket loads of cash ranking your pages. The reality is that it is never that simple. Indeed, SEO isn't dead; in fact, it becomes more complicated all the time.

Any one section of this article could be broken up into a post of its own. In fact, many. What we tried to do for you here today was to give a sense of what goes into the process.

There is no one way to attack SEO. Given what's involved in how modern search engines rank pages, you need to think on your feet as to how you can appeal to them and the strategy that will work for your situation.

This is your starter guide... now go forth and learn even more. Then, and only then, will you become a kickass SEO pro.

About the author

I am an avid search geek who spends most of my time reading about and playing with search engines. My main passion has always been the technical side of things, from a perspective strongly rooted in IR and related technologies.