Think Like a Search Engineer

Most search marketers are used to looking at things from a single perspective: that of a search marketer. Some of the more savvy marketers know enough to look at things from the perspective of end users as well, since those are the people they are ultimately trying to influence. The savviest of search marketers know that it’s also important to step back from time to time and try to think like a search engine engineer.

By stepping into the shoes of the men and women who spend their days developing ways to improve search results, we can gain a unique perspective on the world of search engine marketing. To do that, we are going to examine what search engines are trying to do, consider their goals, and look at how those goals affect their interaction with the webmaster and SEO community.

First, a disclaimer: None of this represents the official position of any search engine. This is my interpretation of what a search engineer is thinking, based on what I have seen out there, and based on my discussions with a variety of search engine engineers over the years.

I have found adopting this way of thinking to be an extremely effective technique in reviewing a Web site strategy. I don’t have to agree with the thinking of the search engine engineer, but understanding it makes me better equipped to succeed in a world where they define the rules.

So that we can adopt this mindset more completely, the rest of this article is written in the first person, as if I were the search engine engineer.

The Basic Task of a Search Engine

Our goal is to build a search engine that returns the most relevant results to searchers. To do this, we need to have a comprehensive index that is as spam-free as possible. We also need to create ranking algorithms which are able to determine the value to searchers of a given site in relation to their query.

We build our index through these 4 simple steps:

Crawl the entire Web

Analyze the content of every crawled page

Build a connectivity map for the entire Web

Process this data so that we can respond to arbitrary user queries with the best answers from the Web in less than 1 second.

OK, so I was being a bit facetious when I called these steps simple. But if you had set out on this task yourself, you would need a sense of humor too. In fact, to accomplish it we have had to build and manage the largest server farms the world has ever seen.
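To make the four steps a little more concrete, here is a toy sketch in Python. It is an illustration under heavy assumptions, not a description of how any real search engine is built: the three-page `TOY_WEB`, the `.example` URLs, and the function names `crawl`, `build_index`, and `search` are all invented for this example, and ranking by a raw inbound-link count is a deliberate oversimplification.

```python
import re
from collections import defaultdict

# Hypothetical three-page "Web": URL -> (page text, outbound links).
TOY_WEB = {
    "a.example": ("blue widgets for sale", ["b.example"]),
    "b.example": ("widget reviews and blue widget photos", ["a.example", "c.example"]),
    "c.example": ("red gadgets", ["a.example"]),
}

def crawl(seed):
    """Step 1: follow links from a seed page until no new pages turn up."""
    seen, frontier = set(), [seed]
    while frontier:
        url = frontier.pop()
        if url in seen or url not in TOY_WEB:
            continue
        seen.add(url)
        frontier.extend(TOY_WEB[url][1])
    return seen

def build_index(urls):
    """Steps 2 and 3: an inverted index (word -> pages) and an inbound-link map."""
    inverted, inbound = defaultdict(set), defaultdict(set)
    for url in urls:
        text, links = TOY_WEB[url]
        for word in re.findall(r"[a-z]+", text.lower()):
            inverted[word].add(url)
        for target in links:
            inbound[target].add(url)
    return inverted, inbound

def search(query, inverted, inbound):
    """Step 4: pages containing every query word, most inbound links first."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(inverted.get(w, set()) for w in words))
    return sorted(matches, key=lambda u: (-len(inbound.get(u, ())), u))

pages = crawl("a.example")
inverted, inbound = build_index(pages)
print(search("blue widgets", inverted, inbound))
print(search("blue", inverted, inbound))
```

Even at this scale the shape of the problem is visible: the crawl and the two maps are built ahead of time, so that answering a query is only a lookup and a sort.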

Search Engine Quality Goals

Like all businesses, we want to make money. The great majority of our money is made by selling ads within our search results and the rest of our ad network. However, we can’t make money on our ads if users don’t use our search engine to search.

So for search engines, relevant search results are king. In simple terms, if our search engine provides the best answers to users, those users will continue to come to us to search. And we’ll make money by serving them ads. So we try to gain more users by providing the best search results they can find, hoping all new users that come online will search with us and continue to use our search engine for the rest of their lives.

Of course, we want to steal market share from other search engines too, so we can add even more users, and serve even more ads. That’s a bit trickier, since users are often set in their ways. While I have never seen a study on this, there is no doubt in my mind that getting a user to switch from another search engine to ours requires more than marginally better results. I would guess that most users would not switch their default search engine unless the results were consistently better by 20 to 30 percent or more.

Making matters more complicated in our quest for providing the best results is that a large percentage of user queries require disambiguation. By that, I mean the query itself does not provide enough information for us to understand what the user is looking for. For example, when a user searches on “Ford”, they may be searching for corporate information on the Ford Motor Company, performance details of the latest Ford Mustang, the location of a local Ford dealer, or information about ex-President Gerald Ford. It is difficult for us to discern the user’s intent.

We deal with this by offering varied answers in the top 10 results, to try to give the user the answer they want within the top few results. For our Ford example above, we include in the top 10 results information on the Ford Motor Company and its vehicles, as well as on Gerald Ford.
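One simple way to picture this mixing is a round-robin over the possible interpretations of the query. The Python sketch below is only an assumption about how such mixing could be done: it takes for granted that each result already carries an interpretation label (which is the genuinely hard part), and the `diversify` function, the labels, and the `.example` URLs are all hypothetical.

```python
from collections import defaultdict, deque

def diversify(ranked, top_n=10):
    """Round-robin across interpretations, keeping each group's own order.

    `ranked` is a list of (url, interpretation) pairs, best first.
    """
    groups = defaultdict(deque)
    order = []  # interpretations in order of first appearance
    for url, intent in ranked:
        if intent not in groups:
            order.append(intent)
        groups[intent].append(url)

    results = []
    while len(results) < top_n and any(groups[i] for i in order):
        for intent in order:
            if groups[intent] and len(results) < top_n:
                results.append(groups[intent].popleft())
    return results

# Hypothetical results for the query "Ford", already ranked and labeled.
RANKED = [
    ("ford-corporate.example", "company"),
    ("ford-investors.example", "company"),
    ("mustang-specs.example", "vehicles"),
    ("ford-history.example", "company"),
    ("gerald-ford-bio.example", "president"),
    ("local-dealer.example", "dealers"),
]

print(diversify(RANKED, top_n=4))
```

With `top_n=4`, each of the four interpretations gets one slot, even though three of the six best-ranked pages are about the company.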

We also implement new programs to provide disambiguation. To see examples of this, try searching on “Cancer” in Google and notice the “Refine results for cancer” links, or try searching on “Beatles” on Ask.com, and see how they have formulated their Smart Answer with links to music, images, products, and a drop-down box listing each of the four band members.

To summarize, providing the best answers leads to increased market share. More searches mean more clicks on our ads, which means more revenue and profit. And it all flows from having the highest quality (including the best disambiguation) in our search results.

Modeling Webmaster Behavior on the Web

The best ranking algorithms we can use depend on models of Webmaster behavior on the network, where the Webmasters are not cognizant of the effect that their behavior has on the search engines. As soon as Webmasters become aware of the search engines, the model starts to break. This is the source of the famous “design sites for users, not search engines” stance that you have heard us talk about.

The Webmasters who do not follow this policy range from black hat spam artists who will try any trick to improve their rankings to those who bend the rules gently. All of this behavior makes it more difficult for us to improve our index, and to increase our market share.

Links as a Voting System

As an example of this, let’s talk a bit about how links play a role in building our index. We use inbound links as a major component of evaluating how to rank sites in response to a particular user query.

If the search is for the term “blue widgets,” then we evaluate the number and quality of relevant links that each page in the index has pointing to it. While there are over a hundred other factors, you can oversimplify this and say that the page with the best mix of relevant (to the query), quality links to it wins.
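The oversimplified version of that idea fits in a few lines of Python. This is a sketch under stated assumptions, not a real ranking algorithm: the `Link` record, the `source_quality` number, and the rule that a link is "relevant" when its anchor text shares a word with the query are all invented here for illustration.

```python
from dataclasses import dataclass

@dataclass
class Link:
    source: str
    target: str
    anchor_text: str
    source_quality: float  # hypothetical 0..1 quality score of the linking page

def link_score(page, query, links):
    """Add up the quality of inbound links whose anchor text matches the query."""
    query_words = set(query.lower().split())
    score = 0.0
    for link in links:
        if link.target != page:
            continue
        # Assumed relevance test: anchor text shares at least one query word.
        if query_words & set(link.anchor_text.lower().split()):
            score += link.source_quality
    return score

# Hypothetical link data for two competing pages.
LINKS = [
    Link("news.example", "widgets-a.example", "best blue widgets", 0.75),
    Link("blog.example", "widgets-a.example", "blue widgets guide", 0.5),
    Link("forum.example", "widgets-b.example", "blue widgets", 0.25),
    Link("travel.example", "widgets-b.example", "cheap flights", 0.125),
]

for page in ("widgets-a.example", "widgets-b.example"):
    print(page, link_score(page, "blue widgets", LINKS))
```

Each relevant link acts as a vote, and votes from better pages count for more. The link with the anchor text "cheap flights" contributes nothing to a "blue widgets" query.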

However, this concept is very fragile. It is heavily dependent on the person providing the link doing so because they really like the quality of the content that they are linking to. Fundamentally, the value of this algorithm for ranking content is based on observations about natural Webmaster behavior on the Web — natural behavior in a world without search engines.

As soon as you compensate someone for a link (with cash, or a returned link exchanged for barter reasons only), you break the model. It doesn’t mean that all these links are bad, or evil; it means that we can’t evaluate their real merit. We are slaves to this fact, and can’t change it. This leads to the stance we take against link purchasing, and the corresponding debate that we have with Webmasters who buy links.

The Role of Trust

Unfortunately, Webmasters are aware of search engines — another fact that we can’t change. This means we have had to adapt our model, to reflect the behavior in this new world where they are aware of us. Making this adjustment improves our results, but it’s inherently harder to do.

One thing we do is try to evaluate how much we can trust a site. There are many tactics that fall into this category:

Value a site based on its longevity.

Catalog sites known to sell links, and discredit all of their outbound links, since they can’t be trusted.

Don’t credit a new inbound link to a site until it is many months old.

Once a site has received links from a number of highly trusted sites, give it extra credit in its rankings.

These methods are not perfect, but they go a long way in helping us to simplify the process of determining which sites can or cannot be trusted. This is just a sampling of things we can do to evaluate trust.
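As a rough sketch of how the four tactics above could be combined, consider the Python below. Every number and name in it is an assumption made for illustration: the 180-day link age, the cap of five years on longevity, the bonus of 2 points, the two site catalogs, and the `trust_score` function itself do not come from any search engine.

```python
from datetime import date

KNOWN_LINK_SELLERS = {"paidlinks.example"}               # hypothetical catalog
HIGHLY_TRUSTED = {"university.example", "news.example"}  # hypothetical trusted set
MIN_LINK_AGE_DAYS = 180   # stand-in for "many months"; the value is a guess
MIN_TRUSTED_LINKS = 2     # guess at "a number of highly trusted sites"

def creditable_links(inbound, today):
    """Drop links from known sellers and links that are still too new.

    `inbound` is a list of (source_site, first_seen_date) pairs.
    """
    return [
        (source, first_seen)
        for source, first_seen in inbound
        if source not in KNOWN_LINK_SELLERS
        and (today - first_seen).days >= MIN_LINK_AGE_DAYS
    ]

def trust_score(site_first_seen, inbound, today):
    """Longevity (capped) + creditable links + a bonus for trusted endorsements."""
    links = creditable_links(inbound, today)
    age_years = (today - site_first_seen).days // 365
    score = min(age_years, 5) + len(links)
    trusted = sum(1 for source, _ in links if source in HIGHLY_TRUSTED)
    if trusted >= MIN_TRUSTED_LINKS:
        score += 2
    return score

TODAY = date(2024, 1, 1)
INBOUND = [
    ("university.example", date(2023, 1, 1)),   # old enough, trusted
    ("news.example", date(2023, 3, 1)),         # old enough, trusted
    ("paidlinks.example", date(2022, 1, 1)),    # known seller: discredited
    ("blog.example", date(2023, 12, 1)),        # too new to count yet
]

print(trust_score(date(2020, 1, 1), INBOUND, TODAY))
```

In this example only two of the four inbound links earn any credit, which is the point: the filters throw away signal we cannot evaluate before any scoring happens.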

The Role of FUD

All probabilistic models work best when the subjects being evaluated are not aware that they are being evaluated. However, as we have already noted, we do not have that luxury. So the next-best thing is to make it difficult for the Webmaster to understand the nature of the algorithms used. Doing this still provides a certain amount of randomness, the foundation of all probabilistic models.

This is one big reason why we don’t publish lots of clear guidelines about how our algorithms work. A little bit of FUD (fear, uncertainty and doubt) improves overall quality. For example, you will never see anything that clearly defines how we identify a paid link.

This sounds a bit nasty, but we don’t mean it to be so. Once again, we are just trying to provide the best possible search results for our end users, and this approach helps us do that.

But Don’t I Need to Design Sites for Search Engines?

Our public stance is that you should design your site for users, and not search engines. Yet there are many technical elements you really need to incorporate in your design to make it easier for us to understand what your site is about. Sounds like a contradiction, doesn’t it?

To some degree it is, so let me clarify what we really mean. It would be better stated as: “Help me find your content on your pages, but other than that, design your site for users and not for search engines.” Of course, that just doesn’t roll off the tongue like a simple hard-and-fast rule should, does it?

Here are some examples of how we rely on clues from Webmasters to index a page, and ways that you should design with us in mind:

Tell us what content on your site is important. If it’s 4 clicks from the home page of the site, how important can it be?

Give us other clues about your content, such as putting the most important words in your headers and page titles, as well as linking to relevant content on other sites.

Use page coding techniques that are easy for us to read. For example, please do not implement all your pages completely in JavaScript or Ajax.

Don’t use techniques that are (or have been) really popular with spammers. One example of this is cloaking. Once upon a time, this was a favorite black hat tactic. While there are legitimate uses for cloaking, it’s hard for us to algorithmically distinguish between legitimate and illegitimate uses, so it’s likely we’ll err on the side of caution and say it’s bad. Please do not use this technique; we can’t guarantee the results you will get.

Since we can’t tell people not to sell traffic, and hence links, please use the “nofollow” attribute on any paid links. I know this is a highly controversial idea. We want you to do this because the quality of our search results improves if you do.

Don’t duplicate your content. We want to present a particular piece of content only once in our search results.

These are just a few examples of how your behavior will help us do a better job of determining the value of your site. If you make it easier for us to index your site, we will in turn make it easier for searchers to find your site when they’re searching for a relevant query. That’s the deal.
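To show what a few of these clues look like from our side, here is a small Python sketch that reads a page the way a very naive crawler might. It uses the standard library's `html.parser.HTMLParser`; the `PageClues` class, the sample markup, and the `.example` URLs are assumptions made up for this illustration, and a real crawler looks at far more than this.

```python
from html.parser import HTMLParser

class PageClues(HTMLParser):
    """Collect the title, top-level headings, and links, split by rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.followed_links = []
        self.nofollow_links = []
        self._current = None  # tag whose text we are currently collecting

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "h1", "h2"):
            self._current = tag
        elif tag == "a" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            bucket = self.nofollow_links if "nofollow" in rel else self.followed_links
            bucket.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._current == "title":
            self.title += text
        elif self._current in ("h1", "h2"):
            self.headings.append(text)

# Hypothetical page: important words in the title and header, one editorial
# link, and one paid link marked with rel="nofollow".
SAMPLE_HTML = """
<html><head><title>Blue Widgets Guide</title></head>
<body>
<h1>Choosing Blue Widgets</h1>
<p>See the <a href="https://reference.example/widgets">widget reference</a>
or our <a rel="nofollow" href="https://sponsor.example/">sponsor</a>.</p>
</body></html>
"""

clues = PageClues()
clues.feed(SAMPLE_HTML)
print(clues.title, clues.headings)
print(clues.followed_links, clues.nofollow_links)
```

Note that this only works because the content is present in the markup itself. A page assembled entirely by script would hand this parser an empty shell.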

Summary

So now let me jump back out of our fictitious search engine engineer’s mind, and explain why this is all useful. Simply put, it’s always useful to understand the goals and aspirations of the dominant businesses in your space. Without a doubt, when search engines roll over, there are lots of casualties.

You don’t have to endorse their mindset, just understand it. You should seek to protect yourself from becoming an incidental casualty. Have an idea of how the search engines think, and use this knowledge to evaluate new search engine marketing strategies.

This is the basic premise many grey hat SEOs follow: they strive to understand how black hat SEOs work, and how far the search engines can be pushed, so they can in turn go back to their clients and provide the best service possible. This way, they are able to stay clearly within the search engines’ guidelines, without missing any opportunities, or at least to know and understand the risks and rewards involved when they decide to venture outside those guidelines.

Understanding the way search engineers think can help you decide whether or not that new idea is worth trying. Is it at odds with the goals of the search engine? Does it help the search engine understand your site better? Knowing when you are taking risks, or making the decision to avoid them, can help scale your search engine marketing strategy to new heights.