RankBrain: A Study to Measure Its Impact

Study: Does RankBrain Actually Improve Search Results?

Google made a big splash in October 2015 when they announced the existence of a new ranking algorithm that they call RankBrain. The news broke in this article in Bloomberg. Google made some very limited comments about what it is, and has had little to say since then.

For that reason, we set out to do a study to see what impact RankBrain has really had, and to try to learn more about how it works. Note that we can’t prove that everything we found was a result of RankBrain, but we believe that at least some of the changes documented below are.

Understanding Machine Learning

RankBrain is a machine learning algorithm that learns over time the different ways humans express themselves. It then pre-processes Google queries, translating the ones that are more difficult to understand into a form that the regular Google search algorithm can handle.

There has been a lot of misunderstanding about RankBrain, including a number of wildly speculative articles that assume RankBrain affects search rankings (perhaps an unfortunate consequence of its name), and/or that it will eventually take over the Google search algorithm, eliminating all other signals including links.

Google Webmaster Trends Analyst Gary Illyes had this to say on Twitter:

Lemme try one last time: Rankbrain lets us understand queries better. No affect on crawling nor indexing or replace anything in ranking

There are many other uses for machine learning, but a couple of examples can help you understand some of the types of things that machine learning can do. One such use is in Google News, as you can see here:

In the above screenshot, the part I circled in red is where they show directly related stories. Google uses an “unsupervised machine learning” algorithm to find those related stories. Basically, the algorithm is able to detect a high degree of similarity, and based on that, it knows that these other two articles are on the same topic.

The second example is one that I learned about when I interviewed Google’s Peter Norvig way back in 2011. He shared with me the story of how they built Google Translate.

Basically, they initially tried to build the product using a more manual approach, but that did not work out well. Most languages have many exceptions, which made a rules-based approach very problematic, and language is also continually evolving.

Instead, they used a machine-learning approach, which is much more dynamic and can handle much more complicated types of problems, such as translations between languages. They leveraged the millions of examples of real-world translations to build the product.

Some Very Basic Language Processing Concepts

What’s interesting about that dialogue with Peter Norvig is that it contains some insight into the problem with the way language processing has been done in the traditional Google algorithms. What it came down to is that keeping up with the rules in translating languages was just too complex. Turns out, the same is true of Google’s traditional query processing.

For example, consider stop words. As one definition puts it, “some extremely common words which would appear to be of little value in helping select documents matching a user need are excluded from the vocabulary entirely.” In other words, when Google encountered a word like “the” in a query or on a web page, they simply ignored it.

This seems like a good rule, as “the” just does not seem that important to the content of a sentence. However, consider the query “The Office”:

As you can see, this query can be meant to ask about the TV show. Historically, this is an example of something that would require a manual exception rule to address. Since the show first aired in 2005, the rule was not needed before then, but suddenly a need would have come up as soon as the series started. A more recent example would be the new app “Fixed” which was just funded in this season of Shark Tank.
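To make the stop-word problem concrete, here is a minimal sketch in Python. The stop-word list and the `strip_stop_words` helper are hypothetical and exist only for illustration; nothing here reflects how Google's query processing is actually implemented.

```python
from typing import List

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "in", "to"}

def strip_stop_words(query: str) -> List[str]:
    """Lowercase the query and drop any token found in STOP_WORDS."""
    return [tok for tok in query.lower().split() if tok not in STOP_WORDS]

# After stop-word removal, the TV show title and the generic word collapse
# into the same token list, so a naive rule can no longer tell them apart.
print(strip_stop_words("The Office"))  # ['office']
print(strip_stop_words("office"))      # ['office']
```

Fixing this in a rules-based system would mean adding an exception for this one phrase, which is exactly the kind of manual upkeep described above.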

An algorithm like RankBrain should be able to see the relationships automatically, without requiring any manual adjustment. It would be able to do that by making observations similar to these:

Sometimes the phrase is shown in the middle of a sentence as “The Office” (both words capitalized, which is not a normal use case for these words).

Sometimes the phrase is used in conjunction with words such as “TV,” “show time,” “episode.”

These are just a couple of obvious examples of patterns that could be noticed. Another interesting query to consider is “coach”:

When I first hear this word, I tend to think about a sports coach by default. However, some of the time it might mean this:

For this, a machine-learning algorithm might notice its use in the middle of a sentence as “Coach,” or its use in proximity to “bags,” “handbags,” “leather,” “women’s fashion,” etc.
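The kind of proximity signal described here can be sketched with a simple overlap count. This is a hand-written toy, not a machine-learning model and not RankBrain: the clue words in `SENSE_CLUES` and the `guess_sense` helper are assumptions made up for this example, whereas a real system would learn such associations from data.

```python
# Hypothetical context clues for two senses of "coach"; illustrative only.
SENSE_CLUES = {
    "sports coach": {"team", "players", "game", "season", "training"},
    "Coach (fashion brand)": {"bags", "handbags", "leather", "fashion"},
}

def guess_sense(text: str) -> str:
    """Pick the sense whose clue words overlap most with the words in the text.

    On a tie (including no clues at all), max() returns the first sense
    listed, so "sports coach" acts as the default reading.
    """
    tokens = {tok.strip(".,!?\"'").lower() for tok in text.split()}
    scores = {sense: len(tokens & clues) for sense, clues in SENSE_CLUES.items()}
    return max(scores, key=scores.get)

print(guess_sense("Coach leather handbags on sale"))             # Coach (fashion brand)
print(guess_sense("the coach praised the team after the game"))  # sports coach
print(guess_sense("coach"))                                      # sports coach
```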

This is where RankBrain comes into play. One of the notable quotes from the video included in the Bloomberg article was: “(Rankbrain) interprets language, interprets your queries, in a way that has some of the gut feeling and guessability of people.” In loose terms, it has a more dynamic ability to adapt to changing circumstances of how language evolves over time.

Dialogues I Had With a Google Spokesperson

Shortly after the news broke, I had a bit of a dialogue with someone inside Google. Here is what transpired:

Eric: Can you let me know if there is a near term plan to expand the use of RankBrain? I.e. in the Bloomberg article, it seems that you indicate that it’s being used in a “very large fraction.” Is the intention to increase that very large fraction in the near term?

Google Spokesperson: We don’t have much more specific to share, but we’ll keep testing new machine-learning models and approaches as we go, and when we get improvements in search quality we’ll carefully roll them out. (These sorts of signals usually aren’t restricted to a specific portion of queries; it’s more that the effects are noticeable more for some queries than others.)

Eric: The example in the Bloomberg article (the predator query) was quite interesting, as it seemed to capture the notion of a query where it’s hard to determine the intent. It’s actually hard for humans to parse that one.

There was also the whole discussion of queries that Google has never seen before.

This seems to suggest to me that RankBrain is adding capabilities in parsing natural language queries, and in particular those that are longer and more complex in formulation.

Eric: Right, parsing is probably not the right word. More like having a better understanding of the overall intricacies and relationships in language, probably based on deep learning from analysis of its use across the web?

Google Spokesperson: Yeah, being able to represent strings of text in very high-dimensional space and “see” how they relate to one another.

What is High Dimensional Space?

In principle, imagine that you analyze all the English on the entire web (note that RankBrain is already operating in all languages). You start by taking all of the known words and converting them into a numerical index. So perhaps the word “Office” is assigned the number 345,675, and the word “office” is assigned 345,674. This step is taken for ease of processing purposes.

Then you start looking at and finding out what relationships these words have with other words across the web. You might consider things like these:

Note that the above graphic is a major simplification of the level at which this happens. The types of relationships that can be determined this way can be quite complex, as they need to be able to detect scenarios such as a famous female coach, who is often addressed as “Coach” and her going to a party with a leather handbag from the company Coach, and an article about her making a fashion statement.
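As a rough sketch of the two steps described above (numbering the words, then looking at which words keep company with which), the Python below builds a case-sensitive word index from a toy corpus and compares words by their co-occurrence counts. The corpus, the `vocab` and `cooccur` structures, and the `cosine` helper are all assumptions for illustration. Plain co-occurrence counting is a stand-in here; systems like the one described would presumably use learned representations with far more dimensions and data.

```python
import math
from collections import Counter, defaultdict

# Toy corpus standing in for "all the English on the web" (an assumption).
corpus = [
    "the coach praised the team after the game",
    "the team thanked the coach after the game",
    "Coach sells leather handbags",
    "she bought leather handbags from Coach",
]

# Step 1: map each distinct word to an integer index. The mapping is
# case-sensitive, so "Coach" and "coach" get different numbers.
vocab = {}
for sentence in corpus:
    for word in sentence.split():
        vocab.setdefault(word, len(vocab))

# Step 2: for every word, count the other words seen in the same sentence.
# Each word's row of counts is its (sparse) vector.
cooccur = defaultdict(Counter)
for sentence in corpus:
    ids = [vocab[w] for w in sentence.split()]
    for i in ids:
        for j in ids:
            if i != j:
                cooccur[i][j] += 1

def cosine(a: str, b: str) -> float:
    """Cosine similarity between the co-occurrence vectors of two words."""
    va, vb = cooccur[vocab[a]], cooccur[vocab[b]]
    dot = sum(va[k] * vb[k] for k in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

print(vocab["coach"], vocab["Coach"])  # two different indices
print(cosine("coach", "handbags"))     # 0.0: no shared neighbours in this corpus
print(cosine("Coach", "leather") > cosine("coach", "leather"))  # True
```

Even in this tiny example, the capitalized “Coach” ends up close to “leather” while the lowercase “coach” ends up close to “team,” which is the flavor of relationship the spokesperson's "high-dimensional space" remark points at.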

Example RankBrain Queries Provided by Google

I have heard of two so far. One of these is from the original Bloomberg article:

Notice that I have added in purple some notes that show the way that the query might be more normally asked. Here is one that I learned from Gary Illyes in the recent Virtual Keynote that I did with him:

Gary had this to say about the query:

Our old query parsers actually ignored the ‘without’ part. RankBrain did an amazing job of catching that and instructing our retrieval systems to get the right results.

Our Study on RankBrain

Did RankBrain actually improve the quality of search results? Did it fulfill its mission to return better results for types of queries formerly difficult for Google’s search algorithm to handle?

At Stone Temple Consulting, we maintain a database of 1.4M query results, built up through the studies we have done on Google’s rich answers. As part of this, we keep a full snapshot of the results.

As luck would have it, we took a snapshot in late June/early July, just as Google began to roll out RankBrain (the “Baseline Set”). We went through the query set to determine if we could find some queries that Google didn’t understand in the Baseline Set that they appear to understand today.

After reviewing all of these queries, we found 163 queries that fit the following criteria:

The search results shown indicated that Google didn’t understand the query in the Baseline Set

There is, in fact, a reasonable set of results that Google should be able to find for the query

This latter point is an important one, as it’s not reasonable to ding Google for not understanding a query for which there is no decent result. Consider this example:

The query is not well put together by the user, so it’s hard to get a great answer for this one. In addition, we found queries where the user question was really easy to understand, but for which there is actually no great result to be found as far as we could determine. We also excluded those from the study.

The Results!

So here is what we found in aggregate:

Of the queries that Google didn’t understand in the Baseline Set, results improved 54.6% of the time. That’s a very strong score.
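For readers who want the percentage as a count, a quick back-of-the-envelope check in Python follows. The 163 queries and the 54.6% figure come from the study; the whole-number count is inferred from them and is not stated in the article.

```python
total_queries = 163     # stated in the study
improved_share = 0.546  # stated in the study (54.6%)

# Inferred, not stated in the article: the whole-number count nearest to
# 54.6% of 163 queries.
improved_count = round(total_queries * improved_share)
print(improved_count)                                  # 89
print(round(100 * improved_count / total_queries, 1))  # 54.6
```

No other whole number of queries out of 163 rounds to 54.6%, so roughly 89 improved queries is the reading consistent with the published figure.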

In the Baseline Set we got results with PDF files about why the Iraqi resistance to the coalition invasion was so weak. Clearly, not a fit. Google now has gotten the idea that “weak” probably relates to security, and shows a much better result in the number one position.

I also broke down the results into categories as follows:

This brings up some questions:

Is Google using RankBrain to impact selection of featured snippet results?

Could RankBrain trigger the delivery of a map where none was shown before?

Is it possible that the main impact of a given query would be an improved search results snippet?

These are all scenarios that we saw in the results I reviewed. My bet is that the answer to each of these questions is yes. Look back to the quote from Gary Illyes above: “… and instructing our retrieval systems to get the right results.” That sounds to me like something that would feed into any of the Google algorithms for retrieving results.

Last, but not least, let’s look at some language specifics. Here are some of the categories of items we saw Google improve on:

The improvements we saw may, or may not, have been due to RankBrain. It’s possible that other algorithm changes could have driven some of the improvements. Nonetheless, I feel comfortable saying that at least some of the changes we saw were RankBrain related.

Summary and Impact on SEO

Predictably, one of the most common questions I get asked is how RankBrain will impact SEO. Truth be told, at the moment, there is not much impact at all. RankBrain will simply do a better job of matching user queries with your web pages, so you’d arguably be less dependent on having all the words from the user query on your page.

In addition, you still need to do keyword research so that you can understand how to target a page to a major topic area (and what that major topic area is). Understanding the preferred language of most users will always make sense, whether or not search engines exist. If you haven’t already (hopefully you have!), you can increase your emphasis on using truly natural language on your web pages.

The real impacts of RankBrain are:

An increase in overall search quality.

An increase in Google’s confidence that they can use machine-learning algorithms within the core search algo, which has likely already led to more such projects being launched.

Infographic summarizing this study:


Comments

It’s true that RankBrain has improved search results. It’s amazing how far along search engines have come since the early days, when they relied almost exclusively on meta information! The level of sophistication that a searcher is able to use when making a query would have been nearly unthinkable back then. Still, when it comes to SEO, the basics of a good SEO plan still apply. In fact, it’s now even more important than ever to add a lot of useful and informative content to our websites.

I couldn’t agree more, Nick, about the importance of creating informative content. The great thing about the ever-evolving sophistication of search engines is that it makes it much easier for writers to craft interesting content without having to “worry” about keyword usage the way we did in the old days. It’s fantastic to watch the progress search engines are making to better understand semantic language.

Ty, as Eric said in the conclusion to the post, there is really nothing you can do to optimize for RankBrain, because RankBrain has nothing to do with what is on your site. It never looks at your site. RankBrain acts as a kind of translator between difficult to understand or ambiguous search queries and the regular search algorithm. RankBrain tries to figure out what someone meant by an out-of-the-ordinary query, and then translates it into a query that the regular algo can understand.

Rankbrain helps Google to better understand ambiguous search queries. Because of this improved understanding, Google can now serve up search results that better match the meaning/intent of the searcher than before Rankbrain was introduced.

Because in practice this may lead to landing pages with good relevant content being ranked higher than before you could say that Rankbrain influences rankings without it being a ranking factor itself.

As if Google’s results weren’t good enough already, here they are stepping up their game even further. I don’t think Bing even has a chance at this point. Bing returns pretty much the same results that Google does for any sort of basic query, but try giving it something complicated or vague and the difference is like night and day.

Rankbrain has evolved since its inception. However, as a blogger and IM, I am very much afraid of it. So much information is now shown in the Google search results themselves. Users don’t even have to go to the website to get certain information.

Google is stepping up its game and its domination in the search engine world.

The issue of Google answering questions directly relates to two major areas:

1. The Knowledge Graph where they directly answer questions such as: “how many quarts in a gallon”, or: “what is the capital of Massachusetts”. These are public domain facts that they are free to answer without obligations to anyone else.

2. Featured snippets, which is where they pull in data from 3rd party web sites and show it in the results. In our experience, these all provide a link with credit back to the publishing domain.

The query example “what is low in the army” is a great example of why RankBrain was needed, and why I often describe it as a natural extension to Hummingbird.

You assumed that the query might be about low rank, but I actually suspect someone heard a military acronym and wanted to know what it meant. LOW is Law of War, but they may have even misheard LO pronounced as low. RankBrain, currently, still does a very poor job with this, but it is still very young and learning. It might take it far longer to be able to think of checking for acronyms than to think of misspellings or simply unusual word order usage.

First and foremost, learn how to market and sell. It may sound obvious, but those are two vital skills to which people can devote their entire profession, and skill levels in those areas can vary hugely. You need to be at least competent in both, and the better you are in each, the better your SEO will fare. It is your version of RankBrain – better understanding the user, their motives, and what things actually persuade them to buy (or not).

Secondly, learn all the languages and protocols that your chosen platforms will be based on. So, you need to know HTML in depth, CSS, JavaScript, and quite probably PHP. You need to know the HTTP protocol in depth, and so forth. You need these because you have to spot where things the professional developers did for ease can be improved and made better.

Third, I’d honestly place something like psychology. The better you understand people, the more effectively you can understand their motives, true intentions, and even the unspoken biases and instincts that come into play and can massively affect success (Recall Bias for one example of many).

Excellent advice, as always, Ammon. Parmod, some of the things that Ammon suggests, like the technical skills, may be things that you need help with. If you can’t do these things yourself, then get help from someone who can.

Now that the dust seems to have settled about Rankbrain not being a ranking factor in itself but rather an algorithm that helps Google better understand a search query with which it subsequently can then choose the most appropriate algos in its arsenal to come up with the best matching answer I still am wondering about Rankbrain influencing rankings (sorry for the long sentence).

In your results examples are shown where Google completely did not understand the query before Rankbrain, e.g. the “why are pdfs so weak” query. Clearly two completely different sets of results were served before and after Rankbrain and it would make no sense whatsoever to suggest that Rankbrain has improved rankings for a certain landing page here.

But how about the situation where Rankbrain helps to better understand certain queries than before? So not just going from not understanding at all to understanding, but in fact going from “understanding to some extent” to “much better understanding”. In the data collected in your study, were there any examples of the same landing pages served as search engine results for the same query both before and after Rankbrain? If so, did positions for these landing pages improve or decrease?

Hi Richard, I don’t think that RankBrain is connected with personal search. My understanding is that it helps Google better understand your search queries through language analysis, in a way that’s independent of who you are.

As for personalization, that’s something that I understand that Google ties to your being logged in. However, that doesn’t mean that Google does not use your IP address. For example, they will use your IP address to show you results near you, even if you’re not logged in. Just to clarify, Google doesn’t consider localization to be the same as personalization.