Learning the Alphabet: RankBrain and AI

Google’s Hummingbird algorithm recently received what seems to be a natural extension of its functionality, personified almost uncomfortably literally in a machine learning system dubbed RankBrain.

The system appears to rely on thought vectors and the ability to make its own connections between entities based on huge sets of analyzed data. Quite typically for Google, all of the first-hand information on what exactly the system does is coyly whispered through an enigmatic Sphinx smile, leaving us to decrypt and connect the shards of actual facts we have been half-presented with.

How RankBrain complements Hummingbird

According to Danny Sullivan of Search Engine Land, who corresponded with Google by email on the topic, RankBrain is meant to help the algorithm better understand the intent behind people’s queries and deliver results accordingly. The confusing part is how exactly RankBrain differs from what Hummingbird has already been doing.

For instance, Hummingbird relies heavily on context and implicit signals when trying to discern the user’s intent. It also uses co-occurrence as one of the ways to determine which of the possible meanings of a word is relevant in any given query. Combined with an entity-based approach and databases of triplets, this enables the algorithm to make the necessary connections and suggest the most relevant results.
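To make the co-occurrence idea concrete, here is a minimal sketch in Python. It is not Google's implementation, which has not been published. The sense inventory, the word lists and the disambiguate function are all hypothetical, invented purely for illustration: each candidate meaning of an ambiguous word is paired with words that tend to appear alongside it, and the meaning whose list overlaps most with the rest of the query wins.

```python
# Hypothetical sense inventory (an assumption for illustration only):
# each meaning of "jaguar" is paired with words that tend to co-occur with it.
SENSE_COOCCURRENCE = {
    "jaguar (animal)": {"habitat", "prey", "jungle", "cat", "speed"},
    "jaguar (car)": {"price", "dealer", "engine", "model", "speed"},
}


def disambiguate(query, ambiguous_word, senses=SENSE_COOCCURRENCE):
    """Pick the sense whose co-occurring words overlap most with the query."""
    # Every query word except the ambiguous one counts as context.
    context = set(query.lower().split()) - {ambiguous_word}
    # Score each sense by how many context words it shares with the query.
    scores = {sense: len(context & words) for sense, words in senses.items()}
    # On a tie, max() keeps the first sense in dictionary order.
    best = max(scores, key=scores.get)
    return best, scores


best, scores = disambiguate("jaguar engine price", "jaguar")
print(best, scores)
```

In this toy setup, "engine" and "price" both point to the car, so that meaning is chosen. Note that somebody had to write those word lists by hand, which is exactly the kind of human input a system like RankBrain is meant to reduce.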

However, the problem with this approach is that it greatly relies on human input. Namely, semantic markup is provided by people, as are sets of triplets, lists of synonyms and so on. The idea behind RankBrain is to eliminate (or, more realistically, reduce) the need for human-provided input and allow the algorithm to expand its database on its own.

Given that the search engine processes three billion searches per day, 15% of which (roughly 450 million) have never been encountered before, automating the data gathering needed to “understand” these searches seems to be the only way to keep pace with the demand.

So far, RankBrain has been doing its learning offline by analyzing huge sets of historical queries, the results provided and the reactions of the people making those queries. Relying on Word2vec, it goes through the vast databases of content it has been provided with, translating words into vectors, which enables it to create connections between entities and enhance its understanding of users’ intent. Of course, thanks to Google’s traditionally cryptic and laconic nature, a lot of this is still pure speculation.
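The idea of translating words into vectors can be illustrated with a small Python sketch. To be clear about what it is not: Word2vec learns dense vectors by training a shallow neural network, whereas the code below swaps in plain co-occurrence counts as a much simpler stand-in. The corpus, the window size and the function names are all assumptions made up for the example, and nothing here reflects how RankBrain actually works.

```python
import math
from collections import Counter, defaultdict

# Hypothetical "historical queries" (an assumption for illustration only).
CORPUS = [
    "cheap flights to paris",
    "cheap flights to london",
    "cheap hotels in paris",
    "cheap hotels in london",
    "python list sort",
    "python dict sort",
]


def build_vectors(sentences, window=2):
    """Represent each word as counts of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = build_vectors(CORPUS)
print(cosine(vectors["paris"], vectors["london"]))  # words used in the same contexts
print(cosine(vectors["paris"], vectors["python"]))  # words with no shared context
```

In this toy corpus "paris" and "london" appear next to exactly the same words, so their vectors point in the same direction, while "paris" and "python" share no neighbors and score zero. The point is that the connection between the two cities is never stated anywhere: it falls out of how the words are used, which is the property that makes vector representations attractive for queries nobody has seen before.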

The impact so far

RankBrain has been slowly rolling out since early 2015 and has impacted a large percentage of queries made since then, according to Google. That’s as specific as they’re willing to get.

To make things even more confusing, Google not only avoids referring to RankBrain as an algorithm update, calling it a ranking signal instead, but also claims that it has become the third most influential search ranking signal. This statement comes directly from Greg Corrado, a senior research scientist at Google, who suggests this has happened in the few months that RankBrain has been active. Naturally, without knowing exactly what the first two signals are, this statement seems gratuitously specific. For what it’s worth, Corrado likens turning this feature off to “forgetting to serve half the pages on Wikipedia” in terms of how damaging it would be for users. You can read more about RankBrain here.