A couple of months ago, I wrote about a Google patent that involved rewriting queries, in a post titled Investigating Google RankBrain and Query Term Substitutions. There's likely a lot more to how Google's RankBrain approach works, but I came across a patent that seems to be related to the one I wrote about in that post, and I thought it was worth sharing and starting a discussion about. The patent I wrote about in that post was Using concepts as contexts for query term substitutions. The title of this new patent is very similar to that one (Synonym identification based on categorical contexts), and it was granted on December 1st of this year.

The new patent starts off by describing a scenario that is a good example of how it works. The inventors tell us:

For example, learning that restaurants is a good synonym for food in the query [food in San Francisco] is relatively straightforward, because the volume of query traffic including the query term San Francisco is very large. For much smaller cities, such as Grey Bull, Wyo., the query stream may have never seen any supporting evidence for this synonym substitution.

Both cities are entities that fit into the same category, Cities, which means that a synonym learned from queries about one of them could potentially be a good synonym in queries about the other. That's what the inventors of this patent tell us specifically, using the San Francisco and Grey Bull example:

For example, if San Francisco and Grey Bull are both cities, and restaurants is a good synonym for food in queries about San Francisco, the synonym relationship may apply to queries related to Grey Bull as well. Thus, the category city may be considered a useful category when identifying synonyms for query expansion in circumstances such as this.

So, the process described in this patent uses a knowledge base to identify categories of entities, where a synonym learned in the context of one entity could potentially apply, in similar contexts, to other entities in that same category. The process involves identifying those entities in a query stream, and determining whether their category is one that the patent calls a coherent category.
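To make the idea concrete, here is a minimal Python sketch, assuming a toy knowledge base that maps entities to categories and a hypothetical table of synonyms learned from high-traffic queries. The names (KNOWLEDGE_BASE, LEARNED_SYNONYMS, candidate_synonyms) are my own illustrations, not anything from the patent, and the sketch skips the coherence check, treating every shared category as usable.

```python
# Hypothetical toy knowledge base: entity -> set of categories.
KNOWLEDGE_BASE = {
    "san francisco": {"city"},
    "grey bull": {"city"},
    "golden gate bridge": {"bridge"},
}

# Hypothetical synonyms learned from high-traffic queries:
# (original term, substitute term) -> entities whose queries supported it.
LEARNED_SYNONYMS = {
    ("food", "restaurants"): {"san francisco"},
}


def categories_of(entity):
    """Look up an entity's categories; unknown entities have none."""
    return KNOWLEDGE_BASE.get(entity, set())


def candidate_synonyms(term, entity):
    """Return substitutes for `term` in a query about `entity`, including
    synonyms only ever observed with other entities in a shared category.
    Note: this toy version does NOT check whether the category is coherent."""
    candidates = set()
    entity_cats = categories_of(entity)
    for (original, substitute), supporting in LEARNED_SYNONYMS.items():
        if original != term:
            continue
        for other in supporting:
            # Accept direct evidence, or evidence borrowed from an entity
            # that shares at least one category with this one.
            if other == entity or entity_cats & categories_of(other):
                candidates.add(substitute)
                break
    return candidates
```

With this toy data, candidate_synonyms("food", "grey bull") returns {"restaurants"} even though no query about Grey Bull ever supported that substitution, while a query about the Golden Gate Bridge, which shares no category with San Francisco here, gets an empty set.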

The patent tells us that a coherent category is one in which a threshold number of terms tend to co-occur in a query stream involving its entities. For instance, a category that includes entities that are cities, villages, and towns might see many queries in which terms such as hotels and roads co-occur with those entities. If the number of co-occurring terms appearing in that query stream meets a certain threshold, the category would be considered coherent, and a synonym learned for queries about one entity in that category could then possibly be applied to queries about other entities in the same category.
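Here is one hypothetical way a coherence test like this could be sketched in Python. It is my interpretation, not the patent's actual scoring: the stopword list, the rule that a term counts as shared once it co-occurs with at least min_entities entities, and both threshold values are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical stopword list so filler words don't count as shared context.
STOPWORDS = {"in", "near", "the", "of", "for"}


def cooccurring_terms(queries, entity):
    """Non-entity, non-stopword terms from queries that mention `entity`.
    Uses naive substring matching, which is fine for a toy example."""
    entity_words = set(entity.split())
    terms = set()
    for query in queries:
        if entity in query:
            terms.update(
                w for w in query.split()
                if w not in entity_words and w not in STOPWORDS
            )
    return terms


def is_coherent(queries, entities, min_entities=2, min_shared_terms=2):
    """Hypothetical coherence test: a category is coherent if at least
    `min_shared_terms` terms each co-occur with at least `min_entities`
    of the category's entities in the query stream."""
    term_counts = Counter()
    for entity in entities:
        # Each entity contributes each co-occurring term at most once.
        term_counts.update(cooccurring_terms(queries, entity))
    shared = [t for t, n in term_counts.items() if n >= min_entities]
    return len(shared) >= min_shared_terms
```

With a query stream such as ["hotels in san francisco", "san francisco roads", "hotels in grey bull", "grey bull roads", "hotels near casper"], hotels co-occurs with all three cities and roads with two of them, so is_coherent returns True for the category ["san francisco", "grey bull", "casper"]. Entities whose queries share no terms would fail the test.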

The patent in question is:

Synonym identification based on categorical contexts

Invented by: Zachary A. Garrett, Takahiro Nakajima, Tasuku Oonishi

Assignee: Google

US Patent 9,201,945

Granted December 1, 2015

Filed: March 8, 2013

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training recognition canonical representations corresponding to named-entity phrases in a second natural language based on translating a set of allowable expressions with canonical representations from a first natural language, which may be generated by expanding a context-free grammar for the allowable expressions for the first natural language.

Takeaways

When I wrote about the query term substitution patent I referred to at the start of this post, I included a number of examples of queries that were rewritten by substituting query terms that might seem reasonable to a search engine looking at words that tended to show up, or co-occur, in a query stream involving those search terms.

For instance, someone searching for [New York Yankees stadium] was likely looking for results involving baseball, since queries that included New York Yankees and stadium also often included the term baseball.
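Co-occurrence of this kind is easy to illustrate. The following Python sketch is not the method from either patent; it is a simple, hypothetical co-occurrence count over a made-up query log, with an assumed min_ratio threshold, showing how a term like baseball could surface for [New York Yankees stadium].

```python
from collections import Counter


def cooccurring_expansions(query, query_log, min_ratio=0.5):
    """Suggest extra terms that appear in at least `min_ratio` of the logged
    queries containing every term of `query` (hypothetical threshold)."""
    query_terms = set(query.split())
    # Keep only logged queries that contain all of the original query's terms.
    matching = [set(q.split()) for q in query_log if query_terms <= set(q.split())]
    if not matching:
        return []
    counts = Counter()
    for terms in matching:
        # Count only the terms that are not already in the query.
        counts.update(terms - query_terms)
    return sorted(t for t, n in counts.items() if n / len(matching) >= min_ratio)
```

With a log containing "new york yankees stadium baseball", "new york yankees stadium tickets baseball", and "new york yankees stadium parking", baseball appears in two of the three matching queries and clears the 0.5 threshold, while tickets and parking do not.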

That patent didn't use the term co-occur, nor did it explain how a knowledge base might be used to extend substitutions across entities in the same category the way this one does, but the idea that a shared context like an entity category can be used to trigger term substitutions in a query is interesting.

It's worth spending time with both patents, reading through each of them multiple times and thinking about how they might be used.