A couple of months ago, I wrote about a Google patent that involved rewriting queries, in a post titled Investigating Google RankBrain and Query Term Substitutions. There’s likely a lot more to how Google’s RankBrain approach works, but I came across a patent that seems to be related to the one I wrote about in that post, and thought it was worth sharing to start a discussion. The patent I wrote about in that post was Using concepts as contexts for query term substitutions. The title of the new patent is very similar (Synonym identification based on categorical contexts), and it was granted on December 1st of this year.

The new patent starts off describing a scenario that is a good example of how it works. The inventors tell us:

For example, learning that “restaurants” is a good synonym for “food” in the query [food in San Francisco] is relatively straightforward, because the volume of query traffic including the query term “San Francisco” is very large. For much smaller cities, such as Grey Bull, Wyo., the query stream may have never seen any supporting evidence for this synonym substitution.

Because both cities are entities that fit into the same category, that of “Cities,” they could potentially be good synonyms for each other. That’s what the inventors of this patent tell us specifically, using the San Francisco and Grey Bull example:

For example, if “San Francisco” and “Grey Bull” are both cities, and “restaurants” is a good synonym for “food” in queries about San Francisco, the synonym relationship may apply to queries related to “Grey Bull” as well. Thus, the category “city” may be considered a useful category when identifying synonyms for query expansion in circumstances such as this.

So, we are told that the process involved in this patent is to identify categories from a knowledge base involving a number of entities where other entities within that same category could potentially be synonyms for each other in similar contexts. The process from the patent involves identifying those entities from a query stream, and identifying the category as one that they call a “coherent” category.
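To make the San Francisco and Grey Bull example concrete, here is a minimal sketch in Python of how a substitution learned for one entity might be carried over to another entity in the same category. This is my own illustration and not code from the patent: the tiny `KNOWLEDGE_BASE`, the `generalize_rules` and `expand_query` functions, and the rule format are all hypothetical.

```python
from collections import defaultdict

# Hypothetical knowledge base mapping lowercase entity names to a category.
KNOWLEDGE_BASE = {
    "san francisco": "city",
    "grey bull": "city",
}


def generalize_rules(entity_rules, knowledge_base):
    """Lift (term, entity) -> synonym rules to (term, category) -> synonyms."""
    category_rules = defaultdict(set)
    for (term, entity), synonym in entity_rules.items():
        category = knowledge_base.get(entity)
        if category is not None:
            category_rules[(term, category)].add(synonym)
    return category_rules


def expand_query(query, category_rules, knowledge_base):
    """Return rewritten queries for any known entity found in the query."""
    text = query.lower()
    variants = []
    for entity, category in knowledge_base.items():
        if entity not in text:
            continue
        # Terms that surround the entity are the candidates for substitution.
        context_terms = text.replace(entity, " ").split()
        for term in context_terms:
            for synonym in sorted(category_rules.get((term, category), ())):
                rewritten = " ".join(
                    synonym if token == term else token for token in text.split()
                )
                variants.append(rewritten)
    return variants


# Assumed to have been learned from high-volume San Francisco queries.
learned = {("food", "san francisco"): "restaurants"}
rules = generalize_rules(learned, KNOWLEDGE_BASE)
print(expand_query("food in Grey Bull", rules, KNOWLEDGE_BASE))
```

The point of the sketch is only that the rule is stored against the category rather than the individual city, so a query about a city with little query traffic can still pick it up.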

The patent tells us that a coherent category is one in which a certain threshold of terms tend to co-occur in a query stream involving those entities. The patent tells us, for instance, that a category that might include entities that are cities, villages, and towns might see a lot of co-occurring terms involving hotels and roads. If the number of co-occurring terms appearing in that query stream meets a certain threshold, it would be considered a coherent category, and the entities from the same category could possibly then be used as synonyms for each other.
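One way to picture that coherence test is the rough Python sketch below. It counts the terms that co-occur with enough of a category’s entities in a query stream and compares that count to a threshold. The patent does not give this code; the function names, the stopword list, and both threshold values are assumptions I made for illustration.

```python
from collections import defaultdict

# Assumed stopword list; a real system would use something more complete.
STOPWORDS = {"in", "near", "the", "of", "a"}


def shared_context_terms(queries, entities, min_entity_fraction=1.0):
    """Find terms that co-occur with at least a given fraction of the entities.

    Entity names are expected in lowercase.
    """
    term_entities = defaultdict(set)
    for query in queries:
        text = query.lower()
        for entity in entities:
            if entity in text:
                for term in text.replace(entity, " ").split():
                    if term not in STOPWORDS:
                        term_entities[term].add(entity)
    needed = min_entity_fraction * len(entities)
    return {term for term, seen in term_entities.items() if len(seen) >= needed}


def is_coherent(queries, entities, min_shared_terms=2, min_entity_fraction=1.0):
    """Treat a category as coherent when enough context terms are shared."""
    shared = shared_context_terms(queries, entities, min_entity_fraction)
    return len(shared) >= min_shared_terms


query_stream = [
    "hotels in san francisco",
    "hotels in grey bull",
    "roads near grey bull",
    "roads near san francisco",
    "san francisco weather",
]
cities = ["san francisco", "grey bull"]
print(shared_context_terms(query_stream, cities))
print(is_coherent(query_stream, cities))
```

In this toy query stream, “hotels” and “roads” show up with both cities while “weather” shows up with only one, so only the first two count toward the threshold.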

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training recognition canonical representations corresponding to named-entity phrases in a second natural language based on translating a set of allowable expressions with canonical representations from a first natural language, which may be generated by expanding a context-free grammar for the allowable expressions for the first natural language.

Take Aways

When I wrote about the query term substitution patent I refer to at the start of this post, I included a number of examples of queries that were re-written based upon some substitutions of query terms that might seem reasonable to a search engine looking at words that tended to show up, or co-occur, in a query stream involving those search terms.

For instance, someone searching for [New York Yankees stadium] was likely searching for results that involved “baseball” since queries that included “New York Yankees” and “stadium” also often included the term “baseball.”

That patent didn’t use the term “co-occur” nor did it explain how a knowledge base might be used to substitute entities that might be in the same categories like this one does, but the idea that a shared context like entity categories can be used to trigger entity substitutions in a query is interesting.

It’s worth spending time with both patents and reading through each of them multiple times and thinking about how they are being used.

Hi Bill,
I guess I read the recent blog post that you mention in this one, Investigating Google RankBrain and Query Term Substitutions.
Bill, I would like to know if Google has changed the way it searches user queries again.

I would like to encourage all of you really like your blog. Did you design this website yourself or did you hire someone to do it for you? Please reply as I’m looking to create my own blog and would like to find out where u got this from. Many thanks bookmark this page to your most used service to help get the word out.

The theme I used is one of the default WordPress themes, though I made a number of tweaks to the CSS of the theme, and searched through the Library of Congress Website for the Japanese prints that I used as rotating masthead images on the site. There are a lot of great images there that are old enough to be in the Public Domain, and I’ll probably make more changes as time passes.

A big thank you for this article. It was a little bit difficult to translate for a French guy but I think I understood and learned a lot by reading this. First in English and hopefully on my new SEO job. Hope my English was not too bad.

The best thing that I really like about your articles is that you cover each and every thing in them, which makes them more helpful.
I have seen that people love to read articles that are easy to understand and can help a lot. And you always write that kind of article.

I would also like to suggest one thing. You should try to keep your paragraphs short so that people aren’t scared off before reading them.
Short and cute paragraphs increase the reader’s interest in reading the complete article.

I hope you wouldn’t mind my suggestion.
Either way, Thanks for this wonderful article.

Good God! Nearly every day it seems I learn of more ways we are spied on! Thank you, Martin, for bringing this to my attention.

Other companies doing this besides Silverpush appear to be Adobe, Drawbridge, Flurry (purchased by Yahoo last year), and Tapad.

The only defenses against this for now seem to be, as you said, muting the microphone, or putting enough physical distance between devices so that audible signals cannot be picked up by the microphones.

Silverpush was uncovered by the FTC as being involved in this. I found some other companies doing this type of stuff other than the ones you mention, doing things like combining the audio watermarking with cookies, and using similar tracking. These activities aren’t mentioned much on most SEO sites, and I felt I had to publish something when I saw Google publish those patent filings.

Hi Bill Slawski,
It was a great experience to read your useful post.
It cleared all the doubts that I had before reading your most useful & precious post on what Google is doing for the betterment of the search engine.
Thanks!!!!

I am not sure how Google can patent the English language. They haven’t invented synonyms. This is just adding some sophistication to an otherwise rudimentary search. It will be interesting to see the weighting of an exact match vs synonym match.

This is a topic near and dear to my heart. When searching for “braces” there are two very different meanings. While orthodontists put on braces there are also neck braces and back braces. Interestingly, when you search “braces” in my city the organic search results are for metal braces for teeth while the images are neck braces.

It is interesting that Google chooses one type of braces for organic search and a different type for image search. The way you stated that has me wondering whether those results are similar in your location and in other locations. I am seeing braces for teeth in organic results and for images here in San Diego.

Hello Bill Slawski, This is my first time visiting here. I found so many useful articles on your blog, especially this discussion. From the many comments on your posts, I guess I am not the only one having all the pleasure here. Thanks