Avoiding Misinformation While Learning from Search Related Patents

On May 1st, Google’s Head of Webspam Matt Cutts published a video in his series of Google Webmaster Help videos, answering the question, “What’s the latest SEO misconception that you would like to put to rest?”

For some reason, Matt decided to focus upon patents, with a video about people possibly placing too much faith in what is uncovered in patents related to search engines. To a degree, I agree with his response, but a number of people reached out to me because they saw the video as something aimed specifically at me, since I write about search related patents so often. I felt that I had no choice but to respond. Here’s the video from Matt:

It is also very important to note that while the version of the patent I wrote about was filed in 2010, the original version of the patent was filed in 2005. Even if Google considered implementing what is described within the patent filing, it’s quite possible that they never decided to use it, or that they tried it and replaced it with a different approach.

One thing that caught many eyes at the time was that one of the named inventors on the patent was Google’s Head of Webspam, Matt Cutts, who was well known in the community for his interactions with forum members on behalf of Google, and his participation in conferences and with the press. (Actually, the whole roster of inventors listed on the patent is like an all-star team of search engineers.)

Another was that it said things like the amount of time a domain name was registered might be an indication of whether or not it was intended to be a spam site – with spammers usually only registering a site for a year, and people more “serious” about their businesses registering their sites for longer.

Matt has rebutted that assertion more than once since the patent was published, but hosting businesses such as GoDaddy caught wind of it, and used the FUD (fear, uncertainty, and doubt) behind the patent as a selling point to try to get people to register their domains for longer than a year. Regardless of whether it was true or not, they saw the possibility of using the information within the patent as a path to more profits.

Learning From Patents

One of the main reasons why there are patents is to enable people to learn from them. As the USPTO notes about patents:

A patent is an intellectual property right granted by the Government of the United States of America to an inventor “to exclude others from making, using, offering for sale, or selling the invention throughout the United States or importing the invention into the United States” for a limited time in exchange for public disclosure of the invention* when the patent is granted.

Patents give the owners a right to prevent others from carrying out the invention (manufacturing or marketing) but not from learning from the invention.

The granting of a patent to Google can give the company the power to exclude others from following processes described in a patent, but one of the tradeoffs of that protection is that the patents must be published for the world to see, and learn from.

Google might not currently be using a process described in a patent, but they may have in the past, or might in the future. Regardless, don’t take the existence of a patent as gospel, but also don’t automatically tune patents out. They provide a chance to learn about assumptions from search engineers about search, searchers, search engines, and the Web. We can learn from them about research directions from Google and Microsoft and Bing and others, and what they found valuable enough to protect as intellectual property.

One of the first steps that I often take when learning about an acquisition made by Google or Microsoft or Yahoo or even Facebook or Twitter is to look at the USPTO assignment database, and see what patent filings might have been held by the company being acquired. Those assignments might not tell the whole story behind an acquisition, but there are often hints there. For example, see my recent post, With Wavii, Did Google Acquire the Future of Web Search?

Predicted results may even be presented before a searcher finishes typing and before they possibly select one of the predicted queries. As the patent application notes, that would make the search engine seem very responsive.

We didn’t start actually seeing instant results like that until five years later.

Google has also discontinued products before the patent behind that product was granted, such as the one for the Google Directory.

I’ve seen patent filings for things such as Google personalized results and Google Universal results before those were implemented, though usually not descriptive enough to let others know how to build those for themselves. The same with patents that provide changes to the way that something is displayed by Google. For example, how Google may sometimes assume that a query might be a request for a site search when the query includes an entity that Google has associated with a particular site – do a search for [spaceneedle hours], and you may see the first 8 results come from spaceneedle.com.

Sometimes it can be almost impossible to tell if a process described in a patent filing was or is implemented, especially if the process described in the patent is one that might impact rankings or results but doesn’t leave much of a visible footprint that it was involved in those results.

Many of Google’s patent filings involving Local Search seem to be very descriptive of how Google’s local search has developed over time as well.

Matt is right – don’t take the existence of a patent as present day proof that Google is actively doing what is described within the patent. But don’t ignore what you can learn from the patent, especially if it raises a lot of questions that you can explore and experiment with, and use to help understand the search engine better.

That patent that Matt mentioned that included a piece about the length of domain name registration discussed a large number of issues that Google might pay attention to, including how fresh pages in search results might be, what the implications might be if the anchor text to certain pages starts changing over time, and other signals involving identifying web spam and stale content online. It’s still worth exploring those topics, regardless of whether Google has implemented them.

As Matt noted, don’t take it as “a golden truth” that just because Google has a patent for something, that they are doing that in that particular period of time.

But don’t discount the value of patents to provide some insights into what the search engine might be doing, or had possibly been doing in the past, or to provide questions and ideas to explore.

It’s not such a subtle hint when Google starts filing lots of continuation and related patents on certain topics either. For instance, the Agent Rank patent has had 2 continuation patents filed on it already, which tells us that it may just have some continuing value.

The Google “information retrieval through historical data” patent has had more than a dozen continuation patents filed in its wake, and while “length of domain registration” was one of many items it originally covered, that one isn’t listed in the claims of any of the continuation patents, but other things are, and some of them more than once.

Google’s Phrase-Based indexing patent has spawned three generations of continuation patents and related patents. It might not be implemented, but there are a bunch of people who have worked on patents that show a significant amount of detail in how they would implement different aspects of it.

It’s probably worth including the names of, and some links to some of the patent filings that Matt Cutts has been involved in at Google.

I like looking at patents and whitepapers and other primary sources from search engines to help me in my practice of SEO. I’ve been writing about them for more than 5 years now, and am putting together this series of the 10 Most important SEO patents to share some of what I’ve learned during that time. These aren’t patents about SEO, but rather ones that I would recommend to anyone interested in learning more about SEO by looking at patents from sources like Google or Microsoft or Yahoo.

There are others who write a fair amount about search related patents and white papers, and I wanted to include them in this post as well.

Comments

I found your blog through huomah. David Harry talks a lot about, and respects, your thoughts and discoveries about patents. It’s been 4 years now of reading your blog, visiting your archives, and re-reading the blog posts.

I pretty much understand how search engines work behind the scenes.

One thing I don’t understand is the language of patents. If you have time, please write a post on how to read a patent. Or if there is any resource for understanding the language, please let me know…

Though the patents themselves may or may not be implemented in the search engine and may only be registered for legal reasons, they’re still a very valuable part of your knowledge as an SEO.

I’m studying Computer Science myself and I always try to think about how the search engine will identify, know and react to what I’m doing, and also how they will be able to distinguish and evaluate the different Black Hat tactics that are going around all the time.

The ability to see the patents and how they evolve over time really does give an idea of what they are capable of doing now, and where they are trying to go. It helps you spot coming trends in the field and can re-confirm that what you do really is a part of the best practices for SEO.

And as final words, thank you Bill! Keep up the good work, we all enjoy it 🙂

Truth is we will never know if a patent is used – their overall strategy is to get as many patents as possible under legal protection, so studying it can be useful only if the technology is applied to the user’s experience

I also totally thought of you when watching the video.
Exploring patents is most definitely a must. However, you know how urban legends rise easily in SEO (LSA anyone ? 😀 ).
It’s too easy to say “look Bill showed a patent; therefore, it is used by Google”.
Absorbing the information totally helps in understanding how a search engine works, but the majority of people certainly don’t have the background to interpret it correctly. Most people will only scroll through your post, and will only take in the information they “like”.

I’ve learned a lot about Google even from the patents that they’re unlikely to ever use. It’s not necessary to read patents to try to get an idea of how something they’ve implemented might work, if it provides examples of different approaches and methods Google might use for a lot of patents. I’ve also learned a great number of positive ideas from things like Google’s many Phrase-based indexing patents that benefit the rankings of sites and the quality of pages, and I’m still not sure that Google is using phrase-based indexing.

Thank you very much. I learn best when I’m trying to share with others, and trying to put something into language that others can understand.

The business analysis aspect of paying attention to patents shouldn’t be underestimated. The approaches used within patents to address issues like privacy, or security against people who might attempt to abuse a method, or the ways that certain approaches might alter how things are displayed, and much more can be seen in patents – even those that might not be implemented. I love unveiling the assumptions behind patents, the history behind where they might fit, the philosophies about how the Web and search should work. That evolution is pretty interesting, too.

I’ve seen a lot of blog posts and comments and forum threads where people take ideas from patents and come up with ideas that place a little too much faith in those patents as described instead of possibly how they might be implemented, if they ever are. But I also felt like Matt was including me in his video.

One of the reasons why I include links to the patents I write about is that I have an expectation that at least a few people will click through and look at the patents themselves. I suspect that might not happen a lot, but I hope it does.

People do filter what they read, based in part upon their past experiences. Sometimes I’ll make sure that I use some hypotheticals or examples from the patents themselves to try to put them into a perspective that more people can get. Sometimes they can be pretty complex though. 🙁

Thank you very much for your kind words. I remember going to SES San Jose back in 2005, and having someone ask me about some of the big things on the horizon for Google. I mentioned named entities and Q&A results and how they could lead to something like the knowledge graph. I think that was a little too much for the person who asked me that. I wonder if he still remembers that answer. 🙂

I think Google gets so enamored with the path they’re following that it’s become personal to them. Maybe they aren’t as adversarial as they are invested, and to a degree defensive. I understand that. Just don’t like being told not to read patents and explore them, ask questions, discuss them, make guesses, and experiment. I agree that people shouldn’t take what they include as gospel though.

Like Richard said, your work is most valuable to us (THE most ?!). I can’t even count how many times your posts helped me understand how a search engine works. At the end of the day, we can’t write down the formula, but there is a strong sense of what and how we can feed the crawlers.
On my end, I go beyond patents to check research papers, etc. I believe even fewer of us do it.

Actually, it doesn’t matter if Google uses the patent or not. The truth is on the screen, and we make up our own mind. It’s not the scientific method, but more the approach of a craft worker.

I agree with Vince. It is just a race to obtain the most patents for future developments. Whilst we can’t always assume Google are using their patents straight away it is worth inspecting them for future reference.
Great Post

It’s also important to check out not just patents but also Google papers presented at conferences.

For instance, in the paper “Predicting Bounce Rates in Sponsored Search”, the authors mention that one of the techniques they used in their test is:

“Cluster membership shows the strength of similarity of a given piece of content to a set of topical clusters as determined by a mapping function… These topical clusters were found by a proprietary process similar to latent semantic analysis”

They’re probably talking about Principal Components Analysis or something similar, and yes, the paper is on paid search, but LSA-like techniques are certainly in their toolbox (sorry Laurent, I could not resist 😉)

To be honest, patents are not really my area, so it’s a little bit hard for me to understand what you want me to understand. I do have some idea after reading the re-tweets between you and Matt. At first, I thought Google was messing around with SEO to make everyone’s lives miserable. But then Matt’s tweet about “SEOs assuming that we were changing ranking to observe SEO changes” hit me. We all accuse Google of messing around when, in fact, they only want us to straighten out our sites as well as ourselves. We shouldn’t be accusing them of anything because they only do what they think is best for everyone.

It seems the concern over patents might be less about the patents themselves and more about the broader issue of website owners getting so caught up in SEO and SERPs. Anytime Google or its representatives make statements about search, we all (myself included) hold our breaths to try to catch any detail that might help our websites to place better. So maybe the patents debate is just one example of many where Google thinks we’re missing the forest (user experience) for the trees (SEO tidbits).

Yes, I scour through Google Scholar for new whitepapers, and make sure that I do searches for the inventors listed on patents when I write about the patents their names are on. It’s really nice when they’ve written on topics that are similar to what a patent is about (I also look to see what other patents they’ve been involved in as well.) Sometimes there are other materials that are related as well, like presentation slides and videos – it can be really nice to get different views of the concepts being created in a patent when it has a face and voice associated with it instead of all the legal language that a patent contains. 🙂

I agree with you that it doesn’t matter whether or not Google actually uses a particular patent – usually there’s something there that we can learn from.

Thanks. When Google made a couple of patent acquisitions from IBM a couple of years ago (1,000+ patents each time), a lot of the patents acquired looked like they were for defensive purposes, and that’s not a bad idea. A lot of them likely covered Google’s own internal network building processes though. Many of the patents that Google develops internally do look like they actually describe things that Google worked upon and could potentially be using, though it’s not always easy to tell. I have seen enough patents from them though that have been implemented, that it’s clear they aren’t just filing things for future development. 🙂

The predicting bounce rates paper was really nice because it showed how signals from landing pages and from the advertisements themselves could be used to do a pretty good job of predicting the effectiveness of those ads and landing pages (http://www.bayardo.org/ps/kdd2009.pdf)

What bothered me about Matt’s video is that it came across as telling people to ignore patents completely, which I don’t think is something that he really intended. They may include information that Google may not want widely spread, but it’s not Google’s call to tell people to ignore them. The purpose behind requiring patent holders to publish patents publicly is so that others can read them and learn from them.

I don’t think that we can come out and state that Google is acting in everyone’s best interest, but I don’t think they have any ill intention.

My concern was about patents themselves, and the opportunity for people to use them to learn from them. When Matt Cutts comes out with a video telling people to ignore patents, and I write a good majority of my blog posts about patents, it’s like Matt smacking me. 🙂

In the past Matt Cutts has thrown out so much misleading information that I could publish a book about it. Now the recent questions that he answers are as useful as an “ashtray on a motorbike”. Watch and read official sources, but always keep in mind that things are not as they look and not so straightforward.

I don’t care what Matt Cutts says… I still support seobythesea :). As SEO professionals it’s our job to use the best information available to paint a picture of where Google is at and where it’s going. Patents are just one of the many ways to put strokes on that painting.

I’m just getting into SEO and stumbled across your blog. I have to applaud you on your research and the work you put into your content. The level of detail here with regard to use of patents is mindboggling! Quite daunting for us just getting started!

It reminded me of spending a fairly grey summer last year reading through patents on my iPad. I remember reading about some patents I considered a bit silly, such as one for implementing a blinking image, but also some fairly interesting patents related to display on the fly image optimisation. I managed to find the article you wrote about the acquisition here: Google patents from IBM

At the time I was thinking about how it could be used for the automatic processing of cheques or fraudulent bank notes from a banking branch network to be held in a central repository; the images could all be normalised automatically and then when they’re called to be recovered or viewed the disparity between the images would be minimal. In addition, the patent clearly insinuates that it has a banking or finance application in mind.

Taking away the specific sector application – the model works where a number of images originate from a variety of different sources by a variety of operators and are then normalised or optimised to be displayed via a single distribution or repository location.

So – if we add a model – something user generated requiring on the fly optimisation for normalisation – we could construe something I read recently about something similar being developed at the time of the patents being acquired and possibly combine it with something we laughed about at the time

I understand this post relates to search related patents, but by looking at something a little less opaque than the implementation of search related patents and more about general patent applications at Google, we can perhaps infer that there may be a process for acquired and filed patents being incorporated into existing technology, or rolled into new strategy.

I don’t know the full gamut but I do suspect that Matt Cutts is throwing a bit of a curveball.