Time to Add Query Breadth to Your SEO Glossary?

Do the keywords in page titles on a Google search for [Digital Camera] carry more weight than the keywords in titles on a search for [Canon Rebel Digital Camera]? It’s a possibility.

How much weight do the keywords in your page title carry in search rankings? Or anchor text in a link pointing to that page? If I told you that the weight carried by different ranking signals can vary based upon a number of circumstances, that might be a little frustrating unless I could point out a good example of how a search engine might devalue the impact of some ranking signals, and in doing so boost other ones.

A Google patent granted today explains how something called “query breadth” can influence the weight of popularity-based ranking signals, and in doing so alter how much weight relevance-based signals might carry.

Ranking Signals and Search Engines

When someone performs a search at a conventional search engine, the pages that they see are returned in a specific order based upon a number of ranking signals. Google has told us that they consider at least 200 different signals involving such things as whether or not the terms used in a search query appear upon pages being returned, the quality of links pointing to those pages, and many others.

Some of these signals could be considered information retrieval or relevancy signals because the scores for pages depend upon the words used in a query. For instance, in a search for [Canon Rebel Digital Camera], a more highly relevant page might be one that uses that phrase and the words in that phrase a number of times on the page itself, and may have those words or very related words appear as anchor text in links pointing to the page.

Other signals might be considered importance or popularity based signals, such as the quantity and quality of links pointing to a page. For instance, a link from the front page of the New York Times might carry more weight, and provide a greater popularity score than a link from the front page of my local bi-weekly paper, the Fauquier Times Democrat.

Still other ranking signals might combine both relevancy and popularity, such as the number of times a particular page has been shown to searchers in response to a particular query, or the number of times that page has been clicked upon when shown to a searcher after a search for a certain term.

How Broad Queries Impact Some Popularity Measures

Search engines often return a large number of pages when someone searches for more common one or two word phrases, such as [digital camera], often in the millions. A very large number of people tend to search for more common phrases like that one as well. Chances are that people performing that search may end up clicking a lot on links to web pages that show up in the top results for that search, and if the number of impressions or clicks for those results are considered as a popularity score, then those top pages may be over-represented in popularity counts.

That kind of over-representation of user-behavior data may make it so that other results for broad queries may have considerably lower popularity scores, and have difficulties outranking the more popular pages, even if the owners of those pages make changes to make those pages more relevant for those particular queries.

Because of that, when a query is considered to be a broad one, a search engine might not give as much weight to popularity-based ranking signals, which would mean that relevance signals would in turn be given more weight.
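One way to picture that adjustment is as a blend of a relevance score and a popularity score, where the weight on popularity shrinks as the query gets broader. The patent doesn't give a formula, so the linear blend and every name below are assumptions for illustration:

```python
def adjusted_score(relevance, popularity, breadth):
    """Blend relevance and popularity, discounting popularity for broad queries.

    breadth: an assumed, normalized query-breadth estimate in [0, 1], where
    1.0 is maximally broad (e.g. [digital camera]) and 0.0 is very narrow
    (e.g. [canon rebel digital camera]).
    """
    popularity_weight = 1.0 - breadth  # broader query -> popularity counts less
    return relevance + popularity_weight * popularity
```

Under this sketch, a fully broad query ignores popularity entirely and ranks on relevance alone, while a narrow query lets accumulated clicks and impressions count at full strength.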

Methods and systems for adjusting a scoring measure of a search result based at least in part on the breadth of a previously-executed search query associated with the search result are described.

In one described system, a search engine determines a popularity measure for a search result, and then adjusts the popularity measure based at least in part on a query breadth measure of a previously-executed search query associated with the search result.

The search engine may use a variety of query breadth measures. For example, the search engine may use the quantity of results returned by the search query, the length of the query, the IR score drop-off, or some other measure of breadth.

There are a few different ways that a search engine might measure the breadth of a query, and it may consider these in combination with each other:

The higher the number of results returned by the search query, the broader the query

The drop-off in information retrieval (or relevance) scores from one result to another, for example, how much of a drop-off there might be from the IR score of the first result to the tenth result, or the hundredth result, and so on. If there isn't much of a drop-off in those scores as you drill deeper and deeper into the results, then the query can be determined to be fairly broad.

How frequently people search for a particular query or very similar queries – the more frequently, the more broad the query could be considered to be

The smaller the number of terms in a query, the broader it might be determined to be

Some queries might be short, or searched frequently, but still be very narrow queries, so a combination of measures like those listed above might be used.
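Those measures could be combined in many ways; the equal weighting, the caps, and the function name below are all invented for illustration, since the patent only lists the signals without specifying how to mix them:

```python
def query_breadth(num_results, query_terms, ir_scores, query_frequency,
                  results_cap=1_000_000, freq_cap=10_000):
    """Combine several breadth signals into one score in [0, 1].

    ir_scores: IR scores of the top results, highest first.  The flatter the
    drop-off from the first score to the last, the broader the query.
    results_cap and freq_cap are assumed normalization constants.
    """
    volume = min(num_results / results_cap, 1.0)      # more results -> broader
    brevity = 1.0 / max(len(query_terms), 1)          # fewer terms -> broader
    frequency = min(query_frequency / freq_cap, 1.0)  # searched more -> broader
    # Flat IR scores deep into the results suggest many comparably good pages.
    drop_off = 1.0 - (ir_scores[0] - ir_scores[-1]) / max(ir_scores[0], 1e-9)
    return (volume + brevity + frequency + drop_off) / 4.0
```

A broad query like [digital camera] (millions of results, two terms, heavy search volume, flat IR scores) would score much higher here than a narrow one like [canon rebel digital camera].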

Takeaways

We don’t know how much weight Google might be giving to popularity-based user-behavior measures such as impressions and click-throughs in search results, but it’s possible that they may be giving some weight to those.

We also don’t know how much of a difference there might be in the rankings between web pages that appear in search results for particular queries. When someone performs a search, the first result might be considerably more relevant for the query used than the second result. Or it might be a photo finish between the two. We also don’t know how close the first search result and the 10th result, or the 100th result, might be in terms of an information retrieval score.

We do know that it’s often more likely that the higher ranking results will get seen and clicked on more often, and may increasingly grow in terms of the popularity-based aspect of their ranking scores. That can potentially keep pages that improve in the relevance-based part of their ranking scores from being able to outrank pages that continue to grow in popularity.

To counterbalance that growing popularity, this patent provides a number of ways to see how broad a query might be, and for sufficiently broad queries, Google may devalue the impact of popularity scores such as clicks and impressions.

This is just one example where the actual weights of different ranking signals may vary based upon a specific circumstance, such as query breadth. Chances are there are many others as well.

So, when someone asks you how much weight a title element carries when it comes to search rankings, and you answer, “It depends,” and they ask you for an example, query breadth is one that you can now point to as a possibility.

This reminds me of Google’s reasonable searcher patent, which discusses ways that links on a page might be weighted on the basis of how likely they are to receive clicks. It sounds like Google has been seeking ways to temper crude algorithmic assessments for years — something we have always suspected but for which there has been relatively little public proof (excluding numerous vague comments from Google employees).

So if a three-word search query has several results, and they have good IR scores and PageRank, what other factors might help you get higher rankings? I think Bill is right about CTR and impressions, but what else could there be?

One of the most interesting things about this patent isn’t the patent itself – it’s how it hints at the ways that Google is using user-data, despite being exceptionally coy about this for a long time. That this patent is about ensuring that user-signals don’t have excessive influence over other signals demonstrates just how important a part of the algorithm they probably are.

And don’t forget that this patent was filed in 2004 (when I first got into SEO). Back then, Google would pretty much flat out deny that they were using any user data in their algorithms.

There are a great number of ways that Google may classify pages and other types of documents on the Web, and use different algorithms to rank those differently.

For example, different genres of pages most certainly follow different algorithms. Blog posts and news articles may see freshness playing a larger role in how they rank than product type pages, for instance. Some pages may rank differently than others based upon how well of a match they are for different types of searcher intent, such as navigational, transactional, and informational type pages. The signals on some pages may be looked at differently if there is potentially some kind of geographic relevance to them, and they match a geographic intent.

There have been a number of white papers and patents from Google that describe how they may create profiles for people (individuals and as members of different groups for certain interests), for queries (to understand the diversity of potential results, to understand different possible meanings in different contexts, and more), and for websites (for example, some domains are the “perfect” results for some navigational queries).

One of the things that Google likely introduced during the “Big Daddy” infrastructure upgrade a few years back was the ability to plug in different modules of their ranking algorithms, where they could test and try out multiple ranking signals simultaneously for different classifications of sites based upon things like topic and type of query intent. This modular approach made it easier to turn one collection of algorithms on and another off with the press of a single button.

The following post that I wrote back in 2006, about Microsoft adopting a system like that, describes something probably similar in a number of ways to what Google is doing:

We have been told by Matt Cutts and others from Google that the ways certain algorithms are implemented are often more sophisticated than they may seem on the surface, and a lot of patent filings that describe different signals include discussions of baselines, thresholds, and alternative implementations that may help the ranking signals involved be less prone to manipulation.

It is nice to see at least one approach spelled out explicitly that tells us how the search engine may be going about things to avoid potential problems with using a certain ranking signal. As the inventors tell us in the patent:

A search engine often retrieves a large number of documents for a broad query. For example, if a user enters a one or two-term query, such as “digital camera,” the search engine is likely to return millions of results. Also, many different users may submit this broad query initially when searching about material related to digital cameras. Accordingly, the documents returned by these broad queries are often over-represented in the popularity counts, and the popularity count for each one of these results is artificially high because of the number of broad queries submitted. Also, documents returned in response to broad queries are often more abstract than results returned for more specific queries. The more abstract documents are then over-represented in the popularity counts, whether based on clicks or based on impressions.

Chances are that a lot of the standard ranking signals that we are aware of are still used in ranking pages, such as the relevancy of a page for terms used in a query, the link popularity of the page itself, and others. But, on top of those are a lot of different filters that Google may use to rerank search results such as country preferences, language preferences, customization based upon a previous search, personalization based upon individual and aggregated user-behavior signals, quality-type signals, and more.

The patent points out CTR and impressions as some potential user-behavior signals, but this patent was originally filed in 2004, and we don’t know to what degree those may have been used in helping to rank pages. It is quite possible that approaches like query breadth have been used in limiting the impact of some popularity-based signals though, even signals other than CTR and impressions.

Good points. There have been a good number of patents and whitepapers from Google that describe how they could be using user-behavior signals when ranking pages. What I thought was interesting about this one is that it may be one of the first that I can recall that specifically explores how to temper those types of signals and rein them in to avoid having them hold too much sway over search results.

Google may or may not be using query breadth, but chances are that they’ve explored other approaches that might mitigate an excessive impact of some signals, and chances are that different ranking signals may carry different amounts of weight based upon a number of things, including the classification of a page into a category, a match with the intent of a searcher, or other features of pages or of search results.

Another signal that I recall may influence the rankings of a page could be the average age of results that appear for a query. If a top certain number of results tend to be older, there may potentially be a boost for older results. If a top certain number of results tend to be newer, then there may be a boost for fresher results.
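That age-based boost might be sketched roughly like this; the threshold, the size of the bump, and the function name are all invented here, since the post only describes the idea in general terms:

```python
def freshness_boost(result_age_days, top_result_ages, threshold_days=365):
    """If the current top results skew fresh, boost fresher candidates;
    if they skew old, boost older ones.

    threshold_days and the flat 0.1 bump are assumptions for illustration.
    """
    avg_age = sum(top_result_ages) / len(top_result_ages)
    if avg_age < threshold_days:  # top results tend to be newer
        return 0.1 if result_age_days < threshold_days else 0.0
    return 0.1 if result_age_days >= threshold_days else 0.0
```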

We’re given examples of popularity-based signals like CTR and impressions in the patent description, but it’s possible that other signals that rely to a degree upon popularity might be considered as well.

The new +1 buttons in search results, for example, which we were told may now be worked into search results (see: High-quality sites algorithm goes global, incorporates user feedback), are one of the latest popularity-based user-behavior signals that we’ve been alerted to by Google. Are those being tempered by signals like query breadth?

Patents usually require that companies lodge specific, highly rigorous explanations of what a product or process does.

Companies try to make the patent as wide as possible so that it prohibits anyone using anything similar, forever.

So why would Google give up its special sauce? If you are the leader, and your power rests in the secrets that keep your search engine at the top, what is the benefit in telling everyone else what you are about to do in your micro changes? Is it to impress investors or searchers, or to help SEO people have a better chance of rigging the system?

Do you see what I mean? Everyone I know in SEO complains about how Google changes things without warning and that the secret technology makes it hard to game the system, yet here they are, spelling out a patent? Something that they could surely hide inside their code that would be impossible to reverse engineer, unless stolen.

Am I missing something? I mean from the abstract the actual code of what they do still remains vague and hidden, so is this any more than PR or will it make search results better? Bill?

So, if I understood correctly, and if we connect that to user experience, an example situation could be like this:

One searcher makes 2 searches, one after the other.
1st search: “London, England, Europe”
and 2nd: “SEO services”
On the second, a page with the title “London England Europe internet service” could get a better position than a page with the title “SEO services Belgrade Serbia”

Hi Guys
This is getting a bit too complex… Question: what would happen if you removed the Google Analytics code from your website? How could Google then use your data to determine your rankings?
Also, I do not think Google should be taking the +1 in SERPs into account. If they do, it will be hugely manipulated! Next thing you know, someone will be offering a +1 service. All you need is 1,000 different IPs and you are on your way!

@John – I think Google really likes the data from GA, but I am sure that it is able to monitor all of the 13 major global routers and work out traffic, IP addresses, and individual computers anyway. I think that GA is just to make us feel safe and warm and hook us further into the Google machine (AdWords).

I would certainly agree that a “direct hit” – that is, using fewer than the 60 or 70 characters allotted in the title tag for the exact match – gets more consideration, in my experience, than the more expansive and inclusive longer titles that merely include the term as a phrase.

I’ve been considering whether keywords are broad or narrow in scope for years. It not only helps in determining how competitive they might be, but also which pages on a site might stand the best chance of competing for those terms as well.

I don’t think that Google gave away the secret sauce here, but they did reveal something about how they are attempting to address what might have been considered a flaw in an algorithm. I think the value of using patents for marketing is very limited. As for a red herring, I would suspect that would be most effective in leading a competitor to allocate time and resources towards a technology that likely won’t be developed, but which has some plausibility to it.

Chances are that the process described in this patent may work with a broader range of ranking signals than just impressions or clicks or click to impression ratios, and may involve other ways to mitigate artificially inflated popularity-based ranking signals.

The process described should potentially help enable good sites that get better to overcome the popularity gained by sites that have been showing up in search results ahead of them. I think the “closeness” of IR scores across a range of search results is probably a better way of gauging how much breadth a query might have than the length of the query or the estimated number of search results.

Does it give something away to people who are attempting to rank web pages higher in search results? We don’t have access to Google’s IR scores for pages, so anything we do to attempt to decide how much breadth a particular query might have is handicapped because of that.

Google has published a few whitepapers and patents that describe how they may use information from previous queries to influence what a searcher sees on their next query, but this doesn’t do that. The language in the patent is somewhat confusing, and I can see how you might have come to that conclusion, though.

What this is saying is that if you search for a particular query term, and then you select a particular page in search results on the basis of that query, Google may see that page as popular. If a lot of people choose the same page in response to that same query, then that page might be boosted in search results some because it may be using those clicks as popularity scores.

Where there are potentially a lot of pages that might be a good match for a query (a lot of query breadth), then allowing the first results that have been showing in the top positions for that query to keep on accumulating popularity points may make it difficult for pages that are pretty close to as good for that query to ever outrank those pages that have been at the top for a while. In effect, the rich keep on getting richer.

If someone who doesn’t rank as highly spends the time making their page much more relevant, and builds some quality links to it, because the other pages have been building up popularity ranking scores, the page that has improved may still have trouble outranking pages that have been ranking highly for a while, even if those pages are no longer more relevant than the one that has been improved.

That’s what the method described in the patent is trying to fight – where there are potentially a lot of good results for a particular query, the value of popularity ranking signals might be scaled back so they don’t count as much.
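A toy comparison can show the effect. The additive scoring, the numbers, and the names below are all assumptions, not anything from the patent itself:

```python
# A long-time #1 result with heavy accumulated clicks, versus a page that
# was recently made much more relevant but has little click history.
incumbent = {"relevance": 0.60, "popularity": 0.90}
improved = {"relevance": 0.85, "popularity": 0.10}

def score(page, popularity_weight):
    """Assumed additive blend: relevance plus weighted popularity."""
    return page["relevance"] + popularity_weight * page["popularity"]

# At full popularity weight, the incumbent's accumulated clicks keep it on
# top; with the weight scaled back for a broad query (say, to 0.2), the
# more relevant page finally wins.
```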

Chances are good that Google isn’t getting its user-behavior data from Google Analytics, but rather from its query logs, from the Google Toolbar, from cookies, and from other places as well. It doesn’t need Google Analytics account information to collect user-behavior data.

As for The Google +1, chances are good that if Google decides to use the +1 information as part of a ranking signal, it would give more weight to people with more fleshed out Google Accounts and possibly do so based upon some kind of reputation score. If you open 1,000 Google Accounts and vote one page up 1,000 times, chances are those votes would probably not count for much at all, regardless of whether they were on different IP addresses or not.

I used the relevancy of a title element as an example of one of many of the different relevancy based signals that a search engine might use to calculate a relevancy or IR score, but I really wasn’t commenting upon how best to optimize a title element or any other element with this post.

The point is, regardless of whether a phrase appears in a title as an actual phrase or as two words that aren’t necessarily adjacent to each other, relevancy signals might count more when there’s a lot of query breadth, because any possible popularity signals that might be associated with pages in search results may count less.

I’m encouraged that they understand that the high rankings of some sites might lead those sites to continue to rank well based upon popularity signals, even when there might not be much of a difference in relevance among a fairly large number of sites.