Another common piece of advice is to improve your ad’s Quality Score. The reasoning is that a higher QS is a “reward” from Google that makes the ad “eligible” for more impressions, especially for broad-match terms. But what hard evidence is there that QS affects impression volume at all? Sure, very-low-QS ads can be labeled “Inactive for search”, shutting them off completely, but will allowing a QS to slip from 6 to 3 cause a partial drop in Impression volume? Does pushing it from 7 to 10 cause a rise?

December 2010 was a holiday month, and in October and November Google announced two major bugs in QS reporting, which makes earlier data suspect in my mind. So only in the past few weeks has it been possible to collect reliable data that might shed some light on those questions.

One obvious way to gauge the relationship between Impression volume and Quality Score is to simply graph these two quantities for keywords whose QS changed over a short period. I wrote a program that examined all of the Google Search Network keywords in most of The Search Agency’s largest accounts on weekdays (Monday-Friday) from January 3 to January 28. (Google claims that Quality Score is only calculated when the search query is identical to the keyword, but if QS affects Impression volume, then the effect should be seen in keywords of all match types.)

Below are shown typical results, from a retailer of vitamins and nutritional supplements. One keyword is a broad match term for a well-known 7-letter chemical (like ‘calcium’) and the other a common 6-letter compound, also broad match.

In the first case, the daily Impression volume doubled when the Quality Score increased, while in the second, the Impression volume increased when the QS dropped. (It’s difficult to see, but for the second keyword 5 of the 20 data points are at QS=4, all with about the same number of impressions.)

Just under 17 million out of 25 million impressions in this account were in keywords whose reported QS changed at least once in that 4-week period. Among those keywords, 47.3% (representing 48.3% of the impressions) saw their average Impression volume rise as QS increased, and the remainder saw it fall as QS increased. So, if you want to predict how your ad’s Impressions are likely to change with QS, just flip a coin.
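For readers who want to run the same kind of tally on their own accounts, here is a minimal sketch. The post does not say exactly how "rise as QS increased" was measured, so this version assumes the sign of a least-squares slope of daily impressions against that day's QS. The record layout and the function names are hypothetical.

```python
from collections import defaultdict

def slope(points):
    """Least-squares slope of impressions against QS for one keyword.

    points is a list of (qs, impressions) pairs, one per day.
    Returns None when QS never changed, so no slope is defined.
    """
    n = len(points)
    mean_q = sum(q for q, _ in points) / n
    mean_i = sum(i for _, i in points) / n
    var_q = sum((q - mean_q) ** 2 for q, _ in points)
    if var_q == 0:
        return None
    cov = sum((q - mean_q) * (i - mean_i) for q, i in points)
    return cov / var_q

def share_rising_with_qs(records):
    """Fraction of QS-changing keywords whose impressions rise with QS.

    records is an iterable of (keyword, qs, impressions) daily rows
    (an assumed layout, not the format of any particular report).
    """
    by_keyword = defaultdict(list)
    for keyword, qs, impressions in records:
        by_keyword[keyword].append((qs, impressions))

    rising = 0
    changed = 0
    for points in by_keyword.values():
        s = slope(points)
        if s is None:
            continue  # QS was constant; keyword is excluded
        changed += 1
        if s > 0:
            rising += 1
    return rising / changed if changed else None
```

A result near 0.5 is what the coin-flip remark describes: about as many keywords gain impressions with a higher QS as lose them.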

The problem here is that Impression volume is much more strongly correlated with average position than with Quality Score: for the keywords in the graph above, each keyword received more impressions when its average position was higher, but it’s tough to see if the changes in QS actually caused anything, or if they were only an effect of something else changing. (It would be nice to see graphs of ‘Impressions vs. bid’ showing the entire curve moving up or down as Quality Score changes. But even then, that would only be meaningful if the Search volume remained constant and, unfortunately, Google doesn’t release that information.)

Perhaps there’s another way to detect the effect of Quality Score on Impression volume.

In some systems it has been noted that the frequency of occurrence of first digits is not uniform. That is, because it is easier for a keyword to get 1 impression than 2, the number of records with 1 impression should be greater than the number with 2 impressions. Similarly, the number of records with 10-19 impressions should be greater than the number with 20-29 impressions, and so on across many orders of magnitude. Therefore, the number of Impression records that have a leading digit of ‘1’ should be greater than the number whose first digit is ‘2’, which should be greater than the number whose first digit is ‘3’, and so on.

This relationship is called Benford’s Law after the man who found it applied for a great variety of systems, especially those where a ‘long tail’ is present. The histogram below comprises over 250,000 daily Impression records from January 2011 for keywords which got at least 1 impression on any given day.

For the reason described above, the plurality of Impressions records (about 39%) begin with digit ‘1’, the next most common first digit is ‘2’, and so on. Deviations from this pattern are used, for example, in a field called ‘forensic accounting’, developed by a UC Berkeley economics professor, to detect embezzlers who try to fabricate “realistic” numbers to put on forged checks.

Strictly speaking, Benford’s Law indicates that the fraction of records that start with ‘1’ should be about 30%, not 39%. But Benford’s figure is for the case where the data conforms to a straight-line power law distribution, and Impression volume is known to be better represented by a different statistical distribution, one where the fraction of first digits that are ‘1’ is often higher than 30%.
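The 30% figure comes from the standard Benford formula, in which the expected share of records with first digit d is log10(1 + 1/d). The sketch below computes that expectation and, for comparison, the observed first-digit shares in a list of daily impression counts. The function names are illustrative only, and this is not the program used for the histogram in this post.

```python
import math
from collections import Counter

def benford_fraction(d):
    """Expected share of values whose first digit is d (1-9) under Benford's Law."""
    return math.log10(1 + 1 / d)

def first_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])

def first_digit_shares(impressions):
    """Observed share of each first digit among records with at least 1 impression."""
    counts = Counter(first_digit(n) for n in impressions if n >= 1)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {d: counts.get(d, 0) / total for d in range(1, 10)}
```

Here `benford_fraction(1)` is roughly 0.301 and `benford_fraction(2)` roughly 0.176, so an observed share of about 0.39 for digit ‘1’ sits well above the textbook expectation.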

If it’s true that a QS of 10 results in a very high probability that an ad will be shown for a given search and lower QS’s are ‘penalized’ with a lower likelihood of being shown, then the fraction of Impressions records that start with ‘1’ should change as the Quality Score changes. (Typically, though not always, the fraction will rise as the probability of being shown drops.)

Below, we subdivide our Impressions records into 10 different ‘bins’ according to that day’s reported QS value and calculate the fraction of records that begin with the number ‘1’. A falling trend as QS rises would indicate that Google ‘punishes’ low Quality Scores with a lower chance of being shown and, conversely, that it ‘rewards’ higher QS’s with more impressions. (The fraction was not calculated for QS’s 1, 8 and 9 due to a very limited number of records available with those values.)
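The binning step can be sketched roughly as follows. The record layout and the `min_records` cutoff are assumptions made for illustration; the post only says that bins with very few records were left out, not what threshold was used.

```python
from collections import defaultdict

def leading_one_fraction_by_qs(records, min_records=1000):
    """Share of impression records starting with '1', per Quality Score bin.

    records is an iterable of (qs, impressions) daily rows (assumed layout).
    Bins with fewer than min_records rows are omitted; the default
    threshold is a hypothetical value, not one taken from the study.
    """
    totals = defaultdict(int)
    ones = defaultdict(int)
    for qs, impressions in records:
        if impressions < 1:
            continue  # only days with at least 1 impression count
        totals[qs] += 1
        if str(impressions)[0] == "1":
            ones[qs] += 1
    return {
        qs: ones[qs] / totals[qs]
        for qs in sorted(totals)
        if totals[qs] >= min_records
    }
```

If low Quality Scores really were penalized with fewer impressions, the returned fractions would be expected to fall as the QS key rises.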

In fact, no trend like this is seen for this account or for any of the other major accounts examined in this study. This does not mean that Google is not using the reported QS value somehow in determining whether or not to ‘reward’ or ‘punish’ Impression volume, only that I have not yet found any irrefutable evidence that they are using it.

On the other hand, perhaps the engineers at Google are smart enough to make certain that the method or extent to which they are ‘rewarding’ or ‘punishing’ ads is not detectable using Benford’s Law. After all, the professor who helped develop forensic accounting was Dr. Hal Varian, who is currently the chief economist at Google.

6 Responses to “Does Google Reward High Quality Scores with More Impressions?”

Brilliant analysis, Bradd. We see the wiggles associated with QS as some sort of refactoring that G does from time to time that doesn’t really have any bearing on performance. A given ad was a 10, now it’s a 7 and nothing about the ad has changed — whatever. As your analysis demonstrates, the numbers seem to bounce around without impacting anything, and folks who spend time trying to “fix” QS might be better served studying the levers that matter, with bids being the fastest and surest way to impact performance.

I think too it depends which “quality score” we’re talking about. The primary quality score that gives you a “number” or metric, is “Keyword Relevance.” There are other quality scores that Google is also using, or alludes to the use of, like account level QS, adgroup QS, etc…

I could see the possibility of broad-QS metrics like account and campaign QS affecting impression share, but it would likely be marginal.

It’s kind of like when Google says on the organic side that “domain age could be a factor we use” in rankings, it’s likely .05% of 200 other factors in the grand mix.