Dark Google: One Year Since Search Terms Went “Not Provided”

A year ago, Google began going dark. Dark in terms of no longer sharing with publishers, in some cases, how people searched for and found those publishers through Google’s search engine. The “single digit” percentage of withholding that Google predicted at the time has turned into more than 50%, in some cases. If Google’s withholding were an eclipse, more than half the sun is being covered. “Dark Google” is upon us, and it will only grow darker.

I’m drawing the term “Dark Google” as a play off the “Dark Social” concept that Alexis Madrigal wrote about recently in The Atlantic. Dark Social refers to social visits that can’t be attributed to any particular social network, such as if someone shares a link with a friend via email. That’s arguably a “social” event, a socially-related visit, but not one that can be attributed to any particular source.

In a similar fashion, Dark Google refers to visits from Google’s search engine that can no longer be tied to a particular search term. This withholding of search data began a year ago (and a day) and has continually expanded since.

Passing Along Search Terms

First some background and history, to understand this huge change. I promise to keep it brief.

When people search on a search engine, typically the search terms they used to find a web site are passed along to the publisher.

For example, if someone did a search on Google for “dvd players” and clicked on a listing from Best Buy from among those in the results, Best Buy would be able to tell that visitor found them through Google by searching for the words “dvd players.”

This is all a consequence of how our browsers have worked since before Google existed. Browsers pass along what’s called a “referrer,” which is kind of like a Caller ID system for the web (and sometimes spelled “referer” due to a historic misspelling).
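
As a rough illustration of the mechanism described above, here is a minimal Python sketch of how a publisher's analytics code might pull search terms out of a referrer. The function name is hypothetical, and it assumes the search engine carries the terms in a `q` query parameter, as in the example URL; other engines may use different parameter names.

```python
from urllib.parse import urlparse, parse_qs


def search_terms_from_referrer(referrer):
    """Return the search terms carried in a referrer URL, or None if absent.

    Assumption: the terms travel in a "q" query parameter.
    """
    parsed = urlparse(referrer)
    values = parse_qs(parsed.query).get("q")
    return values[0] if values else None


# Illustrative referrer for the "dvd players" search in the example above.
example = "http://www.google.com/search?q=dvd+players"
print(search_terms_from_referrer(example))
```

The `+` in the URL is decoded back into a space, so the publisher sees the words as the searcher typed them.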

To understand more about how this Caller ID system works, I recommend reading the article below:

Google Begins Blocking Terms

This time last October, Google made a change to block this Caller ID system for anyone who was signed-in to Google when they searched. Why? Google said that it was designed to protect privacy. My article from last year covers this more:

Google was correct. This change did protect potential “eavesdropping” by others of what someone was searching for. However, Google deliberately left a hole in this privacy protection. Anyone clicking on its ads still had their search terms left vulnerable to eavesdropping.

Privacy Hole For Advertisers

Google has given, in my opinion, a fairly convoluted explanation about why this gap in privacy protection was left open. It’s said that potentially an advertiser could buy so many different ads that it could still see some of the search terms that Google’s privacy protection was meant to secure. Oh, and Google also says advertisers need the data to better know if their ads work.

My view is the search-side of Google wanted to better protect people from eavesdropping, especially in advance of Search Plus Your World, which potentially would expose more personally-revealing search terms. But the ad-side of Google demanded an exception so that advertisers wouldn’t be upset nor Google’s ad retargeting business harmed. The bottom line won. A privacy hole was left open for advertisers.

Suffice to say, Google and I don’t see eye-to-eye on this issue. Google finds it to be a reasonable compromise that brings greater security overall for searchers. I find it one of the most disturbing and hypocritical things the company has ever done.

As a marketer, I love search referral data, but I also understand having to lose it to better protect privacy. But if it’s about privacy, then Google shouldn’t leave a loophole that puts advertiser interests over that of its users. My article below has more about this:

The “Not Provided” Eclipse Begins

When the blackout began, Google predicted that for searches on Google.com, data would be withheld in the “single digits.” In other words, less than 10% of the search term data reported by Google.com to publishers would get withheld.

Those using Google Analytics were able to spot when this was happening because “not provided” started appearing as a search term in their reports, like this:

What was happening is that Google would report that a search happened but strip out the actual search terms. Any time Google Analytics saw one of these “blank” searches, it counted it as a “not provided” search (other analytics programs used different ways and methods for the same end result).
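
The counting logic described above might look something like the following sketch. It is not Google Analytics' actual implementation. It assumes a "blank" search arrives as a Google referrer with an empty `q` parameter, and the function name, labels, and example URLs are all hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def classify_visit(referrer):
    """Return a (source, keyword) pair for a single visit.

    Assumption: a withheld search shows up as a Google referrer
    whose "q" parameter is present but empty.
    """
    if not referrer:
        # No referrer at all: the visit looks like direct traffic.
        return ("direct", None)
    parsed = urlparse(referrer)
    if "google." not in parsed.netloc:
        return ("referral", None)
    # keep_blank_values=True so an empty "q=" is not silently dropped.
    params = parse_qs(parsed.query, keep_blank_values=True)
    if "q" not in params:
        return ("referral", None)
    terms = params["q"][0].strip()
    return ("organic", terms if terms else "(not provided)")


print(classify_visit("https://www.google.com/url?q=&sa=t"))
print(classify_visit("http://www.google.com/search?q=dvd+players"))
print(classify_visit(""))
```

A visit that arrives with no referrer at all cannot be told apart from someone typing the address directly, which is why it falls into the "direct" bucket.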

The darkness kept growing, and it wasn’t just some niche SEO issue. In April of this year, Poynter, a major site about journalism, found that 29% of its search term data had gone dark and that “not provided” was its top search term.

Today, that’s certainly the case in what I see. By far. Consider the top five search terms sending traffic to my personal blog, Daggle, this month:

Not provided isn’t just my top “search term” but it’s also about 150 times the volume of the next biggest term, “3 monitor setup.” It’s crazy: 3,654 search terms blacked out, all bundled together as “not provided” with the next most popular term coming in with a count of 24.
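
The arithmetic behind that "about 150 times" figure can be checked with a short Python sketch. The two counts come from the paragraph above. The `withheld_share` helper and the totals passed to it are hypothetical, included only to show how a percentage-withheld figure would be computed from a site's own numbers.

```python
def withheld_share(not_provided_visits, total_search_visits):
    """Fraction of search visits whose terms were withheld."""
    if total_search_visits <= 0:
        raise ValueError("total_search_visits must be positive")
    return not_provided_visits / total_search_visits


# Counts quoted in the article: "not provided" vs. the next biggest term.
not_provided = 3654
next_biggest = 24
ratio = not_provided / next_biggest
print(f"'not provided' is {ratio:.2f} times the next term")

# Hypothetical counts, chosen only to illustrate the share calculation.
print(f"{withheld_share(450, 1000):.0%} withheld")
```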

The Darkness Grows

The numbers have risen because of a variety of factors. For one, as Google has continued to grow its Google+ social network, it has encouraged people to sign-in as much as it can. In fact, a new study found that people are three times more likely to be signed-in when they search on Google than on Bing. All those signed-in searches have keywords withheld. Except for Google’s advertisers, of course.

Publishers aren’t even sent “blank” searches so that they can at least tell someone was a search-related visitor. Instead, all these Safari searchers appear to publishers as visitors who came directly to their web sites.

How Much Withheld?

So where are we at? I haven’t seen any broad-based metrics recently (Conductor did a study in March finding 16% across 25 different sites), but stay tuned. I suspect we’ll see people sharing some of their own stats in the comments below, after this column is published. In the meantime, here are some stats from sites I have access to, for search traffic for the current month so far:

A friend’s entertainment blog: 18% withheld

My personal blog Daggle: 45% withheld

Search Engine Land: 60% withheld

Marketing Land: 62% withheld

Those figures are lower than what’s actually being withheld. That’s because many searching through Safari in iOS 6 are instead reported as “direct” visits, as I explained earlier. Consider this for my friend’s site:

Most people using iOS 6 appear as if they came directly to the site when actually a big chunk of those visitors probably came through Google searches. That’s especially true when you compare the stats to iOS 5 users, who aren’t routed by Safari through Google SSL Search. Direct is not the top source for them and far less a percentage in relation to “Google” traffic:

Finally, here’s how terms being withheld grew over the course of the past year, looking week-by-week at the number of “not provided” terms reported for our Search Engine Land site:

What Will Year 2 Bring?

There’s every reason to suspect that even more search term data will be lost going forward. Google’s field trials to test finding Gmail results or Google Drive results alongside regular search results mean even more people will be signed-in when they search, so even more search referrers will get withheld. Meanwhile, with Firefox and mobile Safari providing secure searching, Google will feel more pressure that searching in Chrome should be made secure, by default.

Referrer data has been one of the things that has made internet marketing so accountable. Sadly, signs of it dying away have been out there since 2010, and it’s just getting weaker. It’s continuing to take body blows on the search front, and I suspect those will get worse both for search referrers and in general.

On the upside, there’s still enough data that’s not withheld to give publishers a sense of important terms people use to find their sites. Google also provides query data through Google Webmaster Central that can supplement what’s been lost. You can’t use that data to tailor your pages for visitors who arrive after performing certain searches, but at least you still have a sense of what you’re ranking for. More about gathering this data can be found below:

Can I Haz More Than 90 Days?

Then again, Google’s only providing that data for the past 90 days. Unless you’re constantly downloading it, you can’t see trends over time. I’ve desperately wished that Google would expand this data for a longer period of time. Google Analytics can import the data, but only the last 90 days will show. Google should allow whatever gets imported to stay accessible. It’s a fair thing to do given how much Google unilaterally took away from publishers. It’s also a secure way to do so.

And please. Please. Don’t give me excuses about how there aren’t enough machines at Google to store all this data. If anyone can have a Google Analytics account for free to store the huge amount of data a site generates on a daily basis, Google can find room to make this search term data accessible for longer than 90 days.

Publishers Have Survived; Privacy Loophole Remains

It’s also important to reflect that in the year since this has happened, this year of Dark Google, SEO hasn’t died from the blackout nor has web publishing collapsed. The remaining referrer data still getting through and alternatives like Google Webmaster Central seem to suffice. By the way, those seeking alternative advice should take a look at the articles below:

But make no mistake. Plenty of publishers feel frustrated with or resentful toward Google over the change. They’re also the ones who really understand that when Google talks about having done this to protect privacy, Google left a loophole to protect its advertising operations, to benefit its advertising customers.

At some point, Google’s other customers may realize this, those who search using Google. Then they might be resentful for a different reason, that Google decided privacy was worth protecting up until the point it put ad revenue at risk.

http://www.brickmarketing.com/ Nick Stamoulis

“Not provided” data has crept to almost 50% for my company site. That’s a far cry from the 10% Google initially said. And while I can appreciate how they are trying to protect privacy, it doesn’t make sense that Google wouldn’t do the same thing for AdWords data. Either do it the same across the board or not at all.

http://twitter.com/Nathan_Safran Nathan Safran

We have plans to update our [not provided] study at Conductor soon.

Randy Stuck

This has been quite annoying. A majority of our clients show (Not Provided) as the #1 search term, which, of course, renders all YoY Brand and Non-brand reporting useless as we have 10+% of our search terms unaccounted for.

The idea that this increases privacy is quite ridiculous since it’s only from Google products and you can still get data from other analytics providers.

http://twitter.com/MartinLaetsch Martin Laetsch

We have seen this steadily increase on our site. [not provided] was 43% for the first 2 weeks of June. It is 56.8% for the first 2 weeks of October.

http://twitter.com/joetek Joe Taiabjee

You can’t even look at the remaining searches as a “representative sample” of the types of searches coming to the site. The result is too skewed.

You could argue that the people that search on Google while logged in are more “Google Savvy” in the way they interact with Google and the search terms they choose. It’s quite probable that the search terms hidden in the (not provided) line are very different than the ones that show up below it.

http://searchengineland.com/ Danny Sullivan

Yeah, it’s a good point. I was thinking the same, when I looked at that 150X difference between “not provided” that I showed and the next term being so low. When the numbers are that low, exactly, are you still getting enough of a representative sample?

Jonathan O’Brien

Wouldn’t encrypting the search results also prevent the Bing toolbar from harvesting all those queries? I ask in ignorance so correct me if I’m wrong, but it seems to me that these changes happened following Google’s sting operation and their fear that Microsoft was plundering their spell corrections.

http://searchengineland.com/ Danny Sullivan

Yes, the encryption also indeed does help with the whole Bing’s spying on us thing.

Rachel McEneaney

It would be great to hear what “not provided” percentages are common for certain industries and niches, such as B2B industrial websites, and per region.

Jonathan O’Brien

The reason I bring that up is this: The collateral damage done to publishers would have caused much more general uproar if the cited reason was merely to block the scraping of its search terms by a competitor’s toolbar instead of “increased user privacy”. To be honest though, I think the privacy issue is kind of pointless because the search terms that are being passed along are unattached to any identity. Additionally, any given website would only be seeing one such query at a time, not a pool of terms from which your identity could be deduced. The general public however, is much more sympathetic to the idea of “increased user privacy” (even if it is minimal or insignificant) rather than a company protecting its valued assets. I believe that Google’s stated goal of better user privacy is only a secondary benefit to their actual goal of preventing scraping from their main rival. In that light, I think it’s less relevant whether or not Google fails to achieve its stated goal of search term privacy (considering the loophole available to its advertisers) when its primary goal is to stop Microsoft from mining those terms to its advantage. Similarly, the U.S. fights wars abroad to “protect its interests” and it’s only a side benefit that it happens to topple dictators and promote democracy. Yet, the latter is the official goal while the former is the actual motivator. In both cases, it’s more meaningful to discuss the effectiveness of their main goals instead of the PR spin that is meant for the greater public.

Justin Brock

One more month and we’ll be at 50% of organic terms as (not provided), too. Have to use AdWords to do some keyword research now, which may be part of Google’s reasoning for making that change.

Just checked one of our larger accounts. 25% of organic searches not provided. 99%+ of paid search queries are being reported.

terlmaa

Well, Google seems to be just cool like that. What are you gonna do.

http://blog.truenorth.nu Mark Gannon

I’m at 35% not provided in Google Analytics. Everything else is less than 5%.

http://rendion.myopenid.com/ render

I empathize with your concerns, but what will your complaint be when people stop using Google to search? Or hasn’t that hit you yet? I’ve had it with them and stopped using them altogether; DuckDuckGo does just fine. I’m not trying to be sarcastic here. I seriously don’t think Google will have this monopoly for much longer. Search is way better now than it ever was, and guys in their metaphorical “garage” will put them out of business. I don’t use a single Google product for anything, and I’m pretty happy with that… more will come to the same conclusion.

Joan62

Great article. I had no idea of this activity going on. Looks like Google has gone deep into the dark side, young Skywalker. The megalomaniacs have taken over.

http://socialfreshacademy.com/ Jason Keath

SocialFresh.com is at 42% dark Google and another 11% direct traffic. The vast majority of that direct traffic looks to be search driven, as it is hitting our most popular search posts at the same ratios as the organic referrals we CAN see.

http://www.facebook.com/valentin.pletzer Valentin Pletzer

Thanks for the article! My point of view: Google is learning a lot by crawling the web (our content) but denies us access to the keywords (their content). I am very disappointed :(

http://twitter.com/StalkerB StalkerB

Are you suggesting other analytics packages can get around the (not provided) issue?

The problem is not with how Google analytics processes data but how Google search results pass it through.

No analytics package will be able to give you the keyword if it’s not passed along from the referrer.

Patxi Gadanon

Great post Danny. We can never forget that Google is a business and its aim is to make money with paid ads. We have recently published a post with some insights on (not provided) based on the data we have. We identified some interesting trends. It complements your point. What do you think? http://www.analyticsseo.com/google-analytics-not-provided-update

http://www.maximumferrari.com edkuryluk

It was just 10 years ago, the boys at Google informed the world they would “Do No Evil”. How times have changed

http://www.facebook.com/john.schulenburg John Schulenburg

I think Google will eventually make this data available for GA Premium. I think it’s been their plan all along to monetize the valuable data and stop competitors from poaching. The privacy argument is BS. It’s all about monetizing a valuable asset.

http://twitter.com/sworobec85 Steven Worobec

This has without a doubt been one of my biggest frustrations over the past year. As a marketer on a budget, understanding the landscape of a small industry through how people find the company and the work we do is vital. The “Not Provided” search term is now our top referral keyword and I’ve lost a bit of understanding about how people come to us. Even though I know what many people may look for within our industry, the internet is constantly evolving and people’s behaviours change. Not being able to see this will ultimately lead to opportunities being missed.

As a marketing professional I understand the need for privacy, especially with us being in a social age where everything about us is somewhere online. However, with reduced budgets, smaller departments and more people going online, these changes and restrictions only seem to damage our understanding of our customers. I guess it’s a new sort of challenge with internet marketing and one which I’m confident we will find a way to adapt to. After all, we always do.

Jim McDonald

Bingo!

http://twitter.com/TexDesignStudio Tex Design Studio

I’ve noticed this as well, but you can adjust the cookies for longer time periods. Also, from the content drill-down report, select keyword detailing. Then apply bounce as a secondary metric measure. No, you don’t get insight into the KW, but it indicates how much importance (not provided) played in conversion.

Tom Homer

Great article. I have been puzzled over the last 2 months why my direct traffic has been growing steadily. I found nothing to explain it until now. Way to go Apple…

Google has profit in mind when they allow Adwords to see the not provided searches and not Analytics. Both should have access or none of them should.

Irene Blaauw

Very interesting article. I checked this out for our website and our Dark Google is more than 31%, but still Google shows a lot of organic terms which we can use.