How to Analyze Google Analytics (not provided) Data

This YouMoz entry was submitted by one of our community members. The author’s views are entirely his or her own (excluding an unlikely case of hypnosis) and may not reflect the views of Moz.

It has been over three months since Google announced the (not provided) update, which protects user privacy by hiding search referral keywords for organic traffic. Since (not provided) results only show up for searchers who are logged in to Google, the growth of Google Plus to over 60 million users, with estimates of 400 million by the end of 2012, indicates that the problem of missing search referral data is not going away anytime soon.

Google’s own Matt Cutts estimated that “even at full roll-out, this would still be in the single-digit percentages of all Google searches on google.com,” but SEOs and web marketers are seeing that percentage grow steadily past initial estimates. The single-digit figure might hold true across total Google searches, but it likely is not true for you or your clients. If you don’t like this kind of uncertainty in a career, then maybe SEO isn’t for you. However, if you love a challenge and finding creative ways to overcome obstacles, read on.

Which (not provided) results are branded?

One of the most immediate problems that the Google update caused was the inability to differentiate branded vs. non-branded search traffic. Before, simply excluding the branded keywords in Google Analytics was enough to see how much traffic was coming to the website directly from SEO. Now, you get something more like this when you exclude brand keywords:

In this example, (not provided) accounts for 14.6% of the total non-branded search referral traffic for this particular site, or 373 visits whose keywords remain behind the mysterious (not provided) veil. That means that only 2,177 search referrals have known keywords. While not all of these 373 visits are branded, enough of them are to skew the data, making it harder to track the growth of SEO efforts over time.

Before you go cursing the Gods of search, take a deep breath and count to ten. After you find your inner Zen, let us look at how to solve this problem. Follow these three easy steps:

Keyword Landing Page - As you probably know, there is a little option to check the landing pages that each keyword was linked to from the SERPs.

Filter not provided - Once you click this option and turn it on, scroll to the bottom and filter the results by typing in “not provided” (with or without the parentheses).

Look for “/” – Once you have your results filtered, you can easily see which (not provided) results went to the home page and which went to internal pages. The number of (not provided) results that went to the home page (showing up as “/”) indicates roughly the number of branded keywords.

Conclusion:

In the above example, 92 results went to the home page, which indicates that the other 281 of the 373 (not provided) results are non-branded keywords. This means that there are now 2,458 referrals (2,177 + 281) that are non-branded.
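The same split can be sketched in Python for anyone working from an exported report instead of the Analytics interface. This is only an illustration: the row layout (keyword, landing page, visits) and the internal page paths are assumptions made up for the example, and only the totals (92, 281, 2,177) come from the figures above.

```python
NOT_PROVIDED = "(not provided)"


def split_not_provided(rows, home_page="/"):
    """Return (home_visits, internal_visits) for the (not provided) rows."""
    home = 0
    internal = 0
    for keyword, landing_page, visits in rows:
        if keyword != NOT_PROVIDED:
            continue  # known keywords are not part of this split
        if landing_page == home_page:
            home += visits
        else:
            internal += visits
    return home, internal


# Hypothetical export rows; the internal page paths are placeholders.
rows = [
    ("(not provided)", "/", 92),
    ("(not provided)", "/internal-page-a", 200),
    ("(not provided)", "/internal-page-b", 81),
    ("some known keyword", "/internal-page-a", 15),
]

known_non_branded = 2177  # referrals with a known, non-branded keyword

home, internal = split_not_provided(rows)
estimated_non_branded = known_non_branded + internal

print(home, internal, estimated_non_branded)
```

The home page count is treated as the rough branded estimate and everything else as non-branded, which is the assumption the three steps above rely on.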

One drawback to this technique is that you cannot conclusively say that the 92 results to the home page are ALL branded. In this particular example this may not be a huge deal, since 92 visits is only 3.6% (92/2550) of the total non-branded search referral traffic. However, with a site that gets much more traffic, 3.6% could easily turn into hundreds or thousands of unknown keywords driving traffic.

One potential way to estimate the number of branded keywords that are coming from logged-in Google users is to look at historical data. For example, if before the Google update your site received 30% branded search traffic, then you can take that number and multiply it by the (not provided) searches that went to your home page. In the above example, that would mean that roughly 28 of the (not provided) visits (30% of 92) were branded.
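As a quick worked version of that estimate, here is a minimal Python sketch. The function name is made up for illustration, and the 30% share is the hypothetical historical figure from the paragraph above, not a measured value.

```python
def estimate_branded_visits(home_page_visits, historical_branded_share):
    """Scale home page (not provided) visits by a historical branded share."""
    return home_page_visits * historical_branded_share


home_page_not_provided = 92   # (not provided) visits that landed on "/"
historical_share = 0.30       # assumed pre-update share of branded search traffic

branded_estimate = estimate_branded_visits(home_page_not_provided, historical_share)
remaining = home_page_not_provided - branded_estimate

print(round(branded_estimate, 1), round(remaining, 1))
```

The result is only as good as the assumption that logged-in searchers behave like the pre-update audience did.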

Differentiating (not provided) keywords

Once you have discovered how many (not provided) search referrals are coming from non-branded keywords, you are still left with a gaping hole in the data. With a little bit of patience and creativity, you can segment (not provided) keywords into general categories of search traffic, which is especially useful for any type of analysis that depends on categorical keywords rather than exact match keywords. This method will only work if the keywords that ARE provided by analytics are matching up nicely with the correct landing pages.

So you have a nice list of (not provided) keywords and which pages they landed on. At a quick glance, you can quantify roughly how many keywords of a certain category were searched while logged in.

In the example above, on the second and third rows one can tell that 32 visitors found the website by searching for the broad match keyword “zorro zoysia grass,” and 28 for “how to lay sod.” While there are a number of keyword searches that could land on /zorro-zoysia-grass-sc (like “zorry zoysia grass”, “zoysia grass”, or “zoysia sod”), you now know that over this period of time 32 keyword searches related to “zorro zoysia” landed on your website.

This is wonderful information by itself. However, sometimes you need more than at-a-glance data. When you have hundreds or thousands of keywords and landing pages, an automated approach becomes necessary. An entire new post could be written on how to do this using Excel or Google Docs. However, if you read Richard Baxter’s post on how to do keyword research using categories, you can take a lot of the same logic and formulas to segment (not provided) traffic into keyword categories to measure the change in categorical traffic.
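For readers who prefer a script to a spreadsheet, the category roll-up might look something like the Python sketch below. It is not the Excel or Google Docs approach mentioned above, just one possible equivalent. The page-to-category mapping is something you would build by hand from your known keywords, and the "/how-to-lay-sod" URL and the home page row are assumptions for the example.

```python
from collections import defaultdict

# Hypothetical mapping from landing page to keyword category.
PAGE_CATEGORY = {
    "/": "home page (likely branded)",
    "/zorro-zoysia-grass-sc": "zorro zoysia grass",
    "/how-to-lay-sod": "how to lay sod",  # assumed URL for illustration
}


def visits_by_category(rows, page_category, default="uncategorized"):
    """Sum (not provided) visits per keyword category, keyed by landing page."""
    totals = defaultdict(int)
    for landing_page, visits in rows:
        category = page_category.get(landing_page, default)
        totals[category] += visits
    return dict(totals)


# (landing page, visits) pairs for the (not provided) keyword.
not_provided_rows = [
    ("/", 92),
    ("/zorro-zoysia-grass-sc", 32),
    ("/how-to-lay-sod", 28),
    ("/some-unmapped-page", 5),
]

category_totals = visits_by_category(not_provided_rows, PAGE_CATEGORY)
for category, visits in category_totals.items():
    print(category, visits)
```

Running the same roll-up for two date ranges and comparing the totals gives the change in categorical traffic over time.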

As Google Plus continues to grow, organic search referral data will continue to decrease. How will you adapt to the lost data of Google Analytics?

You can take this a step further by looking at Landing Pages vs. Keywords and seeing what keywords are driving traffic to each landing page.

That is to say, if I know KW A lands on Landing Page X and I know that 92 (not provided) visits landed on Landing Page X, then it's very likely that those visits came from the same keywords that were actually reported. You could then split the (not provided) visits for that landing page using the percentages from the keyword data that was reported. If 40% of the known keyword visits landing on Landing Page X are KW A, then apply that 40% to the page's total of (not provided) visits.
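A rough Python sketch of that proportional split, for a single landing page, could look like this. The keyword names and known visit counts are invented for the example; only the 92 visits and the 40% share echo the comment above.

```python
def allocate_not_provided(known_keyword_visits, not_provided_visits):
    """Split (not provided) visits across keywords in proportion to known visits."""
    total_known = sum(known_keyword_visits.values())
    if total_known == 0:
        return {}  # nothing to base a split on
    return {
        keyword: not_provided_visits * visits / total_known
        for keyword, visits in known_keyword_visits.items()
    }


# Hypothetical known keyword visits for Landing Page X (KW A is 40% of them).
known_for_page_x = {"KW A": 40, "KW B": 35, "KW C": 25}

estimate = allocate_not_provided(known_for_page_x, 92)
for keyword, visits in estimate.items():
    print(keyword, round(visits, 1))
```

This assumes logged-in searchers use the same keyword mix as everyone else, so the output is an estimate, not recovered data.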

That is a good idea as well. With (not provided) hiding keywords, you will never have a perfect fix, but it is better to be able to retrieve at least some of the data. The categories you can get from this method will give you "broad match" keyword referrals vs. "exact match" for the (not provided) results.

I think the attempt was solid but there are far too many assumptions made here for me to find any value in it. I've yet to figure out a way to get any definitive data from not provided and I have some large volume clients seeing over 20% of the data being (not provided). Very frustrating to say the least.

Really agree on this. Although the idea of using landing pages as an indicator of brand searches is nice, it's so far from reliable that it shouldn't be used. Using landing pages as a measurement is the same as saying that your home page only ranks for brand terms, and if that's the case you are most likely not exploiting the power and authority of it very well. Usually you would be trying to rank for some of your vanity terms with it as well.

I've seen quite a few posts like this myself, and they all tend towards the same flaws. They either have far too many assumptions to provide useful data, or are basically 'You can take your average keywords and presume they apply to this data'.

You can analyse how Google users tend to arrive at your site, you can still see your landing pages, etc. But any way of trying to get keyword data 'back' will either leave you with the averages of your provided keywords, or data too unreliable to be used.

That's not to say analysing this kind of data in other ways is pointless. But it's going to be far more useful if you discover an odd number of (not provided) visits going to certain pages (implying a lot of people with Google accounts going there compared to the usual) than trying to 'guess' at keywords you still have 70% of your data for.

Agreed. The underlying assumption that brand terms lead to the root url is too big an assumption. Take "seomoz seo guide" for instance. I'd class that as a branded search, but the landing page is "/beginners-guide-to-seo".

To understand whether you can make this assumption on your site or not, create an advanced segment for branded terms and then select 'Landing Page' as your primary dimension. Check out the percentage of visits that hit your root url vs deep links. On our website only 12.55% of branded terms landed on the homepage.
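The check described above can also be run outside Analytics on an export of the branded segment. Below is a minimal Python sketch; the visit counts and the "/blog" path are made-up placeholders chosen so the result matches the 12.55% quoted in the comment, and are not real data.

```python
def home_page_share(branded_rows, home_page="/"):
    """Fraction of branded-keyword visits that landed on the home page."""
    total = sum(visits for _, visits in branded_rows)
    if total == 0:
        return 0.0
    home = sum(visits for page, visits in branded_rows if page == home_page)
    return home / total


# Hypothetical (landing page, visits) rows from a branded-terms segment.
branded_rows = [
    ("/", 251),
    ("/beginners-guide-to-seo", 900),
    ("/blog", 849),
]

share = home_page_share(branded_rows)
print(f"{share:.2%} of branded visits landed on the home page")
```

A low share here suggests the home page is a poor proxy for branded search on that site.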

Indeed an awesome post, Adam. As we all know, everyone who uses GA is suffering from the (not provided) problem, and while there has been plenty of information about it, most of it was informational or predictive. This is something I can actually execute and see the results. Well, it only gives a rough estimate for differentiating branded and non-branded keywords, but it's good to have a single apple rather than waiting for the whole lot. The landing page technique is quite a smart way to figure out some rough estimates. Thanks, Adam! Would love to read more curative posts on GA from you! Thumbs up to you!

My company has 28% of its traffic coming in as (not provided). Due to Firefox 14 automatically securing all searches by default, this number is going to rise by at least another 10-15% over the next year. Pretty soon, we will have very little data to work with from GA. It will all be Webmaster Tools. It sucks.

This is a smart way of getting around Google's nasty iron curtain of hidden information. I myself have seen a few sites with up to 30% of the keywords not provided which just leaves a gaping hole of emptiness in your heart. Thanks for the tips, I will apply this right now.

I am having a hard time wrapping my head around this. For one of my sites over 30% of the views are coming from unknown referrers to over 16K landing pages making it hard to pinpoint how readers are getting to my content.

This problem has become astronomical. Now with Firefox defaulting to HTTPS, keywords are becoming a thing of the past. I logged into my Stat Centric analytics account to see that only about 50% of my keywords are coming back now. Oddly, Google Analytics showed only 1 of 39. Yup, it's that crazy. Not sure why Google is showing less, but I've looked into the issue and don't see any way around it. Google Webmaster Tools doesn't fill the gap. Without conversion and performance data, the keywords are almost useless. Segmenting by landing page is clever, but not really a replacement in my opinion.

(Not provided) data is important on the SEO side. Today, your suggestions, a good interpretation of the q (query) parameter, and a strong SEO analysis can help us discover 99% of the keywords not provided in Google Analytics.

Thanks for a good explanation of this. I've just been reviewing some stats for the first quarter of this year, and was slightly shocked/confused to find nearly 2,500 keyword referrals "not provided". Not very useful! Your solution should help make some sense of it.

But you'd think there would be a way to show the search terms while still keeping the searcher's info private...

I am slightly confused by this action by Google. Surely Google Analytics will suffer as a result?

At first I thought they would actually show the secure data in Google's own products (Analytics) and remove it for third parties, essentially forcing everyone onto their products. But it seems like they are hiding it even from themselves and will continue to do so?

I agree with Evan in that this is a fair old stab, but there's a few too many assumptions to make this definitive. It's the kind of thing that'd work reasonably well for some websites and not at all well for others - depending on everything from traffic volumes to market niche.