Briefly, Google Analytics uses sampling in much the same way a research company does: it draws conclusions about your site from a sample of activity rather than from every visit.

It only does this once the traffic on your site reaches a certain volume (around the million mark) and you start changing the standard reports (by creating segments or custom reports, or adding secondary dimensions).

It does this mostly for practical reasons: processing high volumes of data takes an enormous amount of time and computing power, so analysing a sample that still gives you a true indication of what’s going on within your site is a sensible compromise.
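To make the idea concrete, here is a minimal Python sketch of generic sample-based estimation: keep a random fraction of visits, count the ones you care about, then scale that count back up by the sampling rate. The helper `estimate_count` and the traffic mix are hypothetical, and this is not Google Analytics’ actual sampling algorithm, which isn’t described here.

```python
import random


def estimate_count(events, sample_rate, matches, seed=0):
    """Estimate how many events satisfy `matches` using a random sample.

    Hypothetical helper illustrating generic sampling, not GA's internal method.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Keep each event independently with probability `sample_rate`.
    sample = [event for event in events if rng.random() < sample_rate]
    hits = sum(1 for event in sample if matches(event))
    # Scale the sampled count back up to estimate the full-population count.
    return hits / sample_rate


# Hypothetical traffic: 20,000 email visits and 80,000 search visits.
visits = ["email"] * 20_000 + ["search"] * 80_000
estimate = estimate_count(visits, sample_rate=0.1, matches=lambda s: s == "email")
print(round(estimate))  # close to 20,000, but not exactly
```

Run it with different seeds and the estimate moves around; the smaller the slice of traffic being counted, the bigger those swings are relative to the true figure.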

Google Analytics’ approach to sampling works well for most sites. If you have a fairly straightforward site with a relatively small number of traffic sources and a regular flow of traffic, Google Analytics reports will give you a very accurate picture of your site’s performance. That’s probably enough for a smaller site.

For very big sites, however, especially those with a lot of rapidly rotating content and a complex traffic profile, the sampling methods that standard GA uses may not be adequate once you start to use segments, custom reports and secondary dimensions, and can start to produce inaccurate results.
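One way to see why segments suffer most is the textbook margin of error for a proportion, such as a conversion rate, estimated from n sampled visits: z * sqrt(p(1 - p) / n). This sketch assumes simple random sampling and roughly 95% confidence (z = 1.96); GA’s real sampling scheme may differ, and the visit counts are hypothetical. A segment only sees its own share of the sample, so its n is smaller and its margin of error is wider.

```python
import math


def margin_of_error(rate, sampled_visits, z=1.96):
    """Approximate margin of error for a rate estimated from a simple random sample.

    z=1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(rate * (1 - rate) / sampled_visits)


# Hypothetical figures: a 2% conversion rate measured across the whole sample
# versus within one segment that holds only 1% of the sampled visits.
whole_sample = margin_of_error(0.02, 200_000)
small_segment = margin_of_error(0.02, 2_000)

print(f"whole sample:  ±{whole_sample * 100:.2f} percentage points")
print(f"small segment: ±{small_segment * 100:.2f} percentage points")
```

Because the margin scales with 1/sqrt(n), a segment with 100 times fewer sampled visits has a margin ten times wider (about ±0.61 versus ±0.06 percentage points here).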

We pulled the data from one of our larger e-commerce clients to demonstrate how big a problem this can be (data has been anonymised).

The ‘s’ columns show the results from the sampled report and the ‘us’ columns show the results when the same report was unsampled. The really interesting parts are the % difference columns, which tell you the percentage by which the sampled reports were off.
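The formula behind the % difference columns isn’t spelled out; a common choice, assumed here, is the gap between the sampled and unsampled figures expressed as a percentage of the unsampled one. The rows below are made up for illustration, not taken from the client data.

```python
def percent_difference(sampled, unsampled):
    """Percentage by which a sampled figure is off from the unsampled figure.

    Positive means the sampled report overstated the metric; negative means
    it understated it. (Assumed convention, not confirmed by the article.)
    """
    if unsampled == 0:
        raise ValueError("unsampled value must be non-zero")
    # Multiply before dividing so round figures stay exact in floating point.
    return (sampled - unsampled) * 100 / unsampled


# Hypothetical rows: (traffic source, sampled visits, unsampled visits)
rows = [("organic", 1_150, 1_000), ("email", 900, 1_000)]
for source, s, us in rows:
    print(f"{source}: {percent_difference(s, us):+.1f}%")
```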

Have a quick look, and you can see that the sampling is doing a reasonable job for some traffic sources, but with others it’s all over the place.

If you’re playing around with Google Analytics and notice results shifting like this when you start to customise reports, that’s often a pretty good sign that your account is heavily sampled.

If Google Analytics ever needs to sample, a yellow box will appear in the top right of your Google Analytics report telling you so (it’ll say something like ‘this report is based on 249,385 visits, 13.46% of visits’).

If the percentage sampled is lower than 15%, the sample could be too small to give you a true picture of what’s happening on your site. You can change the size of the sample, but the free version of Google Analytics still imposes an upper limit.
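The sampling notice gives you enough to back out roughly how much traffic the report would have covered unsampled, and to check it against the 15% rule of thumb. A minimal sketch; the helper names are hypothetical, and because the percentage in the notice is rounded, the implied total is only approximate.

```python
def implied_total(sampled_visits, percent_of_visits):
    """Approximate total visits implied by a 'based on N visits, P% of visits' notice."""
    return sampled_visits * 100 / percent_of_visits


def sample_may_be_too_small(percent_of_visits, threshold=15.0):
    """Rule of thumb: below roughly 15%, the sample may not be reliable."""
    return percent_of_visits < threshold


# Figures from the example sampling notice.
total = implied_total(249_385, 13.46)
print(f"roughly {total:,.0f} visits in the full date range")
print("sample may be too small:", sample_may_be_too_small(13.46))
```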

In some ways sampling is a nice problem to have, because it means your site is seeing high volumes of traffic. But if you need a more accurate picture, there are two basic things you can do to improve the accuracy of your reports: invest in an industrial-strength analytics package (like Google Analytics Premium) or use one of these workarounds:

Use only standard reports. GA doesn’t start to sample until you modify reports with segments or secondary dimensions, or use custom reports, so if you can get away with only using standard reports, go for it.

If you need to look at just a slice of your data, create a filtered profile (for example, one that shows only email traffic). This will show you unsampled data. But of course, you will need to anticipate the need and set the profile up in advance, because a new profile only contains data collected after it is created.

If your traffic is fairly consistent, try looking at a smaller date range. The smaller the range, the larger the proportion of the data within it that will be analysed. But be careful not to pick such a small range that you miss things like seasonal trends. You can also use this method to pull the data into a spreadsheet and piece multiple periods together, though you’ll need to be careful: calculated metrics (such as conversion rates) have to be recalculated, and unique visitor counts won’t add up properly.
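When you stitch shorter date ranges back together, sum the raw counts and recompute ratios from those sums; averaging per-period rates gives the wrong answer when periods differ in size. Unique visitors can’t simply be summed, since someone who visits in two periods is counted once in each. The sketch below assumes you’ve exported visits, transactions and unique visitors per period; the field names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PeriodStats:
    """Hypothetical per-period export from a short, less heavily sampled date range."""
    visits: int
    transactions: int
    unique_visitors: int


def combine_periods(periods):
    """Merge per-period exports, recalculating the conversion rate from totals."""
    visits = sum(p.visits for p in periods)
    transactions = sum(p.transactions for p in periods)
    return {
        "visits": visits,
        "transactions": transactions,
        # Recompute the ratio from summed counts instead of averaging rates.
        "conversion_rate": transactions * 100 / visits if visits else 0.0,
        # Only an upper bound: returning visitors are counted once per period.
        "unique_visitors_upper_bound": sum(p.unique_visitors for p in periods),
    }


weeks = [PeriodStats(10_000, 200, 8_000), PeriodStats(2_000, 100, 1_500)]
combined = combine_periods(weeks)
naive_average = sum(p.transactions * 100 / p.visits for p in weeks) / len(weeks)
print(combined["conversion_rate"], naive_average)
```

Here the correct combined conversion rate is 2.5%, while averaging the two weekly rates (2% and 5%) would give a misleading 3.5%.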

If none of those things work for you, you should probably upgrade to something like Google Analytics Premium. The most common reason among our clients for using Google Analytics Premium is to get more accurate data for big or complex sites (particularly e-commerce companies).

It gives you faster data processing (in practical terms, reports are refreshed every four hours instead of every 24) and a data sampling limit around 200 times higher than that of the free Google Analytics (at the time of writing, and likely to increase), which is enough for a large e-commerce site.

Google Analytics is a great tool for small to medium websites, and even for many large sites operating in very stable environments, but a paid tool will give you the flexibility and speed needed to get a more accurate view of a high-traffic site in a fast-moving market.

Comments (9)


Nicholas Redding, Head of Web Analytics at William Hill Online

Hi Ben

That's a very interesting article, and it's good to see this limitation of the free Google Analytics highlighted. It can be a real problem when comparing small segments on a high-traffic site. Often the margin of error from sampling is greater than the difference shown between segments. Not ideal when you're comparing conversion rates!

One thing, though (and I'd love to be corrected on this). My understanding is that data sampling happens on the full data set, before any profile filters are applied. So I don't think setting up a filtered profile helps.

In a sense, you are correct. Sampling happens at the web property (not account) level, so if you customise a report by adding a segment, filter, etc., the resulting dataset will be sampled based on the data at the web property level.

The aim of creating profiles is to remove the need to customise reports, by anticipating that need and using a pre-aggregated profile instead of a custom report.

For example, if you commonly apply a segment to view mobile-only traffic, you could create a profile that contains only mobile traffic, thereby removing the need to apply a segment and invoke sampling.

Of course, it is impossible to anticipate every need like this, so it only helps with the customisations you use regularly.

Actually, you are trying to measure a lot of sources (variables) using a sample, so it's quite obvious that the results are inaccurate.
GA samples, let's say, the first xxx rows of data, or rather a random xxx rows of data collected in a period, so the results it displays reflect that sample.
If you measure two variables on a sample of 1,000 rows, the result will obviously be more accurate than if you measure ten variables, because the sample for each variable is five times smaller.
The deeper you take the analysis, the more inaccurate it will be.
It seems an oxymoron, but it is sadly true.

This is a very interesting comment. The limits of traffic sampling are easily reached when your traffic and conversions vary with, for example, the time of day or location. Depending on the reports you run, you could indeed easily end up mixing carrots and apples.
There are alternatives that provide analysis on all the data. At CANDDi we provide real-time analytics that go down to the individual visitor level, allowing full analysis of unsampled data as well as real-time intervention on the website based on the visitor.

Ander Jáuregui

Great post.

Also, you should check that the e-commerce tracking code is correct and appears on every single "success" page, and verify that your CRM and GA use the same time zone, among a thousand other factors that can cause these differences. And you must know that GA will improve the sampling %.

I've been looking at this issue this week, and it's very useful to see your direct comparison between full and sampled data. I think it's a shame that some data gets so heavily sampled, but how else are Google going to sell their Premium product, other than by removing data from the free one? We can't expect everything for free, and I'm quite excited about Premium.

I think Google holds a lot of information back from us quite frequently. If they didn't they wouldn't be as powerful as they are. I tend to use Omniture whenever Google Analytics doesn't give me the information I need.

@Anna, it's a common misconception that sampling is new since Premium launched. In fact, it has been there for ages; Google have just made it more obvious when sampling is invoked, so more people notice it.

@Marc - thanks for the comment, I'm always looking for examples of areas other tools win on. Care to share any of the uses you have for Omniture over GA?
