Improving data quality in Google Analytics

This article is a brief run-through of some of the steps we take on the Google Analytics accounts we work with to tidy up the data and make analysis that little bit easier.

I’ll cover:

Tidying up content reports

Tidying up referrals

Reducing upper/lowercase inconsistencies

Reducing unwanted referrals

…and I’ll end with some final (important) notes.

One thing to note - this article is aimed at people who work in the cultural sector, at museums and performing arts organisations. However, many of the points apply more broadly too.

1. Tidying up content reports

Excluding query parameters

If you go to the ‘All Pages’ report (under Behaviour, Site Content) and put a ? into the search box, you may see a list of URLs with all sorts of additional stuff added on the end. Some of this will be useful - often for showing which filters have been used on a listings page, or to show pagination.

However, ID numbers added to URLs by email service providers (Mailchimp, Dotmailer, etc), social media management tools (Hootsuite and so on), or Facebook Ads are unlikely to help you analyse your content. Instead, you end up with metrics for the same URL spread over several rows.

Happily, Google Analytics allows us to remove these query parameters.

In the admin area, go to View Settings and paste the following list into the ‘Exclude URL Query Parameters’ box:

If you see anything else in your own reports that you’d like to remove then you can add it to the list.
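If you'd like to see the effect before changing any settings, here is a minimal Python sketch of what excluding query parameters does to a set of page URLs. The parameter names (`mc_cid`, `mc_eid`, `fbclid`) are illustrative examples of tracking IDs added by Mailchimp and Facebook, not a definitive list, and the `strip_params` helper and sample URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Illustrative tracking parameters only (an assumption, not a complete list).
EXCLUDED_PARAMS = {"mc_cid", "mc_eid", "fbclid"}


def strip_params(url, excluded=EXCLUDED_PARAMS):
    """Return the URL with any excluded query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in excluded
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical rows from an 'All Pages' export.
urls = [
    "/whats-on?page=2&fbclid=abc123",
    "/whats-on?page=2&mc_cid=111&mc_eid=222",
    "/whats-on?page=2",
]
cleaned = [strip_params(u) for u in urls]
print(cleaned)
```

The useful parameter (`page`) survives, while the three rows collapse into a single URL, which is the tidying effect you should then see in the content reports.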

2. Tidying up referrals

Google Analytics will take a traffic source and categorise it as best as it can. However, the list of rules isn’t exhaustive and you may see things that are categorised as referrals that you would rather were categorised another way.

Recategorise search engine referrals

There are two ways of dealing with search engines that show up in your referrals.

The first is to add them to the ‘Organic Search Sources’ list. Go to the admin area and then click ‘Tracking Info’ to find this.

You can then add to GA’s list of default search engines. Here are some to get you started…

Some search engines can’t be dealt with in this way. For those, we recommend using a filter.

Head back to the admin page and, in the right-hand column, click on Filters. Click on ‘+Add Filter’ and copy our example here…

In the ‘Campaign Source’ field, put the following:

^(duckduckgo\.com|searchlock\.com|uk\.search\.com|search\.aol\.com)$

You can add additional ones to this list.
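It's worth testing a pattern like this against a few hostnames before saving the filter. The sketch below uses Python's `re` module as a stand-in for Google Analytics' own regex engine (an assumption, though the two agree for simple patterns like this one). It compares a pattern whose alternatives are wrapped in a group with one whose alternatives are not. The sample hostnames are made up.

```python
import re

# Grouped: ^ and $ apply to every alternative in the list.
GROUPED = re.compile(
    r"^(duckduckgo\.com|searchlock\.com|uk\.search\.com|search\.aol\.com)$"
)

# Ungrouped: ^ only binds to the first alternative and $ only to the last,
# so the middle ones can match anywhere inside a longer hostname.
UNGROUPED = re.compile(
    r"^duckduckgo\.com|searchlock\.com|uk\.search\.com|search\.aol\.com$"
)


def is_search_engine(source, pattern=GROUPED):
    """True if the campaign source matches the search engine pattern."""
    return pattern.search(source) is not None


# Hypothetical campaign sources to check.
samples = ["duckduckgo.com", "search.aol.com", "notsearchlock.com.example.org"]
for source in samples:
    print(source, is_search_engine(source), is_search_engine(source, UNGROUPED))
```

With the group in place, only exact hostnames match. Without it, `notsearchlock.com.example.org` would also be treated as a search engine.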

Recategorise webmail referrals

Google Analytics treats traffic from webmail providers such as Gmail, Hotmail, and Yahoo Mail as referrals. I’d rather categorise them as email.

Again, this list can be added to if you find anything else in your referrals.

Group together referrals from Facebook

You may also notice that traffic from Facebook shows up under several names. To make this easier to understand, we group these together using another filter. This one looks like this…

The full pattern to enter under ‘Search String’ is as follows:

^(en-gb|m|lm|mobile|l|web|touch)\.facebook\.com$

You can do the same with Instagram too.
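To sanity-check the grouping, here is a small Python sketch of the same search-and-replace idea. It assumes the filter rewrites every matching source to `facebook.com` (the replacement value is my assumption), and it escapes both dots so that they only match a literal full stop. The session counts are invented.

```python
import re
from collections import Counter

# Subdomain variants to fold into one source; both dots are escaped.
FACEBOOK_VARIANTS = re.compile(r"^(en-gb|m|lm|mobile|l|web|touch)\.facebook\.com$")


def normalise_source(source):
    """Collapse Facebook subdomain variants into a single source name."""
    return FACEBOOK_VARIANTS.sub("facebook.com", source)


# Hypothetical (source, sessions) rows from a referrals report.
rows = [
    ("m.facebook.com", 50),
    ("l.facebook.com", 30),
    ("facebook.com", 20),
    ("bbc.co.uk", 5),
]

totals = Counter()
for source, sessions in rows:
    totals[normalise_source(source)] += sessions

print(dict(totals))
```

Swapping `facebook` for `instagram` and adjusting the list of subdomains gives you a starting point for the Instagram version.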

3. Reducing upper/lowercase inconsistencies

Google Analytics reports are case-sensitive. If you have the same word written in upper and lowercase, these will appear on separate rows, meaning you often have to add them together.

This can be annoying, so to cut down on this, we use filters to ensure that the following dimensions (which could potentially involve human input) always show as lowercase in Google Analytics:

Request URI

Search Term

Campaign Source

Campaign Medium

Campaign Name

You could also do this with Event Category, Event Action, and Event Label.

Here’s an example of one of these filters…
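To illustrate what a lowercase filter saves you from doing by hand, here is a rough Python sketch that merges case variants of the same campaign source. The rows and numbers are invented, and `merge_case_variants` is a hypothetical helper rather than anything Google Analytics provides.

```python
from collections import Counter

# Hypothetical (campaign source, sessions) rows as they might appear unfiltered.
rows = [
    ("Newsletter", 120),
    ("newsletter", 80),
    ("NEWSLETTER", 15),
    ("facebook", 40),
]


def merge_case_variants(rows):
    """Add together rows that differ only by upper/lowercase."""
    totals = Counter()
    for name, sessions in rows:
        totals[name.lower()] += sessions
    return dict(totals)


print(merge_case_variants(rows))
```

A lowercase filter does this at collection time, so the three newsletter rows arrive in your reports as one.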

4. Reducing unwanted referrals

As part of their user journey on your website, a person might sometimes be directed to another external domain. This would be the case if you’re using:

payment providers, such as PayPal, SagePay, or WorldPay

waiting room solutions, such as Queue-It

online shop or ticketing providers, such as Shopify or Spektrix

If these domains are showing up in your referrals, they are probably taking all the credit for any transactions, harming any analysis of which channels and campaigns are actually sending you valuable traffic.

You can tell Google Analytics to ignore these types of third-party domains by adding them to the ‘Referral Exclusion List’. You’ll find this in the admin area. Click ‘Tracking Info’ in the middle column and you should see it. Here’s an example…

You may also see some voucher code websites showing up here. Some of these use (what I’ll diplomatically call) aggressive tactics to attract users. If you’re not using vouchers and coupons in this way and want to ignore traffic from these domains then you can add them to the list.

Please note: If you’re seeing referrals from your own website or ticketing providers then there might be a bigger problem to address and we’d recommend you get in touch with us to fix that.
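If you export your referrals report, a short script can help you spot candidates for the exclusion list. This is only an auditing sketch. The domains below are illustrative guesses based on the providers mentioned above, so check them against your own reports. The `is_excluded` helper is hypothetical and is not a description of how Google Analytics itself matches domains.

```python
# Illustrative third-party domains (assumptions; replace with your own).
EXCLUDED_DOMAINS = ["paypal.com", "sagepay.com", "worldpay.com", "queue-it.net"]


def is_excluded(hostname, domains=EXCLUDED_DOMAINS):
    """True if the hostname is one of the domains or a subdomain of one."""
    host = hostname.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in domains)


# Hypothetical hostnames from a referrals export.
referrals = ["www.paypal.com", "paypal.com", "bbc.co.uk", "notpaypal.com"]
flagged = [r for r in referrals if is_excluded(r)]
print(flagged)
```

Matching on the full domain or a dot-separated subdomain avoids flagging lookalikes such as `notpaypal.com`.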

A few final notes

Data quality isn’t a one-and-done type of job. It’s like weeding a garden and is something that’s best done little and often.

Also, this article is intended as a starting point. It’s likely that you will have slightly different requirements for making your own data easier to work with, so please build on the suggestions above.

We also recommend a degree of caution, especially when setting up filters. Always keep a View that has no filters applied so you can refer back to the raw data. Just in case.

You may want to set up a new View to use for testing filters. After a few days, if it looks like they’re working as intended then you can replicate them in your main reporting View.