How to Remove Google Analytics Spam (and What NOT to Do)

Ghost spam is a problem for many Google Analytics users. Website owners who aren’t taking steps to fix it may be making decisions based on skewed data. What started as unusual spikes in referral traffic has evolved to include spam in direct, organic and referral traffic reports. A 2014 study by Incapsula found that 56% of website traffic comes from bots, which is not good news. If that much of your traffic could be spam, how do you identify it and, more importantly, how do you remove it from your data?

Removing Spam in Google Analytics

There are three types of Google Analytics spam, and each requires a different solution. Some methods are as easy as checking a box, but others require proper research before implementation. If not implemented correctly, you may inadvertently exclude valid traffic from your reports.

Problem #1: Bots and Spiders

Some bots and spiders are fairly simple to exclude from your reports. These are good, or “well-behaved,” bots that you may knowingly allow to crawl your site. An example of a good bot is Screaming Frog.

Solution: Update Your Analytics Settings

There is a setting in Google Analytics at the View level to “Exclude all hits from known bots and spiders.” All you need to do is check the box, but you must do this for every View you have set up.

Problem #2: Ghost Referrals

Ghost referrals show up in your traffic reports, but they are not true site visits. These referrers never actually visit your site; instead, they post fake hits to Google’s servers using your tracking ID. Because the hits never reach your site, you cannot block them on your own server.

Solution: Add Include Filters for Valid Hostnames

The best way to exclude this traffic from your Google Analytics account is to add a hostname filter that includes only your valid traffic. The first step is to determine your list of valid hostnames. Valid hostnames include every place where you have installed your Analytics tracking code.

What NOT to Do

Don’t assume you have a simple analytics setup and include only your main website domain. You could exclude valid traffic if you miss a hostname. If you have traffic from the hostname (not set), you need to investigate. This could be spam, but if it’s tied to revenue or conversion data, it may be a sign that your tracking code isn’t set up correctly. Check with your marketing and IT departments to ensure you don’t leave something out. Let’s say your email marketing team is using MailChimp and has integrated it with Google Analytics to record traffic from newsletter archive and subscription preference pages. These pages are hosted by MailChimp, so the hostname is going to be different from your main domain.
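One way to start that investigation is to tally sessions by hostname from an exported report and flag anything you don’t recognize. The sketch below is only an illustration: the rows, the (hostname, sessions) layout and the function names are hypothetical, not a Google Analytics export format.

```python
from collections import Counter

# Hypothetical rows copied from a hostname report: (hostname, sessions).
ROWS = [
    ("www.example.com", 1200),
    ("shop.example.com", 300),
    ("(not set)", 45),
    ("newsletter-host.example", 80),
    ("spam-site.example", 60),
]


def sessions_by_hostname(rows):
    """Sum sessions per hostname, ignoring letter case."""
    totals = Counter()
    for hostname, sessions in rows:
        totals[hostname.lower()] += sessions
    return totals


def hostnames_to_review(totals, known_roots):
    """Return hostnames that are not a known root domain or one of its subdomains."""
    return sorted(
        host
        for host in totals
        if not any(host == root or host.endswith("." + root) for root in known_roots)
    )


if __name__ == "__main__":
    totals = sessions_by_hostname(ROWS)
    for host in hostnames_to_review(totals, ["example.com"]):
        print(host, totals[host])
```

Anything this flags still needs a human decision: a third-party host such as a newsletter platform belongs on the valid list, while an unknown domain is a spam candidate.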

Once you have a list of valid hostnames, create an include filter

For multiple hostnames, use a regular expression and separate the hostnames with a pipe (|)

Don’t worry about adding subdomains; stick to the root-level domain
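The steps above can be sketched in a few lines. This example builds a pipe-separated pattern from a list of root domains and checks it against sample hostnames before you paste it into a filter. The domains are placeholders, and the check assumes the filter matches the pattern anywhere in the hostname (an unanchored match), which is why subdomains pass without being listed.

```python
import re

# Hypothetical root domains where the tracking code is installed.
VALID_ROOT_DOMAINS = ["example.com", "newsletter-host.example"]


def build_hostname_pattern(root_domains):
    """Join root domains with a pipe, escaping dots so they match literally."""
    return "|".join(domain.replace(".", r"\.") for domain in root_domains)


HOSTNAME_PATTERN = build_hostname_pattern(VALID_ROOT_DOMAINS)


def is_valid_hostname(hostname):
    """Unanchored search, so www.example.com matches the example.com entry."""
    return re.search(HOSTNAME_PATTERN, hostname, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(HOSTNAME_PATTERN)
    for host in ["www.example.com", "shop.example.com", "(not set)", "spam-site.example"]:
        print(host, is_valid_hostname(host))
```

Testing the pattern against hostnames you know are valid is a cheap way to catch a missed domain before the filter starts discarding real traffic.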

Problem #3: Spam Web Crawlers

While ghost spam doesn’t actually visit your site, some spam web crawlers do. Never visit a site that you believe to be spam, because you could get a virus.

Solution: Add Exclude Filters for Spam Referrals

The valid hostname filter will not work for this type of spam. Instead, you’ll need to create exclude filters. To verify that a site is indeed spam, Google it. You can usually find others who have identified the site as spam just by skimming the first page of search results.
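As a rough sketch, the exclude pattern can be built the same way as any other pipe-separated regular expression and tested against sample referral sources first. The spam domains below are made-up placeholders, not real offenders, and the helper names are hypothetical.

```python
import re

# Hypothetical spam referrers, each one confirmed as spam by a search first.
SPAM_REFERRERS = ["spam-crawler.example", "free-traffic.example"]

# Escape dots so they match literally, then join the domains with a pipe.
SPAM_PATTERN = "|".join(domain.replace(".", r"\.") for domain in SPAM_REFERRERS)


def is_spam_referral(source):
    """Return True when the referral source matches any listed spam domain."""
    return re.search(SPAM_PATTERN, source, re.IGNORECASE) is not None


if __name__ == "__main__":
    for source in ["spam-crawler.example", "www.free-traffic.example", "google.com"]:
        print(source, is_spam_referral(source))
```

Keeping the list of domains in one place makes it easier to regenerate the pattern as new spam sources appear.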

Segmenting Spam from Historical Reports

All of the methods outlined above affect only the data collected after implementation. But you wouldn’t be reading this article if some of your data weren’t already compromised, so we also need to talk about cleaning up your historical reports. The best way to do this is by using a Segment.

Create and name a new segment

Navigate to Advanced > Conditions

Include your valid hostnames

Exclude your spam referral sources
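The logic of such a segment is simply "hostname is valid AND source is not spam." If you work with exported data, the same two conditions can be applied outside of Analytics. The sketch below assumes hypothetical rows with hostname, source and sessions keys and uses placeholder patterns; it is not an Analytics API call.

```python
import re

# Placeholder patterns; substitute your own hostname and spam source lists.
VALID_HOSTNAMES = r"example\.com"
SPAM_SOURCES = r"spam-crawler\.example|free-traffic\.example"


def clean_rows(rows):
    """Keep rows with a valid hostname whose source is not a listed spam referrer."""
    return [
        row
        for row in rows
        if re.search(VALID_HOSTNAMES, row["hostname"], re.IGNORECASE)
        and not re.search(SPAM_SOURCES, row["source"], re.IGNORECASE)
    ]


if __name__ == "__main__":
    # Hypothetical exported rows.
    rows = [
        {"hostname": "www.example.com", "source": "google", "sessions": 900},
        {"hostname": "www.example.com", "source": "spam-crawler.example", "sessions": 40},
        {"hostname": "(not set)", "source": "free-traffic.example", "sessions": 75},
    ]
    kept = clean_rows(rows)
    print(sum(row["sessions"] for row in kept), "sessions kept")
```

Both conditions are needed: the hostname check removes ghost hits, while the source check removes crawlers that really did load your pages.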

A Note on Unfiltered and Test Views in Analytics

Once you filter data out of Google Analytics, it cannot be restored. Best practice is to always have a View of unfiltered raw data in your analytics profile. It’s also a great idea to have a Test View in place to experiment with filters and see how they impact your data before altering your main data View.

Why Do We Have Spam in the First Place?

Why is spam even an issue? Some offenders are trying to get you to visit their site. Some are bots that are scraping your site for content. Others are just trying to mess with us and make us pull our hair out.

Wouldn’t it be a lot easier on everyone if Google had a solution in place for this? The good news is that they are working on it. When it will be solved is unknown, so for now we’re forced to follow these steps and continually update our filters to ensure accurate data.