How to Filter Referral Spam in Google Analytics

Editor’s note: “Ask an SEO” is a weekly column by technical SEO expert Jenny Halasz. Come up with your hardest SEO question and fill out our form. You might see your answer in the next #AskanSEO post!

This week’s question is from Kenneth F., and it’s one of the most common analytics questions I hear.

How can I filter out spammy referral traffic to my site? I heard Google started filtering them out but still see them. –Kenneth F. via Twitter

If you’re just getting started, I suggest checking out the Google Analytics Solutions Gallery and searching for “referrer spam”, “block bots”, or similar terms. You’ll discover some great resources there.

But the real answer to this question is a lot bigger, and it has many parts:

Understand what you’re dealing with. It’s not just bots.

Filter wisely: Set up a separate view.

Block Bots in Analytics.

Discover referrers manually.

Create a “bad referrer” filter.

Block bad bots from the website. Carefully.

1. Bots Aren’t Bad, They’re Just Drawn That Way

Not all bots are bad.

Many, like Googlebot and Bingbot, make our search world go ‘round. Plenty of other bots belong to companies like Screaming Frog, Deep Crawl, and SpyFu. These bots are respectful (not dangerous) to the sites they crawl, and not bad for your visitors.

The bots you want to block are those that seek to hijack your traffic, find loopholes in your CMS to exploit for hacking, and scrape your content for their own nefarious purposes. Depending on what industry you are in, some forms of bot traffic may be worse than others.

But it’s not only bots you should be worried about. Plenty of referral sources send lots of traffic your way that you may not want muddying the waters of your data.

2. Filter Analytics Traffic Wisely

When you’re just getting started, you should be fully aware of what is being taken out of your data set. To understand that, you have to compare.

What I recommend to clients is creating a separate view in Analytics and naming it something like “bot traffic filtered”. To do this:

Click on “Admin”.

Then, in the right column under “View”, click on the drop-down menu.

Select “Create new view”.

On the next screen, be sure you set your time zone to what’s appropriate; Google defaults to Pacific Time. If you forget this step, you won’t be comparing apples to apples in your new view.

Creating a new view in Google Analytics

3. Block Bots in Analytics

Google gives you an “easy button” to block known bots. This will eliminate 75 to 80 percent of your work (vs. doing this manually) and it’s regularly updated as Google finds new bots.

For your new view only, select the “view settings” option and click the checkbox to “Exclude all hits from known bots and spiders” as shown below:

How to Filter Bots in Google Analytics

This will give you a clear picture of what will happen to your traffic once you turn on the bot filtering. You can make sure that none of your important traffic sources are on Google’s list of known bots (they make mistakes occasionally), and you’ll be able to prepare other people who view your analytics for the change if/when you decide to roll it out to the main profile.
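One way to make that comparison concrete is to export sessions by source from both views for the same date range and diff them. The sketch below is only an illustration: the function name, the source names, and the session counts are all made up, and it assumes you have already turned each export into a simple source-to-sessions mapping.

```python
def removed_by_filter(unfiltered, filtered):
    """Return {source: sessions_removed} for sources that lost sessions."""
    removed = {}
    for source, sessions in unfiltered.items():
        lost = sessions - filtered.get(source, 0)
        if lost > 0:
            removed[source] = lost
    return removed


# Hypothetical session counts per source, same date range in both views.
all_traffic = {"google": 1200, "newsletter": 300, "bot-crawler.example": 450}
bot_filtered = {"google": 1195, "newsletter": 300}

report = removed_by_filter(all_traffic, bot_filtered)

# Biggest losses first, so important sources that got hit stand out.
for source, lost in sorted(report.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{source}: {lost} sessions removed")
```

If a source you care about shows up near the top of that list, that is your cue to investigate before rolling the setting out more widely.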

If/when you decide to roll it out to the main profile, help yourself and everyone else out by adding an annotation to explain any changes. For example: “Started Filtering Bot Traffic”.

To add an annotation, simply click on the little arrow under any analytics chart in Google Analytics and follow the simple instructions:

Creating an annotation in Google Analytics

4. Add Spam Referrers Manually

No matter how good Google’s bot filtering system gets, there will inevitably be other referrers that send high volumes of low or no quality traffic to your site.

To spot these, open the referrer report in Google Analytics:

Viewing referral traffic in Google Analytics

Sort the data descending by bounce rate, so you bring the 100 percent bounce rate to the top.

Finally, filter the data by using the advanced filter to only show a number of sessions over a certain threshold. This will vary according to your traffic volume; I used 50 for this example.

Now you can scroll through the list and find sites you may want to add to your referral exclusion list. I say “may” because you need to check with other stakeholders in your company to make sure none of these are just a failed advertising attempt. This is another reason why you should test this in a separate view first.
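If you prefer to work from an exported copy of the referral report, the same sort-and-threshold steps can be sketched in a few lines of Python. This is a rough illustration, not a Google Analytics API call: the row format, the function name, and every source except the linkbabes example are hypothetical.

```python
def review_candidates(rows, min_sessions=50):
    """Keep rows over the session threshold, highest bounce rate first."""
    busy = [row for row in rows if row["sessions"] > min_sessions]
    # Sort by bounce rate (descending), then by sessions (descending).
    return sorted(busy, key=lambda row: (-row["bounce_rate"], -row["sessions"]))


# Hypothetical rows from an exported referral report.
rows = [
    {"source": "af401e8c.linkbabes.com", "sessions": 120, "bounce_rate": 100.0},
    {"source": "partner.example", "sessions": 80, "bounce_rate": 42.5},
    {"source": "tiny-spam.example", "sessions": 3, "bounce_rate": 100.0},
    {"source": "ads.example", "sessions": 200, "bounce_rate": 100.0},
]

candidates = review_candidates(rows)
for row in candidates:
    print(row["source"], row["sessions"], row["bounce_rate"])
```

The output is only a list of candidates to review with your stakeholders, not a list of confirmed spam.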

Once you have your list of sites to filter, cut them down to just the root domain (the registered domain name, without any subdomain). For example, af401e8c.linkbabes.com is probably a specific affiliate of linkbabes.com. So it’s better to just add linkbabes.com to your potential referral exclusion list.
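Trimming each hostname down to its main domain is easy to get wrong by hand on a long list, so here is a minimal sketch of the idea. It makes a loud assumption: it simply keeps the last two labels of the hostname, which works for names like linkbabes.com but gives the wrong answer for multi-part suffixes such as .co.uk. The function names and the spam.example hostname are hypothetical.

```python
def root_domain(hostname):
    """Naive trim: keep the last two dot-separated labels.

    Assumption: the suffix is a single label (.com, .net, ...).
    Hostnames under suffixes like .co.uk need manual handling.
    """
    labels = hostname.lower().strip(".").split(".")
    return ".".join(labels[-2:])


def unique_root_domains(hostnames):
    """Collapse a list of referrer hostnames into sorted, de-duplicated domains."""
    return sorted({root_domain(h) for h in hostnames})


domains = unique_root_domains(
    ["af401e8c.linkbabes.com", "linkbabes.com", "WWW.Spam.example"]
)
print(domains)
```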

By the way, this isn’t for the faint of heart. You may find some risqué websites in these lists. I strongly recommend you do not visit any of them to “check them out” or you may find yourself the recipient of some unwanted malware or spyware.

Once your list is fully vetted and you’re sure you won’t be blocking any important traffic that someone else in your organization wants to see, go ahead and create a custom referrer filter.

5. Create a Bad Referrer Filter

Once you have a list of bad referrers that you want to block, create a new filter in the view you set up earlier specifically for “bad referrers”. Be sure to do this in the view screen (the one on the far right under admin) and not at the account level!

To set up the filter:

Select “Admin”.

Under “View”, select “Filters”.

Click on “Add Filter” and give the filter a name.

Click on “Custom” and “Exclude”.

Select “Campaign Source” as your “Filter Field” and enter the domains you want to exclude in the box.

As Carlos Escalera of Ohow.co explains, if you use “Referral” as the filter field, the filter won’t have an effect on spam, since those hits usually don’t carry an HTTP Referer value. If you use “Campaign Source”, the spam traffic will be filtered out whether or not the hit has an HTTP Referer value.

Do this in a notepad or Word doc first and then paste it in; it’s too easy to mess something up using this tiny little box.

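To enter multiple domains, use regular expressions. Use a backslash “\” to escape the “.” in “.com” (so it is treated as a literal dot rather than as “any character”), and separate multiple domains with a pipe “|”. For example, linkbabes\.com matches only the literal domain.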

Be sure to test your filter, and update it frequently as you find new domains to exclude.
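Building the pattern by hand is where typos creep in, so you may want to generate it and try it against a few sources before pasting it in. The sketch below escapes each dot with a backslash, joins the domains with a pipe, and then uses Python’s re module as a stand-in for the Analytics filter. That stand-in is an assumption: the two regex engines are not identical, though for a simple pattern like this they should agree. The helper name and the spam.example domain are hypothetical.

```python
import re


def build_exclude_pattern(domains):
    """Escape the dots in each domain and join the domains with a pipe."""
    cleaned = [d.strip().lower() for d in domains if d.strip()]
    return "|".join(d.replace(".", r"\.") for d in cleaned)


pattern = build_exclude_pattern(["linkbabes.com", "spam.example"])
print(pattern)

# Quick check of which sources the pattern would catch.
for source in ["af401e8c.linkbabes.com", "spam.example", "google"]:
    print(source, "excluded" if re.search(pattern, source) else "kept")
```

Because the pattern is a partial match, adding linkbabes\.com also catches subdomains such as af401e8c.linkbabes.com.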

6. Block Bad Bots from Your Website

This one isn’t for beginners. It involves editing .htaccess (on Apache) or web.config (on IIS), which is the backbone of your entire site.

One wrong character can bring your entire site down. So make a backup copy, make sure you have access directly to your server (through WordPress doesn’t count) and tread lightly and carefully.

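Disclaimer aside, the .htaccess file is a powerful tool at your disposal: for bad or high-volume bot traffic, you can block it from accessing your server entirely. On Apache 2.2 (or Apache 2.4 with mod_access_compat enabled; plain 2.4 uses Require directives instead), the directives to use are:

RewriteEngine On
Options +FollowSymLinks
# Evaluate Allow rules first, then Deny; a matching Deny wins
Order Allow,Deny
Allow from all
Deny from 123.45.67.89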

You will have to integrate this code into your existing .htaccess file, so don’t just copy/paste it. Remember, one wrong character, and it’s lights out.
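Since a single mistyped character is the main risk here, one cautious habit is to validate every IP address before it goes anywhere near the file. The following sketch uses Python’s standard ipaddress module to do that and prints the resulting lines for you to paste in by hand. The function name is hypothetical, and the script deliberately does not touch your .htaccess file.

```python
import ipaddress


def deny_lines(addresses):
    """Validate each address and return one 'Deny from' line per IP.

    ipaddress.ip_address raises ValueError on anything malformed,
    so a typo stops the script instead of reaching the server.
    """
    lines = []
    for raw in addresses:
        ip = ipaddress.ip_address(raw.strip())
        lines.append(f"Deny from {ip}")
    return lines


for line in deny_lines(["123.45.67.89"]):
    print(line)
```

A typo such as 123.45.67.890 raises an error here, which is a much cheaper place to find it than on a live site.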

This is an effective way to block bot traffic that is placing a high load on your server, but it shouldn’t be used for just anyone. The longer this list gets, the more load it puts on your server, and the more it can actually slow your site down.

So don’t use it to block former employees (go ahead and laugh, this actually happened to me!) and remember that IP addresses change. If you’re having a serious security issue, contact your web host or system administrator for help.

The effect of blocking bad bot traffic at the server level is two-fold. It will help reduce load on your server, and it will also take these visits out of Analytics, because blocked requests never load your pages, so your tracking code never runs for them.

Key Takeaways

Bots and referrers are different, but have the same effects: slowing down your server and muddying your analytics data.

You can block bots and referrers by IP address or by domain, depending on what blocking solution you choose.

You can block bots and referrers in .htaccess or web.config, or you can filter out their traffic in Analytics, either with Google’s built-in bot filtering or with a custom filter.

Be careful about what you filter and make sure other stakeholders know what you’re up to. Don’t filter at the account level; you always want one view that has all traffic just in case.

Annotate, label, and inform as much as possible about changes you make and the dates that you make them.