Are ghost referrals haunting your analytics data? Are spam referrers causing you to miscalculate the performance of your referral traffic? Fear not, for you are not alone. If you’re tracking your website traffic with an analytics tool, you are pretty much guaranteed to end up with spammy/fake traffic in your monthly analytics reports. In this article, we’ll learn more about ghost referrals, how they affect your analytics data, and how to filter spam traffic out of your data and monthly reports.

What are Ghost Referrals?

Ghost referrals are visits to your website that technically do not exist. These referrals will show up in your Google Analytics reports, even though these visits aren’t real. It’s important to identify these visits, remove them from your reports, and develop a strategy for dealing with these ghost referrals before they haunt your monthly reports for years to come and cause you to make erroneous content decisions.

Why do Ghost Referrals Exist?

Ghost referrals are a form of spam that comes from spam referrers. Spam referrers are people, websites or services that send unwanted traffic to websites. This unwanted traffic is blasted out in volume in an attempt to help sell a product or service. The response rate for this tactic is typically low because the message is poorly crafted and the positioning and targeting are too broad. To make up for that low response rate, spammers broadcast their message to as many people as possible.

What does this have to do with ghost referrals? One way to spam a massive number of site owners is to place “ads” somewhere every site owner will look: their analytics data. Ghost referrals use the Google Analytics Measurement Protocol to send large volumes of fake hits straight to Google’s servers, tagged with a site’s tracking ID, without ever loading the site itself. I won’t go into much detail on how this is accomplished, but in a nutshell, spammers send fake visits to random analytics tracking IDs. There are a lot of methods for harvesting lists of tracking IDs, but it’s just as easy to guess them. Think of ghost referrals as prizes from really crappy lotteries.

The “why” behind ghost referrals is shockingly simple. These spammers are trying to sell you junk. In their ideal scenario, a site owner sees a visit from a URL like “premium-junk-lol.com” in their analytics data. The site owner becomes curious, visits the premium junk purveyor, becomes interested in said junk, and ultimately buys some premium junk of their own. In the best case, the site owner buys snake oil and nothing happens. In the worst case, the site owner visits the site and ends up with a trojan or other malware. Key takeaway: do not visit the ghosts!

How do Ghost Referrals Affect You?

The effects of ghost referrals are subtle, so it’s easy to overlook how they affect your website or business. These fake visits pollute your analytics data, specifically your referral data. Most ghost referrals produce a session with a single pageview and a near-zero session duration. This distorts your referral traffic data by showing a bounce rate that is higher than reality and an average session duration that is shorter than reality. As a business owner or marketer, you likely use multi-channel reports to compare how different sources of traffic perform. If ghost referrals are haunting your data, then your comparison, analysis, and recommended strategy are all flawed.
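
To make the distortion concrete, here is a small worked example. The session counts below are made up purely for illustration, and the sketch assumes every ghost session is a bounce with zero duration; it simply blends real and ghost sessions and recomputes bounce rate and average session duration.

```python
from dataclasses import dataclass

@dataclass
class Sessions:
    count: int               # number of sessions
    bounces: int             # single-pageview sessions
    total_duration_s: float  # summed session duration in seconds

def bounce_rate(s: Sessions) -> float:
    return s.bounces / s.count

def avg_duration(s: Sessions) -> float:
    return s.total_duration_s / s.count

def combine(a: Sessions, b: Sessions) -> Sessions:
    # What the report shows when real and ghost sessions are lumped together.
    return Sessions(a.count + b.count, a.bounces + b.bounces,
                    a.total_duration_s + b.total_duration_s)

# Hypothetical numbers, not data from any real site.
real = Sessions(count=400, bounces=160, total_duration_s=400 * 120.0)  # 40% bounce, 2-minute average
ghosts = Sessions(count=600, bounces=600, total_duration_s=0.0)         # every ghost bounces at 0 s

reported = combine(real, ghosts)
print(f"real:     bounce {bounce_rate(real):.0%}, avg {avg_duration(real):.0f} s")
print(f"reported: bounce {bounce_rate(reported):.0%}, avg {avg_duration(reported):.0f} s")
```

With these assumed numbers, the referral channel's bounce rate jumps from 40% to 76% and its average session falls from 120 seconds to 48, even though not a single real visitor behaved differently.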

I’ve seen polluted referral data lead to a lot of flawed approaches. The most common misstep I’ve seen is business owners or marketers cutting budgets for social media and content marketing, then seeing a significant drop in conversion volume. That said, I understand how easy it is to be the victim of this sort of deception. Large volumes of ghost referrals tank performance on your referral traffic and make the channel appear worthless. Given data like that, it seems sensible to shift budget to more effective channels/campaigns. Once you’ve shifted budgets, the damage is done, and it can be very difficult to figure out where things went wrong.

Remove Ghost Referrals & Clean Up Your Analytics Data

Common Myths & Mistakes

If the last part scared you, there’s no need to run. I have some good news: ghost referrals are, for the most part, completely harmless. In this situation, the only thing to fear is fear itself. First, let’s dispel some common myths:

Ghost Referrals Do Not Hurt Organic Rankings – contrary to popular belief, spam referrals will not hurt your search engine rankings. Google does not use spam referrals as a ranking factor and is fully aware of this problem. In fact, it even includes a feature in Google Analytics to help filter spam (more on this later).

Ghost Referrals Do Not Mean Your Site is Hacked – this kind of spam is not a sign that your site has been hacked. It’s important to keep in mind that this traffic really doesn’t exist. Users aren’t really flooding to your website from an online seller of floating share buttons. The whole point of the referral is to get you to notice their URL, and visit their site.

Using .htaccess to Block Ghost Referrals Does Not Work – you may run across a few articles suggesting you block ghost referrals by modifying your .htaccess file. Unfortunately, this method will not work. Since these visits never actually hit your site, the traffic will never interact with your .htaccess file. The only thing this might accomplish is blocking spammy web crawlers, but the risks outweigh the rewards. I do not recommend using your .htaccess file in this way.

Do Not Use Referral Exclusions – some articles may recommend using the Referral Exclusions feature in Google Analytics. This will almost always create a dumpster fire out of your analytics data. Excluding a spam domain there does not delete its sessions; it just reattributes them to direct traffic, so the spam moves into another channel instead of disappearing. Referral Exclusions are designed for sites whose visits pass between domains, such as a 3rd party shopping cart. The feature makes sure that when a site sends users from its own domain to an offsite cart/checkout and then back again, the visit is not recorded as two sessions (a pre-sale session and a post-sale session). I recommend using this feature as intended and avoiding any attempt to hack it into a filtering tool.

There is No “Set It & Forget It” Solution – there’s no easy fix for this problem. This issue has been developing over the last few years and, if anything, has gotten worse. As a result, you must regularly monitor your traffic and update the filters discussed below.

Start With a Strong Analytics Foundation

Before we start ghost busting, we need to have a good analytics foundation in place. Google Analytics records all of a user’s pageviews and interactions into sessions. Once these sessions are recorded, they’re stored and available for you to analyze in your analytics dashboard. A good analytics strategy requires constant upkeep and grooming of your data. This means you’ll make changes to your Google Analytics views in order to improve the accuracy and quality of your reports. This brings us to rule number 1:

Always Keep an Unfiltered View

If you’ve only created one view in your analytics account, you probably already have an unfiltered view. This view should not have any filters, exclusions, etc. whatsoever. It’s important to keep one view unfiltered so you have a control group to test filters as well as a backup of all of your website data.

Create a Reporting View

Next, create a reporting view. This will be your main view for analysis and reporting. This is also where you’ll install filters after you’re finished testing them.

Create a Testing View

Finally, create a testing view in Google Analytics. This is where you’ll create, test, and optimize your filters and customizations before rolling them out to your reporting view.

Bonus Analytics Lifehack

You may have noticed that the tracking ID on each analytics property you create (in the form UA-XXXXX-Y) has a dash and number at the end. That number increases by one each time you create a new property. Most analytics accounts have three or fewer properties, so many spammers only send ghost referrals to the first three properties in an account. If you’re creating a new site, try creating a few dummy properties until you get to property number five, and use that one as your first real Google Analytics property. This little trick can greatly reduce the number of ghost referrals that reach your reports in the first place.

That said, if you already have historical data in your first property, I recommend sticking with that property and simply keeping the filters below up to date. Creating a new property would cause more problems than it solves.

Enable Google Analytics Bot Filtering

Google has been monitoring this problem with ghost referrals for the past few years. As a result, they developed and rolled out a feature to help you filter bots and spam from your analytics data. To enable this feature simply follow these steps:

Go to Google Analytics and navigate to the Admin tab.

Select a “view” in the drop-down on the far-right.

Click on “view settings,” below the drop-down menu.

Scroll down and check the box that says “Exclude all hits from known bots and spiders.”

This feature won’t remove all of the ghost referrals, but it will help. I recommend enabling this on every view except your unfiltered view.

Create a “Valid Hostname Filter” in Google Analytics

The easiest ghost referrals to remove use invalid hostnames. The hostname Google Analytics records is the domain on which your tracking code ran, so for a real visit it will always be your site (or another domain where your tracking code is installed). The least sophisticated ghost referrals have a blank or mismatched hostname. These are easy to remove from Google Analytics by installing a “valid hostname filter.” To install this filter, do the following:

Go to the Admin tab in Google Analytics, select a view, and click “Filters.”

Click the red button to add a new filter, and switch filter type to “custom.”

Name your filter, set it to include traffic, and use “Hostname” as the filter field. For the filter pattern, replace .*example\.com.* with your domain. For instance, chow-bryant.com becomes .*chow\-bryant\.com.*. This kind of code is known as RegEx. If you want to learn more about RegEx, I recommend checking out RegExr.
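
If you want to sanity-check a hostname pattern before pasting it into Google Analytics, a few lines of Python will do. This is a minimal sketch, assuming Python's re module behaves closely enough to Google Analytics' RegEx for a simple pattern like this one; re.escape handles the backslashes for you, and the sample hostnames are hypothetical.

```python
import re

def hostname_pattern(domain: str) -> str:
    """Wrap an escaped domain in .* on both sides, like the example pattern above."""
    return ".*" + re.escape(domain) + ".*"

pattern = hostname_pattern("chow-bryant.com")
print(pattern)

# Hypothetical hostnames as they might appear in a report; "" stands in for a blank hostname.
# re.search matches anywhere in the string, which approximates how the filter is applied.
for host in ["chow-bryant.com", "www.chow-bryant.com", "(not set)", "huffingtonpost.com", ""]:
    kept = re.search(pattern, host) is not None
    print(f"{host or '(blank)':22} -> {'keep' if kept else 'exclude'}")
```

One design note: a pattern wrapped in .* on both sides also matches a spoofed hostname such as chow-bryant.com.spam.example. An anchored pattern like ^(www\.)?chow\-bryant\.com$ is stricter, but you would then have to list every valid subdomain explicitly.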

Create a List of Valid Hostnames for Your Site

Next, you need to create a list of valid hostnames. For most websites, this will just be your domain name. Sites with 3rd party applications or shopping carts will have a few valid hostnames other than their own domains. Making a list of valid hostnames should not be difficult: valid hostnames are the domains where your tracking code actually runs, whether you manage them yourself or a 3rd party service runs your code for you. For instance, I cannot place code on Huffington Post’s domain, nor do I have any control over their site. As a result, if I see “Huffington Post” as a hostname in my referral traffic, I know the hostname is spoofed and the visit is a ghost referral.

To view your referral hostnames, go to the reporting view in Google Analytics and navigate to your Referrals report (Acquisition > All Traffic > Referrals). The default primary dimension is “Source.” You can either change the primary dimension to “Hostname” or add it as a secondary dimension. Check the hostnames for valid sites where you’ve actually placed your tracking code.
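
Once you have exported that report, a short script can total the sessions on hostnames that aren't on your valid list. This is a sketch under assumptions: the rows are hypothetical (source, hostname, sessions) tuples standing in for your export, and VALID_HOSTNAMES is an illustrative list, not a recommendation.

```python
from collections import defaultdict

# Hypothetical rows from a Referrals report with Hostname as a secondary dimension:
# (source, hostname, sessions). Replace these with your own exported data.
rows = [
    ("facebook.com", "example.com", 120),
    ("checkout.example-cart.com", "checkout.example-cart.com", 45),
    ("premium-junk-lol.com", "(not set)", 300),
    ("premium-junk-lol.com", "huffingtonpost.com", 80),
    ("news.ycombinator.com", "www.example.com", 60),
]

# Assumed list of domains where your tracking code runs, including a 3rd party cart.
VALID_HOSTNAMES = {"example.com", "www.example.com", "checkout.example-cart.com"}

def invalid_hostname_sessions(rows, valid):
    """Sum sessions for every hostname that is not on the valid list."""
    totals = defaultdict(int)
    for _source, hostname, sessions in rows:
        if hostname not in valid:
            totals[hostname] += sessions
    return dict(totals)

suspects = invalid_hostname_sessions(rows, VALID_HOSTNAMES)
for hostname, sessions in sorted(suspects.items(), key=lambda kv: -kv[1]):
    print(f"{hostname}: {sessions} sessions")
```

Review anything this flags by hand before building a filter from it; a hostname you forgot to list (like the cart domain here) would otherwise be flagged and end up blocked.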

If you’re having problems with this step, you should consult an expert. Leaving a few ghost referrals behind will still pollute your data, but we’ll discuss some methods for dealing with the stragglers below. Leaving a valid hostname out of your filter is a much bigger problem: it blocks legitimate traffic from your reports, and because view filters permanently exclude the data they remove, that traffic cannot be recovered in that view later. If this traffic was from your 3rd party shopping cart, you will have a very difficult time reporting on shopping cart performance in the next reporting cycle. This is why I strongly recommend starting with your test view, and then rolling out to your reporting view.

Remove Sessions with Spammy Campaign Sources

Some ghost referrals are more sophisticated and spoof the hostname so it looks valid. These won’t be caught by our last filter, so we’ll have to try another method and create some more RegEx filters. You can try building the RegEx conditions for this filter yourself using your list of valid hostnames. Simply export a referral report from the previous year and remove all of your valid hostnames; what remains is your list of ghost referrals. If you don’t feel comfortable working with RegEx, then copy/pasta the snippets below the instructions and screenshot. To implement the spammy campaign source filter, follow these steps:

Go to the Admin tab in Google Analytics, select a view, and click “Filters.”

Click the red button to add a new filter, and switch filter type to “custom.”

Set the filter to exclude traffic and use “campaign source” as the filter field.

Finally, paste in your filter pattern and hit save.

Filter patterns are limited to 255 characters. There are too many ghost referrals to fit into one filter pattern, so you’ll need to repeat this step six times if you use the filter patterns posted below:

Spammy Campaign Sources – 6
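
If you maintain your own list of spam domains instead of copying patterns, you can pack it into as few filter patterns as possible while staying under the 255-character limit. This is a sketch, not a prescribed tool: build_patterns is a hypothetical helper, and the spam_sources list is illustrative. It escapes each domain and joins them with | (RegEx “or”).

```python
import re

MAX_PATTERN_LEN = 255  # filter pattern limit mentioned above

def build_patterns(domains, max_len=MAX_PATTERN_LEN):
    """Pack escaped domains into as few 'a|b|c' patterns as fit under max_len."""
    patterns, current = [], ""
    for domain in sorted(set(domains)):  # sorted so the output is stable between runs
        piece = re.escape(domain)
        if len(piece) > max_len:
            raise ValueError(f"{domain!r} alone exceeds {max_len} characters")
        candidate = piece if not current else current + "|" + piece
        if len(candidate) <= max_len:
            current = candidate
        else:
            patterns.append(current)  # current pattern is full; start a new one
            current = piece
    if current:
        patterns.append(current)
    return patterns

# Illustrative list: one domain from this article plus made-up placeholders.
spam_sources = ["premium-junk-lol.com"] + [f"hypothetical-spam-{n}.example" for n in range(30)]
patterns = build_patterns(spam_sources)
for i, p in enumerate(patterns, 1):
    print(f"Filter {i} ({len(p)} chars): {p[:60]}...")
```

Each resulting pattern becomes one exclude filter. Keep in mind that an unanchored entry like spam\.com would also match notspam.com, so check the list for short, generic names before saving.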

Filter Out Sessions from Spam Crawlers

Some ghost referrals come from spammers that also have web crawlers (one method for harvesting contact info and analytics IDs). These visits are highly problematic because they really do visit your site, but tank performance metrics just like ghost referrals. They still won’t harm your rankings, but they will pollute your data. We’ll remove these using RegEx filter patterns just like the last filter. While these technically use the same method, I recommend keeping separate filters for ghost referrals and spammy web crawlers. This will help you test your filters and also keep track of how each spammer’s methods are evolving. Over time this kind of information will help you predict a spammer’s actions, which is invaluable if you want to maintain clean data.

There are a lot of spam crawlers out there, so for this example we’re just going to start with one of the most persistent and annoying crawlers: semalt. To implement a spammy web crawlers filter, simply do the following:

Go to the Admin tab in Google Analytics, select a view, and click “Filters.”

Click the red button to add a new filter, and switch filter type to “custom.”

Set the filter to exclude traffic and use “campaign source” as the filter field.

Finally, paste in your filter pattern and hit save.
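
Keeping ghost referral and crawler patterns in separate filters pays off when you test them: you can see which filter catches what. The sketch below simulates that on a test list of campaign sources. The patterns and sample sources are hypothetical, and apply_exclude_filters is an illustrative helper, not a Google Analytics feature.

```python
import re
from collections import Counter

# Hypothetical patterns; in practice these come from your own filter lists.
FILTERS = {
    "ghost referrals": r"premium\-junk\-lol\.com|free\-share\-buttons\.example",
    "spam crawlers": r"semalt\.com",
}

def apply_exclude_filters(sources, filters):
    """Drop each source matched by any filter and tally which filter caught it first."""
    kept, caught = [], Counter()
    for source in sources:
        for name, pattern in filters.items():
            if re.search(pattern, source):
                caught[name] += 1
                break
        else:  # no filter matched, so the session survives
            kept.append(source)
    return kept, caught

# Hypothetical campaign sources, one entry per session.
sessions = ["google", "semalt.com", "premium-junk-lol.com", "facebook.com", "semalt.com"]
kept, caught = apply_exclude_filters(sessions, FILTERS)
print(kept)
print(dict(caught))
```

Running a tally like this each month on the same test data makes it easy to spot when one spammer's share suddenly drops, which often means they've switched to a new domain.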

Fix Your Historical Analytics Data

Convert Filters to Custom User Segments

We learned how to create filters to remove spam from your future reports, but what about historical reports? Filters only apply to data collected after they’re created; they don’t change data Google Analytics has already processed. Luckily, you can create Custom User Segments in Google Analytics to solve this problem. Segments are applied when a report is run, so they let you apply the logic of your new filters to historical reports. The result is valid year-to-year comparisons without data distortions from ghost referrals.

The best part about these custom user segments is that we’ve already done all of the heavy lifting by creating the filter patterns in the previous two filters. Now all we have to do is set up a Custom User Segment to work with our RegEx filter patterns. To accomplish this, simply:

Click the red button to add a new segment, and in the advanced section select “Conditions”

Set the filter to exclude sessions with a campaign source that matches your RegEx filter patterns.

Add your filter patterns from the Spam Campaign Sources Filters and save your new segment. Note: filter patterns in custom user segments have a length limit too, in this case 800 characters. As you update your pattern, it will eventually grow beyond 800 characters. When this happens, simply click the “or” box to the right of your filter pattern and start a new expression.

Once you save this segment, it should be usable in every Property and View in your Analytics account. This means you only have to update this custom user segment once, and it works for every property/client in your portfolio!

What Happens After Installing the Filters?

Once these filters start working, you’ll start seeing what your referral traffic really looks like. For some sites, this may mean a decrease in overall volume, but also a significant increase in performance and quality. If you’re managing analytics for a client, or must report to a supervisor, this entire project requires an extra step. Give your client or supervisor a heads up on this project. Explain what you are doing, why, and how it will help the business accomplish its goals.

Next, apply your filters to this month’s data using the custom user segment you created. This will help you estimate the drop in volume on referral traffic. I highly recommend telling your client or supervisor about this, as well. Be completely transparent, and show them a simulation of how the data will look moving forward. This will help you avoid any surprises on the next reporting cycle. If a client or supervisor is used to seeing a certain volume of referral traffic and it suddenly drops off, they are likely to have some questions. Avoid this dilemma by arming them with the right information before rolling out ghost referral filters.

Keeping Your Data Clean

Update Your Spam Filters and Custom User Segments Regularly

Spammers are incredibly resourceful and tend to update their methods quickly. Once a spammer sees evidence that their ghost referrals are being filtered, they typically change the hostname and are off to the races; some of them even update their hostnames daily. This means you’ll need to regularly analyze and filter your data to make sure it stays clean.

Come up with a strategy for dealing with evolving ghost referrals. Put together a simple process and a schedule that allows you to keep up with this project without letting other things slip. I recommend cleaning up your analytics data the week before every reporting cycle.

How to Automate Updating Spam Filters and Custom Segments

If you manage a lot of properties or your marketing team is really small, you may want to consider automating this process. There are a lot of methods for accomplishing this, but the most common follows these basic steps:

Write a script to detect and record new referral names in Google Analytics.

Write an application to post new referral names to a queue for moderation.

Once referral names are cleared or blacklisted, write an application to convert blacklisted hostnames into RegEx filter patterns.
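
Those three steps can be sketched in plain Python. This is an illustration under heavy assumptions: it does not talk to Google Analytics (fetching this month's referral sources is left to whatever export or reporting tool you use), a deque stands in for a real moderation queue, and a keyword check stands in for the human reviewer. All function names are hypothetical.

```python
import re
from collections import deque

def detect_new_sources(current_sources, known_sources):
    """Step 1: referral names seen this period that haven't been reviewed yet."""
    return sorted(set(current_sources) - set(known_sources))

def enqueue_for_review(queue, sources):
    """Step 2: post new names to a moderation queue (a deque stands in for a real queue)."""
    queue.extend(sources)

def moderate(queue, decide):
    """Drain the queue, sorting each name into cleared or blacklisted via decide()."""
    cleared, blacklisted = [], []
    while queue:
        source = queue.popleft()
        (blacklisted if decide(source) else cleared).append(source)
    return cleared, blacklisted

def to_regex(blacklisted):
    """Step 3: convert blacklisted names into one escaped alternation pattern."""
    return "|".join(re.escape(s) for s in sorted(blacklisted))

# Hypothetical data: sources already reviewed, and sources seen this month.
known = {"google", "facebook.com"}
this_month = ["google", "facebook.com", "semalt.com", "blog.partner.example", "premium-junk-lol.com"]

queue = deque()
enqueue_for_review(queue, detect_new_sources(this_month, known))
# A person would normally make this call; a keyword check stands in for them here.
cleared, blacklisted = moderate(queue, lambda s: "semalt" in s or "junk" in s)
pattern = to_regex(blacklisted)
print(cleared)
print(pattern)
```

If the resulting pattern grows past the 255-character filter limit, split it across several filters as described earlier, and add both cleared and blacklisted names to your known list so they aren't queued again next month.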

Hire an Analytics Consultant to Fix Your Analytics Data

If you’re having a hard time keeping up with ghost referrals and spam, it may be time to outsource or look for help. If you’re looking for an agency to help clean up your analytics data, feel free to reach out, and we’ll be happy to get started cleaning.