Tagged with "wordpress"https://www.addedbytes.com/feeds/tag-feed/
enWeb Development in Brighton - Added Bytes 2006120Block Referrer Spam (Updated)https://www.addedbytes.com/blog/block-referrer-spam/
Log files are a useful tool for webmasters. It helps to know how people are finding your site, and what software they are using to view it, among other things. A strange decision by a small group of bloggers, though, has given unscrupulous marketers another window of opportunity to manipulate search engines to increase their traffic.

The decision made by these short-sighted bloggers was to display, on their site, a list of recent referrers to each page. I can't imagine any reason why a visitor might be in the least bit interested in seeing this, but nevertheless many sites now display referrers on every page.

As search engine spiders visit sites, they grab the contents of each page they visit. They use this snapshot in their index - meaning that although a page may change every minute or two, a search engine may be using a single copy of a page for several days, or even weeks.

So a referral URL that is on a page when the spiders come to visit can have quite a bit of value, if the search engine visiting uses link popularity in any way (Google uses link popularity, as do many others).

So marketers have started to use programs to visit pages using a fake referral header, to get their URLs listed on as many sites as possible, in the hopes that this will increase their traffic.

However, this renders log files almost completely useless. These fake visitors usually visit from search engines, having searched for a keyphrase relevant to their own site. They skew statistics relating to number of visitors received, the countries used to visit, the technology used to view the page, how users found the page, how long they spent on the site ... and so on.

A webmaster may find their search engine rankings dropping because of this, and they may find search engines have removed them completely. Many sites that use spam techniques are quickly identified and penalised, and penalties will often be applied to sites that link to them as well.

There are plenty of techniques available for blocking referrer spam, and everyone has their favourite. Personally, I use a combination of two techniques.

The first is fairly simple - my referrer log is not indexable. I don't display referrers on the pages of my site. My referral log is publicly available, but search engines are instructed to ignore it. This removes the main incentive for people to referrer-spam my site (the other reason for this type of spam - the hope that the site owner will themselves visit the spamming URL - is less common, because it has such a low response rate).

Second, I use an .htaccess file to block requests from whatever I've managed to identify as either a crawler designed to find URLs to spam or a spamming URL. This is a relatively simple blacklist, and though it cannot work as a long term solution to this problem, it keeps me happy for now.

To implement this technique on your own site, first make sure you are running Apache with mod_rewrite. If you are, create a file called ".htaccess" (just that, not .htaccess.txt or anything else) and paste the following into it:

Update: 14th September 2005

The list below has been expanded substantially over the last year, and now covers much more spam than before. As stated before, this is not a practical solution to the problem in the long term, as this list can only ever get longer and longer, and may become unmaintainable, or even (eventually) slow a site to a crawl as all the rules are processed. However, as of now, it is still a useful tool.

The above will block just about all of the most common referral spam that I've seen so far. I'm adding to the list constantly (last addition: 14th September 2005) so do check back and see if there are updates if you're using it.

One potential problem with this technique, other than that it will, in time, become useless as too many URLs are added, is that there is always a possibility authentic visitors will be blocked. So, on this site, instead of the last line above, I've actually used something a little more user-friendly:

And there we have it. With minimum effort (for now), referral log spamming in my site has been almost entirely removed. Before adding this set of rules and scripts, I was seeing around 200 fake referrals per day in my log files. Now, I see about 3 or 4 a week. Hopefully, this will continue until I can devise a better way of protecting against this kind of problem - before blacklists become an impossibility to manage.