For the uninitiated, in the language of the Web, a referrer is the online resource from whence a visitor happened to arrive at your site. For example, if Johnny the Wonder Parrot was visiting the Mainstream Media website and happened to follow a link to your site (of all places), you would look at your access logs, notice Johnny’s visit, and speak out loud (slowly): “hmmm.. it looks like the Mainstream Media website referred my good pal Johnny to my Alka-Seltzer sales page.” In such a bizarre case, the Mainstream Media website (or the specific page that carried the link) is referred to as (no pun intended) the referrer.

Sounds like a totally radical concept, right? I mean, who doesn’t want other sites sending them traffic? Not many, of course, unless the referrals are in actuality a type of spam known as, well, referrer spam. Eh? Referrer spam, you say? How does that work? Well, I’m so glad you asked. Allow me to explain..

Referrer spam is actually a barrage of URI requests, each one claiming to come from a fake referrer. Just imagine some pathetic dillweed out there, sitting alone in his bedroom, running a borrowed script that does something like this:

targets your site from some randomly generated hitlist

begins making hundreds of URI requests for random pages on your site

leaves fake referrer information for each request, claiming to have arrived by way of “harrypotterdogpanties.net”

continues making hundreds of requests with the fake referrer information

ad nauseam

ad nauseam

ad nauseam

In the process of doing this, the spammer is draining your resources, consuming your bandwidth, decreasing your site’s performance, and clogging your access and error logs with hundreds or thousands of bogus requests. This in turn may skew or obscure accurate statistical information and result in additional service charges and other headaches. In other words, referrer spam sucks donkey dong.

For the spammer, referrer spam pays off because it serves as a cheap way to get garbage spam sites to rank in the search engines. This technique is also referred to as “spamdexing,” which refers to spamming that is directed at the search engines. By artificially accessing your site via their fake spammy web pages, referrer spammers effectively populate your server’s access logs with hundreds of links back to their stinky spam site.

The actual payoff occurs because a percentage of spammed sites publish their access logs on the Web, where search engines can find the links. This may not sound like much, but with a free, easily accessible referrer-spam script, referrer spammers can hit hundreds of thousands of sites. If even a tiny fraction of these sites publish their access logs, the number of links back to the spam site can be significant.

Unfortunately, there aren’t many options for stopping this sort of nonsense. Referrer spammers are targeting actual resources, so blocking malicious request strings is not an option. We could block individual IP addresses or even user-agents, but that also would be futile because of the easily faked nature of such variables.

So how do we keep these armpits from hitting our sites? Easy. Blacklist the fake referrer sites themselves. And fortunately, there are many resources on the Web for obtaining extensive lists of spammy referrers. Including this one. Below you will find the convergence of two excellent lists of spammy referrers: one containing 276 referrers and another containing 7998 referrers (404 link removed 2013/01/13). Together that is well over 8000 referrers, so please use these lists wisely, according to your own well-formulated security strategy.
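To make the blacklisting idea concrete, here is a minimal sketch in Python that turns a list of referrer domains into mod_rewrite rules you could paste into an .htaccess file. It assumes an Apache server with mod_rewrite enabled; the function name build_referrer_blacklist and the rule layout are my own illustration, not the format of the lists mentioned above.

```python
def build_referrer_blacklist(domains):
    """Return .htaccess lines that answer 403 Forbidden to the given referrer domains.

    Hypothetical helper for illustration; assumes Apache with mod_rewrite.
    """
    # Normalise, de-duplicate, and sort so the output is stable.
    cleaned = sorted({d.strip().lower() for d in domains if d.strip()})
    if not cleaned:
        return ""

    lines = ["<IfModule mod_rewrite.c>", "RewriteEngine On"]
    for i, domain in enumerate(cleaned):
        # Escape dots so "." matches a literal dot, not any character.
        pattern = domain.replace(".", r"\.")
        # Every condition except the last is OR-ed with the next one.
        flags = "[NC,OR]" if i < len(cleaned) - 1 else "[NC]"
        lines.append(
            f"RewriteCond %{{HTTP_REFERER}} ^https?://([^/]+\\.)?{pattern}([/:]|$) {flags}"
        )
    # "-" means no substitution; F sends 403 Forbidden.
    lines.append("RewriteRule .* - [F,L]")
    lines.append("</IfModule>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_referrer_blacklist(["harrypotterdogpanties.net"]))
    # <IfModule mod_rewrite.c>
    # RewriteEngine On
    # RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?harrypotterdogpanties\.net([/:]|$) [NC]
    # RewriteRule .* - [F,L]
    # </IfModule>
```

The pattern also matches subdomains of each listed domain, and the trailing group keeps a rule for "example.net" from catching "example.network". Treat the output as a starting point and test it on a non-critical site before relying on it.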

Note and disclaimer: these lists are provided “as-is” and with no guarantee of anything. If you decide to implement these lists, please be advised that I probably won’t have time to troubleshoot requests and diagnose issues. For the most part, I am providing these lists as a sort of novelty, and suggest that you build your own referrer blacklist based on your actual access logs. I do not recommend simply copying and pasting either of these lists in wholesale format. Hopefully they will serve as a comparative resource and as examples of potentially useful blacklisting accomplishments.
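If you want to build a blacklist from your own access logs, as suggested above, something like the following sketch can surface the referrers that show up most often. It assumes your server writes Apache's Combined Log Format; the function name count_referrer_hosts, the log path, and the own_host value are placeholders for illustration. A high count does not prove spam, so eyeball the results before blocking anything.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Assumes Apache's Combined Log Format, which ends with:
#   "request" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)" "[^"]*"\s*$')


def count_referrer_hosts(lines, own_host):
    """Count external referrer hostnames found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_TAIL.search(line)
        if not match:
            continue  # not in the expected format; skip it
        try:
            host = (urlsplit(match.group(1)).hostname or "").lower()
        except ValueError:
            continue  # malformed referrer URL
        # Ignore empty referrers ("-") and visits referred by our own pages.
        if not host or host == own_host or host.endswith("." + own_host):
            continue
        counts[host] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical path and hostname; adjust both for your own server.
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for host, hits in count_referrer_hosts(log, "example.com").most_common(20):
            print(f"{hits:6d}  {host}")
```

The function takes any iterable of lines, so it works on an open file without loading the whole log into memory.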


Correct, placing such blacklists in the root .htaccess file means that their rules are applied to the entire site. If, on the other hand, you place the rules in the .htaccess file of a subdirectory, then the rules apply only to that subdirectory and the directories beneath it, and thus any directories above the subdirectory (such as root) will not be protected. Also, this is the 4G Blacklist; make sure to use the more current 5G instead.

About the site

Perishable Press is the work of Jeff Starr, professional developer, designer, author, and publisher with over 10 years of experience.

Fun fact: Perishable Press has been online since 2005, and now features over 700 articles and more than 11,000 comments.