Site Errors

All content on this site was recently emergency-moved to WordPress, after having run on Movable Type for eight years. This means that content might be broken or missing. Restoring content is in progress, but some content may still be missing for a while.

Referrer spam: rel="nofollow" doesn’t work

Like everybody else, I get inundated by referrer spam. In sheer numbers, referrer spam is much, much worse than any comment spam attack I’ve ever seen.
And @rel="nofollow"@, the search engines’ intended fix, does not work. It does not work against weblog comment spam, and it most certainly does not work against referrer spam.

h3. Stat tools
There are a number of excellent tools for generating visitor statistics.
* “AWStats”:http://awstats.sourceforge.net/
* “Webalizer”:http://www.mrunix.net/webalizer/
The problem is that people install these tools, and that is all they do: they don’t keep them up to date, they don’t change the config, and they don’t limit access to them.
Which is why spammers love these tools even more than the actual users do: They’re excellent vectors for referrer spam attacks. Guess what: None of these tools use @rel="nofollow"@. And they all display referrers. And the installed base is not very likely to be patched either.
Searching for “AWStats installations”:http://www.google.com/search?q=allinurl:awstats.pl and “Webalizer installations”:http://www.google.com/search?q=%22Generated+by+Webalizer+Version%22 reveals that hundreds of thousands of these stat pages have made it into Google. Which means they’re free-for-all link farms for Blackhat SEOs.
Since spammers are write-only, they can’t be bothered actually searching for attack vectors: Instead they brute-force their way through the Internet, setting up simple shell scripts that collect URLs from documents, spam them, and move on. In these days of relatively fast connections, this is much faster than actually searching for vulnerable installations.
Which is why this should not be left to users alone. Recognizing an AWStats or Webalizer installation programmatically is trivial.
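As a sketch of just how trivial: the signature strings below are my guesses at the obvious markers these packages leave in their HTML output (generator footers, script names), not an exhaustive or official list.

```python
import re
from typing import Optional

# Hypothetical signatures: markers these stats packages commonly
# leave in their generated HTML (footer text, script filenames).
SIGNATURES = {
    "awstats": re.compile(r"awstats\.pl|Created by awstats", re.I),
    "webalizer": re.compile(r"Generated by (The )?Webalizer", re.I),
}

def detect_stats_tool(html: str) -> Optional[str]:
    """Return the name of a recognized stats package, or None."""
    for name, pattern in SIGNATURES.items():
        if pattern.search(html):
            return name
    return None
```

A crawler that already has the page body in hand could run a check like this at near-zero cost, which is the point: the detection problem is easy; only the will to act on it is missing.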
At the very least, search engine vendors should treat all links on these pages as if they all had @rel="nofollow"@ set. Ideally though, these search engine vendors should simply _drop known statistics tools from their indexes._ Apart from a small group of spam and security researchers, crackers and referrer spammers, these pages aren’t useful on the public web.
If Google and others prevented the latter two groups from getting their kicks, the “researchers” would hardly need to research.
So, Google, MSN Search, Yahoo! and other search engine vendors: Could you please drop useless stats pages from your indexes, and be as vocal about it as you were about the useless @rel="nofollow"@? While it might drop that “Searching 8,058,044,651 web pages” number by a few hundred thousand to a million, it will leave search engine results more useful, and you will spare site owners a lot of agony.
Please.

9 Comments

While I can see the benefits of the SEs dropping stat pages from their results, I question the site owner’s own responsibility in locking off their own pages. If you are technically able to install AWStats or Webalizer (considering the number of help requests I’ve seen, that’s debatable), you should be more than adept at putting
bc. <meta name="robots" content="noindex,nofollow">
into the resulting report pages. You can also disallow all user-agents from going to the directories in which these programs reside by the simple use of a robots.txt file.
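For the robots.txt approach the commenter mentions: assuming the reports live under directories like @/stats/@ or @/webalizer/@ (the actual paths depend on your host’s setup, so adjust to taste), a rule like this keeps compliant crawlers out:

```text
# robots.txt — block all crawlers from the stats directories.
# Paths below are examples; use whatever your host actually creates.
User-agent: *
Disallow: /stats/
Disallow: /awstats/
Disallow: /webalizer/
```

Note that robots.txt only stops well-behaved crawlers; spammers’ scripts ignore it, so it keeps the pages out of search indexes but does not stop the spam itself.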
I agree the multitude of referrer stat pages that come up every time I try to find a spammer is a huge problem, but while the idea of the Search Engines fixing it is good, site owners must take responsibility as well. If you put something up online, your responsibility to maintain it is paramount.

Hi, I think Google does handle this software correctly; the pages effectively get a nofollow even though the software doesn’t yet.
I don’t think we should drop these pages from our index entirely, though. Lots of people get helpful information from Google in ways that are hard to foresee, and we try to provide as comprehensive a copy of the web as we can.
One thing that *does* make sense is to make it harder for spammers to scrape Google, because to really mine for pages to spam, you often have to do a lot of queries. We’ve recently made improvements on that side of things, if you want to register and read about it here:
“http://www.webmasterworld.com/forum30/28417.htm”:http://www.webmasterworld.com/forum30/28417.htm
For most of this quarter, we’ve worked on getting blog software makers to endorse nofollow and use it, but I think it’s time to start contacting makers of stats packages too. Ah, looks like AWStats is making progress:
* “http://sourceforge.net/tracker/index.php?func=detail&aid=1113534&group_id=13764&atid=363764”:http://sourceforge.net/tracker/index.php?func=detail&aid=1113534&group_id=13764&atid=363764
* “http://roub.net/blahg/archives/2005/02/referer_spam_aw.html”:http://roub.net/blahg/archives/2005/02/referer_spam_aw.html
If you want to ping them too, it might not hurt. 🙂

GoogleGuy, if you are actually silently adding nofollow, that’s really, really great.
Since I doubt all the world’s spammers read my humble weblog, could you say something about this in some very public place, some place the spammers will read, so blackhats will know that referrer spamming is/will become useless?

bq. While I can see the benefits of the SEs dropping stat pages from their results, I question the site owner’s own responsibility in locking off their own pages.
In theory, site owners should be responsible, and so should the hosting outlets that provide them with web space and bandwidth. In practice, neither webmasters nor hosting outlets take that responsibility, and the only ones with the _power_ to stop these people are the search engines.

Truthfully, the smart spammers already know this. You’re dealing with either naive spammers or spammers who are honestly hoping just to get enough curiosity/click traffic without needing traffic from search engines. We’ll try to make it more clear so that at least the naive spammers know not to bother.
Best wishes,
GoogleGuy

Dropping the statistics pages from the indices is not a good idea. I regularly search for new User-Agent strings that I find in my logs to see if anyone else has been visited by them. Finding one listed in statistics summaries is usually quite helpful for tracking what progress it has made through the web.

Web site operators are subject to different sorts of spam than the regular public is. Spam isn’t just the unsolicited email that gets all the attention, just as the spam talked of on this page is not the packaged meat product by Hormel Foods. (SPAM on th