Sunday, April 15, 2007

Splogs: The invasion

Splogs are back, with a vengeance. Splogs are fake blogs (splog = spam + blog) designed to sell you Viagra, medicine, pornography, etc., or simply to make money through Google ads. I mentioned a first wave of splogs back in September 2005, which was largely down to Google’s free blogging platform Blogger (which, it should be said, is also where I host my own blog). A rough estimate showed that around 60% of all blogs created on Blogger at the time were spam, built by automatic robots (see here and here). The result was a catastrophe for Google searches, since most search queries ended up returning splogs instead of real results.

Blogger and Google reacted quite quickly. Blogger put measures in place aimed at preventing the creation of fake blogs, including the far-from-perfect (but now standard) system of “captchas” (those twisted little letters you’re asked to type in to prove you’re not a robot).

For its part, Google applied some quite dramatic spam-detection algorithms (to the detriment, in fact, of real blogs that also found themselves penalised – but desperate times call for desperate measures). The firm, which has a business model based 98% on advertising revenue that’s linked to the search engine (very much the company’s Achilles heel), was seriously under threat. Users have shown in the past (Altavista, anyone?) that in the space of a few months they can quite happily turn their backs on the market leader to go with a more capable competitor.

And now it seems to be starting again. Since about the middle of March, the blogosphere has once again been invaded by splogs. This time, it’s not Google juice that the spammers are after, since the search engine has shored up its defences. No, the goal now is trackbacks. Welcome to the age of “trackback spam.”

You know what a trackback is, I’m sure. When someone writes a blog post that contains a link to a post of yours, and provided you’ve set it up to do so, your blogging platform will automatically create a link to the post that has quoted you. Very handy! I make use of this system myself, here on my blog: in the left-hand column (“Ils en parlent…”), and also at the end of each post. For me, this is one of the real strong points of blogs, since the system enables the blogger (along with his or her readers) to discover some new and often unexpected sites. You may have noticed that I don’t have a blogroll. I’m often asked why. On the one hand, it’s because I read a lot of blogs, and I don’t want to upset anyone by not mentioning every single one, but also - and above all - it’s because I believe that blogrolls reinforce one of the dangers I see in blogs: this tendency we all have to read and trade links with the same half a dozen blogs again and again. Certainly, it’s easier to remain cosseted in a community of blogs we identify with than to go out and read new things – especially things we don’t always agree with. For me, however, trackbacks are a sign of openness: they provide surprises, they are often a source of contradiction, and they are a way of preventing us from hiding away in our gated communities.

The thing is, spammers quickly realised that there was some money to be made here too. By linking to your blog, they cause a link to appear there that points to their sites: online casinos, Viagra sales, and all the other types of junk and rip-offs that litter the web. Since the middle of March, this practice has shot through the roof, to the point of becoming a serious nuisance. Imagine you have a charming little blog that details the adventures of your offspring or what’s going on at your local school, and suddenly it is inundated with links to zoophilia sites… Yet again, the number one culprit behind this invasion is Blogger.

Let’s take my blog as an example. I create the list of trackbacks in my “Ils en parlent” section based on a query run on Google Blogsearch (link:aixtal.blogspot.com). Until recently I displayed the first ten results, and only rarely did I find any unpleasant trackbacks. So few, in fact, that they were nothing for me to be concerned about. And then, in the middle of March, they began to arrive. One a day, then two, three… until finally, a few days ago, every single item on the list of the top ten trackbacks was spam.

For instance, here are the first ten results returned on Friday 13 April. They are all splogs that linked to my blog (amongst others):

Decrank ringtone aa href http www motorola Krzr K...

Com mpringtones N nice To Know You ringtone middot...

Pl Sport, friant California cars. milwaukee casino...

Full tilt poker login window

Load mp3 songs to motorazr

Shoshone Indian Shoshone nba Championship Odds To ...

Pm Sunday Brunch am - betsoff Gambling Addiction, ...

Mp3 ringtone studio 6600

Apply online for a loan from Halifax Personal Loan...

Gambling akes

If we go even further and look at the first 100 results, only one is anything other than spam! And all the links come from Blogger:

http://letthatbeenoughblog.blogspot.com

http://dooney-alto-ring-flap-info.blogspot.com

http://black-jack-clubs-more.blogspot.com

http://full-tilt-poker-login-window-reports.blogspot.com

http://load-mp3-songs-to-mo-insights.blogspot.com

http://casino-in-daverport-blog.blogspot.com

http://soaring-eagle-casin-info.blogspot.com

http://mp3-ringtone-stud-reports.blogspot.com

http://cash-advances-for-peopl-blog.blogspot.com

http://gambling-akes-posts.blogspot.com

The defence measure I set up was quite drastic: I no longer display any trackbacks coming from Blogger (i.e. blogspot.com). The trouble is that even if I request 100 results, there are still almost no interesting trackbacks left to display.
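A filter of this kind can be sketched in a few lines of Python. This is only an illustration, not the code actually running on this blog: the function names `is_blocked` and `filter_trackbacks` and the `limit` parameter are hypothetical, and the list of URLs is assumed to have already been fetched from the blog search results.

```python
from urllib.parse import urlparse

# Domain to exclude: Blogger-hosted blogs, as described in the post.
BLOCKED_DOMAIN = "blogspot.com"


def is_blocked(url, blocked_domain=BLOCKED_DOMAIN):
    """Return True if the URL's host is the blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return host == blocked_domain or host.endswith("." + blocked_domain)


def filter_trackbacks(urls, limit=10):
    """Keep at most `limit` trackback URLs that are not hosted on the blocked domain."""
    kept = [u for u in urls if not is_blocked(u)]
    return kept[:limit]
```

Calling `filter_trackbacks` on the ten URLs listed above would return an empty list, which is exactly the problem: the filter removes the spam, but it cannot conjure up genuine trackbacks, and it also discards any honest blog hosted on Blogger.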

You’re probably thinking: who controls all this? That’s what I’m wondering too. Why have Google and Blogger allowed the situation to degrade so dramatically? From a technical viewpoint, filtering these splogs doesn’t seem especially complicated. Their characteristics are easily recognised: they have a profusion of keywords that stand out in the text (ringtones, loans, cars, cash, XXX, casino, etc.), their URLs contain these same words (often in very long names separated by hyphens), and the blogs typically contain one single post stuffed full of links. Cleaning them up should be child’s play. If they would hire me, I’d do the job for them in two days.
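To make those three characteristics concrete, here is a rough sketch in Python of how they might be combined. Everything in it is an assumption made for illustration: the keyword list (the words named above plus a few taken from the example URLs), the thresholds, the "two signals out of three" rule, and the function names. A real filter would need far more data and tuning.

```python
import re
from urllib.parse import urlparse

# Illustrative keyword list: words from the post plus a few seen in the example URLs.
SPAM_KEYWORDS = {
    "ringtone", "ringtones", "loan", "loans", "cars", "cash",
    "xxx", "casino", "poker", "gambling",
}


def url_signal(url, min_hyphens=2):
    """True if the blog's subdomain is hyphen-separated and contains a spam keyword."""
    host = (urlparse(url).hostname or "").lower()
    subdomain = host.split(".")[0]
    has_keyword = any(part in SPAM_KEYWORDS for part in subdomain.split("-"))
    return has_keyword and subdomain.count("-") >= min_hyphens


def text_signal(text, min_hits=3):
    """True if the text contains at least `min_hits` occurrences of spam keywords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sum(1 for w in words if w in SPAM_KEYWORDS) >= min_hits


def structure_signal(post_count, link_count, min_links=20):
    """True if the blog has a single post stuffed with links (threshold is a guess)."""
    return post_count == 1 and link_count >= min_links


def looks_like_splog(url, text, post_count, link_count):
    """Flag a blog when at least two of the three signals fire."""
    signals = [
        url_signal(url),
        text_signal(text),
        structure_signal(post_count, link_count),
    ]
    return sum(signals) >= 2
```

Requiring two signals rather than one is a deliberate choice in this sketch: a genuine blog may well mention cars or loans, or have a hyphenated address, and should not be flagged for that alone.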

So, the next question is: why aren’t they doing it? Pure negligence? Or perhaps they have an ulterior motive in seeing the trackback system fall apart, for some odd reason? Maybe they’ve got something new to sell us?

3 Comments:

Anonymous said...

Very good idea to publish the English version. This will help many of us to get more acquainted with technical English (or American).

Mr. Veronis, I have noticed that you like funny formulations (as I do also). Do you know this one: "I speak English very fluently, but confidentially, I am not English!" This was the favourite sentence of one of my comrades, more than fifty years ago. He also had another one: "C'est moi qui est le nouveau professeur de français qu'on vous a causé !"

I would just like to compliment you on your English. I am a native English speaker, and a professional translator (French to English). I also teach English to professionals. It is very difficult to detect your non-native status... ("If they would hire me...", for example!).