Memes and Musings of an IT Engineer Turned to Management

Tag: filtering

In the early 80s, I thought the two greatest things in the world were email and USENET. I was totally addicted. Carrying on conversations with far-away people without the worry of telephone or time zone, or sharing a passion with fellow rec.sports.this or comp.sys.that aficionados, was refreshingly liberating, especially for a quiet guy who was barely audible in public. I spent countless hours in the computer labs, and when I found out that I could dial into the school network from my own home using just a $100 Televideo tty with modem, it was truly “game on”.

As a student, I had limited access to the inner workings of the system and I was only a ‘consumer’ of the information stream. And a glorious stream it was. But by the late 80s, I had the fortune of seeing the belly of the beast. Now I was an admin, and I could see the way the network worked, the transmission between hosts, the dance of the cron jobs to keep everything in balance. When most others were mesmerized by shared commercial email services like CompuServe and Prodigy, I had a *personal* .UUCP node on my Mac IIfx and my own USENET feed of selected newsgroups. We were a self-policing organism. AND IT WAS ALL FREE!

But money changes everything, and like absolute power, corrupts absolutely. Commerce trolls had made their way onto USENET, and it was no cost to them to ply their wares in open forums. They had a ready-made audience of sitting ducks, and all they needed to do was drop their spam bomb and move on to the next newsgroup. Now the beautiful synchrony of USENET propagation was interrupted by the staccato of “cancel” messages that back-propagated to kill the discordant posts. In the end, it became more trouble than it was worth. The bang-for-buck was no longer there for me, and it faded into distant memory as the next open frontier – WWW – became my new passion.

My love for email still lives on, but management of it too is more burdensome than it was back in those idyllic days. First came the unwanted spam for “viagra”. Then simple keyword filtering stopped being effective when “V1@gra” entered the scene. Then “V I A G R A”. Then embedded image spam. And so on. Whitelists, blacklists, procmail, DNSRBL, Bayesian filtering, spam firewalls. It’s actually quite a battle, though the bang-for-buck is still there.
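To make that arms race concrete, here is a minimal sketch of why the plain keyword match stopped working and what one round of escalation looks like. Everything in it is an assumption for illustration: the `BLOCKED` list, the `LOOKALIKES` table, and both function names are made up, and no real filter is this simple.

```python
import re

# Illustrative keyword list (hypothetical, not taken from any real filter).
BLOCKED = {"viagra"}

# Hypothetical table mapping common look-alike characters back to letters.
LOOKALIKES = str.maketrans({"1": "i", "@": "a", "0": "o", "$": "s", "3": "e"})


def naive_filter(subject: str) -> bool:
    """Flag a subject only if a blocked keyword appears verbatim."""
    lowered = subject.lower()
    return any(word in lowered for word in BLOCKED)


def normalized_filter(subject: str) -> bool:
    """Undo look-alike characters and spacing tricks before matching."""
    text = subject.lower().translate(LOOKALIKES)
    text = re.sub(r"[^a-z]", "", text)  # drop spaces, digits, punctuation
    return any(word in text for word in BLOCKED)


for subject in ("Cheap viagra", "Cheap V1@gra", "V I A G R A today"):
    print(subject, naive_filter(subject), normalized_filter(subject))
```

The naive version catches only the first subject, while the normalized one catches all three. The catch is that stripping every space can also glue innocent words together into a blocked one, which is part of why the battle moved on to statistical filters, and none of this touches embedded image spam at all.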

So, as you’ve weathered my typical long story, what is my point? Twitter is where email and USENET were 20 years ago. Isn’t it wonderful that we can establish relationships with people we may not otherwise encounter, share passions, let links go viral, proclaim “what’s happening now”, and converse (albeit in a 140-character unthreaded style)? I’m very “greatest thing since sliced bread” excited about things like Twitter and Flipboard right now. But it’s all too eerily familiar…

And like the Sentinels from The Matrix, “here they come”. The spambots that search the entire twitterverse for your mention of “solar” so they can hit you with a renewable energy resources tweet. The ones who offer the best way to stop smoking if you mention anything even remotely relevant. Or the ones who simply suggest you buy a certain TV because you mentioned, well, nothing relevant at all.

Twitter is self-policing like USENET and email were in those early days. You can block a user and/or report them for spam, at which point you assume that the Twitter gods will banish them from the kingdom. But it is trivial to start a new account and begin spamming all over again. Eventually the cacophony is going to be overwhelming, or the self-policing will become too burdensome. Either could be the death-knell for Twitter. If Twitter had the same RBLs and Bayesian filters and other tools that evolved for email de-spamification, I might be inclined to use them. But even the need for those could be signaling the beginning of the end.

I think Twitter is in its naive era, where everything is good and nothing can corrupt. They seem to be worried more about their 3rd-party clients and API usability, and less about the people that will be (ab)using them. It’s well-known enough (compared to something like Orkut) to elevate itself into everyday culture, but not yet under the weight of its own gravity like Facebook. Usage will undoubtedly continue to increase, probably exponentially thanks to things like the deep integration in iOS 5 bringing it rapidly to the fore. And thus Twitter needs to grow out of its naivete quickly.

There’s that old line from superhero movies, “if only he’d have used his power for good instead of evil”, and there is potential to do some major Twitter evil out there. What we’ve seen to date has barely scratched the surface. Many people are intent on working within the boundaries of the system for their own commercial best interests, and many more still have no qualms about abusing it. (Let’s hope it doesn’t get as bad as email, where 78% of all messages are spam.)

So if Twitter is listening, please: let’s set those boundaries firmly and build in processes that will weed out the offenders. Twitter needs to take a much more proactive and hard-line role in preventing tweet spam from overtaking the community.


Mailing to a friend, I just had an encounter with his Challenge-Response mail system. I was curious enough to look at the marketing material for this particular commercial product, and noted that it claimed 100% accuracy for anti-spam. Well of course. That’s because C-R is not an anti-spam system, it is an anti-email system.

A C-R system requires the email sender to verify their legitimacy as a human being (rather than automated spammer) by using some Turing-like Test such as CAPTCHA (a common verification technique found on web sites, such as the GuestBook link above). It does this for all mail, regardless of content. It is something akin to email call-screening, but really has very little to do with anti-spam. It is a whitelist/blacklist system based entirely on sender address that builds up the respective filters via the screening process. Proponents argue that C-R is 100% accurate while other methods that constantly tweak content filters are not. To some extent, this is true. But is it truly worth it to never see a spam again?
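The screening flow just described can be sketched in a few lines. This is only my reading of how such a system behaves, not the code of any actual C-R product; the class name `ChallengeResponseGate` and its methods are hypothetical. Note that the message content is never examined, only the sender address.

```python
from dataclasses import dataclass, field


@dataclass
class ChallengeResponseGate:
    """Hypothetical sender-address gate; it never looks at message content."""

    whitelist: set = field(default_factory=set)
    blacklist: set = field(default_factory=set)
    held: dict = field(default_factory=dict)  # sender -> messages awaiting a response

    def receive(self, sender: str, message: str) -> str:
        sender = sender.lower()
        if sender in self.whitelist:
            return "deliver"
        if sender in self.blacklist:
            return "reject"
        # Unknown sender: hold the mail and send back a challenge (e.g. a CAPTCHA link).
        self.held.setdefault(sender, []).append(message)
        return "challenge"

    def challenge_passed(self, sender: str) -> list:
        """Whitelist the sender and release everything held for them."""
        sender = sender.lower()
        self.whitelist.add(sender)
        return self.held.pop(sender, [])


gate = ChallengeResponseGate()
print(gate.receive("friend@example.com", "Lunch on Friday?"))  # unknown sender
print(gate.challenge_passed("friend@example.com"))             # sender answered
print(gate.receive("friend@example.com", "Still on?"))         # now whitelisted
```

A legitimate first-time sender who never answers the challenge simply stays in `held`, which is exactly the mail the user never finds out about.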

Users love C-R because they no longer receive spam. But what else are they not getting? Senders – legitimate ones – tend to not like dealing with the business end of C-R systems because it has the appearance of being slightly rude: not only is it like saying “here’s my email address, maybe I will allow your message in”, but it automatically paints everyone as a spammer until proven otherwise. Not to mention that the CAPTCHA process, one-time or not, can be slightly irritating. More often than not, senders just walk away with a “why bother?” attitude.

There are other ways. My ISP provides SpamAssassin as part of their Exim front-end mailer. On the back-end, I use Thunderbird’s Bayesian filter. Between the two, I have about a 97% anti-spam accuracy and have not experienced a single false-positive in 3+ years. Sure, the occasional spam may get through, but I just use it as an opportunity to further train T’bird. At work, we use a Barracuda spam firewall in front of our Exchange server. The false-positive rate [dropping into the Quarantine area] is decidedly not zero, but when you look at the millions-per-day of spam that were successfully blocked, one has to admire the overall efficiency of the product.
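For readers who have never looked inside a Bayesian filter, training on a stray spam boils down to something like the toy below. It is a bare-bones naive Bayes classifier with add-one smoothing, written as an assumption-laden sketch: it is not how Thunderbird or SpamAssassin implement their filters, and the `TinyBayes` class and its methods are invented for this example.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy word-count naive Bayes; a sketch, not any real mail filter."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record the words of one message under 'spam' or 'ham'."""
        self.counts[label].update(text.lower().split())
        self.docs[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_docs = self.docs["spam"] + self.docs["ham"]
        log_score = {}
        for label in ("spam", "ham"):
            total = sum(self.counts[label].values())
            # Smoothed prior, then smoothed per-word likelihoods.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            for token in text.lower().split():
                seen = self.counts[label][token]
                score += math.log((seen + 1) / (total + len(vocab) + 1))
            log_score[label] = score
        # Convert the two log scores into a probability of spam.
        return 1 / (1 + math.exp(log_score["ham"] - log_score["spam"]))


f = TinyBayes()
f.train("cheap pills buy now", "spam")
f.train("lunch meeting at noon", "ham")
print(f.spam_probability("buy cheap pills"))
print(f.spam_probability("meeting at noon"))
```

Each spam that slips through and gets marked as junk is one more `train(..., "spam")` call, which is why the occasional miss makes the filter better rather than worse.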

I don’t mind the occasional spam that does get through these protections, because I know they are working in my favor the majority of the time. I am not so offended by the stray advert that I desire to “terminate with extreme prejudice”, and I will not risk alienating legit senders just so my delicate nature is never uglied up by the outside world.