Posted by ScuttleMonkey on Monday January 25, 2010 @03:43PM
from the sounds-evolutionary-not-revolutionary dept.

A team of computer scientists from the International Computer Science Institute in Berkeley, CA are claiming to have found an "effectively perfect" method for blocking spam. The new system deciphers the templates a botnet is using to create spam and then teaches filters what to look for. "The system ... works by exploiting a trick that spammers use to defeat email filters. As spam is churned out, subtle changes are typically incorporated into the messages to confound spam filters. Each message is generated from a template that specifies the message content and how it should be varied. The team reasoned that analyzing such messages could reveal the template that created them. And since the spam template describes the entire range of the emails a bot will send, possessing it might provide a watertight method of blocking spam from that bot."
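To make the template idea in the summary concrete, here is a minimal sketch in Python. It is not the researchers' algorithm: it assumes every sample has the same number of whitespace-separated tokens, treats any position where the samples agree as fixed text, and treats every other position as a wildcard. The names `infer_template` and `matches_template` and the sample messages are made up for illustration.

```python
import re

def infer_template(samples):
    """Build a regex from samples that all have the same token count.

    Token positions where every sample agrees become literals;
    positions that vary become a wildcard (any run of non-space chars).
    """
    token_rows = [s.split() for s in samples]
    if len({len(row) for row in token_rows}) != 1:
        raise ValueError("this sketch only handles samples with equal token counts")
    parts = []
    for column in zip(*token_rows):
        if len(set(column)) == 1:
            parts.append(re.escape(column[0]))  # fixed part of the template
        else:
            parts.append(r"\S+")                # variable slot
    return re.compile(r"\s+".join(parts))

# Hypothetical messages, as if generated by one bot from one template.
samples = [
    "Buy cheap meds at store-a1.example today",
    "Buy quality meds at store-b7.example today",
    "Buy discount meds at store-zz.example today",
]
template = infer_template(samples)

def matches_template(message):
    """True if the whole message fits the inferred template."""
    return template.fullmatch(message.strip()) is not None
```

A message the bot has not sent yet, such as "Buy amazing meds at store-q9.example today", still matches, while ordinary mail does not. Real templates vary in length and structure, so a practical system would need alignment and far more samples than this toy does.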

Sure, it will work "perfectly" for about 2 days, until the spammers change their methods to work around it. This is an arms race; there is no "final solution" (although modifying the email protocol to allow authentication of the sender's address would be a big help.)

approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)

( ) Spammers can easily use it to harvest email addresses
( ) Mailing lists and other legitimate email uses would be affected
( ) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
( ) It will stop spam for two weeks and then we'll be stuck with it
( ) Users of email will not put up with it
( ) Microsoft will not put up with it
(X) The police will not put up with it
( ) Requires too much cooperation from spammers
( ) Requires immediate total cooperation from everybody at once
( ) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business

( ) Ideas similar to yours are easy to come up with, yet none have ever been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
( ) Countermeasures must work if phased in gradually
( ) Sending email should be free
( ) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
(X) Killing them that way is not slow and painful enough

Furthermore, this is what I think about you:

(X) Sorry dude, but I don't think it would work.
( ) This is a stupid idea, and you're a stupid person for suggesting it.
( ) Nice try, assh0le! I'm going to find out where you live and burn your house down!

Fine with me. Most spam I get is obviously a template, since I get the same one for weeks. This would stop those additional sent copies. The false positive rate on this kind of thing is effectively 0%, so I'm willing to have it be an additional check on my email.

If it can stop a lot of this kind of spam, that's fine with me. Let it be an arms race. If the spammers have to make up new templates every 4 hours, that's going to make things a lot harder.

This isn't a cure for all spam; it's a fantastic filter for one (of the biggest) kinds of spam. Only the headline makes it sound like it will solve all spam.

There is a final solution: make sending spam more expensive. Spammers will only spam so long as it makes them mind-blowingly wealthy. If you can raise their operating costs and bump them down from "mind-blowingly wealthy" to only "obscenely wealthy", they might switch to other lucrative immoral industries like manufacturing printer ink.

What this does is increase the computational power required to generate a spam email. The method they described sounds like it's self-learning (just hook it up to a spambot "oracle" and it'll figure out the new template), so spammers will likely have to abandon the use of templates altogether. If you increase the amount of computational time required to generate spam, you decrease the amount of spam sent and really decrease the profitability of it.

We keep pushing the requirements for spam further and further up the computational totem pole (or Chomsky hierarchy, if you will) and you get closer and closer to a point where spammers are going to have to create strong AI to write spam. If they fail, we don't have spammers anymore and if they win, well we have spam, but we also have strong AI! Win-win, I say.

approach to fighting spam. Your idea will not work. Here is why it won't work. (One or more of the following may apply to your particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)

( ) Spammers can easily use it to harvest email addresses
(x) Mailing lists and other legitimate email uses would be affected
( ) No one will be able to find the guy or collect the mone

That doesn't work, since it's been ages since spammers used their own machines to send spam; these days they just use whatever botnet they control. Increasing computational complexity only means they make their victims' PCs work harder, thus harming the environment.

We keep pushing the requirements for spam further and further up the computational totem pole (or Chomsky hierarchy, if you will) and you get closer and closer to a point where spammers are going to have to create strong AI to write spam. If they fail, we don't have spammers anymore and if they win, well we have spam, but we also have strong AI! Win-win, I say.

I agree with nearly everything you've said, but I don't consider the invention of strong AI by spammers to be a "win". Previously [slashdot.org], I've argued [slashdot.org] that individual rights aren't related to human genetics, but rather to the organism's sapience. In other words, roaches have more rights than yeast cells (but not much more), cats have more rights than roaches, cetaceans/hominids/humans/"strong AI" have more rights than cats.

Allowing spammers to create beings who should be treated as citizens but are actually used as slave labor is wrong. Note that I'm specifically referring to strong AI; weak AI wouldn't qualify as sapient under most definitions.

Why do the spammers have to be on one particular side? It's an arms race, which is more like a game of cat and cat; we both (the good guys and the bad guys) want end users to get just the messages we send. Each will do whatever it takes to get in the others' way. In my experience, it's just as fun (and a lot more gratifying) to stay on the good side.

I think you're forgetting that the criminals who run botnets aren't as worried about damaging the normal operation of the Internet as the rest of us might be.

We start detecting their templates; they start making their templates more and more flexible. We chase, giving our filters broader and broader definitions of "bad" email. Clever spammers start sacrificing the percentage of their mail that's coherent just to increase the output range of their templates, forcing the template-recognition filters to get looser. Eventually the filters become useless because they can't pick out every variation that could come from a template without also capturing a lot of legitimate messages.

Or something else happens that renders the filters useless. The point is - yes, it's a win in that it fights techniques used today. No, it is not the grand victory proclaimed by the headline.

And since most devices will download updates and things automatically, new templates could be discovered and pushed out as well. I'm sure there will be some work around that the spammers will figure out, but hey, I'm up for most anything that will cut down/stop/prevent spam. I am also still a fan of the 'kill them until they die from it' club when it comes to spammers.

So it still needs to see a certain volume of spams in order to figure out the template. Then it reacts to the template. Then when the spammers figure out it's uncovered the template, they change the template.
Spam will exist until the fundamental nature of e-mail operation changes.

I don't think that's where it will fail -- yes, some will get through in that window before the system learns the new template, but it could drastically reduce the problem for a short time. But it introduces a new kind of issue: what happens when this runs for a month, and the spammers come up with a way to auto-generate new templates and change them once every few minutes? The net result is that the filter apps will need to compare each email against millions of potential templates... and it becomes fast

A team of hackers from Russia are claiming to have found an "effectively perfect" method for countering spam blocking technology. The new system deciphers the templates Spam Blocker is using to filter spam and then teaches spam generators what to write.

Err, what if I, as a corporation, blew out a spam that effectively incorporated a template unique to that which my largest competitor uses in their newsletters or customer communiques (or at least close enough to get my competitor blacklisted far and wide)?

(it would take a shedload of doing, but certainly not impossible, and if it could be done, would make for one hell of a cheap and easy DoS).

Heuristics is great and all, but go too deeply, and I can see it opening up a small but pretty scary can of worms.

and then the researchers discovered the Halting problem and pretended it didn't exist.

I don't quite see your point - the halting problem proves that you cannot create an algorithm that will tell whether an arbitrary program will ever halt. It has no significance for this particular program, since it would be trivial to ensure that it does halt.

As long as there is money to be made in spam, spammers will continue to send spam. This "discovery" does nothing for that. Indeed it just dedicates more CPU time to trying to identify spam, which is just another way that internet users shoulder the cost of the profitability of spamming.

I've said it before, and I'll continue to say it - spam is an economic problem. Until something is done to address the money that spammers make, they will continue to find ways around these "effectively perfect" "discoveries".

Spammers send spam because it makes them money. It makes them money because people are stupid. The question is: why are people stupid, and how can we make them smarter? I would argue that spam is an educational problem.

Not directly. The spammers themselves are paid by moderately smart people who are selling products online that are often of questionable legitimacy. While some of those customers are stupid, there are generally fairly crafty individuals making money off of the customers along the way.

The question is: why are people stupid, and how can we make them smarter?

You could ask the same question in the light of why 419 scams work, why old-school pyramid schemes work, etc. Money can make smart people pretty dumb at times.

I've said it before, and I'll continue to say it - spam is an economic problem. Until something is done to address the money that spammers make, they will continue to find ways around these "effectively perfect" "discoveries".

There is always a demand to get a message out to n% of x hundred thousand people for cheap. You can't realistically stop that. What you can realistically do is increase the cost of getting those messages out. Treating spam as simply an economic problem won't work.

What you can realistically do is increase the cost of getting those messages out.

The proposed "Spam Blocking Discovery" doesn't do jack shit to accomplish that goal. The people who install the spam filters aren't going to buy anything that was spamvertised, anyways. Meanwhile the spammers will continue to adjust their methods to get around the filters that are installed at the ISP level so that they can get their messages out to more people who would be interested.

This craptacular "Discovery" is just another round of whack-a-mole. Hopefully at some point people will finally get ti

As long as there is money to be made in spam, spammers will continue to send spam.

But if the US government were to threaten the US-based credit card companies that process every single one of these transactions, there would be no more money, and no more spam.

Which transactions should they block?

It's also important to keep in mind that spammers don't make money from selling V1AGRA. Spammers make money from other people who want to make money by selling V1AGRA. The distinction is important because it doesn't really matter whether money can be made by selling shady products or not. As long as there's a sucker who *believes* they can make money by selling the shady products, the spammer has a customer. When that one wises up, there are 10 more waiting.

I, too, have designed a flawless spam filter. It works under similar principles, will filter 100% of incoming spam, will generate 0 false positives, and it's super easy to use: if (is_spam(message)) { delete_message(message); }

Had there been no spam filters, we'd all receive about the same amount of e-mail spam as we receive in the postal mail world. Instead, the spam industry spends its time trying to break through spam filters -- and they do so with volume. Upping the ante further just doesn't help. So now you'll encourage spam without templates. My grandmother's just never going to have a chance.

Had there been no spam filters, we'd all receive about the same amount of e-mail spam as we receive in the postal mail world.

I can't imagine what you base that statement on. Real-world junk mail is limited by the fact that it costs money to print and mail junk mail. Neither applies to spam.

Spammers aren't just competing with spam filters. They're also competing with each other for attention. Even in the absence of spam filters, the spammers would continually seek new ways to get more of their spam into your inbox than their competitors.

In fact, they might well invent the spam filter, with a deliberate back door so that their spam sails through while their competitors are dropped.

As a researcher in the academic side of the Information Security field, I can't help but notice a significant increase in the level of puffery and misleading promotion of research results. Self-promotion obviously isn't new, it's just that as the amount of newspaper-assisted promotion increases, the level of accuracy has dropped significantly. And more importantly, researchers seem much less apologetic about it. It's generating some real blowback.

The best recent example I can think of is Vanish, a cryptographic system for "destroying" data that was proposed out of University of Washington. It's not just that the system was broken [utexas.edu] a few days after it was presented, it's that this relatively minor result got more press than all of the perfectly legitimate crypto-systems research that was going on at the time. In fact, during the same time period a guy named Craig Gentry solved [techtarget.com] a major open crypto problem --- namely, how to compute on encrypted data --- and it got a fraction of the press coverage.

Not that I'm saying these researchers specifically asked to have their invention described as an "effectively perfect" solution to preventing spam --- which I guarantee you 100% it is not --- but that by going out on a University-encouraged PR junket, they've more or less encouraged this kind of coverage. This kind of stuff is damaging; people should describe their work as what it is. They've developed a technique that is highly effective at filtering current-gen spam generators, in the lab. It won't stop all spam, and it's not effectively perfect, since spam filtering is by nature an arms race. But of course that's not how it's going to be presented. In the long run this'll just make people more jaded with our field.

As a former manager at an "email direct-marketing" firm, I should point out that the spammers can increase the amount of complexity/variation in the templates by a wide variety of techniques, including rearranging paragraphs instead of just letters, making parts of the message optional, performing syntactic modifications of the included text,... Each new minor modification starts a research effort on the detecting side. The cost of detecting spam will rise much faster than the cost of generating spam.

Honestly, I have to say between all the various filters I have or have written, I don't get a whole lot of spam. What I -want- though, is a way to identify it more reliably before my mail server even has to accept the message. With the current protocols, you can simply only block so much based on IP ranges or whatnot. There's a point where you have to accept the message to analyze. Sadly the only way we're likely to increase the chance of dropping the connection before receiving the message now is for t

As a co-author of this work, I should be clear that we never suggested that we have a perfect spam filter per se, simply a new tool that has the benefit of being orthogonal to existing techniques. For _existing_ botnets, our filters are extremely good, but the paper is also quite clear about the variety of ways that spammers might try to evade the approach.

Spam filtering isn't very hard, if you see the email for a large number of accounts, as Gmail does. The one characteristic that spam must have is that it's sent in bulk. The commonality across receiving email accounts gives it away.
The only hard part is recognizing the commonality, which is already working rather well. This is just a new technique for recognizing commonality.

Recognizing spam for a single account is tougher, because you don't get to see the "bulk" property.
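The "bulk" property described above can be sketched in a few lines of Python. This is only an illustration, not how Gmail or any real provider does it: it assumes the filter sees (recipient, body) pairs for many accounts, and it uses a crude fingerprint (lowercase, letters only, collapsed whitespace, then hashed) so that small per-message variations collapse to the same value. The function names, the threshold, and the sample messages are hypothetical.

```python
import hashlib
import re
from collections import defaultdict

def fingerprint(body):
    """Hash a crudely normalized body so trivial variations collide."""
    normalized = re.sub(r"[^a-z\s]", "", body.lower())  # drop digits and punctuation
    normalized = " ".join(normalized.split())           # collapse whitespace
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def find_bulk(messages, min_recipients=3):
    """Return fingerprints seen by at least min_recipients distinct accounts."""
    recipients_by_print = defaultdict(set)
    for recipient, body in messages:
        recipients_by_print[fingerprint(body)].add(recipient)
    return {fp for fp, rcpts in recipients_by_print.items()
            if len(rcpts) >= min_recipients}

# Hypothetical mail seen across three accounts.
messages = [
    ("alice@example.com", "Cheap MEDS now!!! Order #4821"),
    ("bob@example.com", "cheap meds now! order #9917"),
    ("carol@example.com", "Cheap meds now. Order #1203"),
    ("alice@example.com", "Lunch on Friday?"),
]
bulk = find_bulk(messages)
```

With a single account there is only ever one recipient per fingerprint, which is exactly why the per-account case is harder. A legitimate mailing list would also trip this check, so a real system would need whitelisting or other signals on top.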

Amen to that.... we moved our email accounts to Gmail a few years back.

Currently I get maybe two or three spam emails a week across three accounts, two of which have been in active use on the Internet for more than a decade.

Of course if I look in the spam folder, I see that in actual fact I get anywhere up to 50-100 a day per account. Not my problem. Possibly a problem for Gmail. But they seem happy to undertake to offer the service and remove it for me.

This is actually quite simple once you've got the basics in place. It reminds me of a program I once wrote that could crawl a website and find out the templates used, then identify the actual content, title and other blocks. Some postprocessing was required though, but since most e-mails are a lot simpler than webpages, I suppose this can be done completely automatically for spam. And probably indeed "effectively perfect". As long as spam is template-based, that is.

Yeah, this idea is great. . . until it starts blocking out legitimate emails which really are confirming orders shipped by Amazon or other retailers, newsletters that people really were wanting to get, and other info that 'looks' like spam, but isn't.

This is why, while I use spam filters, I would never rely on them to delete email. All I want filters to do is punt suspect spam off to the Junk folder, where I can review it later, or find the email I was expecting which got mis-classified.
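A minimal sketch of that "file it, don't delete it" policy, in Python. It assumes an upstream filter has already tagged suspect mail with an `X-Spam-Flag: YES` header (SpamAssassin can add one; other filters may use different headers), and the folder names and the function `choose_folder` are made up for illustration.

```python
from email import message_from_string

def choose_folder(raw_message):
    """Route flagged mail to Junk for later review; never delete anything."""
    msg = message_from_string(raw_message)
    # Assumption: the upstream filter marks spam with "X-Spam-Flag: YES".
    flag = (msg.get("X-Spam-Flag") or "").strip().upper()
    return "Junk" if flag == "YES" else "Inbox"

# Hypothetical raw messages as they might arrive from the mail server.
flagged = "X-Spam-Flag: YES\nSubject: Cheap meds\n\nBuy now\n"
clean = "Subject: Your order has shipped\n\nTracking details inside\n"
```

Here `choose_folder(flagged)` yields the Junk folder and `choose_folder(clean)` the Inbox. Because nothing is discarded, a false positive costs a trip to the Junk folder rather than a lost order confirmation.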

approach to fighting spam. The idea will not work. Here is why it won't work. (One or more of the following may apply to the particular idea, and it may have other flaws which used to vary from state to state before a bad federal law was passed.)

( ) Spammers can easily use it to harvest email addresses
(X) Mailing lists and other legitimate email uses would be affected
( ) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
( ) It will stop spam for two weeks and then we'll be stuck with it
(X) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
( ) Requires too much cooperation from spammers
( ) Requires immediate total cooperation from everybody at once
(X) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business

Specifically, your plan fails to account for

( ) Laws expressly prohibiting it
( ) Lack of centrally controlling authority for email
( ) Open relays in foreign countries
( ) Ease of searching tiny alphanumeric address space of all email addresses
(X) Asshats
( ) Jurisdictional problems
( ) Unpopularity of weird new taxes
( ) Public reluctance to accept weird new forms of money
( ) Huge existing software investment in SMTP
( ) Susceptibility of protocols other than SMTP to attack
( ) Willingness of users to install OS patches received by email
( ) Armies of worm riddled broadband-connected Windows boxes
(X) Eternal arms race involved in all filtering approaches
(X) Extreme profitability of spam
( ) Joe jobs and/or identity theft
( ) Technically illiterate politicians
( ) Extreme stupidity on the part of people who do business with spammers
( ) Dishonesty on the part of spammers themselves
(X) Bandwidth costs that are unaffected by client filtering
( ) Outlook

and the following philosophical objections may also apply:

(X) Ideas similar to this are easy to come up with, yet none have ever been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
( ) Countermeasures must work if phased in gradually
( ) Sending email should be free
(X) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
(X) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
(X) Killing them that way is not slow and painful enough

Furthermore, this is what I think about them:

(X) Sorry dudes, but I don't think it would work.
( ) This is a stupid idea, and they're stupid people for suggesting it.
( ) Nice try, assh0les! I'm going to find out where you live and burn your house down!

Exactly, this will force spammers to just slightly get off their asses and tweak their templates. If I were them, I'd harvest actual personal email from compromised accounts which had images attached, and replace those images with Viagra ads. I get messages like this:

OMG, take a look at this adorable picture of Jake playing with Mike's puppy!

[attached jpeg]

Mary

Now suppose my account were compromised and you got this exact message from my personal email, where the jpeg is a Viagra ad. There is absolutely nothing there for your spam blocker to latch on to, unless it parses the content of the jpeg itself. Anyway, blocking stuff like this would lead to unacceptably many false positives.

Not in the same level of detail; but, when your business model is spamming, you inevitably end up sending thousands of samples to loads of ill-vetted email addresses, some fraction of which are either being operated as spamtraps, or are in the possession of users annoyed enough to forward samples on.

Your algorithms can, and often do, remain secret (unless one of your black-hat buddies cracks one of your cracked machines); but you'd be a lousy spammer indeed if the results of your technique weren't widely available.

I RTFA and they tested it by giving it 1000 spam e-mails by the same bot and after that it recognized the spam sent by that bot with 100% accuracy. This means NOTHING. I could bet a nice sum of money that if you give a traditional, learning spam filter 1000 e-mails sent by the same bot and flag those all as spam, it can then recognize the bot's further e-mails as spam. A real environment doesn't work like that, however. You have a large amount of very different spam bots and their templates which is what makes

I could bet a nice sum of money that if you give a traditional, learning spam filter 1000 e-mails sent by the same bot and flag those all as spam, it can then recognize the bot's further e-mails as spam.

If that were true, then by now Thunderbird's filter would stop missing all the Russian spam I get. I have no idea what the spam says, as I don't know Russian, and I never get legitimate mail in Russian; all the Russian spam I get appears very similar in format and length. I'm quite certain that Thunderbird has had over a thousand such e-mails marked as spam over the last few years, and yet it consistently fails to flag them.

I'd say it's 'effectively perfect' against the templates it's targeting, not against all of them. Since templates are the best way to get around a Bayesian filter, you 'could' limit spammers to manual spam again, which is a big crap-shoot. Until they develop a new method (which isn't the target the filter is 'perfect' against).

Has anyone ever suggested all of these? The government offers a contract and clears the legislative barriers to a company making vigilante robots which would hunt down and kill the families of all spammers while making the spammers watch?

Assuming these robots can fly, have powerful metal claws, and cannot be stopped, I can't see any problems on your checklist.

( ) Spammers can easily use it to harvest email addresses
( ) Mailing lists and other legitimate email uses would be affected
( ) No one will be able to find the guy or collect the money
( ) It is defenseless against brute force attacks
( ) It will stop spam for two weeks and then we'll be stuck with it
( ) Users of email will not put up with it
( ) Microsoft will not put up with it
( ) The police will not put up with it
( ) Requires too much cooperation from spammers
( ) Requires immediate total cooperation from everybody at once
( ) Many email users cannot afford to lose business or alienate potential employers
( ) Spammers don't care about invalid addresses in their lists
( ) Anyone could anonymously destroy anyone else's career or business

There are currently laws expressly forbidding the construction and operation of mass murder machines, but that's why I suggested we get rid of those laws.

( ) Ideas similar to this are easy to come up with, yet none have ever been shown practical
( ) Any scheme based on opt-out is unacceptable
( ) SMTP headers should not be the subject of legislation
( ) Blacklists suck
( ) Whitelists suck
( ) We should be able to talk about Viagra without being censored
( ) Countermeasures should not involve wire fraud or credit card fraud
( ) Countermeasures should not involve sabotage of public networks
( ) Countermeasures must work if phased in gradually
( ) Sending email should be free
(X) Why should we have to trust you and your servers?
( ) Incompatibility with open source or open source licenses
( ) Feel-good measures do nothing to solve the problem
( ) Temporary/one-time email addresses are cumbersome
( ) I don't want the government reading my email
( ) Killing them that way is not slow and painful enough

I do realize some wouldn't trust the company controlling the deathbots, which is why -I- would be the governing authority once they were operational. You can trust me because I promise to only kill you if you're related to a spammer.

The truth is that spam has been successfully fought by filters without compromising legitimate email. Furthermore, as Paul Graham has stated, spammers have been forced to retreat to smaller text-based messages or in-line images.

In particular,

(X) Mailing lists and other legitimate email uses would be affected

Possibly but the probability of losing legitimate email by modern heuristics is (proven) smaller than the probability of accidentally deleting it when it is mixed with spam.

(X) Users of email will not put up with it

They do, sometimes without their knowledge

(X) Many email users cannot afford to lose business or alienate potential employers

They would lose more without filtering. See 1st argument.

(X) Asshats

How?

(X) Eternal arms race involved in all filtering approaches

(X) Extreme profitability of spam

And also extreme profitability in having a working e-mail address.

(X) Bandwidth costs that are unaffected by client filtering

This isn't the mid 90s anymore.

(X) Ideas similar to this are easy to come up with, yet none have ever been shown practical

The practicality of heuristic filtering (SpamAssassin etc) is proved by its transparency. Even old e-mail clients such as Outlook 97 can filter out email marked by X-Spam headers. Gmail and the rest of the privacy traders do it for you automatically.

(X) Why should we have to trust you and your servers?

Run it locally. Mozilla Messaging does.

(X) Feel-good measures do nothing to solve the problem

Age old forms copied from the newsgroups can't be used as arguments anymore. Time to be creative again!

Exactly. They just make the subtle changes in templates less subtle. They have a reason (money) to get around the blocking, like they already do. This isn't going to be some effectively perfect solution.

Effectively perfect, no. If nothing else, for certain classes of spam (especially phishing) the money or perception of money can be good enough to keep actual humans at the keyboard.

However, the reason you use templates, rather than word salad or the first 100kb of /dev/urandom, is that you both need to peddle whatever it is you are peddling and look vaguely like a human-constructed message. If the researchers can, in fact, target messages that bear signs of being generated from a given template, the spammers will be forced to be looser in generating messages from templates (which increases the risk of garbling beyond comprehension, or being flagged by filters looking for highly non-human output) or step up their game in terms of natural language synthesis.

How about the spammers using fragments from Gutenberg books? Or fragments from blog posts?... What is spam, after all? I am trying hard to send David Horowitz to the spam bin, but then the guy manages to get out of it after a while... I have tried unsubscribing, tried "spam"-ing him, even tried to beg him to let my mailbox live peacefully... for me it's spam, for him it is enlightening the dumb masses and the work of his life...