
January 12, 2004

Another spam attack

Posted by Teresa at 08:03 PM *

Making Light and Electrolite are under attack again (see previous round), and it’s aggressive—over 50 hits in the last hour, from 21 different IP addresses. The format’s the same for every post: a porn URL, followed by some bit of text, usually from a computer manual.

Anyone else getting hit? What’s the scoop?

Addendum, 8:17 p.m.: Kip Manley, of Long Story Short Pier, got hit by these last night. He’s posted the IP address blacklist he compiled during the attack. Patrick is adding it to our defenses right now. If you think there’s any chance you’re going to be targeted, you might want to do the same; Kip Manley took 400 hits.

If you have MT Blacklist installed but don’t know how to use these lists: Copy the list to your clipboard. Go to the main MT Blacklist screen. Right under the title there’s a little row of gray boxes. Click the one that says Add. This will take you to another screen that has a large empty box labeled Import blacklist. Paste the entire list into the box. Or paste in some fraction of it, if you know what you’re doing and don’t want the whole thing; but just pasting in the entire list is easiest. Finally, click the button underneath the box that says Import entries. That should do it.

Don’t worry about pasting in duplicate entries. MT Blacklist will automatically strip out any duplicates.

Addendum, 8:54 p.m.: A comment from Kip

Add it and add it now, is my advice. It takes forever (well, an hour, but it seemed like forever) to comb them out by hand—there were 50 or 60 URLs total, used over and over again.

My working theory (not that I know much about this stuff at all) is that somebody’s randomly generating names, email addies, IP numbers, lorem ipsum text, and URLs—most of the URLs don’t go anywhere, but are chaff, to delay and discourage you from cleaning it all off until Google has a chance to register the link to the one or two “real” sites buried in the onslaught. —So this will work until the next iteration of this spam bot. And then we’ll have a new list of fucked-up URLs we’ll have to add.

Just wait: the next wrinkle will be chaff that are legitimate URLs you like, culled from people’s blogrolls. —Though my heart is heavy at the idea of comments registration (to be available with MT 3.0), I’ll probably be leaping to upgrade and implement it.


I'm not an expert, but I do know a little bit about this stuff, and I don't think that's possible. To put it as simply as I can: in order to post here you must be able to send information to the server, and the server must be able to send information to you. In order for this to happen you (or your computer) must know the IP address of the server, and the server must know the IP address of your computer(1).

If the IP address you provide the server with is not the true IP address of your computer, any information the server tries to send to you will never actually reach you. (Instead it will be delivered to the computer at the IP address you provided, if such a machine exists.) This would make opening a TCP connection with the server impossible, and opening a TCP connection is a necessary prerequisite to doing things like viewing a web page, or posting via a web form. (See http://www.grc.com/dos/drdos.htm for a nice graphic.)
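Jonah's point can be seen in a toy, local experiment (a sketch in Python using only the standard library; the function name is made up): the server learns the client's address from the completed TCP handshake itself, not from anything the client claims, which is exactly why a faked source address could never finish the handshake.

```python
import socket
import threading

# A local illustration: the server's view of the peer comes from the
# handshake, so it necessarily matches the client's real address.
def run_demo(host="127.0.0.1"):
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))            # let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    seen = {}

    def accept_one():
        conn, addr = server.accept()  # addr is the peer's real IP and port
        seen["addr"] = addr
        conn.close()

    t = threading.Thread(target=accept_one)
    t.start()
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect((host, port))      # three-way handshake happens here
    local = client.getsockname()      # the client's actual address
    t.join()
    client.close()
    server.close()
    return seen["addr"], local

peer, local = run_demo()
print(peer == local)                  # True: what the server saw is real
```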

(1) OK, in practice this might actually be the address of your NAT router or proxy server, but the principle remains more or less the same (and if you don't know what a NAT router or a proxy server is, don't worry about it).

While Jonah's right that it's really hard to fake IP addresses with TCP, there's apparently a whole little underground economy of people who break in to machines, set up various remote control/forwarding servers, and sell the IP addresses of the captured machines to spammers. The spammer then bounces all their junk through the unwitting victims. So in fact the IP addresses might be those of innocent dupes. Even if you figure they deserve to get blocked to encourage them to protect their machines better, blocking those IP addresses isn't going to help cut down the spam since the spammer will be using a whole new set of captured IPs tomorrow.

Well, whoever was doing this was certainly contriving to make dozens of comments, posted over a very brief period, appear to come from a wide range of IP addresses. Over a half-hour, before I stopped writing them down, these are the IP addresses I logged the spams as coming from:

Maybe so, Rich, but I won't have to clean out this batch all over again. I understand MT Blacklist is slated to add the ability for users to post spammer IP addresses to a central registry. That will have its own problems, of course, but it'll greatly diminish the utility of comment-spamming. The trick is to make the practice more troublesome and expensive than it is rewarding.

The rub. If Rich is correct that this guy is using cracked machines to forward, then blocking by IP will be exactly as effective as the Maginot Line was.

A few tests, using PNH's handy list of IPs, will tell me more. However, a few quick lookups show me these machines are scattered all across the net. One appears to be a school district machine in Idaho (www.d59.k12.id.us = 199.104.82.18). One's in Japan (exp110rb.nsz.co.jp = 210.197.88.11). One, well, I guess Comms Resources won't be including Mail (mail.commsreources.com = 194.154.176.242).

More on this in a bit -- this sort of looking about takes time, and the machine is also running "make buildworld" for FreeBSD 5.2-RELEASE, so it may be a while.

Hmmm. I wonder if this is related:
from http://www.internetnews.com/ent-news/article.php/3297661

"Another day, another virus.

Unsuspecting Internet users were greeted Friday with an e-mail message purportedly from windowsupdate@microsoft.com to update their computers. The message has the subject line: Windows XP Service Pack 1 (Express) - Critical Update. Problem is, the message isn't from Microsoft and the patch is actually a back door Trojan."

This particular Trojan, called Xombe, downloads a file that launches DoS attacks. I heard about this today from my EDUCAUSE subscription, and when I saw you were getting hit again, it rang a bell.

While I don't have a weblog, I run a few wikis, and some of them were recently hit by spam attempts as well. And as with weblogs or guestbooks, it is hard to keep wikis from being abused by bots. Now, I have developed some techniques to keep them out, several of which are probably also used by weblogs.

Most importantly, however, I am formatting all external URIs so that they go through a redirector instead of referencing the target page directly. The URI for the redirector (such as http://www.example.com/redirect?uri=http://target.example.org/) is not accessible to Google and other search engines (blocked in /robots.txt). Thus, links to external sites (except where explicitly approved) do not do anything for the pagerank of the external sites, rendering googlespamming ineffective.
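As a rough sketch of that scheme (the /redirect path, the uri parameter, and the hostnames are illustrative assumptions, not necessarily the commenter's actual setup):

```python
from urllib.parse import quote, urlsplit

# Route external links through a crawler-blocked redirector so they
# pass no PageRank; local and relative links are left untouched.
REDIRECTOR = "http://www.example.com/redirect?uri="
LOCAL_HOST = "www.example.com"

def wrap_external(href):
    """Send external links through the redirector; leave local links alone."""
    host = urlsplit(href).netloc
    if host and host != LOCAL_HOST:
        return REDIRECTOR + quote(href, safe="")
    return href

print(wrap_external("http://target.example.org/"))
print(wrap_external("/archives/000123.html"))  # unchanged

# robots.txt then hides the redirector from crawlers:
#   User-agent: *
#   Disallow: /redirect
```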

Naturally, that doesn't help a lot if only a few individual sites do it. But if such techniques were widely deployed, comment spamming would become essentially a pointless exercise. It would not prevent the spam, but there wouldn't be anything to be gained from it, either.

The downside is of course that URIs become longer, less readable and require an additional server access (unless you augment the A element with Javascript). There are probably also other problems that I haven't figured out yet. But at least I have the satisfaction that any spammer who gets through my defenses won't gain anything from it.

Not being savvy with this stuff, I have to ask: what about those of us who don't have MT Blacklist? Or don't blog with MT at all? Does anyone know a way that we (I use Greymatter, for the record, but can't speak for anyone else) might be able to take advantage of the lists that Kip and Patrick and Teresa have generously provided?

This latest attack seems to be a wave of crapflooders using a tool called "FloodMT". I didn't realize that this particular practice had migrated off of Slashdot. Their CVS page on Sourceforge seems to be down, or I'd take a look at it and see how it works, but Erik's diagnosis is right. A lot of techniques that would be dandy against comment spammers, who have the defined goal of increasing their PageRanks, are going to be useless against these sorts of attacks, which are just vandalism.

I strongly suspect that the next version of Movable Type will include options to enforce logons or some sort of Bayesian filtering. Possibly both.

Assuming that you have an appropriately configured Apache, the following lines in your .htaccess will block the given addresses from submitting any forms (that includes posting comments). They will still be able to read from your site.

<Limit POST>
Order allow,deny
Allow from all
Deny from 61.11.26.134
Deny from 63.226.96.246
Deny from ...
</Limit>

Addresses should be numeric IP addresses. You can also block entire subnets. Consult the Apache documentation on how to do that. Access control can be even more fine-grained, blocking only access to certain URIs (cf. the Location, Directory and Files directives). This may be necessary if the spammers use GET instead of POST requests to insert their spam.
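For instance, blocking one of the subnets above wholesale might look like this (CIDR notation works in Apache 1.3 and later; check the mod_access documentation for your version):

```apache
<Limit POST>
Order allow,deny
Allow from all
Deny from 61.11.26.134
# block the whole /24 around a repeat offender
Deny from 63.226.96.0/24
</Limit>
```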

Blacklist updated, although my comments channels are generally so quiet that an uptick in traffic of such magnitude would be perversely exciting. Almost. Thanks for providing such a valuable public service.

Wistfully replying, with no hope, of course, "I'd love some o' that clementine and cranberry marmalade, Teresa," but it's not offered to me. (And I'm the only one in my house who would consume such....) Yikes. Sometimes I want to do more than LiveJournal, sometimes not. These times, not.
Paula

Shelley Powers at Burningbird came out of vacation to post on this:
http://weblog.burningbird.net/fires/technology/mt_comment_help.htm
It's an entry called "MT Comments Help" with specific instructions/ suggestions for how you may be able to counter-attack this problem.... Hope this helps!

If I have to register somewhere to make comments HERE, I'd gladly do it. There are several places where I'd just go, "Whatever, I don't need it," but here is not one of those places. We had so much irritating personal shit on Disturbing Auctions Daily that the owner had to take it down for an undetermined while (I'm going into withdrawal; where will I find tipsy martini glasses and angelic taxidermied mice? -- yeah, I'm weird....)

Wow, I don't know Shelley Powers at Burningbird from a hole in the ground, but does that post ever reek of cranky attitude and old arguments of which I know not wot. I am, as ever, only an egg.

Unfortunately, our MT installation isn't configured to use MySQL. Increasingly, I gather that this was a mistake. I'd correct it, save that I'm not sure how to do it without rebuilding the world from a frightening level of scratch.

You saw this MT Berkeley DB-to-MySQL script, though, right? I cannot vouch for its frightening scratchitude or lack thereof. (I expect the biggest problem would be preserving post IDs, but if busting all your permalinks or writing a PHP/mod_rewrite workaround is acceptable, it doesn't look too bad.)

Patrick, whom I also don't know from something in the ground: I am not a techie -- to me, all this stuff is mumblefoo. But I'm hardly "bemused" by how anyone could interpret Shelley's "tone" as "score-settling" or as being in any way in an emotional register. *I'm* writing right now in an emotional register, ok, that's clear, but it's because I'm pissed off that when a woman writes factually, she's *perceived* as reeking, old, and ...what else did you say? Oh, yeah: cranky! How original! I just don't get that. I thought that, if I had MT and had these problems, Shelley's post would be helpful, and I thought it was a serendipitous coincidence that I found it on her site (which has been quiet for a couple of weeks now) tonight, just as Teresa is posting about spam attacks. Sorry I bothered.

If Burningbird's moniker were Flying Fallus and her name were Sheldon instead of Shelley, I wonder if your adjectives would have been different...

And no, I'm not having a PMS attack or anything, but sometimes something just kinda snaps.

but it's because I'm pissed off that when a woman writes factually, she's *perceived* as reeking, old, and ...what else did you say? Oh, yeah: cranky!

If this isn't the single most fatuous and ignorant thing anyone's tried to pin on the guy since I've been hanging around these blogs, it's a shoo-in for first runner up, and it's competing for a special technical achievement award, too.

Honestly, it's like accusing him of being a potted cactus or a time-traveling Chinese spice merchant, which is to say it's left "merely fucking ridiculous" far, far behind (though both of those arguments are actually more defensible than yours).

Maybe, just maybe, you should consider reading a bit deeper into the blog-thoughts of someone "whom [you] also don't know from something in the ground" before you start conflating a simple statement of opinion with deep-rooted misogynism?

Well, I came in late, but are all those IP addresses the attackers used just the ones hitting this weblog?

The distinguishing patterns of the attacks were:

1. Use of a wide variety of IP addresses. Correct?
2. A large number of comments in a short period (this makes sense; if you're gonna automate an attack you probably want to be hitting harder than one caffeinated neocon on a rampage).
Question: How many attacks were logged per IP?
Were these attacks randomly distributed over MT posts? That is to say, comment number 1 is on an old post (if your blog has been running for a while, it follows that older posts will be hit before more recent ones if the attack is truly random), followed by comment two on a new post, comment three somewhere in between. I seem to remember the spamming phenomenon discussed in earlier posts began in older comment threads and gradually moved up.

I'm sorry that Yule and you all got into a comment quarrel here over the tone of my posting. I've written about a dozen notes on this problem, and yes, I guess I am getting tired of repeating myself and getting ignored.

Blacklisting is dangerous and won't solve the problem, as the script kiddies are showing with the new crapflooders, just recently pointed out. I'm sorry for the folks not using MySQL with MT, but if you are using MySQL, I really do suggest you consider shutting down comments on older posts. Might help a little, but a rewrite of the MT comment system is desperately needed now.

Look, I'm sure Shelley Powers, who writes Burningbird, is a perfectly fine human being, and as I said I certainly don't disdain any technical help or pointers. However, examples of the "score-settling tone" I was referring to include:

Did mt-blackllist work? No. As I've said before, spammers have better habits then so-called legitimate developers, because they listen to their 'customers' and adapt accordingly. [...]

The spammers have gotten smarter. Eventually if you restrict their access enough, you'll shut down comments to everyone. The only true solution to this problem is better comment management in MT. However, if you feel as clever as the spammers, perhaps you need to attend a smart people conference, come up with nifty, neato, just gee wiz smart solutions (put into the public domain of course, with the cutest little cc brand). [...]

For all the mt-blacklist users, if you're using global lists and not checking that legitimate URLs have been inserted, then chances are you're opening your system up for a poison pill attack -- causing your system to filter common, legitimate URLs, and hence making the mt-blacklist less reliable. The technique is common in email spam, as outlined by Ken Coar. Something to think of next time you import several hundred entries, depending on technology when the spammers depend on their brains.

However, makes no nevermind to me what you do. I'm just passing through.

Now, I've been crankier than this on a thousand occasions; it doesn't make Shelley Powers a bad person any more than it does me. What I said was that I was bemused by it--it seems clear that Powers is engaged in an ongoing philosophical dispute over the design and implementation of anti-spam measures, and it's a little hard for a newcomer to sort out the issues through the sarcastic comments about "so-called legitimate developers" and "gee wiz smart solutions (put into the public domain of course, with the cutest little cc brand)." For all I know Powers is 100% correct on all issues and 100% justified in being impatient and cranky. Nonetheless, the response of the newcomer is still bemusement.

Yule Heibel says "I'm pissed off that when a woman writes factually, she's *perceived* as reeking, old, and...what else did you say? Oh, yeah: cranky!" First, I never called anybody "reeking" or "old", and I applied the adjective "cranky" only to Shelley Powers's post. Yanking the word "old" out of the context of my reference to "old arguments" in order to impute that I applied it to Shelley Powers's person is a rhetorical trick that would be disreputable if it weren't so pathetic.

However, Yule Heibel's root charge, that I wouldn't have remarked on Shelley Powers's post if she were a man, founders on the plain fact that I didn't know Shelley Powers was a woman until Yule Heibel said so. The name Shelley is not exactly gender-determinative; the one Shelley with whom I'm currently socially acquainted is definitely a man. In other words, while I have nothing against Shelley Powers, Yule Heibel is invited to go fly a kite.

P.S. To the gentleman who thought this might be a crapflooders attack: no, it wasn't. This attack was much more sophisticated than the rather primitive script-kiddie one shown on Slashdot. The mt-blacklist code should stop this one, though it may not be able to throttle the requests fast enough to avoid a temporary hit on the CPU.

Last night's attack was quite ingenious in its workaround of mt-blacklist's capabilities, and wasn't necessarily meant to take down a machine, though enough MT users on one machine, all of whom have comments emailed, would have been enough (hence this does turn into a DDoS attack). No, this was spammers trying for enough of a hit to get scraped by Googlebot before the cleanup was finished.

The actual Shelley Powers posted while I was writing the above. Welcome. As I said, I don't think your tone was anything to be bent out of shape about; I just couldn't quite figure out all the issues. I've certainly written more crankily than you when I felt I was "repeating myself and being ignored."

I seem to recall seeing, somewhere, reference to tools to automate the shutting down of posts on older comment threads. Are these tools only available for MT installations that use MySQL? As for a revamping of MT's comment system being "desperately needed," it was my impression that such a thing is slated for the upcoming MT 3.0.

Uhm, Patrick, Yule, let's leave this be, shall we? Yule, you're an angel and I appreciate so much what you did, but if folks don't read me they won't be aware of the past writing I've done, and the flak I've received because I've been critical of MT's comment system and of using blacklisting technology, because of the dangers of banning legitimate people.

Patrick, perhaps you might have tried making an assumption that there was a reason for whatever tone you disliked in the posting and focused instead on the fact that I was trying to help people out of a situation.

To answer Bryan: Yes, that list of IP addresses represents those attached to a portion of the spam comments posted to Making Light and Electrolite last night, up to the point where I stopped bothering to note the IPs. The answer to your question "how many attacks were logged per IP" is in the list, where it says "2x" or "6x", etc. Yes, the attacks appear to have been pretty much randomly distributed across comment threads, new and old.

Closing comments on older threads isn't the solution, since the comments can be on newer or older threads; if older threads are closed, then newer threads are bound to be hit more often (unless, dare one hope, the robot doing the posting leaves a site as soon as it tries to post and gets refused).

Here's an idea (don't know if it can be done in MT and it's temporary anyway)

If the comment form on an older post were inside an area whose display was set to none, an automated poster would still find the form, post number, etc., and make a post.
(Note that this makes problems for older browsers; more work can be done to take their problems into consideration, though.)
Any IP number posting to a form whose display is set to none is a temporary bad IP number, to be closed down for a period of one day; if it does the same thing three times in a row, it is shut down permanently unless the person trying to post from that number contacts you and explains the problem.

I consider this a temporary solution because if I were writing one of these things, I would want it to figure out which were the newest posts and hit those exclusively. That they are not doing this yet, though, suggests that they might have problems doing it.
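The ban bookkeeping in that suggestion -- a one-day timeout, permanent after three offenses -- could be sketched like this (class and method names are made up for illustration; the thresholds are from the comment above):

```python
import time

# An IP that posts to the hidden (display: none) form is banned for a
# day, and permanently after three offenses.
DAY = 24 * 60 * 60

class TrapList:
    def __init__(self):
        self.strikes = {}  # ip -> timestamps of posts to the hidden form

    def record_hit(self, ip, now=None):
        now = time.time() if now is None else now
        self.strikes.setdefault(ip, []).append(now)

    def is_banned(self, ip, now=None):
        now = time.time() if now is None else now
        hits = self.strikes.get(ip, [])
        if len(hits) >= 3:
            return True                          # three strikes: permanent
        return any(now - t < DAY for t in hits)  # otherwise: one-day timeout

trap = TrapList()
trap.record_hit("10.0.0.1", now=0)
print(trap.is_banned("10.0.0.1", now=3600))     # True: within the day
print(trap.is_banned("10.0.0.1", now=2 * DAY))  # False: timeout expired
```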

How does it know where to post, I wonder? Is there a GET done first which feeds info to the POST process? What do you all think?

Most all my hits are on older threads--I have taken to turning off comments after I get repeated hits on a moribund thread--a compromise that I don't like, but can live with.

I know Yule from elsewhere as a nice, helpful person, Patrick--don't let one post brown you off to her permanently.

Shelley, there was a suggestion on this weblog some time ago that (paraphrasing here) we need a way of marking some links so that Google doesn't use them in Page Ranking. I sent the suggestion on to the only person I know who knows people--what do you think?

Bryan, closing down comments on old posts will help, at least until MT 3.0 is released. The comment spammers aren't hitting the same post with 100 comments -- they're hitting individual posts with three each, lower than a lot of thresholds. Closing down older ones for now will throttle back the problem -- most of this last batch of spam was on posts over 10 days old.

They're finding posts by using Google to query on weblog entries, based on the fact that comment forms have the same labels, i.e. URL, Name, and so on, in addition to the word 'blog' somewhere. Easy.

There's another comment spammer or spammers at work using recent updates in weblogs.com, but this person is putting comment spam in manually, changing what they write to fit your topic. At least they're not overloading the system.

Adam, it would be great if Google did use meta tags so that we could mark up links this way, but as far as I know, Google isn't considering this. It would also put a burden on a very lightweight bot to get more sophisticated in its processing, which tends to defeat bot technology.

Plus, I like my good commenters to get the link buzz. I hate to have to turn off for all just to get the few spammers that hit (and they are few, but annoying).

Personally I like Sam Ruby's new approach -- all comments have to go to preview first, and the person has to then accept the preview and move on. This not only eliminates comment spam, but acts as a tiny cool down period for people writing nasty, nasty things. Really elegant solution.

( http://www.intertwingly.net/blog/1682.html )
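A guess at the mechanics of that force-preview flow (this is not Sam Ruby's actual code; `submit` and the token handling are invented for illustration): the first submission returns only a preview plus a one-time token, and the comment is accepted only when resubmitted unchanged with that token.

```python
import secrets

# First pass: preview plus token. Second pass: accept only if the token
# is valid and the text was not altered in between.
_pending = {}

def submit(text, token=None):
    if token is None:
        t = secrets.token_hex(8)      # one-time token tied to this preview
        _pending[t] = text
        return ("preview", t)
    if _pending.pop(token, None) == text:
        return ("accepted", None)
    return ("rejected", None)         # no/stale token, or text was altered

status, tok = submit("Nice post!")
print(status)                         # preview
print(submit("Nice post!", tok)[0])   # accepted
print(submit("spam", "bogus")[0])     # rejected
```

A bot that blindly POSTs once never completes the second step, while a human barely notices the extra click.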

If MT 3.0 comes out soon with really good comment management as well as decent comment throttling (protection against script kiddies) (and I'd like to vote for including a button to push that forces comments to go to preview mode if we wish), I think that's our solution. We can then drop the blacklists and the tweaky stuff and get back to writing. But we really are at this point dependent on that new release. Anything else is a stopgap.

You all have to remember something here -- most of the comment spammers would rather not be noticed, so they prefer to write comments on older threads. Why? Because Googlebot finds them regardless of the age of the posting, and if the comment is on an older thread, the weblogger may not be as inclined to delete it. So older posts are preferable, not newer ones.

However, last night's spam commenter has all the markings of an old friend we've tangled with before, who got really pissed at the actions of some webloggers who got fairly aggressive in pushing back at him. This recent episode was more in the line of thumbing his nose at us, saying "You can't stop me no matter what you do." Ouch. But we knew this was coming.

Wow. For the first time I'm not completely unhappy that my copy of MT is busted and I'm writing my blog with straight HTML, no comments.

How do you force comments into Preview first, before posting? I'd like to do that anyway, but am not savvy enough with MT to figure it out myself (which explains why my blog has been busted for so long.)

Thanks.

And I hope that someone comes up with a good solution soon. I love reading here, and would hate to lose y'all.

Dave Shea (CSS Zen Garden/Mezzoblue) posted this (http://www.mezzoblue.com/archives/2004/01/12/mt_comment_s/index.php) yesterday. It may shed some light on what's going on with the sudden increase of comment spam.

To the gentleman who thought this might be a crapflooders attack: no, it wasn't. This attack was much more sophisticated than the rather primitive script-kiddie one shown on Slashdot. The mt-blacklist code should stop this one, though it may not be able to throttle the requests fast enough to avoid a temporary hit on the CPU.

That was me, Shelley -- a friend reported that a tool called "MTFlood" or "FloodMT", I forget which, was used to target his Typepad blog last night. It doesn't appear to be precisely the same tool as in that Slashdot post; there is, of all things, a SourceForge project for it. If this wasn't the same group of people that hit the Nielsen Haydens (and actually it seems to be an entirely different group of people than the ones who went after Pandagon), it's a rather unpleasant coincidence.

I was looking at fooljay's MT-blacklist code, and it seems like something that could be adapted to include throttling capability fairly simply; the problem is that MT-blacklist overrides the base comment functions, so you'd either need to patch MT-blacklist directly or choose between the plugins.

And Shelley's right that the real solution isn't going to arrive until MT3, although Sam Ruby's solution seems like a good one.

By banning any robot not honoring your robots.txt, you should get rid of every spambot, leaving only the average cretin doing it manually (against whom I'm afraid you really can't do anything anyway). More interestingly, set the default file for posting comments to ban people accessing it, and block it in /robots.txt as well. There are several ways to do this, but the important thing is: enforce your robots.txt. If they can't play nice, don't let them play at all.
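A sketch of that trap (the paths and names are illustrative; a real setup would feed the banned IPs into an Apache deny list): list the paths your /robots.txt disallows, and flag any client that requests one anyway, since a well-behaved robot would never go there.

```python
# DISALLOWED mirrors whatever /robots.txt forbids; any client fetching
# one of these paths has ignored robots.txt and gets banned.
DISALLOWED = {"/mt-comments.cgi", "/trap/"}

banned = set()

def check_request(ip, path):
    """Allow the request unless the client has tripped the trap."""
    if any(path.startswith(p) for p in DISALLOWED):
        banned.add(ip)          # it ignored robots.txt: ban it
    return ip not in banned

print(check_request("1.2.3.4", "/index.html"))     # True: normal traffic
print(check_request("1.2.3.4", "/trap/honeypot"))  # False: banned on the spot
print(check_request("1.2.3.4", "/index.html"))     # False: stays banned
```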
Posted by: Effovex at January 12, 2004 10:22 PM
Here, a possible nightmare.

I like the "force preview" idea; can I do that in MT? I couldn't see a way to do it from the "blog author" interface (bear in mind that I am a tech-moron).

The people who attacked my site came from many of the same IPs Patrick posted earlier. I removed a link that came in from a Russian ISP around 5 a.m., and an hour later the flood happened, using many of the same host URLs.

I like the force-preview idea too, but I haven't yet found a site that explains how to implement it in a way that I quite understand.

Just to bring newcomers up to speed, I'm not afraid of code, command lines, HTML or CSS, but I start getting a bit worried when confronted with great wodges of incomprehensible Javascript with minimal explanation for us laggards.

I haven't read through the rest of the comments, because my mouth is still hanging too far open after reading this:

[...] because I'm pissed off that when a woman writes factually, she's *perceived* as reeking, old, and ...what else did you say? Oh, yeah: cranky! How original!

And all I can think of is the sheer amount of gall it must take to say that to Patrick, of all people. On Teresa's weblog, no less.

So, stick this in your pipe and smoke it: I'm female, and I thought the writer was cranky, needlessly sarcastic, and holier-than-thou. How's that grab ya? "I told you so, I told you so, and you're too stupid to understand why!" is the overall tone I take away from that piece, and I don't care if the person who wrote it is male, female, or a trisexual creature from the planet Antares, that's still annoying.

Implying someone is obviously just bashing a writer because she's female just because the person with the complaint is male is, in my book, pretty darn sexist, which would, I believe, qualify as irony in this instance.

(Now I will go back and read the rest, and probably find this has all been settled peacefully while I was away, with my luck, but... honestly!)

I had the same result from the mt-close plugin. So I started closing them manually -- a chore almost as much fun as sorting grains of sand by colour. While I was thus occupied, I received, deleted, blacklisted and banned two spam comments, and noted MT Blacklist deflect a third, all apparently aimed at an entry of mine titled "Countering Blogspam" from last October.

By the way, for what it's worth, I read all of the older Shelley Powers posts to which she provided links, along with more recent posts to her weblog Burningbird, and it was all very interesting stuff. Good photography, too.

Changing the labels on your forms helps, but what I've done is change what the 'bot/spammers are looking for: "mt-comments.cgi." I've renamed mine something tangential to its actual purpose, and then used the provided place in mt.cfg to point to the new name.

This doesn't stop the individual cretin pasting his spam into the comments one at a time, but it does seem to thwart these mass bombings.
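For anyone wanting to try the rename: the place to point MT at the new name is mt.cfg, as the comment above says. If I recall the MT 2.x directive correctly it is CommentScript, but verify against the comments in your own mt.cfg before relying on it; the filename below is an arbitrary example.

```text
# mt.cfg -- tell MT the comment CGI now has an innocuous name
# (directive name from memory of MT 2.x; double-check your docs)
CommentScript zz-guestbook.cgi
```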

Y'know, Yule, I was inclined to cut you a lot of slack because I figured you were some variety of generally benevolent Programming Guy (not a gender-specific term) who wasn't necessarily familiar with the idea that tone can be read separately from the (pick one) [main argument] [ostensible content] [obvious surface information]; and besides, you were posting late at night, after the H.G. had gone to bed.

I'd half-drafted a post in which I explained to you that Patrick, who in a practical and constantly reality-tested fashion reads texts for a living, was predictably distracted by an article in which the strong emotional charge evident in some passages has no obvious connection to the straightforward exposition that constitutes the majority of the piece. He'd had a very characteristic editorial reaction: Hmm, there's something else going on in that piece.

And I figured that since (as I thought) you weren't familiar with the idea of reading tone separately from content, you'd mistaken Patrick's mildly snarky but essentially innocuous remarks for a slur upon Shelley herself. She's obviously your friend, and a friend is never a bad thing to have. I thought that perhaps you'd become frustrated by your mistaken attempt to map Patrick's (supposedly critical) remarks onto Shelley's main argument, where of course they didn't fit; and that this frustration had given rise to your second salvo, in which you accused Patrick of malfeasances unsupported by the text (f.i., misogyny), and contradicted by the text (f.i., old in connection with anything except arguments).

But today, when I went to double check, I discovered/remembered (I'd known it, but had somehow misplaced the information) that you're Dr. Heibel, a resident of Vancouver, and have an advanced degree in Art History (Harvard '91). The advanced degree in a non-technical discipline and the three-hour time difference have done my exculpatory explanations no good at all. I'm in the market for new ones.

Meanwhile, over at Burningbird, there's Shelley morosely saying that this "shows the dangers of writing in anything other than the most non-emotive manner," which I think is a far more dispiriting reflection than the situation warrants. In fact she's an excellent writer (I've admired her writing for a long time now), and here she is in the midst of a lively ongoing conversation (and much appreciated for it); so perhaps, just perhaps, last night wasn't the sort of occasion you imagined it was?

The top of my wishlist for an upgraded comments system would be a checkbox at the bottom of the post-comments form that said, "E-mail me when someone posts to this thread." That way I wouldn't have to check threads manually to see if there's been an update.

Datapoint: My blog is on the same hosting provider as TNH/PNH, and I haven't had any uptick in spam. I get a few spams per week.

I like the "force preview" idea too. In the meantime, if Mike's right about them targeting

post a comment name: email address: url: remember personal info?

changing the wording of those prompts is an easy way to make spammers' lives harder.

We don't have to make it impossible. We just have to make it more trouble than it's worth.

Oh, and Shelley? Thanks, in a weird way, for saying this wasn't just vandalism. I know it means we're dealing with a more sophisticated opponent, but I was disturbed by the idea that a mess of this magnitude was just some random dweebs amusing themselves.

Teresa, the subscribe capabilities I'm looking for aren't something I want as a spam-blocking measure; I want it as a convenient way to keep track of discussions I'm involved with on other people's weblogs. Right now, in order to keep track of them, I have a complicated system of URLs that I check daily, with older discussions that appear to have died out moved to weekly checks to see if they revive.

The Ultimate BBS allows users to subscribe to individual discussion threads by e-mail; I'd like to see the same capability added to blogging systems, including MT.

Short version: Using Bayesian filtering to cut out the spam. No experience with using it for comments, but if it is as good as my email filter (Keir.net's K9) then 97% or better accuracy can be achieved. Worth a look, as blacklists have their flaws, as others have pointed out.
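For the curious, the core of a Graham-style Bayesian filter is small enough to sketch. This is a toy illustration only (the class and method names are mine, not K9's or any MT plugin's); a real filter would tokenize more carefully and persist its counts:

```python
import math
from collections import Counter

class NaiveBayesFilter:
    """Toy Bayesian comment filter: trains on labeled examples,
    then scores new text by combining per-word spam odds."""

    def __init__(self):
        self.spam = Counter()   # word -> number of spam comments containing it
        self.ham = Counter()    # word -> number of ham comments containing it
        self.nspam = 0
        self.nham = 0

    def train(self, text, is_spam):
        words = set(text.lower().split())
        if is_spam:
            self.spam.update(words)
            self.nspam += 1
        else:
            self.ham.update(words)
            self.nham += 1

    def spam_probability(self, text):
        # Combine per-word probabilities in log space to avoid underflow.
        log_odds = 0.0
        for w in set(text.lower().split()):
            # Laplace smoothing keeps unseen words roughly neutral.
            p_w_spam = (self.spam[w] + 1) / (self.nspam + 2)
            p_w_ham = (self.ham[w] + 1) / (self.nham + 2)
            log_odds += math.log(p_w_spam) - math.log(p_w_ham)
        return 1 / (1 + math.exp(-log_odds))
```

In use: train it on a pile of known spam and known-good comments, then hold or junk anything whose score crosses a threshold you choose.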

Hm, what about a passport-like single-sign-in backed by phone-based, human-run human tests, where signing up requires calling a 900 number? It's self-funding. It puts a price on accounts, and thus a limiting factor. And individuals are willing to pay because their sign-in will work across a whole network of blogs. The price could be high ($15, say), with a partial rebate ($13 back, with $2 to cover the cost of running the thing) for posting at least m comments which a blog owner in the network (of course, these would have to be vetted) "commends" as human-generated and insightful to the network operator.

Of course, it won't work for email, because email is the killer app; because there's no network of email receivers (everyone gets email). But for comment spam, it's a thought.

Also, it's horribly unfair to the poor, but I'm not sure that something couldn't be done about that. I'm not sure what, but something.

It's probably easier just to have the registration form impose some "cost" -- in terms of effort, or time, or complexity -- that's negligible for a human being but difficult or impossible for an automated system.

One of the most common tricks is to display as a graphic some characters that are easy for a human to read but hard for a computer to figure out without sophisticated OCR, and require that you retype those characters in the registration form. Even email confirmation would probably cut down on false registrations, since the spammer would also have to continually generate new working email addresses.
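The bookkeeping side of such a challenge, leaving aside the actual image rendering, might look like this. All function names here are hypothetical, and note that a real system would also expire each challenge and mark it used, to prevent a spammer replaying one solved pair:

```python
import hmac
import hashlib
import secrets

# Server-side key; in practice, generated once per deployment and kept private.
SECRET = secrets.token_bytes(32)

def issue_challenge():
    """Pick the text to render as a distorted image, plus a signed token.

    The token lets a stateless server verify the answer later without
    storing the challenge; the image rendering itself would need a
    graphics library and isn't shown here."""
    text = secrets.token_hex(3)  # e.g. 'a41f09'
    sig = hmac.new(SECRET, text.encode(), hashlib.sha256).hexdigest()
    return text, sig  # render `text` as an image; embed `sig` in a hidden field

def check_answer(answer, sig):
    """True iff the typed answer matches the challenge the sig was issued for."""
    expected = hmac.new(SECRET, answer.strip().lower().encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

The same pattern works for email confirmation: the "challenge" becomes a signed token in the confirmation link rather than characters in an image.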

The problem with all such schemes is that they either don't work or they hand total control of the universe of discourse to the signing authority for the authentication keys.

There are two POST events with a comment; the one that says 'I want to post a comment' and the one that says 'this is my comment, please stick it up on the weblog now'.

Enforcing a time difference commensurate with the amount of content in the post between those two events gets you something; enforcing non-simultaneity of commenting from a single IP gets you something, too. Check the email address given for existence, and, if the email address hasn't commented before, require email confirmation of the intent to post.

Cache the IP associated with the email address; if the next post with that email address isn't from that IP range, refuse it. If any of the URIs in the post aren't valid, junk the post. If any of the URIs in the post are on a blacklist, junk the post.

This is a shedload of overhead, but it doesn't require a single central authority or the idea that there's going to be a single elegant solution.
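Graydon's checks could be strung together roughly like this. Every threshold and name here is invented for illustration, and, as other comments in this thread point out, the IP-range check would misfire on dial-up and travelling posters:

```python
URL_BLACKLIST = {"example-porn-site.com"}  # hypothetical entries
MIN_SECONDS_PER_CHAR = 0.05                # assumed typing-speed floor

def accept_comment(form_shown_at, submitted_at, body, urls,
                   known_ip, posting_ip):
    """Apply the layered checks in order; reject on the first failure.

    known_ip is the IP previously cached for this email address,
    or None if the address has never commented before."""
    # 1. Time between 'show form' and 'submit' must fit the content length.
    if submitted_at - form_shown_at < len(body) * MIN_SECONDS_PER_CHAR:
        return False, "posted too fast for its length"
    # 2. A known email address must post from its cached IP range (/24 here).
    if known_ip is not None and \
            posting_ip.rsplit(".", 1)[0] != known_ip.rsplit(".", 1)[0]:
        return False, "IP range mismatch for known address"
    # 3. Any blacklisted URL kills the post outright.
    for u in urls:
        if any(bad in u for bad in URL_BLACKLIST):
            return False, "blacklisted URL"
    return True, "ok"
```

The point of the layering is exactly Graydon's: no single check is decisive, but each one adds cost, and none requires a central authority.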

I was hesitant about coming back into the comments and responding to the original 'tone' thing associated with my post, but I did want to agree with you, Teresa, that Yule is a very good friend. More than that, a brave woman who has seen me get kicked around a bit by the tech community, and hasn't liked it much. (Well, neither have I, to be honest.) A good person.

But I think that the original topic of this post, and all of our interests, should return to center front, and tone, and the lack or abundance thereof, should slide back into the obscure corner where it belongs.

Graydon, my favorite comment spammer, whose signature I tend to recognize now, actually worked around the block of delayed time between accessing the page and posting the comment. And with the recent spam attack, the spammer spoofed different IP addresses between posts to a comment thread, which overrode this hand coded block.

As for caching an IP address with an email, that didn't sound right. Not everyone has an always-on broadband connection with the same IP address. In fact, IP addressing schemes are pretty dead as a solution at this time, too easy to manipulate.

And David, yes we've talked about graphic challenge systems, as novalis also discusses. Unfortunately, these are not workable for the visually impaired.

Another thing to keep in mind is maintaining a sense of perspective about all of this, something I think novalis is hinting at. I think.

I didn't see this elsewhere in all the fooflah, but I guess LiveJournal has a challenge image (type what's in the image -- CAPTCHA) that also has an audio component for audio-only browsers used by the visually impaired. Now that is supremely cool -- right up there with Sam's forced preview (which still appeals for the 'cool down' aspect if no other tech reason). These two combined could almost be the perfect comment spam killer. I won't say it is the perfect comment spam killer -- but close.

Okay, so I put in a graphic challenge system, and I put a note on it telling people who are visually impaired to drop me an e-mail and I'll register them for the site. It's damned inconvenient, but it's better than having no comments threads at all -- which is what I'm looking at if we can't find a way to keep this garbage out.

During the last wave, I got a look at a sort-of-weblog, some kind of corporate site, that had left its comment threads open but hadn't been cleaning out the spam. They had spam messages piled six or seven deep. It read like the late-night conversations of a window full of department store mannequins.

On another note, it seems to me that if the purpose of all this is to get googlejuice for the beneficiary site, it would be awfully helpful of Google to announce a policy of not rewarding comment spamming.

I figure comment spam is like graffiti. You can make it harder to put up graffiti, but you can't make it impossible -- ask any archaeologist. So you do your best to make it harder to post, knowing as you do so that some will go up anyway.

If you leave it there, more graffiti and more elaborate graffiti will be added thereto. But if you make a point of scrubbing it off or painting over it as soon as it appears, its incidence will decrease. If a few minutes' work with some spraycans means your tag will ornament your neighborhood for years to come, there's maybe some point in doing it. But if you know it's going to be gone in a few days, the costs and risks outweigh the rewards.

We may never come up with a perfect mechanism for deflecting comment spam, but a bunch of imperfect mechanisms could help keep it under control.

The discussion has moved on a bit (I'm very dubious about using Bayesian spam tools in weblog comments, but some of these suggestions are really good), but I can report that my concerns up the page about the ease of moving a Movable Type installation from Berkeley DB to MySQL were, as far as I can tell, completely misplaced. Good news for those of you considering it.

Shelley, I'm curious as to how Spammer X got around time limits. I understand if you don't want to trumpet the technique to the world, but would you be willing to email me?

Graydon: Cache the IP associated with the email address; if the next post with that email address isn't from that IP range, refuse it. If any of the URIs in the post aren't valid, junk the post.

Both of these would have blocked legit comments from me recently. I post from office and at home; sometimes I post while on vacation too. On the second one -- well, yesterday I was posting something with a link to a site that just happened to be down for a few hours when I was posting.

I'm not convinced that having a central authentication system is necessarily a bad thing, if it's done well -- it certainly produces advantages on Livejournal, and I think it's likely partly responsible for the fact that, amongst my friends with similar blogs, the ones on Livejournal get more comments. On the other hand ... ah, now here's a neat trick: the problem, to a large extent, is that it places too much power in the hands of a single auth server. But, if we instead implement the authentication via public-key encrypted tokens, you only need the auth server once to get a token, and everything after that is out of the hands of the auth server altogether. (This makes it a bit hard to revoke tokens, though, but if they're not easy to get in duplicate, blacklisting might work again.)

Problem is, how do you keep someone from stealing tokens? Can we tell the difference -- not just in this scheme, but in general -- if someone decides to borrow names and email addresses from existing posts to post with? (With the tokens, you have to trust the blog owner's server with them, certainly, even though they wouldn't get publicly displayed.)

You could, I suppose, have some method whereby one's token is also a public key, and the comment submission contains something like the url of the blog or the IP of the submitting computer or the time of day or something encrypted such that that public key will decrypt it, so that a swiped token is essentially useless. This, however, requires extra hardware or software for each user to calculate the tokens, and this sort of infrastructure becomes quite unwieldy even in near-homeopathic doses.

So, I'm not sure if that's any closer to a solution or not. Ah, well; maybe someone will find it interesting.

On a completely different note, it sounds like from what I've heard from various sources that Google's whole pagerank algorithm is being broken (or at least badly bent) by weblogs in general, even if one only looks at legit posts and comments.... It will be interesting to see what they come up with as a response to it; my guess is that whatever it is, if it handles that problem it will also probably stop rewarding comment spamming as a sort of inherent side effect.

Ach, Shelley, I wasn't ignoring your post; it came in while I was composing my first post that followed it, and then I added a postscript without checking the accumulated thread. And I agree; a visual challenge plus optional audio, plus forced preview, could make it charmingly difficult to automate the commenting process.

Still thinking about the social surround. One of the reasons I'm willing to consider the combined use of multiple anti-spamming measures that individually will be only partially successful, or will only be effective for a limited time, is that they'll impede, and collectively may even stave off, the formation of a community of spam technicians and their clients and customers which, once created, will have a strong interest in the continuance of comment spam. It's like rioting and looting: you want to discourage it by any means possible, both to keep people who aren't currently rioting and looting from getting the idea, and to keep people who are already rioting and looting from getting good at it and starting to form networks.

Any central authentication authority is really, really vulnerable, and web-of-trust schemes -- while not technically difficult -- are more annoying than most people will tolerate and vulnerable to spoofing and fraud.

I wasn't thinking of authenticating a URI through trying to hit it, I was thinking of DNS for starters; if you try to hit the URI, you've created a DDoS tool.

Cassandra --

Of course they use Google to generate spam, just like telemarketers use the phone book.

There isn't any solution to that one, short of never having an index to anything.

This authentication-by-email scheme has promise. I envision something similar to challenge/response:

User "Bob" posts a message, including a valid e-mail address in the header. He receives an e-mail at that address, indicating he has to click a link to activate the message. When he clicks the link, the message is activated.

Next time he posts, if he gives the same name, same e-mail address and posts from a computer with the same IP address, the comment goes live immediately. If it's a different IP address, it's only a minor inconvenience: he gets another confirmation e-mail.

Left as a challenge for the student: How do we prevent spammers from automating responding to the e-mail challenges?
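One minimal way to sketch that flow, with hypothetical names and an in-memory store standing in for a real database:

```python
import secrets

pending = {}      # confirmation token -> held comment
approved = set()  # (email, ip) pairs that have confirmed before

def submit_comment(name, email, ip, body):
    """Hold the comment and return a confirmation token to mail out,
    unless this (email, ip) pair has confirmed before -- in which
    case return None and publish immediately."""
    if (email, ip) in approved:
        return None
    token = secrets.token_urlsafe(16)
    pending[token] = (name, email, ip, body)
    return token  # mail a link like https://example.com/confirm?t=<token>

def confirm(token):
    """Called when the reader clicks the emailed link: publish the
    held comment and whitelist the (email, ip) pair for next time."""
    if token not in pending:
        return False
    name, email, ip, body = pending.pop(token)
    approved.add((email, ip))
    # ...publish the comment here...
    return True
```

Note how the (email, ip) whitelist captures Bob's "same name, same address, same IP goes live immediately" rule, while a new IP merely triggers another round-trip rather than a rejection.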

Brooks Moses: On a completely different note, it sounds like from what I've heard from various sources that Google's whole pagerank algorithm is being broken (or at least badly bent) by weblogs in general, even if one only looks at legit posts and comments....

In what way, broken? If lots of bloggers link to a page, and that page's pagerank is elevated, then isn't that a case of Google working as it should? Googlebombing doesn't seem to have any significant impact on real searches, it's a novelty, a game.

I have seen some pollution of Google search engine results recently, though, where I type in a search term and the top-ranked results are simply lame search engines or indexes festooned with ads (and now I can't think of any examples).

What I was going to say, before I was distracted by the opportunity to make a cheap joke, is that another problem with the challenge/response system I described is that e-mail is pretty unreliable. Challenges would get lost.

However, a challenge-response system would work if you simply left unapproved messages in a staging area, rather than deleting them outright. At intervals, the blogger could come by and review the messages in the staging area, releasing any legitimate messages for publication and approving the authors of those messages for future posts.

Stacking a batch of weak anti-spam defenses won't stop professional criminals, but it will stop casual vandals. The professional criminals have nothing better to do with their time than automate ways to navigate around the weak defenses.

Better to have two very separate layers of defense: One really good tool to block them (like registration systems which require some sort of human interactivity to navigate) and a really good tool to get rid of the vandalism fast when it gets in anyway. It sounds right now like MT has neither, which is a shame. (I doubt there are any off-the-shelf journalling tools which are any better, though.)

As MT bloggers go, I'm very small fry, but there are a few subjects where I seem to get sorted to the top on Google (not to mention the one comment thread that is linked to on a thousand popular image pages), and I've had precisely zero comment spams.

Why not? Because I set up MT with mod_perl, so my comment submission script isn't named mt-comments.cgi. I know some people have had problems changing an existing blog to use a different URL, but this common element is what made MT such an easy target.

Any CGI script name that gives 1.8 million hits on Google (even now, months after the first big attacks) is too good to pass up, if you're evil and lazy.

Or, evil and smart -- which is often a good thing. "What are all good Sysadmins? Lazy."

T - I must pass on the marmalade, though it sounds like the perfect topping to a pre-century carbloading.

Changing the CGI name is security through obscurity -- but so is parking your beater bike near, but not next to, the guy who's parked a $1200 bike and locked it with a cheap lock. Not that I'd ever do that.

Mitch, The Register--or maybe it's just someone at The Register--has been on a bit of a tear about weblogs distorting the Googleranking system. You can see examples here, here, and here. Here's Wired on the same subject.

Basically, Googleranking-type systems assumed a model of links from fixed sites. Then along came weblogs, which throw off links in all directions as part of the conversation. It gives us a lot of googlejuice. Type "electrolite" into Google, and four of the first ten results will be Patrick, which means he's beating out a popular REM song.

Huh. I just found out where you got the title of your weblog. That's weird.

The Register makes the whole googlewarping thing sound mildly apocalyptic, but I'm not sure that's justified. Nor do I think all weblog links and posts are trivial or ephemeral. For instance, that piece I did a while back on animal hoarding gets referenced by animal protection fixed sites.

It's curious that two other sites I often visit--Pandagon and Brad De Long--have also been attacked through their comments in the last couple of days. Is there some connection among these attacks, or is this just the internet spamming season?

I've been involved in an online discussion group focused on a fairly obscure '60s group (Quicksilver Messenger Service) and its guitarist John Cipollina -- mostly the modern equivalent of concert tape exchanges, but sometimes interesting dialog -- but I've had to stop getting their daily digests because of all the porn spam posts. They had a brief exchange of emails about the problem, but nothing happened to fix it (not a very techie bunch, I guess). That group belongs to AOL, and my question is: Why don't the big online companies try to develop their own fixes? Do they want to see their discussion groups melt down? Last summer, AOL's "solution" to spam was apparently to block everything from my own server, Juno, which caused me no end of trouble when Locus was using AOL as a temporary mail drop. Can't they do better than that? (I know the US govt. certainly hasn't, with its useless "anti-spam" legislation. I still get all the Viagra ads.)

Pardon my techno-ignorant bewilderment, but this seems like a problem for more than bloggers.

I knew without even looking that all three of Teresa's Register links were to stories by Andrew Orlowski, a journalist who seems to have a massive hate-on toward blogs and their eeeeeeeeeevil effect on Google search results. I can't pretend to understand the issue in depth, and FOR ALL I KNOW ORLOWSKI IS ON TO SOMETHING (phrase emphasized so we can easily look back to it when his nearest and dearest turn up ten minutes from now to yell at me for this), but my impression (note that word again) has been that he's a little quick to dismiss the entire blogosphere as containing nothing but froth and "noise" that interfere with the Serious Work of search-enginery.

It occurred to my spouse and me that you could just html-comment out the "post" button on the initial comments screen if you wanted to force a human user to preview (and I think I will do that), but if I understand the issue, that would have no effect on a bot going after the comments.cgi script. If that's correct, no amount of force-preview is going to work if the bot can sniff out the names of your cgi scripts, because one of them is going to have to post to the blog.

I think I will turn off comments on posts 30 (15?) days old, and maybe rename comments.cgi; if enough MT users do this, and use different alternative names for comments.cgi, the spammers' numbers game becomes significantly less rewarding.

I don't want to use any kind of log-in or email verification if I can avoid it, because I like casual comments. What I would like, though, is a CAPTCHA system which puts the text-graphic next to the comments box, so a poster only has to type a few extra keystrokes before hitting "post". That would not be too much trouble even for a lazy bastard like me. Does anyone know how I might go about acquiring such a thing?

Rea, if you follow the links from the words "see previous round" in my original post, you'll get a good picture of the last round. These things come in waves. This latest one seems to have targeted fewer weblogs than the last one, though there've been more hits per weblog.

Faren, a lot of us wonder the same thing. How often do you see legitimate businesses advertising via spam?

The idea of a "fixed site" seems distinctly odd to me. I mean, I know what TNH means by it, but I view all web pages as being permanently under construction. A blog merely changes a bit faster/more often than a "fixed" site -- and only the front page of the blog, at that.

"What I would like, though, is a CAPTCHA system which puts the text-graphic next to the comments box, so a poster only has to type a few extra keystrokes before hitting "post". That would not be too much trouble even for a lazy bastard like me. Does anyone know how I might go about acquiring such a thing?"

James Seng has written a script here:

http://james.seng.cc/archives/000145.html

I'm testing it now, but having a few errors kick out. Others have implemented it to great success.

sennoma - One simple way to force a preview is have the post-editing text include a hidden entry element, which is initially empty -- but, if you get to that window via previewing a message, that hidden element contains a token. Further, the "post" script acts just like the "preview" script unless it gets a valid token that hasn't been previously used.

TNH - I'll check out the links you sent -- thanks for forwarding them, but my impression prior to reading the articles is that blogs don't hurt Google, blogs help Google by pointing to interesting sites.

I know that Patrick's blog is the first link you get when you type "electrolite" into Google -- and yours is the first link on "Making Light." I know this because I frequently use this method for accessing your weblogs, because my fingers only recently memorized the full URLs.

So if Patrick's weblog comes up in Google before a reference to the song, well, that's an example of Google working. For me, at least, since I am not interested in the song -- didn't even know there WAS a song until just now. And if more people are interested in Patrick's weblog than the song, then Google is working right for those people, too.

Much of the Google-is-broken-the-sky-is-falling criticism I've seen falls into two camps: (1) Google is broken because it ranks OTHER PEOPLE'S SITES higher than MINE! and (2) Google is broken because it fails to conform to some arbitrary mental model I have of how a search engine SHOULD work.

I am reminded as I type this of some of the criticisms of publishing that I've heard you describe: (1) Publishing is broken because they didn't publish MY book, but they DO publish that crap by . (2) Publishing is broken because the business model is excessively byzantine.

Anyway, as long as people continue to be able to find what they're looking for using Google, I wouldn't say it's broken.

I did read an article a few months ago giving some real concerns about Google. Many of them seem legitimate. Unfortunately, I only remember one: if you type in the name of a product looking for evaluations and reviews and information about that product, you won't find those -- instead, you'll find links to places to buy the product.

"One simple way to force a preview is have the post-editing text include a hidden entry element, which is initially empty--but, if you get to that window via previewing a message, that hidden element contains a token. Further, the "post" script acts just like the "preview" script unless it gets a valid token that hasn't been previously used."

I don't want to be dumping on Brooks Moses, and I am very grateful for any and all free technical advice, but this is a perfect example of technical discussion that hovers just slightly beyond the level that would be useful for, say, me.

I recognize that Brooks is carefully generalizing with terms like "token" and "entry element," so that his advice is useful for users of more than merely Movable Type, but no matter how hard I stretch my tiny brain I can't quite deduce from it what I should do to, specifically, my MT comment templates. I suspect that at least half the weblog maintainers who are reading this thread are with me on this.

I don't need technical language dumbed quite down to the "click on the bunny" level, but.

E-mail addresses and IP addresses simply aren't a bottleneck. Spammers have cracked thousands of machines, and can have as many email addresses as they like (SPF is likely to improve the latter somewhat, but will probably not make a big dent until 2005).

Time delays aren't a bottleneck either -- there are enough blogs (1.8 million hits for mt-comments.cgi) that zombies can just wait it out *while posting to another hundred blogs*. Plus it penalizes people who type fast. And it only takes a short post to get googlejuice.

I agree with Brooks Moses's comments about LiveJournal.

And I agree with Graydon that there's an authority centralization problem with these systems. I actually can see a way around it -- sell boxes which *are* registration and authentication servers, charge a fee to enter the network, and do DNS-style lookup of users. Kick registrars out for tolerating spamming. The problem with this, which is unsolvable as far as I can see, is that spammers will try to kill the registration servers by overloading them with fake credit cards. High (more than a few percent) rates of chargebacks get merchant accounts cancelled by credit card companies. I don't know if the same is true of 900 numbers, but AT&T got out of the 900 number business because of the high rate of chargebacks (or so said the rep I just spoke to).

OK, here's my next theory: LJ used to require you to have a friend who was a member to give you a code. You couldn't become a LJ member unless you knew one (or paid, but that bit's not important). Build a distributed version of that (basically identical technically to PGP's web of trust, but with different social semantics of key signing). Bottleneck: personal connections to other humans.

The difference between building an anti-spam system and building most other systems is that spammers are willing to destroy the whole system if they don't get their way.

Brooks Moses, PNH: if I understood Brooks' comment, you need to do more than tweak templates to implement that version of force-preview. You need the preview script to write a token (random string?) into the comment text, and the post script to search for a token and post iff it finds one. I'm not quite sure about "hidden" -- "present and viewable in source code but not displayed by browser"? I'm not sure why it needs to be hidden, since the post script could strip it out before posting. But then, perhaps I don't geddit after all.

sennoma, the idea is to use a "hidden field", which is an HTML way of having a field the cgi script can use to in effect pass a value back to itself. To the cgi script, it's just like any other form field, except browsers don't display it to the user.

Currently, the comment page has 5 visible fields: Name, Email Address, URL, Comments, and Remember info. All of these are settable by the user. Mozilla's "View Page Info" command shows that it also has two hidden fields, named "static" and "entry_id", whose purpose I don't know. Brooks is suggesting adding another hidden field, which is filled with a random value. The script would keep track of the random values it has generated recently, and would require the presence of such a recently-issued value in order to post a comment.

The trouble with this is that all the spammer's script has to do to get past it is retrieve the comment preview page, extract the hidden field, and use it to post an item of spam. It does roughly double the CPU and bandwidth requirements to post a given amount of spam, but I doubt that's enough to get the spammers to stop.

Patrick - Apologies for that bit of conversation being a bit beyond the useful level, but ... well, the easiest way to explain it is that there's a reason I want a job as a paid theoretician when I graduate. In that particular case, I have no idea whether the suggestion can be implemented in an MT comment template at all -- so it wasn't a failure of the phrasing that made it useless, so much as that the idea itself didn't contain enough to make it implementable. Mostly I posted it in hopes that it would spur someone else to do all the hard work of figuring out how to implement it and whether the implementation actually works.

On the other hand, I'm now back from lunch, and have thought about it slightly more in the interim, so here's a possible implementation (perhaps overly dumbed down, but some of that is cover for the fact that I can explain what something does better than I know the correct jargon for it)....

All this stuff is handled by HTML forms, which work by listing out a bunch of "inputs" (text inputs, checkboxes, etc.) that get their values returned to the server when the "post" button is clicked. Aside from the visible inputs, there's also a "hidden" input type, which simply returns the default value listed in the HTML code.

So, one adds an extra hidden input to the form, which we could call "PostToken". In all of the HTML forms that are on pages other than a "preview" page, this input has a "blank" value. However, on a "preview" page, the input is given a "valid token" value. ("Blank" and "valid token" will be defined later.)

Now, your message-post script receives the value of PostToken (along with all the text of the message being posted and so forth) when the form is submitted, and it checks to see if it is or isn't a "valid token". If it is one, it posts the message; else, it returns a preview page (which, of course, includes a "valid token", so that if one clicks "post" from there, the message posts).

The definitions of "blank" and "valid token" can be a wide range of things. Simplest is something like defining "blank" as 0 and "valid token" as 1. It's relatively simple to write a script to fake that sort of "valid token", but most spammers are unlikely to bother. You can make things more complicated by making the "tokens" unique semi-random strings, keeping a list of the ones that have been given out, and only considering a string to be a "valid token" if it's on that list and hasn't been used to post a message previously. The simple method is likely sufficient.
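A minimal sketch of the fancier token variant, in Python (the names issue_token and handle_submission are mine, not anything in MT, and a real version would need to store tokens somewhere more durable than memory):

```python
import secrets

# In-memory store of tokens handed out on preview pages.
_issued_tokens = set()

def issue_token():
    """Generate a fresh token to embed in the preview page's hidden field."""
    token = secrets.token_hex(16)
    _issued_tokens.add(token)
    return token

def handle_submission(post_token, comment_text):
    """Post the comment if the token is valid; otherwise return a preview page."""
    if post_token in _issued_tokens:
        _issued_tokens.discard(post_token)  # single use: a token can't be replayed
        return ("posted", comment_text)
    # Blank or invalid token: show a preview page carrying a fresh valid token.
    return ("preview", issue_token())
```

A script that only POSTs the form gets bounced to preview every time; only a submission carrying a token previously handed out on a preview page actually posts.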

For sennoma's question of "why hidden?" -- if it's hidden, the users never have to worry about it or even see it -- whereas, if it's in the comment text field, they have to see it and be told not to delete it.

Probably this would require more than just a tweak to templates, but an actual rewrite of the core software.

Reimer Behrends: You commented about forcing all URIs through a redirector and lamented that "The downside is of course that URIs become longer, less readable and require an additional server access (unless you augment the A element with Javascript)."

You might consider an approach like that used at tinyurl.com (which I've seen implemented one or two other places, too). You still have the extra server access, and unreadable URLs, but at least they're short.
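For what it's worth, the core of a tinyurl-style redirector is tiny; here's a sketch in Python (the /r/ path prefix and the function names are invented for illustration):

```python
import itertools
import string

ALPHABET = string.ascii_lowercase + string.digits  # base-36 digits for short keys

_counter = itertools.count(1)
_short_to_long = {}

def _encode(n):
    """Base-36 encode a counter value into a short key."""
    digits = []
    while n:
        n, r = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[r])
    return "".join(reversed(digits)) or ALPHABET[0]

def shorten(url):
    """Store the long URL and hand back a short local redirector path."""
    key = _encode(next(_counter))
    _short_to_long[key] = url
    return "/r/" + key

def resolve(path):
    """Look up the long URL for a redirector path; None if unknown."""
    return _short_to_long.get(path.rsplit("/", 1)[-1])
```

The redirector script would call resolve() and issue an HTTP redirect; the blog page only ever shows the short local path.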

The renaming of the comments CGI script is a good short-term strategy; however, it will be easily circumvented by script kiddies who scan the page source for CGI scripts and hit those. Still, it's perfectly valid as a short-term solution.

A longer-term strategy is to let search bots see the archives only without the comments. The practice of spamming comment fields to gain Google rank then becomes a useless exercise.

Ralf, IMHO, changing the name of the CGI script not only raises the bar a small but measurable amount for the spammers, it also makes it harder for them to discover that you're a comment-accepting blog in the first place. The latter doesn't help the well-known blog sites as much, since those can be discovered in other ways, but it's a very effective defense for the small fry.

My other idea, which I implemented as an MT mini-plugin, was to break all links by default in a way that humans can reverse but Google doesn't. So, "pornospam.com" becomes the zero-value "pornospam-DOT-com", unless the blog author (or trusted commenter) manually adds an approval token to the field.
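Something along the lines of that mini-plugin could look like this in Python (a sketch only; the domain pattern is deliberately crude and the function names are mine):

```python
import re

# Matches bare host names like "pornospam.com" (a crude sketch, not a full URL parser).
_DOMAIN = re.compile(r'\b([a-z0-9-]+)\.(com|net|org|info|biz)\b', re.IGNORECASE)

def break_links(text):
    """Turn 'pornospam.com' into 'pornospam-DOT-com' so it no longer works as a link."""
    return _DOMAIN.sub(r'\1-DOT-\2', text)

def approve_links(text):
    """Reverse the breaking for comments the blog author has marked as trusted."""
    return text.replace('-DOT-', '.')
```

A human reader can reassemble the address in their head; Google's crawler sees no URL worth counting.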

Many journalists have problems with blogs because they're sort of like journalism, but not really. These journalists believe that it's dangerous to publish information without first going through the standard journalistic process of editing, fact-checking, looking for corroboration, etc.

I'm inside the sausage factory myself, and I don't agree with the devaluation of blogs -- indeed, sometimes when I hear a journalist denigrating blogs because they fail to uphold the high standards of journalism, I want to respond: what high standards are those, Bunkie?

The high standards of professional journalism brought us the stories of Wen Ho Lee, how Al Gore claimed to invent the Internet, Fox News, embedded journalists, and TV network news interviews with the latest Survivor castoff.

Other journalists denigrate blogs because blogs are often of limited interest. They often don't get much traffic, and often the items posted are of interest only to the blogger himself and a few family and friends. To which I respond: Yeah, so, what's wrong with that? If the blogger is happy, and his family and friends are happy, why should anyone else care? It's not like the blogger is using up precious and scarce Internet resources which could better be used for-- what? bigger, blander corporate web sites?

Me, I'm as troubled as the next left-leaning latte-drinking intellectual by the excess of consumer culture in America -- and therefore I am delighted that blogging has allowed so many people all over the world to create, rather than consume. So a lot of it is bad. Who cares? Who's hurt?

Finally, journalists will often publish the most exaggerated claims about a new technology, and then declare the technology useless when it fails to live up to those claims. Thus, in 2002 the journalist would find some yob who claims that everyone in the world will soon be blogging and blogging will replace e-mail -- and now the same journalist will write a story about how blogging is a failure because neither of those things has come close to happening.

And now I've read the articles Teresa linked to. Or, rather, I read a couple of them and skimmed a couple of the others.

It's an interesting notion: the idea that bloggers are poisoning search results. I'd like to see some evidence, though. I think there's only one hard example in all four of those articles -- the phrase "second superpower" turning up an online essay rather than the New York Times article where the phrase actually originated -- and the author of the article never (1) demonstrates that this page-ranking is due to links from A-list bloggers or (2) shows why this page-ranking is a bad thing.

Google keeps its page-ranking algorithm a secret. We know that they generate rank based on the number of inbound links to a particular page; we know that Google ranks links from important pages as being more significant than links from less-important pages. Beyond that, we don't know much. In particular, we don't know how Google determines that links from some pages are more important than links from others.

There isn't even a consensus among Google-watchers on whether keywords matter in meta tags.

I can't take the "blogs poisoning Google" claims seriously, mostly because I've gotten used to them giving top rank to phony pages dynamically generated for the sole purpose of directing traffic to shady retailers and "pay sites of dubious merit".

If I use any of the major search engines to find a user review of a product, I'd love to find a blog entry; I could use the rest of the blog as a sanity check. Instead, I usually get three things: Amazon, hole-in-the-wall dealers who parrot press releases, and traffic-directors with no real content at all.

I suspect blog sites are easily found through the blog indexes anyway, rather than by googling for them. If you're going to put in the time to make blog-comment spambots, you're going to use the best resources to find targets. Then scanning every blog's page source for CGI script links and traversing those to find comment-like forms is really quite trivial.

The URL breaking is a good idea though, particularly if combined with a very easy-to-use click mechanism. For example, take the URL, turn it into the phrase you mentioned, and follow it with a button that opens the link. Google doesn't index a URL submitted from a form, so linking is now safe.
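A sketch of that form-button rendering, in Python (the function name and the exact markup are illustrative, not anything from MT):

```python
from html import escape

def render_safe_link(url):
    """Render an untrusted URL as broken text plus a GET form with an 'open' button.

    There is no <a href> anywhere, so a crawler following anchors finds
    nothing to count; a human just clicks the button.
    """
    broken = escape(url).replace(".", "-DOT-")
    return (
        '<span>%s</span>\n'
        '<form method="get" action="%s">'
        '<input type="submit" value="open link"></form>'
        % (broken, escape(url, quote=True))
    )
```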

Many journalists have problems with blogs because they're sort of like journalism, but not really. These journalists believe that it's dangerous to publish information without first going through the standard journalistic process of editing, fact-checking, looking for corroboration, etc.

You're kidding, right?

I mean, you already cover some really good reasons why this is just plain silly, but since when was "blog" supposed to mean "news source" anyhow? Even if I do get pointers to news through blogs sometimes, my personal understanding of blogs is that they're one person's opinion (or, occasionally, several people's opinions) and interpretation of things, not The Absolute Truth, and with a few exceptions here and there, that seems to be how they're presented, too. So just where does this come from?

Sometimes I wonder just what kind of crack people are doing out there...

First, the hidden form field: been there, done that. Spammers started scraping the HTML page, grabbing the field, and then using it in their posts. Time to break through the idea? One day.

As for 'forced preview', there is no code involved. How to implement? Well, look below this comment form. See the POST button? Remove it. Then what happens is that the comment must go to preview, and from preview to post. That's what Sam Ruby is doing.

Sam is also looking at other conditions (see http://www.intertwingly.net/blog/1681.html) but even something as simple as removing the POST button disrupts the simple posting of a form -- it forces a response/challenge (post form, to preview, then to post).

As for MT 2.66, the IP throttling change they made would not have stopped the recent comment blast, since it came from so many different IP addresses, but it would stop the script kiddies, so it's a start. It won't keep the smart ones out, but it will at least keep out the annoying dumb ones.
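For the curious, per-IP throttling generally looks something like this sliding-window sketch in Python (the window and ceiling numbers here are made up; I don't know what limits MT 2.66 actually uses):

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 600   # look at the last ten minutes (hypothetical)
MAX_COMMENTS = 8       # hypothetical per-IP ceiling within the window

_recent = defaultdict(deque)  # IP -> timestamps of recent comments

def allow_comment(ip, now=None):
    """True if this IP hasn't exceeded the posting rate; False to reject."""
    now = time.time() if now is None else now
    q = _recent[ip]
    # Drop timestamps that have aged out of the window.
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    if len(q) >= MAX_COMMENTS:
        return False
    q.append(now)
    return True
```

As the comment above notes, this does nothing against an attack spread across 21 IP addresses, each of which stays under the ceiling.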

And I will not support the redirect functionality they've included, and I sure as heck hope they give us the ability to turn it on or off. It's a perfect demonstration, to me, of the difference between the spammers' use of social software techniques and the 'legitimate' developers' use of clever technology.

PS Steve, nothing secret about the approach of the timed access -- they requested the post-entry page, and then used a timing function to submit the comment within 1 to 3 minutes after the initial page request.

I currently preview only when I've used tags. This means typos get into my comments when I don't use tags. I keep meaning to preview every time, but...add that to the list of things I keep meaning to do (hint: it would take Letterman WEEKS to list them).

Forced preview? Good thing. Never mind spam. Good thing on its own merits.

I'll install the MT 2.66 upgrade when I have a few minutes (I'm at work), but for now I'll note that I don't quite understand the "redirect functionality." The throttling sounds like a better-than-nothing feature, though.

I think J Greely and Ralf are close to something important with the "link breaking" idea.

What we need is a way of annotating a link to tell search engines that the owner of the page doesn't "endorse" the target of the link. Something like <a href="http://www.microsoft.com" noindex=true> where the "noindex=true" means "Google and the like, this link is not approved of by the owner of this page, so don't count it in your influence graph".

Then in the blogging software, there'd have to be a way for the blogger to specify that certain commenters or specific comments are "trusted", with a default of "untrusted". When the blog software lays out a comments page, any "untrusted" links would get rendered with the "noindex=true" attribute, telling page ranking algorithms not to assume that the owner of the page endorses the link. You might also want to be able to configure the blogging software to distinguish trusted from untrusted links, by color or font maybe.

If search engines such as Google were to respect "noindex=true", and blog software were to start annotating links that way, then comment spammers would gain very little benefit, and would have to go back to mugging their own grandmothers.

There's a bit of a chicken-and-egg problem, but blog software could implement this today. Browsers are supposed to ignore tags or attributes they don't recognize, so "noindex=true" would have no effect on browsers. Once there were pages out there using "noindex=true" links, Google would be more likely to make use of it.
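Here's roughly what the blog software's side of that would look like, sketched in Python (noindex="true" is the hypothetical attribute proposed above, not anything search engines currently honor, and the function name is mine):

```python
from html import escape

def render_comment_link(url, text, trusted=False):
    """Render a commenter's link; untrusted links get the proposed noindex marker.

    Browsers ignore attributes they don't recognize, so the link still
    works normally for human readers either way.
    """
    attr = '' if trusted else ' noindex="true"'
    return '<a href="%s"%s>%s</a>' % (escape(url, quote=True), attr, escape(text))
```

The blogger's trusted/untrusted flag for each commenter would feed the trusted parameter when the comments page is built.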

Link breaking doesn't have to mean that you can't click to follow the link, just that search engines can't/don't index on them, as shown in my example earlier.

On the mailing lists I administer, my policy is to moderate new participants until they prove themselves 'friendly'. That obviously works well when there's an ongoing login system; it's not so good for repeated walk-in traffic.

What I'd really like to see is a Bayesian filter that marks posts as 'not likely to need moderation' vs. 'probably needs to be moderated', and then publishes or holds them accordingly.
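For the record, the guts of such a Bayesian filter fit in a few lines; here's a toy sketch in Python (the training data and zero threshold are made up, and a real filter would want persistent word counts and smarter features):

```python
import math
import re
from collections import Counter

# Word counts accumulated from hand-labeled example comments.
spam_counts = Counter()
ham_counts = Counter()

def _words(text):
    return re.findall(r'[a-z]+', text.lower())

def train(text, is_spam):
    """Feed a labeled comment into the word-count model."""
    (spam_counts if is_spam else ham_counts).update(_words(text))

def needs_moderation(text, threshold=0.0):
    """Positive score leans spam -> hold for moderation; negative -> publish.

    Uses add-one smoothing so unseen words don't blow up the log-ratio.
    """
    spam_total = sum(spam_counts.values()) + 1
    ham_total = sum(ham_counts.values()) + 1
    score = 0.0
    for w in _words(text):
        p_spam = (spam_counts[w] + 1) / spam_total
        p_ham = (ham_counts[w] + 1) / ham_total
        score += math.log(p_spam / p_ham)
    return score > threshold
```

The threshold could be set conservatively so borderline comments go to the moderation queue rather than straight to the trash.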