When I saw the subject header, my first thought was that it might be useful if a few experienced old-timers posted their current lists. User-Agents come and User-Agents go. But like the man said, the Ukrainians you will always have with you. And if you remove someone from the list on the grounds that you haven't set eyes on them in several years, you can bet your britches they will show up again next week.

An equally important question is how you're blocking them. For simple text matches, my preference is for mod_setenvif in conjunction with mod_authz-whatever:
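A minimal sketch of that pairing ("Hypothetical-Bot" is a placeholder; substitute your own patterns). The mod_authz half is mod_authz_host on 2.2 and mod_authz_core on 2.4:

    SetEnvIfNoCase User-Agent "Hypothetical-Bot" bad_bot

    # Apache 2.2:
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot

    # Apache 2.4 equivalent (use instead of the three lines above):
    # <RequireAll>
    #     Require all granted
    #     Require not env bad_bot
    # </RequireAll>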

That's assuming you blacklist: allow everyone except the folks you explicitly lock out. A few brave souls use whitelisting instead: lock out everyone except some specifically authorized user-agents. It depends on the purpose of your site and the nature of your target audience.
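For completeness, a whitelisting sketch (2.2 syntax; the pattern is only an assumption about what your legitimate visitors send, so check your own logs before deploying anything like it):

    # Refuse everyone whose UA doesn't match the approved pattern
    SetEnvIfNoCase User-Agent "(Mozilla|Opera)" good_ua
    Order Deny,Allow
    Deny from all
    Allow from env=good_ua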

Urk. Is this happening in htaccess or the config file? I think you said htaccess. If so, the rule will run much more efficiently if you replace all the pipes with separate RewriteConds ending in [OR] (all except the last, which drops the [OR]). (Put the list in a text editor and you can do this bit with a single global replace.)
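Roughly this transformation, with invented bot names standing in for the real list:

    # Before: one fat alternation
    #   RewriteCond %{HTTP_USER_AGENT} (BadBot|EvilScraper|SpamHarvester) [NC]
    # After: one condition apiece, [OR] on all but the last
    RewriteCond %{HTTP_USER_AGENT} BadBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} EvilScraper [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} SpamHarvester [NC]
    RewriteRule . - [F]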

You've now got a rock-and-a-hard-place dilemma, because (a) long lists are vastly easier to keep organized if you maintain them in alphabetical order, but (b) when there's a long string of RewriteConds, you should list them in order of most-likely-to-succeed, so the [OR] chain can short-circuit early. (Or most-likely-to-fail, in the case of an ordinary AND-delimited list, which stops at the first miss.)
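So if your logs show one offender far more often than the rest, it goes to the top and most requests never reach the later conditions (names again invented):

    RewriteCond %{HTTP_USER_AGENT} EverydayPest [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} OccasionalPest [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} RarePest [NC]
    RewriteRule . - [F]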

Short of reading through tons of server logs, I'm not sure what the best way of obtaining an up-to-date list of bad user-agents would be, or even whether this is the best way to tackle spammers and scrapers these days. There's nothing useful in my host's AWStats other than a few host names/IPs. Am I better off blocking by IP?

Is there a useful list of known offenders anywhere? ... a good cut-and-paste htaccess code block anyone can recommend? Do I really have to re-invent the wheel and make my own list?

User-Agent blocking is one component of a good arsenal. Some UAs will never be up to any good, so it is easiest to block them globally by name. You can't do much about intelligent robots that pretend to be someone else -- unless it's something obvious like a mismatch between IP and UA. But fortunately most robots are quite stupid; it never occurs to them to claim to be something other than libwww-perl or what have you.

311 seems excessive, though. You can almost certainly make the list a lot shorter simply by merging similar UAs and by matching against shorter pieces of the name.
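For instance, three harvester variants sharing a fragment (names here hypothetical, though the pattern is typical) collapse into one condition. Just mind that very short fragments can overmatch and catch a UA you wanted to keep:

    # Instead of:
    #   RewriteCond %{HTTP_USER_AGENT} EmailCollector [NC,OR]
    #   RewriteCond %{HTTP_USER_AGENT} EmailSiphon [NC,OR]
    #   RewriteCond %{HTTP_USER_AGENT} EmailWolf [NC]
    # one shorter match catches all three:
    RewriteCond %{HTTP_USER_AGENT} Email [NC]
    RewriteRule . - [F]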

:: shuffling papers ::

My current list, using BrowserMatch or BrowserMatchNoCase, is fewer than 25 items. The IP list is vastly longer. I've always worked on the assumption that a straight "Deny from aa.bb.cc.dd" places less strain on the server than any other approach you can use.
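A skeleton of that arrangement (2.2 syntax; libwww-perl is real, the second UA fragment is invented, and the two addresses come from the reserved documentation ranges):

    BrowserMatchNoCase libwww-perl bad_bot
    BrowserMatchNoCase Hypothetical-Agent bad_bot
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    # straight IP denials sit alongside:
    Deny from 192.0.2.1
    Deny from 198.51.100.0/24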

Thanks, lucy24. I now have a few up-to-date sources, so I'll compile a list and get it installed today. I was using mod_rewrite, but I like the Apache alternatives that read more like English, so I will give BrowserMatch/Deny a bash this time.