Monday, December 10, 2007

After spending a few days trying to come up with a more comprehensive method of identifying known bad IPs using the existing block lists, I've found the whole exercise quite maddening.

Spamhaus has its collection criteria, which produces one set of BL results; Project Honey Pot has its own methods and yet another set of results; and so on and so forth. Then I have my methods, which trap IPs that may intersect those BLs but quite often cough up brand-new spammer and scraper IPs not showing up in the other BLs at all. Collectively all of these BLs, including my own, are quite comprehensive, but unfortunately there's no easy way to combine them all in a real-time manner that makes sense.

Sadly, the current state of affairs is that there are just too many independent services to use, which makes the process overwhelming for the average webmaster, who probably just picks one out of frustration and lets things slip through the cracks. Picking block list A over block list B might be the difference between your server getting hacked or not, simply because one list knew about the malicious botnet IP and the other list didn't.

Funny, if this were anti-virus software people wouldn't just pick any old thing; they would want comprehensive coverage. So why can't we get comprehensive coverage in block lists?

What is desperately needed is some mechanism to pool all the results together into one common service, a Block List Babelfish, where a single query returns the combined collective intelligence on whether an IP is good or bad, so that everyone can easily benefit.

I know how those services do it, and they're mostly focused on email and email scrapers, but I don't have the resources to set up the large anti-spam, anti-phish, and anti-malware email traps that they use, or the big honeypots.

I do something that's totally focused on scrapers, so we have some overlap in the crawlers I block, but I'd just as soon block all the bad guys.

FYI, I don't run any open source PHP software but the idiots running botnets don't seem to get that.

A full BL download might do the job if you could get the list in the first place, but most sites providing BLs now only offer a public DNSBL you can query in real time; only enterprise accounts can download the actual list, for a large sum of money.
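For context, a real-time DNSBL query is just a DNS lookup: you reverse the octets of the IP, append the list's zone, and resolve it as a hostname. A listed IP resolves (typically to an address in 127.0.0.x); an unlisted one returns NXDOMAIN. A minimal sketch in Python — the zone shown is Spamhaus's real public zone, but the example IP is an arbitrary documentation address, not a known-bad host:

```python
import socket

def dnsbl_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name: reverse the octets, append the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone

def is_listed(ip: str, zone: str) -> bool:
    """True if the DNSBL returns an A record for the reversed IP."""
    try:
        socket.gethostbyname(dnsbl_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN (not listed) or the lookup failed entirely
        return False

# Example: check a documentation IP against Spamhaus's public zone.
# is_listed("192.0.2.1", "zen.spamhaus.org")
```

So "1.2.3.4" checked against `zen.spamhaus.org` becomes a lookup of `4.3.2.1.zen.spamhaus.org` — one UDP round trip per list, which is exactly why querying dozens of lists per visitor adds up.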

With hundreds of DNSBL servers and paid BLs out there, it's simply not practical for any single webmaster to manage them all unless you work for a really big company with a serious budget for this sort of thing.

What I'm looking for is a single aggregator that actually does all this work and compiles all the BLs together, so someone can get a pretty definitive real-time YES or NO answer on whether an IP is badly behaved, without jumping through hoops.
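Mechanically, such an aggregator is not much more than fanning one query out to several DNSBL zones in parallel and merging the verdicts. A rough sketch under that assumption — the zone list and threshold here are placeholders, not any real service's configuration:

```python
import socket
from concurrent.futures import ThreadPoolExecutor

# Hypothetical zones an aggregator might consult (placeholders only).
ZONES = ["zen.spamhaus.org", "dnsbl.example.net", "bl.example.org"]

def listed_in(ip: str, zone: str) -> bool:
    """Standard DNSBL check: reversed octets under the zone resolve if listed."""
    name = ".".join(reversed(ip.split("."))) + "." + zone
    try:
        socket.gethostbyname(name)
        return True
    except socket.gaierror:
        return False

def verdict(ip: str, zones=ZONES, threshold=1) -> bool:
    """Combined YES/NO: bad if at least `threshold` lists know the IP."""
    with ThreadPoolExecutor(max_workers=len(zones)) as pool:
        hits = sum(pool.map(lambda z: listed_in(ip, z), zones))
    return hits >= threshold
```

Raising `threshold` trades false positives for false negatives; the hard part isn't this plumbing, it's licensing the lists and taking the query load, which is exactly the business opportunity described below.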

Whoever did this could probably make some serious coin as the single outsourced provider of a cumulative service that could be easily tapped into.

Bill; I would be happy to share my blocklists with you. Three of them are already online at www.wizcrafts.net (the Nigerian, Chinese, and Russian/Exploited-servers blocklists). I have other unpublished IPs and CIDRs that I've captured from my raw access logs for spam or exploit attempts. I also receive input from other webmasters who fend off scammers.

Not exactly a good idea to put all the port 80 blocks in a firewall, as that much data in Apache's blocking mechanism or in a normal firewall could slow the server down.

My bot-blocking code is, by design, much faster than most firewalls at handling the sheer volume of data involved.

The second point is that many things I block on port 80 would cause trouble if the same list were also used to block the SMTP/POP ports. Just because I block all data centers from accessing my website doesn't mean I want to block them from sending email; otherwise I couldn't get order notifications, newsletters, etc.

"Wizcrafts, if those IPs are permanently banned from your server, think about loading them into a firewall. Using a firewall will free up Apache resources."

This does no good when you are on shared hosting, which I am. Our only recourse is directives placed in our own Apache .htaccess files. That's why the titles of all of my blocklists include: "from your Apache server, with this .htaccess file."
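For anyone unfamiliar with the approach, the shape of such an .htaccess block on shared hosting is simple; the entries below are documentation-reserved example addresses, not entries from Wizcrafts' actual lists:

```apache
# Apache 1.3/2.2-style deny rules; order matters.
order allow,deny
allow from all
# Example entries only -- substitute the IPs/CIDRs from the blocklist:
deny from 192.0.2.15
deny from 198.51.100.0/24
```

Apache evaluates these per-request in userland, which is why very long lists carry the performance cost mentioned above, but on shared hosting it's the only layer a customer controls.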

I should mention that I have requested and received a firewall block when my website was under attack from a Chinese server. That only got done because the attack affected the entire server, which hosts God knows how many other websites. If it were just me being annoyed, they would have laughed at me for asking for a firewall rule.