Preferred Member

joined: Jan 20, 2005
posts: 495
votes: 0

Just wanted to note in this thread that contacting them by email does seem to work. They didn't tell me who it was that wanted all of our data, but they did promise not to spider us anymore. Since then, they haven't visited our site.

New User

joined: June 15, 2012
posts: 1
votes: 0

These folks just hit my org yesterday: about 6k hits over the two hours before I completely blocked them. It massively overloaded our DB and killed the sites. I did use the robots.txt mod suggested on their site, but I also collected approximately 850 IPs (all overseas) from my web logs and blocked them via iptables rules in my firewall.
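For anyone wanting to do the same, here's roughly what it looks like. The "008" user-agent token is the one 80legs publishes for its crawler, but verify it against your own logs; the IP addresses and the blocked_ips.txt file name below are just placeholders for whatever you collect from yours.

    # robots.txt - ask the 80legs crawler to stay out entirely
    User-agent: 008
    Disallow: /

    # iptables - drop traffic from each offending address
    iptables -A INPUT -s 192.0.2.10 -j DROP
    iptables -A INPUT -s 198.51.100.25 -j DROP

    # with ~850 addresses, loop over a file of IPs instead of typing each rule
    while read ip; do iptables -A INPUT -s "$ip" -j DROP; done < blocked_ips.txt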

I did have a filter set up in Apache to reject requests from specified user agents, but it had no effect here. This wasn't so much a crawler as a botnet-based DDoS. IMHO - if you see these guys in your logs, block them.
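For reference, the sort of filter I mean is roughly the following (Apache 2.2-era syntax, placed in a <Directory> block or .htaccess; matching on "008" is an assumption based on the user-agent string 80legs publishes, and as noted it won't save you when the requests arrive from hundreds of different IPs at once):

    # tag any request whose User-Agent contains "008" and refuse it
    SetEnvIfNoCase User-Agent "008" bad_bot
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot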

Senior Member

joined: Aug 12, 2003
posts: 854
votes: 1

The incidents above involve 80legs.com's spidering at a rate of less than one page per second. My server could have easily handled that, but the spidering of my site was on a different order of magnitude. If it's taking down your server, being hit by a "respectable" company's distributed spider doesn't feel much different from a DDoS attack.

I appreciate that 80legs.com's customer service acknowledges they are at times responsible for overwhelming the servers their customers hire them to target, and that they make some effort to respect robots.txt. But given that they can manually slow the rate at which their botnet hits a website once they receive a complaint, there is no apparent reason they couldn't set reasonable default rate limits for every site they spider and prevent their botnet from ever running amok in the first place.