Auto-ban website spammers via the Apache access_log

During the past few months, several of my websites have been the target of some sort of SPAM attack. After my getting alerted that my servers were under high load (from Cacti), I found that a small number of IP addresses were loading and re-loading or POSTing to the same pages over and over again. In one of the attacks, they were simply reloading a page several times a second from multiple IP addresses. In another attack, they were POSTing several megabytes of data to a form (which spent time validating the input), several times a second. I’m not sure of their motives – my guess is that they’re either trying to game search rankings (the POSTings) or someone with an improperly configured robot.

Since I didn’t have anything in-place to automatically drop requests from these rogue SPAMmers, the servers were coming under increasing load and causing real visitor’s page loads to slow down.

After looking at the server’s Apache’s access_log, I was able to narrow down the IPs causing the issue. With their IP, I simply created a few iptables rules to drop all packets from their IP addresses. Within a few seconds, the load on the server returned to normal.

I didn’t want to play catch-up the next time this happened, so I created a small script to automatically parse my server’s access_logs and auto-ban any IP address that appears to be doing inappropriate things.

The script is pretty simple. It uses tail to look at the last $LINESTOSEARCH lines of the access_log, grabs all of the IPs via awk, sorts and counts them via uniq, then looks to see if any of these IPs had loaded more than $THRESHOLD pages. If so, it does a quick query of iptables to see if the IP is already banned. If not, it adds a single INPUT rule to DROP packets from that IP.

You can put this in your crontab, so it runs every X minutes. The script will probably need root access to use iptables.

I have the script in /etc/cron.10minutes and a crontab entry to run all files in that directory every 10 minutes: /etc/crontab:0,10,20,30,40,50 * * * * root run-parts /etc/cron.10minutes

Warning: Ensure that the $SEARCHTERM you use will not match a wide set of pages that at web crawler (for example, Google) would see. In my case, I set SEARCHTERM=POST, because I know that Google will not be posting to my website as all of the forms are excluded from crawling via robots.txt.

The full code is also available at Gist.GitHub if you want to fork or modify it. It’s a rather simplistic, brute-force approach to banning rogue IPs, but it has worked for my needs. You could easily update the script to be a bit smarter. If you do, let me know!