Back from the Dead: Simple Bash for Complex DDoS

If you work for a company with an online presence long enough, you'll deal with it eventually. Someone, out of malice, boredom, pathology, or some combination of all three, will target your company's online presence and resources for attack. If you are lucky, it will be a run-of-the-mill Denial of Service (DoS) attack from a single IP address or a limited range of addresses that can be easily blocked at your outermost point, and the responsible parties will lack the necessary expertise to overcome this relatively simple countermeasure. Your usual script-kiddie attack against a site with competent network and server administration is fairly short. If you are unlucky, you'll experience something worse: a small percentage of attacks come from a higher caliber of black hat, and while such an attacker is more difficult to deal with, the individual generally bores easily and moves on.

If you are very, very unlucky, someone highly skilled and just as determined will decide to have some fun with you. If this person decides they want to crack their way into your servers and explore your environment, eventually they will get in, and there isn't too much you can do about it. As long as they don't do anything too obvious, like launch a huge dictionary crack attack against other sites from your servers, you may never know, even if you are pretty good and attentive. And if they decide they want to knock you off of the Internet, then down you go.

I had the misfortune to be on the receiving end of such an attack at a previous employer who shall remain nameless (but it was in 2007 and my LinkedIn is public: http://www.linkedin.com/in/gregbledsoe). Someone didn't seem to like us very much and decided to erase us from online existence. At first it was a standard DoS SYN flood that any script kiddie could launch, a minor annoyance at best, easily mitigated by blocking the source IP at the point of ingress. Then it got interesting.

The attacker adapted by engaging a substantial bot-net, and it became a distributed denial of service (DDoS) attack. The targeted server address was down briefly until we engaged our carriers to block the inbound attack further out. Still, at that point, the crisis is over, right? Normally, yes. In this case? Not even close.

The attacker adapted the attack *again*, this time seeming to rotate through connections from real bot-net systems and also sending oodles of fake connection requests from random spoofed IP addresses. All told, the number of incoming connection requests was close to a million at a time. This took us down hard. Panic ensued, and after some quick brainstorming a number of mitigation techniques were attempted, all to no avail.

The connections went through our firewall, through our load balancer, and hit one of three back-end systems, all of which were overwhelmed dealing with the load imposed by the attack. We tried using rate limiting on the firewall, and while I'm not sure exactly what was implemented, this took everything behind the firewall down, not just the targeted URL/server address. The rate-limiting statements were taken back out of the configuration, but everything stayed down. We discovered that the firewall equipment had run out of memory creating table space to keep track of all the connection attempts. It couldn't tell the difference between spoofed, real, and legitimate TCP SYN connection requests, so it tracked them all and let them through. Apparently the particular equipment we had did not allow more granular rate limiting.

Options were discussed, including rejiggering our DNS to send all our traffic through a (very expensive) company that promised to scrub the attack before it reached us. I was skeptical of this idea.

Being the Unix Guy, my domain was the back-end servers and, to a lesser extent, the load balancer. After watching the output of netstat, lsof -ni, and tcpdump for a while, I knew how to defeat this attack. I spent about 10 minutes crafting my countermeasure and deployed it on all three back-end servers, and within seconds our environment was alive again. The red of Nagios alarms cleared within a few minutes and our phones stopped ringing. Our total downtime was about an hour.

The thing that I noticed that made this countermeasure work was that there was a clear threshold between the number of connections opened by legitimate users and the high number of connections from both the real and spoofed IPs that were part of the attack. By identifying them on the back-end servers and sending TCP resets (with the RST flag on) back on all those connection requests over the threshold, we could clear out the connection information on the server, the load balancer, and the firewall, and free up the memory that had been used to store that entry in the table. Clear out enough of them quickly enough, faster than new attack IPs were coming in, and life became good again.
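The threshold idea can be sketched in a few lines of bash. This is not the original script: it is a minimal illustration that assumes IPv4 input in the column layout of "netstat -ntu", and the function names and the default THRESHOLD of 20 are hypothetical. It only prints the reject rules rather than running them.

```shell
#!/usr/bin/env bash
# Sketch only: find source IPs holding more connections than a threshold
# and print (not run) an iptables rule that answers them with a TCP RST.
# THRESHOLD and both function names are hypothetical placeholders.

THRESHOLD=${THRESHOLD:-20}

# Reads "netstat -ntu"-style lines on stdin and prints every foreign
# IPv4 address whose connection count is above the given limit.
ips_over_threshold() {
    local limit=$1
    awk -v limit="$limit" '
        $1 == "tcp" {
            split($5, parts, ":")   # field 5 is the foreign "ip:port"
            count[parts[1]]++
        }
        END {
            for (ip in count)
                if (count[ip] > limit) print ip
        }' | sort
}

# Reads one IP per line on stdin and prints the matching reject rule.
print_reject_rules() {
    local ip
    while read -r ip; do
        echo "iptables -I INPUT -s $ip -p tcp -j REJECT --reject-with tcp-reset"
    done
}

# Dry run:
#   netstat -ntu | ips_over_threshold "$THRESHOLD" | print_reject_rules
```

Printing the rules first makes it possible to eyeball the list before committing to it; piping the output to a root shell would apply them. The right threshold depends entirely on what legitimate traffic looks like on the servers in question.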

Our attacker made a number of attempts to adapt to this solution, trying for instance to have sections of the bot-nets start at some IP, like 1.1.1.1, and send one connection apiece, rotating through IPs as quickly as possible to avoid tripping the threshold, but couldn't rotate quickly enough to wreak the same level of havoc as before. This script proved very robust against the rest of his attacks. Some fine-tuning was done, for instance to remove lines after they aged a particular amount, but the essence of the script remained the same.

What I really liked about this solution was the simplicity. I have found that the best solutions are usually the simplest. If you really understand the underlying technology and protocols, then you can often see right through to what underlies a problem, and avoid adding layer after layer of expense and complexity (and corresponding break points) to your environment.

I'm more than willing to release this under the GPL v2. If anyone is interested in incorporating this snippet or its concepts into a larger solution for distribution, let me know via the email address below.

______________________

--
I was cloud before cloud was cool. Not in the sense of being an amorphous collection of loosely related molecules with indeterminate borders -- or maybe I am. Holla @geek_king, http://twitter.com/geek_king

I really believe the important lesson here is that rejecting packets instead of dropping them can help the surrounding network get a hint of what's going on and mitigate the situation, even though dropping the packets may superficially seem more effective (DROP creates no additional traffic on an already heavily burdened network, whereas REJECT does).

However, one must pay attention to very large DDoS attacks, where this simple method can fill up the iptables table with lots and lots of rules, adversely affecting CPU and memory usage to the point where the mitigation itself becomes an autoimmune disease. I have yet to see such a case on physical servers, whereas OpenVZ containers can easily die because of this, courtesy of the numiptent limit that UBC-based containers are subject to.

Wow. Remarkably similar. Any idea when this was written? (I really want mine to have been first! Come on 2008 or later!)

Reality is most good ideas occur to more than one person at different times... like the debate on who invented the lightbulb first, or calculus...

Bottom line is it works, simple, effective, fairly light-weight. I'll have to take a closer look at deflate, see if I can contribute at all... looks pretty complete though.

Greg, I can see advantages to your approach also. Sometimes you would not want to go through the setup and the extra complexity when in an emergency situation. It's good and varied tools that help us be efficient at our jobs.

The only real difference is that I switched to --reject-with tcp-reset while he uses DROP. DROP leaves the record and memory usage in place on stateful network gear for the connections. I would say that is a slight advantage to my solution in some circumstances, but it is easy to change in Ddos.sh.

I didn't see that in my first cursory look over it, since it isn't at the top of the file.

Then I guess the kudos are yours, Zak. You did it first. Darn you. :-D

I must say it's rare to find the correct solution to defend against web DDoS attacks.

Your solution is simple, effective, and very clever. Kudos!

To defend against a web DDoS in panic/crisis mode, most folks ultimately get their ISPs involved upstream. ISPs can and will block legitimate business services, causing hundreds of help desk phone calls, which is exactly what the attacker wanted to accomplish.

Stopping the DDoS attack at the correct endpoint, the web/application servers exposed on the public Internet, is by far the best solution to the issue.

I didn't save that cron job -- but it really shouldn't be too terribly difficult to replicate. If I can squeak some time out of my day I'll take a stab at it.

I would run the script with a "nohup [command] &" which would only stop if killed specifically by pid or name, or with a reboot. Reboot seems like overkill though. "kill [pid]" should do it.

I have written a filter for Tomcat that counts parallel connections from the same IP. If the counter reaches a threshold, it shuts down the connection with "shutdown(fd, SHUT_WR)", so that the server will send back a RST. I also took samples of memory usage; if a request is pending and there is not enough memory, the filter drops it.

I'd like to see that, too! That could certainly work in certain circumstances.

These iptables statements will let every IP continue to send *up to* 10 connection requests per minute. That wouldn't really have helped us with the number of IPs being used in the attack - we needed to identify and then reject and clear *all* connection attempts from the "bad" IPs. I looked over the current man page for iptables and don't see a way to do that without some scripting.

But I appreciate you provoking me to look! Always learning. :-)

But not quite. I've had quite a few problems getting --update and --hitcount to work together correctly, or more properly, as I expect them to. It's entirely possible that the issues I encountered are no longer relevant, but I've not tested it recently. Second, what your iptables lines will catch is connection requests within 60 seconds, while what the script catches is simultaneous connections outstanding. That is a slight difference but a meaningful one, and could, in the right circumstances, make all the difference.

As an aside - DROP isn't what you want in this case. DROP leaves the tracking burden on all the stateful gear between you and the endpoint - which doesn't fix the problem.

Good suggestion though!

Thanks for the article, I plan to work through the script as an exercise to improve my knowledge of bash and networking, particularly the lsof command.

To add my 2 cents, similar to Pablo, if you ever need the script again I believe that it is more efficient to use grep -c rather than piping to wc -l (I think I read that in another LJ article??). Probably a negligible improvement but hey why not? :)

It's entirely possible grep is faster at counting lines - this isn't something I've tested personally - though it seems (uneducated guess alert) that grep is optimized for searching while wc is optimized for counting. I'd suspect wc was more resource efficient - though I could very well be wrong.

Now I will be irresistibly drawn to test it and unable to sleep until I do. Thanks! ;-P

i.e., how to get the answer to a "which is faster or more efficient" question when it comes to bash scripting. Using time, I found that, as I suspected, wc is *much much much* faster than grep -c, but that excludes the time for subshell spawning that would be involved in piping.

Generally, bash built-ins and one-shot single-purpose commands are way faster at what they do than the big commands that are Swiss-army-style utilities like awk, sed, or grep. cut is faster than tr, and tr is faster than sed, and sed is faster than awk, etc. But adding in piping and associated overhead muddies the picture a little.

Maybe I *will* write that article. :-D

I found that using grep -c was faster than piping to wc. Maybe because of what you mentioned with piping overhead and what not. I agree that wc would be faster than grep, but if you have to use grep anyway to perform the search, may as well just use it to count?

I tried this:

mybigfile.txt is 881M and just created by cat'ing /usr/share/dict files together a bunch of times.

$ time grep -c a mybigfile.txt
50224464

real 0m8.251s
user 0m8.110s
sys 0m0.120s

$ time grep a mybigfile.txt | wc -l
50224464

real 0m10.991s
user 0m11.610s
sys 0m0.320s

So basically what it comes down to is yes that would be an interesting article and I'd like to read it :)
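For anyone who wants to repeat the comparison, the two forms can be wrapped as below. This is only a sketch with hypothetical function names; both functions should print the same count, and the only difference is that the second starts an extra process and a pipe.

```shell
#!/usr/bin/env bash
# Sketch: two ways to count the lines in a file that match a pattern.
# Function names are hypothetical; arguments are PATTERN then FILE.

# One process: grep does the matching and the counting.
count_with_grep() {
    grep -c -- "$1" "$2"
}

# Two processes: grep prints every matching line and wc counts them.
count_with_wc() {
    grep -- "$1" "$2" | wc -l
}

# Timing either form on a large file:
#   time count_with_grep a mybigfile.txt
#   time count_with_wc   a mybigfile.txt
```

Results will vary with the grep version, the pattern, and how much of the file is already in the page cache, so it is worth running each form several times before drawing conclusions.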

I wish I had further time to do more testing. I'd be interested to see if versions made a difference, and the complexity of the grep. :-D

I just put it on my list.

I still need to test though. :-D I suspect it'll be a close call between wc keeping a cumulative count vs grep tracking it on the way through. :-D

reset all the connections coming from a pool of IPs you have previously selected as potential attacker IPs (spoofed and non spoofed).

How many IPs are we talking about? You should have some numbers (/var/log/Ddos/Ddos.log).
If there are a lot of these IPs (especially the spoofed ones), the iptables rules could also block ordinary users who try to reach your web servers after the DDoS is over...

Do you delete the iptables rules after a while?

Just asking here because I am very interested in fully understanding your bash script.

That is an excellent question. Thanks for asking! In fact, I ran a nightly cron job that removed reject rules that hadn't been hit in a certain amount of time. We tuned that out of exactly the concern you raise, blocking actual users, but eventually proclaimed it "good enough" when we had only one complaint over several days from an actual user who couldn't reach us, which we tracked back to an iptables rule.

Again, great question!
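The original cron job was not saved, so the following is only a guess at how such a cleanup might look, not a reconstruction. It assumes the reject rules live in the INPUT chain and that the packet counters are zeroed after each run, so a rule still showing 0 packets has not been hit since the previous run. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: pick out REJECT rules whose packet counter is still 0.
# Input is the output of: iptables -L INPUT -n -v -x --line-numbers
# Columns there are: num pkts bytes target prot opt in out source destination
# The function name is a hypothetical placeholder.

# Prints the rule numbers of unhit REJECT rules, highest first, so that
# deleting them one by one does not shift the numbers still to be deleted.
stale_reject_rule_numbers() {
    awk '$1 ~ /^[0-9]+$/ && $2 == "0" && $4 == "REJECT" { print $1 }' | sort -rn
}

# Nightly run (as root), shown as comments so nothing executes here:
#   iptables -L INPUT -n -v -x --line-numbers | stale_reject_rule_numbers |
#       while read -r n; do iptables -D INPUT "$n"; done
#   iptables -Z INPUT    # zero the counters for the next interval
```

How long a rule must stay unhit before removal is then set by how often the job runs, which is the knob that trades off blocking returning attackers against locking out legitimate users.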

Feel free to use it if you need to, Dave. I would appreciate it if you fed back any improvements, though. :-)

F2B is really for a different kind of problem, more of a crack-attempt kind of attack. I've also had bad luck trying to block whole geographical regions, as ISPs have a way of shifting blocks around unpredictably as IPv4 space availability tightens.

Glad to put a new tool in the toolbox!

This particular script requires a pretty solid understanding of both basic networking and basic bash scripting. I would suggest starting with some resources designed to get someone up to the CCNA level or equivalent (not necessarily Cisco focused) and some bash scripting tutorials, like go from:

But under the gun, I just didn't think of it. :-) Thanks for the suggestion! If I ever need to use this again (may it never be!) I'll include your suggestion!
