SSL computational DoS mitigation

Vincent Bernat

Some days ago, a hacker group, THC, released a
denial of service tool for SSL web servers. As stated in its
description, the problem is not really new: a complete SSL handshake
involves costly cryptographic computations.

There are two different aspects in the presented attack:

The computational cost of a handshake is higher on the server side
than on the client side. The advisory explains that a server requires
15 times the processing power of a client. This means a single
average workstation could take on a multi-core high-end server.

The use of SSL renegotiation allows you to trigger hundreds
of handshakes in the same TCP connection. Even a client behind a
DSL connection can therefore bomb a server with a lot of
renegotiation requests.

UPDATED: While the content of this article is still technically
sound, keep in mind that it was written at the end of 2011 and
therefore does not take into account many important later
developments, like the fall of RC4 as an appropriate cipher.

Mitigation techniques

There is no definitive solution to this attack, but there exist some
workarounds. Since the DoS tool from THC relies heavily on
renegotiation, the most obvious one is to disable this mechanism on
the server side, but we will explore other possibilities as well.

Disabling SSL renegotiation

Tackling the second problem seems easy: just disable SSL
renegotiation. It is hardly needed: a server can trigger a
renegotiation to ask a client to present a certificate, but a client
usually has no reason to trigger one. Because of a past
vulnerability in SSL renegotiation, recent versions of Apache and
nginx simply forbid it, even when the non-vulnerable version is
available.

openssl s_client can be used to test whether SSL renegotiation is
really disabled. Sending R on an empty line triggers a renegotiation.
Here is an example where renegotiation is disabled (despite being
advertised as supported):

Rate limiting SSL handshakes

Disabling SSL renegotiation on the server side is not always
possible. For example, your web server may be too old to offer such
an option. Since those renegotiations should not happen often, a
workaround is to rate-limit them.

When the renegotiation flaw was first disclosed, F5 Networks provided
a way to configure such a limitation with an iRule on their
load-balancers. We can do something similar with just Netfilter. We
can spot most TCP packets triggering such a renegotiation by looking
for encrypted TLS handshake records. Such records also appear in a
regular handshake, but there they usually are not at the beginning of
the TCP payload. There is no field saying whether a TLS record is
encrypted or not (TLS relies on connection state for this).
Therefore, we have to use some heuristics. If the handshake type is
unknown, we assume that this is an encrypted record. Moreover,
renegotiation requests are usually encapsulated in a TCP packet
flagged with “push”.

The u32 match is a bit difficult to read. Its manual page gives some
insightful examples. $payload allows us to seek to the TCP payload.
It only works if there is no fragmentation. Then, we check whether we
have a handshake record (0x16) and whether we recognise the TLS
version (0x0300, 0x0301, 0x0302 or 0x0303). Finally, we check that
the handshake type is not a known value.
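The same heuristic can be written in Python over a raw TCP payload, which may help when decoding the u32 expression. This is a minimal sketch, not the Netfilter rule itself: it assumes the payload starts on a TLS record boundary, and the set of "known" handshake types (those of RFC 5246 plus NewSessionTicket from RFC 5077) as well as the function name are assumptions chosen for illustration.

```python
# Handshake types a cleartext handshake record may start with.
# Assumption: RFC 5246 types plus NewSessionTicket (4) from RFC 5077.
KNOWN_HANDSHAKE_TYPES = {0, 1, 2, 4, 11, 12, 13, 14, 15, 16, 20}

# Record-layer versions: SSLv3, TLS 1.0, TLS 1.1, TLS 1.2.
KNOWN_VERSIONS = {0x0300, 0x0301, 0x0302, 0x0303}

TLS_CONTENT_HANDSHAKE = 0x16


def looks_like_encrypted_handshake(payload: bytes) -> bool:
    """Heuristically flag a TCP payload starting with an encrypted handshake record.

    Hypothetical helper: like the u32 rule, it assumes the payload begins
    on a TLS record boundary (no segmentation tricks).
    """
    # 5-byte record header plus at least the first byte of the record body.
    if len(payload) < 6:
        return False
    if payload[0] != TLS_CONTENT_HANDSHAKE:
        return False
    version = int.from_bytes(payload[1:3], "big")
    if version not in KNOWN_VERSIONS:
        return False
    # In a cleartext record, byte 5 is the handshake type. An unknown value
    # suggests the body is ciphertext, e.g. a renegotiation request.
    return payload[5] not in KNOWN_HANDSHAKE_TYPES
```

A ClientHello (type 1 right after the record header) is not flagged, while a handshake record whose first body byte is, say, 0xab is. Ciphertext whose first byte happens to equal a known type slips through, which is one more reason to pair this check with rate limiting rather than outright dropping.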

There is a risk of false positives, but since we use hashlimit, we
should be safe. This is not a bulletproof solution: TCP fragmentation
would allow an attacker to evade detection. Another equivalent
solution would be to use CONNMARK to record the fact that the initial
handshake has been done and forbid any subsequent handshakes.[1]
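The hashlimit match keeps one token bucket per hash key (for example, the source address): each matching packet consumes a token, tokens are refilled at a fixed rate up to a burst size, and packets arriving with an empty bucket exceed the limit. A minimal Python sketch of that accounting, assuming a rate expressed in events per second and an injectable clock to make it testable; the class and parameter names are hypothetical and do not mirror hashlimit's options.

```python
import time
from typing import Callable, Dict, Tuple


class PerKeyTokenBucket:
    """One token bucket per key (e.g. source IP), in the spirit of hashlimit."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate      # tokens added per second
        self.burst = burst    # bucket capacity
        self.clock = clock
        # key -> (tokens available, timestamp of last update)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, key: str) -> bool:
        """Consume one token for `key`; return False when over the limit."""
        now = self.clock()
        tokens, last = self.buckets.get(key, (float(self.burst), now))
        # Refill according to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[key] = (tokens - 1.0, now)
            return True
        self.buckets[key] = (tokens, now)
        return False
```

With `rate=1.0` and `burst=3`, a fourth immediate handshake from the same address is rejected while other addresses are unaffected; one second later, a single extra handshake is allowed again. Keying on the source address is also what exposes shared addresses (NAT, proxies) to false positives.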

If you happen to disable SSL renegotiation, you can still use a
Netfilter rule to limit the number of SSL handshakes by limiting the
number of TCP connections from one IP address[2]:

UPDATED: Adam Langley announced that
Google HTTPS sites now support forward secrecy and that
ECDHE-RSA-RC4-SHA is now the preferred cipher suite, thanks to fast,
constant-time implementations of the elliptic curves P-224, P-256 and
P-521 in OpenSSL. The tests above did not use those implementations.

For example, with 2048-bit RSA certificates and a cipher suite like
AES256-SHA, the server needs 6 times more CPU power than the
client. However, if we use DHE-RSA-AES256-SHA instead, the server
needs 34% less CPU power than the client. The most efficient cipher
suite from the server's point of view seems to be something like
DHE-DSS-AES256-SHA, where the server needs half the power of the
client.

However, you can’t really use only those shiny cipher suites:

Some browsers do not support them: they are limited to RSA cipher
suites.[3]

Using them will increase your regular load a lot. Your servers may
collapse under legitimate traffic alone.

They are expensive for some mobile clients: they need more memory
and more processing power, and they drain the battery faster.

Let’s dig a bit deeper into why the server needs more computational
power in the case of RSA. Here is an SSL handshake when using a
cipher suite like AES256-SHA:

When sending the Client Key Exchange message, the client encrypts
the TLS version and 46 random bytes with the public key of the
certificate sent by the server in its Certificate message. The
server has to decrypt this message with its private key. Those are
the two most expensive operations in the handshake. Encryption and
decryption are done with RSA (because of the selected cipher
suite). To understand why decryption is more expensive than
encryption, let me explain how RSA works.
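The 48 bytes the client encrypts (two bytes of protocol version, then 46 random bytes) are padded with PKCS#1 v1.5 encryption padding up to the size of the server's RSA modulus before the RSA operation, which is why an encrypted Client Key Exchange always has the length of the server's key. A minimal Python sketch of that step, assuming TLS 1.0 (version 0x0301) and a public key (n, e) already extracted from the server's certificate; the function names are hypothetical and the code is an illustration, not a usable TLS implementation.

```python
import os


def premaster_secret(version: int = 0x0301) -> bytes:
    """2-byte client version followed by 46 random bytes (48 bytes total)."""
    return version.to_bytes(2, "big") + os.urandom(46)


def pkcs1_v15_pad(message: bytes, key_bytes: int) -> bytes:
    """PKCS#1 v1.5 encryption padding: 00 02 <non-zero random> 00 <message>."""
    padding_len = key_bytes - len(message) - 3
    if padding_len < 8:
        raise ValueError("message too long for this key size")
    padding = bytearray()
    while len(padding) < padding_len:
        byte = os.urandom(1)
        if byte != b"\x00":          # padding bytes must be non-zero
            padding += byte
    return b"\x00\x02" + bytes(padding) + b"\x00" + message


def encrypt_premaster(premaster: bytes, n: int, e: int) -> bytes:
    """Raw RSA on the padded block: c = m^e mod n, as a fixed-size string."""
    key_bytes = (n.bit_length() + 7) // 8
    m = int.from_bytes(pkcs1_v15_pad(premaster, key_bytes), "big")
    return pow(m, e, n).to_bytes(key_bytes, "big")
```

Since the padded block starts with 00 02, its integer value is always smaller than the modulus, so the result fits in exactly as many bytes as the key.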

First, the server needs a public and a private key. Here are the main
steps to generate them:

Pick two distinct random prime numbers $p$ and $q$ of roughly the
same size.

Compute $n=pq$. It is the modulus.

Compute $\varphi(n)=(p-1)(q-1)$.

Choose an integer $e$ such that $1<e<\varphi(n)$ and
$\gcd(\varphi(n),e)=1$ (i.e. $e$ and $\varphi(n)$ are
coprime). It is the public exponent.

Compute $d=e^{-1}\bmod\varphi(n)$. It is the private
exponent.
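These steps translate almost directly into Python. A minimal sketch, assuming Miller-Rabin for primality testing and a fixed public exponent e = 65537 (the usual choice discussed below) instead of a randomly chosen one; it needs Python 3.8+ for `pow(e, -1, phi)` and is meant for illustration, not as a vetted key generator.

```python
import math
import secrets


def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % small == 0:
            return n == small
    r, s = n - 1, 0                    # write n - 1 as 2^s * r with r odd
    while r % 2 == 0:
        r //= 2
        s += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2   # witness in [2, n - 2]
        x = pow(a, r, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False               # a proves n composite
    return True


def random_prime(bits: int) -> int:
    """Random prime of `bits` bits with its two top bits set."""
    while True:
        # Two top bits set so that the product of two such primes
        # has exactly twice as many bits; low bit set to make it odd.
        candidate = secrets.randbits(bits) | (3 << (bits - 2)) | 1
        if is_probable_prime(candidate):
            return candidate


def generate_keypair(bits: int = 2048, e: int = 65537):
    """Return ((n, e), (n, d)); primes are redrawn until phi(n) is coprime with e."""
    half = bits // 2
    while True:
        p, q = random_prime(half), random_prime(bits - half)
        if p == q:
            continue
        n = p * q                      # modulus
        phi = (p - 1) * (q - 1)        # Euler's totient
        if 1 < e < phi and math.gcd(phi, e) == 1:
            d = pow(e, -1, phi)        # modular inverse of e
            return (n, e), (n, d)
```

For instance, `generate_keypair(512)` returns a toy 512-bit key in a fraction of a second; real keys use 2048 bits or more.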

The public key is $(n,e)$ while the private key is $(n,d)$. A
message to be encrypted is first turned into an integer $m<n$ (with
some appropriate padding). It is then encrypted into a ciphertext
$c$ with the public key and can only be decrypted with the private key:

$c=m^e\bmod n$ (encryption)

$m=c^d\bmod n$ (decryption)
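A worked example with deliberately tiny, insecure numbers (p = 61 and q = 53, a classic textbook choice) makes the two formulas concrete. Real keys use primes of 1024 bits or more, and the message is padded first.

```python
# Toy RSA key: far too small to be secure, but easy to follow by hand.
p, q = 61, 53
n = p * q                    # 3233, the modulus
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent, coprime with phi
d = pow(e, -1, phi)          # 2753, private exponent (Python 3.8+)

m = 65                       # message, already encoded as an integer < n
c = pow(m, e, n)             # encryption: c = m^e mod n
recovered = pow(c, d, n)     # decryption: m = c^d mod n

print(n, d, c, recovered)    # 3233 2753 2790 65
```

Note that e = 17 is 10001 in binary while d = 2753 is both larger and has more bits set.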

So, why is decryption more expensive? In fact, the key pair is not
really generated as described above. Usually, $e$ is a small fixed
prime number with only two bits set, like 17 (0x11) or 65537
(0x10001), and $p$ and $q$ are chosen such that $\varphi(n)$ is
coprime with $e$. This makes encryption fast using exponentiation by
squaring, whose cost grows with the number of bits of the exponent
and the number of bits set. On the other hand, its inverse $d$ is a
big number with no special property and therefore, exponentiation
with it is much slower.
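Counting the modular operations performed by left-to-right exponentiation by squaring shows the asymmetry. This is a minimal Python sketch; real libraries use faster variants (windowing, and the Chinese remainder theorem for the private key), but the gap between a short public exponent and a full-size private one remains.

```python
def modexp_counting(base: int, exp: int, mod: int):
    """Left-to-right square-and-multiply; returns (result, squarings, multiplies)."""
    if exp == 0:
        return 1 % mod, 0, 0
    result = base % mod
    squarings = multiplies = 0
    # The leading 1 bit is handled by the initialisation above.
    for bit in bin(exp)[3:]:
        result = result * result % mod
        squarings += 1
        if bit == "1":
            result = result * base % mod
            multiplies += 1
    return result, squarings, multiplies


if __name__ == "__main__":
    import secrets

    mod = (1 << 2048) - 159    # arbitrary odd 2048-bit modulus, cost only
    exponents = [("e = 65537", 65537),
                 ("random 2048-bit d", secrets.randbits(2048) | (1 << 2047))]
    for label, exp in exponents:
        _, sq, mul = modexp_counting(3, exp, mod)
        print(f"{label}: {sq} squarings, {mul} multiplications")
```

For e = 65537 this reports 16 squarings and 1 multiplication; for a random 2048-bit d it reports 2047 squarings and roughly a thousand multiplications, more than a hundred times the work on the same modulus. Even the 32-bit exponent 4294967291 only costs 31 squarings and 30 multiplications.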

Instead of computing $d$ from $e$, it is possible to choose $d$ and
compute $e$. We could choose $d$ to be small and coprime with
$\varphi(n)$, then compute $e=d^{-1}\bmod\varphi(n)$ and get
blazingly fast decryption. Unfortunately, there are two problems with this:

Therefore, we cannot use a small private exponent. The best we can do
is to make encryption on the client side more expensive by choosing
the public exponent to be $e'=4294967291$ (the biggest 32-bit prime,
whose binary representation contains only one 0). However, this
changes nothing, as you can see on our comparative plot.

To summarize, there is no real solution here. You need to allow RSA
cipher suites, and there is no way to improve the computational ratio
between the server and the client with such a cipher suite.

What you should be asking at this point is whether a computational DoS
attack based on renegotiation is any better for the attacker than a
computational DoS attack based on multiple connections. The way we
measure this is by the ratio of the work the attacker has to do to the
work that the server has to do. I’ve never seen any actual
measurements here (and the THC guys don’t present any), but some back
of the envelope calculations suggest that the difference is small.

If I want to mount the old, multiple connection attack, I need to
incur the following costs:

Do the TCP handshake (3 packets)

Send the SSL/TLS ClientHello (1 packet). This can be a canned message.

Send the SSL/TLS ClientKeyExchange, ChangeCipherSpec,
Finished messages (1 packet). These can also be canned.

Note that I don’t need to parse any SSL/TLS messages from the server,
and I don’t need to do any cryptography. I’m just going to send the
server junk anyway, so I can (for instance) send the same bogus
ClientKeyExchange and Finished every time. The server can’t find
out that they are bogus until it’s done the expensive part. So,
roughly speaking, this attack consists of sending a bunch of canned
packets in order to force the server to do one RSA decryption.

I have written a quick proof of concept of such a tool. To avoid any
abuse, it will only work if the server supports the NULL-MD5
cipher suite. No sane server in the wild supports such a
cipher. You need to configure your web server to support it before
using this tool.

While Eric explains that there is no need to parse any SSL/TLS
messages, I have found that if the key exchange message is sent before
the server sends its answer, the connection is aborted. Therefore,
I quickly parse the server’s answer to check whether I can continue.
Eric also says a bogus key exchange message can be sent since the
server has to decrypt it before discovering it is bogus. I have
chosen to build a valid key exchange message during the first
handshake (using the certificate presented by the server) and replay
it on subsequent handshakes, because I think the server may dismiss
the message before the computation is complete (for example, if its
size does not match the size of the certificate’s RSA key).

UPDATED: Michał Trojnara has written sslsqueeze, a
similar tool. It uses the libevent2 library and should perform
better than mine. It does not compute a valid key exchange
message but ensures the length is correct.

With such a tool and a 2048-bit RSA certificate, a server uses 100
times more processing power than the client. Unfortunately, this means
that most of the solutions presented on this page, except rate
limiting, may simply be ineffective.

[1] Handling TLS state in Netfilter is quite hard. The first
solution relies on the fact that renegotiation requests are
encapsulated in a TCP segment flagged with “push”. This is not
always the case and it is trivial to work around. With the
second solution, we assume that the first encrypted handshake
record is packed in the same TCP segment as the client key
exchange. If it comes in its own TCP segment, it would be seen
as a renegotiation while it is not. The state machine needs to
be improved to detect the first encrypted handshake at the
beginning of a TLS record or in the middle of it.

[2] However, since this rule relies on the source IP to identify the
attacker, the risk of false positives is real. You can slow down
legitimate proxies, networks NATed behind a single IP, mobile
users sharing an IP address or people behind a CGN.

[3] Cipher suites supported by all browsers are RC4-MD5, RC4-SHA
and 3DES-SHA. Support for DHE-DSS-AES256-SHA requires TLS
1.2 (not supported by any browser at the time of writing).

[4] Eric is one of the authors of several RFCs related to TLS. He knows his stuff.