Implicitly Authenticated DH

Last Updated: 16 December 2018

In all previous iterations of the authenticated Diffie-Hellman protocol, we've kept adding messages to the handshake, and all kinds of cryptographic primitives like encryption, signatures, and MACs... all of them requiring a lot of computations. This raises the question of whether it is possible to design a minimal authenticated, yet secure key exchange protocol, where minimal is understood with respect to the number of messages exchanged, and to the computations required (essentially exponentiations of bignums). It turns out that such minimal protocols do exist, although they are a little bit tricky to prove secure. They may seem like an intellectual curiosity, but implicitly authenticated DH is extremely useful in the context of the internet of things (IoT), where the devices are sometimes very underpowered. In this crypto bite we'll introduce HMQV, and we'll use this opportunity to introduce the incredibly useful underlying cryptographic primitive: designated verifier signatures.

Ability of Computing the Shared Key = Successfully Authenticated

The idea behind implicitly authenticated DH is that if both parties can compute a shared key, then they are also automatically authenticated. Before introducing one shared key computation with this property, let's present the general formal framework. Both parties \(A\) and \(B\) exchange a pair consisting of their public key and an ephemeral key, and nothing more (instead of a public key, they could also send their certificate, i.e. \((\mathit{cert_A}, g^x)\) and \((\mathit{cert_B}, g^y)\), but that is an implementation detail):

\(A \rightarrow B \colon (\mathit{pk_A}, g^x)\)

\(A \leftarrow B \colon (\mathit{pk_B}, g^y)\)

Now, both \(A\) and \(B\) independently compute a shared key \(k\) with a function \(f\):

\[ k = f(\mathit{pk_A}, \mathit{pk_B}, \mathit{sk_A}, \mathit{sk_B}, g^x, g^y, x, y) \]

Of course, not all parameters are available to both parties at the same time: \(A\) doesn't have \(B\)'s long-term secret key \(\mathit{sk_B}\), nor his ephemeral secret key \(y\), and vice-versa, \(B\) doesn't have \(\mathit{sk_A}\) nor \(x\). But we'll see shortly how to overcome this difficulty. As with unauthenticated Diffie-Hellman, you'll realize that \(A\) and \(B\) compute \(f\) in a different way... but ultimately, they'll compute the same function, and thus the same key \(k\).

Simplifying the Notation

We'll see in a moment some formulae for computing \(f\) from \(A\)'s and \(B\)'s sides. To simplify these formulae, we'll abuse the notation by setting \(A := \mathit{pk_A}, B := \mathit{pk_B}\), i.e. \(A\) and \(B\) designate not only both peers, but also their long-term public keys, depending on context. We also set \(a := \mathit{sk_A}, b := \mathit{sk_B}\) to designate the long-term secret keys. By the same token, we set \(X := g^x, Y := g^y\), i.e. capital letters different from the peers' names always mean \(g\) raised to the power of the same lowercase letter whose value is uniformly randomly selected, e.g. \(S\) would be \(g^s\).

With this notation, the protocol becomes:

\(A \rightarrow B \colon (A, X)\)

\(A \leftarrow B \colon (B, Y)\)

and both parties compute

\[ k = f(A, B, a, b, X, Y, x, y) \]

Trying different ideas for f

First of all, is it possible to design a 2-message non-replayable protocol? We've already seen that this is not possible. We will show 2-message protocols below, but it should be noted that it is up to the applications to ensure non-replayability, e.g. by using nonces etc. once the shared key has been established.

With that out of the way, how do we design \(f\)? Of course, by combining a subset of \(A, B, a, b, X, Y, x, y\) such that each party has all the necessary ingredients. Let's try a few combinations.

Combining A, B, X, Y

The most obvious idea that first comes to mind is to combine all publicly available parameters \(A, B, X, Y\) (and the private keys that the peer computing \(f\) possesses, i.e. \(A\) can also use \(a\) and \(x\), and \(B\) can also use \(b\) and \(y\) when computing \(f\)), so that both \(A\) and \(B\) can compute the same function \(f\). Let \(H\) be a hash function like SHA-256, SHA-512, SHA-3 and so on (of course, the security proofs will use a random oracle model for \(H\)). Now, both \(A\) and \(B\) compute:

\(k = H(g^{ab}, g^{xy})\). But that's too easy: this function is vulnerable to known-key and to interleaving attacks, among others.

\(k = H(g^{ab}, g^{xy}, X, Y)\). This will prevent the previous attacks, but KCI attacks are still possible, which is a general weakness of protocols using \(g^{ab}\).
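To make the second candidate concrete, here is a minimal Python sketch of how each side could evaluate \(H(g^{ab}, g^{xy}, X, Y)\) from what it knows. The toy group parameters, the `keypair` helper and the way the inputs are serialized before hashing are illustrative assumptions, not part of any specification, and the group is far too small for real use.

```python
import hashlib
import secrets

# Toy parameters (assumption): p = 2q + 1 is a safe prime and g = 4
# generates the subgroup of prime order q. Far too small for real use.
p, q, g = 2039, 1019, 4

def keypair():
    """Hypothetical helper: return (secret, public) with public = g^secret mod p."""
    s = secrets.randbelow(q - 1) + 1
    return s, pow(g, s, p)

def H(*parts):
    """SHA-256 over a simple textual encoding of the inputs (assumed encoding)."""
    return hashlib.sha256("|".join(str(v) for v in parts).encode()).digest()

a, A = keypair()   # long-term keys of A
b, B = keypair()   # long-term keys of B
x, X = keypair()   # ephemeral keys of A
y, Y = keypair()   # ephemeral keys of B

# A gets g^{ab} as B^a and g^{xy} as Y^x; B gets them as A^b and X^y.
k_A = H(pow(B, a, p), pow(Y, x, p), X, Y)
k_B = H(pow(A, b, p), pow(X, y, p), X, Y)
```

Both sides arrive at the same key, but the sketch says nothing about security: the KCI weakness mentioned above is still there.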

Our goal is thus a function \(f\) that resists all attacks, so long as an attacker learns neither the pair \((a, x)\) nor the pair \((b, y)\). Of course, if the attacker knows, say, \(a\), she can always impersonate \(A\) to \(B\), but without \(x\), she can't impersonate \(B\) to \(A\) in this session. Same vice-versa. Knowing both long-term and ephemeral keys of a peer is, of course, all it takes to mount a full-blown man in the middle attack.

Another idea that doesn't work

Instead of merely computing \(g^{ab}\) and \(g^{xy}\) separately, and thus being vulnerable to KCI attacks, \(A\) and \(B\) could compute a combination of both. They independently compute \(k = g^{(a+x)(b+y)}\) like this:

\(A\) computes \(k = (BY)^{a+x} = g^{(b+y)(a+x)}\)

\(B\) computes \(k = (AX)^{b+y} = g^{(a+x)(b+y)}\)

As you can see, both \(A\) and \(B\) use only parameters they know: their own private keys and the public keys they got from the peer.
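The following Python sketch checks this on a toy group: \(A\) raises \(BY\) to the power \(a+x\), and \(B\) raises \(AX\) to the power \(b+y\). The group parameters and the `keypair` helper are assumptions made for illustration only.

```python
import secrets

# Toy parameters (assumption): p = 2q + 1 is a safe prime and g = 4
# generates the subgroup of prime order q. Far too small for real use.
p, q, g = 2039, 1019, 4

def keypair():
    """Hypothetical helper: return (secret, public) with public = g^secret mod p."""
    s = secrets.randbelow(q - 1) + 1
    return s, pow(g, s, p)

a, A = keypair()   # long-term keys of A
b, B = keypair()   # long-term keys of B
x, X = keypair()   # ephemeral keys of A
y, Y = keypair()   # ephemeral keys of B

# A knows a, x and received (B, Y); B knows b, y and received (A, X).
# Exponents can be reduced mod q because all elements have order q.
k_A = pow(B * Y % p, (a + x) % q, p)
k_B = pow(A * X % p, (b + y) % q, p)
```

Both values equal \(g^{(a+x)(b+y)}\), which is all the sketch shows; it does not make the idea secure.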

How were we led to this idea? Remember unauthenticated Diffie-Hellman: with \(x\) and \(y\) unknown to an attacker, we could compute a key \(g^{xy}\) that was also unknown to the attacker under the DDH assumption. Since we've said above that the minimal security requirement is that the pairs \((a,x)\) and \((b,y)\) must be unknown to an attacker, \((a+x)\) and \((b+y)\) would also be unknown to the attacker. Therefore, \(g^{(a+x)(b+y)}\) would also be unknown to the attacker by the DDH assumption, just like in the case of unauthenticated Diffie-Hellman. But is this idea secure? In the AM model I think it is, but it's not in the UM model where man in the middle attacks are easily mounted.

So why doesn't it work in the presence of an active attacker? Here's an example of an attack. An attacker \(E\) could insert herself between \(A\) and \(B\) like this:

Now, \(E\) and \(B\) managed to compute a shared key \(k_{eb}\). But since \(E\) sent \(A\) (along with \(X'\)) in step 3., \(B\) now thinks that he is communicating with \(A\) because he was able to compute the shared key (remember, this is the defining property of implicitly authenticated DH).
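A minimal Python sketch of this side of the attack, under the assumption that \(E\) picks her own \(x'\) and sends \(X' = g^{x'}/A\) together with \(A\)'s public key. The toy group, the `keypair` helper and the variable names are illustrative only.

```python
import secrets

# Toy parameters (assumption): p = 2q + 1 is a safe prime and g = 4
# generates the subgroup of prime order q. Far too small for real use.
p, q, g = 2039, 1019, 4

def keypair():
    """Hypothetical helper: return (secret, public) with public = g^secret mod p."""
    s = secrets.randbelow(q - 1) + 1
    return s, pow(g, s, p)

a, A = keypair()   # long-term keys of A (E only ever sees A)
b, B = keypair()   # long-term keys of B
y, Y = keypair()   # ephemeral keys of B

# E picks her own x' and cancels A out of the product A * X'.
x_e = secrets.randbelow(q - 1) + 1
X_forged = pow(g, x_e, p) * pow(A, -1, p) % p   # X' = g^{x'} / A

# B believes he talks to A and computes (A * X')^{b+y} = g^{x'(b+y)}.
k_B = pow(A * X_forged % p, (b + y) % q, p)

# E computes the same value from public B, Y and her own x'.
k_E = pow(B * Y % p, x_e, p)
```

\(E\) never needs \(a\): the factor \(A\) she divided out is exactly what \(B\) multiplies back in. The three-argument `pow` with exponent `-1` requires Python 3.8 or later.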

Exercise: This attack is the \(E \leftrightarrow B\) side of the man in the middle attack. Show the corresponding \(A \leftrightarrow E\) side of the same attack, so that \(E\) can then act as a proxy between \(A\) and \(B\). What value \(Y'\) must \(E\) send to \(A\)?

The MQV Protocol

The problem with the previous \(k = g^{(a+x)(b+y)}\) idea was that the attacker could easily replace \(x\) with \(x'\) to generate a new session key \(k_{be} = g^{(0+x')(b+y)}\), and vice-versa replace \(y\) with \(y'\) to generate (another) new session key \(k_{ae} = g^{(a+x)(0+y')}\). In other words, the attacker had too much control over \(x\) in the first case, and too much control over \(y\) in the second case.

Can we reduce this control? That's what the MQV protocol does. Instead of computing \(k = g^{(a+x)(b+y)}\), both peers will now compute \(k = g^{(a+dx)(b+ey)}\). Here, \(d\) is specified in such a way (see below) that an attacker can't have simultaneous control over \(d\) and \(X\) (and therefore \(x\)), and \(e\) is defined so that the attacker can't simultaneously control \(e\) and \(Y\) (and therefore \(y\)). With properly chosen \(d\) and \(e\), the attack shown previously would be thwarted.

What does it mean that "\(E\) has simultaneous control over \(d\) and \(X\)"? It means that she can choose both \(d'\) and \(X'\) at will, so that she can trick \(B\) into computing a \(d'x'\) of her own choosing instead of \(dx\) as part of \(f\)'s algorithm.

Why does the requirement that the attacker can't have simultaneous control over \(d\) and \(X\) prevent the previous attack on the \(E \leftrightarrow B\) side? For the attack to succeed, \(E\) wants to trick \(B\) into replacing \(g^{(a+dx)(b+ey)}\) with \(g^{(0+d'x')(b+ey)}\), with \(d'x'\) of her own choosing. She wants to obtain not just any, but a specific value for \(d'x'\), like what she did in the previous protocol where she used an \(x'\) that depended on an \(X'\) and a \(d'\) of her own choosing: in that case, she wanted \(d' = 1\) and she wanted \(X' = \frac{g^{x'}}{A}\) with her own \(x'\) (\(e = 1\) in that case, but that's irrelevant for this direction). But if she has no simultaneous control over \(d'\) and \(X'\), it wouldn't work, since she can't specify \(d'\) and \(X'\) simultaneously.

The same reasoning applies to the \(A \leftrightarrow E\) direction: there's simply no way for \(E\) to have \(A\) compute \(e'y'\), and therefore \(g^{(a+dx)(b+e'y')}\) of her own choosing.

The question now is, how do we choose \(d\) and \(e\)? This is the core idea of the MQV protocol[K05]:

Here, \(d\) is \(\ell\) bits, i.e. half of the bits of \(X\), and \(e\) is half of the bits of \(Y\). By choosing so many bits of \(X\) for \(d\) in this specific way, it becomes very hard for \(E\) to assume simultaneous control over \(d\) and \(X\), or at least that's the intention. Similarly for \(e\) and \(Y\).

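Note that \(\sigma_A = \sigma_B\) and therefore \(k_A = k_B\): both compute the same shared key. Indeed,

\[ \sigma_A = (YB^e)^{x+da} = (g^{y} g^{be})^{x+da} = g^{(y+eb)(x+da)} \]

\[ \sigma_B = (XA^d)^{y+eb} = (g^{x} g^{ad})^{y+eb} = g^{(x+da)(y+eb)} \]

which are the same values. Therefore, their hashes are also the same \(H(\sigma_A) = H(\sigma_B)\), so that they compute the same session key \(k_A = k_B\). \(\Box\)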

The HMQV Protocol

That \(d\) and \(e\) in the MQV protocol satisfied the requirements that no attacker could simultaneously control \(d\) and \(X\), or simultaneously control \(e\) and \(Y\), seemed plausible, but lacked a rigorous formal proof. Still, this didn't prevent MQV from being recommended by the NSA and from being widely deployed in a large variety of protocols. Using the Canetti-Krawczyk model, Hugo Krawczyk showed that these requirements were in fact not sufficient[K05]. MQV was broken!

But, in the same paper, Hugo proposed a slightly modified version of MQV that differed only in the way that \(d\) and \(e\) are computed. This new protocol, HMQV, looks like this:

Note that only step 1. was changed. Here, \(\hat{A}\) and \(\hat{B}\) are the names (identities) of \(A\) and \(B\) as strings, e.g. \(\hat{A} = \mathtt{alice.AT.example.com}\) and \(\hat{B} = \mathtt{bob.AT.example.com}\). Function \(H\) is a hash function like SHA-256, SHA-512, SHA-3 etc. that returns hashes with the desired number of key bits, i.e. \(|k_A| = |k_B|\) bits. The function \(\bar{H}\) is a hash function that may use SHA-256, SHA-512, SHA-3 internally, and that returns hashes that are \(|q|\) bits long, where \(q\) is the (prime) order of the cyclic group \(G = \langle g \rangle\) generated by \(g\), where \(g \in G' = \mathbb Z^{*}_p\) (or in any other supergroup \(G'\) like EC etc.). For simplicity and to conserve program memory or silicon, hash function \(\bar{H}\) may reuse parts of \(H\)'s code (e.g. by truncating some bits), but that's irrelevant to the HMQV protocol per se.
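As a rough illustration, the HMQV key computation can be sketched in Python as follows. The sketch assumes, as in [K05], that \(d = \bar{H}(X, \hat{B})\) and \(e = \bar{H}(Y, \hat{A})\), and that \(\sigma_A = (YB^e)^{x+da}\) and \(\sigma_B = (XA^d)^{y+eb}\). The toy group, the `keypair` helper, the input encoding, and the choice of SHA-256 reduced modulo \(q\) as a stand-in for \(\bar{H}\) are all assumptions made for illustration.

```python
import hashlib
import secrets

# Toy parameters (assumption): p = 2q + 1 is a safe prime and g = 4
# generates the subgroup of prime order q. Far too small for real use.
p, q, g = 2039, 1019, 4

def keypair():
    """Hypothetical helper: return (secret, public) with public = g^secret mod p."""
    s = secrets.randbelow(q - 1) + 1
    return s, pow(g, s, p)

def H_bar(element, identity):
    """Stand-in for H-bar (assumption): SHA-256 of the inputs, reduced mod q."""
    digest = hashlib.sha256(f"{element}|{identity}".encode()).digest()
    return int.from_bytes(digest, "big") % q

def H(sigma):
    """Stand-in for H (assumption): SHA-256 of the shared group element."""
    return hashlib.sha256(str(sigma).encode()).digest()

id_A, id_B = "alice.AT.example.com", "bob.AT.example.com"
a, A = keypair()   # long-term keys of A
b, B = keypair()   # long-term keys of B
x, X = keypair()   # ephemeral keys of A
y, Y = keypair()   # ephemeral keys of B

# Both sides can compute d and e from public values only.
d = H_bar(X, id_B)
e = H_bar(Y, id_A)

# Each side uses its own secrets and the peer's public values.
sigma_A = pow(Y * pow(B, e, p) % p, (x + d * a) % q, p)
sigma_B = pow(X * pow(A, d, p) % p, (y + e * b) % q, p)
k_A, k_B = H(sigma_A), H(sigma_B)
```

The sketch leaves out everything a real implementation needs beyond the arithmetic, such as validating received group elements and encoding hash inputs unambiguously.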

Costs of Authentication with the HMQV Protocol

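We want to compare the cost of authentication via HMQV to the cost of unauthenticated Diffie-Hellman. To this end, we will investigate the communication costs, i.e. the number of handshake messages, and the computing costs, which are essentially the number of exponentiations that each peer has to compute in order to get the shared session key. Unauthenticated Diffie-Hellman needs 2 handshake messages and 1 exponentiation per peer to derive the shared key (\(Y^x\) for \(A\), \(X^y\) for \(B\)). HMQV also needs 2 handshake messages, and \(1 \frac{1}{6}\) exponentiations per peer. Both figures leave out the exponentiation for the ephemeral public key, which both protocols need.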

In other words, the overhead of authentication (compared to unauthenticated Diffie-Hellman) with HMQV is 0 additional messages, and \(\frac{1}{6}\) additional exponentiation per peer. We get authentication almost for free!

What's the deal with this strange \(1 \frac{1}{6}\) exponentiations? Naively, computing \(\sigma_A = (YB^e)^{x+da}\) would require one exponentiation to the power of \(e\) and another exponentiation to the power of \(x+da\). Same for \(\sigma_B = (XA^d)^{y+eb}\). But, each time we have this special form of exponentiation involving the multiplication of two exponentiations, we can use a trick by Shamir, multi-exponentiation, that reduces the number of exponentiations from \(2\) to \(1 + \frac{1}{6}\).

Multi-exponentiation optimization

To compute \(g^{e_0}_0 \cdot g^{e_1}_1\) with fewer than two exponentiations, where the exponents have the bits \(e_0 = (a_{t-1} \cdots a_1 a_0)_2\) and \(e_1 = (b_{t-1} \cdots b_1 b_0)_2\), we start by precomputing some intermediary values:

\(G_0 := 1\)

\(G_1 := g_0\)

\(G_2 := g_1\)

\(G_3 := g_0 \cdot g_1\)

\(s_i := a_i + 2b_i, \forall i \in \{0, 1, \cdots, t-1\}\)

Now, for the main computation:

\(A := 1\)

for \(i\) from \(t-1\) down to \(0\):

\(A := A \cdot A\)

\(A := A \cdot G_{s_i}\)

Output \(A\)

Exercise: prove that \(A = g^{e_0}_0 \cdot g^{e_1}_1\).

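What's the cost of this multi-exponentiation algorithm? The loop performs \(t\) squarings and, since \(s_i \ne 0\) for three out of four random bit pairs (multiplying by \(G_0 = 1\) can be skipped), about \(\frac{3}{4}t\) multiplications on average, i.e. \(\frac{7}{4}t\) operations in total. A single square-and-multiply exponentiation costs \(t\) squarings plus about \(\frac{1}{2}t\) multiplications, i.e. \(\frac{3}{2}t\) operations. The ratio is \(\frac{7}{4} / \frac{3}{2} = \frac{7}{6}\), so we end up with \(1 \frac{1}{6}\) instead of \(2\) exponentiations.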
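The algorithm translates almost line by line into Python. The sketch below processes the exponent bits from the most significant position \(t-1\) down to \(0\) and also counts squarings and multiplications, so that the cost can be inspected. The function name `multi_exp` and the modulus parameter `n` are illustrative choices.

```python
def multi_exp(g0, e0, g1, e1, n):
    """Compute g0^e0 * g1^e1 mod n with one shared chain of squarings.

    Returns (result, number_of_squarings, number_of_multiplications).
    """
    # Precomputed table: G[s] for s = a_i + 2 * b_i.
    G = [1, g0 % n, g1 % n, g0 * g1 % n]
    t = max(e0.bit_length(), e1.bit_length())
    acc = 1
    squarings = multiplications = 0
    for i in range(t - 1, -1, -1):       # most significant bit first
        acc = acc * acc % n
        squarings += 1
        s = ((e0 >> i) & 1) + 2 * ((e1 >> i) & 1)
        if s:                            # G[0] = 1 needs no multiplication
            acc = acc * G[s] % n
            multiplications += 1
    return acc, squarings, multiplications

# Example with arbitrary values: compare against two separate exponentiations.
n = 1000003
result, sq, mul = multi_exp(3, 1234567, 5, 7654321, n)
reference = pow(3, 1234567, n) * pow(5, 7654321, n) % n
```

The number of squarings equals the bit length of the longer exponent, and the number of multiplications depends on how many bit pairs \((a_i, b_i)\) are not both zero.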

Interested readers may want to further explore multi-exponentiation techniques[M01] that are needed in many places in cryptography.

Security of the HMQV Protocol

[K05] discusses MQV and HMQV in depth, and claims with a formal proof that unlike MQV, HMQV is indeed secure. MQV's main inventor Menezes has some objections as to that claim[M05] and presents a small group attack on HMQV, while proposing an enhanced, albeit less efficient version. These objections are being addressed in the preface to the (meanwhile updated) HMQV paper. The Wikipedia entry on MQV mentions another small group attack on HMQV[H10] that is similar to Menezes' attack.

Hugo sketches a proof to the security of the HMQV protocol in his (follow up) talk "Implicitly Authenticated KEP" (see below). We'll leave it at that. Interested readers may want to consult the full HMQV paper[K05] for a formal proof and the corresponding threat model.

Designated Verifier Signatures

In this section, we'll talk about the cryptographic primitive that underlies MQV and HMQV: designated verifier signatures (DVS, challenge-response signatures).

Remember how non-repudiability of signatures in the ISO protocol was undesirable and even outright dangerous? The problem there was that by signing things like identities, ephemeral keys and so on with one's long-term private key, one also leaves a trail of undeniable connection meta-data for anyone to verify (using the possibly very widespread long-term public key). Clearly, by using traditional signatures, we sign messages for too many people to verify.

What's needed is a signature scheme where \(A\) can sign a message \(m\) in such a way that only \(B\), the so-called designated verifier, but no-one else, can verify the signature. Conceptually, instead of computing \(\operatorname{sign_A}(m)\) that is verifiable by everyone who owns the public key of \(A\), \(A\) would compute something like \(\operatorname{sign_{A,B}}(m)\) that only \(B\) could verify. Anyone else seeing this signature wouldn't be able to distinguish it from random noise (with more than negligible probability). Any signature scheme that satisfies this property is called a strong designated verifier signature scheme[JSI96,RS10,YCm10].

In other words, \(A\) makes the signature \(\operatorname{sign_{A,B}}(m)\) only for \(B\).

Trying to design a DVS scheme

This raises the question of how to compute this signature. Obviously, \(A\) needs to "bake into" the signature algorithm some secret that only \(B\) could know. If we allow for interactivity, i.e. an online protocol, a naïve (but wrong) approach could look like this:

\(B\) picks a random secret \(r\), hashes it with a robust cryptographic hash function \(H\) and sends only the hash \(H(r)\) to \(A\). He keeps \(r\) for himself.

\(A\) computes a signature \(s_r\) of a message \(m\) using the hash \(H(r)\) that she just got (from \(B\)?). She then sends the signature and the message as a pair \((s_r, m)\) to \(B\)

\(B\) now tries to verify that the signature \(s_r\) matches the message \(m\). To this end, he provides his secret \(r\) (not the hash \(H(r)\)) to the verification algorithm, and gets back true or false depending on whether the signature verifies or not.

The idea of this challenge-response protocol is that an eavesdropper \(E\), upon seeing \(H(r), s_r, m\), still won't be able to use \(\operatorname{verify'}\) to check the signature \(s_r\): she doesn't have \(r\), and can't easily derive \(r\) from \(H(r)\). And since \(A\) didn't use her long-term private key to compute \(s_r\), she can always deny that she signed \(m\). If the signature \(s_r\) looks like random junk to \(E\) and to anyone else who doesn't have \(r\) (this is the "strong"-ness property of the DVS scheme), \(A\) could plausibly deny that \(s_r\) is even a signature at all!

Unfortunately, this scheme is flawed, because nothing prevents \(E\) from impersonating \(B\). Indeed, all \(E\) has to do is to insert herself between \(A\) and \(B\) and send her own \(H(r_e)\) to \(A\), who would happily sign \(m\) with that hash. If \(E\) were some law enforcement official and \(m\) were some illegal message, \(A\) would just have provided the authorities enough rope to be hanged. So let's try to fix it by throwing the public key of the designated verifier \(B\) into the mix:

That's already better: if \(E\) tries to impersonate \(B\), she won't be able to check the signature made by \(A\) with the public key of \(B\) because she doesn't have the corresponding secret key.

But that too is wrong: \(E\) could now impersonate \(A\) to \(B\): basically, anyone has access to \(B\)'s public key, and if \(E\) intercepts \(B\)'s challenge \(H(r)\), she can compute \(\operatorname{sign'_{B,H(r)}}(m')\) for any message \(m'\) of her own choosing. In other words, \(E\) can mount an existential forgery attack against the challenger \(B\).

Exercise: There's a lot more that is wrong with this new version of DVS. Show as many possible attacks as you can to defeat this scheme.

Of course, strong DVS schemes do exist. XCR is a good example:

XCR: eXponential Challenge-Response

In this scenario, \(A\) wants \(B\) to sign a message \(m\) of her choosing. She knows the public key \(B = g^b\) of \(B\), but she has no public key of her own (none is needed in this case). Upon getting the signature back, she'll verify its validity. Basically, she wants something like this:

\(A\) verifies that \(s\) is a signature made by \(B\) on message \(m\)

Of course, that's not good, because \(s\) isn't a DVS: anyone can verify this signature, and \(B\) would be toast, i.e. he couldn't repudiate his signing of message \(m\). So let's add a challenge-response to the mix, so that we get a DVS that only \(A\) could verify. Here's the improved protocol:

XCR Signature Scheme:

\(A \rightarrow B \colon (m, X)\), where \(X = g^x\) is a challenge built out of a random secret \(x\)

Does this scheme make sense at all? Note that \(A\) can compute the signature herself: she has \(Y\) from \(B\)'s response, she already knows the public key \(B\) as stated previously, she can compute \(e = H(Y, m)\) on her own, because she has \(Y\) and she was the one to invent \(m\), and, using her secret "challenge trapdoor" \(x\), she can compute \((YB^e)^x\). She then compares this value with the signature \(\sigma\) sent by whoever claims to be \(B\). If both are the same, \(A\) can safely assume that it was \(B\) who signed her message \(m\).

Furthermore, in addition to \(B\), only \(A\) can also compute (and therefore verify) this signature \(\sigma\), because for any eavesdropper having only access to public information \(m, X, Y, \sigma\), it is trivial to compute \(e = H(Y, m)\), but computing \((YB^e)^x\) requires the secret \(x\) that only the designated verifier \(A\) possesses.
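The exchange can be sketched in Python as follows. The verification \(\sigma = (YB^e)^x\) with \(e = H(Y, m)\) is taken from the description above; the way the signer produces \(\sigma\), namely as \(X^{y+eb}\), is assumed here following the XCR definition in [K05], and it matches the verification equation because \((YB^e)^x = g^{(y+eb)x} = X^{y+eb}\). The toy group, the `keypair` helper and the hash encoding are illustrative assumptions.

```python
import hashlib
import secrets

# Toy parameters (assumption): p = 2q + 1 is a safe prime and g = 4
# generates the subgroup of prime order q. Far too small for real use.
p, q, g = 2039, 1019, 4

def keypair():
    """Hypothetical helper: return (secret, public) with public = g^secret mod p."""
    s = secrets.randbelow(q - 1) + 1
    return s, pow(g, s, p)

def H(element, message):
    """Stand-in hash (assumption): SHA-256 of the inputs, reduced mod q."""
    digest = hashlib.sha256(f"{element}|{message}".encode()).digest()
    return int.from_bytes(digest, "big") % q

b, B = keypair()                 # long-term keys of the signer B

# Challenger A: picks the message and the challenge X = g^x.
m = "message chosen by A"
x, X = keypair()

# Signer B: picks y and signs with y and his long-term secret b.
y, Y = keypair()
e = H(Y, m)
sigma = pow(X, (y + e * b) % q, p)

# Verifier A: recomputes the signature with her trapdoor x.
expected = pow(Y * pow(B, e, p) % p, x, p)
valid = (sigma == expected)
```

An eavesdropper sees \(m, X, Y, \sigma\) and can compute \(e\), but the sketch gives her no way to evaluate `expected` without \(x\).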

Important properties of XCR signatures are summarized in the following theorem:

Theorem: (XCR signatures are unforgeable) When \(H\) is modeled as a random oracle, and under the CDH (computational Diffie-Hellman) assumption, XCR signatures are unforgeable under the usual adaptive chosen message attack. Furthermore, only signer and designated verifier (challenger) can compute it.

Proof idea: "Exponential Schnorr" via Fiat-Shamir. \(\Box\)

DCR: Dual XCR Signatures

With XCR, we've seen how \(A\) is the designated verifier, and \(B\) is the signer. Of course, we could swap the roles so that \(A\) is the signer, and \(B\) is the designated verifier. If \(A\) and \(B\) concatenate both XCR schemes, both could verify the signature of their peer, while nobody else could. By rearranging the order of the messages, and sending as much payload as possible in each message, we get the DCR scheme, the Dual XCR Signatures.

In the DCR scheme, \(A\) and \(B\) act as signers and designated verifiers simultaneously. \(A\) has public key \(A = g^a\), and \(B\) has public key \(B = g^b\). As with XCR, it is assumed that those public keys are widely disseminated and known by the peers (and by attackers).

Note that steps 1. and 2. can be done simultaneously because they are independent of each other. Steps 3. and 4. can also be done simultaneously for the same reason. And finally, steps 5. and 6. can also be done simultaneously.

Let's compute \(\sigma_A\) and \(\sigma_B\):

\[ \sigma_A = (YB^e)^{x+da}, \quad e = H(Y, m_B) \]

\[ \sigma_B = (XA^d)^{y+eb}, \quad d = H(X, m_A) \]

It turns out that both signatures \(\sigma_A\) and \(\sigma_B\) are the same, since \((YB^e)^{x+da} = g^{(y+eb)(x+da)} = (XA^d)^{y+eb}\). Of course, computing them requires knowledge of the secrets \(x\) and \(a\) of \(A\), or of the secrets \(y\) and \(b\) of \(B\), so that an eavesdropper \(E\) has no way to compute them herself.

But, since \(\sigma_A = \sigma_B\), not only can both \(A\) and \(B\) compute them independently... there is no need to send them over the wire! Indeed, if further down in the protocol, \(A\) and \(B\) decided to use \(\sigma_A (= \sigma_B)\) as a basis for, say, deriving a shared session key, they would implicitly verify the XCR signature of the peer simply by computing the same value! Indeed, if one or both of the signatures didn't verify, this would imply that \(\sigma_A \ne \sigma_B\), and both parties couldn't establish a shared secret (key) at all!

This is a 2-way handshake protocol that authenticates \(A\) and \(B\) almost for free.

HMQV uses DCR

If you're still with me, you've doubtlessly noticed that the optimized DCR Signature Scheme is nothing else but the HMQV protocol, where \(m_A = \hat{A}\) and \(m_B = \hat{B}\), i.e. the identities of \(A\) and \(B\), expressed as strings (e.g. \(\hat{A} = \mathtt{alice.AT.example.com}\) and \(\hat{B} = \mathtt{bob.AT.example.com}\)).

Proof idea that HMQV is secure

To show that HMQV is secure, we use a proof by reduction: if it is possible to break HMQV, then it is also possible to forge DCR (a reduction proof that is not trivial). And forging DCR means that it is possible to forge XCR (a straightforward argument). Forging XCR is not possible, using the above theorem on unforgeability of XCR signatures. Or, alternatively, forging XCR can be reduced to solving CDH, the computational Diffie-Hellman problem, in the random oracle model. Details in Hugo's slides and talk (see below). \(\Box\)

That's it for now...

We'll conclude this crypto bite here, since it's long enough as it is. Hugo's talk and slides also introduce:

Modified Okamoto-Tanaka (mOT), a super-efficient AKE protocol that makes do without certificates, but at the cost of a trusted Key Generation Center (KGC)