Diffie-Hellman in under 25 lines

How can you and I agree on a secret without an eavesdropper who intercepts our communications being able to learn it? At first, the idea sounds absurd – for the longest time, encryption without a pre-shared secret was seen as impossible. In World War II, the Enigma machines relied on a fairly complex pre-shared secret – the Enigma configurations (consisting of the rotor drum wirings and number of rotors specific to the model, the Ringstellung of the day, and Steckbrett configurations) were effectively the pre-shared key. During the Cold War, field operatives were provided with one-time pads (OTPs), generated randomly (if they were lucky) or pseudorandomly (if they weren’t, which was most of the time).[1] Cold War era Soviet OTPs were, of course, vulnerable because, like most Soviet things, they were manufactured sloppily.[2] But OTPs share a big problem with every pre-shared key: if the key is known, the entire scheme of encryption is defeated. And somehow, you need to get that key to your field operative.

Enter the triad of Merkle, Diffie and Hellman, who in 1976 found a way to exploit the fact that modular exponentiation is simple but its inverse, the discrete logarithm, is difficult to compute for a large prime modulus. From this, they derived the algorithm that came to be known as the Diffie-Hellman algorithm.[3]

How to cook up a key exchange algorithm

The idea of a key exchange algorithm is to end up with a shared secret without ever transmitting the secret itself. In other words, the assumption is that the communication channel is unsafe. The algorithm must withstand an eavesdropper knowing every single exchange.

Alice and Bob must first agree to use a modulus $p$ and a base $g$, so that $g$ is a primitive root modulo $p$.

Alice and Bob each choose a secret key, $a$ and $b$ respectively – ideally, randomly generated. The parties then exchange $A = g^a \bmod p$ (for Alice) and $B = g^b \bmod p$ (for Bob).

Alice now has received $B$. She goes on to compute the shared secret $s$ by calculating $B^a \bmod p$, and Bob computes it by calculating $A^b \bmod p$.

The whole story is premised on the equality of

$$A^b \bmod p = B^a \bmod p$$

That this holds nearly trivially true should be evident from substituting $g^b$ for $B$ and $g^a$ for $A$. Then,

$$(g^a)^b \bmod p = g^{ab} \bmod p = (g^b)^a \bmod p$$

Thus, both parties get the same shared secret. An eavesdropper would be able to get $A$ and $B$. Given a sufficiently large prime for $p$, in the range of 600 to 700 digits, the discrete logarithm problem of retrieving $a$ from $g^a \bmod p$ in the knowledge of $g$ and $p$ is not efficiently solvable, not even given fairly extensive computing resources.
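To make the arithmetic concrete, here is a minimal worked example with toy numbers. The values 23, 5, 6 and 15 are illustrative assumptions chosen so the sums can be checked by hand; they are far too small to offer any security.

```python
# Toy parameters, far too small for real use: 23 is prime and
# 5 is a primitive root modulo 23.
modulus, base = 23, 5

alice_secret = 6    # Alice's secret key (illustrative value)
bob_secret = 15     # Bob's secret key (illustrative value)

# What each party sends over the unsafe channel.
alice_public = pow(base, alice_secret, modulus)   # 5^6 mod 23
bob_public = pow(base, bob_secret, modulus)       # 5^15 mod 23

# What each party computes from the other's public value.
alice_shared = pow(bob_public, alice_secret, modulus)
bob_shared = pow(alice_public, bob_secret, modulus)

print(alice_public, bob_public, alice_shared, bob_shared)   # prints: 8 19 2 2
```

Python's three-argument pow() performs modular exponentiation without ever building the full power, which is what keeps the same calculation feasible for numbers thousands of bits long.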

To implement this, we will need a few ingredients. First, a safe random number generator. You could do worse than the byte random function in the ssl library (ssl.RAND_bytes) or the OpenSSL library’s OpenSSL.rand.bytes.

Something to do SHA-256 hashing for us, as the key in the Diffie-Hellman exchange is not actually the shared secret but the SHA-256 hash thereof. A decent one for that is hashlib.

We need to have the shared primes, and have an agreement on both the shared prime and the corresponding base (generator) – the $p$ and the $g$ in our scheme. RFC 3526 lists a few canonical primes known as numbered groups, with bit lengths from 1536 to 8192. For ultimate paranoia, we’ll be using Group 18, with generator 2 and 8192 bits.[4]

Finally, we need to have an object that will handle state for us. We’ll start by creating the state holder.

Imports and other stuff

As said, we’ll need something to handle hashing for us. In addition, we will need a random generator. We’ll be going with the one in the SSL library.

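import hashlib  # SHA-256, used to turn the shared secret into the key
import ssl      # source of cryptographically strong random bytes

random_function = ssl.RAND_bytes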

That solves our first two ingredients. Then, we’ll need to define the prime group we will be using as a constant. An expansive implementation would contain all groups, but for now, we’ll simply set the desired group as a constant.
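A minimal sketch of how such a constant could be built from the hexadecimal dump printed in the RFC. The helper parse_rfc_hex and the name PLACEHOLDER_MODULUS are hypothetical, and the digits below are placeholders rather than a real group prime; the full Group 18 value would have to be pasted in from RFC 3526.

```python
def parse_rfc_hex(block: str) -> int:
    """Turn a whitespace-separated hex dump, as printed in an RFC, into an int."""
    return int("".join(block.split()), 16)

# Placeholder digits only: this is NOT a prime from RFC 3526. Replace the
# string with the full hexadecimal value of the group you want to use.
PLACEHOLDER_HEX = """
    FFFFFFFF FFFFFFFF
    00000000 00000001
"""

PLACEHOLDER_MODULUS = parse_rfc_hex(PLACEHOLDER_HEX)
GENERATOR = 2   # the generator the article uses with Group 18

print(PLACEHOLDER_MODULUS.bit_length())   # prints: 128
```

Keeping the digits in the same layout as the RFC makes it easier to compare the constant against the published value by eye.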

What are we doing here? Most of this code deals with the fact that the random number generator thinks in bytes whereas we think in integers. As such, first, we’ll need to figure out the byte length of a number of a given length. Then, we’re using the int.from_bytes() conversion function to give us the output of the random_function() in an integer form, in big endian byte order (the byteorder='big' parameter specifies the endianness).
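The byte-to-integer conversion described above can be sketched roughly as follows. The function name random_int and the bit-based length argument are assumptions made for illustration, not necessarily the original code.

```python
import ssl

def random_int(bits: int) -> int:
    """Return a random non-negative integer of at most `bits` bits."""
    n_bytes = (bits + 7) // 8                      # round up to whole bytes
    raw = ssl.RAND_bytes(n_bytes)                  # cryptographically strong bytes
    value = int.from_bytes(raw, byteorder='big')   # big endian bytes -> integer
    return value >> (n_bytes * 8 - bits)           # drop any surplus bits

sample = random_int(1024)
print(sample.bit_length() <= 1024)   # prints: True
```

A fuller implementation would probably also check that the number lies strictly between 1 and the prime minus 1 before using it as a private key.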

Generating the private key from this is simply generating a random number of a given length. The public key is a little more complex. We know that the public key is defined as $g^a \bmod p$, where $g$ is the generator (in our case, 2) and $a$ is the private key generated in the previous step. Finally, $p$ is the prime of the group. It’s probably a good idea to frontload this into the creation of the object. As such, let’s go back to our object and rejazz our __init__. And while we’re at it, we should make sure users generate useful keys – in other words, let’s keep users from creating any keys less than 600 digits long.
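One way the state holder might look is sketched below. The class name DHParty, its parameters and the min_bits threshold are hypothetical. To keep the sketch runnable without pasting in Group 18, it uses the Mersenne prime 2**521 - 1 with base 3 as a stand-in, which is an assumption for illustration only and not a recommended group; the length threshold is shrunk accordingly.

```python
import ssl

# Stand-in parameters so the sketch runs: 2**521 - 1 is a known Mersenne
# prime, but it is NOT an RFC 3526 group and is much smaller than Group 18.
TOY_PRIME = 2**521 - 1
TOY_BASE = 3

class DHParty:
    """Hypothetical state holder: stores the private key, derives the public key."""

    def __init__(self, prime, generator, key_bits=512, min_bits=256):
        # Refuse keys that are too short to be useful.
        if key_bits < min_bits:
            raise ValueError("key length below the minimum of %d bits" % min_bits)
        self.prime = prime
        self.generator = generator
        n_bytes = (key_bits + 7) // 8
        # Private key: random bytes interpreted as a big endian integer.
        self.private_key = int.from_bytes(ssl.RAND_bytes(n_bytes), byteorder='big')
        # Public key: generator^private_key mod prime.
        self.public_key = pow(generator, self.private_key, prime)

alice = DHParty(TOY_PRIME, TOY_BASE)
print(0 < alice.public_key < TOY_PRIME)   # prints: True
```

Computing the public key in __init__ means an object can never exist in a half-initialised state with a private key but no public key.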

By far not the smoothest way to do it, but we have created a fully functional key exchange system from, basically, scraps – a hash generator and a random number generator – in all of 24 lines of code. Not bad.

If all is well, the console ought to let you know that it indeed works.
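For reference, an end-to-end run could look something like this sketch, which exchanges public keys between two parties and hashes the shared secret with SHA-256. The function names are hypothetical, and the modulus 2**521 - 1 with base 3 is again a toy stand-in for the RFC 3526 group, assumed only so the example runs as written.

```python
import hashlib
import ssl

TOY_PRIME = 2**521 - 1   # stand-in Mersenne prime, NOT an RFC 3526 group
TOY_BASE = 3             # illustrative base for the toy modulus

def make_keypair(bits=512):
    """Return (private, public) for one party."""
    private = int.from_bytes(ssl.RAND_bytes(bits // 8), byteorder='big')
    public = pow(TOY_BASE, private, TOY_PRIME)
    return private, public

def derive_key(own_private, their_public):
    """Compute the shared secret and return its SHA-256 hash as hex."""
    shared = pow(their_public, own_private, TOY_PRIME)
    shared_bytes = shared.to_bytes((shared.bit_length() + 7) // 8, byteorder='big')
    return hashlib.sha256(shared_bytes).hexdigest()

alice_private, alice_public = make_keypair()
bob_private, bob_public = make_keypair()

# Only the public values cross the wire; each side combines the other's
# public value with its own private key.
alice_key = derive_key(alice_private, bob_public)
bob_key = derive_key(bob_private, alice_public)

print("Keys match:", alice_key == bob_key)   # prints: Keys match: True
```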

The moral of the story

Other than Python’s awesome expressiveness (to be fair, most modern languages let you implement this in the same number of lines, roughly!), a big question is what’s so cool about this. The answer is that Diffie-Hellman is not only an extremely elegant solution to a complex problem, it’s also a fantastic example of an everyday algorithm anyone with basic mathematical knowledge can not only understand but actually build and manipulate. In a society in which cryptography is getting a bad rap (see the recent #UnlockJustice campaign), many prefer fear to knowledge, and many more prefer ignorance to an understanding of how they, as citizens, can build and use the tools to protect their secrets from government overreach.

There is an increased interest in banning, or limiting the use of, cryptography software. The day may come when the only cryptography will be what we write ourselves, and it has never been more important, or more of a citizenship skill, to understand and know how to implement basic cryptography primitives.

1. As a child, I once built a pseudorandom number generator from a sound card, a piece of wire and some stray radio electronics, which basically rested on a sampling of atmospheric noise. I was surprised to learn much later that this was the method the KGB used as well.

2. Under pressure from the advancing German Wehrmacht in 1941, they had duplicated over 30,000 pages worth of OTP code. This broke the golden rule of OTPs of never, ever reusing code, and ended up with a weakness that two of the most eminent cryptanalysts of the 20th century, Genevieve Grotjan Feinstein and Meredith Gardner, on whose shoulders the success of the Venona project rested, could exploit.

3. It deserves noting that the D-H key exchange algorithm was another of those inventions that were invented twice but published once. In 1974, Malcolm Williamson at GCHQ, building on earlier work by James Ellis and Clifford Cocks, devised an equivalent key exchange, but was barred from publishing it. Their achievements weren’t publicly recognised until 1997.

4. I know someone who has Group 14 tattooed on the sole of his foot. Why? No reason, I’m told.

One thought on “Diffie-Hellman in under 25 lines”

I was just googling for what limitations there are on the private keys and the way you generate the private key seems a bit iffy: 1. by using .bit_length you’re requiring the top bit of the 1024 bit number to be 1 thereby AFAIU limiting the number of possible keys to half of the 1024-bit number space. 2. therefore 0 is (correctly) not allowed but numbers bigger than PRIME_18 might still be generated (?) and they should not be allowed 3. 1024 is also passed as _bytes (rather than 1024 / 8 bits), wouldn’t that mean that the numbers generated can actually be *way* too big?
