I've been struggling with this problem for a while now: the Hill cipher is well known to be vulnerable to a known-plaintext attack because of its linearity. Given a key matrix $K$ of size $n\times n$, one can recover the key with as few as $n^2$ plaintext/ciphertext pairs.

All the examples I found on the Internet assume an alphabet where $A=0, B=1, \dots, Z=25$, but how could we break the Hill cipher with a known-plaintext attack when we know neither the alphabet permutation nor the key matrix?

I've tried several things, such as linearizing the whole set of equations together. However, that requires several divisions in $\mathbf{Z}_{26}$ that aren't possible (e.g. division by an even number or by 13).
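A quick way to see the obstacle is to list which residues modulo 26 can be divided by at all. The sketch below is only an illustration (the names `units` and `inverses` are made up here); it relies on Python 3.8+ accepting a negative exponent in `pow` for modular inverses.

```python
from math import gcd

# Residues that have a multiplicative inverse modulo 26:
# exactly those sharing no factor with 26 = 2 * 13.
units = [a for a in range(26) if gcd(a, 26) == 1]

# Modular inverse of each unit (Python 3.8+ supports pow(a, -1, m)).
inverses = {a: pow(a, -1, 26) for a in units}

# Every even residue and 13 are missing from `units`, so "dividing" by
# them is undefined in Z_26.
non_units = [a for a in range(26) if a not in units]
```

Only 12 of the 26 residues are invertible, which is why naive elimination over $\mathbf{Z}_{26}$ gets stuck so often.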

2 Answers

Well, I'll assume that the same mapping between letters and integers is used both to translate the plaintext into integers (to be matrix multiplied) and to translate the integers (after the matrix multiply) back into ciphertext. And we don't know that mapping, the key matrix $K$, or possibly the value of $n$.

If so, the obvious place to start is to attempt to solve this $\bmod \ 2$. The key matrix $K$ consists of integers modulo 26; because 2 is a divisor of 26, it can be treated as if it consists of integers modulo 2. The mapping assigns 13 characters to even integers and 13 characters to odd integers; there are $\binom{26}{13} = 10{,}400{,}600$ ways to make that split. So what we (actually, a computer; this is a bit much for hand computation) can do is scan through all 10 million possibilities (and through the reasonable values of $n$, if we don't know it), and check whether each even/odd split of the characters is consistent with the known plaintext/ciphertext.

The result of this will be the least significant bits of the mapping (that is, which 13 characters are even and which are odd), the least significant bits of the elements of $K$, and the value of $n$.
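The consistency check for one candidate even/odd split can be sketched as follows. This is a minimal illustration, not a tuned implementation: the function names are made up, blocks are assumed to be encrypted as a row vector times $K$, and the test is only a necessary condition (it asks whether some binary matrix maps the plaintext parities to the ciphertext parities, using ranks over $\mathbb{Z}/2\mathbb{Z}$).

```python
from itertools import combinations

def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are packed into Python ints."""
    pivots = {}  # leading-bit position -> stored row with that leading bit
    for r in rows:
        while r:
            h = r.bit_length() - 1
            if h not in pivots:
                pivots[h] = r
                break
            r ^= pivots[h]  # clear the leading bit and keep reducing
    return len(pivots)

def parity_consistent(pairs, even_letters, n):
    """True if some n x n matrix over GF(2) maps every plaintext parity
    block to its ciphertext parity block (row-vector convention c = p K)."""
    def bit(ch):
        return 0 if ch in even_letters else 1
    p_rows = []
    augmented = [[] for _ in range(n)]  # one augmented system per column of K
    for plain, cipher in pairs:
        row = 0
        for ch in plain:
            row = (row << 1) | bit(ch)
        p_rows.append(row)
        for j in range(n):
            augmented[j].append((row << 1) | bit(cipher[j]))
    base = gf2_rank(p_rows)
    # P x = c_j is solvable iff appending c_j does not raise the rank.
    return all(gf2_rank(augmented[j]) == base for j in range(n))

def scan_parity_splits(pairs, n, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Yield every choice of 'even' letters (half the alphabet) that passes."""
    for evens in combinations(alphabet, len(alphabet) // 2):
        candidate = frozenset(evens)
        if parity_consistent(pairs, candidate, n):
            yield candidate
```

With enough known blocks, only the true split (or very few candidates) should survive; the full scan over all $\binom{26}{13}$ splits is slow in pure Python, so in practice one would likely want a faster language or an early-exit check.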

The next obvious thing to attack, if we have enough plaintext, is the mapping of the even characters. There are $13! = 6{,}227{,}020{,}800$ such possible mappings. If we can find (say) $n+1$ plaintext blocks that consist only of even characters (the corresponding ciphertext blocks will then also consist only of even characters; we know this because the mapping is consistent $\bmod \ 2$), then we can scan through the $13!$ possible mappings and see whether each one is consistent with all $n+1$ plaintext/ciphertext pairs. If $n=5$, then we expect to be able to find 6 such blocks if we have at least 1200 characters of known plaintext; not unreasonable.
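The block selection and the size estimates above can be checked with a few lines. This is a rough sketch under stated assumptions: the helper names are hypothetical, and the length estimate assumes each letter is independently even with probability 1/2, which real text will not satisfy exactly.

```python
from math import comb, factorial

def all_even_pairs(pairs, even_letters):
    """Keep only the known pairs whose plaintext block uses even letters only."""
    return [(p, c) for p, c in pairs if all(ch in even_letters for ch in p)]

def expected_chars_needed(n, blocks_wanted):
    """Rough known-plaintext length needed to expect `blocks_wanted`
    all-even blocks, assuming each letter is even with probability 1/2."""
    return blocks_wanted * n * 2 ** n

parity_splits = comb(26, 13)    # number of even/odd splits to scan
even_mappings = factorial(13)   # number of mappings of the 13 even letters
estimate = expected_chars_needed(5, 6)
```

Under that idealized assumption the estimate for $n=5$ comes out somewhat below the 1200 characters quoted above, which leaves some margin for uneven letter frequencies.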

Once we have that, we have the value of $K$ and the mapping for the even characters. With those, deducing the mapping of the odd characters is straightforward.

Linearizing should work fine. You'll have to make a minor adjustment to deal with the fact that we are working modulo 26, but either of the following two simple tweaks should work fine:

You could use generalized Gaussian elimination, generalized to work over $\mathbb{Z}/26\mathbb{Z}$ (standard Gaussian elimination assumes we are working over a field, but here we are working over a ring).

Alternatively, you can use the Chinese remainder theorem. Linearization gives you some equations modulo $26$. First, reduce them modulo 2, so now you get equations over $\mathbb{Z}/2\mathbb{Z}$; this is a field, so you can use standard Gaussian elimination to solve them modulo 2. Second, reduce your equations modulo 13, yielding some equations over $\mathbb{Z}/13\mathbb{Z}$ that can also be solved using Gaussian elimination. Now, for each unknown, you know its value modulo 2 and modulo 13; the Chinese remainder theorem then immediately tells you its value modulo 26.

The second approach is probably going to be easier.
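Here is a minimal sketch of that second approach: solve the system modulo 2 and modulo 13 with ordinary Gaussian elimination, then recombine with the Chinese remainder theorem. The function names are illustrative only, and the sketch returns just one solution (free variables set to 0) if the system is underdetermined. It needs Python 3.8+ for `pow(x, -1, p)`.

```python
def solve_mod_prime(A, b, p):
    """Return one solution x of A x = b (mod p) for prime p, or None."""
    rows, cols = len(A), len(A[0])
    M = [[a % p for a in row] + [bi % p] for row, bi in zip(A, b)]
    pivot_cols = []
    r = 0
    for c in range(cols):
        if r == rows:
            break
        piv = next((i for i in range(r, rows) if M[i][c]), None)
        if piv is None:
            continue  # no pivot in this column: free variable
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], -1, p)  # division is fine: p is prime
        M[r] = [(v * inv) % p for v in M[r]]
        for i in range(rows):
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(vi - f * vr) % p for vi, vr in zip(M[i], M[r])]
        pivot_cols.append(c)
        r += 1
    # A zero row with a non-zero right-hand side means no solution.
    if any(M[i][cols] for i in range(r, rows)):
        return None
    x = [0] * cols
    for i, c in enumerate(pivot_cols):
        x[c] = M[i][cols]
    return x

def crt_2_13(a, b):
    """Unique x mod 26 with x = a (mod 2) and x = b (mod 13).
    Uses 13 = 1 (mod 2), 13 = 0 (mod 13), 14 = 0 (mod 2), 14 = 1 (mod 13)."""
    return (13 * a + 14 * b) % 26

def solve_mod_26(A, b):
    """Solve A x = b (mod 26) by solving mod 2 and mod 13, then combining."""
    x2 = solve_mod_prime(A, b, 2)
    x13 = solve_mod_prime(A, b, 13)
    if x2 is None or x13 is None:
        return None
    return [crt_2_13(u, v) for u, v in zip(x2, x13)]
```

For example, `solve_mod_26([[3, 3], [2, 5]], [7, 8])` recovers `[7, 4]`, with no division by an even number or by 13 ever taking place modulo 26.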

Let me remind everyone what linearization means: it means we introduce extra unknowns so that the equations become totally linear. I should elaborate on how that works out in this context.

For instance, let's assume $n=6$, so all messages are 6 letters long. We'll have unknowns $A_{i,j},B_{i,j},\dots,Z_{i,j}$ and $A',B',\dots,Z'$, which are used as follows. Suppose we have a known plaintext ATTACK and the corresponding ciphertext QXUZAD. This will implicate the unknowns $A_{1,j}$, $T_{2,j}$, $T_{3,j}$, $A_{4,j}$, $C_{5,j}$, $K_{6,j}$ for $j=1,\dots,6$, where $A_{1,j} = K_{1,j} \pi(A)$, $T_{2,j} = K_{2,j} \pi(T)$, etc., where $\pi$ is the unknown plaintext permutation and $K$ is the unknown matrix (key). Notice that the first letter of the ciphertext will have the value of the first entry of $(\pi(A), \pi(T), \dots, \pi(K))^T \cdot K$, which in terms of our new unknowns is exactly $A_{1,1} + T_{2,1} + T_{3,1} + \dots + K_{6,1}$. We'll use this in a minute. Since the ciphertext has the letters Q,X,U,Z,A,D, it will implicate the unknowns $Q', X', U', Z', A', D'$, where the value of the unknown $Q'$ is the number that corresponds to ciphertext letter Q (under the unknown ciphertext permutation).

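In this way, the known plaintext/ciphertext pair ATTACK/QXUZAD yields the following linear equations (all modulo 26):

$$\begin{aligned}
A_{1,1} + T_{2,1} + T_{3,1} + A_{4,1} + C_{5,1} + K_{6,1} &= Q' \\
A_{1,2} + T_{2,2} + T_{3,2} + A_{4,2} + C_{5,2} + K_{6,2} &= X' \\
A_{1,3} + T_{2,3} + T_{3,3} + A_{4,3} + C_{5,3} + K_{6,3} &= U' \\
A_{1,4} + T_{2,4} + T_{3,4} + A_{4,4} + C_{5,4} + K_{6,4} &= Z' \\
A_{1,5} + T_{2,5} + T_{3,5} + A_{4,5} + C_{5,5} + K_{6,5} &= A' \\
A_{1,6} + T_{2,6} + T_{3,6} + A_{4,6} + C_{5,6} + K_{6,6} &= D'
\end{aligned}$$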

These are linear equations over our unknowns. We get 6 equations per known plaintext/ciphertext pair. In total, we'll have $26 \times 6^2 + 26 = 962$ unknowns and 6 equations per known pair, so given 161 known plaintext/ciphertext pairs, we should have enough information to uniquely recover the value of all of the unknowns, and then it is straightforward to recover the key $K$, the plaintext permutation, and the ciphertext permutation. That's linearization, and it should work fine here, too, with the slight tweaks mentioned at the top of my answer.
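To make the bookkeeping concrete, here is a small sketch that builds the coefficient rows for one known pair. The variable numbering and function names are arbitrary choices for this illustration, positions are 0-based, and the primed unknown is moved to the left-hand side so that every row reads "… = 0 (mod 26)".

```python
import string

ALPHABET = string.ascii_uppercase

def var_index(letter, i, j, n):
    """Column of the unknown letter_{i,j} (0-based positions i and j)."""
    return (ALPHABET.index(letter) * n + i) * n + j

def prime_index(letter, n):
    """Column of the primed unknown letter' (the ciphertext value)."""
    return 26 * n * n + ALPHABET.index(letter)

def equations_for_pair(plain, cipher, n):
    """One row per ciphertext position j, encoding
    sum_i plain[i]_{i,j} - cipher[j]' = 0 (mod 26)."""
    num_unknowns = 26 * n * n + 26
    rows = []
    for j in range(n):
        row = [0] * num_unknowns
        for i, ch in enumerate(plain):
            row[var_index(ch, i, j, n)] += 1
        k = prime_index(cipher[j], n)
        row[k] = (row[k] - 1) % 26  # -1 is 25 modulo 26
        rows.append(row)
    return rows

rows = equations_for_pair("ATTACK", "QXUZAD", 6)
```

Stacking these rows for all known pairs gives the matrix to feed into a solver modulo 2 and modulo 13. Note that every right-hand side is 0, so a solver would need some extra normalization or constraint to rule out the all-zero solution; that step is not attempted in this sketch.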

Or, use poncho's answer. Poncho's answer is a fine approach too, and in practice is probably more efficient, easier to implement, and requires fewer known texts -- so his answer is probably superior in practice. I just wanted to share some of the mathematical theory in case it interests you.