Writing Wrong

The
substitution cipher
is one of the oldest encyption schemes. The idea is so simple that I
inadvertently invented it myself long ago: my handwriting is so bad that
readers often confuse some of my letters for others! Substitution ciphers just
do this deliberately.

We work with the uppercase Latin alphabet. Our code leaves other characters
untouched.

abc=['A'..'Z']

A substitution cipher is defined by a permutation \(\pi\) of the alphabet.
The encryption of a letter \(x\) is \(\pi(x)\). We represent permutations with
words, that is, we write the permutation \(\pi\) as \(p_A p_B ... p_Z\) where
\(p_x = \pi(x)\).

Given a permutation p, we can write functions to encrypt and decrypt
with the corresponding substitution cipher as follows:

subpx=fromMaybex$lookupx$zipabcpunsubpx=fromMaybex$lookupx$zippabc

The keyword cipher is a folk method for constructing a permutation.
We take a keyword and append the alphabet, removing any duplicate letters.
To guard against bad inputs we filter so that only uppercase letters are kept:

We chose our examples carefully. Few English words encrypt to other English
words. [Exercise: write code to find them all from a given dictionary.]

A Caesar cipher or Caesar
shift is a substitution cipher where each letter is replaced by the letter
that is a fixed number of positions ahead, wrapping around the alphabet if
necessary. For example, if A maps to D, then B maps to E, C to F, and so on.

We represent a Caesar shift by the letter to which A is mapped. This letter
is the secret key.

Polyalphabetic Substitution

The
Vigenère cipher is a
repeated sequence of Caesar shifts. Given a key \(k_1...k_n\), for all \(q \in
[0..], r \in [1..n]\) the \((q n + r)\)th letter is shifted by \(k_r\). It looks
better in Haskell:

Our code is inconsistent. Earlier, our functions worked on one character at a
time, but now they expect entire strings. This might seem unavoidable because
of polyalphabetic substitution, that is, because the permutation used depends
on the position of the character in our plaintext.

However, we can still write an encryption function that operates on a single
character at a time as long as we hang on to some extra state, namely, which
letter of the keyword to use next. And Data.List.mapAccumL is tailor-made for
calling such a function in order to encrypt a string:

On the first keypress, the rotor rotates by one letter, so that the wire from A
to E now goes from Z to D, the wire from B to K now goes from A to J, and so
on. Then current flows from the letter that was typed to the encryption of that
letter through the wire that connects them,

This repeats for subsequent letters: the rotor rotates by one letter, then
electric current flows from the letter that was struck to determine its
encryption. Thus encryption for one wheel can be described as follows:

We use letters to denote how far a wheel has rotated, from A for 0 rotations
to Z for 25 rotations.

Above, we really should have used shift 'B' abc instead of abc because the
rotor turns before the first letter is enciphered. On a real machine, if we
started with the wheel in the A position (no rotations), we would get "JKCHB";
we would get "EJKCH" by starting in the Z position. However, this is a trivial
off-by-one issue that we’ll fix later.

Assuming the adversary obtains a copy of the wheels, the secret key
is the choice of wheel and its initial position.

Early Enigma

The earliest Enigma machines consisted of three rotors connected in series. The
first rotor rotated by one letter for each keypress. The second rotor rotated
when the first rotor completed a full revolution, that is, every 26 keypresses.
Similarly, the third rotor rotated when the second rotor completed a full
revolution, that is, every \(26^2\) keypresses:

Actually, the above is slightly incorrect for a couple of reasons we shall
explore later (excluding the off-by-one bug described above), but suffices
for a first approximation.

Assuming the adversary obtains a copy of the wheels, the secret key
is the choices of the wheels, their ordering, and their initial positions.

Self-inverse Engima

From a user’s perspective, a drawback of the original Enigma machine is that
decryption is different to encryption. If the encryption function were its own
inverse (an
involution), then the
machine would be simpler to use: whether encrypting or decrypting, just set up
the rotors and type away.

Thus a reflector was introduced. This is a hard-wired substitution cipher
that is its own inverse. For example, if A maps to Y, then Y must map to A.
For reasons we will explain below, the reflector must map each letter to a
distinct letter. For example, despite being self-inverse, the identity
permutation is an invalid reflector.

In mathematical terms, the permutation is a product of thirteen 2-cycles.

Then on each keypress, after the rotors have turned, the electrical current
is sent through the wheels in one direction to the reflector and then sent
back through the wheels in the opposite direction. This why the reflector
permutation must have no fixed points: current can only travel in one
direction on a wire.

The reflector makes the Enigma cipher its own inverse: the letter on one end
of the path taken by the current swaps with the letter on the other end.
The encryption of a letter must be a different letter, a weakness gleefully
exploited by cryptanalysts.

In mathematical terms, two permutations have the same cycle structure if and
only if they are conjugates, though here we only need one direction of this
fact.

Because of the physical layout of the machine, if we use the “default”
setting and place the wheels I, II, and III from left to right, then the
current goes through wheel III, then II, then I, then the
reflector, then back through I, then II, then III:

Our code almost simulates the Enigma I with its wheels set to AAZ (not AAA due
to the off-by-one bug).

Dubstep

We’ve almost recreated an Enigma machine. Some differences stem from mechanical
engineering.

In a computer program, turning the second rotor for every 26 turns of the first
rotors might be implemented with a counter. With physical rotors, an elegant
solution is to carve a notch in the wheel that causes the next wheel to turn.
This notch is always in the same place, so we do not always reach it on the
26th keystroke. Instead, the first time we reach the notch depends on the
starting position of the rotor (and afterwards we reach the notch again every
26 turns).

A related problem caused by the notches is the double stepping anomaly.
We ignore the mechanical
details, and just state its effects. If we reach the notch on the middle
wheel, then the middle wheel turns when the right wheel turns. Hence the name
“double stepping”: after a keystroke causes the middle wheel to turn to its
notch, the next keystroke causes it to turn again, past its notch (which in
turn will cause the left wheel to turn).

It’s as if we had a malfunctioning 3-digit counter:

...
0 8 7
0 8 8
0 8 9
0 9 0
1 0 1
1 0 2
...

Some Enigma variants feature rotors with multiple notches.

Round and Round

One more subtlety. We have been using letters to indicate how far a wheel has
rotated. These are called the indicator settings or the Grundstellung. On
Enigma machines, each rotor is labeled with the letters of the alphabet on an
index ring and the indicator settings appear in a row of little windows.

It turns out we can also rotate the wiring relative to the index ring, and we
also denote the extent of such a rotation with a letter. These are called the
ring settings or the Ringstellung. Grundstellung and Ringstellung
rotations are measured in opposite directions.

We can account for the ring settings by modifying our shift offset. However,
the notch is a feature of the index ring, and not the wiring. The propagation
of rotation to the next wheel always occurs for the same letter in the
indicator window, but the wiring inside the current wheel may be in a different
position. Our code must handle this correctly.

Military Enigma

To make life harder for codebreakers, the German military augmented commercial
Enigma with an “aftermarket” part known as the Steckerbrett or the
plugboard. This allowed users to plug in cables to swap up to 13 pairs of
letters just before a signal enters the rotors and just after it leaves.

In other words, it is a self-inverse permutation applied before and after
the standard Enigma cipher. Unlike the reflector, the Steckerbrett may have
fixed points, that is, it can be any self-inverse permutation.

All these extra details encourage us to introduce a data structure to hold
the state of an Enigma machine:

The rotors list holds descriptions of the rotors from left to right on
a physical machine; the electrical signal from a keystroke enters the
rightmost wheel first. Each rotor is a pair of strings: the first string
is a permutation of the alphabet as a word, and the second describes all
notches on the rotor.

It remains to write wrappers. For an uppercase letter, the enigmaChar
function advances the machine one iteration then finds the encryption of the
letter. Otherwise we just leave the machine alone and return the input
character unchanged. The enigma function passes this function to mapAccumL
to encrypt strings.

The beginning of this webpage features a simulation of an Enigma machine with
rotors I, II, and III from left to right, the B reflector, and no plugboard.
The ring and indicator settings are initially both AAA, but these can be
adjusted by dragging the black rings or boxes on the wheels, or by typing
in the text areas.

I built it on top of the code throughout this article, also with Haskell, and
compiled it to JavaScript with Haste. Hopefully,
my simulation agrees with other online Enigma simulations: