Simple Substitution Ciphers

Authors: Chris Savarese and Brian Hart '99

These hieroglyphics have evidently a meaning. If it is a purely
arbitrary one, it may be impossible for us to solve it. If, on the
other hand, it is systematic, I have no doubt that we shall get to the
bottom of it. -- Sherlock Holmes in The Adventure of the Dancing
Men

A cipher is a method for encrypting a message -- i.e.,
for transforming the message into one that can't be easily read. The
original message is called the plaintext or clear and
the encrypted message is called a cryptogram or
ciphertext. A substitution cipher is one in which each
letter of the plaintext is replaced by some other symbol. Usually the
replacement symbols are themselves letters of the alphabet, but this
needn't always be the case as we see in The Dancing Men where
the replacement symbols were hieroglyphics.

An alphabet is an ordered set of symbols. For example, the normal English alphabet consists of the symbols {A,B,C,...,Z}. A simple substitution is one in which each letter of the plaintext is always replaced by the same ciphertext symbol. In other words, there is a 1-1 relationship between the letters of the plaintext and the ciphertext alphabets.
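
As a minimal sketch of this definition, a simple substitution can be modeled in Python as a fixed translation table between two alphabets. The cipher alphabet used here (keyboard order) and the function names `encrypt` and `decrypt` are hypothetical, chosen only for illustration; they do not come from the original article.

```python
import string

PLAIN = string.ascii_uppercase
# Hypothetical cipher alphabet (keyboard order), for illustration only.
CIPHER = "QWERTYUIOPASDFGHJKLZXCVBNM"

# A simple substitution alphabet must be a permutation of the plain alphabet.
assert sorted(CIPHER) == sorted(PLAIN)

ENCRYPT_TABLE = str.maketrans(PLAIN, CIPHER)
DECRYPT_TABLE = str.maketrans(CIPHER, PLAIN)


def encrypt(plaintext: str) -> str:
    """Replace each letter with its fixed ciphertext symbol."""
    return plaintext.upper().translate(ENCRYPT_TABLE)


def decrypt(ciphertext: str) -> str:
    """Invert the substitution using the reverse table."""
    return ciphertext.upper().translate(DECRYPT_TABLE)


print(encrypt("HELLO"))                         # ITSSG
print(decrypt(encrypt("TO BE OR NOT TO BE")))   # TO BE OR NOT TO BE
```

Because the mapping is 1-1, the same table read in reverse recovers the plaintext; spaces and punctuation pass through unchanged.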

For the normal English alphabet, how many different ciphertext
alphabets can we get if we use the same letters? In other words, in
how many different ways can we permute or rearrange the English
alphabet? The answer is 26!. That's approximately equal to the number
4 followed by 26 zeros. To understand how we got that number imagine
that you are given the task of making an arbitrary permutation of the
English alphabet. You have to make 26 choices. On the first choice you
can choose any one of the 26 letters in the alphabet. On the second
choice you can choose any one of the remaining 25 letters. On the
third choice you can choose any one of the remaining 24 letters. And
so on. On the last choice, there is just one letter remaining. So, in
all there are 26! = 26 x 25 x 24 x ... x 1 different ways to make
these choices.
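
The counting argument above can be checked with a few lines of Python. This is only a sketch: the helper name `count_permutations` is made up for this example, and the loop simply mirrors the 26 successive choices described in the text.

```python
import math


def count_permutations(n: int) -> int:
    """Multiply n x (n-1) x ... x 1, one factor per choice."""
    total = 1
    for choices_left in range(n, 0, -1):
        total *= choices_left
    return total


n_alphabets = count_permutations(26)
print(n_alphabets)            # 403291461126605635584000000
print(f"{n_alphabets:.2e}")   # 4.03e+26

# The standard library gives the same value.
assert n_alphabets == math.factorial(26)
```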

Although there are 26! possible ciphertext alphabets, any fan of
puzzle books or newspaper cryptograms knows that simple substitution
ciphers are relatively easy to break by hand by analyzing letter
frequencies and guessing at common words. The nine most frequent
letters in English are E,T,N,A,O,R,I,S, and H. The five letters that
occur least often are J, K, Q, X, and Z. Generally, we would need a
message of considerable length in order to make very good use of our
knowledge of letter frequencies. For example, consider the following
secret message:

TK IL KQ JKT TK IL TBST CR TBL OULRTCKJ

In this message the most frequent letter is 'T'. If we assume that
T=E, this gives

E- -- -- --E E- -- E--E -- E-- ----E---

which isn't very helpful. One problem in this case is the pattern E-
and the pattern E--E. Since there are relatively few two-letter
English words beginning with E, this throws our hypothesis that T=E
into doubt. Similarly, there aren't many English words that would fit
the E--E pattern. Can you think of any?
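
The letter count that led to the T=E hypothesis can be reproduced mechanically. The following is a small sketch using Python's `collections.Counter`; the name `letter_frequencies` is an illustrative choice, not something from the original article.

```python
from collections import Counter

CIPHERTEXT = "TK IL KQ JKT TK IL TBST CR TBL OULRTCKJ"


def letter_frequencies(text: str) -> Counter:
    """Count how often each letter occurs, ignoring spaces."""
    return Counter(ch for ch in text.upper() if ch.isalpha())


freqs = letter_frequencies(CIPHERTEXT)
for letter, count in freqs.most_common(3):
    print(letter, count)
# T 7
# K 5
# L 4
```

With only 30 letters in the message, these counts are a weak guide, which is why the most frequent ciphertext letter here does not turn out to stand for E.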

Another kind of knowledge that we can use to solve this cryptogram is
that the most frequent two-letter words in English are:

OF TO IN IS IT BE BY HE AS ON AT OR AN SO IF NO

Since there are so many two-letter words in the message that begin or
end with K, perhaps a better hypothesis would be that K=O. If we try
this substitution, we get

-O -- O- -O- -O -- ---- -- --- ------O-
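
The observation about K can also be checked in code. This sketch, with illustrative variable names, collects the two-letter words of the cryptogram and counts which letters appear in them.

```python
from collections import Counter

CIPHERTEXT = "TK IL KQ JKT TK IL TBST CR TBL OULRTCKJ"

# Keep only the two-letter ciphertext words.
two_letter_words = [w for w in CIPHERTEXT.split() if len(w) == 2]
print(two_letter_words)   # ['TK', 'IL', 'KQ', 'TK', 'IL', 'CR']

# Count how often each letter occurs within those words.
letters_in_short_words = Counter("".join(two_letter_words))
print(letters_in_short_words.most_common(1))   # [('K', 3)]
```

K appears in three of the six two-letter words, twice at the end and once at the start, which fits a letter like O (TO, SO, NO, OF, ON, OR).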

Since the second most frequent letter in English is T, perhaps another
useful hypothesis would be that T=T -- i.e., that T stands for
itself. That would give us

TO -- O- -OT TO -- T--T -- T-- ----T-O-

which is starting to look a bit more promising. Note in this case the
pattern T--. The most common three-letter word in English that starts
with a T is THE. If we make the guess that B=H and L=E, we now get

TO -E O- -OT TO -E TH-T -- THE --E-T-O-

This is starting to look better. The pattern TH-T looks very much like
the word THAT. The pattern -OT looks very much like the word NOT. If we
make the additional guesses that S=A and J=N we get

TO -E O- NOT TO -E THAT -- THE --E-T-ON

The last word in the message ends in the pattern T-ON, which looks very
much like the pattern TION. If we make the guess that C=I, we get

TO -E O- NOT TO -E THAT I- THE --E-TION

We now have something that looks very much like something Hamlet might say:

TO BE OR NOT TO BE THAT IS THE QUESTION
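
The guess-and-check process used above can be sketched as a function that applies a partial key and prints a dash for every letter not yet guessed. The name `apply_guesses` and the dictionary layout are assumptions made for this illustration.

```python
CIPHERTEXT = "TK IL KQ JKT TK IL TBST CR TBL OULRTCKJ"


def apply_guesses(ciphertext: str, guesses: dict) -> str:
    """Substitute guessed letters; show '-' for unknown ones."""
    return "".join(
        guesses.get(ch, "-") if ch.isalpha() else ch
        for ch in ciphertext
    )


# Guesses made so far, as ciphertext letter -> plaintext letter.
guesses = {"K": "O", "T": "T", "B": "H", "L": "E",
           "S": "A", "J": "N", "C": "I"}
print(apply_guesses(CIPHERTEXT, guesses))
# TO -E O- NOT TO -E THAT I- THE --E-TION

# Filling in the remaining letters from the recognized quotation.
guesses.update({"I": "B", "Q": "R", "R": "S", "O": "Q", "U": "U"})
print(apply_guesses(CIPHERTEXT, guesses))
# TO BE OR NOT TO BE THAT IS THE QUESTION
```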

As this example shows, even though there are 26! ways to create a
simple substitution cryptogram, we can usually crack even very short
messages by making judicious use of our knowledge of English,
including knowledge of letter and word frequencies, pattern words such
as 'the' and 'that', and by making a series of guesses of the form
'the ciphertext letter K is the plaintext letter O'. There are
simple ways to make simple substitution cryptograms more
difficult. One way is to remove the word boundaries. For example, if
the above message were written as:

TKILK QJKTT KILTB STCRT BLOUL RTCKJ

it would be much more difficult to use our knowledge of two- and
three-letter words to solve the cryptogram. The encrypted message is
more secure.
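
Removing the word boundaries is easy to automate. The sketch below strips the spaces and regroups the letters in blocks of five, as in the example above; the function name `group_in_fives` is hypothetical.

```python
def group_in_fives(text: str, size: int = 5) -> str:
    """Drop the original word breaks and regroup letters in fixed blocks."""
    letters = "".join(ch for ch in text if ch.isalpha())
    blocks = [letters[i:i + size] for i in range(0, len(letters), size)]
    return " ".join(blocks)


print(group_in_fives("TK IL KQ JKT TK IL TBST CR TBL OULRTCKJ"))
# TKILK QJKTT KILTB STCRT BLOUL RTCKJ
```

The letter frequencies are unchanged by this regrouping, so frequency analysis still applies; only the word-length clues are hidden.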

For Further Study and Enjoyment

Cryptogram Tool.
Try your hand at deciphering simple substitution cryptograms
with the help of this simple Java applet. (Requires a Java-compatible
browser.)

CryptoToolJ.
Try using CryptoToolJ to create and analyze your own simple substitution
cryptograms.

Sherlock Holmes.
One of the best accounts of solving a simple substitution cryptogram is the Sherlock Holmes story The Adventure of the Dancing Men. In it, Holmes explains in detail how one solves such a cryptogram.

Edgar Allan Poe. Edgar Allan Poe had an
intense interest in cryptography and believed that breaking ciphers
and other enigmas only required the straightforward application of
reason and logic. According to David Kahn, author of
The Codebreakers, Poe's story The Gold Bug "remains
unequaled as a work of fiction turning upon a secret message." Visit
the Poe page on this site.