Substitution ciphers

This section covers the following types of classical substitution ciphers: simple (or mono-alphabetic) substitution, polygram substitution, and homophonic substitution. The difference between codes and ciphers is also noted.

Mono-alphabetic substitution

Suppose the ciphertext and plaintext character sets are the same. Let m = m_1 m_2 m_3 … be a plaintext message consisting of juxtaposed characters m_i ∈ A, where A is some fixed character alphabet such as A = {A, B, …, Z}. A simple substitution cipher or mono-alphabetic substitution cipher employs a permutation e over A, with encryption mapping E_e(m) = e(m_1) e(m_2) e(m_3) … . Here juxtaposition indicates concatenation (rather than multiplication), and e(m_i) is the character to which m_i is mapped by e.
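The definition above can be sketched in a few lines of Python. This is only an illustration: the helper names (make_key, encrypt, decrypt) and the seed-based key generation are assumptions made for the example, not part of the definition, and Python's random module is not suitable for generating real keys.

```python
import random
import string

ALPHABET = string.ascii_uppercase  # the alphabet A = {A, B, ..., Z}

def make_key(seed):
    """Return a permutation e of ALPHABET as a dict (hypothetical helper).

    The seed only makes the demo repeatable; it is not a secure key source.
    """
    shuffled = list(ALPHABET)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))

def encrypt(e, m):
    """E_e(m) = e(m_1) e(m_2) e(m_3) ..., applied character by character."""
    return "".join(e[ch] for ch in m)

def decrypt(e, c):
    """Apply the inverse permutation to each ciphertext character."""
    inverse = {v: k for k, v in e.items()}
    return "".join(inverse[ch] for ch in c)

key = make_key(seed=2024)
ciphertext = encrypt(key, "ATTACKATDAWN")
recovered = decrypt(key, ciphertext)
```

Because e is a fixed permutation, equal plaintext characters always map to equal ciphertext characters; the doubled T in ATTACK, for instance, appears as a doubled character in the ciphertext.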

Recognizing simple substitution. Mono-alphabetic substitution changes which character carries each frequency, but does not alter the overall frequency distribution: the ciphertext frequencies are the plaintext frequencies with the characters relabelled. Thus, comparing ciphertext character frequencies to a table of expected letter frequencies (unigram statistics) in the plaintext language allows ciphertext characters to be associated with plaintext characters.
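A small Python sketch can make this observation concrete. It assumes a Caesar shift by 3 as a convenient example permutation and a short made-up plaintext; the helper frequency_profile is hypothetical. The sorted list of counts is identical for plaintext and ciphertext, and the most frequent ciphertext character is the image of the most frequent plaintext character.

```python
from collections import Counter

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Example permutation e: a shift by 3, chosen only for readability.
e = {ch: ALPHABET[(i + 3) % 26] for i, ch in enumerate(ALPHABET)}

def frequency_profile(text):
    """Return the character counts sorted from most to least frequent."""
    return sorted(Counter(text).values(), reverse=True)

plaintext = "DEFENDTHEEASTWALLOFTHECASTLE"
ciphertext = "".join(e[ch] for ch in plaintext)

plain_counts = Counter(plaintext)
cipher_counts = Counter(ciphertext)

# The single most frequent character on each side.
top_plain = plain_counts.most_common(1)[0][0]
top_cipher = cipher_counts.most_common(1)[0][0]

# The profiles match: only the labels attached to the counts have changed.
same_profile = frequency_profile(plaintext) == frequency_profile(ciphertext)
```

In practice an analyst does not know the plaintext counts, so the comparison is made against a table of expected letter frequencies for the language, which works well only when enough ciphertext is available.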

Polygram substitution

A simple substitution cipher substitutes for single plaintext letters. In contrast, polygram
substitution ciphers involve groups of characters being substituted by other groups of characters. For example, sequences of two plaintext characters (digrams) may be replaced by other digrams. The same may be done with sequences of three plaintext characters (trigrams), or more generally using n-grams.
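As an illustration of the digram case, the following Python sketch builds a permutation over all 676 two-letter groups and substitutes two characters at a time. The helper names, the seed-based key generation, and the convention of padding odd-length messages with X are assumptions made for this example.

```python
import itertools
import random
import string

ALPHABET = string.ascii_uppercase
# All 26 * 26 = 676 possible digrams.
DIGRAMS = ["".join(p) for p in itertools.product(ALPHABET, repeat=2)]

def make_digram_key(seed):
    """Return a permutation of DIGRAMS as a dict (demo only, not secure)."""
    shuffled = DIGRAMS[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(DIGRAMS, shuffled))

def encrypt_digrams(key, m):
    """Replace each plaintext digram by its image under the key."""
    if len(m) % 2:
        m += "X"  # padding convention assumed for this sketch
    return "".join(key[m[i:i + 2]] for i in range(0, len(m), 2))

def decrypt_digrams(key, c):
    """Invert the digram permutation, two characters at a time."""
    inverse = {v: k for k, v in key.items()}
    return "".join(inverse[c[i:i + 2]] for i in range(0, len(c), 2))

digram_key = make_digram_key(seed=11)
ciphertext = encrypt_digrams(digram_key, "MEETMEATNOON")
recovered = decrypt_digrams(digram_key, ciphertext)
```

Single-letter frequencies are no longer preserved, but repeated digrams aligned on the same boundary (ME appears twice in the example) still encrypt identically.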

Homophonic substitution

The idea of homophonic substitution is, for each fixed key k, to associate with each plaintext unit (e.g., character) m a set S(k, m) of potential corresponding ciphertext units (generally all of common size). To encrypt m under k, randomly choose one element from this set as the ciphertext. To allow decryption, for each fixed key this one-to-many encryption function must be injective on ciphertext space; that is, the sets S(k, m) for distinct m must be pairwise disjoint, so that each ciphertext unit determines a unique plaintext unit. Homophonic substitution results in ciphertext data expansion.

In homophonic substitution, |S(k, m)| should be proportional to the frequency of m in the message space. The motivation is to smooth out obvious irregularities in the frequency distribution of ciphertext characters, which result from irregularities in the plaintext frequency distribution when simple substitution is used.
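A minimal Python sketch of this construction follows, under stated assumptions: a toy four-letter alphabet with made-up relative frequencies, ciphertext units represented as the integers 0 to 9, and a seed standing in for the key k. The helper names are hypothetical.

```python
import random

# Illustrative (made-up) relative frequencies for a toy four-letter alphabet.
WEIGHTS = {"E": 4, "T": 3, "A": 2, "Z": 1}

def make_homophones(weights, seed):
    """Partition the ciphertext units 0..N-1 into disjoint sets S(k, m),
    with |S(k, m)| equal to the weight of m. The seed plays the role of k."""
    units = list(range(sum(weights.values())))
    random.Random(seed).shuffle(units)
    table, start = {}, 0
    for m, size in weights.items():
        table[m] = units[start:start + size]
        start += size
    return table

def encrypt(table, message, rng):
    """Pick one element of S(k, m) at random for each plaintext unit m."""
    return [rng.choice(table[m]) for m in message]

def decrypt(table, ciphertext):
    """The sets are disjoint, so each ciphertext unit has a unique preimage."""
    inverse = {c: m for m, homophones in table.items() for c in homophones}
    return "".join(inverse[c] for c in ciphertext)

table = make_homophones(WEIGHTS, seed=7)
rng = random.Random()  # demo only; not a cryptographically secure source
ciphertext = encrypt(table, "TEATEAZEE", rng)
recovered = decrypt(table, ciphertext)
```

Encrypting the same message twice will usually give different ciphertexts, yet both decrypt to the same plaintext because the homophone sets do not overlap.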

While homophonic substitution complicates cryptanalysis based on simple frequency distribution statistics, sufficient ciphertext may nonetheless allow frequency analysis, in conjunction with additional statistical properties of plaintext manifested in the ciphertext. For example, in long ciphertexts each element of S(k, m) will occur roughly the same number of times.
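The flattening effect, and the roughly equal counts within each homophone set, can be observed in a small simulation. The Python sketch below is an assumption-laden toy: the homophone table, the four-letter alphabet, and the artificial message (whose letter counts are exactly proportional to the set sizes) are all invented for the demonstration.

```python
import random
from collections import Counter

# Hypothetical homophone table: set sizes 4, 3, 2, 1 over ciphertext units 0..9.
TABLE = {"E": [0, 1, 2, 3], "T": [4, 5, 6], "A": [7, 8], "Z": [9]}

# Artificial message with letter counts proportional to the set sizes.
message = "E" * 4000 + "T" * 3000 + "A" * 2000 + "Z" * 1000

rng = random.Random(1)  # fixed seed so the demo is repeatable
ciphertext = [rng.choice(TABLE[m]) for m in message]

plain_counts = Counter(message)      # very uneven: 4000, 3000, 2000, 1000
cipher_counts = Counter(ciphertext)  # each unit occurs close to 1000 times

# Spread between the most and least frequent ciphertext unit.
spread = max(cipher_counts.values()) - min(cipher_counts.values())
```

The unigram counts of the ciphertext are close to uniform, so a single-character frequency table gives little away; an attack would have to rely on other statistics of the plaintext that survive in the ciphertext.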

Codes vs. ciphers

A technical distinction is made between ciphers and codes. Ciphers are encryption techniques which are applied to plaintext units (bits, characters, or blocks) independent of their semantic or linguistic meaning; the result is called ciphertext. In contrast, cryptographic codes operate on linguistic units such as words, groups of words, or phrases, and substitute (replace) these by designated words, letter groups, or number groups called codegroups.

The key is a dictionary-like codebook listing plaintext units and their corresponding codegroups, indexed by the former; a corresponding codebook for decoding is reverse-indexed.
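The codebook idea maps naturally onto a dictionary. The Python sketch below uses a tiny, entirely hypothetical codebook (the phrases and four-digit codegroups are invented) to show encoding by lookup and decoding by the reverse-indexed table.

```python
# Hypothetical codebook: plaintext units (words or phrases) -> codegroups.
ENCODE_BOOK = {
    "ATTACK": "4471",
    "AT DAWN": "0932",
    "RETREAT": "8120",
    "HOLD POSITION": "3307",
}

# The decoding codebook is the same table, reverse-indexed by codegroup.
DECODE_BOOK = {group: unit for unit, group in ENCODE_BOOK.items()}

def encode(units):
    """Replace each plaintext unit by its designated codegroup."""
    return [ENCODE_BOOK[u] for u in units]

def decode(groups):
    """Look each codegroup up in the reverse-indexed codebook."""
    return [DECODE_BOOK[g] for g in groups]

codegroups = encode(["ATTACK", "AT DAWN"])
decoded = decode(codegroups)
```

Here the whole dictionary is the key, and a plaintext unit missing from the codebook cannot be encoded at all, which hints at why codebooks tend to be large.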

When there is potential ambiguity, codes in this context (vs. ciphers) may be qualified as cryptographic codebooks, to avoid confusion with error-correcting codes (EC-codes), which are used to detect and/or correct non-malicious errors, and with authentication codes, which provide data origin authentication.

Several factors suggest that codes may be more difficult to break than ciphers: the key (codebook) is vastly larger than typical cipher keys; codes may result in data compression; and statistical analysis is complicated by the large plaintext unit block size. Opposing this are major disadvantages: the coding operation is not easily automated (relative to an algorithmic mapping), and identical encryption of repeated occurrences of plaintext units implies susceptibility to known-plaintext attacks and allows frequency analysis based on observed traffic. This implies a need for frequent re-keying (changing the codebook), which is both costly and inconvenient. Consequently, codes are not commonly used to secure modern telecommunications.