Cryptanalysis of classical ciphers

Language statistics

Cryptanalysis of classical ciphers typically relies on redundancy in the source language (plaintext). In many cases a divide-and-conquer approach is possible, whereby the plaintext or key is recovered piece by piece, each facilitating further recovery.

Mono-alphabetic substitution on short plaintext blocks (e.g., Roman alphabet characters) is easily defeated by associating ciphertext characters with plaintext characters. The frequency distribution of individual ciphertext characters can be compared to that of single characters in the source language (estimated from 1964 English text). This is facilitated by grouping plaintext letters by frequency into high, medium, low, and rare classes; focusing on the high-frequency class, evidence supporting trial letter assignments can be obtained by examining how closely hypothesized assignments match those of the plaintext language. Further evidence is available by examination of digram and trigram frequencies. Figure gives the most common English digrams as a percentage of all digrams; note that of 26^2 = 676 possible digrams, the top 15 account for 27% of all occurrences. Other examples of plaintext redundancy appearing in the ciphertext include associations of vowels with consonants, and repeated letters in pattern words (e.g., "that", "soon", "three").
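
The frequency comparison described above can be sketched in a few lines of Python. This is only an illustration, not a complete attack: the function names are hypothetical, and the letter ordering in ENGLISH_ORDER is an assumed approximation (published frequency tables differ in the lower ranks). Pairing letters purely by rank gives a first set of trial assignments, which would then be refined using digram and trigram evidence.

```python
from collections import Counter

# Assumed ordering of English letters from most to least frequent.
# Published tables disagree on the details; treat this as illustrative.
ENGLISH_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"


def only_letters(text):
    """Upper-case the text and keep only the letters A-Z."""
    return [c for c in text.upper() if "A" <= c <= "Z"]


def letter_frequencies(ciphertext):
    """Return the relative frequency of each letter that occurs."""
    letters = only_letters(ciphertext)
    if not letters:
        return {}
    counts = Counter(letters)
    return {c: n / len(letters) for c, n in counts.items()}


def trial_assignment(ciphertext):
    """Pair ciphertext letters with plaintext letters by frequency rank.

    Ties are broken alphabetically. The result is a set of hypotheses
    to be tested, not a decryption key.
    """
    freqs = letter_frequencies(ciphertext)
    ranked = sorted(freqs, key=lambda c: (-freqs[c], c))
    return dict(zip(ranked, ENGLISH_ORDER))


def digram_counts(text):
    """Count adjacent letter pairs, ignoring spaces and punctuation."""
    letters = only_letters(text)
    return Counter(a + b for a, b in zip(letters, letters[1:]))
```

On short ciphertexts the rank-based pairing is often wrong outside the high-frequency class, which is why the text recommends concentrating on that class first and confirming guesses with digram counts.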

Cryptanalysis of simple transposition ciphers is similarly facilitated by source language statistics. Cryptanalyzing transposed blocks resembles solving an anagram. Attempts to reconstruct common digrams and trigrams are facilitated by frequency statistics. Solutions may be constructed piecewise, with the appearance of digrams and trigrams in trial decryptions confirming (partial) success.
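
As a rough illustration of scoring trial decryptions by digrams, the sketch below assumes a simple transposition that applies one fixed permutation to every block of a known, small length, and tries every candidate permutation. The function names are hypothetical, and COMMON_DIGRAMS is an assumed short list rather than a table taken from the text. Exhaustive search is practical only for short blocks; a piecewise approach would instead fix a few positions at a time.

```python
from itertools import permutations

# Assumed short list of frequent English digrams (illustrative only).
COMMON_DIGRAMS = {"TH", "HE", "IN", "ER", "AN", "RE", "ON", "AT", "EN", "ND"}


def apply_permutation(text, perm):
    """Rearrange each full block so that output[i] = block[perm[i]].

    Any trailing partial block is dropped.
    """
    n = len(perm)
    usable = len(text) - len(text) % n
    out = []
    for start in range(0, usable, n):
        block = text[start:start + n]
        out.append("".join(block[p] for p in perm))
    return "".join(out)


def digram_score(text):
    """Count adjacent letter pairs that appear in COMMON_DIGRAMS."""
    return sum(1 for a, b in zip(text, text[1:]) if a + b in COMMON_DIGRAMS)


def best_permutations(ciphertext, block_len, top=3):
    """Return the `top` highest-scoring (score, permutation) pairs."""
    scored = [
        (digram_score(apply_permutation(ciphertext, p)), p)
        for p in permutations(range(block_len))
    ]
    scored.sort(key=lambda sp: (-sp[0], sp[1]))
    return scored[:top]
```

For example, transposing "THENTHERETHE" in blocks of three with the permutation (2, 0, 1) and then calling best_permutations on the result with block_len=3 ranks the inverse permutation (1, 2, 0) first, because only that rearrangement restores the many TH and HE digrams.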

Cryptanalysis of polyalphabetic ciphers is possible by various methods, including Kasiski's method and methods based on the index of coincidence, as discussed below.