Security

The Book Cipher Algorithm

Source Code Accompanies This Article. Download It Now.

The Book cipher algorithm uses letters of subsequent words in some text or book as a key to encode messages.

Unless you're a professional cryptanalyst, writing cryptography code means meddling with "powers" you cannot fully comprehend, and seemingly insignificant slips can be fatal. During World War II, for instance, Polish and British mathematicians broke Germany's Enigma code only because the same message-key was enciphered twice at the beginning of every message. The Germans did this to avoid mistakes caused by radio interference, but at the same time, it ruined their carefully planned cryptosystem. And how many slips are there in the code that multiply big numbers, look for 1000-digits primes, and encrypt the fixed header of your document?

With the Book cipher algorithm, you're safe from these kinds of errors because it is simple enough that you can code it in a few lines of C that are completely understandable, but still extremely secure. The so-called Beale ciphers (unmuseum.org/beal.htm), which point to a location of buried treasure somewhere in Bedford county, were coded in 1885, but still have not been decoded. This secret (or maybe hoax) has occupied some of the best cryptanalytic minds. Likewise, when Simon Singh gave 10 problems in the appendix of The Code Book, problem #5 (Book cipher) was the most difficult one for the winners of the £10,000 prize (www.simonsingh.com/Cipher_ Challenge.html). Still, the Book cipher has probably never been used in commercial software.

Book Cipher Algorithms

Basically, the Book cipher algorithm uses letters of subsequent words in some text or book as a key to encode a message. Figure 1 is the simplest form, usually called the "running key cipher." In this case, text (usually from a book) is used to provide a very long key stream. The book used is agreed upon in advance, while the passage used is chosen randomly for each message and secretly indicated somewhere in a previous message. In this example, we agreed to use J.K. Rowling's Harry Potter and the Order of the Phoenix and to start on page 335, line 28, with the sentence, "Hermione bit her lip and did not answer." We write this text under the plaintext and use it as the running key. The particular message to send is "DRDOBBS." We XOR the corresponding characters of the message and the running key to get the ciphertext 12 23 22 2 11 13 29.

The running key cipher is much better than the famous Vigenère cipher because we do not repeat the keya book is hopefully long enough to encode everything we have to say. However, security is still poor because the entropy per character of both the plaintext and the running key is low, and the combining operation is relatively easily inverted. By guessing probable plaintexts along the ciphertext, attackers will eventually recognize the book and break the code.

A better idea is to replace words in the plaintext with the location of words from a book. For example, to encode the word "computer," you look for the first appearance of "computer" in the previously chosen book and enter its position as the cipher text. The second appearance of the word "computer" is replaced by the position of the second appearance of that word in the book, and so on. The real problem with this approach is finding the word: If you agree to use David Copperfield as a code book, and then try to encrypt an article about hash functions, you are unlikely to find all the necessary words.

An alternative approach that gets around this problem is to replace individual letters rather than words, in which case the Book cipher is properly a cipher. Figure 2 illustrates the concept. We are encoding a message "DRDOBBS" using the same passage from Harry Potter and the Order of the Phoenix. To code the letter "D," we look for the first word in the passage starting with "D" (it's the 6th word, "did"). Then we look for the first word starting with "R" (the 11th word, "rang"), then for the next word starting with "D" (the 16th word, "down"), and so on. The final ciphertext is 6, 11, 16, 17, 2, 10, 15.

[Click image to view at full size]

Figure 2: Encoding the message "DRDOBBS" using the Book cipher.

Decoding is even simpler. We take the ciphertext number-by-number and look for the corresponding words in the book, generating the original plaintext.

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task.
However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Video

This month's Dr. Dobb's Journal

This month,
Dr. Dobb's Journal is devoted to mobile programming. We introduce you to Apple's new Swift programming language, discuss the perils of being the third-most-popular mobile platform, revisit SQLite on Android
, and much more!