Occasionally, I like to pick a random interesting topic that’s entirely unrelated to my work, and read up on it. Recently, it has been polynomial factoring and computer algebra in general, which I’d like to post about when I have the time. As a side note, I’ve also been meaning to write a quick expository article on Dantzig’s simplex algorithm for linear programming, and use Haskell as a sort of a specification language for it. Unfortunately, I’ve been rather busy as of late (what else is new), so here’s something else entirely.

I was reading up on DES while waiting for a gigantic Perforce sync over VPN, and something struck me as interesting: DES is a Feistel network, and so are any number of other modern ciphers. This suggests abstraction: can we write a generic Feistel network, and then implement a variety of ciphers in terms of that? This post is basically the result of posing that question. I wanted to cover both DES and AES in a single post, but just presenting DES took a fair bit of text, and then I realized that AES is not Feistel. The post does, however, present a complete implementation of DES and 3TDES, and I’d like to follow it up with other ciphers later on.

As a word of caution, literally all I know about cryptography, DES, Feistel ciphers, or anything else in that field comes from about two days’ worth of reading up on it while waiting for builds and syncs. Feistel networks rely on a cryptographically-strong pseudorandom function. An incorrect implementation of that function may let you encrypt and decrypt plaintext, but do so in a way that’s potentially cryptographically useless. If you use any of the code here (or in any other of my posts, for that matter) for anything remotely sensitive, please make sure that you verify it against a description of the algorithm that is known to be correct.

A Feistel network is a block cipher: that is, it’s a cipher that acts on fixed-length blocks. It’s described by the number of rounds (iterations), a set of subkeys, one for each round (also called a key schedule), and two functions: (+) and f. The former is commonly bitwise addition modulo 2 (exclusive OR, in other words), while f is the so-called round function, which we’ll talk about shortly. Thus specified, the resulting Feistel network is a process that takes a block, splits it into two equal halves, call them L_0 and R_0, and proceeds to apply the following iterations to it, one per round:

L_{i+1} = R_i,    R_{i+1} = L_i (+) f(R_i, K_i),    for i = 0, …, n-1.

If n is the number of rounds, then (L_n, R_n) is the final ciphertext, which we’ll usually merge back into a single number. The process can be interpreted as follows: at each iteration, the right half R crosses into the left half L, and the left half L is scrambled using (+) and f, and crossed into R. One half is always “more” encrypted than the other, by one round. The idea is that the round function f is something weird and non-invertible; to be precise, f is a cryptographically secure pseudorandom function with K_i as the seed.

Michael Luby and Charles Rackoff showed that if this is the case, then four rounds are sufficient to make the corresponding Feistel network a strong pseudorandom permutation, meaning that it remains indistinguishable from a random permutation even to an adversary who can also query the inverse permutation. This is a good property to have, since otherwise you can’t publish the algorithm; besides, cryptographers appear to be fond of providing hypothetical adversaries with access to an omniscient oracle who knows the details of the algorithm. After four rounds, the oracle doesn’t help. Plain DES uses 16 rounds. Triple-DES uses 48.

The ability to decrypt a ciphertext hinges on the definition of (+), in that we need to be able to reverse the process as follows:

R_i = L_{i+1},    L_i = R_{i+1} (+) f(L_{i+1}, K_i),    for i = n-1, …, 0.

The reason this works is that (+) is picked so that if one of the parameters is held constant, the function is its own inverse, so we’re backtracking from the last subkey back to the first. Exclusive OR works here, but we could pick other functions as well (although it seems that anything more complex than XOR could as easily be absorbed into f instead; do Feistel ciphers exist that have a mixing function other than XOR?).

Alternatively, we can reverse the process by reversing the key schedule K, swapping L and R, and using the same network as we did for encryption. DES swaps the L and R halves before joining them into the final ciphertext, so it can be inverted by simply reversing the key schedule. Any generic description of a Feistel cipher is going to be a higher-order function, since it takes the functions (+) and f as arguments. Let’s write it:
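A minimal sketch of what such a higher-order function might look like follows. It assumes the block has already been split into a pair of halves; the names feistel, toyF and toyCipher are illustrative, and the toy round function is something I made up purely to exercise the network, so it has no cryptographic value whatsoever.

```haskell
import Data.Bits (shiftR, xor)
import Data.List (foldl')
import Data.Word (Word32)

-- | Generic Feistel pass over a block that is already split into (L, R).
--   'mix' plays the role of (+), 'f' is the round function, and the
--   number of rounds is simply the length of the key schedule.
feistel :: (h -> h -> h) -> (h -> k -> h) -> (h, h) -> [k] -> (h, h)
feistel mix f = foldl' step
  where
    -- One round: R crosses into L; L is mixed with f(R, K) and crosses into R.
    step (l, r) k = (r, l `mix` f r k)

-- | Hypothetical toy round function: NOT cryptographically secure,
--   it only gives the network something non-trivial to run.
toyF :: Word32 -> Word32 -> Word32
toyF r k = (r * 2654435761 + k) `xor` (r `shiftR` 7)

-- | Run the network and swap the halves at the end; with that swap,
--   decryption is the same function applied with the keys reversed.
toyCipher :: [Word32] -> (Word32, Word32) -> (Word32, Word32)
toyCipher keys block = swap (feistel xor toyF block keys)
  where
    swap (l, r) = (r, l)
```

With these definitions, toyCipher (reverse keys) (toyCipher keys block) should give back block for any key list, which is a cheap sanity check that the round structure is wired correctly.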

As a side note, in C++, I would write the above as an abstract base class with pure virtuals for (+) and f. That sort of translation seems to crop up fairly often.

The above is hopefully a fairly straightforward implementation of the verbal description. We want to parametrize our Feistel implementation over all fixed-width types with bitwise operations on them, so we require that the type of a data block is in Bits. We then ask for the mixing function (+), the round function f, the block itself, and the key schedule. We split the block in half, and apply the iterations as described earlier (note that we infer the number of rounds from the key schedule). Since each round takes the results of the previous round, a straightforward way to implement the process is to describe a single round, and then left-fold the key schedule over it.

Finally, to make the bit fiddling easier for subsequent applications, we generalize splitting a block into halves and merging it back, by writing the functions split and merge.
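As a rough sketch of the two helpers, assuming (for simplicity, and unlike the fully generic version described above) that blocks live in a Word64 and that the half-width n is passed in explicitly:

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word64)

-- | Split the low 2*n bits of a block into two n-bit halves (high, low).
split :: Int -> Word64 -> (Word64, Word64)
split n x = ((x `shiftR` n) .&. mask, x .&. mask)
  where
    mask = (1 `shiftL` n) - 1

-- | Inverse of 'split': glue two n-bit halves back into one block.
merge :: Int -> (Word64, Word64) -> Word64
merge n (l, r) = (l `shiftL` n) .|. (r .&. mask)
  where
    mask = (1 `shiftL` n) - 1
```

Passing n explicitly is a design choice of this sketch: it lets the same pair of functions handle both the 64-bit data block (n = 32) and the 56-bit key material (n = 28).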

A Feistel cipher is, more generally, a type of product cipher. Product ciphers are block ciphers that execute, in sequence, a series of relatively simple transformations of the plaintext block. Commonly, these transformations include bitwise permutations (P-boxes), substitutions (S-boxes), and linear mixing (our (+) function). In the case of DES, there is a handful of P-boxes, 8 S-boxes, and XOR for mixing.

A permutation box is simply a bitwise permutation: we shuffle the bits around according to some table. Most of the permutation boxes in DES are invertible, but some are not. Let’s write the code for applying a permutation box in general.

Note the (n+k) pattern in shuffle: the DES documentation I’ve read uses the convention that LSB is bit 1, not bit 0, and I didn’t want to have to convert each table. This is fairly straightforward as well, but if you don’t read Haskell, the idea is as follows. We take a table of bit positions, presented as a list: so [4,2,1] means that bit 1 is shuffled into position 4, bit 2 remains the same, and bit 3 is moved to position 1. We decorate that table with the bit positions, by saying zip table [0..], which evaluates to a list of pairs (destination_bit, source_bit). Finally, we run over that list, and set bits as appropriate.
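Here is a small sketch of that idea, following the convention described above (table entries are 1-based destinations, bit 1 is the LSB). It fixes the block type to Word64 for brevity, and it subtracts 1 explicitly instead of using an (n+k) pattern, since those are no longer part of Haskell 2010.

```haskell
import Data.Bits (setBit, testBit)
import Data.List (foldl')
import Data.Word (Word64)

-- | Apply a permutation box. Entry i of the table (counting from 1) is the
--   1-based destination of source bit i, with bit 1 being the LSB.
shuffle :: [Int] -> Word64 -> Word64
shuffle table x = foldl' place 0 (zip table [0 ..])
  where
    -- (dst, src): dst is 1-based (from the table), src is 0-based (from zip).
    place acc (dst, src)
      | testBit x src = setBit acc (dst - 1)
      | otherwise     = acc
```

For the [4,2,1] table from the text, shuffle [4,2,1] 1 moves bit 1 into position 4 and yields 8, while shuffle [4,2,1] 4 moves bit 3 into position 1 and yields 1.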

The above is almost everything we need to implement DES, and we haven’t even discussed the algorithm. I’ll use the implementation of single-block encryption as a way of introducing the process.

Let’s go through this step by step. To get some of the undefined functions out of the way, applyIp and applyFp are P-boxes, where Ip stands for Initial Permutation, and Fp stands for Final Permutation (also called Inverse Initial Permutation). Similarly, ebitSelect and pPerm are P-boxes which are applied at various stages of computation of f. Finally, applySboxes applies, as the name suggests, the S-boxes. The details of box implementations mostly consist of table data, so we’ll concentrate on the algorithm first.

First, feistel is evaluated, with xor as the mixing function, f as the round function, block with “Initial Permutation” applied as the data block, and the input key schedule (note that we haven’t yet discussed how the key schedule is computed either). The crux of the algorithm is then the round function f. As specified by feistel, the function takes R_i and K_i, and does something dodgy to them. The process is this:

a) Permute R_i using the ebitSelect P-box.
b) XOR the result with K_i.
c) Split the resulting 48-bit value (see below) into eight 6-bit values B_1 … B_8.
d) Run B_1 … B_8 through the corresponding S-boxes.
e) Run the result through the pPerm P-box.
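Steps a) through e) can be sketched as a single composition. To keep the sketch free of table data, the three boxes are passed in as function arguments; roundF and sixBitGroups are names I picked for illustration, and the value being split is taken to be 48 bits wide (eight groups of six).

```haskell
import Data.Bits (shiftR, xor, (.&.))
import Data.Word (Word64, Word8)

-- | Split the low 48 bits of a word into eight 6-bit values,
--   most significant group first (B_1 ... B_8).
sixBitGroups :: Word64 -> [Word8]
sixBitGroups x = [ fromIntegral ((x `shiftR` s) .&. 0x3F) | s <- [42, 36 .. 0] ]

-- | Skeleton of the round function: expand R, mix in the subkey,
--   substitute group by group, then permute the result.
roundF :: (Word64 -> Word64)    -- ^ expansion P-box (step a)
       -> ([Word8] -> Word64)   -- ^ S-box stage (step d)
       -> (Word64 -> Word64)    -- ^ final P-box (step e)
       -> Word64                -- ^ R_i
       -> Word64                -- ^ K_i
       -> Word64
roundF expand sboxes perm r k =
  perm (sboxes (sixBitGroups (expand r `xor` k)))
```

Keeping the boxes as parameters means the skeleton can be tested with trivial stand-ins before any of the real tables are typed in.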

As I mentioned in the introduction, my knowledge of cryptography is limited to what I’ve read in the past couple of days, so I can’t, unfortunately, detail the requirements on the specific P-boxes or the S-boxes — I understand that the combination of them needs to make f a cryptographic PRNG in K_i, but I don’t know enough about PRNGs to comment on why those particular transformations, in that particular sequence, are the right thing to do. My guess is that the S-box values are picked to avoid short cycles, and the two permutations lengthen the cycles further.

Once the Feistel network is applied, we apply another P-box, applyFp, the “Final Permutation.” This is the inverse of “Initial Permutation.” In addition to this, we use a new function, merge', to merge the results into a single ciphertext block while swapping L and R, which is a simple but subtle detail. Recall how, while discussing the deciphering stage of a Feistel network, we noted that deciphering can be done by simply reversing the key schedule and swapping L and R. Since the swap is performed at this stage, the decryption function is simply the encryption function with the key schedule reversed. This is nice, because des keys encrypts a block, while des (reverse keys) decrypts it.

Here, pc1 is yet another P-box, called “Permuted Choice 1.” Several new functions have made an appearance: functions to prepare the plaintext (convert it to 64-bit words), read plaintext from a list of 64-bit words, and compute a key schedule from a key.

We’ll jump ahead a little bit and divulge a major detail of the pc1 P-box. The original DES algorithm operates on 64-bit keys, but one bit of every byte (bits 8, 16, …, 64 in the standard’s numbering) is used as a parity bit, thereby reducing the actual key size to 56. The key schedule splits that 56-bit value into two 28-bit halves, rotates each of them 1 or 2 bits to the left depending on a table, and feeds the rotated halves into the next iteration. At every iteration, the merged halves also go through a second permuted choice (PC-2 in the standard), which keeps 48 of the 56 bits as that round’s subkey; this is why the S-box step was dividing its input into eight 6-bit blocks: 8*6=48. There are 16 iterations total:
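The rotation part of the key schedule is easy to sketch on its own. The code below only produces the sixteen pairs of rotated 28-bit halves; turning each pair into an actual subkey still requires the permuted-choice tables, which are left out. The names rotl28 and halfSchedule are mine, and the halves are assumed to be stored in the low 28 bits of a Word32.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word32)

-- | Rotate a 28-bit value (held in the low bits of a Word32) left by n bits.
rotl28 :: Int -> Word32 -> Word32
rotl28 n x = ((x `shiftL` n) .|. (x `shiftR` (28 - n))) .&. 0x0FFFFFFF

-- | Per-round left-rotation amounts used by DES.
rotations :: [Int]
rotations = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]

-- | The sixteen pairs of rotated halves obtained from the initial pair.
--   Each pair would still have to be merged and permuted into a subkey.
halfSchedule :: (Word32, Word32) -> [(Word32, Word32)]
halfSchedule cd0 = drop 1 (scanl rotate cd0 rotations)
  where
    rotate (c, d) n = (rotl28 n c, rotl28 n d)
```

The rotation amounts add up to 28, so the last pair in the schedule equals the initial pair, which makes for a convenient self-check.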

The first function takes a list of characters and converts them into a list of 64-bit words. The second function reverses the process. Ideally, this would be bit-width aware (in order to support different character encodings), and probably work on ByteStrings instead of [Char], but those are trivial changes.
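A sketch of such a pair of conversion functions, under the assumptions that characters are 8 bits wide, that the first character goes into the most significant byte, and that the last word is padded with zero bytes (toWords and fromWords are illustrative names):

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word64)

-- | Pack characters into 64-bit words, eight per word, first character in
--   the most significant byte. The final word is zero-padded.
toWords :: String -> [Word64]
toWords [] = []
toWords s  = pack (take 8 (chunk ++ repeat '\0')) : toWords rest
  where
    (chunk, rest) = splitAt 8 s
    pack cs = foldl (\acc c -> (acc `shiftL` 8) .|. fromIntegral (ord c .&. 0xFF)) 0 cs

-- | Unpack 64-bit words back into characters (padding bytes included).
fromWords :: [Word64] -> String
fromWords = concatMap unpack
  where
    unpack w = [ chr (fromIntegral ((w `shiftR` s) .&. 0xFF)) | s <- [56, 48 .. 0] ]
```

Note that fromWords does not strip the padding; a real implementation would need a padding scheme that can be removed unambiguously.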

With the exception of the tables for P-boxes, the only bit left is the S-box implementation. The S-box process is slightly weird in DES. Recall that we called applySboxes on a list of 6-bit words, B_1 through B_8. There are eight S-boxes, call them S_1 through S_8. Each S_k is a 4×16 array of 4-bit values. The “substitution” part of the S-box comes from the fact that we replace each B_k with S_k[i][j], where i is the 2-bit value composed of the first and last bit of B_k, and j is the 4-bit value in the middle. Each substituted value is 4 bits wide, so combining the eight of them yields a 32-bit value, the same width as R_i. The above can be stated as follows:
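The indexing is the only fiddly part, so here is a sketch of it in isolation. The table is passed in as a plain list of rows rather than hard-coded, and the names sboxIndex, sboxLookup and combine are mine.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word32, Word8)

-- | Row and column selected by a 6-bit value: the row comes from the first
--   and last bits, the column from the four bits in between.
sboxIndex :: Word8 -> (Int, Int)
sboxIndex b = (fromIntegral row, fromIntegral col)
  where
    row = ((b `shiftR` 4) .&. 2) .|. (b .&. 1)
    col = (b `shiftR` 1) .&. 0xF

-- | Look up one 6-bit value in a 4x16 table.
sboxLookup :: [[Word8]] -> Word8 -> Word8
sboxLookup table b = table !! i !! j
  where
    (i, j) = sboxIndex b

-- | Combine eight 4-bit outputs into one 32-bit value, first box on top.
combine :: [Word8] -> Word32
combine = foldl (\acc v -> (acc `shiftL` 4) .|. fromIntegral v) 0
```

For example, the 6-bit value 011011 has first and last bits 0 and 1, so it selects row 1, and middle bits 1101, so it selects column 13.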

Everything infers the number of rounds from the key schedule, so by pasting three different key schedules on top of each other, we get triple DES with no further work. It’s trivial to actually generalize the encrypt…/decrypt… functions over any number of keys, although I don’t know whether the benefits start to erode after triple DES.
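For comparison, the usual encrypt-decrypt-encrypt arrangement of triple DES can also be written as a plain composition of three single-cipher passes. The sketch below assumes only what was established earlier, namely a block function that encrypts with a key schedule and decrypts with the reversed one; tripleEncrypt and tripleDecrypt are illustrative names, and the cipher is a parameter rather than the des function itself.

```haskell
-- | Encrypt with the first schedule, decrypt with the second,
--   encrypt with the third. 'cipher ks' is assumed to encrypt and
--   'cipher (reverse ks)' to decrypt.
tripleEncrypt :: ([k] -> b -> b) -> [k] -> [k] -> [k] -> b -> b
tripleEncrypt cipher ks1 ks2 ks3 =
  cipher ks3 . cipher (reverse ks2) . cipher ks1

-- | Undo 'tripleEncrypt' by inverting each stage in the opposite order.
tripleDecrypt :: ([k] -> b -> b) -> [k] -> [k] -> [k] -> b -> b
tripleDecrypt cipher ks1 ks2 ks3 =
  cipher (reverse ks1) . cipher ks2 . cipher (reverse ks3)
```

One consequence of this arrangement is that using the same schedule three times collapses to a single encryption, since the middle stage undoes the first.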

I really did not intend this blog to become a repository of Haskell code snippets, but I’ve been rather busy as of late, and writing toy code while waiting for a compile to finish has somehow become my primary means of entertainment. Here is the latest.

Arithmetic coding is a remarkably simple and clever thing. The idea is that given some half-open interval [a,b), that is, the interval a <= x < b, we can partition it into half-open subintervals, such that there is one subinterval per distinct character in the message to be encoded, and the lengths correspond to the relative character frequencies multiplied by b-a. The same procedure is applied, recursively, to each subinterval, resulting in an infinite hierarchy of coverings of the original interval, call it S. Now, if we throw a rock at S, record the point where it hit, and follow the interval hierarchy, we’ll come up with a unique infinite string of characters.

To construct the actual encoding, set S to [0,1), and find out which subinterval S_1 the first character of the message falls into. For the second character, let S_2 be the appropriate subinterval of S_1, for the third character, let S_3 be the appropriate subinterval of S_2, and so on; if we repeat this procedure as many times as there are characters, we’ll arrive at some interval S_n. Numbers that fall in this interval have a useful property: given any such number, call it x, we have x in S_{n-1} (since x is in S_n, and S_n is a subinterval of S_{n-1}), x in S_{n-2} by the same argument, and, by induction, in every subinterval that we picked while encoding the message. Any such x, therefore, uniquely encodes the message: to decode, simply follow the hierarchy.
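The procedure can be sketched with exact rationals in a few lines. This is a minimal version written for this explanation, not the toy code mentioned elsewhere in the post: here encode returns the final interval S_n rather than a single fraction, decode is told the message length explicitly, and the helper narrow is my own name.

```haskell
import Data.List (group, sort)
import Data.Ratio ((%))

type Interval = (Rational, Rational)   -- half-open [lo, hi)

-- | Partition [0,1) into one subinterval per distinct character, with
--   lengths proportional to the character frequencies in the message.
freqRanges :: String -> [(Char, Interval)]
freqRanges msg = zip (map fst counts) (zip bounds (drop 1 bounds))
  where
    counts = [ (c, fromIntegral (length g)) | g@(c : _) <- group (sort msg) ] :: [(Char, Integer)]
    total  = fromIntegral (length msg) :: Integer
    bounds = scanl (+) 0 [ n % total | (_, n) <- counts ]

-- | Narrow [lo, hi) to the subinterval that (a, b) describes relative to [0,1).
narrow :: Interval -> Interval -> Interval
narrow (lo, hi) (a, b) = (lo + (hi - lo) * a, lo + (hi - lo) * b)

-- | Follow the hierarchy down, one character at a time, to the interval S_n.
encode :: [(Char, Interval)] -> String -> Interval
encode ranges = foldl step (0, 1)
  where
    step s c = maybe s (narrow s) (lookup c ranges)

-- | Recover n characters from any number x inside the final interval.
decode :: [(Char, Interval)] -> Int -> Rational -> String
decode _ 0 _ = []
decode ranges n x =
  case [ (c, a, b) | (c, (a, b)) <- ranges, a <= x, x < b ] of
    ((c, a, b) : _) -> c : decode ranges (n - 1) ((x - a) / (b - a))
    []              -> []
```

Decoding rescales x back into [0,1) after each character, which is the same as descending one level in the interval hierarchy.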

The last test shows the output of ‘encode’ : the length of the message is 2000 characters, this is followed by character distributions (in a practical setting, frequencies would be returned instead of explicit intervals), and finally the encoded message. The entire 2000 byte string is encoded in the fraction 9/85.

Toy code follows. As mentioned earlier, ‘encodeToStream’ is a helper function that breaks the fraction into a pair of lists of bytes; the actual encoder and decoder consist of just ‘encode’, ‘decode’ and ‘freqRanges’, weighing in at 23 lines of code including type annotations and line breaks. Gotta love Haskell.

Wrote a very basic prover for theorems in propositional logic while waiting for a build to finish. Not terribly exciting, but should be relatively easy to extend to first-order logic and/or turn it into a constructive prover by adding a DPLL step. I’ve tested it on a proof by contradiction for the hypothetical syllogism ((a => b) /\ (b => c)) => (a => c), and on some common inference rules. These are the tests:

The algorithm consists of bringing the expression into conjunctive normal form (I’m simultaneously compiling into a desugared core language), and applying a set of resolution steps. The resolution steps consist of trivially rejecting things like A \/ ~A \/ …, simplifying things like A \/ B \/ A \/ …, and merging expressions of the form (P \/ A \/ …) /\ (~P \/ B \/ …) into A \/ B \/ … . The process clearly terminates; when it does, the resulting expression is what has been inferred from the conjecture. If the result is a contradiction, then the conjecture is false, if the result is empty then nothing can be inferred, and if it’s non-empty, then we’ve proven an inference rule.
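To make the resolution steps concrete, here is a bare-bones sketch that starts from clauses already in conjunctive normal form and saturates them. It is a simplification of what is described above (no parser, no core language, and it simply keeps every derived clause), and all of the names are mine.

```haskell
import Data.List (nub, sort)

-- | A literal is a variable with a polarity; a clause is a disjunction of
--   literals; a CNF formula is a conjunction (list) of clauses.
data Lit = Pos String | Neg String deriving (Eq, Ord, Show)
type Clause = [Lit]

neg :: Lit -> Lit
neg (Pos a) = Neg a
neg (Neg a) = Pos a

-- | Simplify A \/ B \/ A into A \/ B (sorted, so equal clauses compare equal).
normalize :: Clause -> Clause
normalize = nub . sort

-- | A \/ ~A \/ ... is trivially true and can be dropped.
tautology :: Clause -> Bool
tautology c = any (\l -> neg l `elem` c) c

-- | Resolvents of two clauses: (P \/ A) and (~P \/ B) give A \/ B.
resolvents :: Clause -> Clause -> [Clause]
resolvents c1 c2 =
  [ normalize (filter (/= l) c1 ++ filter (/= neg l) c2) | l <- c1, neg l `elem` c2 ]

-- | Saturate under resolution; this terminates because only finitely many
--   clauses exist over a finite set of literals.
saturate :: [Clause] -> [Clause]
saturate cs
  | all (`elem` cs') new = cs'
  | otherwise            = saturate (nub (cs' ++ new))
  where
    cs' = nub (filter (not . tautology) (map normalize cs))
    new = [ r | a <- cs', b <- cs', r <- resolvents a b, not (tautology r) ]

-- | The clause set is contradictory if the empty clause can be derived.
refutes :: [Clause] -> Bool
refutes = elem [] . saturate
```

For the hypothetical syllogism, the negated conjecture in CNF is (~a \/ b) /\ (~b \/ c) /\ a /\ ~c, and refutes should report a contradiction for it.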

I came across this on someone’s blog about a week ago; unfortunately, I can’t remember who it belonged to. I recall seeing the press release back in ’99: the idea was to a) come up with a message to be broadcast in the direction of nearby stars, b) devise a universal but error-correcting way of encoding that message, and c) perform the actual broadcast. What I didn’t see at the time was the actual message in a graphics format; interstellar communication aside, decoding it makes for a nice time waster. I got to page 3 in about half an hour, before remembering that I’m in crunch mode and should probably be coding instead of screwing around; for the rest of us, here’s the site.

So, let’s say we take the standard definition of the derivative,

f'(x) = lim_{d -> 0} (f(x+d) - f(x)) / d,

look at it for a bit, and decide that we don’t like the limit symbol, and that, in fact, we’re going to drop it entirely. After rearranging, we would then obtain the weird-looking f(x) + d f'(x) = f(x+d), and, presumably, set out to find what d could possibly look like.

To this end, we might expand f(x+d) about x, which yields

f(x+d) = f(x) + d f'(x) + d^2 f''(x)/2! + d^3 f'''(x)/3! + …

and, after subtracting f(x) + d f'(x) from both sides, degenerates into

0 = d^2 f''(x)/2! + d^3 f'''(x)/3! + …

It appears that d^n should be zero for all n>1. To see this, we can plug in f(x) = e^x. The coefficients of d^n become constants, and, dividing both sides by e^x, we obtain d^2/2! + d^3/3! + … = 0, which is the Maclaurin series for e^d with the first two terms missing, so e^d=d+1. Ignoring, momentarily, the troublesome fact that this gives d=0 in the reals, we certainly at least have d^2=0 (pardon the handwaving).

But in order for f(x) + d f'(x) = f(x+d) to be a remotely interesting statement, we must also have d != 0, and we want d to be unique. Given some structure whose objects we’d care to differentiate, we’re going to cheat a little, and extend it with the element d such that d != 0, but d^2 = 0. Having allowed such a number, all the weirdness disappears, the equality f(x+d) = f(x) + d f'(x) holds, and we’re left with a sort of infinitesimal constant which we’re free to plug into random things.
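As a worked instance, take a quadratic; the particular choice Q(x) = x^2 + 3x + 1 is mine and only serves as an illustration. Expanding and using d^2 = 0:

```latex
\begin{aligned}
Q(x+d) &= (x+d)^2 + 3(x+d) + 1 \\
       &= x^2 + 2xd + d^2 + 3x + 3d + 1 \\
       &= (x^2 + 3x + 1) + d\,(2x + 3) \\
       &= Q(x) + d\,Q'(x).
\end{aligned}
```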

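The derivative of Q(x) “fell out” into the coefficient of d. How about e^{x+d}? Since e^d = 1 + d, we get e^{x+d} = e^x e^d = e^x (1 + d) = e^x + d e^x, and the coefficient of d is again the derivative.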

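Anything else? Let’s see. By the binomial theorem,

(x+d)^n = sum_{k=0}^{n} C(n,k) x^k d^{n-k}.

The d^{n-k} factor vanishes whenever n >= k+2, so only the k = n and k = n-1 terms survive:

(x+d)^n = x^n + n x^{n-1} d,

and we’ve just obtained the power law.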

Trigonometric functions:

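Using the power law experiment from earlier on each term of the power series for sine and cosine, we get

sin(x+d) = sin x + d cos x,    cos(x+d) = cos x - d sin x.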

Plotting coefficients of d along the vertical axis, and the reals along the horizontal, we get the unit circle, while exponential maps are lines through the origin, which is kind of cool in and of itself.

Ratios:
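One way to handle a quotient is to multiply through by the “conjugate” of the denominator, the same trick used for complex numbers. The symbols a, b, x, y below are just my labels for the real and infinitesimal parts, and x is assumed to be nonzero:

```latex
\frac{a + b\,d}{x + y\,d}
  = \frac{(a + b\,d)(x - y\,d)}{(x + y\,d)(x - y\,d)}
  = \frac{ax + (bx - ay)\,d}{x^2}
  = \frac{a}{x} + d\,\frac{bx - ay}{x^2}.
```

With a = u(x), b = u'(x) in the numerator and v(x), v'(x) in the denominator, the coefficient of d is the familiar quotient rule; the special case 1/(x+d) = 1/x - d/x^2 gives the derivative of the reciprocal.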

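One useful thing here is that d is small enough to be in any radius of convergence, so we get logarithms “for free”: log(x+d) = log x + log(1 + d/x) = log x + d/x, since every term of the series for log(1+u) past the first contains a factor of d^2.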

So, what’s the point? The point is that we automatically obtain the derivative as a side effect of any computation, since f(x+d) = f(x) + d f'(x). In other words, by switching to dual numbers (numbers of the form x+y d) for things that need to be differentiated (and making it transparent through operator overloading), we can ask for the derivative of any differentiable function we’ve ever defined, and it’ll be evaluated exactly, alongside the value itself, without any symbolic manipulation or finite differences.
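A minimal sketch of the operator-overloading idea, restricted to Double and to the arithmetic operations (no exp, sin or log instances, to keep it short); Dual and diff are names chosen for this illustration:

```haskell
-- | A dual number x + y d with d^2 = 0: a value and the coefficient of d.
data Dual = Dual Double Double deriving (Eq, Show)

instance Num Dual where
  Dual a a' + Dual b b' = Dual (a + b) (a' + b')
  Dual a a' - Dual b b' = Dual (a - b) (a' - b')
  -- (a + a' d)(b + b' d) = ab + (a b' + a' b) d, because d^2 = 0.
  Dual a a' * Dual b b' = Dual (a * b) (a * b' + a' * b)
  abs (Dual a a')       = Dual (abs a) (a' * signum a)
  signum (Dual a _)     = Dual (signum a) 0
  fromInteger n         = Dual (fromInteger n) 0

instance Fractional Dual where
  -- 1 / (a + a' d) = 1/a - (a' / a^2) d
  recip (Dual a a') = Dual (recip a) (negate a' / (a * a))
  fromRational r    = Dual (fromRational r) 0

-- | Derivative of f at x: evaluate f at x + d and read off the coefficient of d.
diff :: (Dual -> Dual) -> Double -> Double
diff f x = let Dual _ d = f (Dual x 1) in d
```

For instance, diff (\x -> x * x * x) 2 evaluates the cube at 2 + d and returns the coefficient of d, which is 3 * 2^2 = 12; any function written against Num or Fractional can be differentiated the same way without modification.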