nskein - a Skein implementation in .NET

08/03/09

Quick Summary

nskein is a managed implementation of the Skein hash function. Unlike other implementations, it is optimized for readability and works well with other parts of .NET. You can find the code (which currently passes all the official test vectors) at the GitHub repo.

Background

A week ago, I was pretty exhausted from a long stretch of hectic work-related stuff and book writing at home, so I decided to take a mini-break. What better way than to write some fun code and stretch those coding fingers a bit? I had been meeting with Niels Ferguson about some work-related issues. Niels officially works on Windows, but teams across Microsoft go to him and a few other crypto luminaries for advice. During one of our conversations, Niels casually mentioned that he had an entry in the NIST SHA-3 contest. This came at around the same time as my need for a mini-break, and after looking at his submission, I thought it would be the perfect project for a weekend.

Skein

Skein (website, Wikipedia, paper) is an entrant in the NIST SHA-3 hash function competition and is one of 14 algorithms to make it to Round 2. Its author list reads like a who’s who of the crypto world - Bruce Schneier and Niels Ferguson being a couple of the names with mainstream recognition.

For a hash function, it is elegant and simple. And fast. At its core, it is built on a ‘tweakable’ block cipher called Threefish. The cipher applies the same round function over and over, with the number of rounds depending on the block size (72 for 256-bit and 512-bit blocks, 80 for 1024-bit blocks). Skein favors many simple rounds over a few complex rounds. In keeping with this spirit, the core work done by each round is very simple - a 64-bit addition, a rotation by a predefined constant, and an XOR.
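To give a feel for how small that core is, here is a minimal sketch of a MIX-style step in C#. In the Skein paper, the MIX step combines two 64-bit words with an addition, a rotation, and an XOR. The class and method names below are made up for this sketch and are not taken from nskein, and the rotation amount is whatever the caller passes in rather than one of the constants from the paper.

```csharp
using System;

// Hypothetical names, for illustration only; not nskein's actual code.
public static class ThreefishMixSketch
{
    // Rotate a 64-bit word left by n bits (0 < n < 64).
    public static ulong RotateLeft(ulong value, int n)
    {
        return (value << n) | (value >> (64 - n));
    }

    // One MIX-style step on a pair of words:
    //   x0 becomes (x0 + x1) mod 2^64
    //   x1 becomes RotateLeft(x1, rotation) XOR the new x0
    public static void Mix(ref ulong x0, ref ulong x1, int rotation)
    {
        unchecked
        {
            // Wrap-around addition, even if the project is compiled with /checked.
            x0 = x0 + x1;
        }
        x1 = RotateLeft(x1, rotation) ^ x0;
    }
}
```

A full round would apply a step like this to every pair of words in the block and then shuffle the words around; this sketch covers only the single pair operation.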

Threefish is then chained together in an iteration mode which provides the ‘compression’ part of a hash function, mapping an arbitrary number of input bits to a known number of output bits. The ‘tweakable’ part of the algorithm really kicks in here. The tweak, used as an input to every iteration of UBI (Unique Block Iteration), ensures that every block is processed differently. Even with everything else being the same, differing tweaks lead to different output from that iteration.
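To make the role of the tweak more concrete, here is a minimal sketch of how a per-block tweak could be assembled as two 64-bit words. The field layout used here (byte position in the low bits, a 6-bit type field, and ‘first’ and ‘final’ flags in the top two bits) is based on the layout described in the Skein paper and should be checked against it. The names are hypothetical and not taken from nskein, and the sketch is simplified: it limits the position to 64 bits and leaves out the tree-level and bit-padding fields.

```csharp
using System;

// Hypothetical names, for illustration only; not nskein's actual code.
public static class UbiTweakSketch
{
    // Type value for message blocks, as listed in the Skein paper.
    public const int TypeMessage = 48;

    // Builds a simplified 128-bit tweak as two 64-bit words { low, high }.
    // Assumes fewer than 2^64 bytes have been processed, and omits the
    // tree-level and bit-padding fields.
    public static ulong[] BuildTweak(ulong bytesProcessed, int type, bool first, bool final)
    {
        ulong low = bytesProcessed;          // position: bytes processed so far
        ulong high = (ulong)type << 56;      // type field (bits 120-125 of the tweak)
        if (first) high |= 1UL << 62;        // 'first block' flag (bit 126)
        if (final) high |= 1UL << 63;        // 'final block' flag (bit 127)
        return new[] { low, high };
    }
}
```

Because the position changes with every block, two otherwise identical blocks in the same message get different tweaks, for example `BuildTweak(32, UbiTweakSketch.TypeMessage, true, false)` for the first block and `BuildTweak(64, UbiTweakSketch.TypeMessage, false, true)` for the last.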

The official submission has fast implementations for 32-bit and 64-bit platforms. Searching around a bit, I found implementations in various other languages as well, including one in .NET. In fact, I found one on Google Code which seems to have been written at the same time I was writing mine! Most of these were ports of the reference implementation. Though blazing fast, that approach wasn’t great for me as it doesn’t lend itself to the most readable code.

nskein Implementation

I took a different approach, one that, as it turned out, had already been taken in Java. In fact, Maarten’s implementation proved helpful when I got stuck in places. nskein is essentially a re-implementation written from reading the paper rather than a port of the reference code. It implements only Simple Skein, but for all block and output sizes. Tree mode hasn’t been implemented yet, but I expect it won’t be too much work if someone else wants to do it.

I strongly advise folks not to use this in production code. Apart from the fact that it has no performance optimizations whatsoever, it hasn’t been vetted from a cryptographic perspective. And in the crypto world, you generally don’t want to be on the bleeding edge; you want to stay with the tried and trusted. Most of all, the algorithm itself has some way to go and several years of cryptanalysis in its future before we can be comfortable using it (that’s what the competition is for). However, the implementation is great if you’re looking to roll your own or just want to see how one of these things works for fun.

There are several potential performance improvements. I didn’t do any of them, focussing on just getting the test vectors to pass. At a minimum, several calculated values (such as the result of the initial UBI iteration) can be replaced with precomputed ones from the paper. Also, the inner loops for the Threefish cipher’s rounds should be unrolled and the functions hand-inlined. I’m also making more byte array allocations than I would really like.

Code and Usage

You can find the code at GitHub. The main types derive from System.Security.Cryptography.HashAlgorithm so you should be able to ‘plug’ them into other parts of the crypto stack. For example, you can directly use them to generate HMACs. Using them directly is simple as well - the code below shows a simple example.
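As a minimal sketch of that usage pattern, the helper below hashes a string with any HashAlgorithm and returns the digest as hex. Since nskein's concrete type names are not given here, the usage note uses the framework's own SHA256 as a stand-in; HashUsageSketch and HashToHex are hypothetical names invented for this sketch.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper, for illustration only; not part of nskein.
public static class HashUsageSketch
{
    // Hashes the UTF-8 bytes of 'message' with the supplied algorithm
    // and returns the digest as a lowercase hex string.
    public static string HashToHex(HashAlgorithm algorithm, string message)
    {
        byte[] input = Encoding.UTF8.GetBytes(message);
        byte[] digest = algorithm.ComputeHash(input);
        return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
    }
}
```

With the stand-in algorithm, a call looks like `using (HashAlgorithm sha = SHA256.Create()) { Console.WriteLine(HashUsageSketch.HashToHex(sha, "abc")); }`. Because the helper only depends on the HashAlgorithm base class, an instance of one of nskein's types should be usable in place of `sha` without other changes.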

And before I forget, I should mention that the code is in the public domain. Feel free to do whatever you want with it. Just don’t sue me :-). Also, I would appreciate a note if you use/modify it in any way as I’m genuinely interested whenever someone uses my code.