To my amazement, that post still gets tons of hits and it’s linked to from lots of different places. The problem is, it’s a bit crusty and I really don’t like it that much. So, here’s a much needed update:

I’m not going to spend much time explaining everything here. Hopefully the comments in the code will do that. This example uses AES, but could be easily modified to use other encryption algorithms.

It’s pure java with no weird requirements or external libraries. It performs reasonably well compared to the stock Java implementation of PBKDF2WithHmacSHA1; sometimes it beats it, sometimes it doesn’t. I’m sure there are ways to optimize performance.

A little background: RFC 2898 defines (among other things) the PBKDF2 algorithm for generating encryption keys based on a given password. The details are unimportant, but it produces a salted password hash that is computationally impractical to crack given a sufficiently long salt and enough iterations. RFC 2898 “recommends” an iteration count of 1000 or greater (section “4.2 Iteration Count”). It also says that salts “should” be at least 8 octets (64 bits) long (section “4.1 Salt”).

When reading RFCs, terminology is extremely important to implementation. The words “should” and “recommend” are very different from the word “must” (check out RFC 2119). Things labeled “should” are not required for conformance with the spec; “must”, however, explicitly defines an absolute requirement.

In the world of cryptography, “should” things are very important. Your algorithm may not be as secure as you think if you ignore the “should” stuff, even though you have implemented everything as defined in the RFC. In the case of the PBKDF2 algorithm, simply don’t use salts shorter than 8 octets or fewer than 1000 iterations. Simple as that. It’s not required by the RFC; it’s just smart.
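Staying above those floors with Java’s stock PBKDF2 factory is easy. Here’s a quick sketch (the password, the 16-octet salt, and the 10000-iteration count are just illustrative choices, not values from any spec):

```java
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class Pbkdf2Params {
    // Derive a key with parameters comfortably above the RFC 2898 floors.
    static byte[] deriveKey(char[] password, byte[] salt, int iterations, int keyBits)
            throws Exception {
        SecretKeyFactory f = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
        return f.generateSecret(new PBEKeySpec(password, salt, iterations, keyBits))
                .getEncoded();
    }

    public static void main(String[] args) throws Exception {
        byte[] salt = new byte[16];     // 16 octets: double the 8-octet minimum
        new SecureRandom().nextBytes(salt);
        int iterations = 10000;         // well above the 1000-iteration floor

        byte[] key = deriveKey("hunter2".toCharArray(), salt, iterations, 256);
        System.out.println((key.length * 8) + "-bit key derived");
    }
}
```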

From a developer’s standpoint, the question always looms: Do I give my users enough rope to hang themselves? When implementing PBKDF2, should I even allow salts of less than 64 bits or small iteration counts? For super secure systems, the answer probably should be “no”.

Now, as a general rule, it’s a bad idea to attempt to code your own hashing algorithms. There are only a handful of people on the planet who have a firm enough grasp of modern cryptography to understand all the intricacies and implications of improperly coded cryptographic algorithms. I am not one of those people, and odds are, neither are you. So just don’t do it, especially if there’s a tried-and-true implementation available.

With that said, if you are going to attempt to code your own algorithms, there are certain test vectors that you can run through your code to make sure everything works properly. For PBKDF2 derived keys, those test vectors are defined in RFC 6070. If you are testing PBKDF2 with HmacSHA1, this is the set of data you use to test.
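For example, the first RFC 6070 vector (P=“password”, S=“salt”, c=1, dkLen=20) can be checked against Java’s stock implementation like this:

```java
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class Rfc6070Check {
    // Derive a PBKDF2-HmacSHA1 key and return it as lowercase hex.
    static String pbkdf2Hex(String password, byte[] salt, int iterations, int dkLenBytes)
            throws Exception {
        SecretKeyFactory f = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA1");
        byte[] dk = f.generateSecret(
                new PBEKeySpec(password.toCharArray(), salt, iterations, dkLenBytes * 8))
                .getEncoded();
        StringBuilder sb = new StringBuilder();
        for (byte b : dk) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // RFC 6070 test vector 1: P="password", S="salt", c=1, dkLen=20
        String dk = pbkdf2Hex("password", "salt".getBytes("US-ASCII"), 1, 20);
        System.out.println(dk);
        // expected: 0c60c80f961f0e71f3a9b524af6012062fe037a6
    }
}
```

If your implementation (or framework) produces anything else for that input, something is wrong.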

Sooooooooo… I’ve been playing around with PBKDF2 a bit, trying my hand at coding the algorithm in various languages, and testing the implementations that exist in various frameworks. I know I just told you not to do such things, but in fairness, I’m not trying to generate the HMACs myself. PBKDF2 isn’t a hashing algorithm; it takes a bunch of hashes and squarshes them together in such a way that it would be extremely difficult and time consuming to crack the key through traditional means. And yes, I said “squarshes”.

Anyway, in testing I found that there are frameworks out there that enforce the recommended salt length specified in RFC 2898. They are incapable of running 5 out of the 6 test vectors in RFC 6070, because those vectors use the salt “salt” or “sa\0lt”, both of which are shorter than 64 bits. Like I said: “UGH!!!”.

I know this is relatively minor. You can always implicitly test the validity of algorithms by running the same test data through multiple known/working implementations. It’s just the principle of the thing, ya know?

I’ve got this long standing personal project that I just can’t seem to finish. It’s a file encryption program that I only seem to find time to work on in the 30 minutes before bedtime. I’ve got the core written and it’s working mostly everywhere (even on my Droid). All I’ve got left is to wrap a GUI around it, and I’ll at least have a releasable beta. I don’t care much for front-end development, so perhaps that’s why I’ve hit a roadblock… Not sure.

Anyway, I’m sitting here looking at some of the high level crypto functions and I’m having trouble following the logic. I can’t tell if my own code is genius or just plain dumb.

I’m working on a little app that can do simple and portable file encryption across different operating systems. Something I can put on a thumb drive to encrypt files so that when I inevitably lose the drive, all my documents aren’t public knowledge.

Most modern operating systems offer file encryption … a lot of new external drives and flash drives come with programs that will encrypt the filesystem … but all of those things are a huge pain when you live on multiple platforms. I move files around between my Windows, Mac, and Solaris computers at work and my Linux boxes at home. I need something simple to get the job done that works on all of the above. Java lets me do this pretty easily, let’s take a look …

Cryptography is still a bit of a black art, even when looking at high level APIs like we’ll see in this example. There are any number of different ways to do this using different types of keys and algorithms. In this example, I’ll be using a simple password based symmetric key with Triple DES (DESede) encryption (PKCS #5, RFC 2898). Please see the RFC doc for a definition of terminology.

Creating the cipher

The first step is generating an encryption key and using it to create a cipher:

Update: The real beauty of using IO streams like this is that you do not have to worry as much about Java’s heap space. Your memory usage rarely grows past the chunk size specified in the “doCrypto()” function; even if you are operating on huge files.
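The chunked loop could look something like this. To be clear, the “doCrypto()” name comes from the post, but this body and the 4096-byte chunk size are my own guesses at a minimal version, using a standalone DESede key rather than the PBE setup above:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class StreamCrypto {
    static final int CHUNK = 4096; // heap usage stays bounded by this buffer size

    // Pump in -> out through the cipher, one chunk at a time.
    static void doCrypto(Cipher cipher, InputStream in, OutputStream out) throws Exception {
        try (CipherOutputStream cos = new CipherOutputStream(out, cipher)) {
            byte[] buf = new byte[CHUNK];
            int n;
            while ((n = in.read(buf)) != -1) {
                cos.write(buf, 0, n);
            }
        } // closing the stream flushes the final padded block
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("DESede").generateKey();

        Cipher enc = Cipher.getInstance("DESede/ECB/PKCS5Padding");
        enc.init(Cipher.ENCRYPT_MODE, key);
        ByteArrayOutputStream ctBuf = new ByteArrayOutputStream();
        doCrypto(enc, new ByteArrayInputStream("big file contents".getBytes("UTF-8")), ctBuf);

        Cipher dec = Cipher.getInstance("DESede/ECB/PKCS5Padding");
        dec.init(Cipher.DECRYPT_MODE, key);
        ByteArrayOutputStream ptBuf = new ByteArrayOutputStream();
        doCrypto(dec, new ByteArrayInputStream(ctBuf.toByteArray()), ptBuf);

        System.out.println(ptBuf.toString("UTF-8"));
    }
}
```

Swap the byte-array streams for `FileInputStream`/`FileOutputStream` and the same loop handles multi-gigabyte files without blowing the heap.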