The above code is based on real password hashing code that I found in the wild.
There’s a subtle weakness in this hashing mechanism – can you see it?

The What

Technically, I’m being unfair here – I didn’t give you all the information.
The missing piece is that this code uses a library called “python-pbkdf2”,
which defaults to using HMAC-SHA-1.

OK, so what? HMAC-SHA-1 isn’t broken, so what actually went wrong?
Interestingly enough, due to a subtle aspect of PBKDF2’s design, this code
actually hashes the password 400,000 times, of which 200,000 are a complete
waste of time.

This is a problem because programmers set the number of iterations based on how
much CPU time they’re willing to spend on password hashing. Password hashing
forces you to tradeoff performance for security. In practice, there are limits
to how much performance can be sacrificed for security. If the hashing code
wastes time, then the programmer is forced to lower the security level in order
to meet their standard for performance.

The How

But why is the password hashed 400,000 times instead of 200,000? Let’s take a
quick detour to look at how PBKDF2 actually works.

Password-Based Key Derivation Functions (“PBKDFs”) transform passwords into
cryptographic keys. PBKDF2 is a particular PBKDF design from the late ’90s.

You might be thinking that PBKDFs sound unrelated to password hashing. It
turns out that these two problems are mostly the same. A good PBKDF is slow to
prevent brute force attacks, just like a password hash. A good PBKDF needs to
preserve as much of the password’s entropy as possible, just like a password
hash. Etc.

An important feature of PBKDF2 is its ability to produce any size output.
Since the output of PBKDF2 is used as an encryption key, PBKDF2 needs to able
to produce whatever size of key the encryption algorithm is expecting. This
feature doesn’t make much sense for password hashing. With a password hash,
the output just needs to be big enough to be secure – the specific size isn’t
particularly important – so it’s typical to pick some arbitrary number, like
32 bytes.

In order to produce outputs of any size, PBKDF2 does something like this:

output=iteratedHashing(password,salt,iterationCount,1)+iteratedHashing(password,salt,iterationCount,2)+iteratedHashing(password,salt,iterationCount,3)+...# as many as needed

From a performance perspective, this is alarming: each block of output causes
the password to be hashed all over again!

For generating keys, this design is kind of acceptable. For password hashing,
though, it’s a complete waste. Let’s suppose we choose to use four blocks of
output. If an attacker gains access to the hash, they can pick a single block
and crack it individually to find the password. Once they have the password,
they no longer need to crack the other blocks.

If we look back at the hashing code from earlier, we can see that it asks for
32 bytes of output. Since it uses HMAC-SHA-1, which has a 20-byte output,
PBKDF2 will perform the hashing twice in order to produce enough output. The
second block of output adds no security benefit.

The Fix

It’s straightforward to work around this design flaw. Any time PBKDF2 is used
for password hashing, the output length should match the output length of the
HMAC function used: