Suppose Alice is using a password prompt that only accepts passwords of up to 32 characters.

Memorizing long strings of random characters is not one of Alice's strengths, so she instead opts for a memorable passphrase of 7 words chosen from the 2000 most common English words. Unfortunately, such a passphrase easily exceeds the prompt's 32-character limit.

This is a common problem for the average user, who is often required to enter passwords on sites or servers that impose a maximum password length. What I wonder is whether this solution is relatively secure:

While I think the broken collision-resistance is not a problem here, you should use a more modern hash function (SHA-2, for example, maybe with an output transformation to make it fit) instead.
– Paŭlo Ebermann♦ May 14 '13 at 15:19
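
A minimal sketch of the SHA-2 suggestion, assuming the "output transformation" is simply truncating SHA-256's 64-character hex digest to the prompt's 32-character limit (other transformations, such as a denser encoding, would work too). The function name `sha256_password` and the sample passphrase are hypothetical placeholders.

```python
import hashlib

MAX_LEN = 32  # the prompt's limit from the question

def sha256_password(passphrase: str, max_len: int = MAX_LEN) -> str:
    """Hash the passphrase with SHA-256 and truncate the hex digest to fit.

    Keeping 32 hex characters retains 128 of the 256 digest bits, which is
    still well above the entropy of the passphrase itself.
    """
    digest_hex = hashlib.sha256(passphrase.encode("utf-8")).hexdigest()  # 64 hex chars
    return digest_hex[:max_len]

print(sha256_password("placeholder passphrase for illustration only"))
```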

Also, your math is wrong: an MD5 hash is 16 bytes, not 16 "alphanumeric characters", which gives 256^16 = 2^128 possible values. That said, the raw MD5 output is likely not a valid password, and hashing doesn't strengthen the password at all (the attacker just has to iterate over the passphrases and hash each one).
– Paŭlo Ebermann♦ May 14 '13 at 15:25

Oops, you're right; I misread the information about the length of an MD5 hash. Also, what would be the advantage of using a more modern hash function such as SHA-2 over the older MD5, if MD5 poses no security issue here?
– Vilhelm Gray May 14 '13 at 15:33

We don't know if MD5 is broken for the problem at hand (which is preimage resistance). But we know it is broken for collision resistance, and attacks only ever get better.
– Paŭlo Ebermann♦ May 14 '13 at 17:46

1 Answer

Password strength is typically measured in bits of entropy, or in layman's terms, the amount of "true randomness" in the system. It is determined by the process used to generate the password, not by the number of bits in the output. This is a simple extension of Kerckhoffs's principle: assume your attacker knows your process, and the only information unknown to him/her is the secret key.

You are correct that the number of possible passphrases is $2000^7$ (treating each of the 7 words as an independent, uniformly random choice from the 2,000). This is about $2^{76.8}$, so a password generated with this approach has roughly 77 bits of entropy, which is a very good password strength. However, passing the output through a digest function like MD5 doesn't increase the number of possible passwords. MD5 has $2^{128}$ possible outputs, but we're only ever using at most $2^{77}$ of them. Put another way, assuming an attacker knows our process, they don't have to guess all possible MD5 outputs; they only have to try the ones that correspond to a hash of 7 of the 2,000 most common English words, and that is comparatively much easier.
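
To make the counting argument concrete, here is a small Python sketch. It computes the entropy of 7 independent, uniform picks from 2,000 words, then runs a toy attack against a hypothetical 5-word list with 2-word passphrases. It assumes the words are joined with single spaces and hashed as UTF-8, which the question doesn't specify; the word list, `md5_hex`, and `brute_force` are illustrative only.

```python
import hashlib
import itertools
import math

# Entropy of 7 words drawn independently and uniformly from a 2,000-word list.
WORDS_IN_LIST = 2000
WORDS_PER_PASSPHRASE = 7
entropy_bits = WORDS_PER_PASSPHRASE * math.log2(WORDS_IN_LIST)
print(f"{entropy_bits:.2f} bits")  # 76.76 bits

# Toy setting: hypothetical 5-word list, 2-word passphrases (25 candidates).
toy_words = ["apple", "river", "stone", "cloud", "tiger"]

def md5_hex(passphrase: str) -> str:
    # Assumed scheme: MD5 of the space-joined passphrase, as 32 hex characters.
    return hashlib.md5(passphrase.encode("utf-8")).hexdigest()

secret = md5_hex("stone tiger")  # what the victim actually types into the prompt

def brute_force(target: str, words: list[str], k: int):
    """Enumerate every k-word sequence and hash it.

    The attacker never searches MD5's 2^128 outputs, only the passphrase space.
    Returns (passphrase, tries) on success, or (None, total_tries) on failure.
    """
    for tries, combo in enumerate(itertools.product(words, repeat=k), start=1):
        candidate = " ".join(combo)
        if md5_hex(candidate) == target:
            return candidate, tries
    return None, len(words) ** k

print(brute_force(secret, toy_words, 2))  # ('stone tiger', 15)
```

The attacker's work is bounded by the size of the passphrase space (25 candidates here, $2000^7$ for Alice), not by the number of possible MD5 outputs.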

In practice, however, 77 bits of entropy is still quite good for a password. MD5 doesn't add any strength in and of itself, but it doesn't necessarily remove any either. As an "encoding" that stuffs seven random words into a field restricted to 32 characters (and frequently to alphanumeric values only), it seems practical to me, assuming you're using the 32-character hex representation of the digest.

One slight improvement would be to encode the binary MD5 output in Base32 instead of hexadecimal. Because each Base32 character carries 5 bits instead of 4, the 128-bit digest shrinks to 26 characters (once the trailing `=` padding is dropped), which could let you use this scheme with password fields limited to fewer than 32 characters.
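
A side-by-side sketch of the two encodings using Python's standard `hashlib` and `base64` modules. It assumes the passphrase is hashed as a UTF-8 string with the words joined by spaces; the helper names and the sample phrase are hypothetical.

```python
import base64
import hashlib

def md5_digest(passphrase: str) -> bytes:
    # 16 raw bytes (128 bits); assumes space-joined words encoded as UTF-8
    return hashlib.md5(passphrase.encode("utf-8")).digest()

def hex_password(passphrase: str) -> str:
    # 4 bits per character -> 128 / 4 = 32 characters (0-9, a-f)
    return md5_digest(passphrase).hex()

def base32_password(passphrase: str) -> str:
    # 5 bits per character -> ceil(128 / 5) = 26 characters (A-Z, 2-7).
    # RFC 4648 pads 16 bytes out to 32 characters with '=', so strip the padding.
    return base64.b32encode(md5_digest(passphrase)).decode("ascii").rstrip("=")

phrase = "hypothetical seven word passphrase for this demo"
print(len(hex_password(phrase)), len(base32_password(phrase)))  # 32 26
```

RFC 4648 Base32 uses only the uppercase letters A-Z and the digits 2-7, so the result stays alphanumeric.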

Edit: Others have expressed concerns about weaknesses in MD5. While I share a general distaste for MD5, I believe this use is perfectly acceptable. MD5 is not being relied on for its preimage resistance here, because the attacker has no need to determine the original seven words that were used; having the hash itself is good enough. Given that both the chosen words and the MD5 hash itself are considered secret, I don't believe there is cause for concern.

In practice, there's another problem: different sites have incompatible password complexity rules. Some require upper and lower case, some require one or more special characters, while others prohibit those same special characters.
– John Deters May 14 '13 at 17:57

While I'm aware MD5 is cryptographically broken (which is why I specifically used it in the example), I'm not familiar with actual exploitation techniques. Is the fact that the hash is kept secret the reason a collision attack is impossible here?
– Vilhelm Gray May 14 '13 at 18:43

Also, in the case of a brute-force attack (where a random guess is hashed and then attempted as the password), wouldn't the possibility of a collision drop the entropy, or does it only drop for passphrases with more entropy than MD5's 128-bit output?
– Vilhelm Gray May 14 '13 at 18:48

After reading a bit more, I realized I was confusing a collision attack with a second-preimage attack. Regarding my second question, I'm still unsure whether there is an inherent drop in entropy due to accidental collisions.
– Vilhelm Gray May 14 '13 at 18:59

@VilhelmGray It might be a really slight drop, say from 77 bits to 76.5 or so. (I didn't do the math, though, and would welcome answers that do.)
– Paŭlo Ebermann♦ May 14 '13 at 19:57
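
For the entropy question raised in these comments, here is a rough birthday-bound estimate in Python. It assumes MD5 behaves like a random function on the set of passphrases (an idealization, not a proven property) and that all $2000^7$ passphrases are equally likely; under those assumptions the expected loss comes out negligible.

```python
import math

n = 2000 ** 7  # possible passphrases (7 independent words from 2,000)
M = 2 ** 128   # possible MD5 outputs

# Birthday estimate: expected number of colliding pairs when n inputs are
# mapped into M outputs by a function that behaves like a random one.
expected_pairs = n * (n - 1) / (2 * M)

# Each colliding pair merges two equally likely passphrases into one output:
# it occurs with probability 2/n and costs 1 bit of conditional entropy.
# Triple and higher collisions are ignored as vanishingly rare.
entropy_loss_bits = expected_pairs * 2 / n

print(f"log2(expected colliding pairs) = {math.log2(expected_pairs):.1f}")  # 24.5
print(f"approximate entropy loss = {entropy_loss_bits:.1e} bits")  # 3.8e-16
```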