Storing passwords: why you should flavour your hash with salt

Rider: I am not a cryptographer, so this is an attempt at my understanding of any consensus between genuine cryptographers. If I’ve got anything wrong or you disagree, let me know – or comment.

A cryptographic hash function is widely used to ‘encrypt’ stored passwords. Strictly speaking, it is not encryption: encryption is something you can scramble and then unscramble mathematically – it is two-way. A hash is one-way: it generates a meaningless-looking output (often called a digest) that reveals nothing useful about the input, and cannot mathematically be used to recreate that input. Other characteristics are that it always produces a constant length output regardless of the size of the input, and that it is computationally infeasible to find two different inputs that produce the same output.
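To make those characteristics concrete, here is a minimal sketch using Python's standard hashlib. SHA-256 is my choice for illustration only (the helper name digest_hex is hypothetical); any cryptographic hash would behave the same way.

```python
import hashlib

def digest_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

short = digest_hex(b"a")                # one byte of input
long_ = digest_hex(b"a" * 1_000_000)    # a million bytes of input

# Both digests have the same length: 64 hex characters (256 bits).
print(len(short), len(long_))

# The same input always gives the same digest...
print(digest_hex(b"a") == short)

# ...while a different input gives a different one.
print(digest_hex(b"b") == short)
```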

These characteristics make the cryptographic hash ideal for the secure storage of passwords:

the plaintext user password is neither stored nor can be recovered mathematically from the hash

in practice, no two different passwords will create the same hash

the fixed size makes storage simple.

So, when a user account is created, a user password is either created or selected by the user. It is run through the hash function, and a hash is generated and stored with the user name as part of the user account. The plaintext password is not stored and cannot be recovered from the hash.

The next time the user logs on to the account, he or she has to re-present the password. That password is run through the hash function again, and the hash output is compared to the one stored in the user’s account. If they match, access is granted. If they don’t match, access is refused.
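The create-and-verify cycle just described can be sketched as follows. This is the plain, unsalted scheme, shown only to make the flow concrete; the in-memory accounts dictionary and the function names are hypothetical stand-ins for a real user database.

```python
import hashlib
import hmac

# Hypothetical in-memory "user database": username -> stored hex digest.
accounts: dict[str, str] = {}

def hash_password(password: str) -> str:
    """Hash the plaintext password (unsalted, for illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def create_account(username: str, password: str) -> None:
    # Only the digest is stored; the plaintext is discarded.
    accounts[username] = hash_password(password)

def log_in(username: str, password: str) -> bool:
    stored = accounts.get(username)
    if stored is None:
        return False
    # Re-hash the presented password and compare it with the stored digest.
    return hmac.compare_digest(stored, hash_password(password))

create_account("alice", "correct horse")
print(log_in("alice", "correct horse"))  # True
print(log_in("alice", "wrong guess"))    # False
```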

If the server is breached and the password database is stolen, no matter, the passwords are securely and irreversibly scrambled. Right? Wrong.

The problem is that attackers have become adept at cracking hashes. It can’t be done mathematically, but it can be done by brute force. In fact any password can be cracked by brute force – brute force simply means that every possible combination of characters and symbols is tried until the correct one is found.

Generally speaking, this is a theoretical possibility rather than a practical reality: even with modern computing power, brute forcing all the possible combinations would take too long to be of any use.
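A rough piece of arithmetic shows why. The sketch below counts the candidate passwords of a given length and divides by a guessing rate. The rate is a number I have made up purely for illustration, not a benchmark, and the 95-character alphabet assumes printable ASCII.

```python
def keyspace(alphabet_size: int, length: int) -> int:
    """Number of candidate passwords of exactly the given length."""
    return alphabet_size ** length

def years_to_exhaust(alphabet_size: int, length: int,
                     guesses_per_second: float) -> float:
    """Time to try every candidate, in years, at the given rate."""
    seconds = keyspace(alphabet_size, length) / guesses_per_second
    return seconds / (365 * 24 * 3600)

ASSUMED_RATE = 1e9  # hypothetical guesses per second, not a measured figure

for length in (6, 8, 10, 12):
    years = years_to_exhaust(95, length, ASSUMED_RATE)
    print(f"{length} characters: {years:.3g} years")
```

Each extra character multiplies the work by 95, so short passwords fall quickly at this assumed rate while longer ones move out of reach.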

There are many riders to this. The computational power of parallel GPUs or FPGAs is cutting the time needed to brute force hashes, so brute forcing just a few hashed passwords is realistic. If the attacker recognises the usernames of a few high value targets and has the computational power, he could concentrate on brute forcing just those hashes – using, for example, HashCat. Note that Jens Steube, the author of HashCat, has just published a methodology that can reduce the effort needed to brute force SHA1 by 21.1%.

But the attackers have adapted brute force into the so-called ‘dictionary attack’. Here, hashes for millions of the most likely passwords such as names, dates, places, combinations of them, l33t speak variants (h3110_w0R1d), and everyday words in multiple languages are pre-computed and stored in a table or dictionary. Now, instead of having to hash every possible input until one matches, the attacker merely has to look up the stolen hash in the dictionary in order to locate the plaintext password. With a good dictionary attack, most passwords can be recovered from the hash in just a few seconds or minutes.
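As a toy illustration of that lookup, assuming unsalted SHA-256 and a four-word list standing in for a real dictionary (the usernames and passwords here are invented):

```python
import hashlib

def sha256_hex(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Tiny stand-in for a real word list of millions of likely passwords.
candidates = ["123456", "password", "letmein", "h3110_w0R1d"]

# Precompute once: digest -> plaintext.
lookup = {sha256_hex(word): word for word in candidates}

# Hypothetical stolen database of unsalted hashes.
stolen = {
    "alice": sha256_hex("password"),
    "bob": sha256_hex("Tr0ub4dor&3-long-and-unlisted"),
}

# Cracking is now a dictionary lookup per stolen hash, with no hashing at all.
recovered = {user: lookup.get(digest) for user, digest in stolen.items()}
print(recovered)  # {'alice': 'password', 'bob': None}
```

The common password is recovered instantly; the long, unlisted one is simply not in the table.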

And this, finally, is why you should flavour your hash with salt.

Salt is a unique random string prepended to each plaintext password before the hash is generated and stored. “What the salt does,” Robin Wood, aka security researcher and pentester DigiNinja, told me, “is prevent an attacker generating a pre-computed list of the hashes.” Robin developed the widely used Pipal password analyser – and knows a thing or two about passwords.

The salt should be of a reasonable length and genuinely random. Any old pseudo-random number generator isn’t good enough – it should come from a cryptographically secure random number generator. (If you want to consider the effort that goes into a good random number generator, have a look at the independent report produced for Intel by Cryptography Research Inc on the Intel Ivy Bridge RNG.)

(Not everyone believes the salt needs to be truly random, just truly unique to each password. However, if it isn’t generated randomly, there is a danger that it becomes predictable.)

This salt is then added to the plaintext password and the hash is generated from the combination. Ultimately it does nothing to prevent a brute force attack on an individual hash, so in that sense it doesn’t make the security stronger. But what it does do is defeat the dictionary attack. The dictionaries cannot contain all of the hashed standard passwords plus all of the possible hashes of each of those passwords with a large random number (or salt) included. Provided that the salt for each password is unique, it needs no security in itself – on its own it can tell you nothing about the plaintext password that is included in the stored, combined hash.
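A minimal sketch of salting, under a few assumptions of my own: a 16-byte salt (the text above only asks for a reasonable length), Python's secrets module as the cryptographically secure generator, SHA-256 as the hash, and the salt stored in the clear next to the digest so it can be reused at login. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets

SALT_BYTES = 16  # assumed length, chosen for illustration

def new_salt() -> bytes:
    # secrets draws from the operating system's secure random source.
    return secrets.token_bytes(SALT_BYTES)

def salted_hash(salt: bytes, password: str) -> str:
    # The salt is prepended to the password before hashing.
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

def make_record(password: str) -> tuple[bytes, str]:
    """Return (salt, digest); both are stored with the account."""
    salt = new_salt()
    return salt, salted_hash(salt, password)

def verify(record: tuple[bytes, str], password: str) -> bool:
    salt, stored = record
    return hmac.compare_digest(stored, salted_hash(salt, password))

# A table precomputed without the salt no longer matches anything.
unsalted_table = {hashlib.sha256(b"password").hexdigest(): "password"}
record = make_record("password")
print(record[1] in unsalted_table)  # False
print(verify(record, "password"))   # True
```

Even the weakest password now misses the precomputed table, yet the legitimate login check still succeeds because the server knows the salt.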

Dictionary attacks are one of the reasons for users to choose a long, strong password. If the password database isn’t salted, a long complicated password may avoid precomputation and still defeat the dictionary.

Unique salts also have the added advantage of creating unique hashes for every user, regardless of their plaintext password. In a Pipal analysis of “the list of passwords from the phpBB leak which I grabbed from the SkullSecurity site,” DigiNinja shows that the top two passwords are ‘123456’ and ‘password’ – and these are likely to figure highly in any list of passwords. Without a unique salt, it would be statistically likely that the most frequently occurring hashes have been generated from one or other of these passwords, even without the use of a dictionary. With unique salts, however, every single stored hash will also be unique, and no statistical analysis will be possible. A long and unique salt is thus important to prevent a standard salt being guessed or discovered by statistical analysis. If the attacker knows a standard salt, he could just precompute an additional dictionary with that added salt.
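The statistical point can be sketched in a few lines. The password list below is invented, and SHA-256 with a 16-byte random salt is again my assumption rather than anything taken from the phpBB data.

```python
import hashlib
import secrets
from collections import Counter

# Invented sample: several users picked the same weak passwords.
passwords = ["123456", "password", "123456", "xK9#unique", "123456", "password"]

# Without salt, identical passwords give identical digests.
unsalted = [hashlib.sha256(p.encode("utf-8")).hexdigest() for p in passwords]

# With a fresh random salt per user, every digest differs.
salted = [
    hashlib.sha256(secrets.token_bytes(16) + p.encode("utf-8")).hexdigest()
    for p in passwords
]

print(Counter(unsalted).most_common(1)[0][1])  # 3: the '123456' digest repeats
print(len(set(salted)))                        # 6: all salted digests are distinct
```

An attacker looking at the unsalted column can guess that the most frequent digest belongs to a very common password; the salted column gives no such hint.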

Conclusions

What this tells us is that the user must do two things: choose long, strong passwords to defeat dictionary attacks where the website doesn’t use salting; and never reuse the same password on multiple accounts. Using the same password means that it is only as safe as the weakest account we use; and the simple fact is we do not know which websites are storing our passwords safely. It is also sobering to realise that the strength of the password we choose is meaningless if the website stores it in plaintext, and then gets breached.

For the website, it tells us that hashing passwords is essential; but on its own, hashing will not protect the majority of passwords from a dictionary attack.

Adding a unique, long and genuinely random string to the password before hashing will provide a far greater defence against password cracking, without requiring any additional effort to secure or hide the salt itself.

Salt is useful, but within limits. Salt does render most rainbow tables useless.
But if the attacker has access to the website scripts (“source”, but it’s all source in a script), i.e. the attacker has compromised the whole server, they will be able to find where the passwords are generated and extract the salt, then use that in the brute force attack.

Salt of this nature (for there are other types of crypto salt that aren’t simply appended/prepended to a password) has a mathematical possibility of reducing the strength of the hashing *if* weaknesses already exist in the hash.

Although hashing is seen as “one way”, in practice it is not. It is a lossy operation, meaning you can’t mathematically get back to the cleartext from the cipher. Information is lost on the way.

It is also, usually, a many-to-one mapping, so in theory two different passwords can create the same hash, but this is by the by.

As a commenter above notes, hashes aren’t really designed for use on small amounts of text, they were designed to fingerprint a larger amount of text.

In practice one can work backwards from a hash and get clues about the cleartext.

When the cleartext is short one can sometimes get quite a lot of clues.

If a salt is being used we have another clue – there exists an identical block of characters in every password.

All these clues can be used to narrow the tree/branch search during brute force attack and make the attack quicker.

To express my concern in layman’s terms, as someone having worked professionally on crypto and related software design, I would feel uneasy about my password if the data was breached whether or not it was encrypted or salted.

Many feel it’s unrealistic to have a different password for every website, which is recommended.

If you can’t manage it, then try having three different “strengths” of password PLUS a separate password for your email. This is better than having the same for all, and arguably better than having one for each but having a file on your computer with each password listed.

Why is email special? Because most accounts are tied to an email address, so if any of your three passwords is compromised, an attacker’s first move would be to try to sign in to the email server and use that account to send spam.

Typically group the sites you use into categories:

A password that’s quick and easy to use for throwaway websites where it wouldn’t matter if your account was breached. E.g. chatrooms, newspapers, maybe shopping websites that *DON’T* remember your credit card details.

One that’s reasonably strong for accounts that matter to you, e.g. the major social media and shopping sites that do remember your credit card details. Don’t use a password that even resembles the weaker one and consider using a variant for each site if you can.

And finally a secure password for banking. Maybe try to have a different “flavour” for each and every bank, too, if you can.

Kevin Townsend :
Do you mean key stretching with specialist hash functions like PBKDF2 or bcrypt? But aren’t they basically just hash functions made to run slower? And do they have any impact on whether to salt or not? Is salting not necessary with slow hash functions?

those are examples of the functions i’m referring to, yes.

you can think of them as slow hash functions or you could think of them as functions where the computational work needed to perform them is adjustable, so that the storage can be made stronger at the expense of a little extra time. the earliest example i can recall hearing about is if you were to use a traditional cryptographic hash iteratively 100000 times. it doesn’t significantly increase the time it takes for the single verification operation you’d perform when authenticating a login credential, but if someone wanted to crack the hash what might have normally taken a day would be extended to hundreds of years.

of course a salt still helps – if you choose the same password as 1000 other people then without a salt your password hash will be the same as that for those 1000 other people. a salt helps prevent a single cracked password hash from affecting multiple accounts. in that sense the salt helps make up for bad practices by users.
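A minimal sketch of the adjustable-work idea, using PBKDF2 from Python's standard hashlib together with a per-user salt. The iteration count simply reuses the 100000 figure mentioned above and is not a recommendation; the 16-byte salt and the function names are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # figure from the comment above; tune for real hardware

def stretch(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    # PBKDF2 repeats an HMAC-SHA256 computation `iterations` times.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

def make_record(password: str) -> tuple[bytes, int, bytes]:
    """Return (salt, iterations, derived key); all three are stored."""
    salt = secrets.token_bytes(16)
    return salt, ITERATIONS, stretch(password, salt)

def verify(record: tuple[bytes, int, bytes], password: str) -> bool:
    salt, iterations, stored = record
    return hmac.compare_digest(stored, stretch(password, salt, iterations))

record = make_record("hunter2")
print(verify(record, "hunter2"))  # True
print(verify(record, "hunter3"))  # False
```

Storing the iteration count with each record lets the work factor be raised later for new passwords without invalidating existing ones.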

i think there may be some confusion here. while salt does help, cryptographic hashes are in fact NOT ideally suited to password storage for the simple fact that they were designed to be fast. that speed aids cracking more than rainbow tables do (rainbow tables aren’t actually used much in practice from what i’ve gathered). that’s why there exists a special kind of hash just for passwords where the work required to generate a hash value is several orders of magnitude larger. this has a negligible effect on password verification (it doesn’t matter if the time taken to verify a password when someone is logging in is 1 second rather than 1 nanosecond) while having a huge impact on generating values to test against a stolen password database.
