Hashing passwords the Right Way

Simon Sapin,
2011-12-05

Short answer: I chose PBKDF2, but bcrypt and scrypt are good too.
In any case, use a constant-time comparison to avoid timing attacks.
Find the code below.

So you’re writing some application where users log in with passwords.
As you already know (or so I hope!), you should not store passwords in plain text.
There is plenty already on the why, so I’ll focus on the how.

The precautions I’m taking here may seem overkill, but they’re really easy
to implement so there is no reason not to.

The scenario we want to protect from is an attacker getting access to the
database. Of course you should try to prevent that in the first place,
but it has happened before.
Traditional, reversible cryptography does not help: if the database was stolen,
the key may have been stolen too. Thus, we need a one-way hash.

SHA-256 is not known to be broken at the time of this writing, while
MD5 and
SHA-1 are.

The basic idea is to store hash = sha256(password) instead of the password
itself. However this method vulnerable to Rainbow Table attacks where a hash
is compared against a big table of pre-computed hashes. Creating such a table
is expensive (takes time), but after that it can be used for all entries in the
database. A salt protects against this:
hash = salt + sha256(salt + password). Salts should be 8 bytes or longer,
unique and random.

Salting is good but it does not protect
against dictionary and brute force attacks. This is because SHA-256 and similar
hashing functions are designed to be fast.

To make it slower we could just apply SHA-256 to its own result many times.
This number of application is the work factor. When computers become faster
next year we can increase the work factor to keep up with Moore’s law.
However, the first rule of designing a crypto-system is: Don’t.

PBKDF2,
bcrypt
and scrypt
are all reputable hashing functions with a work factor. See this answer
to compare them. I chose PBKDF2 since it is easiest to implement in Python
and I judged it good enough based on what I read about it.
There are several implementations of PBKDF2 for Python out there. I picked
Armin Ronacher’s for its
simplicity (15 lines of actual code).

One more thing where we need to be careful is timing attacks. If we use the == operator
to check a hash, it will compare byte-by-byte and return on the first
inequality. This timing leaks information to the attacker who could
guess the correct hash without even getting access to the database.
This attack is more plausible
than one might first think. But even if it is not, a constant-time comparison
is easy (See the the code below.) so there is no reason not to do it.

Finally, it is useful to store the algorithm parameters such as the work
factor next to the hash and its salt. That way, if we later want to change
these parameters we don’t need to throw away old hashes and reset everyone’s
password. Old hashes can stay valid and be re-created on next log in.

We need some code to tie all this together and it’s pretty generic.
The whole point of this article is actually to provide it.
This is the code I use in production:

importhashlibfromosimporturandomfrombase64importb64encode,b64decodefromitertoolsimportizip# From https://github.com/mitsuhiko/python-pbkdf2frompbkdf2importpbkdf2_bin# Parameters to PBKDF2. Only affect new passwords.SALT_LENGTH=12KEY_LENGTH=24HASH_FUNCTION='sha256'# Must be in hashlib.# Linear to the hashing time. Adjust to be high but take a reasonable# amount of time on your server. Measure with:# python -m timeit -s 'import passwords as p' 'p.make_hash("something")'COST_FACTOR=10000defmake_hash(password):"""Generate a random salt and return a new hash for the password."""ifisinstance(password,unicode):password=password.encode('utf-8')salt=b64encode(urandom(SALT_LENGTH))return'PBKDF2${}${}${}${}'.format(HASH_FUNCTION,COST_FACTOR,salt,b64encode(pbkdf2_bin(password,salt,COST_FACTOR,KEY_LENGTH,getattr(hashlib,HASH_FUNCTION))))defcheck_hash(password,hash_):"""Check a password against an existing hash."""ifisinstance(password,unicode):password=password.encode('utf-8')algorithm,hash_function,cost_factor,salt,hash_a=hash_.split('$')assertalgorithm=='PBKDF2'hash_a=b64decode(hash_a)hash_b=pbkdf2_bin(password,salt,int(cost_factor),len(hash_a),getattr(hashlib,hash_function))assertlen(hash_a)==len(hash_b)# we requested this from pbkdf2_bin()# Same as "return hash_a == hash_b" but takes a constant time.# See http://carlos.bueno.org/2011/10/timing.htmldiff=0forchar_a,char_binizip(hash_a,hash_b):diff|=ord(char_a)^ord(char_b)returndiff==0

As usual, the code is BSD licensed (as is Armin’s PBKDF2 implementation) and
available by itself.