Securely storing and verifying passwords

First published: March 11, 2008
Last updated: April 12, 2012

Introduction

The task of storing and verifying passwords arises in just about any piece of
software that has any level of security incorporated into it. This task is not
quite as straightforward as an uninitiated programmer might think. Certainly,
the most obvious solution to the problem is not at all close to secure. This
document briefly discusses some standard or common approaches to password
storage and verification. This document or one which covers the same issues
should be considered required reading for anybody implementing a password
storage and verification scheme without prior experience.

The Naive Approach

What I've decided to call a "naive password storage scheme" is probably
exactly what you think it is. It's the simplest possible password storage
scheme that could work in practice. It's also the least secure scheme possible.
Despite this, it is still sometimes used occasionally by people who don't know
any better.

Naive password storage

We keep pairs of usernames and passwords stored in a database, plain and
simple. When a user authenticates themselves to our application, we look to
see if the username and submitted password occur together in our database.

That is, the verification procedure is very simply:

Take the offered password,

Look up the stored password,

Compare the offered password from step 1 to the stored password from step
2 and verify the password if and only if these match.

Throughout this document, example password databases will be presented for each
storage scheme discussed. The usernames and corresponding passwords will
always be the same as those above.

Why do it like this?

The only reason to implement this password storage system is because it is
obvious, easy and you didn't know any better! It should never be used in
practice, under any circumstances, for reasons we shall soon see.

Problems with the Naive Approach

What if somebody gets read access through the database by exploiting a web
server vulnerability? What if a disgruntled system administrator sells the
database to the local organised crime syndicate? What if someone physically
steals a database server? What if an old backup tape containing the database is
thrown out and found by a dumpster diver?

As soon as any one of these things happens, whoever illegitimately comes into
possession of your database knows every single one of your user's passwords.
They can log in as any one of your users and wreak havoc with their account.

It's easy to take the attitude of "none of those things would ever happen to
me!". The truth is that these things occasionally do happen. In December of
2006, the popular (and excellent) social bookmarking site
reddit had a backup of their database stolen. They
had stored their user's passwords as plain text in that database - they were
using a naive password storage scheme. The database thief subsequently knew
every Reddit user's password.

Occasionally, things go horribly wrong. There is nothing that can be done to
prevent this. But you can take steps to make sure that when things go horribly
wrong, the consequences are not as bad as they have to be. Never, ever use a
naive password scheme.

Hashed Passwords

The first step up the security ladder from naive password storage is something
called a hashed password storage scheme.

Hash functions

The central concept for this section is that of a hash function. A
sufficient definition of a hash function for our purposes is this:

A hash function is a function which:

Takes as input an arbitrarily long sequence of bits,

Produces as output a fixed length sequence of bits, and

has the following properties:

Making even a small change to the input (e.g. flipping a single bit) will
cause a substantial change in the output (e.g. will flip at least half the bits)

Given an output of the hash function, finding an input which maps to it is
very, very hard.

Finding two different inputs which map to the same output is very, very hard.

Lots of different hash functions have been designed over the years. By far the
most common are MD5 and the
SHA family of hash
functions. Hash functions have a variety of uses in computer science,
including assuring the integrity of data and as a vital component of digital
signatures. The design of new, more secure hash functions has recently become
a hot topic in cryptography. You can read more about hash functions on
Wikipedia.

In an attempt to make the concept clear, here are some example strings and
their corresponding hashes (the output of a hash function given a certain input
is usually called the hash of that input) using the MD5 hash algorithm:

String

Hash

Hello, world!

746308829575e17c3331bbcb00c0898bM

The quick brown fox jumps over the lazy dog

37c4b87edffc5d198ff5a185cee7ee09

the quick brown fox jumps over the lazy dog

1e280e1713df124d35709cf6138d9f91

Observing the second and third strings, notice that making a simple change to
the "Quick brown fox..." string (changing the first letter from upper to lower
case) has resulted in the hash changing entirely. This is the first property
in our list of properties above.

Hashed password storage

In a hashed password storage scheme, rather than storing pairs of usernames and
plaint's passwords, we store usernames and the hashes of plain text passwords.
When a user authenticates themselves to our system, we take the password they
have provided, compute its hash and compare the result.

In other words, the verification procedure is very simply:

Take the offered password,

Compute the hash of the offered password,

Look up the stored hash of the user's password

Compare the computed hash from step 2 with the stored hash from step 3,
verifying the password if and only if these match.

Why do it like this?

Consider the problem we found with the naive password storage scheme: as soon
as the password database was compromised in any one of several possible ways,
every user's password is immediately compromised. This problem does not occur
in a hashed password storage scheme. If the database is compromised, the
attacker is in possession of the hash of each user's password, not the
passwords themselves. By virtue of property 2 in our list of hash function
properties, retrieving the actual password from this hash is very, very hard.
For an ideally secure hash function, the attacker's only option is continually
compute the hash of random guessed passwords until a matching hash is found.
This sort of exhaustive searching - or "brute force" attack - is supposed to
be outside the realm of feasibility. Whether it is or not can be questionable,
as we shall see later, but it certainly provides more security than the naive
scheme.

Problems with Hashed Password Storage

Hashed password storage is a definite step up from naive password storage, but
it is not without problems itself. Some problems are discussed in this section.
In particular, the first two problems we discuss are security vulnerabilities
which are sufficient reason not to use a plain hashed password storage scheme in
production.

Vulnerability to "Rainbow tables"

The theoretical ideal which gives hashed password storage an advantage is that
if an attacker only has a hash of a user's password, the only way to recover the
actual password is to repeatedly hash random guess after random guess until a
match is found. This is supposed to be too impractical to ever succeed.

The unfortunate reality is that the vast majority of users choose passwords
which make the search for a matching hash substantially easier. A large
proportion of passwords are short (8 characters or under), contain only
lowercase letters (no punctuation or digits) and are either ordinary words one
would find in a dictionary for the user's native language or common names in
that language. Thus, rather than having to hash every possible password, an
attacker needs only to hash a relatively small subset of all possible passwords
before having a good chance of finding at least once match.

Large tables which contain the hashes of a range of likely passwords are called
"rainbow tables" (some purists will insist that this usage of the term rainbow
table is a simplification of the actual meaning of the term. They're right,
and you can see Wikipedia for the
full details, but this approximation of the phrase is sufficient for our
purposes), and are a traded commodity amongst those in the business of
breaking security systems. Reportedly, large rainbow tables can be bought on
CDs or DVDs at the right kinds of conventions. Certainly, online rainbow table
services and projects are around - see, e.g.,
www.freerainbowtables.com,
The Shmoo Group or
Project Rainbow Crack.

So, while hashing passwords certainly increases the difficulty of breaking a
password, it is insufficient by itself to stop a determined and resourceful
attacker. This unfortunate situation can be avoided by "salting" passwords,
as discussed in the next section.

Identifiability of Identical Passwords

One less serious problem with hashed password storage is that if two different
users have the same password, then the same hashes will be present in the
database. If the database is stolen by someone who has an account themselves
(or the ability to get on-demand access to the database is obtained by somebody
who can subsequently create an account), then they may find the hash that
corresponds to their (known) password simply by searching for their username.
They can then search for other usernames which have the same hash, and log in
to any such accounts using their own password.

Complication of forgotten password recovery

Something else worth noting about hashed password storage schemes is that they
somewhat complicate the task of dealing with forgotten passwords. If a user
forgets their password for some application which uses hashed password storage,
the administrator of that application is powerless to recover that password -
all they or anybody else has access to is the hash.

Thus, using a hashed password storage scheme means you need to implement a
more sophisticated password recovery scheme. Of course, the scheme doesn't
really need to be complicated. A common solution is to generate a random
replacement password (storing its hash!) and then email that password to the
user. The user can log in with the random password once and then change it
to one of their choosing.

Other considerations

It is worth noting that all hash functions which will be available as existing
libraries in common programming languages will have been designed for speed.
This is because hash functions are often used to ensure data integrity: if you
download a large file (say an .iso), you can make sure that the download was not
corrupted by computing the hash of what you downloaded and comparing it to the
hash of a known good copy, published on the download website. In this sort of
application, one wants to compute the hash relatively quickly.

When using hash functions to secure passwords, however, speed is not desirable
at all. The faster a hash function is, the faster an attacker can compute
rainbow tables. Furthermore, a slow hash function is not really a problem in
this application. Slowing a hash function down by a factor of ten might mean
that logging in takes 0.1 seconds instead of 0.01 seconds for a legitimate user
who knows the correct password (no big deal), but that brute forcing a single
password takes an attacker 10 years instead of 1 year (certainly a big deal for
the attacker). In light of this, some applications iterate a standard hash
function like MD5 a few times over and store the result of that - i.e., they
store the hash of the hash of the hash of the...etc., say 10 levels deep.

Salting Passwords

The final password storage scheme we consider in this document might be called
a salted and hashed password storage scheme. This is more secure than both
naive password storage and hashed password storage. It is not a new idea -
Unix operating systems have been storing user passwords on the local hard drive
in exactly this way for literally decades. However, astonishingly few
programmers who implement password storage systems are aware of this sort of
scheme. Even those who are occasionally don't understand all of its details.

If you are responsible for developing a password storage system from scratch,
you should use the scheme described in this section. Anything less is taking
an unnecessary risk, and should things go wrong your efforts will make you look
negligent at best.

Hashed and Salted Password Storage

The idea behind salting passwords is that, instead of hashing the user's
password verbatim, we add a small quantity of random "salt" to the end of the
password and then compute and store the hash of this combination. We also store
the salt by itself. When a user attempts to authenticate themselves using their
password, we append the stored salt to the offered password and compare the
resulting hash against that in the database.

The password verification process is:

Take the offered password,

Look up the username's corresponding salt string,

Append the salt string to the offered password and compute the hash of the
result,

Look up the stored hash of the user's salted password,

Compare the hash computed in step 3 with the stored hash from step 4 and
verify the password if and only if these match.
The password database looks something like this:

Why do it like this?

The advantage to salting is that it defeats attackers using pre-computed tables
of the hashes of easy passwords. Even if users choose passwords like "dog" or
"cat", hashes stored in the database are actually of strings like
"catbHnYP9WFpjqGV3SX". Thus, instead of requiring a pre-computed table of all
passwords of length 8 or less using only lower case characters in order to
recover passwords, an attacker requires a pre-computed table of all passwords
of length at least 16 using lower and upper case letters, together with digits
and possibly even punctuation. The longer the salt and the larger the range of
characters within it, the more effort pre-computed tables take to generate and
thus the more secure the scheme.

Other considerations

One salt for the whole table, or one per user?

One sometimes sees explanations of a hashed and slated password scheme in which
the same random salt is used for the entire database of passwords, instead of
there being a unique random salt per user as I've done here.

While using the same salt for each user will save storage space, it does add
less security. If the database is compromised, an attacker may set about
constructing a rainbow table for the database - this will take as much time as
constructing a rainbow table for an unsalted database of hashed passwords.

A space-time trade-off

A possible variation on randomly generating a salt for each user and storing
that salt in the database is to use the hash of the user's username as the
salt. This saves some storage space, in that there does not need to be a salt
column in the password database, but it makes password checking take slightly
longer as one extra hash needs to be computed. Whether or not this is a good
trade depends on your particular hardware situation. This trick does not
affect the security of the overall scheme in any way.

Doesn't the salt need to be secret?

There seems to be some confusion as to whether or not the random salt(s) need(s)
to be kept secret. The answer is quite definitively that they do not.
Anybody who insists that they must be (and there are plenty of these sorts of
people on the internet) has entirely misunderstood the purpose of salting. In
fact, they haven't really thought things through at all - the salt must be
readable by the password verifying program in order for the scheme to work at
all. So you can put your salts in your database with no obfuscation at all
without affecting the security of the overall scheme in any way.

Secure transmission of passwords

An ever present truth in information security is that any system (be it for
password based authentication or something entirely different) is only as
strong as its weakest link. The relevance of this truth here is that no matter
how securely your passwords are stored in the database, the system is
vulnerable if users submit their passwords for authentication over a public
network - such as the internet - without some sort of preventative measure.

If you're writing a web application where the users authenticate themselves by
POSTing a filled out HTML form (by far the most common way this is done),
unless you have good reason not to you should do the post over a HTTPS
connection, rather than a plain HTTP connection. Otherwise, the password
travels from the end user's personal computer to your web server unencrypted.
Anybody in between the two computers - and that includes, but is not limited to,
people working for your user's ISP and your own ISP - can recover that password
with minimal effort.

Passwords should always be transmitted to web applications over HTTPS. This
includes not just the main log in form, but also pages which handle
functionality like changing passwords.