I understand it is important to hash passwords over multiple iterations to make things harder for an attacker. I have read numerous times that when processing these iterations, it is critical to hash not only the result of the previous hashing, but also append the original salt each time.
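
For concreteness, the two iteration styles being compared might be sketched as follows. This is only an illustration, not code from the original post: it assumes Python's `hashlib`, assumes the first step is `sha512(salt + password)`, and the function names and the parameter `n` are hypothetical.

```python
import hashlib


def iterate_with_salt(password: bytes, salt: bytes, n: int) -> bytes:
    """n hashes in total, re-appending the salt at every iteration."""
    h = hashlib.sha512(salt + password).digest()
    for _ in range(n - 1):
        h = hashlib.sha512(salt + h).digest()
    return h


def iterate_hash_only(password: bytes, salt: bytes, n: int) -> bytes:
    """n hashes in total, feeding back only the previous output."""
    h = hashlib.sha512(salt + password).digest()
    for _ in range(n - 1):
        h = hashlib.sha512(h).digest()
    return h
```

Both functions agree for `n = 1` and diverge from the second iteration on.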

Based on Paülo's explanation it seems that adding the original password is far better: if a different password produces the same result in an early iteration, the two would likely produce different results in the next iteration. Thus, if you iterated using hash = sha512(password + hash), the only way another password would collide is if the results were equal on the last iteration. It seems the likelihood of collision would then be identical to that of one iteration of sha512(salt + password). Even so, is it still better to iterate, to make things harder for an attacker through computational cost?
– jcoop, Jan 9 '14 at 0:04

@StephenTouset Why does everyone rant about "this is the best algorithm," "this is how you need to create the salt," "this is how you need to append the salt," and in your case "don't do it yourself"????? I'm only curious about the math and theory here, not what you think I should or shouldn't be doing. If you have anything useful to share in that regard, I will listen.
– jcoop, Jan 9 '14 at 0:22

Because so many people don't heed that advice, roll their own, get it wrong, and their users suffer the inevitable consequences. If you are only asking for educational purposes, that is perfectly fine and to be encouraged. On the other hand, if you are actually attempting the above for production code, it's the only opportunity to convince you to divert course before making a costly mistake (both to your users and potentially yourself as well).
– Stephen Touset, Jan 9 '14 at 1:09

There is essentially no security difference between your two suggestions (or PBKDF2) as long as the hash function is good and has a large enough output size (SHA-512 fulfills both requirements). One ugly property is that you don't cleanly separate salt and password. And of course it can't compete with the current generation of memory hard password hashes like scrypt.
– CodesInChaos, Jan 9 '14 at 10:11

2 Answers

Edit: I missed a detail in the original question when writing the below. I compared the effects of including the salt in each iteration to including the password, but the original question asked about including the salt versus only iterating on the previous hash output. Mea culpa. My link to Thomas Pornin's answer to a related question contains an explanation for a roughly equivalent measure in PBKDF2, where the purpose is to defeat short cycles.

(full disclosure: I am one of the authors submitting an entry to a competition for helping determine an industry-standard password hashing algorithm, so I am reasonably familiar with, although certainly not an expert on, the current state of the art; the following is my opinion which follows from my familiarity, but I cannot and do not make any concrete claims to its correctness)

It is my suspicion that including the salt in the algorithm you have provided (as opposed to re-including the password) does not meaningfully affect any of its security properties.

Of the two algorithms I mentioned in my above comment that share a similar iterated-hashing approach (PBKDF2 and scrypt, which have both stood the test of some cryptanalysis), neither of them includes the salt of the password in their hashing iterations. My own entry to the password hashing competition (which to be fair, has received essentially zero cryptanalysis) likewise does not include the salt in its iterated hashing. Of the other proposals on the competition mailing list, I'm not aware of any others that do either (also to be fair, I have not extensively scrutinized all of them).
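
As an illustration of where the salt enters PBKDF2, here is a minimal sketch of its first output block, checked against Python's built-in `hashlib.pbkdf2_hmac`. The function name `pbkdf2_first_block` is hypothetical, and the sketch covers only the case where the derived key fits in a single HMAC-SHA-512 output. Note that the salt appears only in the first HMAC call, while the password serves as the HMAC key in every call.

```python
import hashlib
import hmac


def pbkdf2_first_block(password: bytes, salt: bytes, iterations: int) -> bytes:
    """First (and here only) 64-byte block of PBKDF2-HMAC-SHA-512."""
    # U_1 = HMAC(password, salt || INT(1)); the salt is used here and nowhere else.
    u = hmac.new(password, salt + (1).to_bytes(4, "big"), hashlib.sha512).digest()
    t = int.from_bytes(u, "big")
    for _ in range(iterations - 1):
        # U_j = HMAC(password, U_{j-1}); only the previous output is fed back.
        u = hmac.new(password, u, hashlib.sha512).digest()
        t ^= int.from_bytes(u, "big")  # the block is the XOR of all U_j
    return t.to_bytes(64, "big")


reference = hashlib.pbkdf2_hmac("sha512", b"password", b"salt", 1000)
assert pbkdf2_first_block(b"password", b"salt", 1000) == reference
```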

The purpose of a "slow" password hashing algorithm is to increase the asymptotic cost of attempting to verify a password guess. Chiefly this cost can be increased by requiring the attacker to expend more CPU time or contain more data in memory. Other recent proposals have also brought into consideration approaches such as requiring an attacker to have large amounts of memory bandwidth.

Of these, I can think of no way in which including the salt for each of those iterations meaningfully impacts the amount of resources an attacker would need to expend in order to test the validity of a password.

That said, the algorithm shown is not a particularly good way to protect users' passwords. It is certainly better than an individual hash, but there are critical weaknesses with the approach that existing algorithms already mitigate. The two most obvious to me are:

SHA-512 is much faster in dedicated hardware than in software (giving attackers with GPU acceleration a massive advantage over typical defenders), and

this algorithm requires a constant, negligible amount of memory, which allows it to be massively parallelized (again giving GPU-accelerated attackers a massive advantage)

These are not the only weaknesses. There are others (some of which are likely more theoretical than practical, like avoiding hypothetical short cyclic sequences), but those are two that I simply considered relatively straightforward to identify and explain.

So I understand why including the salt would not do anything beneficial, but why wouldn't including the original password be beneficial? It seems that, as I described above in my comments, this would ensure the only chance of collision is on the final iteration... You said most algorithms wouldn't include the salt or password in their hash iterations. What do they do instead? And why would it be superior to using the password?
– jcoop, Jan 9 '14 at 18:32

Given that SHA-512 is used, there is no practical benefit to iterating hash = sha512(salt + hash) compared to iterating just hash = sha512(hash). For some parameters, it even weakens the scheme by a factor of nearly 2 against the attack that most matters: guessing the password.

Let's first justify the weakening. Assume salt is 125 bytes. salt + hash is 1512 bits long, and two SHA-512 rounds (each hashing 1024 bits) will be used by the legitimate user to compute hash = sha512(salt + hash), rather than one round for hash = sha512(hash). On the other hand, the adversary trying millions of passwords can pre-compute the result of one SHA-512 round for each of the $2^{24}$ 1024-bit strings starting in salt, then replace the first round of each computation of hash = sha512(salt + hash) by a lookup in that table (1 GiB at 64 bytes per entry). The ill-advised idea of hashing salt thus costs legitimate users about a factor of two in the number of iterations achievable at constant effort/time, while being only a marginal annoyance to attackers.
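
The arithmetic behind this can be checked with a short sketch. It assumes the standard SHA-512 layout (128-byte blocks, padding of at least one marker byte plus a 16-byte length field); the helper name `sha512_blocks` is hypothetical.

```python
BLOCK_BYTES = 128           # SHA-512 processes 1024-bit blocks
DIGEST_BYTES = 64           # 512-bit output / chaining value
PADDING_MIN_BYTES = 1 + 16  # 0x80 marker byte + 128-bit message length


def sha512_blocks(message_bytes: int) -> int:
    """Number of 1024-bit blocks SHA-512 processes for a message of this size."""
    return -(-(message_bytes + PADDING_MIN_BYTES) // BLOCK_BYTES)  # ceiling division


salt_bytes = 125
blocks_with_salt = sha512_blocks(salt_bytes + DIGEST_BYTES)  # sha512(salt + hash)
blocks_hash_only = sha512_blocks(DIGEST_BYTES)               # sha512(hash)

# Bits of the first block not fixed by the salt, hence the number of table entries.
free_bits = 8 * (BLOCK_BYTES - salt_bytes)
table_bytes = (1 << free_bits) * DIGEST_BYTES  # one 64-byte chaining value per entry

print(blocks_with_salt, blocks_hash_only, free_bits, table_bytes)
```

With a 125-byte salt this gives two blocks versus one, 24 free bits in the first block, and a table of $2^{24}$ 64-byte entries, i.e. $2^{30}$ bytes.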

In theory, when using some unspecified hash function H, it is a reasonable idea to perform something like

A rationale behind hashing i is that it makes it highly implausible that a short cycle could be reached in the iteration, regardless of considerations on the width of hash. Without this precaution, if hash is $b$-bit, the odds of entering a cycle on or before $n$ iterations are about $n\cdot(n+1)/2^{b+1}$ (when $n\ll2^{b/2}$).
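
To get a feel for the magnitudes, the estimate $n\cdot(n+1)/2^{b+1}$ can be evaluated in log scale. This is a small sketch; the function name `log2_cycle_odds` and the sample values of $n$ and $b$ are chosen only for illustration, and the estimate is meaningful only while $n\ll2^{b/2}$.

```python
import math


def log2_cycle_odds(n: int, b: int) -> float:
    """log2 of n*(n+1)/2**(b+1), the approximate odds of a cycle within n iterations."""
    return math.log2(n) + math.log2(n + 1) - (b + 1)


n = 2**30
for b in (64, 128, 512):
    print(f"b = {b:3d}: odds are about 2^{log2_cycle_odds(n, b):.1f}")
```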

A rationale behind hashing salt or/and password or/and i is that it makes it impossible for an adversary to perform a precomputation that would be conceivable for iterated hash = H(hash) if the adversary can hope to ever perform next to $2^b$ hashes and store next to $2^b/n$ values. Assuming that, an attacker could pre-compute the result of $n/2$ iterations for values of hash less than $2^{b+1}/n$. During the normal computation for a given salt + password, there is a good chance that such a low value of hash is reached, at which point the precomputed table can be used as a speedup. Edit: it seems quite likely that the number of hashes or/and the memory necessary can be greatly improved, perhaps even to $O(2^{b/2})$ hashes or $O(2^{b/2}/n)$ values, though I can't figure how for now.

The above two things could be an issue when $b=128$ (e.g. when using MD5); but are a complete non-issue when $b=512$ (our situation since SHA-512 is used).

A rationale behind putting hash first is that it avoids the pitfall we first studied, when H is an iterated hash function starting with the beginning of the message, as most practical hash functions are.
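
Putting the three rationales above together (hash first, then salt and/or password, then the counter i), one possible shape of such an iteration is sketched below. This is an assumption-laden illustration rather than a construction given in the answer: the order of the fields after hash, the 8-byte big-endian encoding of i, and the name `iterated_hash` are all hypothetical, and SHA-512 merely stands in for the unspecified H.

```python
import hashlib


def iterated_hash(password: bytes, salt: bytes, n: int, H=hashlib.sha512) -> bytes:
    """Illustrative iteration: hash = H(hash + salt + password + i) for i = 1..n."""
    h = H(salt + password).digest()
    for i in range(1, n + 1):
        # hash goes first, so the leading block differs at every iteration and
        # cannot be tabulated per salt; the counter i rules out short cycles.
        h = H(h + salt + password + i.to_bytes(8, "big")).digest()
    return h
```

A fixed-width counter is used so that the encoding of i never changes length between iterations.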

Update following comment: when the width $b$ of the hash is small and the number of iterations $n$ is high, a rationale behind not iterating hash = H(hash + password) (but rather including salt and i in the mix) could be that, in the former case, an adversary might get some advantage from a strategy where she performs some precomputation for each plausible (common) value of password. That precomputation could give her a sizable advantage in recognizing password from the final value of hash, especially if many pairs (salt, full final hash) are available.

As an example of such a strategy, assume that $b=64$ and $n=2^{30}$. For any given password, a sizable fraction of salt values are such that a cycle is entered during the legitimate computation, and a powerful adversary can tabulate, independently of salt, a sizable fraction of the lower values of hash reached in such cycles. Then, for each final value of hash at hand, and each password, the adversary can iterate perhaps $2^{26}$ times, test if any of the (smaller) values reached is in the precomputed table, and in the affirmative make a full test for this password. Odds of recognizing a (salt,password) pair for a given effort are improved compared to pure brute force (at least when the number of passwords tested is such that a few are among the (salt,password) pairs, and neglecting the cost of the precomputation and table lookups; and I guess, with some refined strategies, even accounting for these costs).
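
The cycle structure this strategy relies on can be observed on a toy scale. The sketch below truncates SHA-512 to $b=16$ bits (far smaller than the $b=64$ of the example, purely so it runs instantly) and iterates hash = H(hash + password); the names `step` and `tail_and_cycle` are hypothetical. The step function depends on password but not on salt, which only selects the starting point.

```python
import hashlib

B_BYTES = 2  # toy width: b = 16 bits, an assumption made only for this demo


def step(h: bytes, password: bytes) -> bytes:
    """One iteration of hash = H(hash + password), with H truncated to b bits."""
    return hashlib.sha512(h + password).digest()[:B_BYTES]


def tail_and_cycle(salt: bytes, password: bytes):
    """Return (tail length, cycle length, cycle entry value) for this start."""
    h = hashlib.sha512(salt + password).digest()[:B_BYTES]
    seen = {}  # value -> index of the iteration at which it first appeared
    i = 0
    while h not in seen:
        seen[h] = i
        h = step(h, password)
        i += 1
    tail = seen[h]
    return tail, i - tail, h


for salt in (b"salt-A", b"salt-B", b"salt-C"):
    tail, cycle, entry = tail_and_cycle(salt, b"password")
    print(salt, "tail:", tail, "cycle:", cycle, "entry:", entry.hex())
```

Because only $2^{16}$ values exist, every start must enter a cycle within that many steps.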

Again, using $b=512$ is plenty appropriate to make iterating hash = H(hash) entirely satisfactory (assuming an attack using classical computers). It even seems possible to make a formal reduction from any attack against that to an attack on the hash (to keep the proof simple, it might help to prevent salt + password from having the same size as hash, e.g. by using a 65-byte salt).

And notice that there is at least one excellent reason not to include password in the mix: making it less likely that password could leak by some side channel on a legitimate user's platform, perhaps by a mechanism remotely similar to this.

Final note: the state of the art is not iterating a hash, but rather iterating a function that requires a large (and preferably parameterizable) amount of memory and, as an aside, runs as efficiently as possible on legitimate platforms using commodity multi-core CPUs, which makes dedicated hardware less attractive for the attacker. See scrypt, or bcrypt (still more common, although it lacks parameterizable memory size and use of multiple cores, and seems less close to optimality on commodity CPUs).
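
For reference, a minimal sketch of calling scrypt through Python's standard library is shown below. It assumes a Python build whose `hashlib` exposes `scrypt` (which requires OpenSSL support); the wrapper name `hash_password` is hypothetical and the cost parameters are illustrative values, not a recommendation.

```python
import hashlib
import os


def hash_password(password: bytes, salt: bytes) -> bytes:
    """Derive a 32-byte key with scrypt; n and r together set the memory cost."""
    return hashlib.scrypt(
        password,
        salt=salt,
        n=2**14,                   # CPU/memory cost (must be a power of two)
        r=8,                       # block size; memory use is about 128 * n * r bytes
        p=1,                       # parallelization parameter
        maxmem=64 * 1024 * 1024,   # allow up to 64 MiB for this call
        dklen=32,
    )


salt = os.urandom(16)  # a fresh random salt, stored alongside the derived key
key = hash_password(b"correct horse battery staple", salt)
```

With these parameters each evaluation needs roughly 16 MiB of memory, which is what limits massive parallelization on the attacker's side.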