$H(s,r)=d$ is a function that hashes the secret string $s$ with a salt $r$, and returns a digest $d$.
$r$ may be chosen arbitrarily, and each $r$ yields a different $d$.
For any $d$, $r$ can be recovered with a function $R(d)=r$.

Neither $H$ nor $R$ is reversible, i.e. there is neither an inverse function $H'(d,r)=s$ nor an inverse $R'(r)=d$.
To check that $d$ is a valid hash for the string $s$, a function $V(d,s) \in \{0,1\}$ is given that outputs $1$ only if $H(s,r)=d$, where $r=R(d)$.

Is it possible to compute another valid hash $d'$ for an unknown $s$, knowing only $d$ (and therefore the salt $r$)?
That is, is it possible to find a function $G(d,r')=d'$ such that $V(d',s)=1$ for at least some $r' \neq r$ with $R(d')=r'$?

That is the question from a theoretical point of view.

On the practical side, BCrypt is an implementation of $H$. If I know a valid digest for an unknown secret, can I compute another valid digest? I've heard about a technique called "hash extension attack", but it involves changing the length of the digest, which for BCrypt is fixed. Is BCrypt vulnerable from this point of view? For which cryptographic hash algorithms (all, some, or none) is it possible to have a function like $G$?

Generally once we start talking about crypto with formal logic notation, I punt that over to our sister site that focuses on crypto.
– Jeff Ferland, May 1 '13 at 15:06


@fgrieu I was assuming (wrongly?) that the salt $r$ can always be determined from the digest $d$. For the BCrypt implementations I know, this is true, since the digest is a string composed of the concatenation of the salt and the checksum computed from the secret and the salt. Also, since $d=H(s,r)$, it can be proven that there is a $V'(s,r,d)=V(d,s)$. So if $V'(s,r,d)=1$ only if $H(s,r)=d$, then also $V(d,s)=1$ only if $H(s,r)=d$. I will edit the question if this is confusing.
– Claudio Floreani, May 1 '13 at 18:45

I incorporated the above changes into the question. Also: the restriction of having the same $s$ for $d$ and $d'$ does not match the usual threat model, where an adversary can choose any password she wants; in which case, if she is in a position to inject some $d'$ of her liking, she can simply compute $d'=H(s',r')$ for an $s'$ of her choice, and whatever $r'$ (possibly $r$) she can benefit from.
– fgrieu, May 1 '13 at 21:03

4 Answers

In the first part of your question, you appear to be describing a password hashing scheme. A common (or, at least, commonly recommended) way to construct such schemes is based on a message authentication code (MAC).

Specifically, let $\operatorname{MAC}_K(m)$ be a message authentication code with key $K$ and message $m$, and let $H(s,r) = d = (r, c)$, where $c = \operatorname{MAC}_s(r)$ and the tuple $(r, c)$ is encoded as a string in some non-ambiguous way.
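As a concrete, hypothetical instance of this construction, here is a minimal Python sketch that uses HMAC-SHA256 from the standard library as $\operatorname{MAC}$ and keeps $d$ as a Python tuple instead of a string encoding. Plain HMAC is fast, so a real password hash would want a deliberately slow function; the names `H`, `R` and `V` only mirror the question's notation.

```python
import hashlib
import hmac
import secrets


def mac(key: bytes, message: bytes) -> bytes:
    # MAC_K(m) instantiated as HMAC-SHA256 (an assumption of this sketch).
    return hmac.new(key, message, hashlib.sha256).digest()


def H(s: bytes, r: bytes) -> tuple:
    # d = (r, c) with c = MAC_s(r): the password s is the MAC key, the salt r the message.
    return (r, mac(s, r))


def R(d: tuple) -> bytes:
    # The salt is stored in the clear, so recovering it is trivial.
    return d[0]


def V(d: tuple, s: bytes) -> int:
    r, c = d
    # Constant-time comparison of the recomputed tag with the stored one.
    return int(hmac.compare_digest(mac(s, r), c))


d = H(b"hunter2", secrets.token_bytes(16))
```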

The standard security property demanded of MACs is resistance to existential forgery under chosen-message attacks: that is, even if an attacker is allowed to request $\operatorname{MAC}_K(m)$ for arbitrarily many messages $m$ (up to some reasonable bounds of computational feasibility) of their own choosing, they must not be able to forge a valid MAC (with a probability significantly higher than just by random guessing) for any message whose MAC they haven't requested from the holder of the secret key $K$.

It's easy to see that, with $H$ defined as above, this security property implies that an attacker cannot produce new values $r'$ and $d'$, such that $H(s,r') = d'$, without knowing $s$: to do so, they would have to produce a $c' = \operatorname{MAC}_s(r')$ for some $r'$ whose MAC they don't already know, thus violating the assumption that the MAC is secure against existential forgery.

Generally, we cannot prove that any given MAC is secure in this sense (just as we generally cannot prove the security of any cryptographic primitive, except for some trivial ones like the one-time pad), but there are plenty of MAC functions that have withstood considerable cryptanalytic attention and are generally believed to be secure. Also, what we can, in some cases, do is reduce the assumption about the security of the MAC to an assumption about the security of some other cryptographic primitive, just as we reduced the security of $H$ above to that of the MAC used to construct it.

For example, HMAC, instantiated with a secure hash function, can be proven to be a secure MAC (and, in fact, a PRF), provided that the hash function satisfies some technical assumptions. Of more relevance to password hashing, the PBKDF2 key derivation function can be shown to be resistant to existential forgery, provided that it is instantiated with a secure PRF (such as HMAC). Also, since the scrypt key derivation function uses PBKDF2-HMAC-SHA256 for its initial and final passes, I believe it can also be shown to be existentially unforgeable as long as the SHA-256 hash function satisfies the security assumptions of HMAC.
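To make the dependence of PBKDF2 on HMAC concrete, the following Python sketch recomputes the first output block of PBKDF2-HMAC-SHA256 by hand, following the PKCS #5 definition ($U_1 = \operatorname{HMAC}_P(S|\operatorname{INT}(1))$, $U_j = \operatorname{HMAC}_P(U_{j-1})$, $T_1 = U_1 \oplus \dots \oplus U_c$), and compares it with `hashlib.pbkdf2_hmac`. It only covers the first 32-byte block, which is what `hashlib` returns by default for SHA-256; the function name is made up for the example.

```python
import hashlib
import hmac


def pbkdf2_sha256_block1(password: bytes, salt: bytes, iterations: int) -> bytes:
    """First output block T_1 of PBKDF2 with HMAC-SHA256 as the PRF."""
    # U_1 = HMAC_P(S | INT(1)), where INT(1) is the block index as 4 big-endian bytes.
    u = hmac.new(password, salt + (1).to_bytes(4, "big"), hashlib.sha256).digest()
    t = u
    for _ in range(iterations - 1):
        # U_j = HMAC_P(U_{j-1}); T_1 accumulates the XOR of all U_j.
        u = hmac.new(password, u, hashlib.sha256).digest()
        t = bytes(x ^ y for x, y in zip(t, u))
    return t
```

Every byte of the output is built from HMAC calls keyed with the password, which is why a security proof for HMAC carries over.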

As for bcrypt, of course its security also cannot be proved, and AFAIK it cannot be directly reduced to that of any other primitive either (except trivially to EksBlowfish, of course). However, neither (AFAIK) has anyone disproved the security claims asserted in the original bcrypt paper. These claims don't directly mention existential unforgeability, but as far as I can tell at a glance, the "$\epsilon$-security" property asserted in the paper does effectively imply it.

Let's get terminology right. If you talk of "unknown s" then s is not a salt; when some piece of data is secret, we call it a key. And your "hash function" is then a MAC. In the context of "password hashing", such things are sometimes called "peppering" (as always, technical terminology is, at its core, a collection of bad puns).

If your MAC is correct (i.e. "cryptographically secure"), then no, it is not possible to produce new pairs message+hash value without knowing the key; being able to do that would be called a forgery. The length extension attack is a forgery attack on some poorly-designed MAC algorithms, which work by simply hashing the concatenation of the key and the message, with a Merkle-Damgård hash function like SHA-256. Don't take this the wrong way: the "length extension attack" does not mean that SHA-256 is cryptographically weak as a hash function; it just means that making a MAC out of a hash function is not as immediate as it initially seems to be. The proper way to turn a hash function into a MAC is HMAC.

Bcrypt is not a MAC, and, arguably, not a hash function either, since it does not accept as input a sequence of bits of arbitrary length. The salt in bcrypt is not secret at all; it is often encoded in the bcrypt output; moreover, if you are using the salt as a secret for some passwords, then you are reusing the same salt value for several hashed passwords, and that's very wrong. The whole point of the salt is to be distinct for every instance (every user has his salt; when the user changes his password, a new salt must be generated).
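That the salt is public can be seen directly in the usual bcrypt string format. The sketch below assumes the modular-crypt layout produced by common implementations (a `$2a$`, `$2b$` or `$2y$` prefix, a two-digit cost, then 22 characters of encoded salt followed by 31 characters of checksum); in the question's notation, the hypothetical `split_bcrypt` plays the role of $R(d)$. It only parses the string and verifies nothing.

```python
import re

# $<version>$<2-digit cost>$<22 chars of salt><31 chars of checksum>
_BCRYPT_RE = re.compile(r"^\$(2[abxy]?)\$(\d{2})\$([./A-Za-z0-9]{22})([./A-Za-z0-9]{31})$")


def split_bcrypt(d: str):
    """Split a bcrypt string into (version, cost, salt, checksum)."""
    m = _BCRYPT_RE.match(d)
    if m is None:
        raise ValueError("not a bcrypt string")
    version, cost, salt, checksum = m.groups()
    return version, int(cost), salt, checksum
```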

Edit: it has been brought to my attention that I somewhat misread the question: the salt is known to be public. The question is: can we transform a hash for some secret password with some salt value into another hash for the same secret password, but with another salt value?

If such a thing is feasible and the new salt value can be arbitrarily chosen, then this would be a severe flaw in the hashing function, because it would allow the attacker to transform a bunch of hashed passwords into another bunch of hashes of the same passwords but all with the same salt, thereby nullifying the advantage of the salt: parallel and precomputed attacks are again possible. Even if the attacker could only turn the salt into another "random" salt, this could be used to make salt collisions more probable. No such flaw is currently known for bcrypt.
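The cost argument can be sketched with a toy dictionary attack in Python. Assumptions of the sketch: PBKDF2-HMAC-SHA256 with a deliberately low iteration count stands in for the password hash, and the attacker caches every (salt, guess) evaluation. With one shared salt, each guess is hashed once for all accounts; with distinct salts, the work multiplies by the number of accounts.

```python
import hashlib


def slow_hash(password: bytes, salt: bytes) -> bytes:
    # Stand-in password hash; 1000 iterations keeps the demo fast.
    return hashlib.pbkdf2_hmac("sha256", password, salt, 1000)


def dictionary_attack(records, wordlist):
    """records: list of (salt, digest) pairs.
    Returns ({record index: password}, number of slow_hash evaluations)."""
    cache = {}  # (salt, guess) -> digest, shared across all records
    cracked = {}
    for index, (salt, digest) in enumerate(records):
        for guess in wordlist:
            if (salt, guess) not in cache:
                cache[(salt, guess)] = slow_hash(guess, salt)
            if cache[(salt, guess)] == digest:
                cracked[index] = guess
    return cracked, len(cache)


words = [b"123456", b"password", b"letmein", b"dragon"]
passwords = [b"password", b"dragon", b"letmein"]
shared = [(b"S" * 16, slow_hash(pw, b"S" * 16)) for pw in passwords]
distinct = [(bytes([i]) * 16, slow_hash(pw, bytes([i]) * 16)) for i, pw in enumerate(passwords)]
```

Running the attack on `shared` costs one evaluation per guess (4), while `distinct` costs one per guess per account (12).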

"If you talk of 'unknown s' then s is not a salt" Yes, that's what the post says; r is the salt and s is the message.
– raybritton, May 1 '13 at 13:55

"Let's get terminology right. If you talk of 'unknown s' then s is not a salt" Yes, in fact I said that $s$ is the secret, not the salt. The salt is $r$. BCrypt takes $s$ and $r$ and outputs a digest $d$ (where $r$ is a substring of $d$). The point is not whether $r$ is secret (it obviously is not) or whether I'm using it as a secret. $r$ is there so that different values of $r$ produce different digests for the same secret. The question is: if someone knows a valid digest $d$ (and therefore also knows the corresponding salt $r$, at least for BCrypt), can he compute another valid digest $d'$ (i.e. using another salt $r'$) for the same secret?
– Claudio Floreani, May 1 '13 at 14:02

OT: "Bcrypt is not a MAC, and, arguably, not a hash function either" BCrypt involves both hashing and a checksum, but that was not the point either. I only say that BCrypt can be viewed as a practical implementation of a function like the $H$ I described.
– Claudio Floreani, May 1 '13 at 14:04

OT: "If your MAC is correct (i.e. 'cryptographically secure'), then no, it is not possible to produce new pairs message+hash value without knowing the key." It may be more or less computationally feasible to find a collision, but this is possible and has been proved for most MAC algorithms.
– Claudio Floreani, May 1 '13 at 14:05

First, separate the idea of "salt" from "hash". Salting is no more than a process applied to the message in a known way, such as appending the salt value to the end of the original data, yielding a "salted" message that differs from the original message. The hash algorithm is then performed using the salted message as input, yielding a digest value. A salt is typically a unique, random value added to a message ensuring that two otherwise identical messages will yield differing digests. This form of salting can be useful for ensuring that two identical passwords do not yield the same digest value, preventing what are called "rainbow table" attacks.
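A minimal Python sketch of salting by appending, assuming SHA-256 as the hash. Fast hashes like this are not suitable for real password storage; the point here is only the effect of the salt, and `salted_digest` is a hypothetical helper.

```python
import hashlib
import secrets


def salted_digest(message: bytes, salt: bytes) -> bytes:
    # Append the salt to the message, then hash the salted message.
    return hashlib.sha256(message + salt).digest()


salt_alice = secrets.token_bytes(16)
salt_bob = secrets.token_bytes(16)
# Same password, different salts: the stored digests differ.
alice = salted_digest(b"letmein", salt_alice)
bob = salted_digest(b"letmein", salt_bob)
```

Recomputing with the stored salt reproduces the stored digest, which is all a verifier needs.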

I assume that by "valid hash" you mean to run the hash algorithm a second time, yielding an identical digest to a previously computed digest.

Setting aside the concept of salt for a moment, I believe what you are asking for is if two different messages can yield identical digest values. This is called a "collision", and cryptographically secure hash algorithms must have "collision resistance" as one of their critical attributes. Without that attribute, someone could change a message undetectably, defeating the entire purpose of the hash algorithm.

Such a function is not known to exist for the currently recommended cryptographic hash algorithms, such as SHA-256 (practical collisions are, however, known for MD5). One theoretically COULD exist for a non-cryptographic hash algorithm, such as CRC32, Jenkins Hash, or others, but I could not provide an example.
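One structural reason CRC32 offers no such protection can be checked in a few lines of Python: for messages of equal length, CRC32 is affine over GF(2), so the checksum of $a \oplus b \oplus c$ follows from the three individual checksums without processing the combined message. The inputs below are arbitrary; for a cryptographic hash such as SHA-256 the same identity is not expected to hold.

```python
import hashlib
import zlib


def xor_bytes(*parts: bytes) -> bytes:
    # Bytewise XOR of equal-length messages.
    if len({len(p) for p in parts}) != 1:
        raise ValueError("messages must have equal length")
    out = bytearray(len(parts[0]))
    for p in parts:
        for i, byte in enumerate(p):
            out[i] ^= byte
    return bytes(out)


def sha256_int(m: bytes) -> int:
    return int.from_bytes(hashlib.sha256(m).digest(), "big")


a, b, c = b"alpha" * 8, b"bravo" * 8, b"delta" * 8
combined = xor_bytes(a, b, c)
# Predicted from the three checksums alone, without hashing `combined`.
predicted = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
```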

"I assume that by 'valid hash' you mean to run the hash algorithm a second time, yielding an identical digest to a previously computed digest." Right. "I believe what you are asking for is if two different messages can yield identical digest values." No, I'm really asking whether, knowing a digest (and its salt) of a secret, it is possible to compute another digest of the same secret. That is not a collision, since if I know the secret I can calculate a different valid digest for each salt I choose.
– Claudio Floreani, May 1 '13 at 21:17

I'm still guessing at your intent, but it seems like you want to create parallel systems that can yield different digests given the same inputs. Some implementations use a unique salt as the differentiator, by taking the hash of message M+"system1", M+"system2", etc. Otherwise you could use different hash algorithms, but that would be complex, confining, and inflexible.
– John Deters, May 2 '13 at 14:59

If I understand the question correctly, you have an unknown value $s$, and known values $d$ and $r$, such that, for some one-way function $H$, $H(s,r) = d$. You want to find both a function $G:\{0,1\}^*\times\{0,1\}^*\to \{0,1\}^*$ and a function $V:\{0,1\}^*\times\{0,1\}^*\to \{0,1\}$ such that for any $x$,

$V(G(d,r'),s) = V(d,s) = 1$.

Technically, at the expense of allowing the digest value to grow with each $G$ call, you could implement functions with such properties like this:

Let $F$ be a cryptographically secure one-way hash function with a fixed output length. Define $H(s,r)$ such that:

Parse $r$ as a bit string $r_1 \ldots r_m$.

Let $h_0 = s$.

For each $1 \le i \le m$, let $h_i = F(r_i|h_{i-1})$.

Output $d = r|h_m$.

Now define $G(d,r')$ such that:

Parse $d$ as $r|h_m$.

Parse $r'$ as a bit string $r'_{m+1} \ldots r'_{m+k}$.

For each $m+1 \le i \le m+k$, let $h_i = F(r'_i|h_{i-1})$.

Output $d' = r|r'|h_{m+k}$.

Lastly, define $V(d,s)$ such that:

Parse $d$ as $r|h$.

If $H(s,r) = d$, output $1$, otherwise $0$.
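The three definitions above can be sketched in Python as follows. Two simplifications are assumptions of this sketch: SHA-256 stands in for $F$, and the salt is consumed one byte at a time instead of one bit at a time (the chaining argument is the same). It also assumes a non-empty salt, since with $m=0$ the output would be $r|s$ and reveal the secret.

```python
import hashlib
import hmac

DIGEST_LEN = 32  # fixed output length of F (SHA-256 here)


def F(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _chain(h: bytes, salt: bytes) -> bytes:
    # h_i = F(r_i | h_{i-1}), one salt byte per step.
    for byte in salt:
        h = F(bytes([byte]) + h)
    return h


def H(s: bytes, r: bytes) -> bytes:
    # h_0 = s, output d = r | h_m.
    return r + _chain(s, r)


def G(d: bytes, r_new: bytes) -> bytes:
    # Parse d as r | h_m and keep chaining from h_m: no knowledge of s needed.
    r, h = d[:-DIGEST_LEN], d[-DIGEST_LEN:]
    return r + r_new + _chain(h, r_new)


def V(d: bytes, s: bytes) -> int:
    # Parse d as r | h and recompute H(s, r).
    r = d[:-DIGEST_LEN]
    return int(hmac.compare_digest(H(s, r), d))
```

`G` only uses the public `d`, yet `V(G(d, r_new), s)` returns `1`, at the cost of a longer digest.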

Edit:

"I've heard about a technique called 'hash extension attack', but this involves changing the length of the digest, that for BCrypt is fixed."

It should be noted that what I outline above will change the length of the digest, but that a hash extension attack won't, as explained by Thomas Pornin in his answer.

Also note that the above method can't be used on BCrypt, due to the way BCrypt mixes the salt with the input string. However, I am not sure that all versions and implementations of BCrypt actually fix the output digest length. A fixed salt length of 16 octets seems common, but the algorithm outlined in the Wikipedia article (which, with varying deviations, corresponds to the most common implementations) would theoretically work with salt lengths up to 1024 octets.

I think that the remaining $x$ was an artifact, now removed, of a partial correction that I made to the question following comments by the author (one of which is now deleted). Please see the reworded question, and accept my apologies for the improper edit.
– fgrieu, May 2 '13 at 4:48

@fgrieu: I see. Well, calling the new salt $x$ or $r'$ shouldn't technically matter. What would make my answer invalid would be a restriction on the length of the salt part $r$ of $d$, or a restriction that if $d \neq d'$, then there doesn't exist a sub-exponential function $f$ that would allow the derivation of $r$ from $d'$ or $r'$ from $d$.
– Henrick Hellström, May 2 '13 at 6:11

@HenrickHellström This is a good answer that shows a way to artificially construct $H$, $G$, and $V$ that allow computing, for any $d$, a $d'$ such that $V(d,s)=V(d',s)$.
– Claudio Floreani, May 2 '13 at 20:36

Can you prove that if $H(s,r)=d$ returns a different digest $d$ for each $r$ and $R(d)=r$, so that there are $d_1(r_1), d_2(r_2), d_3(r_3)$ possible valid hashes for the same secret $s$, it is generally impossible to compute a valid $d_x(r_x)$ from (one or more) $d_n(r_n)$?
– Claudio Floreani, May 2 '13 at 20:47

@ClaudioFloreani: The opposite: with the algorithms in my answer, anyone can take a valid $d = r|h$ value and compute a new $d' = r|r'|h'$ value. If you want someone to be able to do this without any of the secret information involved in the original operation, anyone can do it. OTOH, if you want someone to be able to compute a new hash without all of the secret information, you need additional secret information. That changes your question completely.
– Henrick Hellström, May 3 '13 at 5:03