Assume I want to design a protocol (or data format or similar) that includes a cryptographic hash, and I want it to be as future-proof as possible, i.e. I want to prevent breakthroughs in cryptanalysis from making my old data insecure.

If I use only one hash algorithm, I may be unlucky and pick exactly the one that gets broken a few years later.

For example, MD4 (published 1990) had first collision attacks in 1995, and as of 2007 these are even cheaper than calculating the hash itself.

So, the idea would be to combine multiple such algorithms in a way that breaking just one (or some) of them does not compromise the security of the combined construct - breaking all of them would be necessary.

I do not care about efficiency loss from calculating multiple hashes instead of one, I just want to be sure against better cryptanalysis.

So, if I have hash functions H1, H2 ... Hn, how can I combine them to form a hash function that is at least as secure as the strongest of them?

Two basic ideas, which do not work as intended:

Simply concatenate the outputs: H1(m) || H2(m) || ... || Hn(m).

This should be quite secure against collision attacks, but is only as
secure to preimage attacks as the weakest of them (and the other ones
then can be used to check if I found the right preimage).

Chain the functions: H1(H2(...Hn(m)...))

For preimage attacks, you would now need to break all of them. But a
collision in Hn trivially leads to a collision in the result, too.
(Collisions in the others are less easy to exploit, as you then also
need preimages under the functions applied before them.)
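For concreteness, the two combiners can be sketched in Python. SHA-256 and SHA3-256 stand in for H1 and H2 here; that choice, and the function names, are assumptions made only for illustration.

```python
import hashlib


def h1(m: bytes) -> bytes:
    # Stand-in for H1 (illustrative choice).
    return hashlib.sha256(m).digest()


def h2(m: bytes) -> bytes:
    # Stand-in for H2 (illustrative choice).
    return hashlib.sha3_256(m).digest()


def concat_combiner(m: bytes) -> bytes:
    # H1(m) || H2(m): the output length is the sum of both digest lengths.
    return h1(m) + h2(m)


def chain_combiner(m: bytes) -> bytes:
    # H1(H2(m)): the output length is that of the outer function only.
    return h1(h2(m))


msg = b"example message"
concat_digest = concat_combiner(msg)
chain_digest = chain_combiner(msg)
```

Note that the concatenated digest exposes H1(m) and H2(m) separately, which is why an attacker can work on the weakest component and use the others as a check.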

Any better ways of combining them? Or is this a stupid idea at all?

To clarify: I do not really need more security than the hardest of the component functions, but I want at least that much. (And using more space is not really a problem here.)

As for preimages, I am mainly concerned about someone finding the original preimage (which is about the size of the hash output), and it looks like, with simple concatenation, this will be at most as difficult as for the weakest hash.
I am less concerned about someone constructing a new preimage. That sounds quite difficult, but it is less so if the message can grow as needed (one block per bit of hash output or so) and there are collision attacks on the compression function, as Joux shows.

IMO, this is a bad idea. You have no rational basis to believe that your resulting function is any more secure than any other unbroken hash function that produces an output of the same length. Say, for example, you XOR two hash functions together. How do you know that some future researcher won't find some correlation between the bits of the two hash functions making the XOR weaker than either one alone?
–
David Schwartz, Aug 23 '11 at 12:28

I'm aware of this, which is why I didn't include XOR in my list of proposals. I'm searching other ideas on how to make it safe. (Also, I'm not constrained on "same size".)
–
Paŭlo Ebermann♦, Aug 23 '11 at 12:34

I'm just using XOR as an example. There is nothing special about XOR. My point is, whatever operation you use to combine them could add its own vulnerabilities.
–
David Schwartz, Aug 23 '11 at 12:40

@David: This is not true. Depending on the properties that you are interested in, there are combinations that are provably more secure than the two functions individually. Take pre-image resistance as example. A pre-image for the function $H_1(m)\|H_2(m)$ requires you to find a message $m$ that is a pre-image for both $H_1$ and $H_2$.
–
acerberus, Dec 14 '13 at 16:38

7 Answers

Combining is what SSL/TLS does with MD5 and SHA-1, in its definition of its internal "PRF" (which is actually a Key Derivation Function). For a given hash function, TLS defines a KDF which relies on HMAC which relies on the hash function. Then the KDF is invoked twice, once with MD5 and once with SHA-1, and the results are XORed together. The idea was to resist cryptanalytic breaks in either MD5 or SHA-1. Note that XORing the outputs of two hash functions relies on subtle assumptions. For instance, if I define SHB-256(m) = SHA-256(m) XOR C, for a fixed constant C, then SHB-256 is as good a hash function as SHA-256; but the XOR of both always yields C, which is not good at all for hashing purposes. Hence, the construction in TLS is not really sanctioned by the authority of science (it just happens not to have been broken). TLS-1.2 does not use that combination anymore; it relies on the KDF with a single, configurable hash function, often SHA-256 (which is, in 2011, a smart choice).
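The SHB-256 example can be checked directly. In this small Python sketch the constant `C` and the helper names are arbitrary illustrative choices:

```python
import hashlib

# Arbitrary fixed 32-byte constant C (illustrative value).
C = bytes(range(32))


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Bytewise XOR of two equal-length strings.
    return bytes(x ^ y for x, y in zip(a, b))


def sha256(m: bytes) -> bytes:
    return hashlib.sha256(m).digest()


def shb256(m: bytes) -> bytes:
    # SHB-256(m) = SHA-256(m) XOR C
    return xor_bytes(sha256(m), C)


def xor_combiner(m: bytes) -> bytes:
    # XOR of the two "independent-looking" hash functions.
    return xor_bytes(sha256(m), shb256(m))


# Every message maps to the same output, namely C.
outputs = {xor_combiner(m) for m in (b"", b"a", b"hello", b"x" * 1000)}
```

Here `outputs` contains a single element equal to `C`, regardless of the messages chosen.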

As @PulpSpy points out, concatenation is not a good generic way of building hash functions. This was published by Joux in 2004 and then generalized by Hoch and Shamir in 2006, for a large class of constructions involving iterations and concatenations. But mind the fine print: this is not really about surviving weaknesses in hash functions, but about getting your money's worth. Namely, if you take a hash function with a 128-bit output and another with a 160-bit output, and concatenate the results, then collision resistance will be no worse than the strongest of the two; what Joux showed is that it will not be much better either. With 128+160 = 288 bits of output, you could aim at $2^{144}$ resistance, but Joux's result implies that you will not go beyond about $2^{87}$.
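One rough way to arrive at figures of this size is sketched below. The accounting is an assumption, not something spelled out above: it counts compression-function calls, builds the multicollision in the 128-bit function from 80 successive collisions, and then naively hashes each of the resulting 80-block messages with the 160-bit function.

```python
import math

n1, n2 = 128, 160  # output sizes in bits of the two hash functions

# Generic birthday bound for an ideal hash with n1 + n2 output bits.
generic_bits = (n1 + n2) / 2

# Joux-style attack (assumed accounting):
# 1. Find n2/2 successive collisions in the n1-bit iterated hash, each
#    costing about 2^(n1/2), giving 2^(n2/2) colliding messages.
blocks = n2 // 2
multicollision_cost = blocks * 2 ** (n1 // 2)

# 2. Hash all 2^(n2/2) messages (each `blocks` blocks long) with the
#    n2-bit function and look for a birthday collision among them.
search_cost = blocks * 2 ** (n2 // 2)

joux_bits = math.log2(multicollision_cost + search_cost)
```

Under these assumptions `generic_bits` is 144, while `joux_bits` comes out a little above 86, in line with the rounded figure quoted above.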

So the question becomes: is there a way, if possible an efficient way, to combine two hash functions such that the result is as collision-resistant as the strongest of the two, but without incurring the output enlargement of concatenation? In 2006, Boneh and Boyen published a result which simply states that the answer is no, subject to the condition of evaluating each hash function only once. Edit: Pietrzak lifted the latter condition in 2007 (i.e. invoking each hash function several times does not help).

I edited the question a bit - getting only the strength of the hardest of both is enough, even for using more space. I just want to make sure that I don't somehow get less than the security of the hardest (without knowing which is harder). (I'll still have to read most of your links.)
–
Paŭlo Ebermann♦, Jul 29 '11 at 19:08

@Paulo: getting both preimage resistance and collision resistance looks like an interesting problem. I am not aware of any existing research on the subject. For second-preimage, this is easy: resistance to second preimages is at least equal to resistance to collisions (we usually want it to be higher, but that's already something).
–
Thomas Pornin, Jul 30 '11 at 2:42

In 2013, Arno Mittelbach provides a new definition (and method) for combiners shorter than concatenation that provide as much collision resistance as the stronger of the two hash functions (and similarly for other properties) with the added assumption that at least one of the functions is IRO. eprint.iacr.org/2013/210
–
archie, Oct 30 '13 at 8:50

You were right with your ideas in the original question. If what you want to protect against is pre-images, then chaining hash functions produces a function at least as strong as the strongest of its two components:

H∘(x) = H₀(H₁(x))

If what you want to protect against is collisions, then concatenation is at least as strong as the strongest of its two components:

H|(x) = H₀(x) | H₁(x)

There are several other properties that you might want from your combined function, such as pseudorandomness. For pseudorandomness, you could combine two hash functions like this:

H⊕(x) = H₀(x) ⊕ H₁(x)
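Pseudorandomness is usually stated for keyed functions, so a sketch of the XOR combiner might key both components. The version below uses HMAC-SHA-256 and HMAC-SHA3-256 as the two functions; keying via HMAC and the choice of hashes are assumptions made for illustration, not part of the construction as written above.

```python
import hashlib
import hmac


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Bytewise XOR of two equal-length strings.
    return bytes(x ^ y for x, y in zip(a, b))


def xor_prf(key: bytes, x: bytes) -> bytes:
    # Keyed H0(x) XOR H1(x); both outputs are 32 bytes long.
    d0 = hmac.new(key, x, hashlib.sha256).digest()
    d1 = hmac.new(key, x, hashlib.sha3_256).digest()
    return xor_bytes(d0, d1)


key = b"\x01" * 32  # illustrative key; use a random secret in practice
tag = xor_prf(key, b"example input")
```

The output stays at 32 bytes, so this combiner does not enlarge the digest the way concatenation does.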

The tricky part (as you observed) is if you want to have more than one of these properties. The best research about this that I'm aware of so far is Anja Lehmann's dissertation. (You can find discussion of this and related topics on the "One Hundred Year Cryptography" wiki at the Tahoe-LAFS project.)

If I needed more than one property from a secure hash function, and didn't mind extra CPU cycles, and didn't mind double the output size, then I would probably use Lehmann's $Comb_{4P}$ construction and not worry too much about the rather remote possibility that the resulting combined function may not preserve pre-image resistance.

If you're sure that you only need one property (careful here—think very carefully about this and write down explicitly what property or properties you rely on, and what an attacker can do if each possible property doesn't hold), then you can safely use one of the combiners above.

By the way, that dissertation also includes very interesting results on two other topics that have been discussed in this thread: whether you can have a combined function C(H₁, H₂) that is stronger at collision-resistance than the strength of H₁ plus the strength of H₂ (she answers in the affirmative) and whether the way that SSL and TLS combined SHA1 and MD5 was secure (answer: sort of...).

I always feel nervous directly feeding the result of one hash function into another. I know that theoretically it's supposed to be just fine, but somehow it seems like maybe my output range will in actuality be smaller than the full possible output range. Of course, if I just fed it in concatenated with itself I'd feel better about it, but that's probably also a bit silly.
–
Omnifarious, Aug 4 '11 at 23:12

@Omnifarious: I think this is the reason one usually uses H₀(H₁(x)|x) or such instead of directly H₀(H₁(x)): to get the improved preimage resistance while still not having more collisions than with only one hash.
–
Paŭlo Ebermann♦, Aug 5 '11 at 0:35

@Zooko: Thanks, I'll have a look at this dissertation you linked.
–
Paŭlo Ebermann♦, Aug 5 '11 at 0:36

Chaining hash functions is not a good idea. Assume $H_0$ is a random oracle (and thus perfectly one-way/pre-image resistant). Now assume that $H_1$ is a constant function (for example, it maps every input to $0$). Then the combination $H_0(H_1(x))$ is not pre-image resistant, as it is a constant function.
–
acerberus, Dec 14 '13 at 16:28
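This counterexample is easy to reproduce. In the sketch below SHA-256 merely stands in for the ideal outer function $H_0$, and the function names are illustrative:

```python
import hashlib


def h0(m: bytes) -> bytes:
    # Stand-in for the "ideal" outer hash H_0.
    return hashlib.sha256(m).digest()


def h1_constant(m: bytes) -> bytes:
    # Degenerate inner function H_1: maps every input to a single zero byte.
    return b"\x00"


def chained(m: bytes) -> bytes:
    # H_0(H_1(x)) inherits the weakness of the inner function.
    return h0(h1_constant(m))


digests = {chained(m) for m in (b"a", b"b", b"some longer message")}
```

The set `digests` has exactly one element: every message is a pre-image of the only possible output.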

I'm sure @Thomas will give a thorough answer. In the interim, I'll just point out that the collision resistance of your first construction, H1(m)||H2(m), is surprisingly not that much better than just H1(m). See section 4 of this paper:

In fact, I was content of having resistance(combination) = max(resistance(H1), resistance(H2)), for all useful values of resistance (preimage, collision, second-preimage).
–
Paŭlo Ebermann♦, Jul 29 '11 at 17:56

It looks like the attacks described here all start from a collision in the compression function and generate multi-attacks on long messages by selecting the right blocks for individual messages. I think they don't really apply to short messages (about hash output size or so), or to messages of some fixed format (without enough space to insert random collisions).
–
Paŭlo Ebermann♦, Jul 29 '11 at 18:07

Multicollisions are based on collisions in the compression function, as Paulo points out. If you have some of those, you can exploit the iterated structure (mostly Merkle-Damgaard) of the hash function. This has nothing to do, however, with the concatenation combiner $H_1(m)\|H_2(m)$.
–
acerberus, Dec 14 '13 at 16:32

Well, I see two clean ways of having practical resistance to these vulnerabilities.

If you want to use two hash functions, make sure you feed the original data back into the second function via an HMAC:

hash = algo1(data)
hash = hmac(algo2, data, hash)

The benefit here is that any collisions for algo1 will not automatically become collisions for algo2 due to the MAC. So for a collision attack to work, the attacker would have to find a collision for both functions using the same source data. This should in practice be significantly more difficult than attacking either function independently (it will be at least as difficult as attacking the strongest of the two functions).
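A runnable version of the two-step pseudocode might look as follows in Python. SHA-256 and SHA3-256 stand in for algo1 and algo2, and the first hash is used as the HMAC key with the original data as the message; that argument order is an assumption about what `hmac(algo2, data, hash)` means.

```python
import hashlib
import hmac


def feedback_combiner(data: bytes) -> bytes:
    # hash = algo1(data)
    inner = hashlib.sha256(data).digest()
    # hash = hmac(algo2, data, hash): the message is the original data,
    # the key is the first digest (assumed argument order).
    return hmac.new(inner, data, hashlib.sha3_256).digest()


digest = feedback_combiner(b"example data")
```

Because the original data is processed again in the second step, two inputs that collide under algo1 only share the HMAC key; they still have to collide under the keyed algo2 as well.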

The other method would be to simply iterate a single hash function (with feedback). This looks similar to the previous algorithm.

Where n is greater than or equal to 0. The larger it is, the slower it will be to compute. Note that this is basically just PBKDF2 with an empty salt and the length parameter set to the output size of the hash.
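A minimal sketch of such an iteration with feedback, assuming each round re-hashes the previous digest concatenated with the original data. The exact feedback rule, the function name `stretch`, and the use of SHA-256 are assumptions made for illustration.

```python
import hashlib


def stretch(data: bytes, n: int) -> bytes:
    # Iterate one hash function n extra times, feeding the original
    # data back in on every round (assumed feedback rule).
    if n < 0:
        raise ValueError("n must be greater than or equal to 0")
    digest = hashlib.sha256(data).digest()
    for _ in range(n):
        digest = hashlib.sha256(digest + data).digest()
    return digest


plain = stretch(b"example data", 0)       # a single hash, no extra rounds
stretched = stretch(b"example data", 1000)
```

With `n = 0` this reduces to a plain hash of the data; larger values of `n` cost proportionally more time.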

The benefit to "stretching" is that it protects against both preimage and collision attacks since even if the attacker was able to find a preimage for the first round, that still leaves them multiple rounds to attack. And considering the feedback, the data necessary to attack a specific round is destroyed on the next one. So even if the attacker did manage to get one round of a preimage attack done, it would be very difficult (if not impossible) to attack the other rounds.

I don't see the point. Sure, collisions for algo1 won't automatically become collisions for algo2. But your overall algorithm could have collisions that none of the underlying algorithms have. I see no rational basis to think that the new collisions created aren't likely to be worse than the possible collisions in the underlying hash functions. On the bright side, this will probably be no weaker than the stronger of the two hash functions. So you're probably okay if one of them is broken but not both of them.
–
David Schwartz, Aug 23 '11 at 12:30

@David: Well, nothing's going to help you if both of them are broken... The point was to be as secure as the most secure function (if not better)...
–
ircmaxell, Aug 23 '11 at 15:18

If you want only specific combiners it's often possible to create hash functions that are at least as good as the stronger hash. For example concatenation for second pre-images and collisions. Though that combination might be weaker than one algorithm producing a long hash.
–
CodesInChaos, Jul 5 '13 at 21:11

My argument is that if you can design hash functions with certain characteristics, if you just pick two hash functions and combine them, you might. So yes, you might improve things, or not.
–
ddyer, Jul 5 '13 at 21:16

Concatenating two hashes provably is as collision resistant as the stronger of the two. So it certainly doesn't make things worse compared with using only one of them. It probably makes bad use of a given output size or processing time budget.
–
CodesInChaos, Jul 5 '13 at 21:23

Suppose I design a very bad hash, such that all its output bits are ones. How would concatenating that hash with another hash be as strong as the other?
–
ddyer, Jul 5 '13 at 22:39

@ddyer So assume our concatenated function is $H(x) := H_1(x) || H_2(x) = H_1(x) || \mathtt{1...1}$. Then $H(x_1) = H(x_2)$ still implies $H_1(x_1) = H_1(x_2)$, i.e. $H$ is (for collision resistance) as strong as $H_1$. (It is not as strong as a $H_1$-like design with the same output size as $H$ could have been, though.)
–
Paŭlo Ebermann♦, Jul 6 '13 at 10:35
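The implication can be illustrated with a toy experiment. To make collisions findable by brute force, $H_1$ is replaced here by SHA-256 truncated to 16 bits; the truncation, the all-ones $H_2$ of two bytes, and the helper names are assumptions made for the demonstration.

```python
import hashlib
from itertools import count


def h1(m: bytes) -> bytes:
    # Toy 16-bit hash (truncated SHA-256) so that collisions are cheap to find.
    return hashlib.sha256(m).digest()[:2]


def h2(m: bytes) -> bytes:
    # The "very bad" hash: all output bits are ones.
    return b"\xff" * 2


def combined(m: bytes) -> bytes:
    # H(x) = H1(x) || H2(x)
    return h1(m) + h2(m)


def find_collision(h):
    # Brute-force search; terminates by the pigeonhole principle
    # because the toy hash has only 2^16 possible outputs.
    seen = {}
    for i in count():
        m = str(i).encode()
        d = h(m)
        if d in seen:
            return seen[d], m
        seen[d] = m


x1, x2 = find_collision(combined)
```

Any pair `x1`, `x2` found this way for the combined function also collides under `h1`, so attacking the combination is no easier than attacking `h1` alone.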

Use a modified duplex construction. First run the message through $h_1$, then take the final 32 bits of the result and XOR them onto the first 32 bits of the IV (say, 512 bits long). Then feed the IV into a 512-bit hash function, $h_2$. Then take the first 32 bits of the output of $h_2$ and put them into the hash buffer. Repeat this process until enough hash output has been put into the buffer. I know, the speed is $n(h+i)$, where $h$ is the speed of $h_1$, $i$ is the speed of $h_2$, and $n$ is the number of rounds. But if $h_1$ and $h_2$ are really fast, then that's good news!

Thanks for your brainstorming of a new "solution" ... do you have any analysis why this is better than other ways of combining them? In which way the first or final some bits are special here?
–
Paŭlo Ebermann♦, Jul 5 '13 at 8:54

Okay, it has a good avalanche property, it's good against length extension attacks, and it has the properties of both h1 and h2 (only use for good h1/h2). The first/final bits are special because they combat against length-extension attacks.
–
Gavriel Feria, Jul 6 '13 at 22:43

You can also switch between h2,h3,h4,...,hn, and switch to one on certain conditions.
–
Gavriel Feria, Jul 6 '13 at 22:46

And why did anyone downvote this answer? It was a great answer though.
–
Gavriel Feria, Jul 6 '13 at 23:02

I forgot something. A collision in $h_1$ doesn't necessarily mean a collision in the resultant hash $h_p$, with probability $2^{-224}$. And a collision in $h_2$ doesn't mean a collision in the resultant hash $h_p$, with probability $2^{-256}$.
–
Gavriel Feria, Jul 14 '13 at 21:50