August 15, 2004

SHA-0 is cracked

According to the below post, SHA-0 has been cracked. The researchers crunched their way through lots of data and lots of cycles and finally found two texts that hash to the same value. And people at Crypto 2004 in Santa Barbara are reporting the fall of many older message digests such as MD5 as well.

A brief explanation: SHA-0 is one of a big class of message digest algorithms that has a particular magical property: it gives you a big number that is, for practical purposes, one-to-one with a document. So take any document and hash it with a message digest, and you get this big number, reliably. Here's one for the dollar contract my company issues: SHA#e3b445c2a6d82df81ef46b54d386da23ce8f3775. It's big, but small enough to post or send around in email [0].
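To make the idea concrete, here is a minimal Python sketch using the standard hashlib module. The document bytes are placeholders rather than any real contract, and SHA-1 is assumed only because the identifier quoted above is 40 hex digits (160 bits) long.

```python
import hashlib


def digest_id(document: bytes) -> str:
    """Return the SHA-1 digest of a document as 40 hex digits."""
    return hashlib.sha1(document).hexdigest()


# Placeholder documents (assumptions for illustration only).
contract = b"placeholder contract text"
tampered = b"placeholder contract text."  # one extra byte

print(digest_id(contract))
print(digest_id(tampered))  # a completely different identifier
```

The same bytes always give the same identifier, and any change to the bytes, however small, gives an unrelated one.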

Notwithstanding last week's result, you can't find a document that also hashes to that number, so software and people can reliably say, oh, yes, that's the Systemics PSD dollar. In short, message digests make wonderful digital identifiers, and also digital signatures, and we use them bountifully in Ricardo [1].

So if SHA-0 has been cracked, it might be a big deal. Is our digital infrastructure at risk? Yes, it's a big deal, but no, there is little risk. Here's why.

In cryptographic terms, this is a big deal. When the NSA designed SHA back in the early 90s, it was designed to be strong. Then, as the standards process plodded along, the NSA came out with a correction. It seems as though a flaw had been discovered, but they didn't say what that flaw was.

So we ended up with SHA-0, the suspect one, and SHA-1, the replacement. Of course, cryptographers spent years analysing the tiny differences, and about 6 years ago it all became clear (don't ask me, ask them). And now, those flaws have been exploited by the crack and by the machines. So now we know it can be done to SHA-0.
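For the curious, the published difference between the two is a single one-bit rotation in the message schedule. The following is a minimal pure-Python sketch of the 160-bit SHA construction, with a flag (`rotate_schedule`, a name invented here) that switches that rotation on or off. It is meant as an illustration only, not as production code.

```python
import hashlib
import struct

MASK32 = 0xFFFFFFFF


def _rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def sha_160(data: bytes, rotate_schedule: bool = True) -> str:
    """SHA-1 when rotate_schedule is True, SHA-0 when it is False."""
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    # Pad: a 0x80 byte, zeros up to 56 mod 64, then the 64-bit bit length.
    bit_len = len(data) * 8
    data += b"\x80"
    data += b"\x00" * ((56 - len(data) % 64) % 64)
    data += struct.pack(">Q", bit_len)

    for off in range(0, len(data), 64):
        w = list(struct.unpack(">16I", data[off:off + 64]))
        for t in range(16, 80):
            x = w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16]
            # The only difference: SHA-1 rotates this word by one bit.
            w.append(_rotl(x, 1) if rotate_schedule else x)

        a, b, c, d, e = h
        for t in range(80):
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            tmp = (_rotl(a, 5) + (f & MASK32) + e + k + w[t]) & MASK32
            a, b, c, d, e = tmp, a, _rotl(b, 30), c, d
        h = [(x + y) & MASK32 for x, y in zip(h, (a, b, c, d, e))]

    return "".join(f"{x:08x}" for x in h)


# With the rotation, the sketch agrees with the library's SHA-1.
assert sha_160(b"abc") == hashlib.sha1(b"abc").hexdigest()
# Without it (SHA-0), the digest is different.
print(sha_160(b"abc", rotate_schedule=False))
```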

Luckily, we all use the replacement, SHA-1, but this will also fall in time. Once again, it is lucky that there is a new generation coming online: SHA-256, SHA-512 and the like.

But as a practical matter, this is not a big issue. When we as financial cryptographers build systems based on long term persistence of hashes, we weave the hash and its document into a system. This is called entanglement, whereby the hash and the document are verified over time and usage [2]. We use the software to lay a trail, as it were, and if someone were to turn up with a bogus document but a matching hash, there would be all sorts of other trip wires to catch any simplistic usage.

Also, bear in mind that the two documents that hashed to the same value are pretty useless. It took Antoine Joux and his team 80,000 CPU hours to do it, even then. So in cryptography terms, this is a milestone in knowledge, not a risk: For practical purposes, any message digest still fits the bill, as long as it is designed into a comprehensive system that provides backup by entanglement [3].

Common hash value (can be found using for example "openssl sha file.bin"
after creating a binary file containing any of the messages)
c9f160777d4086fe8095fba58b7e20c228a4006b

This was done by using a generalization of the attack presented at Crypto'98
by Chabaud and Joux. This generalization takes advantage of the iterative
structure of SHA-0. We also used the "neutral bit" technique of Biham and
Chen (To be presented at Crypto'2004).

The computation was performed on TERA NOVA (a 256 Intel-Itanium2 system
developed by BULL SA, installed in the CEA DAM open laboratory
TERA TECH). It required approximately 80 000 CPU hours.
The complexity of the attack was about 2^51.

We would like to thank CEA DAM, CAPS Entreprise and BULL SA for
their strong support in breaking this challenge.

Eli Biham -- has collisions on 34 (out of 80) rounds of SHA-1, but can extend that to probably 46. Still nowhere near a break.

Antoine Joux -- his team announced the collision on SHA-0 earlier this week. There is concentration on the so-called "IF" function in the first 20 rounds... f(a,b,c) = (a & b) ^ (~a & c). That is, the bits of a choose whether to pass the bits from b, or c, to the result. The technique (and Eli's) depends on getting a "near collision" in the first block hashed, then using more near collisions to move the different bits around, finally using another near collision to converge after the fourth block hashed. This took 20 days on 160 Itanium processors. It was about 2^50 hash evaluations.
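The "IF" function quoted above can be tried directly. This is a small Python sketch on 32-bit words; the function name `f_if` and the sample values are chosen here purely for illustration.

```python
MASK32 = 0xFFFFFFFF


def f_if(a: int, b: int, c: int) -> int:
    """Bitwise select: where a has a 1 take the bit from b, else from c."""
    return ((a & b) ^ (~a & c)) & MASK32


a = 0xFFFF0000  # upper half selects b, lower half selects c
b = 0x12345678
c = 0x9ABCDEF0

print(hex(f_if(a, b, c)))  # 0x1234def0
```

Because the two terms never have a bit set in the same position, writing the function with OR instead of XOR gives the same result.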

Xiaoyun Wang was almost unintelligible. But the attack works with "any initial values", which means that they can take any prefix, and produce collisions between two different suffixes. They can produce the first collision for a given initial value in less than an hour, and then can crank them out at about one every 5 minutes. It seems to be a straightforward differential cryptanalysis attack, so one wonders why no-one else came up with it. The attack on Haval takes about 64 tries. On MD4, about 4 tries. RIPE-MD, about 2 hours (but can improve it). SHA-0 about 2^40 (1000 times better than Joux).
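The figures quoted so far can be cross-checked with a little arithmetic. The sketch below uses only numbers from the text, plus one outside fact labelled as such: a generic birthday search on a 160-bit hash costs about 2^80 evaluations.

```python
# Joux: 20 days on 160 processors, expressed in CPU hours.
cpu_hours = 20 * 24 * 160
print(cpu_hours)  # 76800, consistent with "approximately 80 000"

# Wang's reported 2^40 for SHA-0 against Joux's 2^50.
improvement = 2**50 // 2**40
print(improvement)  # 1024, i.e. "1000 times better"

# Assumption from outside the text: generic birthday bound for 160 bits.
birthday_bound = 2**80
advantage = birthday_bound // 2**51
print(advantage)  # 536870912, i.e. 2^29 times cheaper than brute force
```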

Xuejia Lai clarified that the paper on E-print has been updated with correct initial values. They were initially byte-reversed, which they blamed on Bruce Schneier.

In the light of day and less inebriated, I'd like to clarify some of what I wrote last night, and maybe expand a bit. My original account wasn't what I'd like to think of as a record for posterity.

Greg.

At 13:11 2004-08-18 +1000, Greg Rose wrote:

> Xiaoyun Wang was almost unintelligible.

This was not meant as a criticism. It just meant you had to concentrate really hard. Her English is much better than my Chinese.

> But the attack works with "any initial values", which means that they can take any prefix, and produce collisions between two different suffixes. They can produce the first collision for a given initial value in less than an hour, and then can crank them out at about one every 5 minutes.

As mentioned previously, the idea is to produce a good "partial collision" with the first blocks input to the hash, and then from two similar starting points, find subsequent blocks that correct them back to the same output. So, for a given input chaining vector, it takes about an hour to get the partial collisions in the first input block. From there, they can compute subsequent "second blocks" to produce the collisions in a few minutes.
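The reason a block-level collision matters for whole messages is the iterative structure of these hashes: each block is mixed into a chaining value, so once two messages reach the same chaining value, any shared tail keeps them equal. The Python sketch below shows this with a toy hash. Everything in it is an assumption for illustration: the 16-bit chaining value, the `compress` function built from truncated SHA-1, and the brute-force search, which stands in for the differential technique and has nothing to do with how the real attacks find their blocks.

```python
import hashlib

STATE_BYTES = 2  # toy 16-bit chaining value, small enough to brute-force


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated SHA-1 of state plus block."""
    return hashlib.sha1(state + block).digest()[:STATE_BYTES]


def toy_hash(blocks, iv=b"\x00\x00") -> bytes:
    """Iterate the compression function over the blocks in order."""
    state = iv
    for block in blocks:
        state = compress(state, block)
    return state


def find_colliding_blocks(state: bytes):
    """Search for two different blocks that map state to the same output."""
    seen = {}
    counter = 0
    while True:
        block = counter.to_bytes(8, "big")
        out = compress(state, block)
        if out in seen:
            return seen[out], block
        seen[out] = block
        counter += 1


prefix = [b"any", b"prefix"]           # sets the chaining value
chaining = toy_hash(prefix)
block_1, block_2 = find_colliding_blocks(chaining)
suffix = [b"shared tail"]              # identical in both messages

assert block_1 != block_2
assert toy_hash(prefix + [block_1] + suffix) == toy_hash(prefix + [block_2] + suffix)
```

The search has to be redone for each prefix because the prefix changes the chaining value, which matches the report that the work is "for a given initial value".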

Note that they did two entirely new collisions for MD5 overnight, communicating back to China, when they found out about the endianness problems. No-one can now argue that they can't find collisions at will.

> It seems to be a straightforward differential cryptanalysis attack, so one wonders why no-one else came up with it.

With further hindsight, and Phil Hawkes' help, I understand now. The technique needs to alternate between single bit (xor) differences and subtractive differences, with careful conditioning of the bits affected to make sure the right carries do, or don't, propagate. This is explained in Phil's upcoming paper about SHA-2 cancellations, which was presented later in the rump session. That should be on e-print in the next couple of days. The Chinese team is also writing a much more detailed paper, but it will take longer.

There has been criticism about the Wang et al. paper that "it doesn't explain how they get the collisions". That isn't right. Note that from the incorrect paper to the corrected one, the "delta" values didn't change. Basically, if you throw random numbers in as inputs, in pairs with the specified deltas, you should eventually be able to create your own MD5 collisions for fun or profit. How they got this insight, we'll have to wait for... but the "method" is already there.

I'm still at Crypto. SHA-1 is still safe. There have been a lot of unconfirmed reports about all sorts of things.

The bottom line is that SHA-1 is the most analyzed, still-safe hash function we have. That is also the bad news. There needs to be a lot more work on hash functions. However, none of the attacks we learned of this week apply to SHA-1.