Within a relatively short period of time, a wave of cryptanalytic results on various hash functions surfaced. Along with it came a myriad of opinionated ways of looking at the implications. After just a cursory glance at what most of these views entail, the obliviousness of the matter becomes apparent: the general community has an extremely limited understanding of what's going on, beyond reiterations of the obvious, and even those are comprehended only to a small degree.

Without further ado, or rehashing (pun slightly intended) what the media has already done a fantastic job of over-hyping, I'll mention a “structural” perspective on the matter that isn't as common as it should be. In other words, unless one practices cryptography in the realm of academia, it is likely that they haven't a clue. Hopefully, this memo will provide the nudge that non-academics may need in order to be “clued in,” so to speak.

The main observation that I wish to address is the design strategy from which the majority of conventional hash functions, namely the MD4 family, are built; this family consists of MD4, MD5, HAVAL, RIPEMD, SHA-1, and SHA-2, along with their respective output-length extensions, where applicable.

Basically, in the MD4 family of hash functions, we're dealing with a common structure: the conversion of a block cipher into a one-way hash function via a Davies-Meyer feedforward construction. These functions, for the most part, can be summarized as exhibiting the structure of a UFN, or Unbalanced Feistel Network, that is both source-heavy and heterogeneous.
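For the non-academic reader, the Davies-Meyer feedforward is simple to sketch. Below is a minimal, illustrative Python sketch; the “cipher” is a made-up toy permutation of my own (not any real MD4-family round function, and certainly not secure), purely so the structure is runnable:

```python
MASK64 = (1 << 64) - 1

def toy_cipher(key: int, block: int) -> int:
    """Stand-in 64-bit 'block cipher' (a toy permutation, NOT secure).
    Each step is invertible for a fixed key, purely so the sketch runs."""
    x = block & MASK64
    for r in range(8):
        x = (x + key + r) & MASK64
        x ^= x >> 29
        x = (x * 0x9E3779B97F4A7C15) & MASK64  # odd multiplier: bijective
        x ^= (x << 13) & MASK64
    return x

def davies_meyer(h: int, m: int) -> int:
    """One Davies-Meyer step: the message block m keys the cipher, the
    chaining value h is encrypted, and the feedforward XOR of h makes
    the step hard to invert."""
    return toy_cipher(key=m, block=h) ^ h

def toy_hash(blocks, iv=0x0123456789ABCDEF):
    """Iterate the compression function over message blocks,
    Merkle-Damgard style (padding/length strengthening omitted)."""
    h = iv
    for m in blocks:
        h = davies_meyer(h, m)
    return h
```

The feedforward XOR is the crucial detail: without it, an attacker could simply run the block cipher backwards from the output.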

Since the recent cryptanalytic reduction of SHA-1's generic level of complexity, from 2^80 to 2^69, the media has been an uninvited catalyst for nonsensical thinking. Many suggestions revolve around the panic of rushing to make infrastructure alterations to support SHA-2 and phase out the use of SHA-1. Before I go any further, I want to briefly stress the importance of not rushing cryptography at the implementation level; there is nothing quite as detrimental to security as altering cryptography on a whim at the implementation level. It's at this level that the effectiveness of cryptography is largely dictated.

First, this [migration to a larger hash function output] should have been initiated long beforehand. Advocating primitives with at least 128-bit levels of security should already be standard practice, even if just in the de facto sense. Second, while SHA-256, for example, would be sufficient to satisfy this desire for 128-bit security, it is not a terminal solution. At best, it is an interim one, and here's why:

Recent cryptanalysis has yielded enhanced methodologies and extensions that apply, in some form or other, to each construction within the MD4 family. It's not time to use functions within that family that support larger output; it's time to explore the route of designing hash functions that not only support larger output, but are composed of entirely different design strategies. It's time to analyze other strategies for constructing cryptographically secure hash functions.

By “other strategies,” we're simply looking at anything but the UFN nature of the majority of existing conventional hash functions. There is one peculiar strategy that seems, in my opinion, to be the ideal point of interest in a case such as this: the wide trail strategy. Sound familiar? The underlying primitive of the AES, Rijndael, is a product of this strategy, as are a variety of similar ciphers in the family that preceded it.

One interesting design, using strengthening of the Merkle-Damgård variety and a Miyaguchi-Preneel hashing construction, with a block cipher that is similar, in some respects, to Rijndael, is Whirlpool. This 512-bit hash function relies on the wide trail strategy as well, and should be a model for the direction, or one of the directions, that we should investigate. A further effect, both direct and indirect, is that such a direction would prompt further, more extensive and rigorous, cryptanalysis of the wide trail strategy.
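The Miyaguchi-Preneel mode that Whirlpool uses differs from Davies-Meyer in what keys the cipher and what gets fed forward. A hedged sketch follows, again with a made-up toy permutation of my own standing in for Whirlpool's actual Rijndael-like cipher:

```python
MASK64 = (1 << 64) - 1

def toy_cipher(key: int, block: int) -> int:
    """Toy 64-bit permutation standing in for Whirlpool's real
    Rijndael-like cipher; illustrative only, NOT secure."""
    x = block & MASK64
    for r in range(10):
        x ^= key
        x = (x * 0x2545F4914F6CDD1D) & MASK64  # odd multiplier: bijective
        x ^= x >> 31
        x = (x + r) & MASK64
    return x

def miyaguchi_preneel(h: int, m: int) -> int:
    """One Miyaguchi-Preneel step: h_i = E_{h_{i-1}}(m_i) XOR m_i XOR h_{i-1}.
    The chaining value keys the cipher, and both the chaining value and
    the message block are fed forward."""
    return toy_cipher(key=h, block=m) ^ m ^ h

def toy_mp_hash(blocks, iv=0):
    # Merkle-Damgard iteration; real Whirlpool also applies
    # length-strengthening padding, omitted from this sketch.
    h = iv
    for m in blocks:
        h = miyaguchi_preneel(h, m)
    return h
```

Compared with Davies-Meyer, note that here the chaining value, not the message, acts as the cipher key.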

Not only would this benefit confidence in this strategy as a methodology for constructing cryptographically secure hash functions, it would also provide a better look into the same general principles on which Rijndael is based, which is certainly beneficial. In conclusion, I feel that the issue at hand, the real issue, is much deeper than simply extending the length of hash function output to provide a larger margin of generic security.

I feel this is a matter of structural significance, where we need to allocate our analytical effort to strategies other than the Unbalanced Feistel Networks that compose the MD4 family, lest we find ourselves in a position where every construction within the only family we have is susceptible to attacks that have the potential to become more efficient.

Note, I'm certainly not declaring that Unbalanced Feistel Networks are inherently insecure, by any means. I'm simply suggesting a change in design policy, so to speak. Look at it as a way of approaching hash function design in the way we've approached block cipher design.

Perhaps this need for strategic variety requires an initiative. Maybe a design-by-contest; it worked exceptionally well for the AES selection process. In any case, the worst initiative is the one not taken, so we at least need to do something. SHA-2 may buy us time, and since we haven't a concrete idea as to how much, we should use what we have wisely.

If you're an academic cryptographer, this should already be common knowledge, as it has been preached before; if you're not, well, that's what this memo is for. A hash function is perhaps the most versatile of cryptographic primitives; it is definitely among the most versatile, at least. With that said, the importance of design strategy should be unambiguous.

Last edited by JustinT on Wed May 10, 2006 4:09 pm; edited 1 time in total

Yet again a seemingly silent period from you, brought to an end with a well written post.

It is true that the media has overblown the recent hash function attacks, generalizing them to suggest that a large proportion of the cryptographic systems in use are now insecure. It is even more true that you should never rush the implementation stage of a cryptographic system. Security takes time to implement correctly, and ever-advancing deadlines do nothing to help the process.

I agree that the majority of the hash functions in use at the moment are very similar in design, and that needs to change. The most interesting thing I found about your post was that you are suggesting that a new standard hash function should be found by a 'trial-by-competition', much like the AES standard was found.

JustinT wrote:

Perhaps this need for strategic variety requires an initiative. Maybe a design-by-contest; it worked exceptionally well for the AES selection process.

I have seen a large number of cryptographers suggest this, including Bruce Schneier in a post to his blog. I quote:

Bruce Schneier wrote:

I'd like to see NIST orchestrate a worldwide competition for a new hash function, like they did for the new encryption algorithm, AES, to replace DES. NIST should issue a call for algorithms, and conduct a series of analysis rounds, where the community analyzes the various proposals with the intent of establishing a new standard.

I didn't realize that so many cryptographers liked this method of choosing a standard. I found it interesting at first, and after some thought I came to the conclusion that it would probably result in the best algorithm being chosen for the job. It's definitely a much better method than a standards committee, where everyone has to agree.

mxb wrote:

Yet again a seemingly silent period from you, brought to an end with a well written post.

Gracias! I'm recuperating from exhausting conference travels, and the generally hectic nature of life at 20.

mxb wrote:

It is true that the media has overblown the recent hash function attacks, generalizing them to suggest that a large proportion of the cryptographic systems in use are now insecure. It is even more true that you should never rush the implementation stage of a cryptographic system. Security takes time to implement correctly, and ever-advancing deadlines do nothing to help the process.

Certainly. Good security has costs, and time is one of them. The process cannot be rushed. Unfortunately, the media hypes situations to the point of suggesting that rushing is just good instinct taking its due course. First, the wise will avoid situations that cause panic; their system should grant them enough comfort, and time, to gradually migrate to a more secure solution. This is possible if system architects design their security conservatively and refrain from a minimal, "get-by" attitude.

Ironically, while the current security situation has been hyped beyond necessity, the panic it causes could produce precisely the type of insecurity that many (thanks to the media) falsely assume we are already experiencing. To put this into plain English, it's almost like worrying a false notion into truth. Paranoia is crucial to security, but only if you comprehend what paranoia means in this context.

Hash functions, such as SHA-1, that are academically broken carry practical implications that deserve immediate consideration, but not panic. Anyone who has analyzed a system that deploys a cryptographic framework has probably noticed that a hash function can be incorporated into a large number of components.

Security should be a tactful activity. As such, in my opinion, it seems logical to migrate, with a reasonable pace, to a hash function parameterization that gives us the collision resistance we should expect, generically; this should be no less than 2^128 complexity.
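To put numbers on "the collision resistance we should expect, generically": for an ideal n-bit hash, a birthday search finds a collision in about 2^{n/2} evaluations. A small back-of-the-envelope sketch (my own illustration, using the standard birthday approximation):

```python
def collision_probability(q: int, n: int) -> float:
    """Approximate chance of at least one collision after q queries
    to an ideal n-bit hash (birthday bound): p ~ q(q-1) / 2^(n+1)."""
    return min(1.0, q * (q - 1) / 2 ** (n + 1))

# SHA-1's 160-bit output: ~2^80 queries reach a ~50% collision chance,
# which is why its generic level sat at 2^80 before the 2^69 result.
print(collision_probability(2 ** 80, 160))   # just under 0.5

# A 256-bit output (e.g., SHA-256) pushes the generic bound to ~2^128
# queries, so the same 2^80 effort achieves essentially nothing.
print(collision_probability(2 ** 80, 256))
```

This is why "no less than 2^128 complexity" translates into an output length of at least 256 bits.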

After all, when you're analyzing a system, there's more peace in having this assurance of collision resistance than in gambling that an academically broken hash function's collision issues won't affect one component of the system more than another. Anything that simplifies analysis, especially for such a versatile and widely-used primitive as a hash function, is warranted.

mxb wrote:

I agree that the majority of the hash functions in use at the moment are very similar in design, and that needs to change. The most interesting thing I found about your post was that you are suggesting that a new standard hash function should be found by a 'trial-by-competition', much like the AES standard was found.

It appears to be a fruitful method of standardization; this is evidenced by the amount of cryptographic insight it yielded, with a group of finalist candidates that were all decent in their own right. Although not entirely in the same vein as the AES design-by-competition, there is a similar initiative for stream cipher primitives as well.

Given the monumental success of the AES selection process, and the fact that hash functions are among the most versatile and widely-deployed, yet least-understood, primitives we have, it makes sense to milk this route for what it's potentially worth. What further bolsters this as a good idea is the fact that we have results to go by.

mxb wrote:

I didn't realize that so many cryptographers liked this method of choosing a standard. I found it interesting at first, and after some thought I came to the conclusion that it would probably result in the best algorithm being chosen for the job. It's definitely a much better method than a standards committee, where everyone has to agree.

Indeed, and to expand on this, we are left with two design philosophies, primarily: design by committee and design by competition. The former has the potential tendency of being unnecessarily closed in process, and opens the door to political persuasion, and to dangerous complexity and interoperability problems that cause inconsistencies in implementation and do nothing to ease analysis and testing.

Oftentimes, the product of such a standardization committee is something horribly insecure, or too complex and non-interoperable to really live up to the intent of a standard, which is the expectation of achieving a certain level of assurance, given the implementation of a specific set of rules. What good is a complex and non-interoperable standard? Certainly no good for conventional cryptography.

Design-by-competition opens things up a bit, and invites flexibility. There is a call for proposals. Experts deliver proposals. Other experts deliver evaluations of those proposals. Proposals are weighed against how well they comply with a strict body of expectations, or guidelines. Things are organized, consistent, and simple the entire way through. The proposal that most efficiently complies with the guidelines is selected; if no proposal does, then more proposals can be requested. This demonstrates the flexibility, and rigor, of the process.

Following this process from the beginning, and observing its somewhat surprising success, coaxed us into asking, "why can't we do this with stream ciphers and hash functions?" We can, and, in my humble opinion, the potential reward of doing so is worth the effort. Not to do so would be a complete waste of a design philosophy that we've finally found to be successful for cryptographic security.

Last edited by JustinT on Fri Jul 08, 2005 3:49 pm; edited 2 times in total

I just did a report for class about the SHA security concerns. One of the things I came across was that the recently discovered attack on SHA-1 is supposed to be possible only with direct help from the attacker. It's interesting how little details like that get left out of the reporting on the issue.

Interesting note, especially because I have recently been studying cryptography and security proofs that model hash functions as random oracles. I would not be surprised if finding collisions in certain hashes turns out to be easy due to some inherent mathematical structure that was overlooked. I wanted to point out my observations:

* Vanilla hash functions are inherently insecure. They are non-random and cannot be considered secure. The only use of a hash function should be to "map" elements to a finite set for some higher-level cryptographic operation that is independently secure (without the hash). For example, see the recently published schemes of Boneh, Boyen, etc., that do not rely on random oracles.

* Keyed hash functions can be used instead of normal hashes by people and organizations who are very paranoid. The "key" is generated and made public by a trusted central authority so that anyone can authenticate files using the hash function. The central authority periodically changes the key so that the hash not only authenticates the file, but also the time at which the file was hashed.

Out of curiosity, do we know of even a single instance of collision in MD5? (I mean does anyone explicitly have two values m1 and m2 such that MD5(m1)=MD5(m2) ?) I guess from the definitions of MD5, it may be possible to manually construct two such values working backwards from somewhere in the "middle" of the hash.

Also, I want to know if this sort of hash function is possible:
(a) Given a specific hash value it is hard to find a collision
(b) It is easy to generate chosen hash values with collisions


Pardon my late reply. I had the majority of this post completed over a week ago, but have been busy with work and finals (I'm sure some of you are experiencing these, or have experienced them recently!). I also took a little more time to think about it all before posting.

Quote:

* Vanilla hash functions are inherently insecure. They are non-random and cannot be considered secure. The only use of a hash function should be to "map" elements to a finite set for some higher-level cryptographic operation that is independently secure (without the hash). For example, see the recently published schemes of Boneh, Boyen, etc., that do not rely on random oracles.

Well, it depends on the context. For example, a hash function may not be collision-resistant, generically (the birthday bound), but may still suffice in HMAC (as a secure PRF). Naturally, we wouldn't expect a PRF to be collision-resistant if the adversary had knowledge of the key. Different applications depend on different properties of the hash function (i.e., HMAC's security depends on a few assumptions about the compression function). Now, if we look at it in the context of your next paragraph, about keyed constructions that use hash functions, then sure, a hash function would be insecure. That is, a hash function wouldn't preserve integrity, but a MAC that uses one could.
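As a concrete example of the keyed setting, Python's standard library exposes exactly this HMAC composition; the key and message below are placeholders of my own:

```python
import hmac
import hashlib

key = b"secret shared only by the communicating parties"
msg = b"message whose integrity we want to protect"

# The MAC tag binds the message to the secret key.
tag = hmac.new(key, msg, hashlib.sha256).hexdigest()

def verify(key: bytes, msg: bytes, tag: str) -> bool:
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(expected, tag)

assert verify(key, msg, tag)                   # authentic message passes
assert not verify(key, b"tampered text", tag)  # altered message fails
```

Note that SHA-256 itself is unkeyed; it's the HMAC composition, with a secret key, that provides the PRF/MAC security discussed above.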

In regards to a hash function, we do hope that it will at least emulate a random oracle in behavior, such that the difference is negligible enough that a polynomially-bounded distinguisher cannot tell its output apart from a uniform distribution (i.e., the output is computationally indistinguishable from uniform). In the context of the random oracle model, however, we can show that some schemes which are secure when using a random oracle become insecure when we substitute the random oracle with any function (e.g., a hash function) or function ensemble (e.g., a MAC).

While there is criticism about proofs under the random oracle model, I feel that although it isn't indicative of inherent security (i.e., security under the random oracle model doesn't necessarily imply security of possible "real-world" implementations), it seems logical to assert that "it's often better to be secure under the random oracle model, than to be insecure under the random oracle model." In other words, you could look at the random oracle model as an abstraction for analysis; however, it can often be irresponsible to conclude that security under this ideal model implies security of a real-world implementation, when we replace the random oracle with some function or function ensemble.

Instead, it seems more conservative to use this model as a "sandbox" where schemes are evaluated, such that if a scheme is insecure in the random oracle model, we discard it. On the other hand, this sandbox may not be able to weed out every insecure scheme, so we can't conclude that a scheme secure in this sandbox will necessarily exist in the form of a secure, real-world implementation. I suppose, in reality, it may not be complete enough to serve as a solid abstraction for analyzing the security of a scheme, but it may suffice for partially evaluating schemes, for the purpose of weeding out some insecure ones. Albeit heuristic, it appears to instill some added confidence that a scheme may very well be secure, in the sense of "structural correctness," if you will. It's nothing concrete, but these are important results.

When you mention Boneh and Boyen, are you referring to security based on, for instance, the decisional Bilinear Diffie-Hellman assumption (DBDH)? I've read a few papers along those lines - many of which discuss the security of schemes in both the standard and random oracle models.

Quote:

* Keyed hash functions can be used instead of normal hashes by people and organizations who are very paranoid. The "key" is generated and made public by a trusted central authority so that anyone can authenticate files using the hash function. The central authority periodically changes the key so that the hash not only authenticates the file, but also the time at which the file was hashed.

Wait, are you referring to a MAC construction that uses a hash function, such as HMAC? The authentication key used for the MAC should be secret; otherwise, a valid MAC can be computed on an altered message. Depending on the composition used, how the keys are generated is important. Also, to whom are the keys made public, and how are they negotiated? In the case of a MAC, "anyone" would need to be defined as those who are intended to verify MACs and produce them; if the goal is to prevent the undetected manipulation of some message m into an altered message m', then for everyone to have knowledge of the secret MAC key would produce the same result as not having a MAC at all (i.e., or using an unkeyed hash value). Except, in this case, you still have the overhead of the MAC, but none of its security. Overall, if the secret key of the MAC becomes public, anyone can alter a message and compute a valid MAC tag; thus, it's an insecure idea.

Quote:

Out of curiosity, do we know of even a single instance of collision in MD5? (I mean does anyone explicitly have two values m1 and m2 such that MD5(m1)=MD5(m2) ?) I guess from the definitions of MD5, it may be possible to manually construct two such values working backwards from somewhere in the "middle" of the hash.

Yes, there have been collisions produced for MD5; in fact, there's also source code available for producing collisions, based on Wang and Yu's attack (referenced in the previous post). Just recently (five days ago), Jie Liang and Xuejia Lai published a paper on the Cryptology ePrint Archive outlining a fast attack algorithm based on Wang and Yu's two-block collision differential path; they also claim that their proposed "small range" searching technique improves the probability and complexity of finding collisions, compared with the "Advanced Message Modification" technique in the papers by Xiaoyun Wang, Dengguo Feng, Xuejia Lai, and Hongbo Yu, and by Xiaoyun Wang and Hongbo Yu, respectively.

In August of this year, Jun Yajima and Takeshi Shimoyama showed that the conditions for the message modification technique by Wang and Yu weren't sufficient, so they proposed an extended set of conditions, which Liang and Lai now show aren't sufficient either, giving their own proposal. They give complexities for their own attack - no more than 2^{35} operations - and for existing attacks, by considering these conditions (i.e., Yu Sasaki, Yusuke Naito, Noboru Kunihiro, and Kazuo Ohta [1] - around 2^{33} operations; Vlastimil Klima [1, 2] - around 2^{36} operations; Xiaoyun Wang and Hongbo Yu [1] - around 2^{41} operations). Liang and Lai claim that they can find collisions within five hours on a Pentium 4 1.70GHz. Consult the references above for extensive mathematical details.
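For readers who want to see a collision with their own eyes without implementing the differential attacks above, here's a generic birthday search against a deliberately truncated MD5. To be clear, this is my own toy illustration of the generic bound, not the Wang-Yu technique, which collides the full 128-bit output:

```python
import hashlib
from itertools import count

def find_truncated_collision(nbytes: int):
    """Birthday search for two messages agreeing in the first
    `nbytes` bytes of MD5: expected ~2^(4*nbytes) attempts, the
    generic square-root bound for a (8*nbytes)-bit hash."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        tag = hashlib.md5(msg).digest()[:nbytes]
        if tag in seen:
            return seen[tag], msg  # two distinct colliding messages
        seen[tag] = msg

m1, m2 = find_truncated_collision(2)  # 16-bit target: a few hundred tries
assert m1 != m2
assert hashlib.md5(m1).digest()[:2] == hashlib.md5(m2).digest()[:2]
```

Scaling the target to the full 128 bits makes this generic search cost ~2^64, which is exactly why the structural attacks above, at 2^{33}-2^{41}, are so significant.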

Quote:

Also, I want to know if this sort of hash function is possible:
(a) Given a specific hash value it is hard to find a collision
(b) It is easy to generate chosen hash values with collisions

I'm not quite sure I understand your definitions of (a) and (b), so I'll sketch out what it appears they could be. First, with (a): given a hash function, h, h(x) is the output of h for some unknown input x (a potential message, m, m', et cetera). Find a pair of messages, m and m', where m != m', such that h(m) = h(m') = h(x). In a first-preimage attack, all you have is the hash function output, and you wish to find a message that will hash to that value; however, you also mention collisions, so that's throwing me off. Perhaps you want to find a collision where both messages hash to the specific hash function output you have? Or, first, find a message, m, that will hash to a given hash output (first preimage), then find a message, m', where m != m', such that h(m) = h(m') (second preimage); generically, for an n-bit hash function, that would require about 2^{n+1} work (two successive 2^{n} searches).

With (b), you want to generate "chosen hash values" with "collisions." If you generate them, then that must mean you're applying a hash function to some known message, and you want to find another message that hashes to the same value. So, choose a message m, where h(m) is the hash output of the input m. Find a message m', where m != m', such that h(m) = h(m'). Maybe you want to choose many messages like m, and efficiently find corresponding messages like m' that hash to the same output? That would be a second-preimage attack, but it shouldn't be easy to find those (at least 2^{n} work for an n-bit hash function, where n is reasonably large). It's late, and perhaps I'm not understanding what exactly these two conditions, (a) and (b), are; this is mainly because I can see preimage and second-preimage attacks against them, but the word "collision" in there is making me wonder if I'm thinking along the same line as you are.

Are you asking if there are hash functions which are not collision-resistant, but still preimage- or second-preimage-resistant? If so, I think that's the current state of many hash functions. Also, consider Kelsey and Schneier's theoretical attack for finding second preimages for SHA-1 with 2^{106} work, as opposed to the expected 2^{160}; this can be generalized to most other hash functions. But, sure, assuming that's what you're asking - a hash function can be preimage-resistant, but not collision-resistant. In the real world, this is generally more fortunate than if there were preimage attacks instead.
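The cost gap between these notions can be made tangible with a truncated hash. The following sketch (my own illustration) brute-forces a first preimage on a 16-bit slice of SHA-1, which takes about 2^16 trials, whereas a birthday collision on the same slice would need only about 2^8:

```python
import hashlib
from itertools import count

def find_truncated_preimage(target: bytes, nbytes: int):
    """Brute-force first-preimage search against the first `nbytes`
    bytes of SHA-1: expected ~2^(8*nbytes) trials, the full
    exhaustive cost, versus ~2^(4*nbytes) for a generic collision
    search on the same truncated output."""
    for i in count():
        msg = str(i).encode()
        if hashlib.sha1(msg).digest()[:nbytes] == target:
            return msg

# A 16-bit toy target keeps the ~2^16 expected trials near-instant.
target = hashlib.sha1(b"some fixed message").digest()[:2]
m = find_truncated_preimage(target, 2)
assert hashlib.sha1(m).digest()[:2] == target
```

This square-root gap is why collision resistance is always the first property to fall, and why a broken collision bound doesn't automatically imply practical preimage attacks.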

Last edited by JustinT on Wed Dec 07, 2005 9:28 pm; edited 1 time in total

Quote:

Also, I want to know if this sort of hash function is possible:
(a) Given a specific hash value it is hard to find a collision
(b) It is easy to generate chosen hash values with collisions

JustinT wrote:

I'm not quite sure I understand your definitions of (a) and (b)

Well, I actually described an unusual type of cryptographic object... Let me see if I can give an example using an analogy in asymmetric cryptography.

Imagine a signature scheme like textbook RSA that is existentially forgeable... It is easy to generate a random valid message-signature pair (generate a random value s; then the RSA signature is s and the RSA message is s^e mod n). However, given an RSA message m, it is hard to generate a valid RSA signature m^(1/e) = m^d mod n.
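That analogy can be shown directly with textbook (unpadded) RSA. The toy parameters below (p = 61, q = 53) are purely illustrative and utterly insecure at this size:

```python
import random

# Textbook RSA toy parameters: n = 61 * 53, phi = 60 * 52 = 3120,
# and d = e^(-1) mod phi.
n, e, d = 3233, 17, 2753

def verify(message: int, signature: int) -> bool:
    # Textbook RSA verification: sig^e mod n must equal the message.
    return pow(signature, e, n) == message

# Existential forgery: choose the *signature* first, then derive the
# (meaningless) message it validly signs -- no private key involved.
s = random.randrange(2, n)
forged_message = pow(s, e, n)
assert verify(forged_message, s)

# Signing a *chosen* message, by contrast, requires the trapdoor d.
chosen = 42
assert verify(chosen, pow(chosen, d, n))
```

Real RSA signatures defeat this forgery by signing a padded hash of the message rather than the raw message, which is exactly where the hash function's properties come into play.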

I wanted to see if such a type of hash function can be constructed (although I can't think of any applications off the top of my head). This hash function needs some sort of a "trapdoor," which is needed for preimage attacks but not for generating arbitrary collisions. However, knowing two or more preimages should not reveal the trapdoor (CCA security).