Why can SHA-1 be considered a secure hash function? That's something I still wonder about.

I understand the concepts of why modern asymmetric algorithms are deemed to be secure. They are founded on well-studied mathematical problems, e.g. discrete logarithms in finite fields or integer factorization, that are widely believed (though not proven) to be hard to solve. The concepts of security claims and proofs are relatively easy to follow if one is aware of the mathematical concepts.

But when it comes to symmetric cryptography and secure hash functions the picture becomes much less clear. I understand that there exist a lot of results and analysis for block ciphers and digest algorithms, but what are these results founded on?

E.g. when it comes to block ciphers you can find a lot of proofs that cipher algorithm X is resistant against a certain number of known attacks. Or they prove that some property holds, e.g. every bit of the input affects the output, because this is deemed necessary etc. etc.

From the outside, the construction of cipher and digest algorithms looks like "trying to fiddle and mess with the input as much as possible" by applying bit shifts, XORs and so on.

What I would like to know now (I would be grateful for deeper insight into any of these):

a) Could you provide me with pointers to resources (books preferred) that explain the design and security considerations one has to take into account when constructing a

a1) cipher algorithm

a2) digest algorithm

that would explain things such as why an S-box has to look exactly the way it does rather than any other way, and, probably even more important for my understanding, why it would be bad if it were constructed differently?

b) Does there exist, or are there attempts at, a mathematical model of these "bit fiddling operations" (e.g. are "algebraic attacks" based on such a model)?

c) How do you "measure" the quality of a digest algorithm such as SHA-1? I.e. how can you say it's better to do a shift by two bits here instead of three, or an XOR, and why are these operations the basis of SHA-1 in the first place? Because at the time it seemed like the only known way to "maximally mess" with the input? I'm asking because it seems as if most SHA-3 candidates were either based on cipher algorithms (because there are more theoretical results) or on new concepts such as sponge functions. To me the definitions of any of the SHA algorithms (MD5 too) still look like "Let's mess with this, shall we?" - but what's the reasoning behind it? Why do it the way they did?

I'd be more than happy if you could give me insight into any of these topics.

3 Answers

I can't give you an answer that is going to leave you perfectly satisfied, because there is no such answer. How do we know that our algorithms are secure? Strictly speaking, we don't. We have no proof that SHA256, or AES, or RSA are secure -- it is widely believed that they are secure, but I could not give you a mathematical proof of that fact, and who knows, it is always possible that widespread beliefs are wrong.

Our belief in the security of these algorithms comes from the fact that a lot of really smart, knowledgeable people have tried really hard to break these algorithms, without making much of a dent at all. Of course, this is not a guarantee that no clever attack exists -- it is always possible there is some incredibly sneaky mathematical shortcut attack that no one has been clever enough to find, but the more people who try to find one and fail, the less likely that looks. For practical purposes, it seems unlikely that a garden-variety attacker is going to discover an attack that dozens of really smart people tried to find and failed.

Your immediate reaction might be, what the heck? Why are those cryptographers so lame? Why can't they prove any of their algorithms secure? Are they numbskulls? The answer is, there are fundamental reasons that make it very difficult to prove that an encryption algorithm or a hash function is secure (except in special cases). Roughly speaking, proving that an algorithm like AES or RSA or SHA256 is secure seems likely to be at least as hard as proving that P != NP (an infamously hard problem in computer science). We have very few tools for proving that an algorithmic task can not be completed efficiently. At its core, this is what makes it hard to prove that SAT cannot be solved in polynomial time (i.e., hard to prove that P != NP), and makes it hard to prove that there is no shortcut attack on AES (i.e., that AES cannot be broken). So it's not just that cryptographers are lame, it is that we are up against very hard problems that no one knows how to make progress on.

Note that nothing I said above is specific to hash functions, or to symmetric-key cryptography. It applies to all computationally-secure cryptography, including symmetric-key encryption, public-key encryption, digital signatures, hash functions, and many other standard primitives that we take for granted.

Your last question was: Can anyone teach me the theory of how symmetric-key algorithms are analyzed and cryptanalyzed? How do cryptographers analyze them? How do attacks work? No, I cannot teach you this within the time and space constraints here. There is an entire research field built up around these questions, with an intellectually deep literature on techniques for analysis of symmetric-key algorithms. (See, e.g., the FSE, CRYPTO, and EUROCRYPT conferences.) It takes years of dedicated study to learn this material. Unfortunately, I cannot teach you all of that in the space available here. The very short version is: cryptographers have developed a large suite of attack techniques, and as a starting point, any new primitive is first analyzed to see if any of those attacks will work. If the primitive resists all of the known attack techniques, then cryptographers spend time trying to design ad-hoc or custom attacks against the primitive. Cryptographers also study artificially-weakened versions of the primitive (e.g., with fewer rounds), to learn about the best attacks on those weakened versions in an attempt to extrapolate to the full thing. If after many person-years of effort, no one is successful in attacking the scheme, then people start to gain more confidence. More recently, there has also been some research on gaining confidence that the high-level structure is sound, or that all attacks of a particular class are guaranteed to fail, using ideas from the provable security community.

But at the end of the day, it is an art as much as a science. Vetting a new primitive is extremely expensive: it takes person-decades of effort from intensely-talented specialists. For this reason, smart users of cryptography generally try to use existing, vetted primitives, rather than inventing their own. If you invent your own scheme, it is extremely unlikely you'll be able to arrange for as much analysis and vetting of your own scheme as the standard ones have already received -- so don't do that. Don't "roll your own". Instead, build on existing, standard, accepted primitives, like AES, SHA256, RSA, etc.

One of my questions still remains unresolved. I still don't know why XOR, bit shifting and the like are used for hashes and ciphers in the first place - if there is any mathematical reasoning behind this, why it's exactly these operations and nothing else? Do you happen to know where I can find some background material on this?
– emboss Jul 20 '11 at 0:21

@emboss I expect they get used because (a) they work, and (b) they can be implemented efficiently in software. For example, XOR and addition can be a cheap (1 cycle) way to mix a lot of bits efficiently. And other operations (e.g., table look-ups, bit operations) are a cheap way to get non-linearity. You need both non-linearity and good mixing to have a secure cryptosystem. Thomas Pornin has some good book recommendations. The original AES submission also had some justification for their design.
– D.W. Jul 20 '11 at 3:19
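As a concrete illustration of the comment above, here is a toy add-rotate-XOR (ARX) step in Python. It is a made-up sketch, not the round function of any real cipher; it only shows how these cheap operations mix bits and how a one-bit input difference spreads over a few rounds.

```python
def rotl32(x, r):
    """Rotate a 32-bit word left by r bits."""
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF

def arx_mix(a, b):
    """One toy ARX step: modular addition, rotation, XOR (illustrative only)."""
    a = (a + b) & 0xFFFFFFFF          # addition mixes bits via carry propagation
    b = rotl32(b, 7) ^ a              # rotation + XOR spreads them across the word
    return a, b

# A single-bit input difference diffuses over several rounds:
x, y = (1, 2), (1, 3)                 # the two states differ in one bit of b
for _ in range(8):
    x, y = arx_mix(*x), arx_mix(*y)
diff_bits = bin((x[0] ^ y[0]) | ((x[1] ^ y[1]) << 32)).count("1")
print(diff_bits)                      # many of the 64 state bits now differ
```

Real ARX designs (e.g. the round functions inside some SHA-3 candidates) iterate steps like this many times with carefully chosen rotation amounts; the choice of constants is exactly the kind of design decision the books discuss.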

Thanks! OK, maybe I should read "The design of Rijndael", I heard good things about it.
– emboss Jul 20 '11 at 9:27

@D.W. says it well; in shorter words: the only known way to deem a cryptographic algorithm secure is to have hundreds of cryptographers gnaw at it for a few years. This is not ultimately satisfying, intellectually speaking, but one can still work with that (the whole of Medicine, for instance, is built on even shakier grounds, but it is still a very useful art).

For the specifics (i.e. how an S-box is chosen, what the "avalanche effect" is, how to study these things with algebra...), see (as always) the Handbook of Applied Cryptography, which is getting a bit old but is still a very good reference, and can be downloaded for free. Another good book is Algorithmic Cryptanalysis by Antoine Joux.
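On the "measure the quality" part of the question, one property that is easy to check empirically is the avalanche effect mentioned above: flipping a single input bit should flip about half of the output bits. A minimal sketch using Python's standard hashlib (the input strings are arbitrary examples):

```python
import hashlib

def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# "test" and "tesu" differ in exactly one bit ('t' = 0x74, 'u' = 0x75)
h1 = hashlib.sha1(b"avalanche test").digest()
h2 = hashlib.sha1(b"avalanche tesu").digest()

d = hamming_distance(h1, h2)
print(d)  # for a good 160-bit hash, expect roughly 80 differing bits
```

Passing this kind of statistical test is necessary but far from sufficient: MD5 shows excellent avalanche behaviour and is nevertheless broken, which is why the real vetting is cryptanalysis, not statistics.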

Thanks for Algorithmic Cryptanalysis, didn't know that one. When checking it out, I found this - do you know that? Sounds exactly like what I was looking for, no?
– emboss Jul 20 '11 at 9:30

@emboss: "boolean functions" in cryptography is the code name for a rather specific field, historically coming from the use of lookup tables on the output of one or several LFSRs. I fear that the book you cite may be overly restrictive with regard to what you are looking for (e.g. it will probably not include a discussion of the overall structure of a block cipher).
– Thomas Pornin Jul 20 '11 at 12:23

Thank you, Thomas! I guess I will start with something else then :)
– emboss Jul 20 '11 at 13:48

One of my questions still remains unresolved. I still don't know why XOR, bit shifting and the like are used for hashes and ciphers in the first place - if there is any mathematical reasoning behind this, why it's exactly these operations and nothing else?

One of the reasons XOR in particular is used is that it has an important property: reversibility. If you XOR a number with a key and then XOR the result with the same key again, you get your original number back.
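That reversibility is easy to demonstrate: XORing twice with the same key is the identity. A minimal Python sketch (the message and key are made-up examples):

```python
def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the repeating key."""
    return bytes(d ^ key[i % len(key)] for i, d in enumerate(data))

msg = b"attack at dawn"
key = b"\x13\x37"                        # toy key, for illustration only

masked = xor_with_key(msg, key)
assert xor_with_key(masked, key) == msg  # XORing twice restores the input
```

Note that repeating-key XOR on its own is trivially breakable; in real ciphers and hashes, XOR is just one reversible mixing step combined with many non-linear ones.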