2 Answers

Wikipedia has a reasonably good explanation of the Merkle–Damgård construction. The idea is the following: SHA-1 is built around an internal "compression function" which takes as input the 160-bit state and a 512-bit message block, and returns a new state. The padding is designed so that it can be proven that a collision in the hash function necessarily implies, at some point, a collision in one of the calls to the compression function.

For instance, imagine that the padding merely consisted of appending zeros, up to the next block boundary. Then you could imagine a collision between a one-block message $m_1$ and a two-block message $m_2$, where the second block of $m_2$ is exactly $m_1$ (this would require the processing of the first block of $m_2$ to end up exactly at the initial state). In that case, there would be no collision in the compression function per se.

This does not mean that all-zero padding is weak in itself; but the padding used in SHA-1 and other Merkle–Damgård schemes ensures that we can concentrate on the compression function alone, without having to reason about multi-block messages: if we can "prove" the compression function to be collision-free, then so is the whole hash function. That is what the padding is for: it turns a collision-free compression function into a collision-free hash function.
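As a concrete illustration, here is a minimal Python sketch of the padding SHA-1 applies (the function name `md_pad` is my own choice, not part of any standard API): a single 1 bit (the byte `0x80`), then zeros, then the original message length in bits as a 64-bit big-endian integer, so that the result is a multiple of the 512-bit block size.

```python
def md_pad(message: bytes, block_size: int = 64) -> bytes:
    """Merkle-Damgard strengthening as used in SHA-1 (sketch).

    Appends a 1 bit (byte 0x80), then zero bytes, then the original
    length in bits as a 64-bit big-endian integer, so the total
    length is a multiple of the 64-byte (512-bit) block size.
    """
    bit_length = len(message) * 8
    padded = message + b"\x80"
    # Add zeros until we are exactly 8 bytes short of a block boundary,
    # leaving room for the 64-bit length field.
    padded += b"\x00" * ((-len(padded) - 8) % block_size)
    # Append the original length in bits, big-endian, in 8 bytes.
    return padded + bit_length.to_bytes(8, "big")
```

Because the length is encoded at the end, messages of different lengths always produce different final blocks, which is the property the proof sketched above relies on.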

This preprocessing is a kind of padding, which makes sure that the input size is a multiple of 512 bits (which is necessary for the rest of the algorithm). If we only padded the input with zeros, there would be no way to distinguish (in the hash) the empty input from an input consisting of a single 0x00 byte.

Appending the 1 bit and appending the original length both help here: together they make the padded input depend completely on both the length and the content of the message.
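The ambiguity of zero-only padding is easy to demonstrate. Below is a small Python sketch (the helper `zero_pad` is hypothetical, purely for illustration): two distinct messages pad to the identical block, so any hash built on such padding would collide on them, whereas real SHA-1 distinguishes them.

```python
import hashlib

def zero_pad(message: bytes, block_size: int = 64) -> bytes:
    # Hypothetical naive padding: zeros only, up to the next
    # 64-byte (512-bit) block boundary.
    return message + b"\x00" * (-len(message) % block_size)

# b"A" and b"A\x00" become the same 64-byte block, so a hash using
# this padding cannot tell them apart:
assert zero_pad(b"A") == zero_pad(b"A\x00")

# SHA-1's real padding encodes the bit length (8 vs. 16 here),
# so the two inputs hash differently:
assert hashlib.sha1(b"A").digest() != hashlib.sha1(b"A\x00").digest()
```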