The length-extension bug is that, given (x,SHA-256(m),length of m), one
can easily compute SHA-256(m,padded length of m,x).

One consequence of the length-extension bug is that it's not safe to use
SHA-256(k,m) as an authenticator of m under a secret key k. The attacker
can use this authenticator to forge an authenticator of any message of
the form (m,padded length of (k,m),x).

Applications can work around the length-extension bug without very much
trouble, but these workarounds lose speed. For example, the bug doesn't
hurt the HMAC-SHA-256 authenticator, but HMAC-SHA-256 also has terrible
overhead for short messages.

If SHA-256(message) had been defined as h(h(h(IV,0mes),0sag),1e10), with
the final input block visibly distinct from all other blocks, then this
bug would have disappeared, and would not have required any effort for
applications to fix. Does anyone know why compressing an extra block
became more popular than tweaking the final block?

One way to view the length-extension bug is as as follows. An attacker
given oracle access to a uniform random compression function h, and to
the function H defined by H(message) = h(h(h(h(IV,mes),sag),e10),007)
where IV is a uniform random string (kept secret), can see that

H(message10007) = h(h(H(message),100),012).

The same equation is almost never true for independent uniform random
functions h and H. From this perspective, there may be many other ways
to distinguish the pair (h,H) from uniform, and one can reasonably ask
to eliminate all of those bugs, not just the length-extension bug.

Anyway, the techniques of http://cr.yp.to/papers.html#easycbc should
allow an easy proof that the pair (h,H) is indistinguishable from
uniform if H(message) is defined as h(h(h(IV,0mes),0sag),1e10). An easy
corollary is that H is a secure authenticator.

Nobody should be using this authenticator, or any other SHA-based
authenticator---the Wegman-Carter framework has faster authenticators
with stronger security bounds; see http://cr.yp.to/mac.html---but there
are other applications of hash functions, and other ways to see the
damage caused by the length-extension bug.

Suppose, for example, that we use H as the hash function in the
Rabin-Williams signature system. Can we prove that any generic attack
against the system---any attack that works for all pairs (h,IV), given
access to h and IV---can be turned into a factorization algorithm? What
people normally prove is that an attack that works for _all_ functions H
can be turned into a factorization algorithm; this isn't enough.

I think that the answer for h(h(h(IV,0mes),0sag),1e10) is again yes,
although I haven't worked out the details. The answer for a hash
function with the length-extension bug is surely no.

I also think that this is what the Coron-Dodis-Malinaud-Puniya CRYPTO
2005 paper is trying to say, but I haven't deciphered their definitions
yet, and unfortunately the paper doesn't contain proofs for most of its
claimed theorems. Hopefully a subsequent paper will do better.