I think we have a failure to communicate here. I am making two
claims. First, the primary protection against digest collision attacks is
the search time (and, for birthday attacks, storage) required to find
digest collisions, not any limit on the number of documents with a given
digest, so the third sentence of the proposed text is true but irrelevant.
Second, even granting that it were relevant, the argument against
normalizing the character set is also IMO wrong. Let us suppose that the
intended forgery is to insert the word "not" between "will" and "be" in a
specific sentence. Furthermore, let us suppose that a character
normalization transform maps the Latin-1 character for 1/2 (U+00BD), the
ASCII string 1/2, and the composed sequence 1 U+2044 2 all to the same value
on the grounds that they all represent the fraction one-half, and let us
suppose that there are exactly 100 occurrences of the Latin-1 character in
the document, but none of the others. If the transform is applied before
digesting, substituting either of the other two forms for the original
character has no effect on the digest, because the transform maps all three
to the same character sequence, so the forged document has only one possible
digest. If the transform is not applied, each such substitution yields a
different digest, and the total number of digests available for the same
document appearance is 3**100, which is more than 1/3 of the total number of
possible SHA-1 digest values (2**160). Search time should still protect us,
but the chances of finding a valid forgery are now restricted ONLY by search
time.
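
To make the counting concrete, here is a minimal Python sketch of the
scenario. The normalize_half function is hypothetical: it stands in for the
assumed transform that folds all three spellings of one-half together, and
it is not NFC or any other standard normalization form. The sample sentence
is likewise invented for illustration.

```python
import hashlib

# The three spellings of one-half discussed in the example.
LATIN1_HALF = "\u00bd"            # single Latin-1 character
ASCII_HALF = "1/2"                # plain ASCII string
FRACTION_SLASH_HALF = "1\u20442"  # 1, FRACTION SLASH (U+2044), 2


def normalize_half(text: str) -> str:
    """Hypothetical transform: fold every spelling of one-half to ASCII."""
    return (text.replace(LATIN1_HALF, ASCII_HALF)
                .replace(FRACTION_SLASH_HALF, ASCII_HALF))


def sha1_hex(text: str) -> str:
    """SHA-1 digest of the UTF-8 encoding of text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Invented sample sentence with a single occurrence of the Latin-1 character.
original = "The fee will be \u00bd of the total."
variants = [original.replace(LATIN1_HALF, form)
            for form in (LATIN1_HALF, ASCII_HALF, FRACTION_SLASH_HALF)]

# Without the transform, each spelling gives its own digest.
raw_digests = {sha1_hex(v) for v in variants}
# With the transform applied first, all spellings give one digest.
normalized_digests = {sha1_hex(normalize_half(v)) for v in variants}

# With 100 occurrences and 3 spellings each, the number of visually
# equivalent documents is 3**100; SHA-1 has 2**160 possible values.
variant_count = 3 ** 100
digest_space = 2 ** 160

print(len(raw_digests), len(normalized_digests))
print(variant_count / digest_space)
```

With one occurrence there are three raw digests and one normalized digest;
with 100 occurrences the unnormalized document has 3**100 equivalent
spellings available to an attacker searching for a match.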
In short, normalizing prior to digesting AVOIDS allowing
inconsequential changes to change the digest. If I have misunderstood the
point of the section cited, I'm sure someone will correct me.
Tom Gindin
"Joseph M. Reagle Jr." <reagle@w3.org> on 07/07/2000 05:58:35 PM
To: Tom Gindin/Watson/IBM@IBMUS
cc: "Martin J. Duerst" <duerst@w3.org>, w3c-ietf-xmldsig@w3.org, "John
Boyer" <jboyer@PureEdge.com>
Subject: Re: Followup on I18N Last Call comments and disposition
At 10:52 2000-06-29 -0400, tgindin@us.ibm.com wrote:
>Well, it probably isn't even correct to call this a "Birthday Attack," I'm
>hoping someone else jumps in and tweaks the text, but I think the gist of
>what you are after is there.
>
>[Tom Gindin] The wording of section 8.1.3 is somewhat unfortunate already.
>While it is true that transforms appear to increase the number of documents
>which map to the same digest, that number is already literally
>astronomical. For SHA-1, for example, the number of documents of length N
>octets in UTF-8 which map to a given digest is 256**(N-20) or
>2**(8*(N-20)). Larger hash algorithms may increase the number 20 somewhat,
>but a 200 octet message restricted to printable ASCII would still exceed
>2**1000. Not normalizing before digesting is what allows inconsequential
>changes to affect the digest.
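
The arithmetic in the quoted paragraph can be checked with a short Python
sketch. It assumes the digest behaves like a uniform random function, so
the counts are averages per digest value, and it takes "printable ASCII" to
mean the 95 characters from space through tilde. The function name is made
up for illustration.

```python
import math

SHA1_OCTETS = 20  # SHA-1 output is 160 bits


def log2_messages_per_digest(length_octets: int, alphabet_size: int = 256) -> float:
    """log2 of the average number of fixed-length messages per SHA-1 digest.

    Assumes digests are spread uniformly over all 2**160 values.
    """
    return length_octets * math.log2(alphabet_size) - 8 * SHA1_OCTETS


# Arbitrary octets: 256**(N-20) == 2**(8*(N-20)), here with N = 200.
arbitrary_bits = log2_messages_per_digest(200)

# Printable ASCII only (95 symbols assumed), still N = 200.
printable_bits = log2_messages_per_digest(200, alphabet_size=95)

# Exact integer version of the printable-ASCII claim.
printable_exceeds = (95 ** 200) // (2 ** 160) > 2 ** 1000

print(arbitrary_bits, printable_bits, printable_exceeds)
```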
I've tweaked the text slightly in the forthcoming draft. If anyone wants to
suggest alternative text for future versions, please propose it:
8.1.3 Transforms Can Aid Collision Attacks
In addition to the semantic concerns of transforms removing or including
data from a source document prior to signing, there is potential for
syntactical collision attacks. For instance, consider a signature which
includes a transform that changes the character normalization of the source
document to Normalization Form C [NFC]. This transform increases the number of
documents that when transformed and digested yield the same hash value.
Consequently, an attacker could include a substantive syntactical and
semantic change to the document by varying other inconsequential syntactical
values that are normalized prior to digesting such that the tampered
signature document is considered valid. Consequently, while we RECOMMEND all
documents operated upon and generated by signature applications be in [NFC]
(otherwise intermediate processors might unintentionally break the
signature), encoding normalizations SHOULD NOT be done as part of a signature
transform.
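
The effect described in 8.1.3 can be seen in a small Python sketch using the
standard unicodedata module. The sample string and the digest helper are
illustrative assumptions, and SHA-1 is used only because it is the digest
discussed in this thread.

```python
import hashlib
import unicodedata

# Two encodings of the same visible text.
composed = "caf\u00e9"      # e-acute as a single code point (U+00E9)
decomposed = "cafe\u0301"   # "e" followed by COMBINING ACUTE ACCENT (U+0301)


def digest(text: str, normalize: bool = False) -> str:
    """SHA-1 of the UTF-8 bytes, optionally after an NFC transform."""
    if normalize:
        text = unicodedata.normalize("NFC", text)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Without the transform the two documents digest differently.
differ_raw = digest(composed) != digest(decomposed)
# With NFC applied as a transform, both documents yield the same digest.
same_after_nfc = digest(composed, normalize=True) == digest(decomposed, normalize=True)

print(differ_raw, same_after_nfc)
```

Both documents verify against one signature only when NFC is part of the
transform, which is the enlargement of the matching set that the draft text
is concerned with.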