>> -----Original Message-----
>> From: bacchi raffaele [mailto:bacchi_raffaele@lycos.com]
>> Sent: Monday, 2011 December 12 3:45
>> To: xml-editor@w3.org
>> Subject: XML grammar error?
>>
>> Hi,
>> I think that rule [20] (and other similar) are wrong:
>> CData ::= (Char* - (Char* ']]>' Char*))
>> The purpose of the rule is to match (reduce) any Char sequence not
>> containing ']]>'.
>> But this result is not achieved since the Char definition includes ']'
>> and '>' so the exception part of the rule:
>> -(Char* ']]>' Char*)
>> is ambiguous. Most parsers solve the ambiguity by applying the rule
>> "reduce as soon, as much as possible"
>> thus the rule will always mismatch because the first Char* reduces also
>> the sequence ']]>' and the next terminal ']]>' will never match.
>>
>
>There is no ambiguity here. A - B matches if A matches, provided B does
>not also match what A matches. The regular expression (in conventional
>notation) /^.*]]>.*$/ matches any string that contains at least one ']]>'.
>It is ambiguous in the sense that if there are multiple tokens of ']]>'
>in the string, different matchers will match ']]>' in the pattern against
>the first or the last. But that makes no difference to the meaning of
>the pattern.
>
>Specifically, a leftmost-longest matcher will first match the first
>Char* against the whole string, then attempt to match ']' and fail.
>It will then reduce the Char* by one character and try again to match
>']'. Iff there is a ']]>' in the string, it will eventually be matched
>as a result of the shortening of the first Char*; the second Char* will
>then match whatever is left. If there is more than one, the rightmost
>will be the one that matches.
>
>By way of contrast, a DFA matcher will match the leftmost occurrence
>of ']]>'. But as stated, exactly which ']]>' is matched is irrelevant.
>
>> I think the rule (and other similar) should be written:
>> Cdata ::= ( Char - ']]>' )*
>
>This will not work since it says to match a single character which is
>not a three-character sequence. No single character can be three
>characters, so it will match every character.
>
>Paul Grosso
>for the XML Core WG
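Grosso's point that the choice of occurrence does not change the yes/no answer can be checked with Python's backtracking `re` module. This is only a sketch: `cdata_ok` and `split_point` are hypothetical helpers, `Char` is approximated by "any character" (the real production excludes some code points), and the lazy pattern merely imitates the leftmost choice he attributes to a DFA matcher.

```python
import re

# Exception part of rule [20], Char* ']]>' Char*, with Char approximated
# by "any character" (re.DOTALL lets '.' match newlines too).
GREEDY = re.compile(r"(.*)\]\]>(.*)", re.DOTALL)  # first Char* as long as possible
LAZY = re.compile(r"(.*?)\]\]>(.*)", re.DOTALL)   # first Char* as short as possible


def cdata_ok(s):
    """A - B: s is CData if it matches Char* and does not match the exception."""
    return GREEDY.fullmatch(s) is None


def split_point(pattern, s):
    """Index where the ']]>' chosen by `pattern` starts, or None if no match."""
    m = pattern.fullmatch(s)
    return m.end(1) if m else None


s = "a]]>b]]>c"
print(split_point(GREEDY, s))          # 5: the rightmost ']]>'
print(split_point(LAZY, s))            # 1: the leftmost ']]>'
print(cdata_ok(s), cdata_ok("a]]b>"))  # False True
```

Both patterns pick a different ']]>' in "a]]>b]]>c", but both match, so the exception excludes the string either way.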

Hi,

I think the term (Char* ']]>' Char*) would be fine for a nondeterministic parser, one that tries every plausible parse of the input. However, a deterministic parser (for example, a leftmost-longest one) does not "...reduce the Char* by one character and try again...", because if the rule were

(xxx ']]>' xxx)

with xxx defined elsewhere, the parser would have to try every possible length of xxx, which forces it to be effectively nondeterministic for xxx and all of its nested rules.
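The "try every length" behaviour can be made concrete with a hand-written matcher that follows the backtracking strategy quoted above. This is a sketch: `backtracking_match` is a hypothetical function, not code from any XML parser, and `Char` is again approximated by any character.

```python
def backtracking_match(s):
    """Match Char* ']]>' Char* the way a backtracking matcher does: give the
    first Char* the whole input, then shorten it by one character at a time.
    Returns (index where ']]>' starts or None, number of lengths tried)."""
    attempts = 0
    for n in range(len(s), -1, -1):  # candidate length of the first Char*
        attempts += 1
        if s.startswith("]]>", n):   # the literal ']]>' must come next
            return n, attempts       # the second Char* takes whatever is left
    return None, attempts            # every length failed: no ']]>' in s


print(backtracking_match("a]]>b]]>c"))  # (5, 5)
print(backtracking_match("x" * 10))     # (None, 11)
```

Here each attempt is cheap because the first Char* is a plain run of characters. If it were an arbitrary nonterminal xxx, each shorter length would mean re-parsing xxx and its nested rules, which is the cost the paragraph above objects to.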

On the contrary,

CData ::= ( Char - ']]>' )*

is correct. According to Extended BNF (ISO/IEC 14977:1996):

"Syntactic-term
...
When a syntactic-term is a syntactic-factor followed by an except-symbol followed by a syntactic-exception it represents any sequence of symbols that satisfies both of the conditions:
a) it is a sequence of symbols represented by the syntactic-factor,
b) it is not a sequence of symbols represented by the syntactic-exception.
..."

Nothing there requires the syntactic-factor to match a sequence of the same length as the syntactic-exception. The parser can match a Char, check that the sequence ']]>' does not match at that position, then reduce the matched Char (advancing one character in the source), and repeat for as long as both conditions are satisfied.
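Read as a per-position lookahead, the procedure just described can be sketched as a single forward scan. This is an illustration under assumptions: `scan_cdata` is a hypothetical helper, `Char` is approximated by any character, and the "does ']]>' start here?" test is the lookahead interpretation argued for above, not a quotation of how ISO/IEC 14977 or the XML Recommendation define the except-symbol.

```python
def scan_cdata(text, pos=0):
    """Consume characters from text[pos:] one at a time, refusing to consume
    a character at which ']]>' begins. Returns the index where scanning stopped."""
    while pos < len(text):
        if text.startswith("]]>", pos):  # the exception matches here: stop
            break
        pos += 1                         # Char matched and not excluded: advance one char
    return pos


section = "<![CDATA[a < b]]>rest"
end = scan_cdata(section, 9)  # 9 == len("<![CDATA[")
print(end, section[9:end])    # 14 a < b
```

The loop never revisits an earlier position or tries alternative lengths; each character is examined once, with at most three characters of lookahead.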