On Aug 12, 2009, at 22:55, Ian Hickson wrote:
> On Wed, 12 Aug 2009, Henri Sivonen wrote:
>> On Aug 12, 2009, at 12:10, Henri Sivonen wrote:
>>
>>> I think I'll create a wiki page with requirements and a proposed
>>> delta
>>> spec first, though, because others on #whatwg were interested in
>>> pondering alternative solutions given a set of requirements.
>>
>> Wiki page created: http://wiki.whatwg.org/wiki/CDATA_Escapes
>
> Wow. Please can we stick to just the current magic escapes and not add
> even more magic?
The current magic, without all the magic that current browsers
implement, leads to some incompatibilities with existing content. I
don't know how often a user would hit these issues, but when the
problems do occur, they wreck the whole page. Therefore, I think we
should seriously try to improve the magic so that it substitutes for
the current browser magic better in practice while still not doing
reparsing.
Here are points that need research, in my opinion:
1) Would removing the escape flag from xmp, title and textarea
improve or degrade Web compat given no reparsing? To research this, I
suggest parsing a substantial body of Web content with the current
parsing algorithm and then grepping the text content of every xmp
element for |<!--.*</xmp| (ignoring case and letting . match over line
breaks). (Likewise for textarea and title, except rejecting hits where
any part of "<!--" or "</title" has been entity-escaped.) Basically,
if there are almost no hits, it would be safer to zap the escape flag
from these elements, because accidentally having <!-- eat up the rest
of the page is worse than very rarely terminating one of these
elements prematurely.
2) Would making comments and escape runs close on --\s+!> improve or
degrade Web compat given no reparsing? To research this, I suggest
grepping a substantial body of Web content for |--\s+!>| and analyzing
the hits.
3) Would making --!> and --\s+> close escapes improve or degrade Web
compat given no reparsing? To research this, I suggest parsing a
substantial body of Web content with the current parsing algorithm and
then grepping the text content of every script and style element for
|--!>| and |--\s+>| and analyzing the hits.
4) Would making <!-- not open an escape when there's non-whitespace
on the line before it improve or degrade Web compat given no
reparsing? To research this, I suggest parsing a substantial body of
Web content with the current parsing algorithm and then grepping the
text content of every script and style element for |^.*\S.*<!--| and
analyzing the hits.
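For concreteness, the four searches above could be sketched roughly as
follows in Python. This is only an illustration of the patterns, not a
tested crawler: it assumes the text content of each element (or the raw
source, for point 2) has already been extracted by a parser implementing
the current algorithm, and the names used here (has_escape_before_end_tag,
count_hits and the pattern constants) are made up for this sketch. It
also does not implement the entity-escaping rejection described in
point 1 for textarea and title, since that needs access to the source
rather than the parsed text.

```python
import re

# Point 1: "<!--" followed later by the element's own end tag, case-insensitive,
# with "." matching across line breaks. Applied to the element's text content.
ESCAPE_THEN_END_TAG = {
    name: re.compile(r"<!--.*</" + name, re.IGNORECASE | re.DOTALL)
    for name in ("xmp", "textarea", "title")
}

# Point 2: "--", whitespace, "!>" anywhere in the raw page source.
SPACED_BANG_CLOSE = re.compile(r"--\s+!>")

# Point 3: "--!>" and "--", whitespace, ">" in script/style text content.
BANG_CLOSE = re.compile(r"--!>")
SPACED_CLOSE = re.compile(r"--\s+>")

# Point 4: "<!--" preceded by non-whitespace on the same line, in script/style
# text content. MULTILINE makes "^" match at the start of every line.
TEXT_BEFORE_ESCAPE = re.compile(r"^.*\S.*<!--", re.MULTILINE)


def has_escape_before_end_tag(element_name, text):
    """Return True if text contains "<!--" followed by "</element_name"."""
    return ESCAPE_THEN_END_TAG[element_name].search(text) is not None


def count_hits(pattern, text):
    """Return the number of non-overlapping matches of pattern in text."""
    return len(pattern.findall(text))
```

The hits would still need to be looked at by hand; the counts only say
where to look.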
Hixie, have you already run these analyses? If not, it would be
awesome if someone who already maintains the capability to run these
searches could run them. (I volunteer to perform the "analyze the
hits" parts, but I don't currently have the readiness to run the
searches.)
--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/