On Sep 13, 2010, at 5:54 AM, Henri Sivonen wrote:
> Currently, the way formatting elements are handled allows a site to cause the parser output to contain a dramatically larger number of elements than there are start tags. This isn't a problem most of the time. However, there are bad cases out there on the Web, and it would be nice to make them not hang browsers, and maybe do so in a consistent way across implementations.
>
> So far, people have expressed the most concern about the adoption agency algorithm, because it has two nested loops, which suggests the time complexity can skyrocket. I have not come across Web pages where the AAA is a problem. However, WebKit's (new) implementation limits both loops to 10 iterations (that is, the inner loop is limited to 10 iterations on each entry to the loop). I think it would be useful to run an instrumented parser across a large dataset of Web pages to find out how many iterations of the outer loop and of the inner loop are typical, and then maybe even standardize iteration limits for the loops.
Here's a test case (not a live link but the source) that caused lengthy hangs in the older WebKit HTML parser:
http://trac.webkit.org/browser/trunk/LayoutTests/fast/parser/residual-style-hang.html
It's not real content, but is based on real content (from email messages).
I believe it will also cause unreasonable slowness in the spec's flavor of the AAA if there are no iteration limits. Note that it does not require a very high nesting depth to exhibit unacceptably slow O(N^2) behavior.
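To make the shape of the problem concrete, here is a minimal sketch of a generator for input of this kind. It is not the source of residual-style-hang.html (the real test's markup differs); the function name and tag choices are illustrative assumptions. The idea is that unclosed formatting elements stay in the parser's list of active formatting elements, so each subsequent block boundary triggers "reconstruct the active formatting elements" and clones all of them:

```python
def make_residual_style_case(n):
    # Hypothetical generator (not the actual layout test): n unclosed
    # <font> tags put n entries in the list of active formatting
    # elements; each of the n following paragraphs then reconstructs
    # (clones) all n entries. Output element count grows roughly as
    # n * n, even though the input contains only 2 * n start tags.
    unclosed_fonts = "<font color=red>" * n
    paragraphs = "<p>text</p>" * n
    return "<html><body>" + unclosed_fonts + paragraphs + "</body></html>"

# With n = 100, a spec-conformant parser with no limits produces on the
# order of 100 * 100 cloned formatting elements from only 200 start tags.
```

Email-style content (deeply misnested or unclosed font/style tags followed by many paragraphs) hits exactly this pattern, which is why the test above is based on real messages.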
Regards,
Maciej