Firefox’s Shiny New HTML5 Parser

Why does Firefox need a new HTML parser? A few reasons. First and foremost, the HTML5 spec is the first HTML specification to actually describe how user agents (read: browsers) should parse HTML for rendering. Previously, it had been up to browser makers to decide for themselves how to go about parsing an HTML document and turning it into a DOM tree for display. Firefox’s new parser is the first browser implementation of that specification. If you’re interested in more details and have an afternoon to kill, have a look at the WHATWG’s Parsing HTML page. If you don’t have that much time to spare, Sivonen gives you the CliffsNotes version:

The HTML5 parsing algorithm has two major parts: tokenization and tree building. Tokenization is the process of splitting the source stream into tags, text, comments, and attributes inside tags. The tree building phase takes the tags and the interleaving text and comments and builds the DOM tree. The tokenization part of the HTML5 parsing algorithm is closer to what Internet Explorer does than what Gecko used to do. Internet Explorer has had the majority market share for a while, so sites have generally been tested not to break when subjected to IE’s tokenizer. The tree building part is close to what WebKit does already. Of the major browser engines, WebKit had the most reasonable tree building solution prior to HTML5.
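To make the two phases concrete, here is a toy sketch in Python. It is not the HTML5 algorithm, which defines a large state machine with detailed error recovery. It only shows the split Sivonen describes: a tokenizer that emits tags, text, and comments, and a tree builder that turns those tokens into a tree. The names `tokenize`, `build_tree`, and `Node` are made up for this illustration, and the regular expressions handle only simple, well-formed markup.

```python
import re
from dataclasses import dataclass, field

# Toy token pattern (an assumption for this sketch, far simpler than the
# real HTML5 tokenizer): comments, end tags, start tags, or runs of text.
TOKEN_RE = re.compile(
    r"<!--(?P<comment>.*?)-->"
    r"|</(?P<end>[a-zA-Z][a-zA-Z0-9]*)\s*>"
    r"|<(?P<start>[a-zA-Z][a-zA-Z0-9]*)(?P<attrs>[^>]*)>"
    r"|(?P<text>[^<]+)",
    re.DOTALL,
)
# Only double-quoted attribute values are recognised here.
ATTR_RE = re.compile(r'([a-zA-Z-]+)\s*=\s*"([^"]*)"')


def tokenize(source):
    """Phase 1: split the source into start/end tags, text, and comments."""
    for m in TOKEN_RE.finditer(source):
        if m.group("comment") is not None:
            yield ("comment", m.group("comment"))
        elif m.group("end"):
            yield ("end", m.group("end").lower())
        elif m.group("start"):
            attrs = dict(ATTR_RE.findall(m.group("attrs")))
            yield ("start", m.group("start").lower(), attrs)
        elif m.group("text"):
            yield ("text", m.group("text"))


@dataclass
class Node:
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def build_tree(tokens):
    """Phase 2: turn the token stream into a tree using a stack of open elements."""
    root = Node("#document")
    stack = [root]
    for tok in tokens:
        kind = tok[0]
        if kind == "start":
            node = Node(tok[1], tok[2])
            stack[-1].children.append(node)
            stack.append(node)
        elif kind == "end":
            # Close back to the nearest matching open element;
            # a stray end tag with no match is ignored.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].name == tok[1]:
                    del stack[i:]
                    break
        elif kind == "text":
            stack[-1].children.append(tok[1])
        elif kind == "comment":
            stack[-1].children.append(Node("#comment", {"data": tok[1]}))
    return root


if __name__ == "__main__":
    tree = build_tree(tokenize('<p class="a">Hi <b>there</b><!-- c --></p>'))
    print(tree)
```

Keeping the phases separate is what lets the spec pin down each one on its own terms: the tokenizer never needs to know about nesting, and the tree builder never looks at raw characters.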

So, if you now grab a nightly build of Firefox, you’ll be browsing the Web using the shiny new parser. What does this mean for web developers? Surprisingly little. As Sivonen himself points out, the most important feature of the new parser is that you won’t notice any difference. Existing pages should render in much the same way, with a few marginal improvements: parsing now runs on a separate thread from the browser’s main UI, which reportedly brings some performance gains; innerHTML now runs about 20% faster; and a number of long-standing parser bugs in Firefox have been fixed as a side effect of rewriting the parser from scratch.

Sivonen points out that there is, however, one notable feature of the new parser that might excite some developers: HTML5 documents can include inline SVG and MathML mixed directly into the HTML markup, and they’ll be rendered as graphics and mathematical notation respectively. He’s put together a test page to demo these new features (you’ll need to be running a Firefox nightly build to view it correctly). Here’s what that page looks like:

[Screenshot: demo page featuring inline MathML and SVG]

The code that goes into creating that example is satisfyingly simple. View the source of that page to see what I mean.
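If you don’t have a nightly build handy, the sketch below gives a rough idea of the kind of markup involved. It is not the source of Sivonen’s demo page: the `PAGE` string, the file name, and the helper names are all invented for illustration. The Python around it simply writes the page to disk and lists the elements in it. Note that Python’s `html.parser` is a simple event-based parser, not an implementation of the HTML5 algorithm, so it says nothing about how a browser would render the result.

```python
from html.parser import HTMLParser

# Hypothetical markup (an assumption, not the demo page's source):
# SVG and MathML elements sit directly inside ordinary HTML.
PAGE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Inline SVG and MathML</title></head>
<body>
<p>A circle:
  <svg width="60" height="60" viewBox="0 0 60 60">
    <circle cx="30" cy="30" r="25" fill="orange"/>
  </svg>
</p>
<p>A fraction:
  <math><mfrac><mn>1</mn><mn>2</mn></mfrac></math>
</p>
</body>
</html>
"""


class TagCollector(HTMLParser):
    """Records the name of every start tag in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def element_names(markup):
    """Return the start-tag names found in the markup."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def write_page(path="inline-demo.html"):
    """Write the sample page to disk so it can be opened in a browser."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(PAGE)


if __name__ == "__main__":
    write_page()
    print(element_names(PAGE))
```

There are no plugins, `object` elements, or XHTML namespace declarations in the page: the `svg` and `math` elements are just written inline, which is the point of the feature.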

Of course, this is a long way from being usable in everyday websites, but it’s always fun to have a peek into the future of HTML and related technologies, as well as the future of our favorite browser. Thanks to Henri for giving us a peek behind the scenes of the great work going on at Mozilla!

Louis joined SitePoint in 2009 as a technical editor, and has since moved over into a web developer role at Flippa. He enjoys hip-hop, spicy food, and all things geeky.

jim

(1) Fix the printing. This is the time. This is the place.

(2) Fix the memory leaks. I typically run 4-10 sessions of FF, each with a dozen or so tabs open. As I open / close sessions / tabs, the Mem Usage (WTM Processes tab) grows and grows. Around 1GB, FF will finally crash and send in a report. I’m an engineer / job seeker / web site developer, so there are virtually no Flash games or any of that type of nonsense involved; just working through various job sites, checking email on Yahoo! and Google, checking out the latest CSS techniques, and seeing what my friends on FB are up to.

[ as any good teacher will tell you, I repeated this for your own good :) ]