Improving the CSS 2.1 strict parser for IE 7

We’ve already started talking about a few of the CSS changes that are going to be available in IE 7 when we release, but there are a few hanging points that we haven’t talked about yet or haven’t covered completely. There are 3 specific items I’d like to talk about: the root node selector (the * HTML hack), multi-class selectors, and pseudo-element parsing.

To be very clear the root node selector was a bug. This was introduced by Chris Wilson back in IE 4 which is why we don’t let him work with the code anymore.

The root node selector has long been used to create rules that only work in IE. The general pattern would be to write rules that would match in all browsers first. Then use the child selector logic ( > ) to create more specific rules that would match only in browsers that supported these selectors. Prior to IE 7 we didn’t support these selectors so they were transparent to us. Finally, you would apply rules that would only match in IE by using the * HTML pattern to match the root node. Since no other browser supports or contains a node in the DOM located above the HTML node these patterns wouldn’t match.
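The layering described above can be sketched as a three-step stylesheet. This is only an illustration: the `#sidebar` selector and the property values are hypothetical, and only the `html > body` and `* html` patterns come from the post.

```css
/* 1. Baseline rule: matches in every browser. */
#sidebar { width: 200px; }

/* 2. Child-selector rule: IE 6 and earlier do not support the >
      combinator, so they skip this rule; other browsers apply it. */
html > body #sidebar { padding: 10px; }

/* 3. "* html" rule: matches only if the browser acts as though some
      element sits above <html>, which the post describes as an IE bug. */
* html #sidebar { width: 180px; }
```

A browser that matched both rule 2 and rule 3 would apply the padding and the narrowed width together, a combination the author of such a stylesheet would never have tested.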

So what happens when we start to match child selectors in IE 7? Well, it makes a big mess because we now match more rules not meant to be used from within IE, and those are then merged with the IE-only rules. Because of the merge, styles that were never meant to be combined end up changing the layout of the page away from what the author originally intended when they designed their cross-browser matrix and testing patterns.

The best fix here was to disable the root tag matching logic because it wasn’t supposed to work according to the standards. We only do this in strict mode, since the new selectors only match in strict mode, and so we now use the same sets of rules that other CSS 2.1 compliant browsers would use and we ignore the IE only rules resulting in a page that will lay out as designed by the author. There are a few issues with this when we differ in our interpretation of the spec from other browser implementations but most of the time these are minor and have easy workarounds.

When the class selector support was originally written it was drawn up based on the CSS 1 spec which only supported a single class selector in each simple selector. We wound up keeping this behavior even after implementing portions of later versions of CSS. The end result is that we always threw out the extra classes in the selector and only kept the last one in the list and matched based on that.

Well, some sites use multi-class selectors, so when we looked at the CSS 2.1 selectors work it was pretty easy to upgrade our class selectors to allow more than one class to be applied. When in strict mode we now obey all specified class selectors per simple selector. While you won’t often use the feature, it can be used for some interesting applications. One of my original test cases would use combinations of red, yellow, and blue classes to paint elements based on their combined color. A selector such as .red.yellow would paint an element orange if it contained both of these classes in a space-separated list within the class attribute. Any element that doesn’t carry both classes won’t match, so you can more accurately apply hierarchical styles.
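A minimal sketch of that test case might look like the following. The class names follow the post, but the exact declarations are an assumption. Note that orange is declared explicitly by the author; the browser does not blend colors.

```css
/* Single-class rules. */
.red    { background-color: red; }
.yellow { background-color: yellow; }

/* Matches only elements whose class attribute contains BOTH tokens.
   Two class selectors give it higher specificity than either
   single-class rule, so it wins regardless of source order. */
.red.yellow { background-color: orange; }

/* Expected results in a browser that honours every class in the selector:
     <span class="red">        red
     <span class="yellow">     yellow
     <span class="red yellow"> orange
   Per the post, earlier IE versions kept only the last class, so
   .red.yellow behaved like .yellow. */
```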

We applied a very, very strict interpretation of pseudo elements in our parser and this would cause certain constructs to get thrown out. Basically we asserted that any pseudo-element had to be the very last thing in the selector. The spec only really mentions that there can be only one pseudo-element per selector and it must appear in the last simple-selector within the selector. Because of our strict interpretation if we saw any non-whitespace character or token after we just processed a pseudo-element we’d throw an error flag into the rule. This gave us the following behavior:

The parser is much more intelligent about when and how it applies the error flag in IE 7 and the two failures you see will now succeed. Truly invalid rules will still fail and you have to be careful not to apply multiple pseudo-elements or apply pseudo-elements that are in simple selectors that are in the beginning of a complex selector.
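The list of behaviors the post refers to is not reproduced here, so the following is only an illustrative sketch of the placement rules as the post summarizes them. The selectors are hypothetical and are not the author's original examples.

```css
/* Valid: one pseudo-element, attached to the last simple selector. */
p:first-letter { color: red; }
div.intro p:first-line { font-weight: bold; }

/* Valid: a comma-separated group is several selectors, and each one
   is checked separately, so a pseudo-element may be followed by a
   comma and another selector. */
h1:first-letter, h2 { color: navy; }

/* Invalid: the pseudo-element is attached to a simple selector at the
   beginning of a complex selector rather than the last one. */
p:first-line em { color: green; }

/* Invalid: more than one pseudo-element in a single selector. */
p:first-line:first-letter { color: green; }
```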

These small issues that made writing web pages according to spec a trial-and-error situation have been fixed for IE7. By improving the parsing logic it becomes more obvious how your selectors should be written and existing W3C documentation can be used to quickly come up to speed. It should be easy to introduce interesting layouts and formatting in your web pages without having to specify custom rules for each browser and hopefully IE 7 comes one step closer to making that a reality.

There’s been a massive improvement in the quality of this weblog recently. Thanks guys.

Like Olly, I don’t like the fact that the * html hack has been removed. It just seems like compliance for the sake of it. Sites using hacks in Strict mode are going to have to be updated no matter what.

There is going to be a need for CSS hacks for Internet Explorer 7. The * html hack in conjunction with html>body was an easy one until you took away * html.

Conditional comments are an imperfect solution. They force you to either bloat the size of the HTML documents, or use an additional external resource. CSS hacks don’t, making them more efficient in terms of requests to the server, compression, and caching. Yeah, it’s only a few K here, a few K there, but it all adds up, both in terms of load on the server, and in terms of slowing down your website, especially for mobile/dialup users.

In particular, the problem you cite could be fixed by the page author by simply moving the Internet Explorer 6 only rules to above the compliant browser rules in the stylesheet. I’ve always coded this way, I don’t agree with your assumption that the other way around is typical.

I’m sure some people will complain about the removal of the * html hack, but it is just that, a hack. It’s a *hack*! Browser makers fixing these *bugs* is precisely why many of us have been warning against the use of parsing hacks for some time now. Especially when there are cleaner methods of targeting a specific browser that allow the preservation of compliance.

I, for one, am happy to see it go, and will be extremely pleased to find less of this nonsense polluting the stylesheets that I have to maintain and review.

No matter what, IE7 is going to be a different browser. It’s going to require extensive testing and planning regardless of which parsing hacks continue on. With any luck, it’ll be easy to switch off most IE6 hacks, and we’ll see the web move forward with a more standards-compliant IE.

Great work so far. I’ve been pro removal of the * html hack as well, since I think it’s not going to be necessary anymore. If you need to cater to IE only, use conditional comments instead. There’s not that much of a difference anyway.

As for the relaxing on pseudo elements, I don’t think it can hurt what you’ve done with it. We all get annoyed by a missing space or a wrong order of elements.

I really think IE is going in the right direction now. I’ve cursed a lot at it in the past years for things not working correctly, but it finally seems IE7 is going to be a huge improvement in terms of standards. Truly, good job.

theoretically, if the finalized rendering engine is every bit as good as firefox & safari… we should be able to leave our current style sheets containing * html in place and not have a thing to worry about in IE 7.

Besides, * html will still be needed for a few more years until the majority of windows machines no longer use IE6.

As someone else said, this is a sign of commitment. From here on out it’s either get the rendering engine perfect or go back on their word and continue letting * html do what we’ve used it for even in IE7 standards mode.

"One of my original test cases would use combinations of red, yellow, and blue classes to paint elements based on their combined color. A selector such as .red.yellow would paint and element orange if it contained both of these classes in a space separated list within the class attribute."

*combined* color?!?! while "cool", that doesn’t sound like a correct way to handle color assignments in multiple class names. would that only work for the color properties? how would it work for other props… like margin?

i’ve used multiple classes on elements to accomplish something like this: suppose there are a variety of elements on a page, each with a different class. each class specifies a certain kind of layout (color, dimensions, etc.). another class, "Inverse", can be added as a secondary class to an element to switch the colors (to black on white instead of white on black). to accomplish this, the color and background-color properties are overridden in the Inverse class definition. in this scenario, i would most certainly not want (nor expect) both the color and background-color to turn 50% grey!
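The override pattern in this comment can be sketched as follows. The `Inverse` class name comes from the comment; `Panel` and the property values are hypothetical.

```css
/* Hypothetical layout class: white text on black. */
.Panel { color: white; background-color: black; width: 20em; }

/* Secondary class, declared later with equal specificity: overrides
   only the two colour properties. */
.Inverse { color: black; background-color: white; }

/* <div class="Panel Inverse"> renders black on white and keeps the
   20em width. Nothing is blended: for each property the cascade
   picks exactly one winning declaration. */

/* With multi-class selector support, a rule can also target the
   combination itself. */
.Panel.Inverse { border: 1px solid black; }
```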

<blockquote>*combined* color?!?! while "cool", that doesn’t sound like a correct way to handle color assignments in multiple class names. would that only work for the color properties? how would it work for other props… like margin?</blockquote>

As for the comment about letting me touch the code, I told Justin I was responsible for that one, and I saw his post before it went out. He’s done enough great work on our CSS support for IE7 that I’m going to be the last person to smack him for harshing on my six-year-old code. 🙂

[To be very clear the root node selector was a bug. This was introduced by Chris Wilson back in IE 4 which is why we don’t let him work with the code anymore.]

That is not polite to write it. Even if he is guilty for that bug, nobody else tried to fix it since then. So forget the past and work as a team.

But with all the information you provide about IE7 I’m very much looking forward to the final version. It’s going to be a great come-back for IE and it’ll be even better in Vista. Great work, can’t wait to get it!

You’re doing it guys, I’m actually getting enthusiastic about this bug-free IE coming up. [even though this should have been done some time ago, I’ve just spent the last 2 or 3 years pestering about this particular piece of software and its mother company] [not to forget also that we’ll still have to code for the older versions for quite some time now]. But yes, you are going the right way, as your involvement with WaSP shows.

There’s an article on the IE Blog about the CSS parser bugs they’ve fixed for IE 7. It’s interesting that they are having to choose which parser bugs to fix, and in which modes, because people rely on those bugs as ways to "detect" particular CSS problems which only occurred in IE 6. Worse, there isn’t generally a 1:1 match between parser bug and CSS bug (although there are some objections, such as the Box Model Hack). Why is taking advantage of particular parser bugs any better than conditional comments, or user agent sniffing, which has long been considered harmful?…

First, thanks for sharing this explanation and spotlighting these three items. It’s important stuff to be aware of, no question. Keep it up! Working across IE7 and IE6 might not be the easiest nut to crack, but we can all see the forward motion and that is terrific.

Second, good catch there Ian!

Finally, that Chris Wilson reference was a riot, and I trust there are no hard feelings there in the slightest. (For context: Chris is "the lead program manager for the web platform in IE" … and "NOT Chris Wilson the drummer for Good Charlotte." Recall that Chris also made the first truly <a href="http://blogs.msdn.com/ie/archive/2005/07/29/445242.aspx">action-packed CSS post</a> on ieblog – which Justin even goes so far as to link back to at the start of this post. Let’s hear it for conceptual continuity!)

While I’m pleased you’re adding more standards support for CSS and fixing some selector bugs, please don’t introduce more. That one is *invalid* and *MUST* fail. DO NOT "fix" it in IE7!!!

Pseudo-elements are only allowed to be appended to the end of the last simple selector in the chain, and *must not* be followed by any other attribute selectors, ID selectors, or pseudo-classes. This is true for both CSS2.1 and CSS3 Selectors.

And once again a great thank you to the IE team for the intelligent and interesting writing on what goes on under the hood, and for the insight into the way you guys work.

You Go Guys, thanks to you the web may soon become a much better world for web devs to live in, and much more interesting from the client side of the rendering engine. You rock.

> does this "test case" confuse and/or surprise anyone else?

It doesn’t, it’s just a test for heck’s sake, what’s simpler than checking if <span class="red yellow"> is as orange as it should be?

It’s a *test case*, it’s not supposed to be semantic or anything, it’s supposed to be… well… testing

> If these are strict-mode-only fixes, what happens if the user isn’t in strict-mode?

IE7 will behave as IE6-quirks I guess, since it allows the engine to not break non-standard websites and hacks for that behaviour.

> That is not polite to write it. Even if he is guilty for that bug, nobody else tried to fix it since then. So forget the past and work as a team.

You should turn on your Joke Parsing engine or something, because you obviously missed The Funny

> I wonder exactly what "strict" mode is.

Quirks and Strict modes were introduced in the world by IE5/Mac, fathered by Tantek Celik. The goal was to allow for high-level standard compliance (in strict mode) while not breaking the "legacy" behaviour of the IE4/NS4 ways (Quirks mode), it’s done via Doctype Switching. The right doctype will trigger Strict mode, a wrong one or none will trigger Quirks, and some browsers have an "almost strict" mode that doesn’t do much but keep image in block mode (images should be displayed inline, not as block, as default behaviour in strict)

> It seems better to make these fixes applied in all XHTML DTDs. (including "transitional")

no. What they chose is perfect, either you know enough as to have knowledge of Quirks/Strict and will realise what they mean and use them well, or you don’t and your page won’t get fubared by unexpected (although standards compliant) behaviour.

As a final note/question to the IE Team: will the XML prolog still trigger Quirks mode in MSIE or has that (fairly stupid, methinks) rule been dropped?

I really appreciate the posts by the IE team letting us know what changes are being made! I must say I am happy and agree with what’s being said, and I think IE7 will bring IE in general very close to, if not completely, up to date.

daybreaker…

All browsers render HTML in either Quirks or Standards mode.

Use this to detect which rendering mode your browser is rendering your page in.

If your page is not rendering in Standards Mode then you DEFINITELY have to use an HTML validator to help you clean up your code (because you DO directly edit your code and not use some visual editor right? ;-)).

Quirks and Standards mode have no relation to HTML transitional and strict.

HTML 4.01 Transitional

HTML 4.01 Strict

XHTML 1.0 Transitional

XHTML 1.0 Strict

XHTML 1.1

So in short if they are talking about strict (or transitional) mode, they are referring to a specific level of (x)html. If they are speaking about quirks or standards they are speaking about how the browser complies to compliant code or how it handles non-compliant code.

Non-compliant code is allowed to be subjectively rendered by the browser when in quirks mode. IE does a damn good job in this respect.

Ultimately you will want to at least aim for HTML 4.01 Strict or XHTML 1.0 Strict rendering in Standards mode.

I have to have faith that the IE team are going to all of the usual places to test the code (positioniseverything, quirksmode etc.).

If you have eliminated all of the major IE6 bugs and you get the same results as other browsers in the bug tests then I’m happy to know that the "* html" bug has been taken out.

As for CSS 2.1 support, bravo, I’m well happy about this. Web development just gets a whole lot quicker & cheaper when you can use pseudo-elements, multi-class selectors and such and it can also contribute to cutting down on html/css document weight reducing server load and making things cheaper again.

I do have one reservation though and it is this: How soon after IE7 is publicly released will it overtake IE6 as the dominant web leader? I’m sure I’m not the only one who would like to see it triumphed/marketed by Microsoft to get as many people as possible to upgrade in the least amount of time.

Here are some suggestions;

– Adverts, lots of adverts! (I’m amazed at how many adverts I still see for XP, despite it being released four years ago, transfer some of that marketing money over to IE7)

Will you be supporting the CSS3 convention of using two colons for pseudo-elements, not just the CSS2.1 single-colon version? E.g. ::before, ::after, ::first-line, etc. Other browsers already support this syntax, and it’s a very good habit for authors to get into, as it clearly distinguishes between pseudo-elements and pseudo-classes.
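For reference, the two notations look like this. The rules are a hypothetical sketch; the post does not say whether IE 7 accepts the double-colon form.

```css
/* CSS 2.1 notation: a single colon for pseudo-elements and
   pseudo-classes alike. */
p:first-line { font-variant: small-caps; }
a:hover      { text-decoration: underline; }

/* CSS3 Selectors notation: a double colon marks a pseudo-element;
   pseudo-classes keep the single colon. */
p::first-line { font-variant: small-caps; }
```

Keeping the two forms in separate rules is the safer choice: if a parser rejects one selector in a comma-separated group, the whole rule is ignored.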

Justin, a last question on these "bang-your-head-on-the-desk bugs" [Chris Wilson, July 29]:

a {display: block;}

p#nonexistantid a:first-line {color: yellow;}

p#u a:hover {color: red;}

<p id="u">

<a href="#">Can you tell me please<br /> why only the first line<br /> does react on hover?</a>

</p>

The day all these bugs are fixed and gone (and no new issues are constructed, see the warnings in comments by Ian Hickson and Lachlan Hunt), I don’t think I’ll look back in melancholy, but I must say that I got used to them, somewhat, because of the "hacks" that were /so/ surprisingly intuitive:

a:hover img {border: 1px solid fuchsia; }

/* uncomment to fix it */

/* a:hover {background-position: 0 0; }*/

<p><a href="#"><img src="#" alt="hover" /></a></p>

If IE joins the other browsers which actually allow for designing layouts that "would work ‘out of the box’", that would be a bright day.

More time left for serious things like gardening, less time lost on the absurdity of fixing browser bugs.

Guys get over the Chris Wilson comment. Chris already posted a comment. They are obviously friends and no foul was made.

Thanks for the update, it sounds like things are moving along nicely. I have no time or interest in beta testing Vista or IE7 but I am sure by the time the final version is released, IE will be a dream to work with.

The biggest question that I have for the IE development team is.. <b>WILL any policies be made regarding the release of code fixes and updates that will let you guys continue to improve the software and add updates more often than the operating system is updated?</b>

I am sure much is fixed and more will be fixed but there are certainly things we would like to see supported immediately after the IE7 release. CSS/W3C/Standards are not perfect, there are still many improvements that need to be made. So if we can have future standards supported it would make life easier the day after tomorrow.

Whilst I can’t see myself switching away from Firefox any time in the near to medium future, changes like this encourage me to think that IE7 will, at the very least, restore its reputation as a credible modern browser.

I still don’t see any justification for retaining this bug even in non-strict mode. Surely point is that the bug is only used to work around bugs which will in IE7 now not exist, so there should be no justification for retaining it in any mode, strict or otherwise.

Thanks for the info Justin. Removal of * html hack is the most correct thing to do. Anyone using this hack when better solution exists (cond. comments) deserves to waste some time to fix it. (Just as I did 1.5 year ago :)).

Okay, so can we have a 1px dotted line that really is a 1px dotted line now instead of a dashed one? Pretty please? 🙂

I’m really glad to hear about all these fixes. Sounds like you’re going to make a lot of people happy (and a lot of websites encourage people to upgrade to a version of IE that works as it should for a change!)

> Will you be supporting the CSS3 convention of using two colons for pseudo-elements

Lachlan, AFAIK IE supports this syntax just fine already. I have been using the double colon syntax for quite some time, well since I learned of it. I’m not aware of any bugs from the use of the double colon syntax. This working may be the result of a bug though, so it’d be great if the IE team could clarify this and say whether they intentionally have support for the double-colon syntax.

While not the most extensive, this is probably one of the clearest resources on the subject.

> Why?

> I still don’t see any justification for retaining this bug even in non-strict mode.

> Surely point is that the bug is only used to work around bugs which will in IE7 now not exist, so there should be no justification for retaining it in any mode, strict or otherwise.

If you know how to code, really, you will produce pages in strict mode. Always. You therefore will be aware of these backward-compatibility breaking changes (because they ARE breaking backward compatibility that Microsoft wants to maintain up to IE4/IE5’s behaviour). If you don’t, you may be relying on IE’s quirks without even knowing it, without trying to work around IE’s bugs but merely thinking that it’s a good way to code. And IE7 would break your website. And from everywhere on the web would the outbreak… well… break… that IE7 breaks T3h W3bz0r, which the IE team can’t afford to happen.

This can be understood, and their decision to be as strict as possible in "strict mode" while still being lax in Quirks is a Wise One, for this was exactly the reason why Tantek and the IE5/Mac team created Quirks and Strict modes 6 years ago.

> It would be most desirable to still have this behavior occur with the exception of having the XML declaration.

> As XHTML is a subset of XML the XML must (if you do so choose to use it) appear BEFORE the XHTML Doctype.

Even though I do agree with you, XHTML sent as text/html is not considered as XML, not even parsed as XML, it’s parsed as if it were HTML4.01 in whatever rendering mode you set. No more. No less.

I’d still appreciate it if IE stopped dropping to quirks when I use the XML prologue, but I understand the reason why it’s done that way (basically, it never makes sense to send XML prologues to MSIE in its current state, or in any state until it handles MIME types in a stricter way, and understands XHTML as application/xhtml+xml at all.)

> > I still don’t see any justification for retaining this bug even in non-strict mode.

> > Surely point is that the bug is only used to work around bugs which will in IE7 now not exist, so there should be no justification for retaining it in any mode, strict or otherwise.

Masklinn replied:

> If you know how to code, really, you will produce pages in strict mode. Always. You therefore will be aware of these backward-compatibility breaking changes (because they ARE breaking backward compatibility that Microsoft wants to maintain up to IE4/IE5’s behaviour). If you don’t, you may be relying on IE’s quirks without even knowing it, without trying to work around IE’s bugs but merely thinking that it’s a good way to code.

But we’re talking here about removal of the * html hack. You would only use that "If you know how to code, really". No bog-standard editor or lame HTMLer isn’t going to know anything about * html.

In other words, the only sites using * html are ones which are using bugs to work round IE6- problems, which should be fixed.

Alberto – the same stylesheet can apply to multiple pages. Although an id must be unique to a page it doesn’t have to be unique to a set of pages. You could then have a p with id="id" on one page, and (for example) a div with id="id" on another, yet want to distinguish them in the stylesheet.

> But we’re talking here about removal of the * html hack. You would only use that "If you know how to code, really". No bog-standard editor or lame HTMLer isn’t going to know anything about * html.

Problem is that this is a thing you think, not a thing you know, and this is a thing the IE7 dev team has no knowledge of. You can’t know if there ain’t some guy out there using * html for his styling.

So the IE team took the secure stance of not modifying anything (but for clear bug fixings that shouldn’t "break" pages) from IE6’s quirk to IE7’s quirk. They chose the safe path of not taking any risk on that matter, and I do understand: they can’t afford to have people saying that IE7 broke teh intarweb.

Because otherwise it might break a site that’s made using CSS and relies on the hack but isn’t set up to render in strict mode. It only makes sense to remove the hack if the reasons that required its employment are removed too, and they are *not* removed in quirks mode. Breaking backwards compliance is not something the new IE should do.

> Any example where a selector such as p#id would achieve something that in no other was was possible or that a mere #id would allow?

Please make sure you incorporate the CSS Attribute Selector because when it comes to creating a style for a checkbox or a text field its impossible without adding a class to each, because they both use the same HTML tag (INPUT).

But if we had the CSS Attribute Selector we could just specify a style for type=checkbox etc…
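The request above corresponds to the CSS 2.1 attribute selector. A minimal sketch, with hypothetical property values, and with no claim about IE 7 support since the post does not state it:

```css
/* Both controls use the <input> tag; the type attribute tells them apart. */
input[type="checkbox"] { margin-right: 0.5em; }
input[type="text"]     { border: 1px solid #999; padding: 2px; }

/* Without attribute selectors, each control needs its own class in the
   markup, e.g. <input type="text" class="text">. */
input.text { border: 1px solid #999; padding: 2px; }
```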

"Any example where a selector such as p#id would achieve something that in no other was was possible or that a mere #id would allow?"

The multi-page explanation someone else gave is good (and might also apply to dynamic pages where different elements might exist at different times with the same ID for whatever reason), but what came to my mind first was that p#id is more specific than #id. In a big, complex style sheet, you need every tool you can get your hands on to assert control over rule application.
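Both arguments can be seen in a short sketch. The id `note` is hypothetical; specificity is written as (ids, classes, elements).

```css
#note    { color: black; }   /* (1,0,0) */

/* (1,0,1): beats the bare #note rule for <p id="note">, regardless of
   where the two rules appear in the stylesheet. */
p#note   { color: navy; }

/* (1,0,1): applies only on pages where id="note" sits on a <div>, so
   one shared stylesheet can style the two cases differently. */
div#note { color: maroon; }
```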

Removal of the * hack is spot on. With all known major CSS glitches removed, IE7 will instantly use the standard CSS instead of the hacked styles. If you have used * html carefully to only fix IE CSS bugs, this is a win/win situation. Conditional comments are the proper way to address specific IE’s.

I have missed news on DOM standard support, will we have addEventListener?

Removing the * html thingy is a good thing because it immediately makes code just work for versions prior to IE7.

So if you used * html {} for IE only code, to fix its non-compliant behaviour OR if you used html>body {} to overwrite IE6-tailored code for compliant browsers you should be fine, ‘cuz IE7 will render as FF does now AND take over the same rules.

This actually saves me from waking up at night having bad dreams about the launch of IE 7.

"… it wasn’t supposed to work according to the standards. We only do this in strict mode, … CSS 2.1 compliant … a page that will lay out as designed by the author. There are a few issues with this when we differ in our interpretation of the spec from other browser implementations but most of the time these are minor and have easy workarounds."

This passage bewilders me and seems confusing. You claim to be CSS 2.1 compliant, rendering pages as "designed by the author", and then you go on to say that there are cases where you interpret the specification differently than other CSS 2.1 compliant browsers?

Was the sentence just badly written with regards to logic, or for the malignant heck of not being CSS 2.1 compliant no matter what ?

The internal joke about Chris being "on top in the project" and not coding was much easier to get …