Exploring HTML Entities Again

Right now I am knee deep in HTML 5, CSS 3 and all related topics to support the new edition of HTML: The Complete Reference. Usually I don’t share too much about the sausage making that goes on during one of these massive efforts, but after hearing a few comments from students I figure maybe I should share a bit more as I go along this time. So every once in a while I will post a sneak peek, and here is one: entities.

So we should assume case sensitivity, but what is going on? Which entities are case sensitive and which aren’t? Roughly, the pre-HTML 4 entities are likely not case sensitive while the post-HTML 4 entities are. HTML 5 tries to formalize this whole mess and documents that &AMP; &COPY; &LT; &GT; &QUOT; &REG; and &TRADE; can be written in either case, but nothing else can. Save for the trademark, these are, as I said, roughly the pre-HTML 4 named entities, and even the trademark could be argued since it used to sit inappropriately in a charset no-man’s land at character codes 127-159.
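For the curious, the HTML 5 list can be poked at from Python, whose standard library ships the HTML 5 named character reference table as html.entities.html5. What follows is only a sketch: the helper name has_uppercase_twin is made up for illustration, and the table reflects the published HTML 5 list, not what any particular browser actually does.

```python
from html.entities import html5  # maps names such as "amp;" to their characters


# Hypothetical helper (not part of any library): does the all-uppercase
# spelling of an entity name exist in the HTML 5 table and stand for the
# same character as the all-lowercase spelling?
def has_uppercase_twin(name: str) -> bool:
    lower = html5.get(name.lower() + ";")
    upper = html5.get(name.upper() + ";")
    return lower is not None and lower == upper


# The seven names from the paragraph above, plus two that should fail.
for name in ["amp", "copy", "lt", "gt", "quot", "reg", "trade", "eacute", "nbsp"]:
    verdict = "either case" if has_uppercase_twin(name) else "exact case only"
    print(f"&{name};", verdict)
```

Comparing the mapped characters and not just the names matters, because a pair like &ETH; and &eth; both exist but stand for different letters.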

Victory – a small detail knocked down for HTML 5, and an important syntax point that, judging by a quick perusal of HTML books (mine included), has gone completely unnoticed for a decade.

Entity Parse Problems

So when we mess up markup, like forgetting a tag or not closing a quote, we see the browser parser “fixing” things for us, albeit sometimes wrongly. Well, it turns out that this also holds for entities. For example, given this &QUOTE; entity, which is clearly a typo for &QUOT;, it will render in one browser as it gets fixed and not in most others. No bonus points for guessing – yes, IE blows it.

IE fixing entities for me?

Interestingly, they may actually be trying to be somewhat correct in their automatic insertion of the trailing ; in the entity. If you read the specification there is some suggestion that problems be rectified. However, how to fix an error in a predictable and consistent manner, well, that isn’t agreed upon. [Take a peek at the HTML 5 spec chaos for more info.] On that note, a very big change with HTML 5 will actually be to indicate what should happen in case of syntax errors. I guess if you can’t beat syntax into the heads of the masses you might as well codify how their nasty “tag soup” should taste.
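To get a feel for what codified error handling looks like, here is a small sketch using Python's html.unescape, which according to its documentation follows the HTML 5 rules for both valid and invalid character references. The sample strings and the decode_all helper are my own inventions for illustration, and the output says what this library does with text content, not what any browser of the day did.

```python
from html import unescape

# Made-up samples: a valid entity, the typo from above, two references
# missing their semicolon, and a name that does not exist at all.
SAMPLES = ["&QUOT;", "&QUOTE;", "&quot", "&copy 2009", "&bogus;"]


# Hypothetical helper: decode each sample and keep the original alongside.
def decode_all(samples):
    return {text: unescape(text) for text in samples}


for original, decoded in decode_all(SAMPLES).items():
    print(repr(original), "becomes", repr(decoded))
```

Under these rules &QUOTE; is read as &QUOT with the semicolon assumed, so it comes back as a quote mark followed by the leftover E; text, while the unknown &bogus; is simply left spelled out.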

Explorer Fully Gets Entities-Finally!

A little-known detail that has eluded many Web developers is that up until IE8 many entities actually didn’t work. We learned this in the first edition of the HTML book over 10 years ago when we actually bothered to test the entities rather than just copying and pasting the chart onto our site or into our book. Every edition was the same: a few changes here and there, but a bunch of nasty boxes in place of the appropriate symbols under the most popular browser. Well, it is time to report that things are a bit better now.
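If you want to run that kind of test yourself rather than trust a chart, a page can be generated instead of typed. This is a rough sketch, not the harness used for the book: it assumes Python's html.entities.name2codepoint table as the source of HTML 4 entity names, and the function names and page layout are made up for illustration.

```python
from html import escape
from html.entities import name2codepoint  # HTML 4 entity name -> code point


# Hypothetical helper: one table row per entity showing the escaped source
# text, the named form for the browser to render, and the numeric form.
def entity_test_rows():
    rows = []
    for name, codepoint in sorted(name2codepoint.items(), key=lambda kv: kv[1]):
        named = f"&{name};"
        numeric = f"&#{codepoint};"
        rows.append(
            f"<tr><td>{escape(named)}</td><td>{named}</td><td>{numeric}</td></tr>"
        )
    return rows


# Hypothetical helper: wrap the rows in a minimal page to open in a browser.
def entity_test_page():
    body = "\n".join(entity_test_rows())
    return (
        "<!DOCTYPE html>\n<meta charset='utf-8'>\n<title>Entity test</title>\n"
        f"<table>\n{body}\n</table>\n"
    )


if __name__ == "__main__":
    with open("entity-test.html", "w", encoding="utf-8") as handle:
        handle.write(entity_test_page())
```

Opening the resulting file in each browser and comparing the second and third columns shows where the named form falls over while the numeric form still renders.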

Finally Internet Explorer supports all the common entities

It Ain’t Over Yet

Year after year I seem to meet Web developers and students who think that just over the hill the green grass and blue skies of standards land await. Well, kind readers, such optimistic thinking may help you sleep at night, but after doing this for quite some time it is clear to me that even with specs there are simply implementation mistakes and, well, market forces that will make this unlikely any time soon, if ever. Need some proof? Well, even in entities, which are much improved in browsers, lots of little quirks exist, especially when we consider that Unicode is and has been here.

How about a Unicode entity? Do those work? Maybe, sort of, well, not really: you have to go numeric, otherwise the entity just gets spelled out as literal text.
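Going numeric is at least mechanical. As a sketch, and assuming the snowman character U+2603 purely as an example, the decimal and hexadecimal references can be built from the code point. The helper name numeric_refs is hypothetical, and &snowman; is a name I made up to show what happens when no named entity exists.

```python
from html import unescape


# Hypothetical helper: build the decimal and hexadecimal numeric character
# references for a single character.
def numeric_refs(ch: str) -> tuple:
    code = ord(ch)
    return f"&#{code};", f"&#x{code:X};"


snowman = "\u2603"  # the snowman dingbat
decimal_ref, hex_ref = numeric_refs(snowman)
print(decimal_ref, hex_ref)

# Both numeric forms decode back to the same character.
print(unescape(decimal_ref) == snowman, unescape(hex_ref) == snowman)

# A made-up named form is not in the HTML 5 table, so it is left as typed.
print(unescape("&snowman;"))
```

Whether the decoded character then shows up as a snowman or a box still depends on the fonts and the browser, which is a separate problem from the reference syntax.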

The fun continues with entities in Unicode!

Even then once you insert the entities you might wonder what they are going to look like. Consider the friendly snowman dingbat here in a variety of browsers.

The Unicode snowman revealing the differences in browser entities

Happy or sad, with hat or not, buttons or snow, our little dingbat shows that making little details the same across all browsers is about as likely as building a June snowman in San Diego. Not to spoil the ending of the book, but every chapter and appendix keeps showing that the more things change, the more they really do stay the same.

This entry was posted on Tuesday, June 2nd, 2009 at 3:24 pm and is filed under Technical.