This afternoon I was asked by a friend what I would recommend to an old designer who wants to learn more about web standards, CSS, XML, and XHTML.

This is a perfect example of when an email response is better posted here for a wider audience (and Google). So here’s my answer: this is a comprehensive, informal, and somewhat long-winded roadmap for anyone who has heard about web standards, thinks they might want web standards, but doesn’t know where to start.

Stop! Before you do anything, the most important thing you can do for your learning process is accept that a) it’s going to take time, and b) you will be frustrated along the way.

But you’re not alone. Plenty of us who have taken the plunge into standards went through the same thing, and there is a growing body of work devoted to making your life easier. The old-timers had to figure out all the tricks and techniques we now take for granted the hard way; lucky folks who came in later (myself included) can benefit from their sweat and tears.

In the end, when your skill with standards-based design eclipses your skill with old-school table-based methods, you’ll look back and marvel at how much more sense it makes to lay out a page with CSS. Oh sure, the actual methods used might not make much sense (considering there are better options in the CSS2 spec than what we currently use), but we have a major browser manufacturer and its lack of support to blame for that.

Uh, I seem to have wandered. Sorry, that’s another thing you’ll have to get used to: unbridled hostility against Internet Explorer. It’ll take some experience to understand why, but you’ll know exactly what we mean in short order.

So. On to the practical information you can start using immediately. First, run and buy Designing With Web Standards. Don’t even think about it, just do it. Got it? Good, now don’t let it collect dust — read it. Everything I’m about to cover is detailed in DWWS. It’s equal parts manifesto (WHY would you want to do this?) and tutorial (HOW do you do this?). You need it.

Now. The first thing to do is get into an XHTML mindset. Whether you choose HTML 4.01 or XHTML 1.0 Strict (there are reasons to go with either; ignore them for now, and then ignore them even more, until you’re ready for some mind-numbing drudgery), it all begins with a DOCTYPE. Telling a browser which language your document is marked up in isn’t only a good idea; it prevents unwanted rendering glitches that will otherwise drive you crazy, because without a proper DOCTYPE most current browsers fall back to a “quirks mode” that emulates the bugs of their predecessors. Think of it like this: if I want to fly to Chicago, I have to tell my travel agent of choice where I want to go. Sure, it might be fun to take my chances and see where I end up, but it’s not exactly practical. Setting a DOCTYPE ensures I get to my destination of choice without unintended side-trips to Vienna.
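To make the travel-agent analogy a little more concrete, here’s a rough Python sketch of the “doctype switching” that browsers of this era perform. The two DOCTYPE strings are the standard HTML 4.01 Strict and XHTML 1.0 Strict declarations; guess_rendering_mode and its rules are a deliberate simplification assumed for illustration, since every browser keeps its own lookup table.

```python
import re

# Standard Strict DOCTYPE declarations for the two languages discussed above.
HTML401_STRICT = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
                  '"http://www.w3.org/TR/html4/strict.dtd">')
XHTML10_STRICT = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
                  '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')

DOCTYPE_RE = re.compile(r"<!DOCTYPE\s[^>]*>", re.IGNORECASE)


def guess_rendering_mode(document):
    """Hypothetical, heavily simplified model of doctype switching."""
    match = DOCTYPE_RE.match(document.lstrip())
    if match is None:
        # No DOCTYPE, or something (such as an <?xml ...?> declaration)
        # comes before it: IE6 renders in quirks mode in both cases.
        return "quirks"
    doctype = match.group(0)
    if "Transitional" in doctype and "http://" not in doctype:
        # A Transitional DOCTYPE without its system identifier URL.
        return "quirks"
    return "standards"


print(guess_rendering_mode("<html><body>Hello</body></html>"))  # quirks
print(guess_rendering_mode(XHTML10_STRICT + "\n<html></html>"))  # standards
```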

Next goal: well-formed markup. This is pretty easy to grasp, actually. Always quote your attributes (e.g. <a href="link"> and <input type="text" />). Don’t improperly nest tags; close them in the reverse of the order you opened them, so the last one opened is the first one closed. And if you open a tag, close it again; every tag, or element, requires both an opening and a closing counterpart.
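Closing the most recently opened element first is exactly what a stack enforces. Here’s a minimal Python sketch using the standard library’s html.parser; NestingChecker and check_nesting are hypothetical names, and the sketch checks nesting only (following the XHTML rule, a bare <br> counts as never closed), not validity against a DTD.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Hypothetical checker: open elements go on a stack, last opened closes first."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # <br /> style tags open and close in one step

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()  # correctly closes the most recently opened element
        else:
            expected = f"</{self.stack[-1]}>" if self.stack else "no end tag"
            self.errors.append(f"found </{tag}>, expected {expected}")


def check_nesting(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    # Anything still on the stack was opened but never closed.
    unclosed = [f"<{tag}> never closed" for tag in reversed(checker.stack)]
    return checker.errors + unclosed


print(check_nesting("<p><em>nested properly</em></p>"))  # []
print(check_nesting("<p><em>nested badly</p></em>"))
# ['found </p>, expected </em>', '<p> never closed']
```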

A quick diversion here: somewhere along the way, tags became ‘elements’. Same syntax, different theory. Call them what you will, the proper label is now element; maybe it always has been. I don’t know. No one ever explained this to me.

So anyway, each element needs to be closed properly. If you’re using HTML 4.01, this doesn’t include stand-alones like <br>, <hr>, and <input>. With XHTML, it does. There’s a hack-ish but practical syntax that degrades gracefully in older browsers: just add a space and a slash to the end. <br> becomes <br />, for example.

Now there’s one more little confusing thing about XHTML attributes: they always need to have a value, even when that value doesn’t make sense. Example: <input type="radio" checked="checked" />. You can get away with just checked in HTML 4.01, but XHTML needs the redundancy.

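Finally, XHTML requires all your element and attribute names to be in lowercase. HTML doesn’t differentiate, but XHTML uses XML syntax, which is case-sensitive, and the XHTML DTDs define every element and attribute name in lowercase. So <P> and <p> are different elements as far as XHTML is concerned, and only <p> is valid.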
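The rules above (lowercase names, quoted values, no minimized attributes, a closing slash on empty elements) are mechanical enough to sketch in Python. to_xhtml_syntax is a hypothetical helper, not a real conversion tool: it rewrites tags only, leaves nesting mistakes alone, and assumes the HTML 4.01 Strict list of empty elements.

```python
from html import escape
from html.parser import HTMLParser

# Elements declared EMPTY in HTML 4.01 Strict; XHTML writes them as <br />.
EMPTY_ELEMENTS = {"area", "base", "br", "col", "hr", "img", "input",
                  "link", "meta", "param"}


class XhtmlSyntax(HTMLParser):
    """Hypothetical sketch: re-emits tags with XHTML syntax rules applied."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _tag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names for us.
        parts = []
        for name, value in attrs:
            if value is None:
                value = name  # minimized attribute: checked -> checked="checked"
            parts.append(f' {name}="{escape(value)}"')  # always quote values
        slash = " /" if tag in EMPTY_ELEMENTS else ""
        return f"<{tag}{''.join(parts)}{slash}>"

    def handle_starttag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._tag(tag, attrs))
        if tag not in EMPTY_ELEMENTS:
            self.out.append(f"</{tag}>")  # e.g. <p/> becomes <p></p>

    def handle_endtag(self, tag):
        if tag not in EMPTY_ELEMENTS:  # drop stray end tags such as </br>
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def to_xhtml_syntax(markup):
    parser = XhtmlSyntax()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(to_xhtml_syntax("<INPUT TYPE=radio CHECKED><BR>"))
# <input type="radio" checked="checked" /><br />
```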

That’s it for markup! You’re done! Take a breather, grab a cerveza, and chill out for the afternoon. Because that’s only step number one.

Now that we’ve got you writing proper HTML/XHTML, try running it through the W3C’s validator. If you’ve crossed your i’s and dotted your t’s (or in this case, quoted your attributes and nested your elements) you’ll achieve a nice yellow-on-blue success message. Learn to love this colour/text combination; it can be your best friend.

Why is validating important, or even relevant? Because poorly-written markup is completely unpredictable; you end up relying on a browser’s error-handling, and even though most browsers are very good about this, it’s a bad practice to depend on it. Hey, it’s what got us into the non-standard, proprietary browser wars in the first place. (Microsoft was able to compete with Netscape in the Netscape-dominated world of 1995 because IE was built to handle errors exactly the same way as Netscape.)

The point is, validating helps you spot errors in your code, which in turn ensures more consistent rendering of your page. The very first technique I try when debugging problematic layouts is validating my code. You should too.

Yeah, okay, it’s tough when you validate your first site and get back a list of 78 arcane errors. Unfortunately, though the validator helps, it’s not perfect. It’s run by volunteers who are doing their best, but sometimes the error messages are as helpful as Vogon poetry. The good news is that problems cascade: find one missing </p> tag, for example, and a lot of the time the count drops to 24 errors without your doing anything else. The short of it: it looks bad, but it often isn’t.
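You can watch the cascade happen with a plain well-formedness check. This Python sketch uses xml.etree.ElementTree from the standard library; keep in mind that it only checks XML well-formedness, a smaller job than the W3C validator’s check against the DTD, and the sample markup is made up.

```python
import xml.etree.ElementTree as ET


def first_wellformedness_error(markup):
    """Hypothetical helper: (line, column, message) for the first XML error, or None."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


sample = "<div>\n<p>one\n<p>two</p>\n</div>"
# The parser reports a mismatched tag on line 4 (at </div>), although the
# actual mistake is the missing </p> on line 2.
print(first_wellformedness_error(sample))
```

The lesson carries over to the validator: fix the first reported error, look a few lines above it, and re-run before worrying about the rest.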

Now that you’re validating, you’re following the letter of the law. But as with many real-world laws, at this point you’re adhering to only the rigid guidelines of the rules and missing the bigger picture about why those guidelines are there in the first place.

The next step is to take this well-formed markup structure you’ve built for your document, strip out the presentational attributes that have been deprecated in some of the more recent DOCTYPEs, and move the presentation into a completely separate file. This is the infamous separation of structure and presentation, and this is where CSS comes into the picture.

It’s like this: your text is content. Content is nice, but without any hints about the content’s structure (which includes things like spaces and headers and lists) you end up with a jumbled mess of text. Completely unusable. Structure is an extra layer which breaks down that messy text into logical groupings and organizes them in a way that conveys extra information about individual elements in that document. How that information looks may be implied by the structure (for example, you’ll most often find that a primary page heading will be larger than the body text) but it doesn’t dictate it any further.

That’s where presentation comes in. Presentation is the formatting cue that tells the primary page header to be red, italicized, and 150% of the body copy’s size. Presentation is an extra layer of information on top of a document’s structure that builds up the (non-visual) structure into something far more appealing to the eye. CSS is the presentational layer, and it can take a very simply marked-up document and turn it into something amazing; view the CSS Zen Garden for a live demonstration of this in action.

So what’s the best way to start separating your presentation out of your structure? Consider any HTML element or attribute that offers a visual cue to be an offending piece of legacy code. It’s time to start killing those bgcolors and <center> tags. Here’s a pop quiz:

In each of the following examples, which attributes and tags would you remove to eliminate all traces of presentational markup?

Got your answers ready? Great, compare to the list below. These are proper structural elements without a trace of presentational formatting:

<h1>This is my first web site.</h1>

<table>

<body>

<td><p>They're coming to take me away...</p></td>

That’s it? That’s it.
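Part of that cleanup can be automated once you know what to look for. Here’s a minimal Python sketch built on html.parser; the attribute and element lists are hypothetical and far from complete (width, for one, is still perfectly legal on <img>, so a real tool would need per-element rules), and the sample input is an invented example of the sort of markup the quiz is aimed at.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical, non-exhaustive lists of presentational markup to remove.
PRESENTATIONAL_ATTRS = {"align", "valign", "bgcolor", "background", "border",
                        "cellpadding", "cellspacing", "width", "height",
                        "face", "color", "size", "text", "link", "vlink", "alink"}
PRESENTATIONAL_TAGS = {"font", "center"}  # drop the tags, keep their contents


class PresentationStripper(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _emit(self, tag, attrs, self_closing=False):
        if tag in PRESENTATIONAL_TAGS:
            return
        kept = "".join(f' {name}="{escape(value if value is not None else name)}"'
                       for name, value in attrs
                       if name not in PRESENTATIONAL_ATTRS)
        self.out.append(f"<{tag}{kept}{' /' if self_closing else ''}>")

    def handle_starttag(self, tag, attrs):
        self._emit(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        if tag not in PRESENTATIONAL_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def strip_presentation(markup):
    stripper = PresentationStripper()
    stripper.feed(markup)
    stripper.close()
    return "".join(stripper.out)


# Invented input of the kind the quiz targets:
print(strip_presentation('<td align="center" bgcolor="#cccccc"><p>'
                         '<font face="Arial">They\'re coming to take me away...'
                         '</font></p></td>'))
# <td><p>They're coming to take me away...</p></td>
```

Whatever the script removes still has to be recreated in your style sheet, of course; the point is only that the markup itself stops carrying it.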

While the general principle isn’t spelled out in any spec, a further goal of this separation is to use the proper elements for the job. Using a table to lay out a page is then, by this definition, an improper use of a table (the HTML 4.01 spec itself says tables should not be used purely as a means to lay out document content). In the above example, it might have even been prudent to go further and remove the <table> and <td> elements, but it’s hard to say without surrounding context. Tables aren’t deprecated; they can still be quite useful. But they should be used properly: to contain data that’s structurally tabular.

So we’ve stripped the formatting from our page. Hooray. What now? Those are some ugly elements, all Times-New-Romaned and linear. Where’s the interest? Where are the compelling visuals we were promised?

Head back to the Zen Garden example. See the lovely designs? See how different they are? The key here is that underneath each is the same XHTML, just as bland as your currently-unformatted document. No really, it is.

Having a bland and ugly base is a good thing, in fact. What you may have noticed is that this unformatted HTML looks an awful lot like the web of 1994 (if you were around back then). In fact, with a few notable exceptions, that’s what it is. This stuff is as old as the web itself: <h2>s have been around since the days of Mosaic, which, as you may have guessed by now, will still handle a properly-structured page with surprising fidelity. Try saying that about a late-’90s table-fest.

The benefits don’t end there of course. Accessibility to those with special needs is almost free, search engine optimization is built-in, bandwidth costs go way down (along with development times), and on and on and on. Jeffrey Veen wrote up the Business Value of Web Standards late last year, and Roger Johansson expanded on the benefits and techniques of standards-based design with his recent Developing With Web Standards.

CSS is well-supported across all the major browsers today, and there are countless resources for learning the syntax, the basics of CSS layout, and the advanced theory behind high-impact CSS-based design. I’ll point you to a few of the better ones: WestCiv offers an ongoing, free CSS course that will help you get started and bring you up to speed in a hurry. Andrew Fernandez has indexed a huge listing of CSS resources that should help you whatever your skill level. Eric Meyer has written a bunch of books that you should have on your desk, including the project-based Eric Meyer on CSS and its follow-up More Eric Meyer on CSS (no, really, that’s what it’s called). His O’Reilly-published reference CSS: The Definitive Guide is in its second edition and belongs right next to them. Also check out Molly Holzschlag’s CSS: The Designer’s Edge, and Chris Schmitt’s Designing CSS Web Pages.

Going into the ins and outs of applying CSS and building layouts would take many times the space I’ve already used, not to mention some financial motivation. We’ll cut it short here, considering it’s taken a Herculean effort on your part to even get this far…

Getting plugged in is probably the single biggest piece of advice I can give anyone looking to get a start with web standards. Through ongoing reading and sharing of what you know, we all grow as a community. There are many of us active in the development community, and there are bound to be many more coming on board over the next few years. We have a global communication network at our disposal; make sure to use it.

And if nothing else, we’re here to commiserate when you hit the wall. Hey, it happens to all of us.

Reader Comments

I’ve been designing with XHTML & CSS for about half a year now. Everyone told me XHTML and CSS were so great I would never go back to tables. I never believed them because tables were great, but now I love CSS and haven’t used a table for layout in months. Once past the frustrating first months, it gets real easy.

Great article. I’d just add a couple more resources that have been invaluable in getting up to speed and staying current with all of the recent developments.

1. Get yourself a good RSS aggregator (FeedDemon, Shrook, etc.) and start subscribing to all of the various blogs you’re likely to encounter during your research. I’ve found this to be the single best way to stay current with the many developments that are taking place in the standards community.

2. Take a look at O’Reilly and Associates’ Safari service. Many of the books mentioned in Dave’s article are available online. It’s a great way to affordably sample a large number of books on the various technologies you’ll be researching.

I must say, this is a well-written article, and it contains lots of info. That is exactly what I was looking for. I too want to take the plunge into web standards design, so this was definitely a worthwhile and helpful read. Thank you very much.

“A quick diversion here: somewhere along the way, tags became ‘elements’. Same syntax, different theory. Call them what you will, the proper label is now element; maybe it always has been. I don’t know. No one ever explained this to me.”

…am i the only one who is concerned about this? the proper way to refer to things just seems to change every now and then, and even though no one seems to know why, or even how, they are quick to make sure everyone else does it.

there almost seems to be a subtle strain leaning towards groupthink in the standards community which could prove to be a liability in the future, not to mention simply unfortunate for all the reasons that groupthink anywhere is unfortunate :)

It seems to me that the proper terms are coming into more widespread use as people actually start paying attention to the specifications. The past few years seem to have started a shift away from the “bung a load of code together, and if it works in Both Browsers™, sell it” attitude.

Thanks Dave, I’ve been trying to explain this sort of thing to people for ages, and have always made a bit of a mess of it. Together with the Zen Garden, which has always been a fantastic resource for showing what can be done with CSS, this has just made my life a whole lot easier!

The tags are the actual delimiters; it’s rarely useful to talk about them, people usually mean “element” or “element type” when they talk about “tags”.

Think of the relationship between “element type” and “element” as being the same as the relationship between “class” and “instance”, with “tags” being the equivalent of curly braces.
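The distinction shows up nicely in code. In this Python sketch, a toy regular expression (good enough only for simple markup like this hypothetical snippet) finds the tags, while xml.etree.ElementTree builds the elements those tags delimit.

```python
import re
import xml.etree.ElementTree as ET

snippet = "<p>Read <em>this</em> first.</p>"

# Tags: the literal delimiters in the source text (start and end tags).
tags = re.findall(r"</?[a-z][a-z0-9]*[^>]*>", snippet)

# Elements: the nodes the parser builds; each one spans a start tag,
# its content, and the matching end tag.
root = ET.fromstring(snippet)
elements = [element.tag for element in root.iter()]

print(tags)      # ['<p>', '<em>', '</em>', '</p>']
print(elements)  # ['p', 'em']
```

Four tags, two elements: “p” and “em” are the element types, and each parsed node is one instance of its type.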

XHTML attribute redundancy:

I believe you need things like ‘checked="checked"’ because in HTML, a bare ‘checked’ is actually the attribute value, not the attribute name. It would make more sense to have something like ‘checked="true"’, but that would break HTML compatibility.

In HTML some elements (in some circumstances) may omit their closing tags.

In X(HT)ML, all elements must have closing tags, though empty elements can be abbreviated with “short” tags: <p></p> is equivalent to <p />. Note that this abbreviation is, strictly speaking, not legal HTML. In HTML, the closing tag is often optional (and hence can just be omitted), but it can’t be abbreviated as above.
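An XML parser makes the equivalence easy to confirm: both spellings produce the same element. A quick Python check with xml.etree.ElementTree, which also happens to write empty elements back out in the short form:

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring("<p></p>")
short_form = ET.fromstring("<p />")

# Same tag, no attributes, no children: the parser can't tell the two apart.
print(long_form.tag, short_form.tag)                       # p p
print(ET.tostring(long_form) == ET.tostring(short_form))   # True
```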

“Whether you choose HTML 4.01 or XHTML 1.0 Strict (there are reasons to go with either; ignore them for now, and then ignore them even more, until you’re ready for some mind-numbing drudgery)”

Aside from the bit about short tags, your advice holds equally well for both HTML 4 and XHTML 1.0. But once you tell people to write <img /> instead of simply <img>, you’d better tell them to slap an XHTML DOCTYPE on that puppy.

The first is a clear, concise introductory explanation of the box model for CSS formatting rules. The CSS and XHTML specs do a woeful job here.

The second is a simple introduction to where the main gotchas are with default property values. For example, which element types are block-level and which are inline by default? Which elements, by default, need to be contained within a <p> or <div> tag to be standards-compliant without changing the display property in the CSS? Which elements have default values you must reset in the CSS in order to get the desired result?

These two things created issues that tripped me up the most when switching to standards compliance.
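The first of those, the box model, mostly comes down to arithmetic. A minimal Python sketch, assuming equal values on the left and right sides: in the CSS2 model, width sets the content area and padding and border are added outside it; the quirks branch models IE5.x on Windows (and IE6 in quirks mode), where width already includes padding and border. box_widths is a hypothetical helper.

```python
def box_widths(width, padding, border, margin=0, quirks=False):
    """Hypothetical helper: (content width, visible box width, total space taken)."""
    if quirks:
        # IE5.x model: the declared width already includes padding and border.
        content = width - 2 * (padding + border)
        visible = width
    else:
        # CSS2 (W3C) model: padding and border are added outside the width.
        content = width
        visible = width + 2 * (padding + border)
    return content, visible, visible + 2 * margin


# A hypothetical rule: width: 300px; padding: 10px; border: 5px solid; margin: 20px
print(box_widths(300, 10, 5, 20))               # (300, 330, 370)
print(box_widths(300, 10, 5, 20, quirks=True))  # (270, 300, 340)
```

That 30-pixel discrepancy is why a layout that fits perfectly in one browser can break in another.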

It should be noted that as of right now, XHTML should be avoided and you should stick to HTML 4.01 Strict unless you have a real need for XHTML (like MathML). XHTML doesn’t provide any advantages over HTML (as far as I’ve seen) besides the aforementioned ability to include other XML namespaces in it.

In addition, XHTML should be sent as application/xhtml+xml, which Everyone’s Favorite Browser chokes on.

Re: XHTML vs HTML. According to http://www.w3.org/TR/xhtml-media-types/ XHTML 1.0 MAY be served as text/html, XHTML 1.1 SHOULD NOT, unless you have a very good reason. It isn’t MUST NOT, which means something quite different.

I serve my XHTML 1.1 as text/html. My very good reason is that Everybody’s Favourite Browser would otherwise choke on it :-)

I prefer XHTML 1.1 as it is the most strict flavour - which helps to keep me on the straight and narrow! When I was converting to a CSS layout, using XHTML 1.1 helped me pick out most of the layout stuff in the HTML so I could put it where it belonged - in the style sheet.

“I purposely glossed over the MIME type issue because it’s completely moot in 2004. This isn’t the time or place to argue the merits of HTML over XHTML.”

It’s not moot, it’s simply irrelevant to what *most* people want to do with (X)HTML. Those who care about the *technical* benefits of using XHTML care about MIME-type issues.

For the other 99% of the web-authoring population, MIME-types are irrelevant, and the only reason to prefer XHTML over HTML is that the former has a simpler syntax, presumably making it easier to learn to author *correctly*.

thanks for this. I was about to write a similar article with a slightly different audience in mind. In fact, reading your article, I think it’s suitable for both the table-jockey and the web virgin, so I probably won’t bother! I’ll paste the foreword here though, so you can see what I had in mind. Please don’t interpret any of this as a dig at your fantastic roadmap, which I hadn’t read when I wrote this! I’d be interested in any comments.

Preface.

Imagine this is a school textbook. If you’re the kind of person who didn’t read prefaces in textbooks, don’t read this.

This article is written for an audience free of the emotional baggage of table-jockeys and the histrionics of gurus; a virginal audience that just wants to learn how to design modern web sites. This is me, one year ago.

It neglects to mention that there was ever a way of doing things other than the accessible, usable, semantically-meaningful, valid way striven for today. There are no snide references to broken browsers (except when the extant version is still broken). It doesn’t impeach the reader for sins they never committed.

One year ago, all these things elicited from me was, “Huh?”

It avoids these history lessons to provide a well-marked path of least resistance to modern web design for the untainted but computer-literate. If you initially take this path but wander off into the pretty woodland and dangerous swamps then your mileage may vary. But if, like me, you get off on acquiring useless knowledge then you may have more fun.

For me, one year ago, the many references to not doing it like we used to, or to educating clients stuck in 1997, or to doing the Right Thing were strictly *unnecessary*. It is *possible* to teach the Only Thing.

As an example, mentioning that tables were ever used for layout to someone who didn’t previously know this serves three purposes, one good, and two bad:

- It’s a useful counter-example when explaining semantic design
- It buries a seed that table layout is quick and easy (which is the oft-cited response of the Platonic antagonist the writer is attempting to convert).
- It takes up time that is initially better spent learning the basics.

I’m not suggesting the bad old days should be forgotten or hidden. Security through obscurity does not work. I’m not saying there isn’t a very large audience that needs and wants to be *re*-educated.

I’m merely trying to provide a quick and easy route to our way of doing things for web virgins, lest they do it another way, or perhaps even worse, they don’t bother at all.

So, let me expand on my other point: XHTML is easier for people to learn than HTML.

Consider the following three elements:
a) <img>
b) <img></img>
c) <img />
Which is valid HTML? Which is valid XHTML? And how can you remember the answer?

In XHTML, the rule is simple: every element has a start tag and an end tag. Empty elements can be abbreviated with short tags. So b) and c) are valid XHTML, but a) is invalid.

In HTML, some elements have mandatory start and end tags. Some elements (in some circumstances) have optional start/end tags. For some elements, the end tag is actually *forbidden*.

The image tag is one of the latter. So a) is valid HTML, and b) is invalid. [c) is never valid HTML. That one, at least, is easy.]

I’ve made this sound a bit worse than it actually is. The elements for which the end tag is FORBIDDEN are, to the best of my knowledge, just those elements declared EMPTY in the HTML Spec.
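That rule of thumb is small enough to write down. A Python sketch, assuming the HTML 4.01 Strict list of EMPTY elements; valid_forms is a hypothetical helper, and the XHTML ordering follows the XHTML 1.0 compatibility guidelines (Appendix C), which recommend <br /> over <br></br> for documents meant to work in HTML browsers.

```python
# Elements declared EMPTY in HTML 4.01 Strict (end tag forbidden in HTML).
HTML401_EMPTY = {"area", "base", "br", "col", "hr", "img", "input",
                 "link", "meta", "param"}


def valid_forms(tag):
    """Hypothetical helper: valid spellings of an EMPTY element in each language."""
    if tag not in HTML401_EMPTY:
        raise ValueError(f"<{tag}> is not an EMPTY element in HTML 4.01 Strict")
    return {
        # HTML: the end tag is forbidden, and <tag /> is not HTML at all.
        "html": [f"<{tag}>"],
        # XHTML: the element must be closed; the short form comes first
        # because Appendix C recommends it.
        "xhtml": [f"<{tag} />", f"<{tag}></{tag}>"],
    }


forms = valid_forms("img")
print(forms["html"])   # ['<img>']
print(forms["xhtml"])  # ['<img />', '<img></img>']
```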

But, for God’s sake, poring over the Spec should not be a prerequisite for learning (X)HTML. The syntax of (X)HTML is complicated enough [The <blockquote> element contains block-level content. The <q> element contains inline content. Geez! Which elements are block-level? What’s inline content? …] Complicated rules for start and end tags don’t help matters.

That was a great intro, and those articles that you linked to will be a great help to sell web standards to others. I’ve been working with web standards for a couple months now, and it can be frustrating, but once you get the hang of it, and get used to closing tags, quoting attributes etc… it comes quickly and easily.

For people who think it is too hard, or takes longer to code… it really isn’t once you get the hang of it… plus it looks so much prettier when you view the source of your page and it is all nicely formed as opposed to the blob of text that tables and dynamic sites can generate.

Oh yeah, the other thing that is great about using web standards, is that you get used to not using browser-specific tags, and it is much much easier to create cross-browser compatible sites…. now IE is the real pain in the ***…. it would be nice if Microsoft added some standards support to IE 6 with XP SP2…. but it probably isn’t too likely… so I think we’ll be dealing with it for a while to come.

The XML flavor of HTML has many benefits over SGML-ized HTML. I’m currently writing some JavaScripts* which fix IE’s CSS handling to some extent… but it’s necessary to use some XML features in the script**. That’s impossible, by definition, in SGML.

Furthermore, XHTML is more portable beyond the web because it is “just XML”: XSL (FOoT) style sheets can be used to make PDF, Docbook, or other XHTML web pages. Several of the high-quality and cheaper (than SGML) XML editors can be used to edit and store those documents. And there are no document structure ambiguities (unlike HTML which could have no html, head, or closing tags among others), which makes XHTML easier to parse in the future (when you have to deal with legacy documents in XHTML 1.0, and transform them to XHTML 10.01A, etcetera).
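To illustrate the “just XML” point, here’s a Python sketch that pulls every heading out of an XHTML document using nothing but the standard library’s XML parser. The document is hypothetical and leaves out the DOCTYPE to keep things short; the one real wrinkle is that XHTML elements live in the http://www.w3.org/1999/xhtml namespace, so their names come back namespace-qualified.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Qualified names look like "{http://www.w3.org/1999/xhtml}h1".
HEADING_TAGS = {f"{{{XHTML_NS}}}h{level}" for level in range(1, 7)}

# Hypothetical sample document.
document = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Roadmap</title></head>
  <body>
    <h1>A Roadmap to Standards</h1>
    <p>Start with a DOCTYPE.</p>
    <h2>Well-formed markup</h2>
  </body>
</html>"""

root = ET.fromstring(document)
headings = [(element.tag.split("}")[1], element.text)
            for element in root.iter()  # document order
            if element.tag in HEADING_TAGS]
print(headings)
# [('h1', 'A Roadmap to Standards'), ('h2', 'Well-formed markup')]
```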

This is a tremendous article, and addresses some of the issues that I’ve been debating on a recent project of mine (http://www.practicalcss.com) in which I’m trying to work out a basic standards-compliant “template” of sorts.

Standards should be fundamental to web development. I’ve been heartened by the noise made in the past couple of years about standards-compliant design. Generally no longer seen as an obscure arguing-point by fringe elitists, standards are becoming an accepted part of the web development scene. CSS-based design has achieved a firm and permanent position alongside table- and graphic-based design – and one is slowly outpacing the other.

“but it’s necessary to use some XML features in the script**. That’s impossible, by definition, in SGML.”

I’m not sure what you mean, as XML is a strict subset of SGML.

Furthermore, unless you serve your document with an XML MIME-Type (application/xhtml+xml), then it will be processed by the browser’s SGML parser in *all* current browsers.

“Furthermore, XHTML is more portable beyond the web because it is ‘just XML’: XSL (FOoT) style sheets can be used to make PDF, Docbook, or other XHTML web pages. Several of the high-quality and cheaper (than SGML) XML editors can be used to edit and store those documents.”

This is an important point. Just because it is *possible* to do everything with SGML that you can do with XML doesn’t mean there will be off-the-shelf tools for doing so.

Just as the simpler syntax of XML makes it easier for humans to learn (IMO), it has also become more popular to write tools for.

Thus there are many more technologies and automated tools for manipulating XML documents than SGML documents. In the long run, that certainly makes it a better platform (unless you are willing to write your own tools, or pay through the nose for someone else to write them).

That’s a powerful argument, but not one that is relevant to the “99%” Dave was aiming at. As a rule, the people who care about namespaces and other advanced features of XHTML are the same ones who care about MIME-Types, etc. — the sort of esoterica that Dave explicitly wanted to sidestep.

“And there are no document structure ambiguities (unlike HTML which could have no html, head, or closing tags among others)…”

These are *not* document structure ambiguities. The rules of SGML syntax are more complicated than XML, but they are not ambiguous.

And what, the IE team is sequestered in a deep underground bunker with no access to the Internet?

I mean, the bugs in IE are well-known. Entire cottage industries are devoted to developing workarounds for IE’s broken CSS support, IE’s lack of support for PNG transparency, etc.

The IE team is *surely* already aware of the serious bugs in their browser. They’d have to be either brain-dead, or cut off from the Internet to be unaware.

I don’t doubt Scoble’s sincerity, but this is a little like those non-functioning thermostats they install in individual offices in large office buildings to give the occupants an illusory sense of control over their environment.

Personally, I can’t even get excited about the IE team “fixing” bugs. So what if they do? Is that going to change our coding habits? Probably not. IE6 is going to be the de facto browser for YEARS. The way I see it, IE7 will just be another browser that needs to be tested (although I don’t know *how* I will do that, since I don’t plan on upgrading…)

My assumption is that we had better get used to the current browsers, because this is the way it is going to be for a while. Firebird 3 and Opera 15 might be fantastic, but I still fear we will have to deal with IE6 in a decade. Heck, many organizations that should know better still have Netscape 4.7 installed.

Like many other posters (we’re all up on a wall somewhere?!), I recently discovered the thrill of standards compliance. Now I wonder, how can we get more web developers to abandon the non-compliant practices that are so prevalent?

Thanks for the summary, Dave. I am now adhering as best i can to standards, using xhtml 1.0 trans and an external stylesheet. my sites validate … for now. but i have only recently begun to see references to MIME types. can someone point me to a reference about what a MIME type is and why it is important? if i don’t consider / incorporate MIME types now, am i missing something? thanks in advance.

explains *why*. (The bottom line is that, if you serve XHTML as text/html, *even* if you validate everything, your site may still break horribly if you were to switch to application/xhtml+xml. The only way to ensure your site will work when served with the correct MIME type is to actually serve it that way.)

Right now, Gecko-based browsers (Mozilla, Firefox, …) and IE6 with the MathPlayer 2.0 plugin installed can handle application/xhtml+xml. Most everyone else should still get text/html.
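Serving each browser the type it can handle usually means checking the Accept header it sends. A minimal Python sketch, assuming you control the server-side code; choose_mime_type is a hypothetical helper with simplified parsing, and the two header strings are abbreviated illustrations. Browsers that only send a generic */* (IE6 among them) are deliberately given text/html.

```python
def choose_mime_type(accept_header):
    """Hypothetical helper: use XHTML's MIME type only when explicitly accepted."""
    for entry in accept_header.split(","):
        media_type, *params = [field.strip() for field in entry.split(";")]
        quality = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as "not acceptable"
        if media_type.lower() == "application/xhtml+xml" and quality > 0:
            return "application/xhtml+xml"
    return "text/html"  # everyone else, including browsers that only send */*


# Abbreviated, illustrative Accept headers.
gecko = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,*/*;q=0.5"
ie6 = "image/gif, image/jpeg, */*"
print(choose_mime_type(gecko))  # application/xhtml+xml
print(choose_mime_type(ie6))    # text/html
```

If you negotiate like this, also send a Vary: Accept response header so caches don’t hand one browser’s version to the other.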

Great article, I shall be pointing a few colleagues in this direction. I have one little comment to make. I thought this line in paragraph 9 was a little confusing: “close [tags] in the order you opened them”.

This should read “close [tags] in the REVERSE order you opened them”. If you closed your tags in the order you opened them, you may end with something like this:

*how can we get more web developers to abandon the non-compliant practices that are so prevalent?*

Good freakin’ question. What do we do with a web designer like my boss who says things like “Don’t reinvent the wheel”? You know what? If no one ever reinvented the wheel, we’d all be driving around in Flintstones cars.

About This Entry:

You are reading “A Roadmap to Standards”, an entry posted on 30 April, 2004, to the Porte collection.