XHTML

What is XHTML, some of you may have heard the abbreviation and guess it has something to do with HTML and XML. That'd be a good guess. Below I hope to explain a little about what XHTML is, where it's come from, why it's a good thing, and how to use it when designing your web mark-up.

XHTML 1.0 is the W3C's first Recommendation for XHTML, following on from earlier work on HTML 4.01, HTML 4.0, HTML 3.2 and HTML 2.0. With a wealth of features, XHTML 1.0 is a reformulation of HTML 4.01 in XML, and combines the strength of HTML 4 with the power of XML

Case Of The X

HTML 4 as been around for a while, it is the basis of the modern web, however it is far from perfect and far from supported. Most people know about browser incompatibilities, but what most people do not realise is that HTML has a standard grammar; it is just that browsers do not comply.

Actually it is not the browser authors fault. To explain this we need first to look at a little history.

HTML History Lesson

SGML (Standard Generalized Markup Language) is a grammar used to mark-up documents for formatting, it was invented in the 1980's as a standard way of defining formats on electronic documents.

HTML is an SGML application. That is, it is a set of mark-up that conforms to the SGML grammar and tells people (or computers) how to format a document. SGML says an element must look like this, an attribute like this, etc.

As HTML evolved and the web became more popular, the SGML roots of HTML started to slip and browsers started allowing HTML authors get away with not conforming to the HTML standard. This was good for developers as it gives them an easier ride, however over time it adds inconsistencies between browsers and leaves us in the current mess of browser conformity.

How This Relates To XML

You've probably all heard of XML. It is a grammar like SGML, but it has been specifically designed for the Internet. It too defines how to structure documents in a rigid way, which makes it easy for programs (called XML parsers) to interpret XML documents. XHTML is an implementation of HTML as an XML application, in similar ways that HTML is an SGML application.

The ultimate goal of XHTML is to allow browsers to do away with their HTML parsers and basically become XML (and XSL and CSS) parsers. This has two benefits:

Browsers based on an XML parser can parse and display any XML document which it has a stylesheet for, whether it be XHTML, WML, or any other XML application (even one you invented yourself, thus eXtensible).

It removes incompatibilities, as the formatting of an XML document is defined in a stylesheet and an XML document is strongly typed (an illegal XML document will generate an XML parser error), so all clients will display documents the same.

But Browsers Only Support HTML 4

Wrong! XHTML is an implementation of HTML 4 in XML, thus it is HTML 4 compatible, it will be displayed fine in browsers which support HTML 4. All that XHTML really does is:

Cut out the crap

Make HTML a valid XML document

Force authors to create valid HTML documents

So XHTML Is The Standard?

Yes, XHTML has been a W3C recommendation for over 2 years (since Jan 2000 to be exact). You can (and should) use it right now.

"But why should I?" I hear you cry:

Do you hate browser incompatibility? Then use XHTML.

Are you fed up of having to check your HTML in hundreds of browsers? Then use XHTML.

Do you want to help push for a standardised XML based web? Then use XHTML.

Do you want your documents to be future proof? The use XHTML.

Do you take pride in your HTML? Then use XHTML.

You probably think that you can write proper HTML 4, I did, but if you haven't read and understood the HTML 4 spec then you can guarantee that your documents are not standard, try running your HTML through the HTML validator, you will be surprised, I know I was.

So How Do I Write XHTML?

Documents must be sent with the correct content type.

Web servers send HTML files with a mimetype definition of text/html, this tells the browser what type of document to expect from a given request. XHTML is not HTML and should not be sent with the same mimetype, XHTML should be sent with the mimetype of application/xhtml+xml.

However it's not quite that simple as Internet Explorer does not understand the application/xhtml+xml mimetype and interprets it as text/xml displaying your XHTML file as an XML file. Luckily the W3C allow us to send HTML conforming XHTML documents with the text/html mimetype which IE will understand and render our document correctly.

Best practice is to get your server to check the types of files the browser will except and send the content as application/xhtml+xml to browsers that support it, and as text/html to those that don't.

This defines the document as XHTML 1 and tells the browser which DTD to use. There are three DTDs for XHTML, Transitional is the best to use if you are moving from HTML 4 and want maximum backwards compatibility.

All tags must be lower case.

Yes, I don't care if you've used upper case tags before, XML is case sensitive so <B> is different to <b>, the XHTML DTDs define all tags in lower case only, so for your tags to work they must be lower case.

is not valid XHTML. You must put quotes around the attribute body such as:

<a href="http://www. example.com">Example</a>

All tags must be closed.

That is if you open a tag like <p> then you most also close it with </p>, makes sense really.

Empty tags must be closed.

This is probably the most alien concept from HTML 4. A tag is empty if it does not contain text between itself and a closing tag, however in a valid XML document when you open a tag you must also close it. So:

<img src="http://www. example.com/image.jpg">

is not valid XHTML. You must close the tag like this:

<img src="http://www. example.com/image.jpg"></img>

This is however annoying, so XML defines a shorthand way of closing empty tags like this:

<img src="http://www. example.com/image.jpg" />

Even the humble line break tag must be closed, ie:

<br />

You must nest tags correctly.

So you cannot open two tags, and then close them in a different order, eg:

<b><i>This is bold and italic</b></i>

is not valid, you must nest them correctly like:

<b><i>This is bold and italic</i></b>

Tag context

Last but not least and the hardest concept to get hold of, you must use tags in the correct context. When you place a tag in your document, it must have a parent that is valid for that tag. This is almost impossible to get right without practice or learning the XHTML spec inside out, so the best way to check if you have done something wrong is to check with the W3C validator.

XHTML1.1 should start with an XML declaration including an encoding type (especially if not UTF-8) followed by the XHTML1.1 DOCTYPE definition and then the "html" element including an XML namespaced language attribute.