What is XHTML?

XML expert Benoît Marchal discusses the differences among XML, HTML, and XHTML, including coherence, modularization, and where we go from here.

The name says it all: XHTML combines XML with HTML. More formally, XHTML is
a reformulation of HTML as an XML application. What does that mean in practice?

XML and HTML have a lot in common. One of the few differences (but an
important one) is that XML is a generic markup language, whereas HTML is a specific
language for hypertext documents.

Understanding the difference between XML and HTML is essential to understanding
XHTML, so let me take an example. HTML is specific because it defines specific
elements: for example, there is an element for paragraphs (<P>), an element for
images (<IMG>), and an element for boldness (<B>).

XML, on the other hand, defines no elements. That's why it's generic. It is
up to the author to define the elements he needs in his document. For example,
DocBook, an XML vocabulary for technical documentation, defines a paragraph
element (<para>), but MathML, an XML vocabulary for mathematics, does not
define an element for paragraphs. There is no need for paragraphs in mathematical
equations, so there is no paragraph element in MathML! Instead, MathML defines
elements for sums (<sum>), the exponential function (<exp>), and other mathematical
concepts.
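To make the idea of a generic language concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree module (a tool chosen for illustration, not one the article prescribes). The same parser reads fragments from two different vocabularies without knowing either in advance. The fragments and the element_names helper are hypothetical, simplified illustrations in the style of DocBook and MathML content markup; real MathML would also declare the MathML namespace, omitted here for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragments in the style of two XML vocabularies.
docbook_like = "<article><para>Hello, <emphasis>world</emphasis>.</para></article>"
mathml_like = "<math><apply><exp/><ci>x</ci></apply></math>"


def element_names(source: str) -> list[str]:
    """Parse any well-formed XML and list its element names in document order.

    The parser has no built-in vocabulary: it accepts whatever elements the
    document uses, as long as the syntax is well-formed.
    """
    root = ET.fromstring(source)
    # iter() walks the root and all its descendants in document order.
    return [el.tag for el in root.iter()]


print(element_names(docbook_like))  # ['article', 'para', 'emphasis']
print(element_names(mathml_like))   # ['math', 'apply', 'exp', 'ci']
```

The parser checks only the syntax; deciding which elements are allowed is the job of each vocabulary's DTD or schema.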

Both DocBook and MathML, which are specific languages, are built on top of
XML's generic facilities. In fact, many other languages have been created on top
of XML. There are XML vocabularies for multimedia, graphics, real estate, electronic
commerce, and more.

This raises an interesting question: if XML is a generic language used
to create specific languages, and HTML is a specific language, then why not
build HTML on top of XML? It has been done, and the result is called XHTML.

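If you read the XHTML 1.0 Recommendation, you will recognize the familiar HTML
4 elements (paragraphs, bold, images, etc.). No new element has been added.
However, XHTML follows the XML syntax, so every element must be closed: a
non-empty element needs both a start-tag and an end-tag, and an empty element
such as <br /> uses the XML empty-element syntax. HTML is more lenient: it lets
you omit the end-tag of some elements, such as <P> and <LI>, and forbids it
for empty elements such as <IMG> and <BR>.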
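The end-tag rule can be checked with any XML parser. The following minimal Python sketch, again assuming the standard library's xml.etree.ElementTree, feeds it an HTML-style snippet with omitted end-tags and an XHTML-style equivalent. The snippets and the is_well_formed helper are hypothetical illustrations. Note that this parser checks well-formedness only; it does not validate the markup against the XHTML DTDs.

```python
import xml.etree.ElementTree as ET

# HTML 4 style: the </P> end-tags are omitted and <BR> has no end-tag.
html_snippet = "<body><P>First paragraph<P>Second line<BR>continued</body>"

# XHTML style: every element is closed, either with an end-tag or, for an
# empty element such as <br />, with the empty-element syntax.
# (XHTML also requires lowercase element names.)
xhtml_snippet = "<body><p>First paragraph</p><p>Second line<br />continued</p></body>"


def is_well_formed(source: str) -> bool:
    """Return True if an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(html_snippet))   # False
print(is_well_formed(xhtml_snippet))  # True
```

The HTML snippet fails because </body> arrives while the <P> and <BR> elements are still open, which an XML parser treats as a mismatched tag.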