Chapter 11. XML

XML, the Extensible Markup Language, is a standardized data format. It
looks a little like HTML, with tags (<example>likethis</example>) and
entities (&amp;). Unlike HTML, however, XML is designed to be easy to
parse, and there are rules for what you can and cannot do in an XML
document. XML is now the standard data format in fields as diverse as
publishing, engineering, and medicine. It's used for remote procedure
calls, databases, purchase orders, and much more.

There are many scenarios where you might want to use XML. Because it
is a common format for data transfer, other programs can emit XML
files for you to either extract information from (parse) or display
in HTML (transform). This chapter shows how to use the XML parser
bundled with PHP, as well as how to use the optional XSLT extension
to transform XML. We also briefly cover generating XML.

Recently, XML has been used in remote procedure calls. A client
encodes a function name and parameter values in XML and sends them
via HTTP to a server. The server decodes the function name and
values, decides what to do, and returns a response value encoded in
XML. XML-RPC has proved a useful way to integrate application
components written in different languages. In this chapter,
we'll show you how to write XML-RPC servers and
clients.
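
As a rough illustration of the encoding step described above, the sketch below builds an XML-RPC request body by hand. It assumes the DOM extension is available and does not use PHP's XML-RPC functions; the helper build_method_call() and the method name example.add are hypothetical, and only integer and string parameters are handled.

```php
<?php
// Hypothetical helper: encodes a function name and its parameters as an
// XML-RPC <methodCall> document. Only int and string values are handled.
function build_method_call(string $method, array $params): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $call = $doc->appendChild($doc->createElement('methodCall'));

    $nameEl = $doc->createElement('methodName');
    $nameEl->appendChild($doc->createTextNode($method));
    $call->appendChild($nameEl);

    $paramsEl = $call->appendChild($doc->createElement('params'));

    foreach ($params as $value) {
        // Pick the XML-RPC type tag for this value
        $type = is_int($value) ? 'int' : 'string';

        $typed = $doc->createElement($type);
        $typed->appendChild($doc->createTextNode((string) $value));

        $valueEl = $doc->createElement('value');
        $valueEl->appendChild($typed);

        $paramEl = $doc->createElement('param');
        $paramEl->appendChild($valueEl);
        $paramsEl->appendChild($paramEl);
    }

    return $doc->saveXML();
}

// "example.add" is a made-up remote function name
$request = build_method_call('example.add', [2, 3]);
echo $request;
```

A client would send this string as the body of an HTTP POST; the server's reply is a similar document whose root element is methodResponse.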

XML documents generally are not completely ad hoc. The specific tags,
attributes, and entities in an XML document, and the rules governing
how they nest, comprise the structure of the document. There are two
ways to write down this structure: the Document Type Definition (DTD)
and the Schema. DTDs and Schemas are used to validate documents; that
is, to ensure that they follow the rules for their type of document.
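
To make validation concrete, here is a minimal sketch that checks a document against a DTD carried in the document itself. It assumes the DOM extension is available; the note vocabulary and its rules are invented for the example.

```php
<?php
// A small document with an internal DTD. The DTD says a <note> must
// contain a <to> element followed by a <body> element.
$xml = <<<XML
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Ada</to><body>Hello</body></note>
XML;

// Collect validation errors internally instead of emitting PHP warnings
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadXML($xml);
$isValid = $doc->validate();      // document follows the DTD's rules

// Remove the required <to> element and validate again
$bad = new DOMDocument();
$bad->loadXML(str_replace('<to>Ada</to>', '', $xml));
$isBadValid = $bad->validate();   // document breaks the (to, body) rule

libxml_clear_errors();

var_dump($isValid, $isBadValid);
```

Both documents are well-formed XML; only the first one is valid for its document type.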

Most XML documents don't include a DTD inline. Instead, many identify
the DTD as an external entity with a DOCTYPE declaration, a line that
gives the name and location (file or URL) of the DTD.
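
The sketch below shows what such a line can look like and how to read it back. It assumes the DOM extension is available; the report element, the public identifier, and the URL are placeholders and do not refer to a real DTD.

```php
<?php
// The DOCTYPE line names the root element and points at an external DTD.
// The identifier and URL are placeholders for illustration only.
$xml = <<<XML
<?xml version="1.0"?>
<!DOCTYPE report PUBLIC "-//Example//DTD Report 1.0//EN" "http://www.example.com/my.dtd">
<report><title>Quarterly numbers</title></report>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);   // with default options the external DTD is not fetched

$doctype = $doc->doctype;
echo $doctype->name, "\n";       // name of the document type
echo $doctype->publicId, "\n";   // public identifier of the DTD
echo $doctype->systemId, "\n";   // location (file or URL) of the DTD
```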

Sometimes it's
convenient to encapsulate one XML document in another. For example,
an XML document representing a mail message might have an
attachment element that surrounds an attached
file. If the attached file is XML, it's a nested XML
document. What if the mail message document has a
body element (the text of the message), and the
attached file is an XML representation of a dissection that also has
a body element, but this element has completely
different DTD rules? How can you possibly validate or make sense of
the document if the meaning of body changes
partway through?

This problem is solved with the use of namespaces. Namespaces let you
qualify an XML tag with a prefix, for example email:body and
human:body. Each prefix is bound to a URI that identifies the
vocabulary the tag belongs to.
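
The mail message scenario might be sketched as follows. This assumes the DOM extension is available; the namespace URIs and the element content are made up for the example.

```php
<?php
// Two different "body" elements, kept apart by their namespaces.
// The namespace URIs are placeholders.
$xml = <<<XML
<?xml version="1.0"?>
<email:message xmlns:email="http://example.com/ns/email"
               xmlns:human="http://example.com/ns/human">
  <email:body>See the attached report.</email:body>
  <email:attachment>
    <human:dissection>
      <human:body>Adult specimen</human:body>
    </human:dissection>
  </email:attachment>
</email:message>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// Look elements up by namespace URI plus local name, not by prefix
$emailBodies = $doc->getElementsByTagNameNS('http://example.com/ns/email', 'body');
$humanBodies = $doc->getElementsByTagNameNS('http://example.com/ns/human', 'body');

echo $emailBodies->item(0)->textContent, "\n";
echo $humanBodies->item(0)->textContent, "\n";
```

Matching on the namespace URI rather than the prefix matters because a document is free to choose any prefix for a given namespace.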

There's a lot more to XML than we have time to go
into here. For a gentle introduction to XML, read Learning
XML, by Erik Ray (O'Reilly). For a
complete reference to XML syntax and standards, see XML in
a Nutshell, by Elliotte Rusty Harold and W. Scott Means
(O'Reilly).