XSL Processing

This chapter is designed to give you a quick start in creating XSL style
sheets, so only a minimum of theory is presented. However, before you can
create even your first style sheet, it is important to understand the basics
of style sheet processing. As with the rest of this book, the emphasis is on
creating XSL transformations.

When an XML document is loaded, the parser takes the document and scans all
of its components, which may include the following:

Elements

Attributes

Entities

CDATA sections

Processing instructions

As each markup component is scanned, it is placed in a hierarchical tree
structure in memory. Once the entire document is scanned, the document tree can
be accessed through Application Program Interfaces (APIs) like the Document
Object Model (DOM).
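
To make the component types listed above concrete, here is a minimal sketch
of a document that contains one of each. The element names, the entity, and
the processing instruction target are all hypothetical and are not taken from
the book's listings.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!-- Entity declaration: &company; expands to the text below -->
  <!ENTITY company "Example Corp">
]>
<!-- Processing instruction: a hint addressed to some application -->
<?display-hint compact?>
<!-- Element (note) carrying an attribute (priority) -->
<note priority="high">
  <!-- Entity reference inside element content -->
  <from>&company;</from>
  <!-- CDATA section: markup characters are treated as plain text -->
  <body><![CDATA[Use <b> and & freely here.]]></body>
</note>
```

A parser scanning this document would place note at the top of the tree, with
from and body as its children.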

In the case of XSL (both formatting objects and transformations), you can
write style sheets that also access this in-memory tree. From an XSL
perspective, this is called the source tree because it represents the
source document. The goal in XSL processing is to create a second tree that
contains the output you desire. This second tree is called the result
tree. To create the result tree, you use rules in your XSL style sheet
(called templates) to walk through the source tree, select components of
the tree you wish to process, and transform them. The result of applying a
style-sheet template is placed in the result tree. In the case of formatting
objects, the result tree will contain a formatted version of your XML document.
In the case of a transformation, the result tree will contain the transformed
XML document.

To clearly understand how this process works, consider the XML document in
Listing 2.1.

This XML document, which may have been the result of some database operation,
represents a typical invoice containing client information, a description of
services, cost of services, and so on. In practice, this document might or
might not be stored as a physical file, but for the purposes of running this
example you can give it the filename invoice.xml.
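
As a rough illustration of the kind of invoice described here, a document
along the following lines would do. This is a hypothetical stand-in, not the
book's Listing 2.1: only the four child element names appear in this chapter,
while the invoice root element, the number attribute, and all of the values
are assumptions.

```xml
<?xml version="1.0"?>
<!-- Hypothetical invoice.xml; structure and values are assumed -->
<invoice number="1001">
  <clientName>Example Client, Inc.</clientName>
  <contact>Pat Smith</contact>
  <descriptionOfServices>Style sheet development</descriptionOfServices>
  <costOfServices>1500.00</costOfServices>
</invoice>
```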

For this first example, you would like to transform this document into HTML
so that you can display the information in a browser.

Figure 2.1 This conceptual view of
the source tree shows how an XML document is broken down into its constituent
parts.

Now you would like to walk this tree and create the result tree shown in Figure
2.2.

Notice that the result tree in Figure 2.2 does not contain the XML elements
of the source document. Rather, it contains HTML elements.

How the result tree gets streamed into a document depends on how the style
sheet is applied. Recall from Chapter 1, "The Essence of XSL," that
the style sheet may be part of a static reference in the XML document instance.
In this case, the output is handled by the XML parser. On the other hand, the
style sheet may be applied dynamically by an application program. In this case,
it is up to your program to stream the results back out to a file, a browser, or
some other device.
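
A static reference of this kind is usually written as an xml-stylesheet
processing instruction near the top of the XML document. The sketch below
assumes the style sheet is saved as invoice.xsl next to the document and that
the root element is named invoice (a hypothetical name). The type value
text/xsl is the one browsers have commonly accepted, and other processors may
expect a different value.

```xml
<?xml version="1.0"?>
<!-- Static reference: asks the consuming application to apply invoice.xsl -->
<?xml-stylesheet type="text/xsl" href="invoice.xsl"?>
<invoice>
  <!-- invoice content goes here -->
</invoice>
```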

Figure 2.2 The output from the XSLT
processor is a result tree. In this case, the result tree represents an HTML
document.

Creating the Style Sheet

Let's look at a typical style sheet that might be used to transform the
XML document in Listing 2.1 into HTML. Listing 2.2 shows the style sheet.

Listing 2.2 This Transformation (invoice.xsl) Takes Listing 2.1 and
Converts It into HTML for Viewing in a Browser

For simplicity, the goal for this style sheet is to transform just four
elements from the source document: clientName, contact,
descriptionOfServices, and costOfServices. This also brings up
a good point: You need to transform only those parts of a document that you
wish. As a result, the output of this transformation departs from the
structure of the original source document.
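
A minimal sketch of a style sheet with this goal might look like the
following. It is not the book's Listing 2.2. It assumes the four elements are
direct children of a root element named invoice (a hypothetical structure),
and the HTML layout it produces is likewise only an illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Match the root of the source tree and build the HTML skeleton -->
  <xsl:template match="/">
    <html>
      <head><title>Invoice</title></head>
      <body>
        <xsl:apply-templates select="invoice"/>
      </body>
    </html>
  </xsl:template>

  <!-- Copy the text of the four chosen elements into the result tree -->
  <xsl:template match="invoice">
    <h1><xsl:value-of select="clientName"/></h1>
    <p>Contact: <xsl:value-of select="contact"/></p>
    <p>Services: <xsl:value-of select="descriptionOfServices"/></p>
    <p>Cost: <xsl:value-of select="costOfServices"/></p>
  </xsl:template>
</xsl:stylesheet>
```

Anything in the source document that these templates do not select simply
never reaches the result tree.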

The first thing you'll notice about this XSLT style sheet is that the
first line is an XML declaration indicating that it's an XML document. That
means this style sheet must be a well-formed XML document. How does the
processor know that its elements belong to XSL? In many XML documents, a
DOCTYPE declaration references a DTD that defines the element types. In XSL,
however, a namespace declaration in the <stylesheet> element identifies
the XSL vocabulary.

A Word on Namespaces

The namespaces mechanism allows you to uniquely identify element types that
you create. For example, imagine that you have created an XML document describing
a book chapter. You might create element types such as <chapterTitle>,
<subHead1>, <subHead2>, <chapterText>,
<codeListing>, <sidebar>, <footer>,
and so on. Now imagine that you want to merge the content from this document
with a document taken from a training manual. That document might also use
element type names such as <chapterText> or <sidebar>,
but define a completely different structure. Ultimately, you wind up with
name collisions between your document and the document you're attempting
to merge.

From the perspective of the document author, a namespace is a prefix you
can add to your elements that uniquely identifies them. Typically, a namespace
corresponds to a Uniform Resource Identifier (URI) of an organization, such
as your company's Web address, or that of a specification document. Because
these URIs can contain long path names, namespace declarations allow you to
create an alias that is a shorthand notation for the fully qualified namespace.
For example, I might create a document that sets up the following namespace
declaration:

xmlns:myNS="http://www.beyondhtml.com"

The xmlns portion of the statement says, "I'm creating
an XML namespace." The :myNS is optional and is user defined.
When included, this sets up the alias for the longer URI. The portion after
the equals sign is the fully qualified URI. So, this statement creates the
http://www.beyondhtml.com
namespace and assigns it to the alias myNS.

As you can see, prefixing elements with myNS helps to create a unique
name for the elements in this document.
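
As a small sketch of how the prefix resolves a collision, the document below
mixes a sidebar from the book chapter with a sidebar from the training manual.
The chapter root element, the tm prefix, and its URI are hypothetical names
chosen only for this illustration.

```xml
<?xml version="1.0"?>
<myNS:chapter xmlns:myNS="http://www.beyondhtml.com"
              xmlns:tm="http://www.example.com/training">
  <myNS:chapterTitle>XSL Processing</myNS:chapterTitle>
  <!-- Same local name, different namespaces, so no collision -->
  <myNS:sidebar>A sidebar from the book chapter.</myNS:sidebar>
  <tm:sidebar>A sidebar from the training manual.</tm:sidebar>
</myNS:chapter>
```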

In XSL, the <stylesheet> element requires that you set up the
XSL namespace that points to a URI. The declaration tells the XML processor that
this is an XSL style sheet, not just another XML document. The URI that the
namespace points to varies depending on the version of XSL you're using.
The current XSL specification requires conforming XSLT style sheets to point to
http://www.w3.org/1999/XSL/Transform.

TIP

Note in Listing 2.2 that an alias, xsl, is established. Because the
alias is user defined, you can choose any alias name you wish. However,
xsl is the de facto name used by virtually all style sheet
developers.

Also, because the alias is optional, it is not necessary to include it at
all. Omitting the alias means you can also omit the xsl: that's
prefixed to all XSL element type names. This can save you some typing and
eliminate a few hundred bytes from the size of your document. However, be aware
that either the source document or your transformation may contain element type
names that conflict with XSL's naming conventions. Therefore, it is always
prudent to include the xsl alias in your style sheets.
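
To illustrate that the alias name itself is arbitrary, the sketch below binds
the prefix t to the XSLT namespace instead of xsl. The prefix t and the
invoice/clientName path are assumptions made for this example.

```xml
<?xml version="1.0"?>
<!-- Any prefix works as long as it is bound to the XSLT namespace URI -->
<t:stylesheet version="1.0"
              xmlns:t="http://www.w3.org/1999/XSL/Transform">
  <t:output method="html"/>
  <t:template match="/">
    <html>
      <body><t:value-of select="invoice/clientName"/></body>
    </html>
  </t:template>
</t:stylesheet>
```

If the XSLT namespace were declared as the default namespace with no alias at
all, the unprefixed html and body elements here would fall into the XSL
namespace too, which is the kind of conflict the tip warns about.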

CAUTION

Before XSLT became a W3C Recommendation in November 1999, processors were
forced to use nonstandard URIs in their namespace declarations. If you run
into an error when using the current namespace, check the version of the XSL
processor you are using and consider the following alternative namespaces.

XSL processors that follow the December 1998 working draft use the following
namespace definition:

xmlns:xsl="http://www.w3.org/TR/WD-xsl"

Interim processors (such as MSXML 1) use the following:

xmlns:xsl="http://www.w3.org/XSL/Transform/1.0"

The November 1999 (current) specification requires the following:

xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

Returning to Listing 2.2, the <stylesheet> element is the root
element of the document and is therefore the container for the rest of the style
sheet. You will learn about all of the elements that <stylesheet>
supports in Chapter 4, "The XSL Transformation Language." However, one
important element type is <output>, which allows style sheet
authors to specify how they wish the result tree to be output. Currently, you
can specify that the result tree be output as XML, HTML, or text. Listing 2.2
instructs the processor to output the result tree as HTML.
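
For contrast with HTML output, here is a minimal sketch of a style sheet that
asks for plain text instead. The invoice/clientName path assumes a
hypothetical source document with an invoice root element.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- XSLT 1.0 defines the methods "xml", "html", and "text" -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:text>Client: </xsl:text>
    <xsl:value-of select="invoice/clientName"/>
  </xsl:template>
</xsl:stylesheet>
```

Changing only the method attribute changes how the same result tree is
written out, without touching the templates that build it.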