Developing an XSLT Stylesheet

Introduction

While developing an XSLT stylesheet, we observe the following guidelines, which keep
our overall goals in mind as we develop working components piece by piece. This helps
us to write more concise and accurate code, while also reducing time spent
troubleshooting. We recommend re-reading this page as a reminder before beginning to
write an XSLT stylesheet until you feel that you’ve internalized the model.

Input in, output out

Start by making sure that your XSLT can read your input file and generate output,
even if the output is just a placeholder. As you continue to run your transformation
each time you add a new step, it should continue to read input and generate output
(even if the output continues to include placeholder text). Any time you can’t read
input and generate output, fix the problem before you do anything else. It’s easier
to fix a problem as soon as you learn about it than to try to track it down in an
ocean of new code.

Add functionality one step at a time

This is the most important guideline for avoiding tedious, confusing, and
unproductive troubleshooting. When writing XSLT, you should add one bit of
functionality at a time, and then run the transformation to verify that the new code
works before moving on to the next step. You need to watch out for two types of
problems:

Your new code doesn’t do what you think it should do.

Something unrelated to the new code that used to work stops working when you add
the new code. This is called a regression.

The point of the coding, testing, and debugging in small cycles is that it’s easiest
to find and fix mistakes when you’ve written only a few new lines of code since the
last cycle. For example, if you are trying to apply templates to all
<p> tags in an XML document and format them in some way, it’s
a good idea first, before you think about applying the formatting, to ensure that
you are actually finding the <p> tags correctly. You can do that
by writing a template that matches <p> elements and just outputs
some placeholder text, and once that works, you can replace the placeholder with
more refined code that processes them the way you want. And when you do that
formatting, you need to test each feature as you add it, instead of writing the
entire block and checking only then. If you do the latter and something doesn’t
work, you’ll have set yourself up for painfully confusing debugging.
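
A minimal sketch of that first check might look like the following. It assumes, for simplicity, that the input <p> elements are in no namespace (a namespaced document such as TEI would also need an xpath-default-namespace attribute on the stylesheet element), and the placeholder text is arbitrary:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">
    <!-- Step 1: confirm only that <p> elements are being matched.
         The placeholder string is an arbitrary choice. -->
    <xsl:template match="p">
        <xsl:text>[paragraph found]</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

Once the placeholder appears once per paragraph in the output, the <xsl:text> line can be replaced with real formatting, one feature at a time.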

Use stubs

A stub in coding terminology is a snippet of code used to stand in for
something that will be developed later—basically a placeholder for functionality
that has not been written yet. You want to be coding only one piece of functionality
at a time, but sometimes you’ll want to use stubs to help keep your overall goals in
mind while working in different sections. For example, if you have a template that
will eventually output a table of contents, initially it might just output plain
text that says Table of contents to go here. This lets you verify, in the
output, that you’re calling the template and it’s returning output.
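
As an illustration, a stub of this kind could be sketched as follows. The named template toc and the page title are hypothetical names chosen for the example, not part of any particular project:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    version="3.0">
    <xsl:template match="/">
        <html>
            <head>
                <title>Stub demonstration</title>
            </head>
            <body>
                <!-- Call the (hypothetical) table-of-contents template -->
                <xsl:call-template name="toc"/>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>

    <!-- STUB: replace the placeholder with real table-of-contents logic later -->
    <xsl:template name="toc">
        <p>Table of contents to go here</p>
    </xsl:template>
</xsl:stylesheet>
```

If the placeholder paragraph shows up in the output, the call is wired correctly, and the body of the stub can be developed later without touching the calling code.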

Document your code

Unless the purpose of a piece of code is self-documenting (obvious,
self-explanatory), describe its purpose inside an XML comment. (XML comments start
with <!-- and end with -->, and they can contain
anything, including markup, except two consecutive hyphens.) This helps to keep your
code organized for your own use and makes it easier to collaborate with others while
working on XSLT for your projects. Your project teammates need to be able to read and
understand your XSLT without feeling as if they’re solving a puzzle.

In practice

To demonstrate the use of these guidelines in practice, we’ve traced through the
steps of creating an XSLT to convert an XML file to HTML5. For this example we’ve
used one of Anton Chekhov’s letters, which you may remember from your first XML
assignments. We’ve marked up the letter as simple TEI-compliant XML:

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
<fileDesc>
<titleStmt>
<title>To his Brother Mihail</title>
<author>Anton Chekhov</author>
</titleStmt>
<publicationStmt>
<publisher>Project Gutenberg</publisher>
</publicationStmt>
<sourceDesc>
<ab>This would be more thoroughly researched for a non-tutorial XML file</ab>
</sourceDesc>
</fileDesc>
</teiHeader>
<text>
<body>
<div>
<opener>
<dateline>
<name key="taganrog" type="place">TAGANROG</name>
<date when="1876-07-01">July 1, 1876.</date>
</dateline>
<salute>DEAR <name key="misha" type="person">BROTHER MISHA</name>,</salute>
</opener>
<p>I got your letter when I was fearfully bored and was sitting at the gate yawning,
and so you can judge how welcome that immense letter was. Your writing is good,
and in the whole letter I have not found one mistake in spelling. But one thing
I don't like: why do you style yourself "your worthless and insignificant
brother"? You recognize your insignificance? … Recognize it before God; perhaps,
too, in the presence of beauty, intelligence, nature, but not before men. Among
men you must be conscious of your dignity. Why, you are not a rascal, you are an
honest man, aren't you? Well, respect yourself as an honest man and know that an
honest man is not something worthless. Don't confound "being humble" with
"recognizing one's worthlessness." …</p>
<p>It is a good thing that you read. Acquire the habit of doing so. In time you will
come to value that habit. <name key="beecherStowe" type="person">Madame
Beecher-Stowe</name> has wrung tears from your eyes? I read her once, and
six months ago read her again with the object of studying her—and after reading
I had an unpleasant sensation which mortals feel after eating too many raisins
or currants…. Read <title key="donQ" type="lit">"Don Quixote."</title> It is a
fine thing. It is by <name key="cervantes" type="person">Cervantes</name>, who
is said to be almost on a level with <name key="shakespeare" type="person">Shakespeare</name>.
I advise my <name key="brothersChekhov" type="person">brothers</name> to read—if
they haven't already done so—<name key="turgenev" type="person">Turgenev's</name>
<title key="hamletAndDonQ" type="lit">"Hamlet and Don Quixote."</title> You
won't understand it, my dear. If you want to read a book of travel that won't
bore you, read <name key="goncharov" type="person">Gontcharov's</name>
<title key="frigatePallada" type="lit">"The Frigate Pallada."</title></p>
<p>… I am going to bring with me a boarder who will pay twenty roubles a month and
live under our general supervision. Though even twenty roubles is not enough if
one considers the price of food in <name key="moscow" type="place">Moscow</name>
and <name key="mamaChekhova" type="person">mother's</name> weakness for feeding
boarders with righteous zeal. <note>[Footnote: This letter was written by
<name key="chekhov" type="person">Chekhov</name> when he was in the fifth
class of the <name key="taganrog" type="place">Taganrog high school</name>.]</note>
</p>
</div>
</body>
</text>
</TEI>

How to begin

Our first step is to create a new XSLT file and to verify that it produces some
output. We adjust the boilerplate information at the top to specify that we’ll be
outputting HTML5, create a template rule to match our document node, and apply
templates. We can also add the most basic structural components of HTML. We’ve
deliberately made an error here, and we’ll discuss below how to fix it.
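
A minimal sketch of what this starting stylesheet might look like is shown here. The output settings and the page title are inferred from the final output shown at the end of this tutorial, so treat them as assumptions. The sketch requires an XSLT 2.0 or later processor, and it reproduces the deliberate omission mentioned above:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    version="3.0">
    <!-- Serialize as XML with the HTML5 legacy-compat DOCTYPE (assumed settings) -->
    <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>

    <!-- Match the document node and build the basic HTML skeleton -->
    <xsl:template match="/">
        <html>
            <head>
                <title>Chekhov to Mihail</title>
            </head>
            <body>
                <!-- No element templates exist yet, so the built-in rules
                     simply copy the text of the letter into the body -->
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
```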

At this point we select our two files from the dropdowns in <oXygen/>’s XSLT
debugger interface and run the transformation to make sure we are getting what we
want. Sure enough, the stylesheet is reading the letter as input and outputting
(still very basic) HTML.

Beginning to add functionality

Let’s begin by adding a template that will write HTML <p> tags
around our paragraphs. To do this, we create a new template that will match the TEI
<p> elements in the input and apply templates, creating an
HTML <p> element in the output:
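
The new rule might be sketched as follows. Only the new template is shown, inside a bare <xsl:stylesheet> wrapper so that the fragment is well-formed on its own; the wrapper deliberately matches the state of the stylesheet at this point, including its still-undiscovered error:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    version="3.0">
    <!-- Intended to match each TEI <p> and wrap its content in an HTML <p> -->
    <xsl:template match="p">
        <p>
            <xsl:apply-templates/>
        </p>
    </xsl:template>
</xsl:stylesheet>
```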

When we run the transformation, we expect our input paragraphs to be output with HTML
<p> tags around them. Since that doesn’t happen, we know that
there’s a problem that the new template is revealing, and it turns out to be a
namespace error. Our first template matched the document node (/),
which isn’t an element and isn’t in a namespace, and now that we try and fail to
match our first TEI element, we discover that we have forgotten to include the
@xpath-default-namespace attribute of the
<xsl:stylesheet> element. By testing our code piece by piece,
we’ve narrowed the places we need to check for the error. Note in this case that the
error isn’t in the new template, but the new template made it visible; we have to
recognize that the new template is our first attempt to match an element from the
input document, and that tips us off to look for a namespace error. After adding the
attribute, our entire program now looks like this:
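
A sketch of the corrected stylesheet, under the same assumptions about output settings and page title as before (both inferred from the final output, not confirmed by the text):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    version="3.0">
    <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>

    <!-- The document node is not in a namespace, so this rule always matched -->
    <xsl:template match="/">
        <html>
            <head>
                <title>Chekhov to Mihail</title>
            </head>
            <body>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>

    <!-- With xpath-default-namespace set, the unprefixed name p in the
         match pattern now refers to <p> in the TEI namespace -->
    <xsl:template match="p">
        <p>
            <xsl:apply-templates/>
        </p>
    </xsl:template>
</xsl:stylesheet>
```

The xpath-default-namespace attribute affects unprefixed element names in XPath expressions and match patterns, while the xmlns declaration controls the namespace of the HTML elements written to the output; the two settings are independent.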

When we rerun the transformation, our paragraphs are now wrapped properly in HTML
<p> tags.

Continuing the development

After verifying our code, we can move on to the next piece of functionality. Our original XML
file contains many references to literary works and figures, which we’ll want to
style somehow in our HTML. To do this, we’ll need two new templates, one to match
<name> elements, and another to match
<title> elements, along the lines of:
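
One possible starting point, shown inside a bare <xsl:stylesheet> wrapper so that the fragment is well-formed on its own. At this stage the rules do nothing beyond what the built-in rules already do, which is why a visible check is needed next:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    version="3.0">
    <!-- Names of people and places -->
    <xsl:template match="name">
        <xsl:apply-templates/>
    </xsl:template>

    <!-- Titles of literary works -->
    <xsl:template match="title">
        <xsl:apply-templates/>
    </xsl:template>
</xsl:stylesheet>
```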

So how do we verify that we’re finding and processing the <name>
and <title> elements? As a quick check, we can wrap our
<xsl:apply-templates/> tags in visual display tags to
italicize, bold, color, or otherwise highlight our text. Although those tags
probably won’t be a part of our real output, we can use them as stubs to verify that
we’re matching correctly and to remind ourselves that we need to write real
functionality later. If we
wrap the output of processing <name> elements in
<b> tags and the output of processing
<title> elements in <i> tags, we can see
that the templates are firing properly in, for example, the following output
snippet:

I advise my <b>brothers</b> to read—if they haven't already done so—<b>Turgenev's</b> <i>"Hamlet and Don
Quixote."</i> You won't understand it, my dear. If you want to read a book of travel that won't
bore you, read <b>Gontcharov's</b> <i>"The Frigate Pallada."</i>
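
Stub templates that would highlight names and titles in this way might be sketched as follows (again inside a bare wrapper so that the fragment stands alone):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    version="3.0">
    <!-- STUB: bold makes matched names easy to spot in the output -->
    <xsl:template match="name">
        <b>
            <xsl:apply-templates/>
        </b>
    </xsl:template>

    <!-- STUB: italics make matched titles easy to spot in the output -->
    <xsl:template match="title">
        <i>
            <xsl:apply-templates/>
        </i>
    </xsl:template>
</xsl:stylesheet>
```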

With that confirmed, this is a good time to add some documentation to
remind ourselves that we’ll need to edit these templates later. We can start working
with our <name> template immediately, so we can remove the
<b> tags from there. So that we don’t forget that we also
need to add real functionality for processing titles, we can add a comment, and
the code block will now look something like:
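
A sketch of the two rules at this stage; the wording of the comment is only a suggestion:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    version="3.0">
    <!-- Placeholder tags removed; real processing for names comes next -->
    <xsl:template match="name">
        <xsl:apply-templates/>
    </xsl:template>

    <!-- TODO: replace the <i> stub with real processing for titles -->
    <xsl:template match="title">
        <i>
            <xsl:apply-templates/>
        </i>
    </xsl:template>
</xsl:stylesheet>
```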

Now that we’ve removed our placeholder <b> tags, let’s turn our
<name> elements into HTML <span> elements by
wrapping our <xsl:apply-templates/> in <span>
tags. To distinguish people from places, let’s also preserve the original
@type attribute value as the value of the HTML @class
attribute. Our new template rule looks like:
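
A first attempt might look like the sketch below. It is a well-formed stylesheet that runs without complaint, but it does not yet do what we intend:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    version="3.0">
    <!-- First attempt: the class value is written exactly as typed -->
    <xsl:template match="name">
        <span class="@type">
            <xsl:apply-templates/>
        </span>
    </xsl:template>
</xsl:stylesheet>
```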

Oops! Running the transformation again shows that although our
<span> elements are being created properly, the value of the
@class attribute is the literal string @type, instead of the
value of the @type attribute inside the TEI tags. Attribute values on literal
result elements are copied verbatim unless we mark part of them as an XPath
expression, which is what an attribute value template (AVT) does. We can fix the
error by wrapping curly braces ({ }) around the XPath expression, along the
lines of:
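
A sketch of the corrected rule, using an attribute value template:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    version="3.0">
    <!-- The curly braces mark @type as an XPath expression to be evaluated
         against the current <name> element -->
    <xsl:template match="name">
        <span class="{@type}">
            <xsl:apply-templates/>
        </span>
    </xsl:template>
</xsl:stylesheet>
```

With this rule, a <name> element whose @type is place produces a <span> with a @class of place, and one whose @type is person produces a @class of person.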

More practice

Finally, for practice, let’s make a list of all of the individuals referenced or
addressed in the letter using a modal template rule for our
<name> elements. We can begin as in the Modal XSLT tutorial by creating an
<h2> header and an <xsl:apply-templates/>
tag with a @mode attribute value of toc inside a
<ul> element in the body of the document, telling it to
select all <name> elements with a @type attribute
that has the value person:
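
The document-node template might then be sketched as follows. The heading text is taken from the final output shown at the end of this tutorial; the output settings and page title are assumptions carried over from earlier:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    version="3.0">
    <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>

    <xsl:template match="/">
        <html>
            <head>
                <title>Chekhov to Mihail</title>
            </head>
            <body>
                <h2>Referenced Individuals:</h2>
                <ul>
                    <!-- Select every personal name in the document and
                         process it in toc mode -->
                    <xsl:apply-templates select="//name[@type = 'person']" mode="toc"/>
                </ul>
                <!-- Regular, unmoded processing of the letter itself -->
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
```

No template in toc mode exists yet, so the built-in rules handle the selected elements and simply output their text.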

When we run it, our program spits out all of the names of people at the top of the
page inside of a set of <ul> tags, so we know that we’re
selecting them correctly, and we can move on to formatting them. We create a new
template rule with @match="name" and @mode="toc"
attributes to correspond to the <xsl:apply-templates/> element
above. Inside this rule, we create an <li> element for each name
in our list and apply templates inside it:
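
A sketch of the modal rule, shown inside a bare wrapper so that the fragment stands alone:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    version="3.0">
    <!-- Fires only when <name> elements are processed in toc mode -->
    <xsl:template match="name" mode="toc">
        <li>
            <xsl:apply-templates/>
        </li>
    </xsl:template>
</xsl:stylesheet>
```

Because the rule carries a @mode attribute, it does not interfere with the unmoded rule that matches the same <name> elements in the running text.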

This is close to what we want, but we can improve it. For consistency, let’s convert
all of the names to all lower-case characters (in Real Life we would capitalize just
the first letter of each part of a person’s title and name, and we’d get rid of the
possessive endings where they occur). We can do this by replacing the
<xsl:apply-templates/> tag in our modal template rule with
<xsl:value-of select="lower-case(.)"/>. Let’s also sort the
list by adding an <xsl:sort select="lower-case(.)"/> element
inside the <xsl:apply-templates> element in the body. We can now run
the code again to get a more attractive list in alphabetical order.
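
Both changes together might be sketched like this (output settings and page title are, as before, assumptions inferred from the final output):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    version="3.0">
    <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>

    <xsl:template match="/">
        <html>
            <head>
                <title>Chekhov to Mihail</title>
            </head>
            <body>
                <h2>Referenced Individuals:</h2>
                <ul>
                    <xsl:apply-templates select="//name[@type = 'person']" mode="toc">
                        <!-- Sort on the lower-cased text so that case
                             differences do not affect the order -->
                        <xsl:sort select="lower-case(.)"/>
                    </xsl:apply-templates>
                </ul>
                <xsl:apply-templates/>
            </body>
        </html>
    </xsl:template>

    <xsl:template match="name" mode="toc">
        <li>
            <!-- Output the lower-cased string value of the name -->
            <xsl:value-of select="lower-case(.)"/>
        </li>
    </xsl:template>
</xsl:stylesheet>
```

Note that lower-case(.) operates on the string value of the element, so any line break inside a name in the input is carried into the list item unchanged.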

We can now take stock of our entire stylesheet and of the output that it produces,
which appears something like below (a few extra features not discussed in the tutorial
have been added for formatting purposes):
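
A consolidated sketch of the stylesheet as developed in the steps above. Several details are assumptions rather than the tutorial’s actual code: the output settings and the Contents heading are inferred from the output, the use of <cite> for titles is a guess based on the same output, and restricting regular processing to the letter body is a simplification. The extra formatting of the header, dateline, and salutation is not reconstructed here, so the result of this sketch will differ from the output in those details:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    version="3.0">
    <!-- Assumed output settings, inferred from the DOCTYPE in the output -->
    <xsl:output method="xml" indent="yes" doctype-system="about:legacy-compat"/>

    <!-- Document node: HTML skeleton, sorted list of people, then the letter -->
    <xsl:template match="/">
        <html>
            <head>
                <title>Chekhov to Mihail</title>
            </head>
            <body>
                <h2>Referenced Individuals:</h2>
                <ul>
                    <xsl:apply-templates select="//name[@type = 'person']" mode="toc">
                        <xsl:sort select="lower-case(.)"/>
                    </xsl:apply-templates>
                </ul>
                <h2>Contents</h2>
                <!-- Assumption: process only the letter body, which keeps
                     the header text out of the page -->
                <xsl:apply-templates select="//body"/>
            </body>
        </html>
    </xsl:template>

    <!-- List item for each person, lower-cased for consistency -->
    <xsl:template match="name" mode="toc">
        <li>
            <xsl:value-of select="lower-case(.)"/>
        </li>
    </xsl:template>

    <!-- TEI paragraphs become HTML paragraphs -->
    <xsl:template match="p">
        <p>
            <xsl:apply-templates/>
        </p>
    </xsl:template>

    <!-- Names become spans that keep their TEI type as the HTML class -->
    <xsl:template match="name">
        <span class="{@type}">
            <xsl:apply-templates/>
        </span>
    </xsl:template>

    <!-- Assumption: titles of works are rendered as HTML cite elements -->
    <xsl:template match="title">
        <cite>
            <xsl:apply-templates/>
        </cite>
    </xsl:template>
</xsl:stylesheet>
```

The output of the tutorial’s own stylesheet, including its extra formatting features, follows.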

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
SYSTEM "about:legacy-compat">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Chekhov to Mihail</title>
</head>
<body>
<h2>Referenced Individuals:</h2>
<ul>
<li>brother misha</li>
<li>brothers</li>
<li>cervantes</li>
<li>chekhov</li>
<li>gontcharov's</li>
<li>madame
beecher-stowe</li>
<li>mother's</li>
<li>shakespeare</li>
<li>turgenev's</li>
</ul>
<h2>Contents</h2>
<p>
<cite>To his Brother Mihail</cite>
</p>
<p>Anton Chekhov</p>
<p>Project Gutenberg</p>
<p>
<span class="place">TAGANROG</span>July 1, 1876.</p>
<p>DEAR <span class="person">BROTHER MISHA</span>,</p>
<p>I got your letter when I was fearfully bored and was sitting at the gate yawning,
and so you can judge how welcome that immense letter was. Your writing is good,
and in the whole letter I have not found one mistake in spelling. But one thing
I don't like: why do you style yourself "your worthless and insignificant
brother"? You recognize your insignificance? … Recognize it before God; perhaps,
too, in the presence of beauty, intelligence, nature, but not before men. Among
men you must be conscious of your dignity. Why, you are not a rascal, you are an
honest man, aren't you? Well, respect yourself as an honest man and know that an
honest man is not something worthless. Don't confound "being humble" with
"recognizing one's worthlessness." …</p>
<p>It is a good thing that you read. Acquire the habit of doing so. In time you will
come to value that habit. <span class="person">Madame
Beecher-Stowe</span> has wrung tears from your eyes? I read her once, and
six months ago read her again with the object of studying her—and after reading
I had an unpleasant sensation which mortals feel after eating too many raisins
or currants…. Read <cite>"Don Quixote."</cite> It is a
fine thing. It is by <span class="person">Cervantes</span>, who
is said to be almost on a level with <span class="person">Shakespeare</span>.
I advise my <span class="person">brothers</span> to read—if they haven't already done
so—<span class="person">Turgenev's</span> <cite>"Hamlet and Don Quixote."</cite> You
won't understand it, my dear. If you want to read a book of travel that won't
bore you, read <span class="person">Gontcharov's</span>
<cite>"The Frigate Pallada."</cite>
</p>
<p>… I am going to bring with me a boarder who will pay twenty roubles a month and
live under our general supervision. Though even twenty roubles is not enough if
one considers the price of food in <span class="place">Moscow</span>
and <span class="person">mother's</span> weakness for feeding
boarders with righteous zeal. [Footnote: This letter was written by
<span class="person">Chekhov</span> when he was in the fifth
class of the <span class="place">Taganrog high school</span>.]</p>
</body>
</html>

In conclusion

The errors that we made deliberately in this tutorial are similar to those that we
make by accident during real development. By building one small component at a time
and testing frequently, we were able to find and correct our errors quickly. Testing
frequently may seem like extra work, but that’s true only if you never make a
mistake. In our experience, your development process will be more robust and
productive if you 1) make sure you can always read input and write output; 2) add
functionality one step at a time, developing and testing in small cycles; 3) use
stubs; and 4) document your code.