
XSL Transformation with Perl

Perl may not be as well known as some of the other languages, but it boasts a powerful library of packages and modules that everyone can use to work with XML. In this article, Harish Kamath explains how to get started with the “XML::XSLT” package, which lets you transform XML documents with XSLT style sheets from Perl.

Introduction

Talk about XML and its offshoots, and the words “state-of-the-art”, “top-of-the-grade” and “latest technology” are bound to come up. As for the technologies the “powers-that-be” are likely to recommend, many may vote for the .NET platform, others will root for J2EE and a few open source fans will recommend PHP!

But Perl? It is unlikely to get a mention at all. Sorry, Perl fanatics!

However, this lack of popular support does not mean that the language, which started as an effort to overcome the limitations of shell scripting languages, has been left behind in the race. Thanks to the efforts of Perl enthusiasts, CPAN (the Comprehensive Perl Archive Network, Perl's repository of reusable modules and packages) boasts a powerful library of packages and modules that everyone, including you and me, can use to work with XML in their favorite scripting language. This includes XSL Transformations, the topic of our current discussion.

For the benefit of novices, XSL Transformations (XSLT) is a handy offshoot of XML that lets you “transform” an XML document. In simple words, it converts the document into another format, such as HTML, plain text or a different XML vocabulary, by following the instructions in an XSL style sheet.

While you can learn more about XSL Transformations on the official website of the World Wide Web Consortium (W3C) – http://www.w3c.org – as well as by reading the numerous tutorials on this topic, I will concentrate on how to use Perl to transform XML documents using XSL Transformations.

On that note, I’ll assume that you are “now” familiar with XSL Transformations; if not, I have listed some URLs later in this article to get you started.

Time to flip the page!

Getting started

When it comes to Perl, all roads lead to CPAN — or one of its mirror sites!

A quick search for Perl modules that implement the XSL Transformations specification shows that you have several options. The reason is simple: XSLT is an open standard published by the World Wide Web Consortium, so programmers are free to develop their very own implementations.

Some implementations, such as XML::Sablotron, wrap an XSLT engine written in another language to deliver faster performance. However, being a core Perl fanatic, I’ll concentrate on the XML::XSLT module, as it is written entirely in Perl, our favorite language.

A quick review of the installation instructions shows that the XML::XSLT module requires some other packages to be installed first. These include XML::DOM, XML::RegExp and LWP (libwww-perl), among others. Note that this list is not exhaustive; which modules you have to install depends on the modules you have already installed. It may be handy to have your neighborhood Perl guru guide you through the installation process.
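If you are not sure which of these modules are already present, a quick check like the one below can help before you reach for the CPAN shell. This is only a sketch. The module list is an assumption, based on the names mentioned above plus XML::Parser, which XML::DOM builds on. A module reported as missing simply could not be loaded with require from your @INC.

```perl
#!/usr/bin/perl
# Report which XSLT-related modules can be loaded from @INC.
# The module list is an assumption; adjust it to suit your installation.
use strict;
use warnings;

my @modules = qw(XML::XSLT XML::DOM XML::RegExp XML::Parser LWP);

# Try to load each named module; return a hash reference mapping
# module name => 1 (loaded) or 0 (could not be loaded).
sub check_modules {
    my @names = @_;
    my %status;
    for my $name (@names) {
        # A string eval lets the module name come from a variable.
        $status{$name} = eval "require $name; 1" ? 1 : 0;
    }
    return \%status;
}

my $status = check_modules(@modules);
for my $name (@modules) {
    printf "%-12s %s\n", $name, $status->{$name} ? 'installed' : 'MISSING';
}
```

You can then install any missing module from CPAN, for example with perl -MCPAN -e 'install XML::XSLT'. Depending on how your CPAN shell is configured, this should also pull in the prerequisites.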

My Investment Portfolio

Now that the installation process is out of the way, it’s time to get our hands dirty with some code. Consider the following XML document (say “portfolio.xml”) that lists the stocks in a sample investment portfolio:

While XML-savvy programmers will appreciate the manner in which the data is hierarchically organized, most market analysts and advisers, who are more at home with the ubiquitous Microsoft Excel, will struggle to make any sense of this document. And unfortunately, it is your job to keep them happy so that you can pay your monthly bills. Hence, you need a solution that converts the above data into an “analyst-friendly” format.

While this may not be fancy, or even useful, you’ll agree that any layman, let alone the stock analyst, will understand that the output is a listing of stocks in a portfolio. So, how would I bring about this transformation? As you might have guessed, the answer is XSL Transformations (or XSLT for short).

As mentioned earlier, I’ll assume that you are familiar with the nuances of the XSLT language. If not, it might be useful to refresh your memory by visiting some of the URLs listed below. But, before you do, don’t forget to bookmark this page and come back once you’re done!

As the title suggests, this article is about XSL Transformations with Perl and so far, I have not yet written a single line of Perl code. It’s time to rectify this little anomaly. Take a look at the next code listing: a simple Perl script that uses the aforementioned XML::XSLT package to transform the “portfolio.xml” document using the “portfolio.xsl” style sheet, both of which were listed in the previous section.

Execute the Perl script (say “portfolio.pl”) listed above, and you should see the same “analyst-friendly” output listed in the previous section. I’ll assume that the “portfolio.xml” and “portfolio.xsl” files are in the same folder as the script.

It’s time to turn our attention to the “portfolio.pl” script! For starters, I’ve imported the “XML::XSLT” module and defined a couple of variables to store the names of the XML and the XSL files. Now, things start to get interesting. First, I’ve created an instance of the XSLT processor by invoking the new() method of the XML::XSLT module. Observe that I have passed the name of the XSL style sheet file as an input parameter to the method — a mandatory requirement.

Next, I have invoked the serve() method, which does the dirty work: it parses the input XML document, applies the XSLT instructions from the style sheet and returns the “transformed” output as a string, which the script then print()s.

Finally, like all good programmers, I invoke the dispose() method to release the memory held by the processor object. The XML::DOM trees it holds contain circular references, so Perl cannot reclaim them on its own.
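Here is a hedged sketch of the same three steps: new(), serve() and dispose(). It is not the author's portfolio.pl. The sample data, the stock, name and symbol element names, and the write_file() and serve_portfolio() helpers are hypothetical. Both files are written to a temporary directory so that the script runs on its own.

```perl
#!/usr/bin/perl
# Sketch: new() -> serve() -> dispose() with XML::XSLT.
# The sample data and the <stock>, <name> and <symbol> element names are
# hypothetical stand-ins for the article's portfolio.xml and portfolio.xsl.
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use XML::XSLT;

# Write $content to $path, dying on any I/O error.
sub write_file {
    my ($path, $content) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
}

# Apply $xslfile to $xmlfile and return the result as a string.
sub serve_portfolio {
    my ($xslfile, $xmlfile) = @_;
    my $xslt   = XML::XSLT->new($xslfile);   # the style sheet is mandatory
    my $output = $xslt->serve($xmlfile);     # parse, transform, stringify
    $xslt->dispose();                        # free the DOM trees it holds
    return $output;
}

my $dir     = tempdir(CLEANUP => 1);
my $xmlfile = File::Spec->catfile($dir, 'portfolio.xml');
my $xslfile = File::Spec->catfile($dir, 'portfolio.xsl');

write_file($xmlfile, <<'XML');
<?xml version="1.0"?>
<portfolio>
  <stock>
    <name>Example Corp</name>
    <symbol>EXMP</symbol>
    <lasttradedprice>42.50</lasttradedprice>
  </stock>
</portfolio>
XML

write_file($xslfile, <<'XSL');
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:for-each select="portfolio/stock">
      <xsl:text>Stock: </xsl:text>
      <xsl:value-of select="name"/>
      <xsl:text> (</xsl:text>
      <xsl:value-of select="symbol"/>
      <xsl:text>), last traded at </xsl:text>
      <xsl:value-of select="lasttradedprice"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL

print serve_portfolio($xslfile, $xmlfile), "\n";
```

Depending on your XML::XSLT version, serve() may also prepend HTTP headers and an XML declaration to the returned string. The module's documentation lists the options that control this.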

Simple and straightforward, wasn’t it?

Demystifying the XML::XSLT processor

The XML document in the first example listed only one stock in my sample portfolio, but things are a bit more complicated in real life. Consider the next XML file that contains several stocks in a single portfolio — after all, no one likes to put all their eggs into one basket!

Take a look at the accompanying XSLT style sheet. I’ve updated the code to make it modular in its approach. You’ll notice that each element in the XML file has its very own <xsl:template> giving you greater flexibility. Also, note the use of ASCII characters in the “portfolio” and “lasttradedprice” templates to generate a fancy output, as shown later.
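Below is a hedged sketch of that one-template-per-element layout. Only the portfolio and lasttradedprice names come from the text above. The stock and symbol elements, the two sample holdings, the “=” banner and the render() helper are assumptions. The output is plain text built from xsl:text instructions.

```perl
#!/usr/bin/perl
# Sketch: a modular style sheet with one xsl:template per element.
# <stock>, <symbol>, the sample prices and the "=" banner are hypothetical.
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use XML::XSLT;

# Write $content to $path, dying on any I/O error.
sub write_file {
    my ($path, $content) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
}

# Transform $xmlfile with $xslfile and return the result as text.
sub render {
    my ($xslfile, $xmlfile) = @_;
    my $xslt = XML::XSLT->new($xslfile);
    $xslt->transform($xmlfile);
    my $text = $xslt->toString;
    $xslt->dispose();
    return $text;
}

my $dir     = tempdir(CLEANUP => 1);
my $xmlfile = File::Spec->catfile($dir, 'portfolio2.xml');
my $xslfile = File::Spec->catfile($dir, 'portfolio2.xsl');

write_file($xmlfile, <<'XML');
<?xml version="1.0"?>
<portfolio>
  <stock><symbol>EXMP</symbol><lasttradedprice>42.50</lasttradedprice></stock>
  <stock><symbol>SMPL</symbol><lasttradedprice>17.25</lasttradedprice></stock>
</portfolio>
XML

# Each element gets its own template; apply-templates hands control down.
write_file($xslfile, <<'XSL');
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/"><xsl:apply-templates select="portfolio"/></xsl:template>
  <xsl:template match="portfolio">
    <xsl:text>========== PORTFOLIO ==========&#10;</xsl:text>
    <xsl:apply-templates select="stock"/>
  </xsl:template>
  <xsl:template match="stock"><xsl:value-of select="symbol"/><xsl:text> : </xsl:text><xsl:apply-templates select="lasttradedprice"/></xsl:template>
  <xsl:template match="lasttradedprice"><xsl:text>$</xsl:text><xsl:value-of select="."/><xsl:text>&#10;</xsl:text></xsl:template>
</xsl:stylesheet>
XSL

print render($xslfile, $xmlfile);
```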

Next, I have the glue that brings the two together: the Perl script. I’ve made some minor updates to the script from the first example. I will explain more about these changes after you have reviewed the next code listing and the subsequent output.

Yes, I have omitted the huge amounts of text that may have scrolled across your screen. The reason for this “strange” behavior will be unraveled in the next few lines as I proceed to de-mystify the updated Perl script.

As you can see above, there are no changes to the first few lines of the script. Next, take a look at the following code snippet:

Note the use of the “warnings” and “debug” options in the call to new(). As the names suggest, these instruct the XSLT processor to output useful, but verbose, debug information, as is evident from the output that scrolled across your screen. I have turned on both features by setting their values to “1”. I recommend that you do the same to help track down the annoying errors that are inevitable during the initial stages of any project.
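As a sketch of the call being described, the warnings and debug options are passed to new() as key/value pairs after the style sheet name. The make_processor() helper and the one-template placeholder style sheet are hypothetical. With debug switched on, expect a lot of diagnostic text while the style sheet is loaded.

```perl
#!/usr/bin/perl
# Sketch: creating an XML::XSLT processor with warnings and debug output.
# The placeholder style sheet and make_processor() are hypothetical.
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use XML::XSLT;

# Build a processor; a true $verbose turns on both diagnostic options.
sub make_processor {
    my ($xslfile, $verbose) = @_;
    my $flag = $verbose ? 1 : 0;
    return XML::XSLT->new($xslfile, warnings => $flag, debug => $flag);
}

my $dir     = tempdir(CLEANUP => 1);
my $xslfile = File::Spec->catfile($dir, 'placeholder.xsl');

open my $fh, '>', $xslfile or die "Cannot write $xslfile: $!";
print {$fh} <<'XSL';
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/"><xsl:text>ok</xsl:text></xsl:template>
</xsl:stylesheet>
XSL
close $fh or die "Cannot close $xslfile: $!";

# Expect a burst of diagnostic text while the style sheet is loaded.
my $xslt = make_processor($xslfile, 1);
$xslt->dispose();
```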

Next, I have opted for the transform() method instead of the serve() method used earlier. The most significant difference between the two is the return value. As we have already seen, serve() returns a string that can be print()ed on the screen without much fuss. Not so with transform(), which returns an XML DOM object. Don’t take my word for it: print the return value, and you should see something like this on your screen:

XML::DOM::DocumentFragment=ARRAY(0x1eeea9c)

Not pleasant at all. But there is no reason to despair: the XML::XSLT processor is equipped with a handy toString() method that returns the result of the transformation in a human-friendly format, as is evident from the output listed above.
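The following hedged sketch makes the difference visible. It captures the class of the value that transform() returns, then asks the processor for the same result as text via toString(). The single sample stock, its element names and the transform_portfolio() helper are assumptions.

```perl
#!/usr/bin/perl
# Sketch: transform() returns a DOM fragment; toString() serialises it.
# Sample data and element names other than <portfolio> are hypothetical.
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use XML::XSLT;

# Write $content to $path, dying on any I/O error.
sub write_file {
    my ($path, $content) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
}

# Return the class of transform()'s result and the serialised result text.
sub transform_portfolio {
    my ($xslfile, $xmlfile) = @_;
    my $xslt     = XML::XSLT->new($xslfile, warnings => 1);
    my $fragment = $xslt->transform($xmlfile);   # a DOM object, not a string
    my $class    = ref $fragment;
    my $text     = $xslt->toString;              # human-readable result
    $xslt->dispose();
    return ($class, $text);
}

my $dir     = tempdir(CLEANUP => 1);
my $xmlfile = File::Spec->catfile($dir, 'portfolio.xml');
my $xslfile = File::Spec->catfile($dir, 'portfolio.xsl');

write_file($xmlfile, <<'XML');
<?xml version="1.0"?>
<portfolio>
  <stock><symbol>EXMP</symbol><lasttradedprice>42.50</lasttradedprice></stock>
</portfolio>
XML

write_file($xslfile, <<'XSL');
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:for-each select="portfolio/stock">
      <xsl:value-of select="symbol"/>
      <xsl:text> last traded at </xsl:text>
      <xsl:value-of select="lasttradedprice"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL

my ($class, $text) = transform_portfolio($xslfile, $xmlfile);
print "transform() returned a $class\n";
print "toString() gave: $text\n";
```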

If you’re wondering why you would want an XML DOM object instead of a plain string, the reason is simple: the result of the transformation may well serve as the input to another process that expects XML. In such a case, it is handy to already have the XML structure in memory as a DOM object.
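To illustrate that idea, the sketch below has the style sheet emit hypothetical holding elements. It then reads the returned DocumentFragment with ordinary XML::DOM calls (getChildNodes(), getNodeType(), getNodeName() and getData()) rather than re-parsing a string. The element names, the two sample stocks and the holdings_from_fragment() helper are invented for the example. The fragment is read before dispose() is called.

```perl
#!/usr/bin/perl
# Sketch: consume transform()'s DocumentFragment directly with XML::DOM.
# <stock>, <symbol> and <holding> are hypothetical element names.
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use XML::DOM;    # exports node-type constants such as ELEMENT_NODE
use XML::XSLT;

# Write $content to $path, dying on any I/O error.
sub write_file {
    my ($path, $content) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";
}

# Return the text of every top-level <holding> element in $fragment.
sub holdings_from_fragment {
    my ($fragment) = @_;
    my @symbols;
    for my $node ($fragment->getChildNodes) {
        # Skip whitespace text nodes and anything that is not a <holding>.
        next unless $node->getNodeType == ELEMENT_NODE()
            && $node->getNodeName eq 'holding';
        push @symbols, join '',
            map  { $_->getData }
            grep { $_->getNodeType == TEXT_NODE() } $node->getChildNodes;
    }
    return @symbols;
}

my $dir     = tempdir(CLEANUP => 1);
my $xmlfile = File::Spec->catfile($dir, 'portfolio.xml');
my $xslfile = File::Spec->catfile($dir, 'holdings.xsl');

write_file($xmlfile, <<'XML');
<?xml version="1.0"?>
<portfolio>
  <stock><symbol>EXMP</symbol></stock>
  <stock><symbol>SMPL</symbol></stock>
</portfolio>
XML

write_file($xslfile, <<'XSL');
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:for-each select="portfolio/stock">
      <holding><xsl:value-of select="symbol"/></holding>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL

my $xslt     = XML::XSLT->new($xslfile);
my $fragment = $xslt->transform($xmlfile);
my @symbols  = holdings_from_fragment($fragment);
$xslt->dispose();    # only after we have finished reading the fragment
print "Holdings: @symbols\n";
```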

Error Management

Before I conclude this article, let me show you a final example. It incorporates some basic error handling to ensure that the end user is not confronted with a screen full of cryptic error messages, which, incidentally, Perl is notorious for generating.

Definitely not a pretty sight. Not only is it cryptic and complicated, but I’ll frankly admit that even I was at a loss to understand the reasons behind the error. However, sanity prevailed after I added bits of “error-handling”, as seen in the listing above. Execute the Perl script to view the following output; I’ll continue to assume that the “portfolio3.xsl” file has been deleted:

Sorry, Could not create an instance of the XSL Processor using portfolio3.xsl.

Alternatively, the script will spit out the following error message if the XML document was not found at its specified location:

Sorry, Could not transform XML file, portfolio3.xml.

To be frank, there’s no rocket science behind this, just some deft manipulation using the “eval” function, as seen below.

    # some error handling here …
    if ($@) {
        die("Sorry, Could not create an instance of the XSL Processor using $xslfile.\n");
    }

    # snip

For the uninitiated, the “eval” function executes a block (or string) of Perl code and traps any fatal error it raises, instead of letting that error kill the script. If an error occurs, Perl stores the error message in the special “$@” variable; if all is well, “$@” is left empty. A quick check on this variable therefore tells you whether an error has occurred and lets you dictate the next course of action.
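Putting the pattern together, a self-contained sketch might look like this. The friendly messages mirror the ones shown above. The safe_transform() name, its (output, error) return convention and the temporary files are assumptions. The sketch also assumes, as the cryptic messages above suggest, that XML::XSLT die()s when it cannot read or parse a file; that is what lets eval trap the failure.

```perl
#!/usr/bin/perl
# Sketch: trap XML::XSLT failures with eval and report friendly messages.
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use XML::XSLT;

# Return ($output, undef) on success or (undef, $message) on failure.
sub safe_transform {
    my ($xslfile, $xmlfile) = @_;

    # eval yields undef if new() dies; the error text lands in $@.
    my $xslt = eval { XML::XSLT->new($xslfile, warnings => 1) };
    if ($@ or not $xslt) {
        return (undef,
            "Sorry, Could not create an instance of the XSL Processor using $xslfile.");
    }

    my $output = eval { $xslt->transform($xmlfile); $xslt->toString };
    my $error  = $@;    # copy it before any later call can reset $@
    $xslt->dispose();
    return (undef, "Sorry, Could not transform XML file, $xmlfile.") if $error;

    return ($output, undef);
}

my $dir     = tempdir(CLEANUP => 1);
my $xslfile = File::Spec->catfile($dir, 'portfolio3.xsl');
my $xmlfile = File::Spec->catfile($dir, 'portfolio3.xml');   # never created

open my $fh, '>', $xslfile or die "Cannot write $xslfile: $!";
print {$fh} <<'XSL';
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/"><xsl:text>ok</xsl:text></xsl:template>
</xsl:stylesheet>
XSL
close $fh or die "Cannot close $xslfile: $!";

# The XML file does not exist, so the transform step should fail cleanly.
my ($output, $error) = safe_transform($xslfile, $xmlfile);
print defined $error ? $error : $output, "\n";
```

The function returns the message instead of calling die() right away. That leaves the decision about what to do next (print it, log it or exit) with the caller.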

Conclusion

This brings us to the end of the first part of XSL Transformations with Perl. Today, I showed you how to get started with the “XML::XSLT” package, which lets you transform XML documents with XSLT style sheets from Perl. After a simple example demonstrating the serve() method of the XML::XSLT object, I demonstrated the transform() method, which returns an XML::DOM object (an XML::DOM::DocumentFragment). Finally, I showed you how to add some basic error handling to your Perl scripts.

In the next part, I shall show you how to use XML::DOM and XML::XSLT objects together in the same Perl script. I will also demonstrate some fancy XSL Transformations that’ll keep you coming back for more. Till then, happy transform()ing!

Note: All examples in this article have been tested on Linux/i586 with Perl 5.8.0. Examples are illustrative only, and are not meant for a production environment. YMMV!