Integrating Services with XSLT

For all the magic that XML, SOAP, and WSDL seem to offer in allowing
businesses to interoperate, they do not solve the more traditional
problems of integrating data models and message formats. Analysts and
developers must still plod through the traditional process of resolving
differences between models before the promise of XML-based
interoperability is even relevant.

Happily, there's more magic out there: having committed to XML,
companies can take great advantage of XSLT to address integration
problems. With XSLT one can adapt one model to another, which is a
tried-and-true integration strategy, implemented in a language optimized
for this precise purpose. In this article I'll discuss issues and
techniques involved in bringing XSLT into web service scenarios and show
how to combine it with application code to build SOAP intermediaries that
reduce or eliminate the stress between cooperating data structures.

Integration Problem Patterns

Model mismatches come in many fun colors and flavors. I won't try to
enumerate them, but let's consider a few common patterns for our purposes.
Different patterns pose different challenges, not all of which, in fact,
can be met by adaptation and transformation alone.

Translating Content

Often, two types will carry the same information but will organize that
information differently. This can appear in both simple and complex
types:

Simple types often encode multiple values -- this is most common
with strings, where delimiters such as dots and slashes separate
multiple tokens:
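
As a rough illustration of unpacking such a value, the sketch below runs a small XSLT 1.0 stylesheet through JAXP and splits a delimited string into one element per token. The element names `id` and `part`, the `delim` parameter, and the sample values are assumptions made up for this sketch, not part of either model discussed here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TokenSplitter {
    // Hypothetical stylesheet: rewrites <id>a.b.c</id> as one <part> per token.
    static final String XSLT = String.join("\n",
        "<xsl:transform version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <xsl:param name='delim'/>",
        "  <xsl:template match='/id'>",
        "    <id>",
        "      <xsl:call-template name='split'>",
        "        <xsl:with-param name='s' select='string(.)'/>",
        "      </xsl:call-template>",
        "    </id>",
        "  </xsl:template>",
        "  <!-- Emit the first token, then recurse on the remainder. -->",
        "  <xsl:template name='split'>",
        "    <xsl:param name='s'/>",
        "    <xsl:choose>",
        "      <xsl:when test='contains($s, $delim)'>",
        "        <part><xsl:value-of select='substring-before($s, $delim)'/></part>",
        "        <xsl:call-template name='split'>",
        "          <xsl:with-param name='s' select='substring-after($s, $delim)'/>",
        "        </xsl:call-template>",
        "      </xsl:when>",
        "      <xsl:otherwise><part><xsl:value-of select='$s'/></part></xsl:otherwise>",
        "    </xsl:choose>",
        "  </xsl:template>",
        "</xsl:transform>");

    static String split(String xml, String delim) throws TransformerException {
        if (delim.isEmpty()) {
            // contains($s, '') is always true in XPath 1.0, so the recursion would never end.
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        t.setParameter("delim", delim);
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(split("<id>A12.B34.C56</id>", "."));
    }
}
```

Passing the delimiter as a stylesheet parameter lets the same transform handle dots, slashes, or any other single separator.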

Narrowing and Widening

Often the fundamental problem is translating between two fields or
structures, both of which hold the same data, but one of which is wider
than the other. Again, simple as well as complex types can exhibit this
disparity:

Two fields, both of which model a serial number, but whose value spaces differ:

Solution Patterns at the Business Level

In a recent article, I described three broad categories of solutions to the
general service integration problem. Two of these work purely at the
business level and have no interesting technical component. The third
poses the challenge for this discussion: in this scenario the parties
agree that both models must persist and that adaptation between them will
be necessary at runtime.

However it may be wrapped in SOAP, WSDL, and other web services garb,
the problem of adapting one XML model to another is fundamentally a
problem of transformation. As such, XSLT is just about the perfect tool;
we ought to be able to derive behavior like this:

Adapter Design

How should a component be shaped to allow data and messages from two
different models to be interchanged cleanly? A few high-level questions
quickly arise: Who builds it? How vertical or horizontal is it? How
widely available? How static or dynamic?

The first is more a business question than a technical one; practically
speaking, message transformation performed either request-side or
service-side is workable. The answer to this question will imply some
lower-level choices. We'll mostly consider the service-side approach,
which will be more common.

The other questions involve quantities that will usually be correlated,
so that there's almost a single scale representing typical solutions: from
more private, vertical, and static to more public, horizontal, and dynamic.
A very horizontal solution might be heavily parameterized to support a
broad range of adaptations at runtime. XML and XSLT both support dynamic
binding and decision-making quite well.

Generally, the best solutions will be standalone SOAP intermediaries.
Message handlers and other service-proximate components are not
sufficiently independent, even for more service-specific designs. (There
is also the practical problem for RPC-style services that most mapping
tools and runtimes will assume that the adapter and the service itself
expect the same vocabulary of messages, which can cause problems if the
runtime is validating messages en route against the same schema it uses to
generate service code.)

However vertical or horizontal, static or dynamic, the adapter will
need to know a few things to do its job:

The incoming and outgoing service semantics, expressed in WSDL and W3C XML Schema (WXS)

The mapping between them, which means transformations in both directions

Some message-routing information, which at its simplest will be a single URI for forwarding messages to a service endpoint

As with most intermediaries, it's going to be best to build this one
using low-level APIs that allow the programmer direct access to the XML
stream. High-level APIs and tools that perform data binding will fail to
expose the raw XML, and without that it's quite hard to get the full value
of XSLT. Also, where these transformations are being performed, it may be
a good idea to validate both incoming and outgoing messages, and so the
component may invoke a WXS-validating parser as well as an XSLT
engine.
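
One way to picture the validate, transform, validate sequence is the sketch below, which uses the JDK's `javax.xml.validation` and `javax.xml.transform` packages. The two tiny schemas, the element names (`track`, `number`, `query`, `id`), and the stylesheet are hypothetical stand-ins for real service models; SOAP envelope handling and message forwarding are left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ValidatingAdapter {
    static final String XS = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>";

    // Hypothetical schemas: the public model uses <track><number/>,
    // the private model uses <query><id/>.
    static final String IN_XSD = XS
        + "<xs:element name='track'><xs:complexType><xs:sequence>"
        + "<xs:element name='number' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    static final String OUT_XSD = XS
        + "<xs:element name='query'><xs:complexType><xs:sequence>"
        + "<xs:element name='id' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Hypothetical mapping from the public request to the private one.
    static final String XSLT =
        "<xsl:transform version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/track'>"
        + "<query><id><xsl:value-of select='number'/></id></query>"
        + "</xsl:template></xsl:transform>";

    // Throws SAXException if the document does not conform to the schema.
    static void validate(String xsd, String xml) throws SAXException, IOException {
        SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(xsd)))
            .newValidator()
            .validate(new StreamSource(new StringReader(xml)));
    }

    static String adapt(String request)
            throws SAXException, IOException, TransformerException {
        validate(IN_XSD, request);                 // check what came in
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)))
            .transform(new StreamSource(new StringReader(request)), new StreamResult(out));
        String forwarded = out.toString();
        validate(OUT_XSD, forwarded);              // check what goes out
        return forwarded;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(adapt("<track><number>1Z999</number></track>"));
    }
}
```

Validating on both sides means a faulty stylesheet is caught at the adapter rather than surfacing as a confusing error in the target service.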

Transformation Design

In adapting a web service using XSLT, one can encounter just as broad a
range of problems as when building any XML-to-XML transformation. We can
identify a few patterns that are specific to the web service case, and
these are considered below.

Selective Replacement

Adapters are most often called upon to fix up what's different about
messages that are mostly the same. If the two models are completely
different, then what would motivate anyone to try to get them to
collaborate in the first place? Usually the schemas in question will have
a great deal in common, even to fine type details, but will require
adaptation over some small part of the total model.

All this indicates that the transform will usually spend more of its
time copying nodes than it will making any changes. As such, a good
framework for the transform will rely on generic templates that match
patterns such as "*" and "@*|text()", copying
and recursing as appropriate but checking for certain element and
attribute names as they do. Specific content will be exempted from the
generic rules and will truly be transformed to effect the adapter's
mapping. Note that it is not a good idea to rely on XSLT's built-in rules
for this, as they will copy only the character data and lose the XML
markup. So a good starting point for any service adapter might be
something like this:
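
The original listing is not reproduced here; what follows is a minimal sketch of the same idea, wrapped in JAXP so it can be run directly. The generic templates match `*` and `@*|text()` as described above. The one specific rule, which renames a `status` element to `state`, is a hypothetical mapping invented for the sketch, and it is written as a separate overriding template rather than as a name check inside the generic rule.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SelectiveCopy {
    static final String XSLT = String.join("\n",
        "<xsl:transform version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <!-- Generic rule: copy each element, then recurse into its content. -->",
        "  <xsl:template match='*'>",
        "    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>",
        "  </xsl:template>",
        "  <!-- Generic rule: copy attributes and text unchanged. -->",
        "  <xsl:template match='@*|text()'>",
        "    <xsl:copy/>",
        "  </xsl:template>",
        "  <!-- Specific rule (hypothetical mapping): status becomes state. -->",
        "  <xsl:template match='status'>",
        "    <state><xsl:apply-templates select='@*|node()'/></state>",
        "  </xsl:template>",
        "</xsl:transform>");

    static String adapt(String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)))
            .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(
            adapt("<order id='7'><status>shipped</status><note>fragile</note></order>"));
    }
}
```

A template that names an element outranks the wildcard under XSLT's default priorities, so each new difference between the models becomes one more small template while everything else is still copied.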

Porting to a New Namespace

The two models in question will invariably occupy different namespaces.
Adaptation between services in the same namespace is not impossible, but
it is quite uncommon; even within a corporation there will be different
namespaces for different applications, projects, etc. So most transforms
will need to port messages whole from one namespace to another.

One simple technique combines a namespace declaration for the target
namespace in the <xsl:transform> element with a trap
for the source namespace URI in the element-matching template. When
namespace-uri() matches the source namespace, the element is
rewritten with the root-declared prefix for the target namespace and the
same local name:
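
The technique might look something like the sketch below, again run through JAXP. The namespace URIs `urn:example:old` and `urn:example:new`, the `new` prefix, the `sourceNs` parameter, and the sample message are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class NamespacePorter {
    // The target namespace is declared once, on the root of the stylesheet.
    static final String XSLT = String.join("\n",
        "<xsl:transform version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:new='urn:example:new'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <xsl:param name='sourceNs'/>",
        "  <xsl:template match='*'>",
        "    <xsl:choose>",
        "      <!-- Trap: this element belongs to the source namespace. -->",
        "      <xsl:when test='namespace-uri() = $sourceNs'>",
        "        <xsl:element name='new:{local-name()}'>",
        "          <xsl:apply-templates select='@*|node()'/>",
        "        </xsl:element>",
        "      </xsl:when>",
        "      <!-- Anything else is copied as it stands. -->",
        "      <xsl:otherwise>",
        "        <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>",
        "      </xsl:otherwise>",
        "    </xsl:choose>",
        "  </xsl:template>",
        "  <xsl:template match='@*|text()'>",
        "    <xsl:copy/>",
        "  </xsl:template>",
        "</xsl:transform>");

    static String port(String xml, String sourceNs) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        t.setParameter("sourceNs", sourceNs);
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String message = "<t:track xmlns:t='urn:example:old' priority='high'>"
            + "<t:number>42</t:number></t:track>";
        System.out.println(port(message, "urn:example:old"));
    }
}
```

Elements are rebuilt with `xsl:element` rather than `xsl:copy` because copying an element also copies its namespace nodes, which would carry the old declaration along.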

This will assure that all elements from the old namespace are now named
within the new namespace. It does not deal with attributes, nor does it
clean up old namespace prefixes; these are both solvable problems which
aren't solved here.

Preserving Information

One might be excused for being a bit giddy at this point. Used as
directed, XSLT can make many annoying integration problems go away, and
with relatively low effort at that. To sober up, just remember that
almost all integration issues will require bidirectional
transformation. That is, data that's transformed on its way in, and
perhaps stored somewhere, will eventually be requested and sent back out,
and it will have to look right to the requester.

Form is not the only problem here. It is important to avoid the trap
of inbound transformations that produce the same result for different
inputs; otherwise the original value cannot be recovered on the way back
out. In other words, there must be a one-to-one mapping between the
external and internal value spaces. Precisely preserving information is
key to service adaptation, and this is not always so simple.
Specifically, recall the problem patterns from the beginning of this
article. The first, content translation, usually occurs between two
naturally analogous value spaces -- for instance plain text and compressed
or encrypted text -- and thus isn't usually a problem.

Widening and narrowing, by their nature, present a real challenge, as
the whole point of this pattern is that the value spaces are of different
sizes or even dimensions. Programming-language compilers complain about
"narrowing conversions" for exactly this reason: translating to
a smaller value space introduces the risk of lost information. These
sorts of problems often require some sort of business deal to solve;
someone has to agree to a loss of precision. The example shown at the top
might seem to belie this statement: couldn't we find a more efficient
encoding of the printable-ASCII values and preserve 8 characters in 6?
Yes and no. The real trick there is in finding unused values in the 8- and
6-byte value spaces.
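
The counting argument can be made concrete with a few lines of Java. The sketch assumes 95 printable ASCII characters (0x20 through 0x7E) and, purely for illustration, a narrowing rule that keeps the first six characters; the serial numbers are invented.

```java
import java.math.BigInteger;

class NarrowingDemo {
    static final int PRINTABLE_ASCII = 95; // characters 0x20 through 0x7E

    // Size of the value space of fixed-length printable-ASCII strings.
    static BigInteger valueSpace(int length) {
        return BigInteger.valueOf(PRINTABLE_ASCII).pow(length);
    }

    // Assumed narrowing rule, for illustration only: keep the first six characters.
    // Expects a serial number of at least six characters.
    static String narrow(String serial) {
        return serial.substring(0, 6);
    }

    public static void main(String[] args) {
        // The wide space holds 95 * 95 = 9025 times as many values as the narrow one.
        System.out.println(valueSpace(8).divide(valueSpace(6)));
        // Two distinct serial numbers become indistinguishable after narrowing.
        System.out.println(narrow("AB123456").equals(narrow("AB123499")));
    }
}
```

No re-encoding within the same alphabet can close that gap in general; a lossless mapping is possible only if the values actually in use in the wider field are few enough to fit the narrower one.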

When to Stop

Finally, as wonderful as XSLT is, it's not designed to solve all
possible transformation problems. Generally, it's strong on structural
work using node sets and progressively weaker working with single values
and their components. String arithmetic, algorithms, and math are notable
weak points.

As such, adapter implementations often blend XSLT with the programming
language at hand. In the following example, Java code uses JAXP to
trigger an XSLT transformation in each direction, in and out, but for the
outgoing messages the XSLT can only do part of the job; direct coding
using SAAJ (Java's low-level SOAP API) is required to complete the
transformation.
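
The general shape of such a blend, separate from the downloadable example, might look like the sketch below. It substitutes the standard DOM API for SAAJ, since SAAJ is no longer bundled with the JDK as of Java 11, and it works on a bare payload rather than a SOAP envelope. The element names `record`, `shipment`, `history`, and `entry` are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class OutgoingSketch {
    // Structural part, done in XSLT: rename the hypothetical <record> wrapper.
    static final String XSLT = String.join("\n",
        "<xsl:transform version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:template match='/record'>",
        "    <shipment><xsl:copy-of select='*'/></shipment>",
        "  </xsl:template>",
        "</xsl:transform>");

    static String adapt(String response) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();

        // Step 1: XSLT handles the structure and leaves a DOM tree behind.
        DOMResult tree = new DOMResult();
        factory.newTransformer(new StreamSource(new StringReader(XSLT)))
            .transform(new StreamSource(new StringReader(response)), tree);
        Document doc = (Document) tree.getNode();

        // Step 2: Java handles the string work, turning each
        // whitespace-separated list into one <entry> child per token.
        NodeList lists = doc.getElementsByTagName("history");
        for (int i = 0; i < lists.getLength(); i++) {
            Element history = (Element) lists.item(i);
            String[] tokens = history.getTextContent().trim().split("\\s+");
            history.setTextContent(""); // removes the original text node
            for (String token : tokens) {
                if (token.isEmpty()) {
                    continue; // an empty list splits into a single empty token
                }
                Element entry = doc.createElementNS(null, "entry");
                entry.setTextContent(token);
                history.appendChild(entry);
            }
        }

        // Step 3: serialize the finished tree with an identity transformer.
        Transformer serializer = factory.newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(adapt("<record><history>A1 B2 C3</history></record>"));
    }
}
```

Sending the XSLT output to a `DOMResult` keeps the message in memory as a tree, so the Java step can pick up where the stylesheet left off without reparsing.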

Example

Additional design and implementation issues abound, and a complete
doctrine for creating service adapters is impossible here. I'll conclude
with a simple example that adapts a hypothetical standard for tracking
package shipments to company X's private model, which for legacy reasons
let's say they must keep in place. The standard model is

Company X has a very similar system and, perhaps, has even bent some of
its model to the standard already, but they encode tracking numbers in
single tokens, keep histories as whitespace-separated lists, and include
the current tracking number in the history instead of keeping it
separate:

Company X already has a tracking service in place, but of course it
expects messages in the internal vocabulary. The company implements an
adapter that accepts the standard vocabulary, transforms the request using
Incoming.xsl, forwards it to the existing service, and transforms the
response before returning it. (The outgoing XSLT transform, Outgoing.xsl,
is combined with Java code, as mentioned earlier.)

An incoming track request would travel as follows -- showing
just the transformed content here, and fudging the whitespace for
presentation:

The example happens to be synchronous over HTTP, implemented in Java,
and hosted as a standalone service on the same host as the legacy service;
these are all choices of convenience for this context and not necessary
in general. You can delve into details by downloading the working code.