Worth Repeating: Why Transformations

Reading about technology isn’t always easy. Few writers can pare an issue down to its essentials and then use common experiences to re-explain it with new relevance. For me, that kind of clarity comes only after countless revisions, so I readily appreciate it whenever I read a forum post or blog comment in which someone explains a difficult concept with ease. This installment of Worth Repeating highlights a concept that drives many ideas behind smart publishing today: how designing applications around transformation actually helps future-proof your content.

Practical content transformation!

In the wake of the W3C’s 1998 announcement of the first XML Recommendation, work on the XSL and XLink specifications also kicked into gear. I served on the original XSL Working Group as IBM’s representative. I knew it was important work, but at the time I was still struggling to see the real business value of these new standards. It was during those dimly lit days that I first saw this forum exchange in which Paul Prescod addressed the relationship between XSL (the transformation component) and XML. Here is an excerpt from his longer reply, in which he distinguished the roles of styles (CSS and XSL-FO) from transforms proper; my [edits] fill in a few missed keystrokes:

> Where does XSL fit in an XML application architecture?

I discussed this in a talk on XSL recently. Slides are at http://www.prescod.net/xsl/slides/17.html . The fundamental point is that you cannot predict how your data will be used in the future, so you cannot [d]ecide on the “optimal” encoding for it. Even if you knew exactly how it was going to be used, the needs of document renditions and data storage are often different.

In a rendition, redundancy is your friend. In document maintenance, it is your enemy. Actually, redundancy is probably the most important point. Often you want to get rid of redundant markup (“Why should I always wrap this series of elements when the wrapper can be logically implied?”). Often you want to get rid of redundant text (“Why should I type titles for these columns, when I use the same column titles for every table of this type?”) Sometimes you want to get rid of completely redundant elements: (“Why should I [instance] the chapter title both in the document, and in the table of contents, and in a dozen cross references”)?

In a rendition, data should often be sorted according to some rule that will help human navigation. In your document database, you probably want to allow authors to enter it in any order. You may even need to sort the same data according to different rules according to the rendering.

Transformations are the basis of all XML processing. I expect that within a few years all XML-processing applications will have transformation engines built in. Style application[s] are just the start.
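To make Paul’s point about redundancy concrete, here is a minimal sketch. It uses Python’s standard xml.etree.ElementTree module rather than XSLT, and the book, chapter, and xref markup is hypothetical. Each chapter title is authored exactly once; the transform then copies it into a table of contents (in authored order), an index of the same titles (sorted by a different rule), and every cross-reference that points at that chapter.

```python
import xml.etree.ElementTree as ET

# Hypothetical authoring source: each chapter title is typed exactly once,
# and cross-references point at a chapter id instead of repeating its title.
SOURCE = """<book>
  <chapter id="use"><title>Using the Tool</title>
    <p>Read <xref ref="setup"/> first.</p></chapter>
  <chapter id="setup"><title>Setting Up</title>
    <p>Continue with <xref ref="use"/>.</p></chapter>
</book>"""


def paragraph_text(p: ET.Element, titles: dict) -> str:
    """Flatten a <p>, replacing each <xref> with a copy of the target title."""
    parts = [p.text or ""]
    for child in p:
        if child.tag == "xref":
            parts.append(f'"{titles[child.get("ref")]}"')
        parts.append(child.tail or "")
    return "".join(parts)


def render(source: str) -> dict:
    """Transform the non-redundant source into a deliberately redundant rendition."""
    book = ET.fromstring(source)
    # The single source of truth: chapter id -> title, in document order.
    titles = {ch.get("id"): ch.findtext("title") for ch in book.iter("chapter")}
    return {
        # Copy 1: a table of contents, in the order the authors wrote the chapters.
        "toc": list(titles.values()),
        # Copy 2: an index of the same titles, sorted by a different rule.
        "index": sorted(titles.values(), key=str.lower),
        # Copies 3..n: headings plus cross-references filled in from the source.
        "body": [
            (ch.findtext("title"), [paragraph_text(p, titles) for p in ch.iter("p")])
            for ch in book.iter("chapter")
        ],
    }


rendition = render(SOURCE)
print(rendition["toc"])         # ['Using the Tool', 'Setting Up']
print(rendition["index"])       # ['Setting Up', 'Using the Tool']
print(rendition["body"][0][1])  # ['Read "Setting Up" first.']
```

If an author renames a chapter, only one title element changes; every generated copy follows on the next run. That is the trade Paul describes: keep the source lean and let the rendition carry the redundancy.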

Content analysts and DITA specialization architects can absorb and apply points like these from Paul’s incisive explanation:

By reducing redundancy in the information model, you actually make content authoring simpler, because authors are no longer required to inspect and update multiple instances of the same content.

To do this, specialization designers must change hats and stay alert to instances of redundancy in the legacy sources or renditions they are analyzing. Usually you can either replace these instances with pointers to the source version of the content (as with DITA conref) or simply generate the copy that the rendition needs, using context to place it correctly in the new rendition.
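The pointer approach can be sketched in a few lines. This is a simplified illustration of conref-style resolution, not the DITA Open Toolkit’s implementation: the in-memory LIBRARY stands in for files on disk, the warning topic is hypothetical, and only the file#topicid/elementid form of the reference is handled.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical reuse library: the warning text lives in exactly one place.
LIBRARY = {
    "warnings.dita": """<topic id="warnings"><body>
      <note id="hot">Surfaces may be hot during operation.</note>
    </body></topic>""",
}

# A topic that points at the shared warning instead of pasting a copy of it.
TOPIC = """<topic id="clean"><body>
  <note conref="warnings.dita#warnings/hot"/>
  <p>Wipe the plate with a damp cloth.</p>
</body></topic>"""


def find_target(ref: str) -> ET.Element:
    """Look up 'file#topicid/elementid' in the library (simplified syntax)."""
    filename, fragment = ref.split("#", 1)
    topic_id, element_id = fragment.split("/", 1)
    root = ET.fromstring(LIBRARY[filename])
    if root.get("id") != topic_id:
        raise KeyError(f"no topic {topic_id!r} in {filename}")
    for elem in root.iter():
        if elem.get("id") == element_id:
            return elem
    raise KeyError(f"no element {element_id!r} in {ref}")


def resolve_conrefs(topic_xml: str) -> ET.Element:
    """Fill each referencing element with a copy of its target's content."""
    root = ET.fromstring(topic_xml)
    # Snapshot the elements first so appending children doesn't disturb iteration.
    for elem in list(root.iter()):
        ref = elem.get("conref")
        if ref is None:
            continue
        target = find_target(ref)
        # A reference may only pull in an element of the same type.
        if target.tag != elem.tag:
            raise ValueError(f"<{elem.tag}> cannot pull in <{target.tag}>")
        elem.text = target.text
        elem.extend(copy.deepcopy(list(target)))
        del elem.attrib["conref"]
    return root


resolved = resolve_conrefs(TOPIC)
print(resolved.find("body/note").text)  # Surfaces may be hot during operation.
```

The authored topic never contains the warning text, so correcting the wording in warnings.dita corrects every topic that points at it.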

Another term for this inclusion of content by reference is transclusion. Writers can better understand the value of single-sourcing repetitive content if their editing or previewing tools can actually visualize the scope of transcluded content at the point of reference.

Deduplication is the art of finding redundant information in existing content so that you can plan how best to manage a single source of that truth and replace the other instances with references to it. Most migration vendors offer deduplication as part of their services. For example, I recently noticed Data Conversion Laboratory’s announcement of its free sample Harmonizer service, which shows how you can get started in understanding the scope of redundancy in your company’s information. Deduplication services can greatly help you make better use of XML transformations to improve the maintenance of that content. Please comment if you know of other such trial services.
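For a feel of what the first pass of deduplication involves, here is a minimal exact-match sketch. It makes no claim about how any vendor’s service works; commercial tools also look for near-duplicates, which this does not attempt. The corpus and the file#paragraph location keys are hypothetical. Each paragraph is normalized (whitespace and case) and hashed, and locations that share a hash become candidates for a single source.

```python
import hashlib
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Collapse whitespace and case so trivial differences don't hide a duplicate."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_duplicates(paragraphs: dict) -> list:
    """Group paragraph locations whose normalized text is identical."""
    groups = defaultdict(list)
    for location, text in paragraphs.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        groups[digest].append(location)
    # Only groups with two or more members are candidates for reuse.
    return sorted(sorted(locs) for locs in groups.values() if len(locs) > 1)


# Hypothetical paragraphs pulled from three legacy manuals.
CORPUS = {
    "install.xml#p3": "Unplug the unit before servicing.",
    "repair.xml#p1": "Unplug the unit  before\nservicing.",
    "repair.xml#p7": "Replace the fuse with one of the same rating.",
    "faq.xml#p2": "unplug the unit before servicing.",
}

for group in find_duplicates(CORPUS):
    print(group)  # ['faq.xml#p2', 'install.xml#p3', 'repair.xml#p1']
```

Each group that comes back is a place where one copy can become the source and the others can become references to it.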

As you read Paul’s post, what insights would you use to explain the value of transformation (or conversely the problem with redundancy) to writers whose main experience has been with copy/paste duplication?

One Response to Worth Repeating: Why Transformations

I have long felt that the standard SGML/XML elevator pitch, “separating content from formatting,” has led many people down the wrong path. It gives the impression that an XML document is simply a normal document with the formatting stripped out.

No wonder then, that we still need to make the case for transformation.

My view is that XML has never been about publishing; it is about content management. It is about making content processable. It is about making content into a database. (Not placing it in a database, but *making it into* a database.) Look at it that way and there is no question of why you need a transformation language, and, indeed, no question of why you need a query language.