Introduction to DITA Conditional Processing

Dave Gash, HyperTrain dot Com

One of DITA’s primary strengths is combining discrete data chunks into cohesive documents. But it also excels at the other end of the spectrum—separating data chunks when necessary. This feature, called conditional processing, allows you to produce separate documents for different products, platforms, audiences, and more, all from the same input. This article introduces you to conditional processing and its control mechanism, metadata.

What is DITA?

Just kidding! Every DITA-related article in the world seems to start with this section, whether it’s needed or not. I’m pretty sure that if you don’t know what DITA is, you aren’t even reading this article. Movin’ on.

DITA Metadata

Try to say that five times fast.

Let’s first consider a basic DITA Open Toolkit build process. A build file collects information from a ditamap file, which in turn references a group of topic files. The build file also locates a set of XSL transforms appropriate to the requested output type and sends all this along to the DITA Open Toolkit, which collects the topics, applies the transforms, and produces the output.

That’s fine when we want all the content in all the referenced topics to be included in the output, but what if we don’t want all of the content? That’s where conditional processing comes in, the goal being to intelligently control which topics or parts thereof end up in the output. This control is achieved using metadata.

Metadata, often called “data about data,” is a characteristic or trait that helps identify, clarify, or classify an informational element. For example, an HTML paragraph tag might read <p class="dropcap">…</p>. Here, the content of the <p> element is the data, and the attribute, the class="dropcap" name/value pair, is the metadata; it classifies the type of paragraph (a CSS class in this case) so it can be processed correctly. Or, in an XML document, a tag might read <cost currency="aud">…</cost>. Again, the content of the <cost> element is the data, and the attribute, the currency="aud" name/value pair, is the metadata; it specifies that the cost is expressed in Australian dollars. Metadata is often coded as attributes, as in these examples, but not always.

Metadata has various uses, such as workflow support, search assistance, and index preparation, but it is really good at one thing: conditional processing. The primary function of conditional processing is omitting undesired content, or “filtering.” DITA provides four standard attributes to control filtering: audience, product, platform, and rev. It also provides a fifth attribute you can use to specify other properties, reasonably (if uncreatively) called otherprops. Using these attributes, you can classify everything from individual elements to entire topic groups, applying appropriate metadata to the objects to drive the filtering process.

The big benefit in terms of editing and maintenance is that mutually exclusive content elements don’t have to be stored separately; you can put them all together in a single topic or map and leave out the pieces you don’t need at build time. This technique prepares the content so it can be conditionally processed, while simplifying maintenance by keeping logically related items physically together in a single source location. It’s a great way to cram a lot of stuff into a small space—sort of like the Kardashian sisters.

Put ‘Er There

There are three standard places where you can put metadata: on individual elements, on topics, and on map references.

Element metadata is used at the tag level to apply properties by which the elements can be identified and filtered during the build. Let’s say we want to customize the first step in a task by user experience level. We could use the audience attribute to attach the appropriate metadata to three versions of the same step, like this:
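
Here is a minimal sketch of such markup. The audience values novice, intermediate, and advanced match the ditaval conditions used later in this article. The step wording itself is hypothetical.

```xml
<steps>
  <!-- Three versions of the first step, one per audience value.
       The step text is illustrative only. -->
  <step audience="novice">
    <cmd>Press the power button on the front of the PC and wait for the desktop to appear.</cmd>
  </step>
  <step audience="intermediate">
    <cmd>Turn on the PC.</cmd>
  </step>
  <step audience="advanced">
    <cmd>Boot the PC.</cmd>
  </step>
  <!-- Steps without an audience attribute appear in every build. -->
  <step>
    <cmd>Log in with your user name and password.</cmd>
  </step>
</steps>
```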

Using this markup, we can easily produce a task topic with steps tailored to the specific audience we’re trying to reach, whatever their PC expertise. (An additional version, <step audience="doofus"><cmd>Box up your PC and take it back to the store.</cmd></step>, may be included if necessary.)

Topic metadata is used at the topic level to specify characteristics with which the topic can be filtered. If we wanted to produce a review document containing all topics written by a given content provider, we could use the otherprops attribute to identify each topic’s author, like this:
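
A sketch of two such topics follows. AnnaGraham is the value that the ditaval example later in this article excludes. The value Otto for her colleague, along with the file names, IDs, and titles, is hypothetical.

```xml
<!-- anna_topic.dita (hypothetical file): author recorded on the root element -->
<concept id="anna_topic" otherprops="AnnaGraham">
  <title>Planning Your Installation</title>
  <conbody>
    <p>Content written by Anna.</p>
  </conbody>
</concept>

<!-- otto_topic.dita (hypothetical file): a different author value -->
<concept id="otto_topic" otherprops="Otto">
  <title>Configuring Your Workspace</title>
  <conbody>
    <p>Content written by Otto.</p>
  </conbody>
</concept>
```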

While the use of otherprops to indicate author name is entirely arbitrary, it demonstrates the power and flexibility of having a generic, user-defined attribute. The topics can now be identified by author and filtered appropriately during the build.

Map metadata is used at the top of the metadata food chain to apply filtering characteristics to whole topics or topic groups within maps. We could, for example, construct a single map that allows us to produce a user guide for any of several product releases by adding rev metadata attributes to the topic references, like this:
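
A sketch of such a map follows. It assumes the release values demo, 1.x, and 2.x used later in this article. The file names are hypothetical.

```xml
<map>
  <title>Product User Guide</title>
  <!-- Unmarked topicrefs appear in every release's guide. -->
  <topicref href="introduction.dita"/>
  <!-- One installation topic per release; the ditaval decides which survive. -->
  <topicref href="install_demo.dita" rev="demo"/>
  <topicref href="install_1x.dita" rev="1.x">
    <!-- Inherits rev="1.x" from its parent, so it is filtered along with it. -->
    <topicref href="upgrade_from_demo.dita"/>
  </topicref>
  <topicref href="install_2x.dita" rev="2.x"/>
  <topicref href="troubleshooting.dita"/>
</map>
```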

We’re now able to select the correct installation topic (or a set of correct topics, regardless of number or hierarchical placement) for any current product release, from the demo version to 1.x to 2.x, without creating—and maintaining—separate map files. Also, recall that in a map, child topics (topicrefs inside topicrefs) inherit their parents’ attributes, so conditional processing metadata “flows down” just like other attributes. This characteristic allows you to affect whole groups of topics by placing just one filtering attribute on the parent.

How do you know in which layers to put your metadata? Well, it depends (I know, right?) on several factors: content complexity, number of authors, the variety of attributes you use, and so on. In general, assign metadata at the highest (broadest) level that makes sense. For example, if you need to easily swap out entire blocks of content, use map metadata to control topics by groups. If you have topics that are similarly structured but different in content, use topic metadata to differentiate them. If you have broad, generic content with many small, specific differences, use element metadata to keep the content together but allow it to be easily filtered.

Testing, 1 2 3…

Here’s a great joke: “What do you call a musician with no girlfriend?”

[crickets chirping] Wait, that’s not funny, you say, and you’re right. But why is it not funny? Because it’s just a setup with no punch line. In comedy, tech pubs, and most other worthwhile human endeavors, preparation is useless unless you deliver the kicker—and that’s the problem with our examples so far.

Identifying unique elements, topics, and maps and applying metadata to differentiate them is only half the job. Metadata itself doesn’t do anything; it just sits there patiently waiting until it’s needed. To make it useful, we have to tell the build process what to do with it; that is, we have to define the filtering conditions for the build.

The ditaval file is the mechanism we use for that purpose. Like the map file and the XSL transforms, the ditaval file is read by the build and used to drive the filtering process as the output stream is created. The ditaval file essentially contains two things: conditions to be matched and actions to be taken when they’re found.

Ditaval conditions are defined with the <prop> (property) element, which has three attributes: att, the metadata attribute to search for; val, the metadata attribute value to match; and action, the action to be taken when the metadata attribute value is matched. Think of it rather like a CSS rule: look for elements that contain the metadata attribute att; if you find one, see if its value is equal to val; if so, perform the specified action. You can include as many <prop> elements as you like, in any order; much like CSS and XSLT, it’s a wonderful demonstration of declarative processing at work. Let’s look at some examples.
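
Here is a minimal sketch of the overall shape. A ditaval file has a <val> root element containing one <prop> per condition. The platform attribute and its mac value are purely illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<val>
  <!-- att:    the metadata attribute to look for
       val:    the attribute value to match
       action: what to do with matching content (here, leave it out) -->
  <prop att="platform" val="mac" action="exclude"/>
</val>
```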

Earlier, we added the audience attribute as element metadata to some task steps (and presumably to other elements, topics, and topic references in our content repository). Now, if we want to produce a user guide for novices, we might code conditions in the ditaval file like this:
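
A sketch of those conditions follows, assuming the audience values novice, intermediate, and advanced. The last rule matters only if the optional doofus step is in use.

```xml
<val>
  <!-- Novice guide: remove content marked for the other audiences.
       Novice content has no rule here and is kept. -->
  <prop att="audience" val="intermediate" action="exclude"/>
  <prop att="audience" val="advanced" action="exclude"/>
  <!-- Needed only if the optional "doofus" step is present. -->
  <prop att="audience" val="doofus" action="exclude"/>
</val>
```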

These conditions allow the novice audience elements through while filtering out the intermediate and advanced audience elements.

Next, we added the otherprops attribute as topic metadata to some topics, naming two contributing authors. If we want to produce a review document containing only those topics written by a single author, we can do it by excluding the other with a ditaval condition, like this:

<prop att="otherprops" val="AnnaGraham" action="exclude"/>

This will filter out Anna’s topics and leave us with only topics written by her colleague Otto.

Finally, we added the rev attribute to some topic references in a ditamap, identifying installation topics for demo, 1.x, and 2.x software versions. When we’re ready to produce an installation guide for the 2.x version, we can code ditaval conditions to exclude the others, like this:
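
A sketch of the two conditions follows, assuming the rev values demo, 1.x, and 2.x. One caution is worth checking against your toolkit. The DITA 1.2 and 1.3 specifications treat rev as a flagging-only attribute, so a conforming processor may not filter on it. The same exclude pattern works with a filtering attribute such as product or otherprops.

```xml
<val>
  <!-- 2.x installation guide: exclude the other releases' topics.
       Topics marked rev="2.x" need no rule and remain in the output. -->
  <prop att="rev" val="demo" action="exclude"/>
  <prop att="rev" val="1.x" action="exclude"/>
</val>
```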

The result will be our desired document, an installation guide for the 2.x product only, with the demo and 1.x topics filtered out. Thus, the ditaval file’s <prop> element becomes the killer punchline for the clever metadata setup.

Which reminds me: “Homeless” would be the answer to the above.

A Hippo in the Ointment

Now if you’re ahead of me on this, and you probably are, you’ll note that these examples seem to approach the document assembly process somewhat, well, backward. We don’t include the elements we want; instead, we exclude the ones we don’t want. Odd as it seems, that’s exactly how the “exclude” action works. Gosh, wouldn’t it be nice if there were also an “include” action? Well, there is … sort of.

The original ditaval scheme offered only the “exclude” action (and “flag,” which is beyond the scope of this article). But when ditaval officially became part of the DITA standard, the “include” action was added. It sounds promising, but, if I may use a phrase with which I’m painfully familiar, “it isn’t what it looks like!” For example, you’d think that the single <prop> tag below is equivalent to the two exclude conditions in the 2.x example, including just 2.x content and excluding demo and 1.x content:

<prop att="rev" val="2.x" action="include"/>

But you’d be wrong. Yes, given that tag, the 2.x topics will be included, but so will the demo and 1.x topics. That’s because the default action for all elements, marked or unmarked, is “include.” Let’s say that again, because it’s hugely important: the default action for all elements is always “include.” Since that’s the case, you might be wondering if you could at least add that third <prop> tag to the first two, just to make your intentions clear. Yes, but that’s just like calling in your vote for American Idol—you can do it, but it won’t make any difference.

The reason “include” doesn’t work quite as intuitively as we’d like is that its primary use is for elements with multiple metadata values in the same attribute. The filtering logic for multiple values can get sticky pretty fast, so let’s leave that for another article. Bottom line: “include” doesn’t really do us any good in ordinary, everyday filtering, but that’s not a bad thing; read on.

For now, we can safely say there is just one absolute, immutable rule for ditaval conditions. This rule is true regardless of your DITA OT version, authoring tool, or processing environment. It’s true for all maps, topic references, full topics, and individual elements, whether marked with metadata or not. It’s true all the time, for all builds, in all cases. The rule is this:

Everything not explicitly excluded is included.

At first blush this rule seems restrictive, but in practical terms it greatly simplifies the process of marking up content for conditional processing. We can now approach our content with a simple plan: add metadata to anything we might want to exclude later and leave everything else alone! Because the vast majority of content in any documentation set is included in most output formats (if not, you’re doing it wrong), it’s obviously easier to mark up some content you want to exclude under certain circumstances than to mark up all the content you want to include under most circumstances. Sweet.
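
In practice, a topic marked up this way might look like the following sketch. Only the two platform-specific paragraphs carry metadata. The platform values and paragraph text are hypothetical.

```xml
<concept id="backup_overview">
  <title>About Backups</title>
  <conbody>
    <!-- Unmarked: included in every build. -->
    <p>Back up your files at least once a week.</p>
    <!-- Marked: an exclude rule on one platform value drops that paragraph only. -->
    <p platform="windows">Use File History to schedule automatic backups.</p>
    <p platform="mac">Use Time Machine to schedule automatic backups.</p>
    <p>Keep at least one backup copy away from your computer.</p>
  </conbody>
</concept>
```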

Loose Ends

But as you might guess, that’s not quite everything. You can almost hear that fellow with the glass eye, cigar butt, and rumpled trench coat say, “There’s just one more thing.” (That’s an old-guy joke; if you don’t get it, you’re too young to remember most of the stuff we old guys think is funny. Now get off our lawn.)

We know that we add metadata to DITA elements and that we add <prop> conditions to a ditaval file so the build can properly filter the elements. But there’s our missing connection: how does the build process know where our ditaval conditions are? The answer is simple, if inelegant. We tell it where to look.

A build file contains a number of <property> tags (not to be confused with <prop> tags in the ditaval file) that provide the build process with required information, such as the input file location, the output file location, the desired output type, and so on. To specify the location of the ditaval file containing the filtering conditions, we just add one more <property> tag to the build file, like this:
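
A sketch of that property for an Ant-based DITA Open Toolkit build follows. It assumes ${dita.dir} resolves to the toolkit's base directory. Later toolkit releases deprecate dita.input.valfile in favor of args.filter, so check the parameter list for your version.

```xml
<!-- Tell the build where to find the ditaval filtering conditions.
     ${dita.dir} is assumed to point at the DITA-OT base directory. -->
<property name="dita.input.valfile"
          location="${dita.dir}/myprojects/UserGuide/userguide.ditaval"/>
```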

This tag sets the build’s dita.input.valfile property. It tells the build that the ditaval conditions file is named “userguide.ditaval” and can be found in the “myprojects\UserGuide” folder under the DITA base directory (“C:\DITAOT\”, for example). The build can now load the filtering conditions from the ditaval file and apply them to the metadata attached to the various project elements.

Finally, although this article includes actual code snippets, I realize that hand-coding is so five minutes ago. Most good authoring tools now include user-friendly interfaces to the nuts and bolts of metadata, build conditions, and file locations, so that setting up and implementing conditional processing is relatively easy. But I figure when you’re admiring the dashboard, it’s still good to know what’s under the hood.

Conditional processing is at the heart of content specificity, and metadata is its control mechanism. Grasping the relationship between metadata and filtering is one of the “aha!” experiences we have along the road from linear narrative to structured authoring, a little epiphany that suddenly propels us forward in our efforts to get the most benefit from technology and makes our jobs a bit easier, a lot more productive, and yes, even fun.

About the Author

Dave Gash
HyperTrain dot Com
dgash@hypertrain.com

Dave Gash is the owner of HyperTrain, a southern California firm specializing in training and consulting for hypertext developers. A veteran software professional with over thirty years of development, documentation, and training experience, he holds degrees in Business and Computer Science. Dave is well known in the tech pubs community as an interesting and engaging technical instructor, and is a frequent speaker at user assistance and online publishing conferences in the US and around the world. He can be reached at dgash@hypertrain.com or through his web site, www.hypertrain.com.