RuDi – A Ruby-based System for DITA document generation

RuDi stands for Ruby-based Utilities for DITA processing. The idea is to create a build system for DITA projects that is easy to use and at the same time very powerful, something that is accomplished by using the Ruby language and Ruby-based tools. (The project was originally hosted at Kenai. Until it finds a new home, the project code is contained in the RuDi zip file.)

Goal of the System

The primary goal of the system is to separate the styling of generated documents from the code-generation process. That separation would let designers use a visual tool like DreamWeaver to manage CSS styles, sidebars, and layouts. The DITA “open source toolkit” could then be used simply to generate HTML, obviating the need to modify the code-generation process every time a change to the presentation-styling is desired.

The secondary goal of the system is to obviate the need for an expensive Content Management System (CMS) to manage changing references when documents and folders are moved and renamed. DreamWeaver, in fact, demonstrates that such a system can be implemented without requiring an expensive database. All that is required is a way to track references as documents are stored in the system, which DreamWeaver does. (It works superbly well, for a single-user system.)

Such a system solves a number of problems:

When a DITA file is renamed or moved, all links to that file need to be found and modified.

When a DITA file is deleted, all links to it need to be identified.

Outputs from the DITA Open Source Toolkit, which generates documents from DITA files, are not well-formatted. (It runs, but the results aren’t anything you can share with customers. So you wind up “customizing” it — a euphemism for fixing a multitude of styling problems, to generate usable outputs.)

Of course, DITA files can’t be stored in DreamWeaver. But a Subversion repository is a cost-efficient way to store files so that multiple people can operate on them, and it may be possible to replicate DreamWeaver’s reference-management strategy in Subversion, using “hooks” to update references as documents are checked in.

The production problem can be solved using high-end editors like XMetal (great HTML output) and FrameMaker (great PDF output). The vendors of both products have resolved the issues with the Open Source Toolkit. But while both editors have great WYSIWYG editing capabilities, they also have a high per-seat cost.

Perhaps even more significant for an organization with a large block of files, those editors generate documents manually, one file at a time. Lacking a true build system, it is harder to maintain consistent styling across the document set, and easier for broken links to go unnoticed.

The other alternative is a high-end, extremely expensive Content Management System (CMS). Those systems solve the file organization problem, as well as the production problem, but they do it by putting the files in a database, where it is much more difficult to use an open source editor like Notepad++, for example, to do a search across all the files, so you can do a simple substitution.

Those systems do, however, have the advantage that all outputs can be generated with the press of a button, and they make it possible to use much less expensive editors. But with all the money it takes to put the CMS in place — and administrators to manage it — the cost of a DITA solution skyrockets.

Background: DITA, DreamWeaver, and Ruby

DITA is an impressive design feat. It defined a way to link many files together in flexible ways, while ensuring type-safety in the process. In other words, it guarantees that a text segment that is “transcluded” into a procedural topic will conform to the structural restrictions defined for that kind of topic. The ability to divide material into reusable chunks, and to guarantee that structural restrictions are honored, is a compelling feature of DITA.

At the same time, DITA adoption suffers from the need for expensive tools. The DITA processing tool (the DITA Open Toolkit) generates passable sample code, but falls short of production-quality output. Meanwhile, the need to include stylistic information renders the transformations more complex. Tools that solve the processing problem are expensive, because of the time required to create proprietary fixes for issues in Open Toolkit processing.

Another area of enormous expense is the need for a Content Management System. A collection of DITA documents is nothing so much as a mass of interconnected links. But when a file name changes, every file that links to it needs to change. And when a file changes locations, relative links within the file need to change, along with all of the files that link to it. And if, heaven forbid, a file is split into two, all of the links that refer to the original file need to be inspected, to see whether it is appropriate to link to the first file, the second, or both.

The RuDi project was created to address such problems. (It has been dormant for quite a while. But the problems it was designed to address are still present, and it does represent significant strides towards a solution, hence its continued availability.) To help achieve that goal, it is primarily built using the Ruby programming language.

Ruby’s claim to fame is its ability to create domain-specific languages: mini-languages that are designed and customized for a specific purpose. Some of the better known examples are:

Rake The Ruby-based build tool. It looks like Make, but avoids Make’s whitespace weirdness, where spaces and tabs act differently, and where indenting, or failing to do so, changes the way the instructions are interpreted. Because it avoids XML tags, it is easier to write and read than the ANT build tool, but at the same time, it lets you write full Ruby-language procedures whenever you need to. And because you can define your own dependencies, it is superior to either of its main competitors. (This project takes it for granted that Rake will be the build tool of choice. It need not be, but it is recommended.) Learn more: Rake Rocks!

RSpec RSpec is the Ruby-language testing tool. It lets you write tests that have the form “expect X. result = (run some function). test succeeds if result is as expected.” It’s called RSpec, rather than RTest, because the collection of tests reads like a specification. In other words, it’s a testing tool that lets you create readable, writeable, and runnable (executable) scripts — scripts that are effectively behavioral specifications that you run as unit tests.

Those tools make it easy to express the solution for the problem you are trying to solve. The ease of expression translates directly into rapid construction of new solutions, and ready comprehension of existing ones. Learn more: Ruby Rocks!

Licensing Model

In the end, I chose the Common Development and Distribution License (CDDL), when the project was hosted at Kenai.com.

Project Goals

The overarching goal is to create a DITA-processing system that produces professional-quality results, affordably. To do that, it addresses:

Document Generation The first goal is to make it easy to transform DITA files into HTML-based output, by separating stylistic design from code processing. Using DreamWeaver templates lets a professional designer work with visual tools. It also separates the design task from code transformations, which makes the transformations simpler. And after the transformations are complete, DreamWeaver will automatically apply template-changes to existing files. So the system achieves both automation and a desirable separation of concerns. Learn more: DITA Publishing using DreamWeaver Templates

Link Management The second goal is to automate link processing, to ensure that links remain accurate when files change names or locations, and to do so without requiring a mega-expensive content management system. The idea is to automatically generate and run a link-processing script when such changes have been made in a change-management system like Subversion. (To prevent links from being automatically adjusted, the changes can be made outside the system.) Learn more: Link Management Algorithms

File System Storage Using a file system for file storage has significant advantages over a database, primarily in the ability to create and run automated scripts on the collection of files. With the problems of document generation and link management solved, the need for an expensive Content Management system dissipates, leaving the file system as a far less expensive choice.

Note: A unix-based system is ideal for this purpose. In particular, it allows the construction of symlinks that act as a local stand-in for a remote file. That capability is needed to share common topics across DITA maps. Apparently the DITA Open Toolkit has a bug that surfaces if you try to use a conref to link in a file that resides outside the root of the map.

Proofreading The proofreading task can be made much simpler and easier with a list-based search-and-replace tool. That way, a list of common problems can be inspected, and changes can be made selectively. (Ideally, it will integrate with the authoring tool, so that surrounding text can be modified at the same time.)

Improved automation for software-documentation systems. For software documentation, there is tremendous scope for automation that has gone largely untapped. For one thing, it should be possible to run tests on sample programs, to be sure they still work as the product changes. Then, when the sample program is revised, it should be possible to automatically replace any sections of that code that were included in a document (so that the sample will be guaranteed correct) and, at the same time, alert the writer to places where changes have been made, so the surrounding text can be inspected for accuracy.

Another automatable task surfaces for a tutorial program. Such programs are generally built up in stages, working from a simple starting version and building out to a more complex final version, introducing the reader to new concepts at each stage. It is helpful to maintain a single source copy for such a program, and to generate each version from it.

A third area that is ripe for automation is UI integration. At a minimum, the documentation should be referencing files used in the UI, to ensure that labels are correct. Ideally, those files should also define structural paths, so that instructions like “Click {menu} > {choice} > {tab} > {button}” are guaranteed to be accurate.
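As a minimal sketch of that idea, assume the UI labels are available as a simple key/value table. The `UI_LABELS` hash, its keys, and the `render_instruction` helper are all hypothetical, standing in for whatever resource file the UI actually reads. The template uses Ruby's `%{name}` placeholders rather than the bare `{name}` form shown above.

```ruby
# Hypothetical label table; in practice this would be loaded from the
# same resource file that the UI itself reads.
UI_LABELS = {
  menu:   "File",
  choice: "Preferences",
  tab:    "Editor",
  button: "Apply"
}.freeze

# Fill a documentation template from the UI labels. Kernel#format raises
# KeyError when the template names a label the table does not define,
# so a renamed or deleted label is caught at build time.
def render_instruction(template, labels = UI_LABELS)
  format(template, labels)
end

puts render_instruction("Click %{menu} > %{choice} > %{tab} > %{button}")
```

Because the labels are looked up rather than typed into the text, a label changed in the UI would show up in the documentation on the next build.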

An End-to-End Solution The DITA publishing solution and link management capabilities can be integrated with Claude Vedovini’s DITA open platform for web-based editing. It could also operate on its own. In essence, it will be a fully integrated system that users can easily set up to produce well-styled, highly readable HTML pages generated from DITA sources. Those pages might include editing links for individual topics — links that go to editing tools that use Wiki-text, or perhaps an online editor like DITAStorm, or that invoke a desktop editor like oXygen or XMetaL. At the end of the day, changes made to those topics can then ripple through the PDFs and other documents that depend on them, and go out through a variety of delivery channels. The result will be a complete, end-to-end solution for DITA authoring, storage, and production.

Existing Contents

Ruby-based “fluent XML” module A module that illustrates the power of Ruby. It lets you make nested function calls to output HTML, without worrying about closing tags, and without having to output a collection of strings. And you don’t need to define functions for every HTML tag. Any undefined method name automatically generates an XML/xHTML tag with that name. Hash-map arguments to the call provide name/value pairs that become attributes for the tag. See: xml_builder.rb in the RuDi zip file.
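The technique can be illustrated with a toy builder. This is not the code in xml_builder.rb. It is a minimal sketch of the same idea, and the class name `FluentXml` and its `build` method are made up for the illustration.

```ruby
# A toy fluent builder: any undefined method name becomes a tag of that name.
class FluentXml
  def initialize
    @out = +""
    @depth = 0
  end

  # Build markup by evaluating the block against a fresh builder.
  def self.build(&block)
    builder = new
    builder.instance_eval(&block)
    builder.to_s
  end

  def to_s
    @out
  end

  # Undefined method => tag. A trailing hash supplies attributes, a string
  # argument supplies text content, and a block supplies nested tags.
  def method_missing(name, *args, &block)
    attrs = args.last.is_a?(Hash) ? args.pop : {}
    text = args.first
    attr_text = attrs.map { |key, value| %( #{key}="#{escape(value)}") }.join
    indent = "  " * @depth

    if block
      @out << "#{indent}<#{name}#{attr_text}>\n"
      @depth += 1
      instance_eval(&block)
      @depth -= 1
      @out << "#{indent}</#{name}>\n"   # the closing tag is emitted for you
    elsif text
      @out << "#{indent}<#{name}#{attr_text}>#{escape(text)}</#{name}>\n"
    else
      @out << "#{indent}<#{name}#{attr_text}/>\n"
    end
    self
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end

  private

  # Escape the characters that are special in XML text and attributes.
  def escape(value)
    value.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;").gsub('"', "&quot;")
  end
end

html = FluentXml.build do
  div class: "note" do
    h1 "Title"
    br
  end
end
puts html
```

One limitation of this sketch: a tag name that collides with an existing method (`p`, for example, which Ruby already defines) never reaches `method_missing`, so a real module would need to handle those names separately.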

Ruby-based XML Styles and Transforms (rXSLT) In the tradition of Rake and RSpec, rXSLT is a Ruby-based language that looks and feels like a familiar language (XSLT). That makes it familiar to many. And it has XSLT’s strength — the ease with which you can set up static transformations, so you don’t have to write “procedures” to do simple things. But at the same time, you can write pure Ruby code for conditionals and processing loops, whenever you need to, which is its primary advantage over XSLT, where such things are very difficult to do.

This project was based on Martin Fowler’s (typically) spectacular article, Moving Away from Xslt, in which he describes the many reasons for preferring a Ruby-based system for transformations (summarized above), and in which he also provides the short segment of code that carries out the crucial central task. See: xml_transform.rb in the RuDi zip file.
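To make the approach concrete, here is a minimal sketch of a transform that dispatches on element names. It is written for this page and is neither the code in xml_transform.rb nor the code from Fowler's article. The class name `SketchTransform`, the `handle_*` naming scheme, and the sample element names are assumptions. It also assumes the REXML library is available, as it is in standard Ruby distributions.

```ruby
require "rexml/document"

# Walks an XML tree and sends each element to a handle_<name> method when
# one exists; otherwise it simply processes the element's children.
class SketchTransform
  def initialize
    @out = +""
  end

  def transform(xml)
    handle(REXML::Document.new(xml).root)
    @out
  end

  private

  def handle(node)
    case node
    when REXML::Text    then @out << node.to_s   # text stays XML-escaped
    when REXML::Element then dispatch(node)
    end                                          # comments and PIs are ignored
  end

  def dispatch(element)
    handler = "handle_#{element.name.tr('-', '_')}"
    if respond_to?(handler, true)
      send(handler, element)
    else
      apply_children(element)
    end
  end

  def apply_children(element)
    element.each_child { |child| handle(child) }
  end

  def wrap(tag, element)
    @out << "<#{tag}>"
    apply_children(element)
    @out << "</#{tag}>"
  end

  # Static "templates": one small method per source element.
  def handle_title(element)
    wrap("h1", element)
  end

  def handle_p(element)
    wrap("p", element)
  end

  # Plain Ruby where XSLT would need conditional constructs.
  def handle_note(element)
    type = element.attributes["type"] || "note"
    @out << %(<div class="#{type}">)
    apply_children(element)
    @out << "</div>"
  end
end

xml = '<topic><title>Setup</title><note type="tip">Save often.</note><p>Done.</p></topic>'
result = SketchTransform.new.transform(xml)
puts result
```

The simple mappings stay declarative (one short method per element), while anything conditional is ordinary Ruby.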

Manpage processing tools Included in the project mostly because it was a reasonable place to put them. Learn more: Man Page Processing

Future Developments

Although I pioneered this project, my involvement in it came to an end in 2010, when my employment at Sun Microsystems was finished. These notes remain in place for anyone who wishes to take up the mantle…

DITA Publishing Tools

Uses the Ruby transformation engine to merge DITA content into DreamWeaver templates, as described in DITA Publishing using DreamWeaver Templates. The templates let designers focus on the results they want to produce, and minimize the amount of code the production team has to write to generate them. Pipeline processing lets the production team specify transforms in more flexible and more powerful ways. This system doesn’t replace the DITA toolkit, but rather augments it by adding post-processing steps, which also means that the DITA toolkit can be upgraded at will, without having to retrofit dozens of customizations.

Sample DreamWeaver-based templates that allow CSS and styling to be modified by a designer, with changes automatically applied to all files in the site, so styling can be done using a visual tool, without having to code the DITA Open Source Toolkit to do it.

A set of rXSLT transforms to transform simple HTML generated by the DITA Open Source Toolkit into template-based HTML pages.

A Rake build script that runs rXSLT scripts on a set of sample files.

DITA Editing Tools

Ruby’s capacity for defining domain-specific languages can be used to create a rudimentary transformation language, of the form a => b. A file that lists multiple substitutions in that form would then, in effect, be an executable Ruby program. (The code would operate only on text. rXSLT would be used to transform tags in the file.) Learn more: Ruby Rocks!
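A minimal sketch of such a substitution list follows. Nothing here comes from the RuDi code: the `SubstitutionList` class and the `fix` keyword are assumptions. The keyword is there because a bare `a => b` line is not, by itself, a hash in Ruby, whereas `fix a => b` is an ordinary method call with a one-entry hash argument.

```ruby
# Collects search/replace pairs declared in a rules file and applies them.
class SubstitutionList
  attr_reader :pairs

  def initialize
    @pairs = []
  end

  # Evaluate the rule text as Ruby code against a fresh list.
  def self.load(rule_text)
    list = new
    list.instance_eval(rule_text)
    list
  end

  # The one "keyword" of the mini-language: fix "from" => "to"
  def fix(mapping)
    mapping.each { |from, to| @pairs << [from, to] }
  end

  # Apply every pair, in order, to plain text. Keys may be strings or regexps.
  def apply(text)
    @pairs.reduce(text) { |result, (from, to)| result.gsub(from, to) }
  end
end

# Sample rules, shown inline here; normally they would live in their own file.
RULES = <<~'RUBY'
  fix "teh"         => "the"
  fix "in order to" => "to"
  fix(/\butilize\b/ => "use")
RUBY

list = SubstitutionList.load(RULES)
puts list.apply("Click teh button in order to utilize the tool.")
```

Since the rules are evaluated as Ruby, a rules file has to be treated as trusted code, not as data from an unknown source.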

Run interactively, that utility could be used as the basis for an interactive proofreading and style-checking tool, where you step through a list of search-replace pairs, searching or skipping each item, and doing individual replacements, current-file replacements, or global replacements on each.

Run in batch mode, the utility can be used to replicate DreamWeaver’s site-management capabilities, fixing links when files and folders change their name or location. The package could then be integrated into the version control system (e.g. Subversion), making a poor-man’s CMS. Learn more: Link Management Algorithms
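The core of the batch-mode link fix can be sketched with Ruby's standard `Pathname` class. This is only an illustration under simplifying assumptions: links are plain `href="..."` attributes, all paths are relative to the repository root, and only the case where the link target moves is handled. The function names and sample paths are hypothetical.

```ruby
require "pathname"

# Recompute a relative href after the link target has moved.
#   referrer: path of the file that contains the link
#   href:     the link as written in that file (relative to the referrer)
#   old_path, new_path: the target's location before and after the move
# Returns the href unchanged when it does not point at old_path.
def retarget_href(referrer, href, old_path, new_path)
  referrer_dir = Pathname.new(referrer).dirname
  target = (referrer_dir + href).cleanpath
  return href unless target == Pathname.new(old_path).cleanpath

  Pathname.new(new_path).cleanpath.relative_path_from(referrer_dir.cleanpath).to_s
end

# Rewrite every href="..." attribute in a chunk of DITA text, preserving
# any #fragment that follows the file path.
def fix_links(text, referrer, old_path, new_path)
  text.gsub(/href="([^"#]+)(#[^"]*)?"/) do
    path = Regexp.last_match(1)
    fragment = Regexp.last_match(2)
    %(href="#{retarget_href(referrer, path, old_path, new_path)}#{fragment}")
  end
end

topic = '<xref href="../common/install.dita#steps"/> <xref href="intro.dita"/>'
puts fix_links(topic, "guide/setup/overview.dita",
               "guide/common/install.dita", "shared/install.dita")
```

When the referring file itself moves, its outgoing relative links need the same kind of recomputation from the new directory. That case is left out of the sketch.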

It could also be used to convert XSLT transforms into rXSLT format — for basic transforms that don’t have conditionals or procedures, at the very least. Conditionals and procedures could then be added with much greater ease to the rXSLT version of the transform.

Subversion checkin code that creates a list of changes and runs the replace-tool when a file name or location has changed.

Processing Tool for tutorial programs The original version of this tool used standard XML “processing” instructions to identify added and removed material for each version of a tutorial program. (Since the file is treated as one long bit of text, the processing instructions are pretty much the only things in the file, other than the code and the standard XML header.) That tool generated the code for each version of the program, and also generated HTML segments that display the changes in each version, using bold for added material and strikethrough for removed material. So running it to produce version “X” generates both that version of the program, and file segments that can be automatically inserted into the tutorial. (The surrounding text might then need to change, but the displayed segments of code would never be in error — which is the more difficult challenge, when writing a tutorial!)
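The version-extraction half of that idea can be sketched in a few lines. The marker syntax below (`<?add N?>`, `<?remove N?>`, `<?end?>`) is invented for this illustration and is not necessarily what the original tool used. The sketch produces only the program text for a version, not the HTML segments with bold and strikethrough.

```ruby
# Extract one version of a staged tutorial program from a single source.
# Marker lines (hypothetical syntax):
#   <?add N?>     the following lines exist from version N onward
#   <?remove N?>  the following lines exist only before version N
#   <?end?>       closes the most recently opened region
def extract_version(source, version)
  conditions = [] # stack of booleans, one per open region
  source.each_line.with_object([]) do |line, kept|
    case line
    when /<\?add\s+(\d+)\?>/
      conditions.push(version >= Regexp.last_match(1).to_i)
    when /<\?remove\s+(\d+)\?>/
      conditions.push(version < Regexp.last_match(1).to_i)
    when /<\?end\?>/
      conditions.pop
    else
      # Keep the line only if every enclosing region applies to this version.
      kept << line if conditions.all?
    end
  end.join
end

TUTORIAL = <<~'SRC'
  <?remove 2?>
  puts "Hello"
  <?end?>
  <?add 2?>
  name = ARGV.first || "world"
  puts "Hello, #{name}"
  <?end?>
SRC

puts extract_version(TUTORIAL, 1)
puts "---"
puts extract_version(TUTORIAL, 2)
```

Keeping a stack of conditions lets regions nest, so a line added in one version can sit inside a block that a later version removes.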

Automated testing for tutorial programs. Using Rake to generate each new version of a program, it should also be helpful to generate matching versions of the RSpec unit test file, and to run those tests, guaranteeing that, in the future, the tutorial code continues to operate as expected, and has not been drawn into error by changes in the underlying platform.

Collaboration Tools

Current technologies make it difficult to carry on a profitable design and decision-making discussion online, unless it occurs in real time, as in a face-to-face meeting. But a good collaboration tool could allow for thoughtful discussion and a careful review of the options, as described in Distributed Persistent Collaboration.
