Today I was working on a project proposal for my course on Software Architecture. There was a strict limit of 5 pages on the document, including diagrams, and so it was necessary to be creative in how we formatted the document, to fit in the maximum possible content. I think that DocBook XSL, together with Apache FOP, generates really great-looking documents out of the box. Unfortunately, however, it does tend to devote quite a lot of space to the formatting, so today I learned a few tips for styling DocBook documents. These techniques turned out to be non-trivial to discover, so I thought I’d share them with others.

Background Information

Some customization can be done very simply, by passing a parameter at build-time to your xslt processor. For many of these customizations, however, DocBook does not insulate the user from XSLT. Specifically, it is necessary to implement what DocBook XSL refers to as a “customization layer”. This technique is actually fairly simple, once you know about it.

In short, when compiling your DocBook document, to, for example, html or fo, you would normally point your xslt processor to html/docbook.xsl or fo/docbook.xsl in your DocBook XSL directory. To allow for some customizations, however, you need a way to inject your own logic, and to do this, you create a new xsl document (e.g. custom-docbook-fo.xsl), which imports the docbook.xsl stylesheet you would have originally imported. By creating your own xsl document, you’re able to your inject customization logic, hence, this document is called a “customization layer”. This is not difficult in practice, but, as I said, it does not insulate the user from XSLT, which for me, was a bit shocking, as I’m not used to seeing and working with XSLT.

Easy Customizations

Two customizations I wanted to do were:

Remove the Table of Contents

Resize the body text

Both of these customizations require the user to simply add a parameter when calling their XSLT processor. In ant, this looks like the following:

The above params remove the Table of Contents, and set the body font to 11pt. Additionally, all other heading sizes are computed in terms of the “body.font.master” property, so they will all be resized when this property is set.

That’s pretty much all there is to it.

Harder Customizations

Two other customizations I wanted to do were:

Reduce the size of section titles.

Remove the indent on paragraph text.

To do this, I had to create a customization layer document in the manner I described above. It looks like the following:

Note that the content of the above document was mostly copy-pasted from various sections of Part 3 of DocBook XSL: The Complete Guide. All I had to do was guess at what it was doing, and substitute my desired values; I wouldn’t have been able to program this myself.

Customizations You Need a Degree in Computer Science to Understand

One of the first customizations I wanted to make was to reduce the font sizes used in the title of the document. Even with detailed instructions, it took me about two hours to figure out how to do this, just because the method of accomplishing this task was so unexpected.

In general, what you’re doing is the following:

Copying a template that describes how to customize the title.

Customizing that template with things like the Font size.

Using an XSL stylesheet to compile that customized copy to an XSL stylesheet. Yes, you using an XSL stylesheet to create an XSL stylesheet.

Include the compiled XSL stylesheet in your customization layer.

Optionally, automate this task by making it a part of your build process.

Holy smokes! Let’s run through a concrete example of this.

First, make a copy of fo/titlepage.templates.xml. I put it in the root of my project and called it mytitlepage.spec.xml. I then messed with the entities in mytitlepage.spec.xml to change the title font size. This was pretty self-explanatory. I then skipped a few steps, and integrated it with my ant script.

Now, whenever I build-fo, mytitlepage.spec.xml will be processed by template/titlepage.xsl in my DocBook XSL directory, producing the document mytitlepage.spec.xsl. I then import mytitlepage.spec.xsl into my customization layer:

And that’s it. It’s really not that difficult once you know how to do it, and you only have to wire it all together once, but it took a long time to see how all of the pieces fit together.

Conclusion

My advisor knows all sorts of tricks for Latex in order to, among other things, compress documents down to sizes to get them into conferences with strict page limits. I think this is pretty standard practice. You can do the same thing with DocBook, but expect a high learning curve, especially if you’ve never seen XSLT or are unfamiliar with build systems. I think DocBook is pretty consistent in this respect. But, in all fairness, I was ultimately successful: all of the resources were there to allow me to figure this out myself.

I was forced to learn Docbook for the SVG Open 2009 conference, which asks all of their users to submit in Docbook format. I found it cumbersome and confusing to set up, but, once I had put in place all of my tool support, I actually found it to be a very productive format for authoring structured documents.

Similar in concept to Latex, I now prefer to use Docbook for all of my technical writing. I like it because it’s XML (this is a matter of personal taste, but I like XML as a markup format), because it is environment-agnostic (I prefer to edit in Vim, but Eclipse includes great XML tooling and integration with version-control systems, and thus is also an excellent choice for a Docbook-editing environment), and because, thanks to the Apache FOP and Batik projects, it’s very easy to create PDF documents which include SVG images.

Still, I could never forget the initial pain involved in setting up Docbook, and so I’ve created docbook-ant-quickstart-project, a project to reduce this initial overhead for new users. From the project description:

Docbook is a great technology for producing beautiful, structured documents. However, learning to use it and its associated tools can involving a steep learning curve. This project aims to solve that problem by packaging everything needed to begin producing rich documents with Docbook. Specifically, it packages the Docbook schemas and XSL stylesheets, and the Apache FOP library and related dependencies. It also provides an Ant script for compilation, and includes sample Docbook files. Thus, the project assembles all of the components required to allow the user to begin creating PDF documents from Docbook XML sources quickly and easily.

I spent a long time looking for a similar project, and, surprisingly, didn’t find too much in this space. I did find one project which has precisely the same goals, but it relies on make and other command-line tools typically found on Unix platforms. Right now, I’m on Windows, and Cygwin has been problematic since Vista, so Ant and Java are a preferred solution. Also, by using Ant and Java, it is very easy to begin using this project in Eclipse.