Make it easy to use

Very few things are easy to use in absolute terms, but relative
improvements can have value as well. Relatively speaking, I think
I’ve made processing DocBook documents easier.

Recently, I attempted to reformat some documents that I hadn’t touched
in a while. What I discovered was that they were relying on
out-of-date stylesheets and a build environment that no longer “just
worked” on my system. I decided the quickest thing to do would be to
copy a more recent build system from another project and tweak it.

My build system of choice these days is Gradle. Now don’t
start, I’m sure there are lots of reasons why you might argue that
that isn’t “easy”. I’m a pragmatist:

Most of my XML work that isn’t directly in MarkLogic server
runs on the JVM.
That’s where my XML Calabash implementation(s) are,
that’s where Saxon is, etc. Gradle works well with the Java ecosystem.

Gradle is the first tool that made it practical for me to use
Maven repositories.

Maven deals very successfully with managing the software dependencies
of a project.

Gradle is cross-platform. I recently helped someone get a document
build system set up on Windows. Gradle worked flawlessly out of
the box except for the bits of my build where I’d lazily left in some make
and perl and a stray “cp” command.

The build system that I copied was the one for the
latest XProc spec. What I noticed
as I was tweaking it was that it depended on the DocBook XSLT 2.0 Stylesheets
artifact from Maven and yet it still downloaded the stylesheets from
https://cdn.docbook.org/
to do the formatting. (A Maven “artifact” is just
a jar file. It’s a way of packaging up
a software dependency and sticking it on the web where build tools can find it.
The details are unimportant to you, the user, if all you care about is
formatting documents.) That shouldn’t be necessary,
I thought. (I’ve been burned several times recently by this
downloading step when attempting to build documents on trains and
planes, so I was predisposed to investing a little time in fixing it.)

[ Here we go. If you’re going to fall down a deep rat hole, make sure there’s
lots of yak fur at the bottom to pad your fall. —ed]

Indeed, downloading the stylesheet artifact from Maven wasn’t
accomplishing very much. It would be possible, I thought, to use the
stylesheets directly from the jar file if I could get a catalog set up
correctly.

I had written, and was using, an extension task that makes it easier
to use XML Calabash in Gradle. Getting the
catalog in place meant refactoring that task in significant ways.

But having that task automatically set up an XML Catalog for the
DocBook stylesheets seemed wrong. That task is just about pipeline processing, not
DocBook specifically.

So I wrote another extension task for formatting DocBook documents
specifically. Logically, that task had to be an extension of the
underlying XML Calabash task. That meant a bit more refactoring.

Along the way, I also corrected a problem with the underlying task
wherein the documents used by the task didn’t automatically get counted
as inputs and outputs for the purpose of Gradle figuring out what
tasks needed to be run again when documents changed.
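
To make that idea concrete, here is a minimal sketch of what declaring inputs and outputs buys you. It is written in Python, not taken from the actual task, and Gradle's own up-to-date checking is more elaborate than this. The function names and the state file are hypothetical; the sketch only illustrates the principle that a task can be skipped when its declared inputs are unchanged and its outputs still exist.

```python
import hashlib
import json
import os


def fingerprint(paths):
    """Hash the names and contents of every declared input file."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(path.encode("utf-8"))
        with open(path, "rb") as handle:
            digest.update(handle.read())
    return digest.hexdigest()


def needs_rerun(inputs, outputs, state_file):
    """Return True if the task has to run again."""
    # A missing output always forces a rerun.
    if not all(os.path.exists(path) for path in outputs):
        return True
    # No record of a previous run: we cannot prove anything is up to date.
    if not os.path.exists(state_file):
        return True
    with open(state_file) as handle:
        previous = json.load(handle)
    # Rerun only if some input changed since the recorded run.
    return previous.get("inputs") != fingerprint(inputs)


def record_run(inputs, state_file):
    """Remember the input fingerprint after a successful run."""
    with open(state_file, "w") as handle:
        json.dump({"inputs": fingerprint(inputs)}, handle)
```

If a document the task reads is never declared as an input, it never enters the fingerprint, so editing it would not trigger a rebuild. That is the kind of problem described above.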

Now that I had a place to stand, I added a bit of code to the DocBook
task so that it would construct a bespoke catalog for the stylesheets
in the jar file and insert that into the XML Calabash runtime.
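
The shape of such a bespoke catalog can be sketched roughly as follows. This is an illustration in Python, not the code in the DocBook task. The jar location, the directory inside the jar, and the function name are all hypothetical, and using https://cdn.docbook.org/ as the prefix to rewrite is an assumption. The catalog namespace and the rewriteURI element come from the OASIS XML Catalogs specification, and the jar: URL form is the one Java uses for entries inside a jar file.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OASIS XML Catalogs specification.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def build_catalog(uri_prefix, jar_path, dir_in_jar):
    """Return an XML Catalog that rewrites uri_prefix to a directory inside a jar."""
    ET.register_namespace("", CATALOG_NS)
    catalog = ET.Element("{%s}catalog" % CATALOG_NS)
    rewrite = ET.SubElement(catalog, "{%s}rewriteURI" % CATALOG_NS)
    rewrite.set("uriStartString", uri_prefix)
    # Java-style jar URL: jar:<URL of the jar>!/<path inside the jar>
    rewrite.set("rewritePrefix", "jar:file:%s!/%s" % (jar_path, dir_in_jar))
    return ET.tostring(catalog, encoding="unicode")


# Hypothetical locations, for illustration only.
print(build_catalog("https://cdn.docbook.org/",
                    "/home/me/cache/docbook-xslt2.jar",
                    "xslt/"))
```

A resolver that loads this catalog would turn any request beginning with the web prefix into a request for the corresponding file inside the jar, so nothing has to be downloaded at formatting time.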

Threading all these needles was a little tricky. I ended up putting
some debugging code in XML Calabash to help me out. Turns out I had to
fix some bugs related to catalog handling. I also applied a bunch of
pull requests and fixed a handful of unrelated bugs along the way.
(Including the bug that was causing the p:validate-with-relax-ng step
to swallow the message that described the actual validation error.
I anticipate much rejoicing across the land. Or in my office, anyway.)

That may not look “easy”, especially if you aren’t a software
developer. But if you install Gradle on your platform and run gradle myDocument,
you’ll get a formatted document. (Assuming, that is, that your document is named document.xml.
Replace that with the filename of your actual DocBook document. You can
change output.html into something nicer as well while you’re at it.
And the name of the task, if you wish.)

If you’re curious:

The word buildscript and the block enclosed in curly braces that
follows is just boilerplate. You don’t have to understand it, but
what it says is, this project requires the DocBook XSLT 2.0
stylesheets, the Gradle plugin for running XML Calabash, and my XML
Catalog processor.

If you’re tempted to say “so what”, at least consider briefly what
happens when you run this through Gradle.

It will download the artifacts necessary: the three named
explicitly and all of the thirty or so dependencies that
you didn’t even know about.

It will cache them locally for you in some location you never have
to worry about. And if you have multiple projects that use DocBook,
it’ll share them across those projects.

If you end up using different versions of the stylesheets in
different projects, that’ll just work as well.

It will arrange for XML Calabash to run with the DocBook pipeline
to process your source document.

It will use an XML Catalog that will find the stylesheets
directly in the appropriate jar file.

And it will just work on Linux, on the Mac, and on Windows!

If you’ve been processing XML documents since the last millennium or if
you’re a software developer, none of those steps will seem difficult [except
maybe the Windows thing —ed],
but it’s still possible to appreciate that you don’t have to do any of
them. If you aren’t a software developer, some of those steps
probably read like complete gibberish. That’s ok, because you don’t
have to do any of them!

Right. Having got this far, being just about in a position to go back
and finish the job I started, it occurred to me that if the
stylesheets can be processed this way, shouldn’t it be possible to
process the schemas in the same way? [Can you say “displacement
activity”? —ed]

Of course, the answer is yes: SMOP (a simple matter of programming).
Long story short, I took the DocBook schemas (4.5, 5.0, and 5.1) and
packaged them up in a Maven artifact with a little Java shim to
construct a bespoke catalog for them as well. Then I went back and
extended the DocBookTask so that it will use them if they’re
available.

If you want a custom stylesheet or a custom schema, that’s fine too.
Simply import or include the stylesheets or schemas using the standard
URIs; they will be resolved by the catalogs and no actual web access
will be required.
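
Why no web access is needed can be shown with a small sketch of prefix rewriting, the matching rule that XML Catalog rewriteURI entries follow: the longest matching prefix wins. This is Python for illustration, not the resolver the task actually uses, and every URI and jar path in it is hypothetical.

```python
def resolve(uri, rewrites):
    """Rewrite uri using the longest matching prefix in rewrites.

    rewrites maps a URI prefix to the local prefix that replaces it.
    """
    best = None
    for start in rewrites:
        if uri.startswith(start) and (best is None or len(start) > len(best)):
            best = start
    if best is None:
        # No catalog entry matches: the URI is left alone and would be
        # fetched from wherever it really points.
        return uri
    return rewrites[best] + uri[len(best):]


# Hypothetical catalog entries pointing into locally cached jar files.
rewrites = {
    "https://example.org/xslt/": "jar:file:/cache/stylesheets.jar!/xslt/",
    "https://example.org/schemas/": "jar:file:/cache/schemas.jar!/schemas/",
}

print(resolve("https://example.org/xslt/html/docbook.xsl", rewrites))
print(resolve("https://example.org/other/file.xml", rewrites))
```

A customization that imports a stylesheet by its standard URI goes through the same lookup, which is why the import ends up reading from the jar file instead of the network.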

At the end of the day, whether you consider this easy or difficult is
going to depend on a lot of factors. I haven’t taken the time to
describe all of the options of the DocBookTask (e.g., how to make
PDF instead of HTML), and if you’re doing more than just formatting a
single XML file, you will probably need or want to learn a little bit
more Gradle.

I’m pleased with the results, however. So what if it consumed most of
a weekend and required updates to three projects and the construction
of a fourth? I’ve made it easy for you, right? Isn’t that the
important thing?
