DocBook, XML and ebooks: Creating eBooks the old-fashioned way

One of the most traditional ways to author content for multiple distribution channels is to roll up your sleeves, write XML and then convert it to your target format. For this exercise we will use DocBook. Without going into too much detail, DocBook was created in 1991 as a way to write computer software manuals and other technical documentation. Over the years it has evolved into a general-purpose XML authoring language. Along with the authoring standard, which defines the structures we can use to author our content, the authors of the DocBook standard have also created a set of stylesheets that convert our base XML files into different formats. One of the formats you can convert your XML files to is ePub.

The stock DocBook stylesheets produce ePub 2 compliant books. This is fine for now, as most readers that support ePub support this version. There is also experimental support for ePub 3, which we will use in this article because it gives us access to all the multimedia features of ePub 3.

As for XSLT processors, there are two that I recommend. One is Saxon, currently at version 9.4 and available from its publisher, Saxonica, on a trial basis. Yes, it is commercial software, but after years of using it I highly recommend the investment. It is written in Java and provides a full set of features, extensions and advanced implementations of XML-related technologies. For our purposes it’s enough that it will take the XML, process it with the stylesheets and give us the output we want.

The second processor I recommend is xsltproc. It is written in C and bundled with most UNIX, Linux and OS X installations; it can also be downloaded or updated from the xmlsoft.org web site. Download both LibXML and LibXSLT and install them in that order (LibXML first, then LibXSLT, because LibXSLT depends on LibXML), or it will not work as you expect.

The command to create the ebook using xsltproc is:

xsltproc /Users/carlos/docbook/1.0/xslt/epub3/chunk.xsl ebook.xml
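
The base.dir stylesheet parameter mentioned in the final steps can be passed on the command line with xsltproc's --stringparam option. The following is only a sketch, assuming the same stylesheet path as above and ebook1/OEBPS/ as the output directory. The variable names and the guard around the command are illustrative and not part of the DocBook distribution.

```shell
# Assumed locations; adjust both to match your own installation.
STYLESHEET="/Users/carlos/docbook/1.0/xslt/epub3/chunk.xsl"
BASE_DIR="ebook1/OEBPS/"   # keep the trailing slash

# Run the transformation only when the tool and both input files are present.
if command -v xsltproc >/dev/null 2>&1 && [ -f "$STYLESHEET" ] && [ -f ebook.xml ]; then
  xsltproc --stringparam base.dir "$BASE_DIR" "$STYLESHEET" ebook.xml
else
  echo "xsltproc, the stylesheet or ebook.xml was not found; nothing was generated." >&2
fi
```

With base.dir set this way, the generated files should land under ebook1/ rather than in the current directory.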

This produces output that should look like this:

Writing OEBPS/bk01-toc.xhtml for book
Writing OEBPS/ch01.xhtml for chapter
Writing OEBPS/ch02.xhtml for chapter
Writing OEBPS/index.xhtml for book
Writing OEBPS/docbook-epub.css for book
Generating EPUB package files.
Generating image list ...
Writing OEBPS/package.opf for book
Writing OEBPS/../META-INF/container.xml for book
Writing OEBPS/../mimetype for book
Generating NCX file ...
Writing OEBPS/toc.ncx for book
<?xml version="1.0" encoding="UTF-8"?>

Final Details

We have now generated the content and the files we need to build the eBook. To finish the process we need to do the following (taken from the README.epub3 file):

Manually copy any image files used in the document into the corresponding locations in the $base.dir directory.

For example, if your document contains:

<imagedata fileref="images/caution.png"></imagedata>

and the base.dir parameter is set to ebook1/OEBPS, you would copy the file to ebook1/OEBPS/images/caution.png. You can get a list of image files from the manifest file (ebook1/OEBPS/package.opf in our example) that is created by the stylesheet.

Currently the stylesheets will *not* include generated image files for callouts, headers/footers, and admonitions. These files have to be added manually.
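
Copying the images by hand gets tedious for larger books. Here is a minimal sketch that reads the image entries out of the manifest and copies them into place. It assumes the manifest uses unprefixed item elements with href and media-type attributes, and that your grep supports the -o option. The function name copy_manifest_images and its arguments are made up for this example.

```shell
# copy_manifest_images SRC_DIR OEBPS_DIR
# Copies every manifest item with an image media type from SRC_DIR into
# OEBPS_DIR, keeping the relative path recorded in package.opf.
copy_manifest_images() {
  src_dir="$1"
  oebps_dir="$2"
  # Join the file into one line, split it into <item ...> elements,
  # keep the image entries and pull out the value of href.
  tr '\n' ' ' < "$oebps_dir/package.opf" |
    grep -o '<item [^>]*>' |
    grep 'media-type="image/' |
    sed 's/.*href="\([^"]*\)".*/\1/' |
    while IFS= read -r href; do
      if [ -f "$src_dir/$href" ]; then
        mkdir -p "$(dirname "$oebps_dir/$href")"
        cp "$src_dir/$href" "$oebps_dir/$href"
      else
        echo "missing image: $src_dir/$href" >&2
      fi
    done
}
```

A call such as copy_manifest_images . ebook1/OEBPS, run from the directory that holds the XML source, would copy images/caution.png to ebook1/OEBPS/images/caution.png. Files reported as missing are typically the generated callout and admonition images, which still have to be added by hand.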

cd to the directory containing your mimetype file, which would be ebook1 in this example.

Run the following two commands to create the epub file:

zip -X0 sherlock-holmes.epub mimetype
zip -r -X9 sherlock-holmes.epub META-INF OEBPS

The first command adds the ‘mimetype’ file first and uncompressed. The -X option excludes extra file attributes (required by epub3). The numbers indicate the degree of compression: 0 stores the file as-is and 9 compresses the most. The -r option means recursively include all directories. The “sherlock-holmes.epub” in this example is the output file.
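
If you rebuild the book often, the packaging step can be wrapped in a small shell function. This is a sketch, assuming the Info-ZIP zip tool is installed and that the book directory holds the mimetype file plus the META-INF and OEBPS directories. The function name package_epub is made up for this example.

```shell
# package_epub BOOK_DIR OUTPUT_NAME
# Builds OUTPUT_NAME inside BOOK_DIR from mimetype, META-INF and OEBPS.
package_epub() {
  book_dir="$1"
  out="$2"
  (
    cd "$book_dir" || exit 1
    # zip adds to an existing archive, so start from a clean slate.
    rm -f "$out"
    # mimetype goes in first, stored without compression or extra attributes.
    zip -X0 "$out" mimetype &&
      # Everything else is added recursively at maximum compression.
      zip -r -X9 "$out" META-INF OEBPS
  )
}
```

Calling package_epub ebook1 sherlock-holmes.epub would leave the finished file at ebook1/sherlock-holmes.epub. Removing the old archive first matters because a leftover file from an earlier run would no longer be guaranteed to have mimetype as its first entry.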

Validation

Because we have done most of the work manually, we need to validate the result. For that we will use the epubcheck3 tool, available from its Google Code Project repository.
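
epubcheck is distributed as a Java jar and is run from the command line against the finished file. The sketch below assumes Java is installed. EPUBCHECK_JAR is a hypothetical variable that should point at the jar you downloaded, since the real file name depends on the release, and the function name validate_epub is made up for this example.

```shell
# validate_epub EPUB_FILE
# Runs epubcheck on EPUB_FILE. EPUBCHECK_JAR is a hypothetical variable
# holding the path to the downloaded jar; it defaults to ./epubcheck.jar.
validate_epub() {
  epub="$1"
  jar="${EPUBCHECK_JAR:-epubcheck.jar}"
  if [ ! -f "$jar" ]; then
    echo "epubcheck jar not found: $jar" >&2
    return 2
  fi
  java -jar "$jar" "$epub"
}
```

Running validate_epub sherlock-holmes.epub from the ebook1 directory prints whatever errors and warnings epubcheck finds in the book.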