Lessons learned and musings about software tools, software testing, computational experiments, optimization, operations research, and other interesting stuff that I run across...

Friday, September 23, 2011

Using AsciiDoc for Mathematical Publications

Technical writing is an integral part of my research in computer science and operations research. I have a long history of using LaTeX, which is very well suited for writing technical articles that contain mathematical equations as well as code snippets. Although LaTeX can readily generate PostScript and PDF output files, I have been unimpressed with tools that generate HTML from LaTeX source. Thus, I was intrigued by AsciiDoc, which promises to generate PDF, HTML and eBook formats. AsciiDoc is used to provide online documentation for software projects, and authors can publish books through O'Reilly using this tool. Thus, this is a well-developed document generation tool.

The advantage of AsciiDoc is that you can use a simple markup language to generate complex documents in a variety of formats. Since this is a generic document-generation process, it is reasonable to expect that there will be limited control of document formatting. (If you want a lot of control, you should just use LaTeX!) However, several major limitations in document generation and format control restrict what you can do with AsciiDoc:

Portable Mathematical Equations: There is only limited support for generating eBook documents that contain mathematical equations. I noticed that the ePUB standard was just updated this month to support MathML, so it is not clear that e-readers can handle MathML right now. Additionally, AsciiDoc does not support the generation of the MathML XML from a high-level description (e.g. LaTeX math equations). Thus, a user cannot easily prepare a document that generates both PDF (using LaTeX under the hood) and ePUB (using MathML under the hood). I guess we will have to wait a few more years to see robust publishing of mathematics for eBooks.

Formatting Mathematics: For whatever reason, the default formatting of mathematical environments in HTML is not centered or indented (as it is in LaTeX). Thus, it is much more difficult to read HTML documents containing mathematics. I tried resolving this using an AsciiDoc filter, without luck. I wound up rewriting the LatexMath macros to enforce this different formatting in HTML. Unfortunately, these revised macros do not precisely match the syntax used by AsciiDoc. {sigh}

Document Authors: The AsciiDoc markup language does not provide a convenient way to create a document with multiple authors. No, I am not kidding. There is a DocBook configuration file that you can provide, but it only works if the document generation process goes through DocBook; in my example, that works for ePUB and PDF files. Thus, there does not appear to be a single, portable way of specifying multiple authors.
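To make the DocBook route a bit more concrete, here is a minimal Python sketch that builds a DocBook authorgroup element from a list of names. The function name and the sample authors are hypothetical, and this is not taken from my actual configuration file; it only illustrates the kind of XML such a file would need to contain. I am assuming the result is then supplied through AsciiDoc's DocBook header mechanism, which, as noted above, only helps for outputs that pass through DocBook.

```python
import xml.etree.ElementTree as ET


def authorgroup_xml(authors):
    """Build a DocBook <authorgroup> string from (firstname, surname) pairs.

    The function name and input format are illustrative assumptions.
    """
    group = ET.Element("authorgroup")
    for first, last in authors:
        author = ET.SubElement(group, "author")
        ET.SubElement(author, "firstname").text = first
        ET.SubElement(author, "surname").text = last
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(group, encoding="unicode")


# Hypothetical author names, for illustration only
print(authorgroup_xml([("Ada", "Lovelace"), ("Alan", "Turing")]))
```

Generating the XML from a plain list at least keeps the author names in one place, even though the HTML output still needs its own treatment.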

Citations: It is noteworthy that none of the examples of online books referenced in the online AsciiDoc documentation contain citations or a bibliography. The default format for bibliographies in AsciiDoc PDF files is as a numbered chapter or section, which differs from the normal convention in LaTeX (which I much prefer). Thus, my AsciiDoc book uses the colophon section, which is not numbered. However, that means that it does not show up in the table of contents. {sigh}

Another issue with citations is that the examples provided by AsciiDoc do not correctly generate hyperlinks in the PDF file. Basically, the bibliography section type provided by AsciiDoc does not work well with the dblatex tool used to generate the PDF. My solution was to not use the bibliography section type!

Finally, the examples provided by AsciiDoc include citations in a list environment, which means that the PDF output contains a numbered list followed by a bracket citation reference. Again, my solution was to avoid using the list; the bibliography is simply a sequence of paragraphs, each of which is a citation with its associated anchor.
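As a rough illustration of the paragraph-per-citation layout, the following Python sketch renders a list of references as AsciiDoc paragraphs, each preceded by an anchor line. The function name, the keys, and the citation text are all made up for the example, and the exact markup (a `[[key]]` anchor line followed by a paragraph that starts with a bracketed label) is an assumption about one way to write it, not a copy of my source files.

```python
def bibliography_paragraphs(entries):
    """Render (key, citation) pairs as AsciiDoc paragraphs with anchors.

    Each entry becomes an anchor line "[[key]]" followed by a paragraph
    beginning with the bracketed label, so no list environment is used.
    """
    blocks = []
    for key, citation in entries:
        blocks.append(f"[[{key}]]\n[{key}] {citation}")
    # A blank line between blocks keeps each citation a separate paragraph
    return "\n\n".join(blocks) + "\n"


# Hypothetical entries, for illustration only
print(bibliography_paragraphs([
    ("Doe2010", "J. Doe. An Example Book. Example Press, 2010."),
    ("Roe2011", "R. Roe. Another Example. Example Journal, 2011."),
]))
```

Elsewhere in the document, a citation can then point at an entry with an ordinary cross-reference such as `<<Doe2010>>`.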

Despite these issues, I am planning to continue developing the Coopr documentation with AsciiDoc. The lack of support for mathematical equations is a problem for ePUB documents, but most readers will be using this document to refer to the examples in Python. However, I would not consider using AsciiDoc for developing more complex documents, like a book intended for publication. There is too much customization that would be needed to get past the current limitations.

I tried using MathJax without luck for the online documentation. MathJax generated some type of error that relates to the fact that there is a new version of MathJax, and the AsciiDoc examples don't work properly with it. I'm not sure I can recall the details ...

The AsciiDoc developers acknowledged that many of my concerns were real issues, but to my knowledge they have not resolved them. Having said that, I am happily using AsciiDoc in two different projects: