When I wrote The Dictator's Handbook, I started with a LaTeX manuscript and converted it to EPUB for electronic sales and distribution. If you plan to do the same, here is your situation:

You have an existing LaTeX document, and would like to create an EPUB document from it. LaTeX creates gorgeous printed works, but predates e-books by several decades. On the other hand, LaTeX is a markup language, and EPUB is basically XHTML, which is also a markup language, so there is a path. This article describes that path.

My LaTeX document didn't rely on many external packages, a mark in my favor, since each additional package raises the risk of some unforeseen incompatibility. To create The Dictator's Handbook I needed only \lettrine for the initial caps and \epigraph for the quotes that opened each chapter. I tweaked a few settings, such as the number of entries shown in the Table of Contents and the inter-paragraph spacing, but with those exceptions my document was fairly uncomplicated. I have not tried this approach on more complicated documents, so I can't be sure it would work with them.

There are four steps to going from LaTeX to EPUB. None is particularly difficult, but the graphical tools don't yet provide a smooth transition, so you will have to do a bit of hand editing. This is annoying, but not hard.

1. Convert LaTeX to XHTML (use: htlatex, found in the tex4ht package)
2. Clean up the XHTML (use: HTML Tidy)
3. Convert XHTML to EPUB (use: Calibre)
4. Tweak the EPUB's XHTML docs (use: any text editor)

And if you only want the EPUB for your own purposes and don't want to sell or publish it, you can skip the tweaks in the final step.

Before you do anything, though, you need to modify your source LaTeX file, because e-books don't use some features of printed books. Specifically, e-books create their own table of contents, so having one included in your text is superfluous and confusing, and they don't need an index because everything is searchable. Footnotes in e-books are also weird: each one becomes an individual page, which extends the page count and confuses the flow of the book somewhat. So take your LaTeX files, copy them into a new folder, and work from there, leaving your original files intact for production of the printed book. Checklist:

Remove the \tableofcontents line so LaTeX doesn't produce a Table of Contents.

Remove the index from your LaTeX source file (the \printindex line, plus \makeindex in the preamble if you use it).

Instead of footnotes, I created endnotes, which worked much better. You need to modify your LaTeX: add \usepackage{endnotes} to the preamble, do a search/replace so that every \footnote becomes an \endnote, and put \theendnotes where you want the notes to appear. The notes get hyperlinked, so you can jump back and forth from text to endnote. Nice!
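The search/replace can also be scripted. What follows is a minimal shell sketch, not the method used for the book: the function name footnotes_to_endnotes and the file names are made up for illustration, and it assumes every footnote is written as a plain \footnote{...} with no optional argument.

```shell
# Hypothetical helper: rewrite \footnote{ as \endnote{ on standard input.
# Only matches "\footnote" followed directly by "{", so \footnotemark and
# \footnotetext are left untouched (an assumption about your sources).
footnotes_to_endnotes() {
    sed 's/\\footnote{/\\endnote{/g'
}

# Example use, working on a copy so the print sources stay intact:
#   mkdir -p ebook
#   footnotes_to_endnotes < book.tex > ebook/book.tex
```

The preamble and the spot where the notes are printed still have to be edited by hand.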

You now have a LaTeX file you can work with.

Convert LaTeX to XHTML

I tried two tools that convert LaTeX to HTML. The first is latex2html, a Perl script that does a decent job of taking LaTeX files and outputting a series of linked HTML files. I experimented with it for a while and learned that it struggles with complex LaTeX and doesn't handle some of the formatting, including the epigraphs and smart quotes. But it will produce a file you can turn into a readable EPUB file. If you're looking for quick-and-easy, it will work!

It's as simple as:

latex2html -split 0 -nonavigation sourcefile.tex

to create sourcefile.html. The "-split 0" flag tells it to create one huge HTML file instead of breaking it up into chunks, as it would normally do. "-nonavigation" turns off the navigation bar at the top of the pages. After a few minutes, you'll have an HTML document you can feed into the next step.

But htlatex, part of the tex4ht set of tools (tex4ht is the package name), does a much better job, retaining the formatting of even the lettrines (initial caps). To use it, type:

htlatex source.tex

You'll now have source.html. But it's HTML, not XHTML, the stricter, tighter, more bulletproof language required for EPUB. (It turns out HTML was intentionally left sloppy to encourage people to build web pages without fear of endless errors.) Use HTML Tidy to convert it. You're still going to have to tweak the result, but you're much better off. In a terminal, issue something like this:

tidy -asxhtml -output seconddraft.xhtml firstdraft.html

That takes an HTML file called firstdraft.html and turns it into an XHTML file called seconddraft.xhtml (see brainbell.com for an explanation of the difference between HTML and XHTML). Doing this makes the next step a lot easier; if you skip it, the EPUB validator will find so many errors that you'll never be able to get through all of them manually.
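The two commands above can be chained in a small script. This is a sketch under assumptions: to_xhtml is a made-up name, the -numeric, -utf8, and -quiet options are additions that were not part of the workflow described here, and the exit-status check relies on Tidy's convention of returning 1 for warnings and 2 for errors.

```shell
# Hypothetical wrapper: LaTeX -> HTML (htlatex) -> XHTML (tidy).
# Run it from the directory that holds the .tex file.
to_xhtml() {
    base=${1%.tex}                      # strip the .tex extension
    htlatex "$base.tex" || return 1     # writes $base.html
    tidy -quiet -asxhtml -numeric -utf8 \
         -output "$base.xhtml" "$base.html"
    status=$?
    # Tidy exits 0 when clean, 1 on warnings, 2 on errors.
    # Warnings are expected here, so only errors count as failure.
    [ "$status" -le 1 ]
}

# Example: to_xhtml source.tex
```

Treating warnings as success is a deliberate choice in this sketch, since machine-generated HTML rarely passes Tidy without any.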

Convert XHTML to EPUB

Calibre is the right tool for this job, and it's a lovely piece of software undergoing intensive development. I didn't find a copy of it in my slightly old Linux distro's repositories, but no matter: it's distro-agnostic (depending mostly on Python) and has an installer that, as far as I can tell, works on almost any Linux distro and BSD. Once you've installed Calibre, add a new book and, when prompted to select the file, select your XHTML file. Calibre will import it as XHTML (that is, with no conversion). Now use Calibre to set the metadata, choose a cover image, and so on, and then convert it to EPUB format. It does so nicely, taking a minute or two. If all you want is an EPUB version you can read on your own Nook, that's it, you are done!
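Calibre also installs a command-line converter, ebook-convert, which can stand in for the import-and-convert clicks when you rebuild the file repeatedly. The sketch below is an alternative under assumptions, not the route used for the book: build_epub and the EBOOK_CONVERT variable are made-up names, and the options shown (--title, --authors, --cover) should be checked against ebook-convert --help on your installed version.

```shell
# Hypothetical wrapper around Calibre's ebook-convert.
# EBOOK_CONVERT can be overridden, e.g. to point at a full path.
EBOOK_CONVERT=${EBOOK_CONVERT:-ebook-convert}

# usage: build_epub input.xhtml "Title" "Author" [cover-image]
build_epub() {
    src=$1; title=$2; author=$3; cover=${4:-}
    out=${src%.*}.epub                  # book.xhtml -> book.epub
    if [ -n "$cover" ]; then
        "$EBOOK_CONVERT" "$src" "$out" \
            --title "$title" --authors "$author" --cover "$cover"
    else
        "$EBOOK_CONVERT" "$src" "$out" \
            --title "$title" --authors "$author"
    fi
}

# Example: build_epub seconddraft.xhtml "My Book" "A. Author" cover.jpg
```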

But if you want to offer your e-book for sale, you've got some more work to do, and Calibre doesn't (yet) offer a clean way to do it graphically: you have to "explode" the EPUB package into its individual XHTML files and edit them manually. To find out how much work you have, you need to validate your EPUB file. Very few publishing houses accept an EPUB file unless it passes a validation test with no errors. You can validate your file online at a site like idpf.org, but unless you're lucky the first time you'll be doing it a lot, so you might as well download the little Java app (it's called epubcheck) from Google Code and run it yourself.

java -jar epubcheck.jar myfile.epub > output.txt 2>&1

The first time I tested my file, I got over a hundred errors and nearly passed out on my desk. Later, when common sense prevailed, I looked into the errors and realized there were only two or three different errors being repeated ad nauseam. It took some time with a plain old text editor to fix them.
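One way to spot that kind of repetition quickly is to strip the file location from each error and count what is left. A rough sketch, assuming each error line has the shape "ERROR: file(line): message"; the exact format differs between epubcheck versions, so the pattern may need adjusting, and summarize_errors is a made-up name.

```shell
# Hypothetical helper: group identical epubcheck messages and count them.
# Assumes lines of the form "ERROR: path/to/file.xhtml(12): message".
summarize_errors() {
    grep '^ERROR' \
        | sed -E 's/^ERROR[^:]*: [^ ]+\([0-9,]+\): //' \
        | sort | uniq -c | sort -rn
}

# Example: summarize_errors < output.txt
```

Each output line is a count followed by one distinct message, most frequent first, which makes a hundred errors look a lot more like two or three.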

First, you need to right click the EPUB file and from the context menu, select "Tweak file." Calibre will explode the EPUB into its components and allow you to open and edit them individually with your text editor. I used emacs and you'll soon see why, but any text editor will do. Here's what I had to do:

Fix the Table of Contents: Calibre had made a table of contents that was far too detailed (down to the subsection level, if I recall), whereas I only wanted the chapters to appear in the TOC. So I had to remove those entries from the TOC file by hand (five minutes' work).

Fix the Table of Contents Chapter Titles: Again, Calibre had automatically created titles that I didn't find user friendly, so I converted each one to something like "Chap 1: XXXXXX"

Fix a lot of remnant code that the EPUB validator didn't like. In my case, there were about twenty links where two spans had been inserted by Calibre, and that wouldn't work. I tested by removing one, and when that worked, did the rest. Here's where emacs was useful: it was a simple affair to run a macro that searched for the next instance, moved the cursor to the beginning and end, and then deleted the offending structure. I would not have wanted to do that dozens of times manually.

At the end of this process, you've got an EPUB file that will be more than fine for Barnes and Noble, Kobo, and many other online publishers. The only one who will still grumble is Apple.

Final Tweaks for the Apple Store

The last step is needed only if you also intend to submit the EPUB to Apple for distribution through the iBookstore (recommended, given the number of people using their iPads to read books right now). Calibre adds two files to the structure that contain Calibre-specific metadata. After you've got every other issue fixed, copy Calibre's EPUB file somewhere else, open it (remember, it's just a renamed zip file) and remove them. The EPUB will now also validate for the iBookstore.
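Since the EPUB is a zip archive, the removal can be done from a terminal as well. A sketch, assuming the Info-ZIP zip and unzip tools are installed; strip_epub_entries is a made-up name, and the entry names in the example are placeholders, because which files Calibre added is something to read off the archive listing.

```shell
# Hypothetical helper: copy an EPUB and delete the named entries from the copy.
# usage: strip_epub_entries in.epub out.epub entry [entry...]
strip_epub_entries() {
    src=$1; dst=$2; shift 2
    cp "$src" "$dst" || return 1
    zip -q -d "$dst" "$@"               # -d deletes entries in place
}

# List the contents first to find the Calibre-specific files:
#   unzip -l mybook.epub
#   strip_epub_entries mybook.epub mybook-apple.epub path/to/first path/to/second
```

Deleting entries in place, instead of unzipping and re-zipping, should leave the rest of the archive as it was, including the mimetype entry that EPUB expects to come first.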

Seem like a lot of work? It's really not, if you think about it. And as a bonus, you haven't simply handed money to some service provider to do it for you: you've retained total control over every aspect of your publication, and you've been able to write a book using the best software for the job. Worth it!