Tales From the Command Line: textutil

I really enjoy the overall experience of reading books and articles on my Sony PRS-500 eBook reader, but I dislike having to boot into Windows through Boot Camp or VMware just to purchase books from the Sony eBook Store, especially when there are thousands of books in the public domain and tons of free blog and article content on the internets.

The problem lies in getting this information onto said device. To make my life easier, I use a utility called textutil that first appeared in OS X 10.4. As you will see, the utility of this small tool goes far beyond formatting content for eBook readers. As always, fire up Terminal.app and have it ready to roll as we delve once again down to the command line.

Formats A-Plenty

While I have a very specific and regular use-case for textutil, it has plenty of features that make it a highly useful, general-purpose tool. (For most of the examples, I will be using the text and HTML versions of A Christmas Carol by Charles Dickens.) The first lies in format support. textutil can convert to and from txt, html, rtf, rtfd, doc, docx, wordml, odt and webarchive. Picture a scenario where you inherit a large number of wretched HTML documents from an existing project and really only need the raw text to begin anew. While you may have your own techniques for stripping HTML tags, textutil can do the heavy lifting for you with ease:

$ textutil -convert txt ChristmasCarol.html

Since you can specify as many files as you like on the command line, batch processing an entire directory is just as easy:

$ textutil -convert txt *.html

If you have an article broken up into many pieces and want to concatenate them into one large file, converting them or keeping the same format along the way, just use the -cat option:

$ textutil -cat html *.txt

If you have a look at the textutil manual page, you will see that you have complete control over the location, name and extension of output files and can even specify the font name and size. This is very handy for my use-case, since I have a certain base font size I like to use with the reader:

$ textutil -convert rtf -font Times -fontsize 14 ArticleToConvert.html
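
To show the control over output location and name mentioned above, here is a minimal sketch that wraps the same conversion in a shell function and adds the -output option. The function name to_reader and the destination directory are my own placeholders, not anything textutil provides.

```shell
# Hypothetical helper: convert one article to RTF for the reader and
# write it into a chosen directory under the same base name.
# The body is wrapped in ( ) so its variables stay out of your shell.
to_reader() (
    src=$1                      # input file, e.g. ArticleToConvert.html
    dest_dir=$2                 # directory that should receive the .rtf
    base=$(basename "$src")     # drop any leading directories
    base=${base%.*}             # drop the extension
    textutil -convert rtf -font Times -fontsize 14 \
        -output "$dest_dir/$base.rtf" "$src"
)
```

Running to_reader ArticleToConvert.html ~/Books should then leave ArticleToConvert.rtf in ~/Books, assuming that directory already exists.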

You also do not need to save HTML files from your browser first. The -stdin option lets you work some further command line magic (by pairing textutil with curl) to convert your data directly from the web:
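
A minimal sketch of that pairing, assuming the page is plain HTML. The function name web_to_rtf is a placeholder of mine, and I pass -format and -output explicitly rather than rely on whatever textutil would pick by default for piped input.

```shell
# Hypothetical helper: fetch a page with curl and hand it to textutil on stdin.
#   $1  URL of the page to fetch
#   $2  name of the RTF file to write
# curl -s silences the progress meter and -L follows redirects.
# -format html tells textutil how to interpret the piped data.
web_to_rtf() {
    curl -sL "$1" | textutil -stdin -format html -convert rtf -output "$2"
}
```

With that defined, web_to_rtf http://example.com/article.html article.rtf (a placeholder URL) should fetch the page and write article.rtf in the current directory.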

Metadata Madness

textutil does its best to preserve file information, but you may not want to keep such data around or you may want to modify it in some way. The -strip option clears away all metadata while the -title, -author, -subject, -comment, -editor, and -company flags all take parameters that let you specify your own values for each field. You can add your own metadata keywords via the -keywords option and even modify the creation and modification dates through -creationtime and -modificationtime flags.
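
As a rough sketch of how these flags combine, the helper below drops whatever metadata the source carried and stamps a fresh title and author on the result. The name retitle is hypothetical, and I am assuming the explicit -title and -author values still apply when -strip is given, so check the output before trusting it.

```shell
# Hypothetical helper: convert to RTF, discarding the source's metadata
# and setting a new title and author on the output file.
#   $1  title    $2  author    $3  file to convert
# The body is wrapped in ( ) so its variables stay out of your shell.
retitle() (
    title=$1
    author=$2
    src=$3
    textutil -convert rtf -strip -title "$title" -author "$author" "$src"
)
```

For example: retitle "A Christmas Carol" "Charles Dickens" ChristmasCarol.html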

Unearthing textutil From the Command Line

While dropping into Terminal.app to do some conversions is fine, it would be easier for most users if there were a more accessible way to perform conversion tasks, especially routine ones. For this, we turn to the power of AppleScript and its ability to make Droplets, which are nothing more than applications that respond to specific events. Fire up Script Editor and enter the following code:

Save the script twice: once as a normal script (so you can edit it later) and once as an application (so you can use it as a Droplet). Now you have a handy tool onto which you can drop any number of files to batch convert them right from the Finder. You can customize this script to perform the transformations you need and create as many Droplets as you see fit.

You should check out Calibre. It's an open source eBook manager that can be used with the Sony to get items onto and off of it. It can also do some file conversion between different eBook formats: http://calibre.kovidgoyal.net/