~ A blog about the complex relation between computers and history


The sudden realization that the new MS Word format, .docx, is called Office Open XML for a reason made me spend a whole day trying to figure out how XSL transformations actually work, and whether they could be used to convert these new .docx files into something more edi(ta)ble.

It turned out that XSL transformations are, in principle, pretty simple to do, just like a friend of mine had told me. Here’s an example of how to convert a .docx file to LaTeX, in its crudest form:

First, you need to break open the .docx file. It is basically a plain zip archive, so an ‘unzip testdoc.docx’ should do the trick; you’ll end up with several files and sub-directories, of which only the directory called ‘word’ is needed for this test.
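
A stylesheet for this job can be sketched roughly as follows. Treat it as a minimal sketch under a few assumptions rather than a definitive version: it assumes the standard WordprocessingML namespace, it assumes the footnotes live in a file called ‘footnotes.xml’ next to ‘document.xml’ in the ‘word’ directory, and it looks only at plain runs of text, so hyperlinks, tables, headings and the like are silently dropped. It also does not escape LaTeX special characters such as % or &, and it treats any bold or italic marker as switched on, even if its w:val attribute says otherwise.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">

  <!-- We are producing LaTeX source, not XML -->
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Wrap the body of the Word document in a minimal LaTeX document -->
  <xsl:template match="/">
    <xsl:text>\documentclass{article}&#10;\begin{document}&#10;&#10;</xsl:text>
    <xsl:apply-templates select="w:document/w:body/w:p"/>
    <xsl:text>\end{document}&#10;</xsl:text>
  </xsl:template>

  <!-- Each Word paragraph becomes a LaTeX paragraph, followed by a blank line -->
  <xsl:template match="w:p">
    <xsl:apply-templates select="w:r"/>
    <xsl:text>&#10;&#10;</xsl:text>
  </xsl:template>

  <!-- A run of text: wrap it in \textbf and/or \textit if its run
       properties contain a bold or italic marker (assumption: the
       w:val attribute of w:b and w:i is not checked) -->
  <xsl:template match="w:r">
    <xsl:if test="w:rPr/w:b"><xsl:text>\textbf{</xsl:text></xsl:if>
    <xsl:if test="w:rPr/w:i"><xsl:text>\textit{</xsl:text></xsl:if>
    <xsl:apply-templates select="w:t | w:footnoteReference"/>
    <xsl:if test="w:rPr/w:i"><xsl:text>}</xsl:text></xsl:if>
    <xsl:if test="w:rPr/w:b"><xsl:text>}</xsl:text></xsl:if>
  </xsl:template>

  <!-- The actual text of a run, copied as-is (no LaTeX escaping) -->
  <xsl:template match="w:t">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- A footnote reference: look up the footnote with the same id in
       footnotes.xml (assumed to sit in the same directory as this
       stylesheet) and put its text inside \footnote{} -->
  <xsl:template match="w:footnoteReference">
    <xsl:variable name="id" select="@w:id"/>
    <xsl:text>\footnote{</xsl:text>
    <xsl:apply-templates
        select="document('footnotes.xml')/w:footnotes/w:footnote[@w:id = $id]/w:p/w:r"/>
    <xsl:text>}</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The run template uses a pair of xsl:if tests instead of separate templates for bold and italic runs, so a run that is both bold and italic does not match two competing templates.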

Save the stylesheet in a file called docxtolatex.xsl in the ‘word’ directory. Then, in that directory, run ‘xsltproc docxtolatex.xsl document.xml’, and the whole document will scroll past on your screen in LaTeX markup. Append ‘> testdoc.tex’ to the command if you would rather have it in a file.

You’ll notice that this XSLT only converts bold, italics and footnotes. But then again, that’s often all I need to convert…