Using PHP to Fill a Word Document (Quick Tip)

The problem
I needed to generate a Word document from PHP (more precisely, from CodeIgniter). The Word document acts as a template: it receives values from PHP and inserts them into predefined placeholders. The document must also be able to contain dynamically generated images.

Candidate solutions:

using an “export to .doc(x)” library (e.g. PHPLiveDocx, PHPWord) – this involves some complications: issues integrating with CodeIgniter, and the need to express the Word document markup in PHP (not good for a 10-page document, nor for template changes);

generating the Word document using COM – while promising, it involves requirements that aren’t always achievable: a Windows server with Microsoft Word installed on it; also, as Microsoft explains, this solution may cause slow processing on the server, due to the fact that Office wasn’t intended to be a server-side solution;

exporting to RTF – RTF is a documented format (the specification runs to more than 270 pages), so it should play nicely with PHP. After looking at some RTF documents containing images, I found out that (1) their size increased quite fast as new images were added and (2) the image format wasn’t easy to represent;

outputting (X)HTML, with Word headers and extension – a nice hack that gave me some hope. I used this idea in a different manner: I created a well formed Word document and then saved it as “Web Page, Filtered”. All went well until I bumped into this solution’s major drawback: inability to embed images – only links to them. Obviously, this affects the generated documents’ portability. Of course, I tried using Base64 encoded images; they were visible when I opened the document in a browser, but Word wasn’t able to display them. So I tried other hacks (like exporting to MHTML, or “Single File Web Page”), and serving that with Word headers and extension, but the result wasn’t good either.

My solution
While looking for solutions, I tried Microsoft’s Word 2003 XML format (also, check out the XML schemas (schemata?) for this format). I followed the same pattern:
– create a nicely formatted document,
– export it to Word 2003 XML,
– serve it from PHP with special headers and .doc extension.

It was pure bliss to find out that in this format the images are saved in a familiar base64 encoding (easily doable using PHP’s base64_encode()). Problem (almost) solved! “Almost” because now I have to generate the proper XML tags surrounding the embedded images – it only requires some research.

So, with only plain text you can have a properly formatted Word document with embedded images! … and PHP can handle plain text quite easily…

I tried opening the Word 2003 XML document in OpenOffice.org 3.1. Unfortunately, Writer wasn’t fooled by the .doc extension and opened the document as plain text. Only after I changed the document’s extension to .xml did the editor open it correctly. So, the documents are portable after all.

How does this look in my CodeIgniter context?

The View

First, I took the generated XML document, prepared it a bit and placed it into the application’s views folder. The “preparation” consisted in:
– changing the file’s extension from .xml to .php;
– making sure the file format is UTF-8 without BOM; the line ending I chose is CRLF, but you may try the non-Windows alternatives (CR or LF);
– reformatting the XML from a huge one-liner to a more readable form (Microsoft Expression Web does a great job at formatting XML/(X)HTML documents);
– removing the tab characters (\t) from the resulting document, so that they don’t mess up the output;
– changing the XML headings for them to be interpreted as intended:
from:

You may also try to use {templates} (just remember to load the template parser library beforehand: $this->load->library('parser');).
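Whichever placeholder style you use, values that end up inside the XML have to be escaped, otherwise a stray & or < makes the document invalid and Word refuses to open it. A minimal sketch, assuming PHP 7 or later; word_text() and the client_name key are hypothetical names used only for illustration, not part of CodeIgniter or of the original template:

```php
<?php
// Hypothetical helper (not part of CodeIgniter): escapes a value so it can be
// placed inside a Word 2003 XML text run (<w:t>...</w:t>).
function word_text(string $value): string
{
    // ENT_XML1 limits the output to XML's predefined entities;
    // ENT_QUOTES escapes single quotes as well as double quotes.
    return htmlspecialchars($value, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Example data; the key name is made up for illustration.
$doc_data = ['client_name' => 'Tom & Jerry <Ltd>'];

// What the placeholder in the view would print:
echo '<w:t>' . word_text($doc_data['client_name']) . '</w:t>', "\n";
```

In the view itself, the same call would sit between the opening and closing w:t tags of the placeholder.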

The Controller
I prepared the data to be sent to the view as usual. Before loading the view I added several headers, just to make sure everything goes OK:

// $doc_data is the array that is sent to the view
$filename = 'report' . date('dmY-His') . '.doc'; // 'H' is the 24-hour clock, so AM and PM runs get different names
header('Content-Type: application/xml; charset=UTF-8');
header('Expires: 0');
header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
header('Content-Disposition: attachment; filename="' . $filename . '"');
$this->load->view('templates/default', $doc_data);
return; // needed so that any redirect() after this line doesn't mess things up, as it did in my particular case

I hope this helps. Oh, did I mention that the same idea can be used to obtain formatted Excel spreadsheets (see xls-sample.xls)? :)
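For the spreadsheet case, here is a rough sketch of the same trick using the Excel 2003 XML (SpreadsheetML) format. The function name, the worksheet name and the sample rows are assumptions made for illustration; a workbook saved from Excel as “XML Spreadsheet 2003” contains additional styling elements that are left out here.

```php
<?php
// Hypothetical helper: turns an array of rows into a bare-bones
// Excel 2003 XML workbook (no styles, one worksheet).
function spreadsheet_xml(array $rows): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<?mso-application progid="Excel.Sheet"?>' . "\n"
         . '<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"'
         . ' xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">'
         . '<Worksheet ss:Name="Report"><Table>';

    foreach ($rows as $row) {
        $xml .= '<Row>';
        foreach ($row as $cell) {
            // Numbers get the Number type; everything else is a String.
            $type = (is_int($cell) || is_float($cell)) ? 'Number' : 'String';
            $xml .= '<Cell><Data ss:Type="' . $type . '">'
                  . htmlspecialchars((string) $cell, ENT_QUOTES | ENT_XML1, 'UTF-8')
                  . '</Data></Cell>';
        }
        $xml .= '</Row>';
    }

    return $xml . '</Table></Worksheet></Workbook>';
}

// Sample rows, invented for the example.
echo spreadsheet_xml([['Item', 'Qty'], ['Pens & pencils', 12]]);
```

Served with the same kind of headers as the Word document, but with an .xls extension, the output should open in Excel as a regular worksheet.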

P.S. A more maintainable and proper solution would be to use XML and XSLT (something like this, albeit the example refers to the Office Open XML format), but this might not be as quick as simply filling placeholders in a view.

Update (5 July 2010)
See a trivial demo here. Use it to generate a sample document; then open the document in Notepad or another plain text editor to inspect the XML.

Hello,
First you need to design your document (including formatting and page setup) using Word. After that you export the document to .xml (Word 2003 XML) and you do just as I described in this article. It’s pretty straightforward.
If you want to edit document properties afterwards look for the document properties in the .xml, they’re quite easy to figure out.

Sorry, but have you read and tried the approaches mentioned in the article? It was all about saving Word documents in a special xml format with a .doc extension. These documents can be opened by MS Word and OpenOffice.org. The alternatives are also mentioned in the article, so take your time, explore them, find out their strengths and weaknesses, and choose an appropriate solution for your concrete needs. I’m afraid that’s about all I can do for you.

First, you need to study a bit the way Word XML 2003 handles images. You’d have to create a simple document in this format, in which you should include an image. See how it’s nested in the XML hierarchy (i.e. tags like w:pict, w:binData, v:shape, v:imagedata). Then you need to output the image’s content in a base64 encoded form and to convert pixels to points for the appropriate attributes (height, width). Again, after a bit of studying there should be no problem.
Here’s a glimpse of what I mean:
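A minimal sketch of such a fragment, assuming PHP 7 or later and an image authored at 96 DPI. The helper names, the shape id scheme and the DPI value are assumptions, not the exact markup from the original template. A document exported by Word also carries a v:shapetype definition and a type attribute on v:shape, so compare the result against your own exported file.

```php
<?php
// Convert pixels to points: 1 inch = 72 points = $dpi pixels.
// The 96 DPI default is an assumption; adjust it to your images.
function px_to_pt(int $pixels, int $dpi = 96): float
{
    return $pixels * 72 / $dpi;
}

// Hypothetical helper: wraps raw image bytes in a Word 2003 XML picture.
// $name is assumed to contain only characters that are safe in an XML attribute.
function word_picture(string $bytes, string $name, int $widthPx, int $heightPx): string
{
    $src   = 'wordml://' . $name;
    $style = 'width:' . round(px_to_pt($widthPx), 2) . 'pt;'
           . 'height:' . round(px_to_pt($heightPx), 2) . 'pt';

    return '<w:pict>'
         // The image itself, base64 encoded.
         . '<w:binData w:name="' . $src . '">' . base64_encode($bytes) . '</w:binData>'
         // The shape that displays it, sized in points.
         . '<v:shape id="img_' . md5($name) . '" style="' . $style . '">'
         . '<v:imagedata src="' . $src . '"/>'
         . '</v:shape>'
         . '</w:pict>';
}

// Usage with placeholder bytes; in practice pass file_get_contents() of a real image.
echo word_picture('abc', 'logo.png', 96, 48), "\n";
```

The returned string goes inside a run (w:r) of the paragraph where the image should appear.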

I had a problem with the encoding of special characters, for example with ä (an umlaut, used in German). MS Office didn’t open the affected documents, so I had to do some conversions. It doesn’t work with your demo either.

Sorry for answering so late and thank you for pointing out this encoding issue. The demo was set up only to show how the “trick” described above works and didn’t go all the way to handle encodings and whatnot.
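One way to approach it, sketched under assumptions: since the view is served as UTF-8, every value passed to it should be valid UTF-8 as well. The helper below is hypothetical, relies on the mbstring extension, and guesses that non-UTF-8 input is ISO-8859-1, which may not hold for your data.

```php
<?php
// Hypothetical helper: returns $value as UTF-8. Bytes that are not valid
// UTF-8 are assumed to be in $fallback (ISO-8859-1 by default; an assumption).
function to_utf8(string $value, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

// "\xE4" is ä in ISO-8859-1; the result is the two-byte UTF-8 sequence.
echo bin2hex(to_utf8("\xE4")), "\n";
```

Running each value through such a conversion before it reaches the view, and escaping it for XML afterwards, should keep Word from rejecting the document.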