
Wednesday, February 13, 2013

PyRTF is a Python library that enables programmatic creation of RTF (Rich Text Format) documents. RTF files are compatible with Microsoft Word and many other leading word processors such as OpenOffice, and also compress well. PyRTF makes it fairly easy to generate RTF content programmatically, with many features such as sections, paragraphs, headers and footers, tables, etc.

Some years earlier I had done some interesting work with RTF using Java, as part of developing a product at a startup. The work basically involved reverse-engineering part of the RTF specification / format, and then writing custom Java code to generate RTF from the data in a J2EE application. The RTF files could be imported into MS Word and Adobe InDesign.
The code was written in such a way as to try to keep style and content separate, so that each could be varied independently. It worked, to an extent.

Below is a simplified version of examples.py from the PyRTF package; I modified examples.py to create only one simple file instead of 7 increasingly complex ones. Save this file as small_example.py in the examples subdirectory of the directory where you extract PyRTF:

4 comments:

Mark said...

PyRTF hasn't been updated since 2005, which, whilst not a problem, suggests you'd be pretty much on your own using it.

It turns out raw RTF isn't that hard to write, and the specs are readily available, as is a useful pocket guide. My app churns out RTF exports, potentially hundreds of pages long and with a fair bit of structure, by using an RTF Django template.

@Mark: Thanks for the tip about PyRTF not being updated nowadays. But yes, that does not prevent us from using it for stuff it already supports well.

Yes, raw RTF is not hard to write. In fact I mentioned in the post that I did that, in that Java project I worked on earlier. IIRC, I did look at the RTF spec but it was not too user-friendly, which is why I resorted to reverse-engineering the format by creating an RTF doc incrementally in Word (having first just a single letter as the content, then adding a word, a word in bold, then a paragraph, etc.) and then looking at it in a hex editor. This enabled me to figure out what characters were used as RTF markup for different types of content, such as a paragraph, bold text, italic text, etc. The rest was straightforward: just intersperse that markup as needed with the content pulled from the DB via Java.
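As a rough illustration of that "intersperse markup with content" approach, here is a minimal Python sketch. It is not the original Java code, and the helper names (escape_rtf, plain, bold, italic, paragraph, document) are hypothetical ones made up for this example. It assumes only a handful of standard RTF control words (\rtf1, \ansi, \deff0, \fonttbl, \b, \i, \par, \uN) and ignores everything else a real document would need, such as styles, tables, and page setup.

```python
def escape_rtf(text):
    """Escape RTF special characters and encode non-ASCII text as \\uN? escapes."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax, so they must be escaped.
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit value per UTF-16 code unit;
            # "?" is the fallback character for readers without Unicode support.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                unit = int.from_bytes(units[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append("\\u%d?" % unit)
    return "".join(out)


def plain(text):
    """An unformatted run of text."""
    return escape_rtf(text)


def bold(text):
    """A bold run: the braces limit \\b to this group."""
    return "{\\b " + escape_rtf(text) + "}"


def italic(text):
    """An italic run: the braces limit \\i to this group."""
    return "{\\i " + escape_rtf(text) + "}"


def paragraph(*runs):
    """Join already-formatted runs and end the paragraph with \\par."""
    return "".join(runs) + "\\par\n"


def document(*paragraphs):
    """Wrap paragraphs in a minimal RTF header with a one-entry font table."""
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n"
    return header + "".join(paragraphs) + "}"


if __name__ == "__main__":
    # In a real application the content would come from a database query.
    rtf = document(
        paragraph(plain("A word in "), bold("bold"), plain(" and one in "), italic("italics"), plain(".")),
        paragraph(plain("Braces {like these} and caf\u00e9 are escaped.")),
    )
    with open("raw_example.rtf", "w", encoding="ascii") as f:
        f.write(rtf)
```

Running the script writes raw_example.rtf to the current directory. Since the escaping step leaves only ASCII in the output, the file can be written with a plain ASCII encoding.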

Actually, I have just been looking into doing this too. Note that pyrtf-ng (http://code.google.com/p/pyrtf-ng/) is the old version. There is a newer version of the pyrtf-ng code base on Launchpad: https://launchpad.net/pyrtf