Un-riching Rich Text Format

Let’s just call it Legacy because I’m not going to say what the source is. It is a single-user desktop application that, after a little investigation (i.e. searching the app’s online forum), turned out to store its data in tables readable by MS Access. I wanted to get at this data, analyze it, maybe do some smart things to it, and then present it via APEX.

Step 1: Get the data into Oracle.

Simple matter of exporting from MS Access via ODBC. At least, it was simple once I replaced the Oracle ODBC drivers with the latest download from OTN. Before that I was getting a number of annoying TNS errors.

Step 2: Transform the data.

Most of the tables are easy-to-understand normalized relational tables. One of them, however, has a column that came through as a CLOB containing strange values like this:

Now, based on my knowledge of the application, I knew that this column was used to store small pieces of text (typically 8 to 30 short lines) with some amount of formatting (fonts, alignment, etc.). Again the online forum came in useful: a side comment from one of the developers (regarding a small bug, or rather an undocumented anti-feature) revealed that they stored the formatted text as RTF – Rich Text Format. I should have known from the opening 6 bytes in the data ({\rtf1).
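RTF documents begin with the six characters {\rtf1, so a quick check on the start of each value is enough to tell RTF apart from ordinary text. The snippet below is a minimal Python sketch of that check, not part of the PL/SQL described in this post; the names RTF_SIGNATURE and looks_like_rtf are made up for illustration.

```python
# Hypothetical helper: the names here are illustrative, not from the original code.
RTF_SIGNATURE = r"{\rtf1"  # the six characters every RTF document starts with


def looks_like_rtf(value: str) -> bool:
    """Return True if the value starts with the RTF signature."""
    return value.startswith(RTF_SIGNATURE)


print(looks_like_rtf(r"{\rtf1\ansi Some text\par}"))
print(looks_like_rtf("Just some plain text"))
```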

Somewhere in these oceans of RTF codes was swimming the plain text I craved. So I Googled RTF, skimmed this old RTF specification, and ended up with this admittedly poorly-performing PL/SQL which, for the 651 rows in this table, each holding an RTF value averaging 3KB, works just well enough for my purposes. As it turned out, the only RTF codes I was interested in were \fcharset and \*, both of which I used to ignore bits of text I didn’t want in my output. Oh, and \par, which denotes the end of a paragraph. I can run this script once a month on the freshly exported data and apply the full weight of Oracle’s analytic capabilities to it.
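To make the approach concrete, here is a minimal Python sketch of the same idea. It is an illustration under assumptions, not the PL/SQL this post refers to: it treats any group containing \* or \fcharset as text to ignore, turns \par into a line break, and drops every other control word. The function name rtf_to_text, the cp1252 decoding of \'hh escapes, and the sample value are all made up for the example. Other RTF destinations (colour tables, for instance) would need similar handling if they occur in your data.

```python
import re

# One token per match: control word, hex escape, control symbol, brace, or text.
# A control word may carry a numeric argument and one trailing delimiter space.
TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"   # control word, e.g. \par or \fcharset0
    r"|\\'([0-9a-fA-F]{2})"      # hex-escaped character, e.g. \'e9
    r"|\\(.)"                    # control symbol, e.g. \* or \{ or \\
    r"|([{}])"                   # group open / close
    r"|([^\\{}]+)",              # run of plain text
    re.DOTALL,
)


def rtf_to_text(rtf: str) -> str:
    """Strip RTF markup, keeping plain text and paragraph breaks."""
    out = []
    saved_flags = []   # "ignoring" flag of each enclosing group
    ignoring = False   # True while inside a group we want to skip
    for match in TOKEN.finditer(rtf):
        word, _arg, hexcode, symbol, brace, text = match.groups()
        if brace == "{":
            saved_flags.append(ignoring)
        elif brace == "}":
            if saved_flags:
                ignoring = saved_flags.pop()
        elif symbol == "*" or word == "fcharset":
            ignoring = True            # skip the rest of this group
        elif ignoring:
            continue
        elif word == "par":
            out.append("\n")           # end of paragraph
        elif hexcode:
            # Assumption: 8-bit escapes are Windows-1252 characters.
            out.append(bytes.fromhex(hexcode).decode("cp1252", errors="replace"))
        elif symbol in ("\\", "{", "}"):
            out.append(symbol)         # escaped literal character
        elif text:
            # Raw line breaks in RTF source are not part of the text.
            out.append(text.replace("\r", "").replace("\n", ""))
    return "".join(out)


# Made-up sample value, not taken from the real data.
sample = (
    r"{\rtf1\ansi\deff0{\fonttbl{\f0\fswiss\fcharset0 Arial;}}" "\r\n"
    r"{\*\generator SomeEditor 1.0;}\viewkind4\uc1\pard\f0\fs20 "
    r"Hello \b world\b0\par" "\r\n"
    r"Caf\'e9 \{ok\}\par" "\r\n"
    r"}"
)
print(rtf_to_text(sample))
```

The font name (Arial;) and the generator string never reach the output because the flag set by \fcharset or \* stays on until the enclosing group closes, at which point the flag saved for the parent group is restored.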

The code below exemplifies the use of a pipelined function. This is not a good idea, by the way, if you want to use it in regular queries, e.g. a view. In my case, however, I only wanted to call this from within PL/SQL, and then only once a month. Because of the way parameters work with functions like this, I had to call it with dynamic SQL (execute immediate).

That was kind of fun, but I’d rather not have to deal with RTF ever again, thank you.