If you are using any word processor or editor in a group situation,
such as a technical writing team, or an office, then it will probably
be in your interest to set up templates for authors to use to ensure
consistency, reduce effort, and help automate conversion of
documents between formats, such as building web pages from office
documents. If you are also trying to store and manipulate content in
XML but want to use a word processing environment for authoring, then
well-crafted templates are even more important.

In this article, I'm going to explore some of the ways that OpenOffice.org's Writer application (I'm
using version 1.1.2 on Linux and 1.1.3 on Windows XP) is open to
customization and configuration. I'll walk through some of the
techniques I used to set up the first templates I built with the
application in my quest for an interoperable, XHTML-ready system of
templates and styles which will work across Microsoft Word and
Writer.

Here are four techniques you might like to use if you are
maintaining templates: (a) using an unzip tool to rip open the Writer
file format and get at the parts, (b) using XSLT to automate production
of a large set of styles, (c) adding a keyboard-accessible menu to
apply those styles, and (d) automatically generating a number of macros
to help in (c). I will illustrate the techniques using my application,
but they are easily adaptable to other situations.

If everyone's going to write for the web (and it looks like a lot of people are going to), we need the web equivalents of Word Perfect and Wordstar and Xywrite and Microsoft Word, and we need them right now.

Some discussion flowed around this, with some claiming that
OpenOffice.org is an adequate solution right now (see Tim's addition
to his page) and others speculating that a new application may be
required. A wiki even appeared in
which the issue could be discussed. I joined in the
discussion and decided that OpenOffice.org and Word are both part
of the solution until something better comes along, kicking off a
project to create configuration layers for both Word and
OpenOffice.org as general purpose XHTML editors for generic
documents. In the course of the work, I have come to appreciate the
open, XML-based goodies in OpenOffice.org, which is just as well,
because I would not like to customize it using the
graphical user interface (GUI) or deal with its macro language.
I do look forward to decent Python scripting, though, which appears to
be on the way.

OpenOffice.org is an open source office suite, which includes a
pretty decent word processor, Writer. Like any decent word processor,
it has a number of customization options, and like any software, it
has its own set of strengths and weaknesses. It does have a
customizable XSLT stylesheet that can be used to generate XHTML from
any word processing document, but this produces far from ideal output
unless you go to some lengths to customize it, as it is simply
impossible to produce sensible mappings from word processing
documents to XHTML in all cases. Templates are a necessity to enable
authors to work with a set of styles that will map to XHTML. (Another
major issue is that unless you actually run the XHTML export
stylesheet manually after you have saved the document in the normal
way and extracted the content, you do not even get access to the
images in your document. So at this stage, I consider the XHTML export
to be a work in progress.)

Hack 1: Unzipping and Manipulating Writer Files

Let's start with the basics: the file format. You can read about it
in detail in a forthcoming O'Reilly title, which is available online in
draft, "OpenOffice.org XML Essentials: Using OpenOffice.org's
XML Data Format". We're only concerned with the Writer
application here, rather than spreadsheets and suchlike. We will be
dealing with Writer documents (.sxw) and Writer
templates (.stw), mostly the latter. These files are both
actually ZIP files containing all your document data, with all the
configuration and textual content in XML.

Three Ways to Unzip Files Using Windows

On a Windows XP system, you can use the built-in zip function by changing the name of an OpenOffice.org file to end in .zip, at which point you can right-click on the file to explore it as though it is a directory or extract it to another location.

Or use any old zip application, possibly adding to the file associations so that you can unzip OO.o files with a right-click.

An approach I like is to grab the UnxUtils utilities, which are ports of GNU utilities to Windows.

Download UnxUtils.zip, unpack it to C:\Program Files\unxutils, then add the path to the binaries, C:\Program Files\unxutils\usr\local\wbin, to your system path.

This gives you a selection of GNU staples for use on Windows, very handy for people like me who keep typing ls in Windows instead of dir, not to mention being able to use zip and unzip from the command line exactly as I have shown in this article.

First hack, a quick exercise:

Create a new OO.o text document and type in it, something like "Hello world".

Save your new one-paragraph epic as test.sxw.

Unzip the content. On a Unix-esque system (Windows users, see the sidebar), you can probably type this:
unzip -d test test.sxw

And you will be rewarded with some component files in a directory called "test":

Open up the content.xml file in a text editor or an XML editor. This is where the, um, content of your document is kept.

Ignore everything except the part you just created. You'll find it in a text:p element, which is what? A paragraph.

Duplicate your "Hello world" paragraph.

Save content.xml

And re-zip it back together as an OpenOffice.org document, possibly by changing into the test directory and typing zip -r ../newdoc.sxw * to give you a new document called newdoc.sxw.

If you have been careful not to break the document, then you will have a new Writer document with "Hello world" in it twice.
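The same unzip, edit, re-zip cycle can also be scripted. What follows is a minimal Python sketch rather than part of the procedure above: the function names are made up for illustration, and it assumes the paragraph elements appear in content.xml under the text:p qualified name, as in the file you just inspected. It works on the package in memory and leaves every other file in the package untouched.

```python
import io
import zipfile
from xml.dom import minidom


def paragraph_text(node):
    """Concatenate all the text found inside a DOM node."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(paragraph_text(child) for child in node.childNodes)


def duplicate_paragraph(content_xml, needle):
    """Return content.xml (bytes) with the first text:p containing needle duplicated."""
    doc = minidom.parseString(content_xml)
    for para in doc.getElementsByTagName("text:p"):
        if needle in paragraph_text(para):
            clone = para.cloneNode(True)
            # Insert the copy directly after the original paragraph.
            para.parentNode.insertBefore(clone, para.nextSibling)
            break
    return doc.toxml(encoding="UTF-8")


def hack_sxw(sxw_bytes, needle):
    """Rebuild a Writer package in memory with one paragraph duplicated."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(sxw_bytes)) as zin, \
            zipfile.ZipFile(out, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "content.xml":
                data = duplicate_paragraph(data, needle)
            # Passing the original ZipInfo keeps each entry's name,
            # order, and compression method.
            zout.writestr(item, data)
    return out.getvalue()
```

To try it, read test.sxw in binary mode, pass the bytes and the string "Hello world" to hack_sxw, and write the returned bytes to newdoc.sxw.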

Now you're hacking OpenOffice.org. Why? You might like to automate
some kinds of document processing, create documents, or in an extreme
situation, make changes to a document when you don't have a copy of
OpenOffice.org. Try that with a Word ".doc" file! (Actually,
don't. See my previous article on
how to turn Word documents into XML.)

Hack 2: Adding Styles to a Template

Next step is to do some real work, this time on a template. We're
going to make a whole lot of styles. A style is a named set of
formatting instructions, so you can make parts of your document look
and function alike with the application of a single named label,
rather than having to laboriously hand-format each part of the
document. Instead of having to remember that all your headings are 18-point Helvetica, you assign a heading style to each and let the machine
format them for you. This is (a) lazier, (b) easier to change when
Helvetica goes out of fashion, (c) going to let you build a table of
contents simply by harvesting anything labeled as a heading, (d) going
to make generating XHTML easy, and (e) highly recommended.

So here's the spec for this application, where we want to transform
Writer documents into XHTML. We need styles for headings, ordered and
unordered lists with different flavors of numbering, block-quote
styles for quoting blocks of text at different levels of indenting, and
paragraphs that can be nested to continue a list item. Using these
styles, we will be able to reliably create XHTML documents from both
Microsoft Word and OpenOffice.org in a fairly consistent manner. Word
processors are really only good at flat sequences of paragraphs, but
we can use well-designed styles to create nesting for XHTML.

Family            Type                       Style names, by level
                                             1      2      3      4      5
----------------  -------------------------  ---------------------------------
Paragraph (p)                                p
Heading (h)                                  h1     h2     h3     h4     h5
Heading (h)       Numbered (#)               h1#    h2#    h3#    h4#    h5#
List item (li)    Numbered (#)               li1#   li2#   li3#   li4#   li5#
List item (li)    Bullet (*)                 li1*   li2*   li3*   li4*   li5*
List item (li)    Uppercase Alpha (A)        li1A   li2A   li3A   li4A   li5A
List item (li)    Lowercase Alpha (a)        li1a   li2a   li3a   li4a   li5a
List item (li)    Lowercase Roman (i)        li1i   li2i   li3i   li4i   li5i
List item (li)    Uppercase Roman (I)        li1I   li2I   li3I   li4I   li5I
List item (li)    Continuing paragraph (p)   li1p   li2p   li3p   li4p   li5p
Blockquote (bq)                              bq1    bq2    bq3    bq4    bq5
Definition List   Term (dt)                  dt1    dt2    dt3    dt4    dt5
Definition List   Description (dd)           dd1    dd2    dd3    dd4    dd5

I will leave detailed discussion of how this mapping from list
styles to XHTML will be done for another time, but I do provide a
couple of examples here so you get the flavor. The items in brackets
are the style names that you would use in the word processor. The
example would look pretty much the same in OO.o as it does here in
XHTML, give or take a bit; check out the source of this page to see the
HTML:

(Style: li1*) A list bullet

(li1*) And another

    (li2#) And a numbered item

    (li2p) With a follow-on paragraph

    (li2#) And another numbered item

(li1*) And another list item introducing a quote:

    (bq2) From somebody else.
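To give a flavor of how a flat run of styled paragraphs can become nested XHTML, here is a minimal Python sketch. It is not the conversion used in this project, which is left for another time. The regular expression, the to_xhtml function, and the rule that a deeper level nests inside the item before it are all assumptions made for illustration, and the definition-list styles are left out.

```python
import re
from xml.sax.saxutils import escape

# family, optional level digit, optional type marker (see the table above)
STYLE = re.compile(r"^(li|bq|h|p)(\d?)([#*AaIip]?)$")
LIST_TAGS = {"*": "ul", "#": "ol", "A": "ol", "a": "ol", "I": "ol", "i": "ol"}


def to_xhtml(paragraphs):
    """Turn a flat list of (style name, text) pairs into nested XHTML."""
    out = []
    stack = []  # open containers, innermost last, as (level, tag)

    def close_top():
        level, tag = stack.pop()
        out.append("</blockquote>" if tag == "blockquote" else "</li></%s>" % tag)

    def close_down_to(level):
        # Close every container that is deeper than the given level.
        while stack and stack[-1][0] > level:
            close_top()

    def open_container(level, tag):
        # A different container at the same level is closed first.
        if stack and stack[-1][0] == level and stack[-1][1] != tag:
            close_top()
        if not stack or stack[-1][0] < level:
            stack.append((level, tag))
            out.append("<%s>" % tag)
            return True
        return False

    for style, text in paragraphs:
        match = STYLE.match(style)
        if match is None:
            raise ValueError("unknown style: %r" % style)
        family, kind = match.group(1), match.group(3)
        level = int(match.group(2) or 0)
        text = escape(text)
        if family == "li" and kind == "p":
            # Continuation paragraph: stays inside the open list item.
            close_down_to(level)
            out.append("<p>%s</p>" % text)
        elif family == "li":
            close_down_to(level)
            opened = open_container(level, LIST_TAGS[kind])
            out.append(("<li>" if opened else "</li><li>") + text)
        elif family == "bq":
            close_down_to(level)
            open_container(level, "blockquote")
            out.append("<p>%s</p>" % text)
        else:
            # Plain paragraphs and headings end all open lists and quotes.
            close_down_to(0)
            tag = "p" if family == "p" else "h%d" % level
            out.append("<%s>%s</%s>" % (tag, text, tag))
    close_down_to(0)
    return "".join(out)
```

Fed the seven paragraphs of the example above, this produces a single ul whose second item contains the ol and whose last item contains the blockquote.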

(These style names have been chosen for their brevity, regularity,
and the fact that they do not overlap with built-in or "standard"
styles in either OO.o or Word, making the job of converting between
formats simpler.)
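Because the names are so regular, the whole set can be generated rather than typed. The following Python sketch simply rebuilds the table above from its families, types, and levels; the function and constant names are invented for this example.

```python
LEVELS = range(1, 6)
# Type markers for list items, as in the table: numbered, bullet,
# upper/lower alpha, lower/upper roman, continuing paragraph.
LIST_TYPES = ["#", "*", "A", "a", "i", "I", "p"]


def style_names():
    """Return every paragraph style name from the table, in table order."""
    names = ["p"]
    names += ["h%d" % n for n in LEVELS]
    names += ["h%d#" % n for n in LEVELS]
    for kind in LIST_TYPES:
        names += ["li%d%s" % (n, kind) for n in LEVELS]
    for family in ("bq", "dt", "dd"):
        names += ["%s%d" % (family, n) for n in LEVELS]
    return names


if __name__ == "__main__":
    print(len(style_names()))  # 61 names in all
```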

That's a lot of styles to set up using the point'n'click method,
way too much like work for me, so my approach was to create a
blank template, open it up to see how it worked, and then use XSLT to hack
the styles.xml inside a Writer template file
(.stw) which contains, you guessed it, definitions of the
styles for this template. I did create the heading and
plain-paragraph styles by hand using the GUI, but the lists were too
fiddly to do that way.

For this part of the exercise, we are going to be operating on a
template rather than a document. To get a template:

Open a blank document in OO.o.

From the File menu, select Save As.

From the "Save as type" drop-down, select "OpenOffice.org 1.0 Template".

Type a name, template, and the result will be a new file called template.stw.

Unzip the template into a directory called template (unzip -d template template.stw).

To add styles, we want to transform styles.xml using the add-styles.xsl stylesheet supplied with this article.

Copy styles.xml to old-styles.xml.

On my Fedora Core 2 Linux machine, the transformation is a matter of typing:

xsltproc --novalid add-styles.xsl old-styles.xml > styles.xml

See the sidebar for advice about how to run transforms using Windows.

Using XSLT from the Command Line on Windows or Elsewhere Using Java

The hardest part of writing this article was finding a simple way to use XSLT from the command line on Windows. The most promising candidate is called nxslt, which uses .NET (really easy to install using Windows Update), but for some reason it doesn't work for these OpenOffice.org hacks. So my best recommendation, if you don't want to go through a great deal of mucking around, is to take the advice in this xml.com article and use Saxon, which these days means that you need to get yourself a recent Java runtime environment, probably from Sun. I navigated that maze, then downloaded Saxon 6.5.3, unzipped it into c:\Program Files\saxon, and was able to run stylesheets like so (adjust all the paths as required):

No guarantees, but if you take this option, then all you need to do
is reverse the order of the parameters in the examples here; input
document first rather than stylesheet.

If you want to check out the result, then skip
ahead to the part where you re-constitute a template.

Unfortunately Saxon does not have an option to turn off
validation in the source file. You will need to figure out how to
get it to see the DTD files, possibly by the brute force approach of
copying them into wherever you're working. Failing that, simply
remove the DOCTYPE declaration from the source file to stop Saxon
from looking for it, then put it back in the result. (We didn't call
this article "Hacking Open Office" for nothing.) That is, cut and
paste this bit:
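The same cut and paste can be scripted. This is a minimal Python sketch under a few assumptions: the file has at most one DOCTYPE declaration, the result of the transform should carry the same declaration as the source, and the function names are made up for the example.

```python
import re

# Matches a DOCTYPE declaration, with or without an internal subset,
# plus any whitespace that follows it.
DOCTYPE = re.compile(r"<!DOCTYPE[^>\[]*(\[.*?\])?\s*>\s*", re.DOTALL)
XML_DECL = re.compile(r"\s*<\?xml[^>]*\?>\s*")


def strip_doctype(xml_text):
    """Return (text without the DOCTYPE, the removed declaration)."""
    match = DOCTYPE.search(xml_text)
    if match is None:
        return xml_text, ""
    return xml_text[:match.start()] + xml_text[match.end():], match.group(0)


def restore_doctype(xml_text, doctype):
    """Put a saved declaration back, after the XML declaration if there is one."""
    match = XML_DECL.match(xml_text)
    position = match.end() if match else 0
    return xml_text[:position] + doctype + xml_text[position:]
```

Run strip_doctype on old-styles.xml before handing it to Saxon, then run restore_doctype on the transformed styles.xml with the declaration that was saved.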

The first thing we need to do is to add style definitions. We do
this by finding the beginning of the place where the outline styles are
defined, using a template with an appropriate match
attribute, and slip in some other styles first.

This calls a named template make-styles, which takes as
parameters the family and type of style, as set out in the table
above. This template is used recursively to generate five levels of
style definition.

The recursion starts with a default level parameter of 5, and then
it calls itself, passing $level - 1 to the level
parameter until at $level = 0 it stops. The result is the
same as a construct like a for-loop.
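For readers who do not think in XSLT, the recursion can be mirrored in Python. This is only an illustration of the counting-down pattern, not a translation of the stylesheet: the emitted style:style element is cut down to its name and family, and emitting each level before recursing is an assumption.

```python
from xml.sax.saxutils import quoteattr


def style_element(family, kind, level):
    """Build one (much simplified) paragraph style definition."""
    name = "%s%d%s" % (family, level, kind)
    return '<style:style style:name=%s style:family="paragraph"/>' % quoteattr(name)


def make_styles(family, kind, level=5):
    """Recursive version: handle this level, then call again with level - 1."""
    if level == 0:
        return []  # the recursion stops here
    return [style_element(family, kind, level)] + make_styles(family, kind, level - 1)


def make_styles_loop(family, kind, levels=5):
    """The equivalent for-loop, counting down from the top level to 1."""
    return [style_element(family, kind, level) for level in range(levels, 0, -1)]
```

Both functions return the same five definitions; XSLT 1.0 has no counting loop, which is why the stylesheet has to use the recursive form.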

This has an xsl:choose to select different formatting
for different families of paragraph style. Bulleted and numbered
styles don't get any formatting in this part, as their indenting and
so on is set further down in the named
template make-lists.

Hint: You can do a lot with OpenOffice.org via
experimentation; use the GUI to set up some styles, save the
document, and have a peek inside to see what happens. Then you can
extract the relevant bits and use them in stylesheets or other
code.

Writer not only has styles for paragraphs and sub-paragraph
text-spans, but it has separate styles for lists. This can cause a few
headaches, because the correspondence between the two is a bit
fluid. You can link a paragraph style to a list style, but that does
not prevent you from later choosing a different list style. And more
problematically, each list can have multiple levels. (Yes, I have
heard of conditional styles, and no I don't think they will help in
this case).

For the project I'm presenting here, the two goals are to (a)
inter-operate with Microsoft Word, via Word .doc files,
which Writer is fairly good at reading and writing, and (b) create a
template that can later be used to create good-quality XHTML. OO.o's
list styles will cause problems for Word, which has a tighter mapping
between list levels and paragraph styles and a looser way of
combining them. There will also be trouble when creating XHTML. The
problem is that in 'normal' use of OO.o, it is very easy to end up with
paragraphs that are not formatted exclusively with styles. For
example, if you want to mix unordered lists and blockquotes, then you
could end up with a very complex set of interactions between list and
paragraph styles and custom formatting that a stylesheet may not be
able to reliably decode.

So, my approach is to try to work with a one-to-one mapping between
paragraph styles and list styles. This is a compromise, but it means
that authors can work with paragraph styles exclusively. This is achieved by creating a list for each of our paragraph styles that has
bullets or numbering and then setting all the levels in that list to
have the same formatting, so that it does not matter if they
inadvertently get changed.
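As a rough picture of what "the same formatting at every level" means in styles.xml, here is a Python sketch that writes out one list style with ten identical levels. It is a simplified assumption of the markup, not the output of the stylesheet: the indenting properties a real list level carries are omitted, the bullet character and the "." suffix are arbitrary choices, and the function name is invented.

```python
from xml.sax.saxutils import quoteattr

# Numbering formats for the type markers used in the style names.
NUM_FORMATS = {"#": "1", "A": "A", "a": "a", "I": "I", "i": "i"}
WRITER_LEVELS = 10  # Writer lists have ten levels


def make_list_style(name, kind):
    """Build a list style whose ten levels are all formatted alike."""
    parts = ["<text:list-style style:name=%s>" % quoteattr(name)]
    for level in range(1, WRITER_LEVELS + 1):
        if kind == "*":
            parts.append(
                '<text:list-level-style-bullet text:level="%d" '
                'text:bullet-char="\u2022"/>' % level)
        else:
            parts.append(
                '<text:list-level-style-number text:level="%d" '
                'style:num-format="%s" style:num-suffix="."/>'
                % (level, NUM_FORMATS[kind]))
    parts.append("</text:list-style>")
    return "\n".join(parts)
```

Because every level is identical, promoting or demoting a paragraph within the list changes nothing the author can see, which is the point of the compromise.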

Working with Lists in Writer

When you have the insertion point inside a list, two things happen that you need to be aware of:

To the right of the 'Object Bar' (toggle it on and off under View / Toolbars / Object Bar to see what I mean), a left-facing arrow will appear.

Click the arrow, and the toolbar is replaced with a list-specific
set of buttons for changing the level of list items within the list and
restarting numbering.

You will need the Restart Numbering button, though, to force
numbering to restart when you begin a new instance of a list. You may
like to use View / Toolbars / Customize to add the restart numbering
button to the main object bar (it's under the Numbering category).

An item will appear in the status bar, bottom right of your
Writer window, for example, Level 1 : li1*.

I have designed all the list styles presented in this article to
have the same formatting at all levels, so clicking the level-changing
buttons will have no visible effect.

Finally, sometimes applying a paragraph style that is linked to a
list style does not have the desired effect. In this case, you may need
to click on the Numbering On/Off or Bullets On/Off buttons a
couple of times to clear an existing list.

The final step in this hack is to re-constitute
the template. Zip the contents of the directory back into a template:
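If you would rather script this step than use a command-line zip tool, the following Python sketch packs a directory back into a package. The function name and the example paths are hypothetical, and the target file should live outside the directory being zipped so that it does not end up inside itself.

```python
import os
import zipfile


def zip_template(directory, target):
    """Zip everything under directory into target, with paths relative to directory."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as package:
        for root, dirs, files in os.walk(directory):
            dirs.sort()  # walk subdirectories in a predictable order
            for filename in sorted(files):
                path = os.path.join(root, filename)
                # Names inside a ZIP file always use forward slashes.
                arcname = os.path.relpath(path, directory).replace(os.sep, "/")
                package.write(path, arcname)
    return target
```

For example, zip_template("template", "newtemplate.stw") would rebuild a template from the directory unzipped earlier.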

An alternative technique you might like to consider is to import
styles from your new template into an existing one--meaning you could
maintain several templates containing discrete sets of styles (lists,
headings, character styles). To import, use Format / Style / Load, and
browse to a file. You can select which kinds of styles to import and
whether to overwrite existing ones.

We have now covered two techniques for OpenOffice.org customization: unzipping documents and templates and adding styles by hacking the
styles file.

Hack 3: Adding a Styles Menu with Keyboard Shortcuts

Now that we have a new template, it is possible to apply the new
paragraph styles using the 'Stylist' (hit F11 to toggle
it on and off), and largely ignore the list styles unless you get into
trouble. But applying styles in OO.o is painful. There's no simple way
to map styles to keystrokes, and even the Stylist does not let you use
the keyboard to help select the style you want. The next stage is to
show how you can add a new 'Style' menu to the application, with
keyboard shortcuts.

The first problem is that there is no way to add a style to a
menu. First we have to add a macro and then call that from the
menu. And not just one macro--we need a macro for each and every
style. Fortunately, we can automate this process. We will tackle the
problem by starting with the menu, then using the menu to generate the
required macros. This approach means that if you want to hand-code all
or part of a menu, you can still use the stylesheet here to generate
macros for each style mentioned in the menu.

This is what the new menu will look like:

A new styles menu with keyboard access via ALT key combinations.

OO.o has a configuration system for changing menus. It is very hard
to use, and poorly implemented, so we will spend as little time in it
as possible. All we need to do is make one small change to the main
menu, and OO.o will save it as XML in the configuration directory, at
which point you can grab it and hack it using XSLT, or manually add to
it.

Open Writer.

From the Tools menu choose Configure.

Click the Menu tab.

Hit the New Menu button.

Close the dialog box.

What you have just done is make a change to OO.o's configuration,
which it will write out into a configuration directory. Where that is
will depend on your operating system. To find out where:

From the Tools menu, choose Options.

Under OpenOffice.org, in the list of categories at the left, select Paths.

Find and note down the path for User configuration.

Close OO.o completely, including the quick start application it leaves in the Windows system tray.

Find the user configuration directory you just wrote down, and there should be a file called menubar.xml.

I have supplied a sample stylesheet that
works in a way that is very similar to the setup stylesheet covered
earlier. It generates a hierarchical menu of each of the families of
styles, adding them to the old menu bar and spitting out a new
menu bar. Parts of this are hard-coded to provide the menu hierarchy,
but there are recursive parts to handle the repetition involved in
creating all those macro calls. Here is a fragment of the stylesheet
that creates the menu for 'li' styles; there is a level
parameter used here as in the previous stylesheet.

This stylesheet is designed to load, via document(), a
data file (wp-interop-styles.xml)
containing names for all the character or sub-paragraph text
styles. I generated this list by grabbing all such element names from
the XHTML recommendation and putting them into an XML data file. (To
use this as-is, you will have to either set these styles up by hand,
download the latest sample
template from my web site, or add to the setup stylesheet covered
earlier.)

The ~ character is used to indicate the appropriate keyboard shortcut.
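Picking shortcut letters by hand gets tedious when a menu has many entries, because each letter can only be used once per menu. Here is a small Python sketch of one way to assign them automatically; the greedy first-free-character rule and the sample labels are assumptions for illustration, not what the sample stylesheet does.

```python
def add_shortcuts(labels):
    """Put a ~ before the first letter or digit not yet taken in this menu."""
    used = set()
    result = []
    for label in labels:
        for index, char in enumerate(label):
            key = char.lower()
            if key.isalnum() and key not in used:
                used.add(key)
                label = label[:index] + "~" + label[index:]
                break
        # A label whose characters are all taken is left without a shortcut.
        result.append(label)
    return result


if __name__ == "__main__":
    for entry in add_shortcuts(["Lists", "Links", "List continuation"]):
        print(entry)
```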

Now we have a new menu for OO.o which will always be visible. If you start up Writer (remember to shut down OpenOffice.org completely first), you will be able to point and click to apply styles or use the keyboard, starting with ALT-S and then hitting the underlined characters to delve into the menus (at least you will once we install the macros needed to apply styles).

Hack 4: Generating Macros

The final stylesheet uses the menubar we just generated as its
input and creates a text-output (not XML) that can be pasted into
OO.o's macro editor.
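The same idea, reading the menu and writing out one macro per entry, can be sketched in Python. Treat everything specific here as an assumption rather than a description of the real stylesheet: that menu entries are menu:menuitem elements whose menu:id starts with "macro:", that the macro name is the last dotted part of that reference, that the menu label (minus the ~) is the style name, and that setting ParaStyleName on the view cursor is an acceptable way to apply a style from Basic.

```python
from xml.dom import minidom

BASIC_SUB = (
    "Sub %(macro)s\n"
    "    ' Assumed approach: set the paragraph style at the view cursor.\n"
    "    ThisComponent.getCurrentController().getViewCursor()"
    ".ParaStyleName = \"%(style)s\"\n"
    "End Sub\n"
)


def macros_from_menubar(menubar_xml):
    """Return Basic source text with one Sub per macro-calling menu item."""
    doc = minidom.parseString(menubar_xml)
    subs = []
    seen = set()
    for item in doc.getElementsByTagName("menu:menuitem"):
        target = item.getAttribute("menu:id")
        if not target.startswith("macro:"):
            continue  # built-in commands need no macro
        # Last dotted component, without any trailing parentheses.
        macro = target.rstrip("()").rsplit(".", 1)[-1]
        style = item.getAttribute("menu:label").replace("~", "")
        if macro not in seen:
            seen.add(macro)
            subs.append(BASIC_SUB % {"macro": macro, "style": style})
    return "\n".join(subs)
```

Style names such as li1# are not legal Basic identifiers, so the macro name has to come from somewhere other than the style name itself; here it is whatever the menu entry already points at.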

You should now be looking at a tree control, showing the various documents and templates you have open.

Click on new template / Standard to select the standard library of macros. This will probably be empty depending on your starting setup.

Click New, to create a new module, and name it WPInteropStyles.

You should now be looking at a macro-editor window.

Paste the contents of macros.txt into the macro editor, replacing all the boilerplate code that's in there.

Clickety-click your way out of the macro editor, and save the template.

A bit of detective work will show you where the macros live within the file format once you save. Hint: look in META-INF/manifest.xml to see where your macros are stored within the Writer file-package.

In this article, I have covered a few techniques that will be of
interest to template maintainers working with OpenOffice.org Writer:
how to crack open the file format, how to maintain large sets of
styles, and how to customize menus and macros, all without using anything
except standard tools: zip, an XSLT processor, and a text editor. All
this can, of course, be further automated with a programming language of
some kind, even a batch file. There are some changes coming in version
2 of OpenOffice.org, but all these techniques will be forwards
compatible, although some things like the location and name of the
menu-bar files look like they will change.