My report on OOXML and ODF

Disclaimer: Work on this in the Norwegian government has
been going on for years. I worked on this for four months, producing
a 45-page report. This blog posting oversimplifies most of the way
through in the interests of brevity.

The full report is here,
and if you can read Norwegian you can post your feedback in the form
on that page.

Ever since ODF and OOXML burst onto the scene in ISO SC34 I've
tried to avoid getting pulled into the mess. I was quite successful at
this for several years, until one day one of our managers at Bouvet suggested we bid for a
contract to write a report for the Norwegian government (strictly
speaking, the Agency for Public
Management and eGovernment (Difi)). The report was about whether
to recommend/require ODF and/or OOXML in the Norwegian public
sector. I couldn't come up with any valid excuses for not doing it,
and so we sent in a bid, and in the end won the contract.

The context of the report is that in Norway the government issues
a reference
catalogue listing the standards that the public sector is required
or recommended to use within various usage areas. Earlier versions of
the catalogue required use of ODF in two usage areas, and listed OOXML
as being "under observation". So for publication of editable documents
on public web sites ODF has been the required format in Norway for a
while now. (Note the word "editable"; documents which are not meant to
be edited by the recipient must be published in either HTML or PDF.)

My task was to take into account recent developments in the field
and make recommendations for how the report should be updated with
regards to four specific areas of usage. Basically, this meant
following up the "under observation" part. Should the catalogue also
make OOXML required or recommended for some of these areas? Or
something else entirely?

Method

Approaching a task like this was not easy. What recommendations
would make sense? And how to justify them? What does the Norwegian
public sector actually need? That last question gave me a place to
start. If I could put together some use cases, that should show me what
functionality users would need. I could then check the description of
that functionality in the standards, and also do some testing to see
if interchange of documents using this functionality would work in
practice.

So this is what I did. I came up with a small set of use cases for
each of the usage areas. Very briefly, it goes like this:

Web publishing

#1: Forms (fill out, send in electronically)

#2: Templates (proposed templates for various kinds of documents)

#3: Contracts (proposed standard contracts, to be edited)

Attachments to emails from public sector to private

#4: Forms (receive via email, this time)

#5: Contract writing (with a private-sector supplier, for example)

Attachments to emails within public sector

#6: Collaborative authoring

#7: Interchange of budget data

The list was produced through interviews with colleagues and
various representatives from the public sector. I realize the list is
very short, but remember that most documents are not meant to be
edited by the recipient, and for these documents the public sector is
required to use HTML or PDF. Note also that, as you'll see, adding
more use cases is very unlikely to change the final conclusion.

From these scenarios I then drew up a short list of the necessary
functionality:

Basic formatting (paragraphs, lists, tables, etc; all use cases)

Change tracking (#5 and #6)

Comments (#5 and #6)

Spreadsheets with formulas (#7)

Spreadsheets with macros (#7)

Forms (protected against editing with a password; #1, #2, and #4)


The specs themselves

Now, I was asked to consider two specifications only:
ECMA-376:2006, which is the very first OOXML standard (not the one
later published by ISO), and ODF 1.1. Together these two documents run
to 6783 pages, which was a bit much for me to digest and consider in
the limited number of hours I had at my disposal. I therefore decided
to focus on the specification of the specific functionalities in the
list above (except the first one), and to look for general reports of
problems in the two specifications to get a feel for the quality of
each.

For ODF 1.1 the results were basically as follows:

General quality

Lots of errors, and quite a few holes where things basically are
not specified at all. The mistakes I found were mostly minor (that
is, very limited in scope).

Change tracking

The handling of change tracking in running text is quite fair,
but doesn't seem to be complete. Change tracking in tables, lists,
formulas, etc is missing.

Comments

Looks perfectly fine to me. I'm not sure about the parts that
describe how comments are placed, but then no-one seems to implement
that, anyway, and positioning of comments is not that important.

Spreadsheets with formulas

This was my first surprise. The specification of formulas just
isn't there. Section 8.1.3 of the spec discusses formulas, but is
very vague. There's no formal grammar, no list of functions, no
list of datatypes, and no evaluation model. Basically, it says
formulas should start with "a namespace prefix", then "=", and has
some informal prose on how to refer to cells and ranges. That's all.

Spreadsheets with macros

This didn't really come as a surprise: no macro language or API
for macros is defined. There's a defined place to put the macros, an
attribute for saying what language you used, and various bits of
documents have places where you can put event handlers, but that's
all.

Forms

There's a fairly big and detailed section on forms with various
types of controls and so on. To my surprise, there are even
mechanisms for connecting this to databases (not relevant for our
purposes, but interesting, anyway). There's also a mechanism for
making a section of a document read-only, and to do it you put a
hash of the password into an attribute. Unfortunately, nothing is
said about how to produce the hash, which rather reduces the value
of the mechanism.
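To show why the missing hash recipe matters, here is a minimal Python sketch. It assumes two tools that each pick a reasonable but different hash algorithm. The function name, the sample password, the UTF-8 encoding, and the two algorithms are all assumptions made for illustration; none of them come from the ODF 1.1 text.

```python
import hashlib


def protection_key(password, algorithm):
    """Hash a password with the named algorithm and return the hex digest.

    ODF 1.1 does not say which algorithm or text encoding to use;
    UTF-8 and the algorithm names below are assumptions for illustration.
    """
    return hashlib.new(algorithm, password.encode("utf-8")).hexdigest()


# Hypothetical password, hashed the way two different tools might choose to.
key_from_tool_a = protection_key("secret", "sha1")
key_from_tool_b = protection_key("secret", "sha256")

# The stored values differ, so neither tool can verify the other's password.
print(key_from_tool_a == key_from_tool_b)  # False
```

Both tools would be following the spec as written, yet a section protected in one could not be unlocked in the other.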

For OOXML the results were like this:

General quality

As everyone knows, there are lots of errors and mistakes in the
ECMA-376:2006 specification. Even the RELAX-NG schemas that come
with it turned out to have syntax errors in them.

Change tracking

OOXML spends 120 pages on this, a lot of them duplicated. The
functionality is very detailed, going into table changes, formatting
changes, list numbering changes, etc etc. I couldn't digest it all,
but as far as I could tell it was solid.

Comments

Perfectly fine.

Spreadsheets with formulas

People have made much of the date problem (no pre-1900 dates, 1900
is incorrectly specified as a leap year), but this part of the spec
is mostly quite solid, and the date problem does not appear to be
very relevant for the public sector. There is a formal grammar,
datatypes, function definitions, etc etc. Yes, there are errors and
so on, but at least it's specified in full detail.

Spreadsheets with macros

Essentially the same as for ODF: not specified.

Forms

This is described in extensive detail in the spec. The XML
modelling of forms looks like it's a direct translation from the
original binary format (which it probably is), so it's not exactly
beautiful, but as far as I can tell everything you need is there and
fully specified. The read-only protection mechanism is fairly
complicated (because it's connected with the encryption mechanism),
but again looks fully specified.
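For readers unfamiliar with the date problem mentioned under spreadsheet formulas, the short Python sketch below shows the Gregorian leap-year rule and what treating 1900 as a leap year does to a day count. The function name is my own, and the snippet illustrates only the calendar arithmetic, not anything quoted from ECMA-376.

```python
from datetime import date


def is_leap_year(year):
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


print(is_leap_year(1900))  # False
print(is_leap_year(2000))  # True

# A correct calendar has no 29 February 1900, so there are 59 days between
# 1 January 1900 and 1 March 1900. A format that treats 1900 as a leap
# year counts one day more.
correct_days = (date(1900, 3, 1) - date(1900, 1, 1)).days
print(correct_days)  # 59
```

The discrepancy only affects day counts that cross February 1900, which is part of why it seemed of little relevance to the public sector.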

In short: ODF 1.1 has a huge gaping hole in it as far as
spreadsheets are concerned and is full of errors and omissions.
ECMA-376 appears to have all the necessary functionality, but is also
full of errors. I made no attempt to judge which of the two has the
greater density of errors.
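The spreadsheet gap in ODF 1.1 can be made concrete with a small Python sketch. From the text of the spec alone, about all a consumer can do with a formula is split off the namespace prefix and the "=". The function name and the sample formula string below are illustrative assumptions, not taken from the spec.

```python
def split_formula(value):
    """Split a formula attribute value into (prefix, expression).

    Only the outer shape (a namespace prefix, then "=") is described;
    the expression itself is returned as an opaque string because the
    spec gives no grammar, function list, or evaluation model for it.
    """
    prefix, sep, rest = value.partition(":")
    if not sep or not rest.startswith("="):
        raise ValueError("expected '<prefix>:=<expression>'")
    return prefix, rest[1:]


# Hypothetical sample value; the prefix and function name are illustrative.
prefix, expression = split_formula("oooc:=SUM([.A1:.A3])")
print(prefix)      # oooc
print(expression)  # SUM([.A1:.A3])
```

Everything after the "=" is where interoperability would have to come from, and that is the part left unspecified.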

Both specifications also have stability issues, although this is
worse in the case of OOXML than for ODF.

The implementations

Wikipedia lists a good number of implementations for both formats,
so I picked the ones that, as far as I know, have a reasonable set of
functionality. Then, for OOXML I considered those which could write
OOXML, and for ODF those which could write ODF. Interestingly, there
was a reasonable number of each, and for both formats there was a
choice of more than 2 implementations on each of the Linux, Mac, and
Windows platforms.

In theory it therefore looked like both formats could be used. If,
that is, the tools really supported the formats well enough. The only
way to verify that was by testing. I made very simple test documents
for each of the functionalities listed above in the reference
implementation (MS Office for OOXML, OpenOffice for ODF), then opened
these in the other tools. If successful, I would make some changes,
save to a new file, and open in the reference tool again.
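The procedure just described can be written down as a small Python sketch. My testing was done by hand, so the two callables here simply stand in for a human judgement at each step. All the names, and the sample tool and outcomes in the usage lines, are hypothetical and do not reproduce my actual results.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RoundTripResult:
    tool: str
    feature: str
    opened_ok: bool                # did the other tool show the document correctly?
    round_trip_ok: Optional[bool]  # None when the first step already failed


def run_round_trip(
    tool: str,
    feature: str,
    opens_in_tool: Callable[[], bool],
    survives_return: Callable[[], bool],
) -> RoundTripResult:
    """Step 1: open the reference-made document in the other tool.
    Step 2 (only if step 1 worked): edit, save, and reopen in the reference tool.
    """
    if not opens_in_tool():
        return RoundTripResult(tool, feature, False, None)
    return RoundTripResult(tool, feature, True, survives_return())


# Hypothetical outcomes for an imaginary tool, for illustration only.
failed = run_round_trip("ToolX", "comments", lambda: False, lambda: True)
passed = run_round_trip("ToolX", "formulas", lambda: True, lambda: True)
print(failed.round_trip_ok)  # None
print(passed.round_trip_ok)  # True
```

The second step is skipped when the first fails, since there is nothing meaningful to send back to the reference tool.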

The results were downright depressing. For OOXML, in most cases
none of the tools came up with usable results. For change tracking
NeoOffice actually worked. And for spreadsheets NeoOffice and Gnumeric
both worked fine. (IBM Lotus Symphony and Google Docs also read the
spreadsheets correctly, but they can't write OOXML.)

For ODF, in most cases only IBM Lotus Symphony (which is really a
fork of OpenOffice) was successful. For comments Microsoft Office (!)
and AbiWord also got it right. For spreadsheets the latest Gnumeric
for Windows also got it right.

In short, if you want to use ODF or OOXML today, then apparently
for OOXML you must use Microsoft Office and for ODF you must use
OpenOffice or IBM Lotus Symphony. Or, alternatively, you can use
another tool and do lots of manual cleaning up.

I realize that the testing I describe here is very superficial, and
I would not on the basis of this testing have made the claim that
interchange between tools works. But most of these very, very simple
tests failed. My conclusion is that if not even the simplest cases
work then real documents are definitely not going to work.


Conclusion

By now I guess the conclusion should be obvious. I couldn't
recommend either format. Both specs are of very low quality, and for
neither format do you have much of a choice of tools. For the public
sector this would essentially mean having to agree not on a format,
but on a single tool to be used sector-wide. The purpose of creating
standards should be to achieve interoperability, but in this case that
just hasn't happened yet.

Having said that, ODF 1.2 looks like it will address nearly all the
shortcomings in ODF 1.1 that my report identifies. Similarly, it
looks like the next OOXML version (ISO/IEC 29500:2008 amendment 1)
will solve most of the OOXML issues. If the implementors follow up and
improve their converters things will look much brighter.
Unfortunately, this is going to take a couple of years.

So my conclusion in the report is that both standards should be
listed as "under observation" for all usage areas.

(Note that this describes version 0.9 of the report. Version 1.0 is
due within a month. Feedback over the next week or so is very much
welcome.)

Now what?

If previous experience with the OOXML/ODF war is any guide, now
follows the part where lots of people get very upset. That's life, I
guess.

I went to this job with a genuinely open mind, curious about what I
would find, and was really disappointed with the outcome. I knew the
specs had problems, but I really thought they were better than this.
That the tools were as poor as they are came as an even bigger
surprise. In the end, given the results I got I really had no choice
about the conclusion.

Maybe there is more to back this up in your full report, but there appears to be a "logical leap" from your implementation tests to your conclusions about the standards. For example, OOXML has 120 pages on change tracking, which you say is "solid". But then you report that it was only interoperable with NeoOffice. Does your report explain why this is so? And why you conclude that the spec has problems, when that appears to contradict what you said earlier about that section of the spec being "solid"?

The Kesan and Shah study a few years ago had a similar approach, and that approach also caused their paper to fall short of illuminating what causes situations like this. Unless you are willing to grapple with what exactly is causing a specific interoperability flaw in an implementation, you are unable to distinguish between the cases of:

A) Implementations that are unable to implement a feature interoperably.

or

B) Implementations that are unwilling to implement a feature interoperably, or even to implement it at all.

Historically, we've seen A avoided even without standards, such as the level of interoperability with legacy 1-2-3 wk1 files or Office XP-era doc files, based on vendor file format disclosures. And B is still possible even in the presence of mathematically perfect specifications.

I'm not saying that it is impossible to identify an interoperability defect and trace it directly to a defect in the standard. But it is incorrect to assume that 100% of interoperability defects are caused by errors in the standard, and 0% are caused by implementation defects and 0% are caused by intentional efforts to reduce interoperability. To really understand what is going on, you need to talk to the vendors, and ask them why a particular feature does not work. In my experience working with ODF implementers, A is almost never the reason.

It is worth noting that standards like HTML achieved their highest levels of interoperability, not by publishing amendments and corrigenda, but by encouraging vendors to implement the standards fully. In other words, interoperability was only achieved by a change in attitude by the vendors, not by a change in the standard. Often we'll see a desire for interoperability to occur first with the vendors, then worked out technically and only then standardized. That's how we got EcmaScript. In many -- perhaps most -- cases the standard does not drive interoperability, but documents the interoperability that is already being achieved. To take it in the other direction is like saying a marriage license makes a couple fall in love.

If you look at the ODF story, we went from Microsoft refusing to implement it to Microsoft committed to implement it for at least the next 10 years. None of this involved the change of even a single line in the ODF standard.

In any case, you might also want to look at the W3C Note "Variability in Specification" which gives a much-needed framework to discuss the relationship between specification conformance and interoperability.

Also, I wonder, as you took into account "recent developments in the field", did you find room to mention the several multi-vendor ODF Plugfests we've had? Or the work of the OASIS ODF Interoperability and Conformance TC? (We have a recent report on the topic you should read, if you have not already). I think these activities are very relevant. I don't read Norwegian but I see your report cites the Microsoft-sponsored work at Fraunhofer, but I didn't see any mention of any interop work at OASIS or the OpenDoc Society's Plugfests. I hope this is an oversight.

In any case, if you have indeed identified "lots of errors" in ODF 1.1, I hope you will submit a list of them, either to the OASIS ODF TC's comment list directly, or to SC34/WG6. Although I think the impact on interoperability is minor, we're always pleased to fix reported errors.

I can't speak for the quality of the OOXML specifications, especially IS 29500. I have seen how uneven the specification can be in some spots, but I've never gone deep into much of it.

After an 18-month immersion in the OASIS OpenDocument TC, I can't fault your appraisal there either. ODF 1.2 will be tighter in spots, and there will be a spreadsheet OpenFormula specification, though some of the places you identify in ODF 1.1 are still dodgy.

Considering that you had limited time to carry out your appraisal, I would say that you have done a very creditable job. Bravo!

The report you have written is very interesting and for the first time, it seems that the one writing the report has actually looked into each spec and not just copied unsubstantiated claims off the internet.

Thank you for this :-)

I did some app testing using a "real" document some time ago. You can see the results at

I disagree with your "include neither format" conclusion - but I'll see if I can include that somehow in my feedback on your report.

Lars Marius - 2010-05-10 03:15:54

@orcmid: Thank you. :-)

@Rob: First of all, thank you for taking the time to comment in depth. It's always valuable to get some critical feedback on a piece of work like this.

There is little in my report, or in my work, to indicate why only NeoOffice appeared to support OOXML change tracking correctly. OOXML documents with change tracking written by Go-oo wouldn't open in Word at all (not sure whose fault that was). And AbiWord ignored the change tracking completely. So I have little basis for saying why this failed, although my impression is that AbiWord doesn't implement this at all.

As you say, it would be valuable to know why the different implementations of both formats failed, but short of tracking down and interviewing the developers that's difficult. And in the limited time I had for this work there was just no way there would be time for something like that.

> But it is incorrect to assume that 100% of interoperability defects
> are caused by errors in the standard, and 0% are caused by
> implementation defects and 0% are caused by intentional efforts to
> reduce interoperability.

That's certainly true. Ultimately, for our purposes, it doesn't matter that much. We can't recommend this standard for use in the public sector if interchange doesn't work. Why it doesn't work is secondary.

Having said that, I put so much emphasis on the quality of the specs because they give some indication of what one can expect from implementations, and also to some degree point to what one can expect in the future.

The general picture I see in both the specs and in the implementations is one of immaturity. That is, the quality of both is headed in the right direction, but it takes time, and we're not there yet.

The report makes it very clear that I expect both the specs and the implementations to improve in the years to come, and that I expect the conclusions in the report would be different if it were written in, say, 2012 or 2013.

> you might also want to look at the W3C Note "Variability in
> Specification"

That does look interesting. I never heard of it before, but will have a look.

> Did you find room to mention the several multi-vendor ODF Plugfests
> we've had? Or the work of the OASIS ODF Interoperability and
> Conformance TC?

I saw these, but did not mention them in the report. However, now that you mention it, I realize that it would be useful to cite these in the section that discusses reasons to believe interoperability will be better in the future. So I'll do that in version 1.0. Thank you for suggesting it.

> I don't read Norwegian but I see your report cites the Microsoft-
> sponsored work at Fraunhofer, but I didn't see any mention of any
> interop work at OASIS or the OpenDoc Society's Plugfests. I hope
> this is an oversight.

I cite the Fraunhofer report because section 3.4 of the report is essentially about that report. It's also the source for a mention of a problem with list counting in 3.2 somewhere.

I wouldn't call not citing the ODF interop work an oversight, but I agree that citing it will improve the report.

> In any case, if you have indeed identified "lots of errors" in ODF
> 1.1, I hope you will submit a list of them, either to the OASIS ODF
> TC's comment list directly, or to SC34/WG6. Although I think the
> impact on interoperability is minor, we're always pleased to fix
> reported errors.

My main source for the assertion that there are lots of errors is the ODF error database, to which you kindly provided the link. I trawled through the errors in there, checked some of them against the spec, and even described those in the report. I also found some errors either mentioned elsewhere or on my own, but as far as I can tell these are all in the database already (and all of them listed as fixed in ODF 1.2, if I remember correctly).

The exception is the problems with change tracking in ODF. I've discussed this briefly with Patrick, and he suggested I report this in the database, which I intend to. However, first I need to get version 1.0 of the report out. Then I want to take the time to write a proper error report.

Lars Marius - 2010-05-10 03:25:45

@Jesper: Thank you for those links. I may refer to those in version 1.0. It's interesting that your results are so close to mine.

Interesting report, Lars. I look forward to reading the final version (in an English translation, of course).

You mentioned that for ODF you must use OpenOffice.org or IBM Lotus Symphony. Given that those applications are from the same code base, and Symphony has recently been re-aligned with the OO.o 3.x code base, do you see those as two separate implementations, or two variations of one implementation?

Lars Marius - 2010-05-10 12:52:22

@Doug: I don't know if there will be an English translation. Difi says they are negotiating a general agreement for English translation services, and once that's in place they may have some reports translated if there are requests for them. This report might be a candidate, but it's really not for me to decide.

There is actually a section in the report considering how to count the OpenOffice derivatives. I looked at Go-oo, NeoOffice, OOo, and Symphony, and decided that Go-oo and NeoOffice did not count. Symphony I decided to count as separate, as apparently they parted company with OOo at version 1.1.4. However, the report does note that the tools are very similar.

I realized they were going to add OOo 3.x code into the next version, and said as much in the report, but these slides make it sound as though they're going a bit further than that. I guess I could say currently there are 2 different tools, and in the future there will be 1.5 different tools. Tough call, this.

(Anyway, thank you for the link. Nice to finally see some detail on Symphony.)

Is interoperability possible among web office sites?
I. Purpose
1. Online document transmission. As part of the Internet, do web office sites need to be able to send and receive documents directly among themselves? Should the online document transfer protocol keep "advancing with time, harmonious sharing"? Should it move from the pre-web stage (SMTP plus attachments) to web-era protocols such as HTTP, XMPP, HTML5, WebSocket, or SPDY?
2. Online document interoperability. Once a document has been transmitted, can it be made interoperable as a web page? In other words, can the web editor of another site open or edit the file, or even support collaborative editing or cross-domain access?
3. How to generalize? What would best persuade the roughly 20 existing web office providers (such as Google Docs, MS Office 365, iCloud, Zoho, ...) to accept this kind of cross-platform interoperability?