Where everyone knows your name: ODF 1.1 formula support in Office SP2

Plus 2 updates

There is a slightly interesting standards aspect to the ODF 1.1 interoperability problems that MS Office SP2 is caught up in. I have not downloaded SP2 yet, so I am relying on what the various blog items have described as the problem.

It looks like just another of those tedious things that should go away in the next release of applications, when the other applications catch up: it is difficult not to get frustrated with this, but that is the incremental approach ODF decided to take.

The problem is that the ODF standard recommends sticking a prefix to the front of formula, so that the formula notation can be clearly identified, but most this causes applications to protest and do some default behaviour (strip the formula, replace the value with the formula, etc.). [[I don't expect any decent interoperability in formulas until we have wide support for both Open Formula and OOXML formulas, and that will require all applications to have proper support for the standard named libraries of functions: I'd guess that in 2 years time it will be realistic to demand interoperability.]]

But how could the other applications' developers know ahead of time what namespace Microsoft would choose, before Microsoft did? Whenever any application uses this feature it will generate data that the other applications cannot understand. (The easiest solution is for vendors just to strip out any starting prefix, I suppose.) That is just the nature of this kind of beast. You get a short-term incompatibility that gets fixed when everyone implements a more inclusive policy or smarts about the new namespace.

[UPDATE: I don't think I was clear enough here. My view is along the lines that Microsoft should try to allow as much interoperability as it can by at least accepting the ODF notation in incoming spreadsheets to whatever extent is convenient even if they generate output in the traditional syntax, and that other vendors should do the same in reverse by coping with importing in the tradiational syntax and Microsoft's namespace to whatever extent is convenient. And, in the medium-term, support both Open Formula and OOXML syntaxes and function libraries for maximum interopability.]

Here is what IS26300 says about spreadsheet formula:

Formula

Formulas allow calculations to be performed within table cells. Every formula should begin with a namespace prefix specifying the syntax and semantics used within the formula.... (s8/1/3)

Here is what the Microsoft release notes say about spreadsheet formula (click on page 7):

The standard defines the attribute table:formula, contained within the element <able:table-cell>, contained within the parent element <office:spreadsheet table:table-row>

This attribute is supported in core Excel 2007. This attribute is supported in core Excel 1. When saving the Table:Formula attribute, Excel precedes its formula syntax with the "msoxl" namespace. 2. When loading the attribute Table:formula, Excel first looks at the namespace. If the namespace is "msoxl", Excel will load the value of Table:formula as a formula in Excel. 3. When loading the Table:formula attribute, if the namespace is missing or unknown, the Table:formula attribute is not loaded, and the value "Office:value" is used instead. If the result of the formula is an error, the element <text:p> will be loaded and mapped to an Error data type in Excel. Error types not supported by Excel are mapped to #VALUE!

(I would suppose that a namespace prefix means there is a namespace USI declared too in the XML. The specification could be clearer on this.)

To my mind either the problem is in the short term only and intrinsic to the ODF feature, or the problem does not lie with Microsoft for making their choice, nor with other implementers for making their choices, but with the ratty choice of markup used for this feature in ODF 1.n itself. However, I do think that a reasonable person could think that MS should have opted to ignore 8.1.3 for the sake of immediate interoperability too: swings and roundabouts.

Obviously spreadsheets are a case where efficiency is paramount. So the normal solution of adding an attribute to each cell could expect a lot of resistance. But sticking that attribute value inside the formula is a real hack that could only ever cause problems.

The solution should be to dispense with the feature entirely: make it a table or column-level feature, for example with an attribute on the table element.

I suppose that the reason behind it is the idea that you might cut and paste or link between worksheets or ranges with different notations, and so you would want to notate and preserve the information. I don't hold myself out to be a spreadsheet expert, but from what I can make out most spreadsheets work by translating the external formula syntax into their own internal format. So I don't see why we need to support round-tripping of different formula notations in the same table.

I think the ODF TC should fix this up for ODF 1.0/IS26300. ODF 1.1 and ODF 1.2 quickly. Deprecate this feature and add table (or column level) equivalent. It is a dud feature which causes problems.

And the ODF TC might perhaps re-examine the wisdom of having features which, when used as specified, cause short-term incompatibilities. They then might consider whether this is in fact the course that they are continuing to take by deferring looking at alternative content until ODF Next Gen, by which stage it will be too late and we will be having this same conversation about some new Next Gen features.

Of course, the claim is that Microsoft will have chosen the particular conforming markup that would break with external data, to keep their lock-in. For all I know, it may be completely true. I don't have any information that would suggest it though. It would seem a fairly dumb tactic however, since it just takes other vendors to support the namespace prefix: what lock-in would you get from something that is so easily defeated?

I think we can see the old issue though: the expectation that interoperability will be easy and come easily; and that it can come from small incomplete standards. I think I made the comment before, but it wouldn't surprise me if ultimately ODF (including external referenced standards) will end up being closer to the size of OOXML (at least, with OOXML Strict) than of ODF 1.0.

Update 1

I see some items saying that apart from the namespace prefix issue, the problem is also that the generated formulas don't use the standard ODF syntax.

The wording of the ODF standard seems unambiguous: both the syntax and the semantics of a formula are determined by the namespace prefix (see the clause quoted above.) So different syntaxes are explicitly allowed. ODF then goes ahead and describes a "typical" syntax, but it does not give a BNF or anything like a formal description of it. For example, there is no text describing the "." prefix, but the examples use it. Wording like "can" and "is an example" are those of non-normative text.

At the very least, it is regrettable drafting. One of the disciplines that ISO review is supposed to bring is clarity in the usage of could, should, must, can, may and other terms: the status of every part (whether normative required, normative optional, normative deprecated, normative disallowed or informative) should be clear. Indeed, a real effort was made to make IS26300 conform to the ISO requirements (see IS26300 s1.2 Notation) but the use of the particular ISO fast-track procedure by OASIS (the PAS procedure) in effect by-passed this review at ISO. The result: a standard with holes.

Now this is not to claim that SC34 review would have found all these problems automatically. And I am sure that this is being looked at by OASIS ODF TC. But standards are hard. They say the sign of a good writer is that they welcome a good editor: writing and speccing is just plain hard, and the only way to get a decent result is to plug away with reviews: the more that review does not happen up-front the more it must be done as part of maintenance.

Update 2

Interestingly, when I wrote that blog I read the ODF spec to require the . notation, which is how Rob Weir sees it. But now that I have looked at the spec more closely, I don't think ODF actually has any words that supports that argument: in particular, the dot notation given by s8.3.1 is needed by other functions and I don't see any wording that makes it a requirement for fomulae. Indeed, the statement of purpose page for the ODF Open Formula effort at OASIS even mentions that currently spreadsheets can use "arbitrary" languages.

Alex Brown has a very sensible take on this:

It's just I notice different reasonable and qualified people (and not just MS) interpret this differently. That indicates the standard is faulty.

I would agree with Marbux's comment on that blog:

I agree with you about the practical interoperability difficulties posed if Microsoft does use Excel markup rather than OpenFormula markup in its ODF implementation. But it would be a largely avoidable interoperability barrier, not an issue of conformance.

And I certainly still hold the position

What is required? All spreadsheet implementations need to provide the feature set that supports Open Formula and OOXML: all those functions and library referencing.

Tags:

You might also be interested in:

4 Comments

@Rick: "in the medium-term, support both Open Formula and OOXML syntaxes and function libraries for maximum interopability."

ODF 1.2 will mandate the use of OpenFormula in conformant documents (see extracts of the latest ODF 1.2 draft below). I don't see why you recommend to support both Open Formula and OOXML. OOXML formulas must of course be supported by filters that read an OOXML spreadsheet. But I see no use of supporting OOXML formulas in an ODF document. Could you please explain what you meant exactly ? Thanks.

Note that the requirement to prefix formulas with a namespace prefix is (currently) maintained in ODF 1.2.

Extracts of OpenDocument-v1.2-cd02, dated 28 April 2009

1.4.2.4 Conforming OpenDocument Spreadsheet Document
(D4) A conforming OpenDocument Spreadsheet Document shall meet all requirements of a Conforming OpenDocument Document, as well as the following additional requirements:
[...]
(D4.4) All table:formula attribute values shall be namespace prefixed and the namespace prefix shall correspond to the "urn:oasis:names:tc:opendocument:xmlns:openformula:1.0" namespace.
(D4.5) All table:formula attribute values shall conform to the syntax defined by OpenFormula.

16.17.1.24 formula
A string containing a formula. Formulas don't have a predefined syntax, but should start with a namespace prefix that specifies the syntax used within the formula.

17.645 table:formula
The table:formula attribute specifies a formula for a table cell. Formulas allow calculations to be performed within table cells. Every formula should begin with a namespace prefix specifying the syntax and semantics used within the formula. The formula begins with an equal (=) sign and can include the following components:
● Addresses of cells that contain numbers. The addresses can be relative or absolute, see 8.2.1.2. Addresses in formulas start with a “[“ and end with a “]”.
[...]

Luc: Thanks for the clauses: I think it is always useful. They look like they need more work: any clause with "don't" looks like an early draft, so no panic. I would expect some language recommending Open Formula for example.

Why support both syntaxes in the medium term? Postel's law.

I don't expect users will see any sense in an application that can import .XLS with Excel syntax being unable to import ODF with exactly the same formulae. Not supporting a syntax you already support in the name of interoperability is borderline perverse; I suppose some people might think a clean break is better (no pain no gain!)

AndréR: I don't understand your comments. They appear to be directed at some other blogs.

Perhaps you could respond to the things I write here in my blog, here, and respond to the things other people write elsewhere on those pages?

For example, I don't believe I ever called anyone part of a tinfoil hat brigade, though it is a good term. In fact, the first reference I can find to it when Googling "tinfoil ooxml" or "tinfoil Jelliffe" is from Rob Weir in a blog item about how awful it would be for a standard not to fix the specific version of the notation used and how clear the ISO language "shall" is. I don't know if there was some prior name-calling, but if someone chooses a name for

News Topics

Recommended for You

Got a Question?

Do you have a question about O'Reilly's products and services? Share an idea! Report a problem...