Can I mention that it’s also in ODF?

I try to keep the discussion on this blog primarily focused on the area I care most about, the technology behind Open XML.

Every so often I've mentioned ODF, especially when discussing the translation and harmonization updates, but in terms of discussing the format itself I've assumed that other folks have more interest in it than me, so I've left it to them to discuss. Surprisingly though, many of the ODF folks would rather discuss Open XML than ODF. I haven't seen a lot of vocal pro-ODF blog posts out there, but I've seen tons of anti-OpenXML posts.

Of course it's entirely possible to be both pro-ODF and pro-OpenXML, like the ODF Editor, Patrick Durusau, for example.

But there's also a camp that identifies being pro-ODF with having to attack OpenXML. They don't seem to care that deeply about discussing ODF, just about stopping OpenXML. Those are the folks I refer to as being "anti-OpenXML".

I've noticed a common thread within the anti-OpenXML camp in the past few months (actually they've been doing it longer than that). They claim that a "flaw" in Open XML is totally unacceptable despite similar "flaws" in ODF because "we aren't talking about ODF, we're talking about Open XML, and two wrongs don't make a right."

Basically the anti-OpenXML folks' goal is to keep the focus on any aspect of Open XML they can claim is a flaw, and then try to blow it up into a huge deal (see some of Rob Weir's latest posts as examples of this). This is easy to do of course; it's like one of those trashy entertainment shows reporting on the Oscars, and ripping on everyone's dress. It's easy to tear things down and try to position everything in its worst light, but of course if you try to turn the attention back at them they quickly change the subject. This is why the anti-OpenXML folks are very nervous about the discussion ever turning towards ODF. You see, as anyone who's worked with the spec knows, ODF has flaws just like every other spec, and some of them could even be considered major. It's still developing into a usable spec with folks like Patrick Durusau leading the way in a constructive and calm manner.

So let me take some of the issues that have been raised about Open XML and show how they apply equally and often more so to ODF. This isn't to denigrate ODF in any way, it's to demonstrate that the anti-OpenXML folks' political games are just that – games that should be left to the politicians. So here goes:

Spreadsheet Dates

Folks have been discussing this one for about a year now; it's about the date format used in SpreadsheetML. In the Ecma responses and even in the BRM, we made a number of changes to make everyone happy. Here is where we are now with dates in SpreadsheetML:

There is a new date datatype for cell values, and the only way to store into that datatype is with ISO 8601.

Calculations often need to convert a date into a serial value or back. For this you need to know the date base, or epoch, and in the ISO version of Open XML there are essentially 3 date bases (only one of which is transitional).

The leap year bug and the issues around dates before 1900 are now only allowed in transitional conformant files; a strict file cannot use them.

Let's take a look at ODF now:

You can store a date as an xsd date type (which is actually different from ISO 8601 in a few ways including the way to treat the year zero). What many people don't know is that in OpenOffice you can also store a date as a serial number. These two date formats are exactly the same in OpenOffice:

Now, the two values above are considered the exact same date on one condition, and that is if the null-date element looks exactly like this:

<table:null-date table:date-value="1900-01-01"/>

Here is what the ODF spec says for this element:

"The <table:null-date> element specifies the null date. The null date is the date that results in the value "0" if a date value is converted into a numeric value. The null date is specified in the element's table:date-value attribute. Commonly used values are 12/30/1899, 01/01/1900, and 01/01/1904"

Of course, the null-date value (essentially the epoch) could be set to anything, and that would change how you interpret that first date I show above. So while the anti-OpenXML folks are claiming that there are too many date bases in Open XML, in ODF you have an infinite number of date bases to deal with.
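To make that concrete, here's a minimal Python sketch of how one serial value turns into different calendar dates depending on the null date. The function name and the serial value are made up for illustration, and the three null dates are just the "commonly used" ones from the spec text quoted above; this is not how any particular application implements it.

```python
from datetime import date, timedelta


def serial_to_date(serial: int, null_date: date) -> date:
    """Interpret a serial day count relative to a null date (epoch).

    Hypothetical helper: serial 0 maps to the null date itself,
    following the spec wording quoted above.
    """
    return null_date + timedelta(days=serial)


# The three commonly used null dates listed in the ODF spec text.
COMMON_NULL_DATES = [
    date(1899, 12, 30),
    date(1900, 1, 1),
    date(1904, 1, 1),
]

serial = 39500  # hypothetical cell value, chosen only for the demo

for null_date in COMMON_NULL_DATES:
    result = serial_to_date(serial, null_date)
    print(f"null date {null_date.isoformat()} -> {result.isoformat()}")
```

Same number in the cell, three different dates on screen, and nothing stops a file from declaring a fourth null date.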

The values set here also affect calculations of course, but that never came up in the ISO review of ODF because, as we all know by now, ODF has no definition for spreadsheet calculations/functions. The upcoming draft of OpenFormula that the OASIS folks are working on does have these same behaviors, though: you convert a date into a serial value using the epoch before you perform the calculation.

Printer settings

In Open XML, printer settings for all three document types (WordprocessingML; SpreadsheetML; PresentationML) are already fully defined in the standard. Part 4 clause 4.3.1.26 defines a number of relevant printing properties for presentations. Part 4 clause 3.3.1.61 defines page setup properties and 3.3.1.68 defines other print options which affect printing of a spreadsheet. Part 4 clause 2.6.19 defines the section properties of a wordprocessing document and contains many print related settings. So there are a large number of printing properties fully defined in the Open XML specification based on the usage in the existing corpus of legacy documents.

Beyond these print settings however, there is also an option where a specific printer manufacturer can store their own settings with the document. These may include things like whether or not to use a staple, or other more advanced professional printing behaviors (like binding, etc.) that may be specific to that particular printer and the functionality it supports. It is outside of the scope of DIS 29500 to define a generic mechanism for the printing industry to use, but as the Ecma response states, when the industry decides to create a standard for representing these types of printer settings, the spec should be updated to reference that standard.

If you look at ODF on the other hand, no printing properties are defined (zero). Instead it is up to applications to use the generic extensibility mechanism defined in section 2.4. Beyond that no guidance is provided for how to store any printer settings. Because of this, applications like OpenOffice use the extensibility mechanism to add their own set of properties, very similar to what's already fully defined in DIS 29500. For the settings that are printer specific however, they use a binary blob of data as follows:

Password Hashing and Encryption

We had a bunch of people complain that the hashing algorithm for passwords in Open XML was not documented clearly enough. We completely cleared that up between the responses and the BRM to the point where anyone can recreate the same hash, and there is a very robust and flexible way of specifying which hash algorithm you use so you can stay up to date.

What does ODF do here? Well, here is what is in the spec that went through ISO:

To avoid saving the password directly into the XML file, only a hash value of the password is stored.

Great, it's hashed! But how do you create the hash? If I'm writing an application, how do I implement it to work with other applications?
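Here's a small Python sketch of why that question matters. It is not the algorithm from either spec; the helper name, the password, and the choice of SHA-1 versus SHA-256 and UTF-8 versus UTF-16 are all assumptions picked just to show that two applications guessing differently will never produce the same stored value.

```python
import hashlib


def hash_password(password: str, algorithm: str = "sha1",
                  encoding: str = "utf-8") -> str:
    """Hypothetical helper: hash a password with a chosen algorithm and
    text encoding. Neither choice comes from the ODF text quoted above."""
    return hashlib.new(algorithm, password.encode(encoding)).hexdigest()


password = "secret"  # made-up password for the demo

# Two applications that guess differently store different values
# for the very same password.
app_a = hash_password(password, "sha1", "utf-8")
app_b = hash_password(password, "sha256", "utf-8")
app_c = hash_password(password, "sha1", "utf-16-le")

print("app A:", app_a)
print("app B:", app_b)
print("app C:", app_c)
print("A and B interoperate:", app_a == app_b)
print("A and C interoperate:", app_a == app_c)
```

Unless the spec pins down the algorithm, the encoding, and any salting or iteration, a document protected in one application can't be unlocked in another.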

Allows binary formats

There have been complaints that binary formats are allowed in OpenXML. We cleared that up during the review process leading up to the BRM.

Let's look at ODF though:

Section 2.4.2 Base Settings:

Allows for a config:type=base64Binary

Section 9.3 Frames

Frames can contain:

Text boxes

Objects represented either in the OpenDocument format or in a object specific binary format

…

Section 9.3.2 Image

The image data is contained in the <draw:image> element. The <draw:image> then element contains an <office:binary-data> element that contains the image data in BASE64 encoding.

…

If required, the draw:filter-name attribute can represent the filter name of the image. This attribute contains the internal filter name that the office application software used to load the graphic.

[Brian Note: No clue what the list of filter-names are, where they come from, how you structure them, or how you are supposed to get interoperability from this]

Section 9.3.3 Objects

A document in OpenDocument format can contain two types of objects, as follows:

…

Objects that do not have an XML representation. These objects only have a binary representation, An example for this kind of objects OLE objects (see [OLE]).

…

The <draw:object-ole> element represents objects that only have a binary representation.

Section 9.3.5 Plugins

A plugin is a binary object that is plugged into a document to represent a media-type that usually is not handled natively by office application software. Plugins are represented by the <draw:plugin> element
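As a rough illustration of the Section 9.3.2 case, here's a minimal Python sketch that pulls the BASE64 payload out of a draw:image element. The fragment, the payload bytes, and the function name are invented for the example; only the element names and the namespace URIs come from ODF.

```python
import base64
import xml.etree.ElementTree as ET

DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def extract_image_bytes(fragment: str) -> bytes:
    """Hypothetical helper: decode the <office:binary-data> child of a
    <draw:image> element back into raw bytes."""
    root = ET.fromstring(fragment)
    node = root.find(f"{{{OFFICE_NS}}}binary-data")
    if node is None or node.text is None:
        raise ValueError("no <office:binary-data> payload found")
    return base64.b64decode(node.text)


# Made-up payload standing in for real image data.
payload = b"opaque binary image bytes"
encoded = base64.b64encode(payload).decode("ascii")

fragment = (
    f'<draw:image xmlns:draw="{DRAW_NS}" xmlns:office="{OFFICE_NS}">'
    f"<office:binary-data>{encoded}</office:binary-data>"
    "</draw:image>"
)

print(extract_image_bytes(fragment))
```

The XML wrapper gets you the bytes, but it tells you nothing about what format those bytes are in. It's still a binary blob, just one with angle brackets around it.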

Weak Conformance?

Some folks have tried to claim that Open XML's conformance is weak. The conformance of Open XML is pretty well defined, and is set up to allow for applications to come along and only support a particular piece of the schema without having to implement the rest (imagine a server app that only operates on the data in a spreadsheet but doesn't deal with the formatting, charting, etc.). We also have conformance classes, so that a government for example could say it wants to buy an application that is conformant only to the strict schema, and not the transitional. This is all set up in the conformance clause.

What does the ODF conformance say?:

There are no rules regarding the elements and attributes that actually have to be supported by conforming applications, except that applications should not use foreign elements and attributes for features defined in the OpenDocument schema.

Reuse of existing standards

ODF claims to reuse a number of existing standards, but it's never quite that clean. Jesper has some great examples of a couple odd cases:

Calendar Definitions

There were a number of complaints claiming that Open XML fully defined only a subset of the calendars and needed to define them all. This was done, and now every calendar has a normative definition.

Here's an example of some of the calendars defined and allowed by Open XML:

The values for this calendar should be the representation of the French strings in the corresponding Arabic characters (the Arabic transliteration of the French for the Gregorian calendar).

hebrew (Hebrew)

Specifies that the Hebrew lunar calendar, as described by the Gauss formula for Passover (Har'El 2005) and The Complete Restatement of Oral Law (Mishneh Torah) (Maimon n.d.), shall be used.

hijri (Hijri)

Specifies that the Hijri lunar calendar, as described by the Kingdom of Saudi Arabia, Ministry of Islamic Affairs, Endowments, Da'wah and Guidance (Kingdom of Saudi Arabia, Ministry of Islamic Affairs, Endowments, Da'wah and Guidance n.d.), shall be used.

japan (Japanese Emperor Era)

Specifies that the Japanese Emperor Era calendar, as described by Japanese Industrial Standard JIS X 0301, shall be used.

korea (Korean Tangun Era)

Specifies that the Korean Tangun Era calendar, as described by Korean Law Enactment No. 4 (Korean Law Enactment No. 4 1961), shall be used.

saka (Saka Era)

Specifies that the Saka Era calendar, as described by the Calendar Reform Committee of India, as part of the Indian Ephemeris and Nautical Almanac (Calendar Reform Committee 1957), shall be used.

taiwan (Taiwan)

Specifies that the Taiwanese calendar, as defined by the Chinese National Standard CNS 7648 (Bureau of Standards, Metrology and Inspection of the Ministry of Economic Affairs n.d.), shall be used.

And here is what ODF has, in its number:calendar attribute (section 14.7.11):

The attribute may have the values gregorian, gengou, ROC, hanja_yoil, hanja, hijri, jewish, buddhist or an arbitrary string value. If this attribute is not specified, the default calendar system is used.

Yup, that's it.

Fast Track Process

While the anti-OpenXML folks are trying to claim the fast track process is flawed, I would say the result of the fast track process is an even better spec than what we had when we started. Remember that the standard from Ecma (Ecma 376) has already been implemented by a ton of other companies, so even if it had room for improvement, it was already serving its purpose.

Accessibility

Obviously there was already a lot of great accessibility functionality built into Open XML, and as a result of the ISO process it's become even richer (thanks to some great feedback from Canada and New Zealand).

ODF created an Accessibility sub-committee after ISO approval and then added an accessibility annex to their version 1.1, which has never been brought to ISO.

Closing

I'll close by repeating that I don't intend this post to be anti-ODF in any way. Instead I'm pointing out that there will always be designs that folks disagree with, and areas for improvement in a standard. If we waited for it all to be perfect, then we'd find the industry had moved on to something else. It's important to get a stable spec out there for folks to start building on. We can then see how it's used, and make improvements/corrections from there. This is what you see in ODF, and we'll see the same thing with Open XML.