Standards and Enablement

I’d like to synthesize some thoughts I’ve been having in recent weeks. But before I do that, let’s have a joke:

A Harvard Divinity School student reviews a proposed dissertation topic with his advisor. The professor looks over the abstract for a minute and gives his initial appraisal.

“You are proposing an interesting theory here, but it isn’t new. It was first expressed by a 4th Century Syrian monk. But he made the argument better than you. And he was wrong.”

So it is with some trepidation that I make an observation which may not be novel, well-stated, or even correct, but here it goes:

There is (or should be) an important relationship between patents and standards, or more precisely, between patent quality and standards quality.

As we all know, a patent is an exclusive property right, granted by the state for a limited period of time to an inventor in return for publicly disclosing the workings of his invention. In fact the meaning of “to patent” was originally “to make open”. We have a lingering sense of this in phrases like, “that is patently absurd”. So, some public good ensues from the patent disclosure, and the inventor gets a short-term monopoly on the use of that invention in return. It is a win-win situation.

To ensure that the public gets their half of the bargain, a patent may be held invalid if there is not sufficient disclosure, if a “person having ordinary skill in the art” cannot “make and use” the invention without “undue experimentation”. The legal term for this is “enablement”. If a patent application has insufficient enablement then it can be rejected.

For example, take the patent application US 20060168937, “Magnetic Monopole Spacecraft” where it is claimed that a spacecraft of a specified shape can be powered by AC current and thereby induce a field of wormholes and magnetic monopoles. Once you’ve done that, the spacecraft practically flies itself.

The author describes that in one experiment he personally was teleported through hyperspace over 100 meters, and in another he blew smoke into a wormhole where it disappeared and came out another wormhole. However, although the inventor takes us carefully through the details of how the hull of his spacecraft was machined, the most critical aspect, the propulsion mechanism, is alluded to, but not really detailed.

(Granted, I may not be counted as a person skilled in this particular art. I studied astrophysics at Harvard, not M.I.T. Our program did not cover the practical applications of hyperspace wormhole travel.)

But one thing is certain — the existence of the magnetic monopole is still hypothetical. No one has shown conclusively that they exist. The first person who detects one will no doubt win the Nobel Prize in Physics. This is clearly a case of requiring “undue experimentation” to make and use this invention, and I would not be surprised if it is rejected for lack of enablement.

I’d suggest that a similar criterion be used for evaluating a standard. When a company proposes that one of its proprietary technologies be standardized, it is making a similar deal with the public. In return for specifying the details of its technology and enabling interoperability, it gets a significant head start in implementing that standard, and will initially have the best and fullest implementation of that standard. The benefits to the company are clear. But to ensure that the public gets their half of the bargain, we should ask the question: is there sufficient disclosure to enable a “person having ordinary skill in the art” to “make and use” an interoperable implementation of the standard without “undue experimentation”? If a standard does not enable others to do this, then it should be rejected. The public, and the standards organizations that represent them, should demand this.

Simple enough? Let’s look at the new Ecma Office Open XML (OOXML) standard from this perspective. Microsoft claims that this standard is 100% compatible with billions of legacy Office documents. But is anyone actually able to use this specification to achieve this claimed benefit without undue experimentation? I don’t think so. For example, macros and scripts are not specified at all in OOXML. The standard is silent on these features. So how can anyone practice the claimed 100% backwards compatibility?

Similarly, there are a number of backwards-compatibility “features” which are specified in the following style:

This element specifies that applications shall emulate the behavior of a previously existing word processing application (Microsoft Word 6.x/95/97) when determining the placement of the contents of footnotes relative to the page on which the footnote reference occurs. This emulation typically involves some and/or all of the footnote being inappropriately placed on the page following the footnote reference.

[Guidance: To faithfully replicate this behavior, applications must imitate the behavior of that application, which involves many possible behaviors and cannot be faithfully placed into narrative for this Office Open XML Standard. If applications wish to match this behavior, they must utilize and duplicate the output of those applications. It is recommended that applications not intentionally replicate this behavior as it was deprecated due to issues with its output, and is maintained only for compatibility with existing documents from that application. end guidance]

This sounds oddly like Fermat’s “I have a truly marvelous proof of this proposition which this margin is too narrow to contain”, but we don’t give Fermat credit for proving his Last Theorem, and we shouldn’t give Microsoft credit for enabling backwards compatibility. How is this description any different from the patent application that claims magnetic monopoles to drive hyperspace travel? The OOXML standard simply does not enable the functionality that Microsoft claims it contains.
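To see concretely why such a compatibility flag is unimplementable, consider what a second implementer actually receives when reading the document settings. The sketch below is a minimal illustration in Python; the element name is in the style of the specification’s compatibility settings, and the namespace URI is a placeholder assumption, not the real one. The point it demonstrates is that the flag is an empty marker: it says “behave like Word 6.x/95/97” but carries no data from which that behavior could be derived.

```python
import xml.etree.ElementTree as ET

# A document-settings fragment in the style of OOXML's <w:compat> section.
# The namespace URI is a placeholder; the element name follows the spec's style.
SETTINGS = """\
<w:settings xmlns:w="http://example.org/wordprocessingml">
  <w:compat>
    <w:footnoteLayoutLikeWW8/>
  </w:compat>
</w:settings>"""

NS = {"w": "http://example.org/wordprocessingml"}

root = ET.fromstring(SETTINGS)
flag = root.find("w:compat/w:footnoteLayoutLikeWW8", NS)

# The flag is present, but it has no attributes, no parameters, no
# algorithm: nothing an independent implementer could act on.
print(flag is not None)  # True  -- the flag exists ...
print(flag.attrib)       # {}    -- ... but it is empty
```

An implementer who encounters this flag knows only that some undocumented legacy behavior should be emulated; everything needed to actually emulate it lives outside the specification, in the legacy application itself.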

Similarly, Digital Rights Management (DRM) has been an increasingly prominent part of Microsoft’s strategy since Office 2003. As one analyst put it:

The new rights management tools splinter to some extent the long-standing interoperability of Office formats. Until now, PC users have been able to count on opening and manipulating any document saved in Microsoft Word’s “.doc” format or Excel’s “.xls” in any compatible program, including older versions of Office and competing packages such as Sun Microsystems’ StarOffice and the open-source OpenOffice. But rights-protected documents created in Office 2003 can be manipulated only in Office 2003.

This has the potential to make any other file format disclosure by Microsoft irrelevant. If they hold the keys to the DRM, then they own your data. The OOXML specification is silent on DRM. So how can Microsoft say that OOXML is 100% compatible with Office 2007, let alone legacy DRM’ed documents from Office 2003? The OOXML standard simply does not enable anyone else to practice interoperable DRM.

It should also be noted that the legacy Office binary formats are not publicly available. They have been licensed by Microsoft under various restrictive schemes over the years, for example, only for use on Windows, only for use if you are not competing against Office, etc., but they have never been simply made available for download. And they’ve certainly never been released under the Open Specification Promise. So lacking a non-discriminatory, royalty-free license for the binary file format specification, how can anyone actually practice the claimed 100% compatibility? Isn’t it rather unorthodox to have a “standard” whose main benefit is claimed to be 100% compatibility with another specification that is treated as a trade secret? Doesn’t compatibility require that you disclose both formats?

Now what is probably true is that Microsoft Office 2007, the application, is compatible with legacy documents. But that is something else entirely. That fact would be true even if OOXML were not approved as an ISO standard, or even if it were not an Ecma standard. In fact, Microsoft could have stuck with proprietary binary formats in Office 2007 and this would still be true. But by the criterion of whether a person having ordinary skill in the art can practice the claimed compatibility with legacy documents, this claim falls flat on its face. By accepting this standard, without sufficient enablement in the specification, the public risks giving away its standards imprimatur to Microsoft without getting a fair disclosure or the expectation of interoperability in return.

The “enablement” argument is surely a good way to debunk inflated marketing hype. But is the claim of 100% compatibility included in the official standard text?

I ask because if the OOXML spec provides enough enablement to fulfill the goals stated in the spec itself, the marketing hype surrounding it should not matter to a standard body. It should rather matter to a prospective buyer that considers the actual product.

You can find the claim in several places. For example, the charter of Ecma TC45 was to “Produce a standard which is fully compatible with the Office Open XML Formats, including full and comprehensive documentation of those formats”.

Also, the Overview report, included in Ecma’s OOXML submission to ISO, claims of OOXML, “To the best of our knowledge, it is the only XML document format that supports every feature in the binary formats.”

The examples given in this post and a previous post, of elements that simply tell implementers to render “small caps which are smaller than typical small caps at most font sizes” or to emulate other undefined legacy applications, are certainly examples of insufficient enablement.

Finally, I’d note that Microsoft defends every bug in OOXML, such as incorrect calendar calculations, inclusion of VML, etc., by appealing to the need for full legacy compatibility. They also use that need to justify the standardization of OOXML even though an existing standard in this problem space was published by ISO just 3 months ago. Presumably we’ll read a repetition of these claims next week in Ecma’s response to the 20 JTC1 national body contradiction submissions.

So, considering the above, it is of some public interest, before granting ISO approval to OOXML, to know whether in fact OOXML actually enables others to practice this compatibility.

I realize that there are people on both sides of this issue. Some want to see OOXML defeated, while others see its existence as a good thing, that Microsoft is documenting its formats and this will allow open source software to remain compatible. I’m just pointing out for the latter that they are really not getting what they think they are getting. Especially without the DRM they are just getting a handful of dust.

Purposes for the standard: OpenXML was designed from the start to be capable of faithfully representing the pre-existing corpus of word-processing documents, presentations, and spreadsheets that are encoded in binary formats defined by Microsoft Corporation. The standardization process consisted of mirroring in XML the capabilities required to represent the existing corpus, extending them, providing detailed documentation, and enabling interoperability. At the time of writing, more than 400 million users generate documents in the binary formats, with estimates exceeding 40 billion documents and billions more being created each year.

The “Billions of Binaries” argument is by far and away the most compelling reason for ISO/IEC to consider Ecma 376.

Office Open XML (OpenXML) is a proposed open standard for word-processing documents, presentations, and spreadsheets that can be freely implemented by multiple applications on multiple platforms.

This statement is nearly identical to the original 2002 Open Office XML (now ODF – OpenDocument) Charter statement of purpose. But let’s cut to the chase and look at the ISO/IEC product description:

ISO/IEC 26300:2006 defines an XML schema for office applications and its semantics. The schema is suitable for office documents, including text documents, spreadsheets, charts and graphical documents like drawings or presentations, but is not restricted to these kinds of documents.

The contradiction is clear and unequivocal. So why is there any consideration whatsoever at ISO/IEC, let alone a proposed fast track cram through?

Reality man. The hard cold uncompromising reality of those billions of binary documents that are the fodder and grist of critical day to day MSOffice bound business processes, line of business applications, and assistive technology type add-ons. Microsoft has a monopolist’s grip on much of the world’s workflows and workgroups. The desktop productivity environment is the default user interface to backend data, services, transaction and information processing systems. Microsoft is not about to give up the incredible advantage that control over this environment gives them – especially as they are now pushing into server, device and Internet system realms with a product stack promising unmatched integration with the end user desktop interface through the highly application- and platform-optimized MSXML document/data model.

Governments, enterprises, organizations, businesses – wherever workflows and workgroups collaborate in an extended information processing chain – are caught between a rock and a hard place.

Microsoft ruthlessly reserves for themselves the exclusive right to convert these billions of binaries to XML.

ISO/IEC and the governments of the world can standardize all they want on ODF, but the real world problem of how to get from where business processes are today to the open XML standard of ODF is a problem Microsoft is determined to keep pushing. And make no mistake about it, the only thing standing between those billions of binary documents and the truly open ODF is Microsoft and their refusal to document the legacy binary blueprints needed for perfect conversion.

What most people can’t seem to grasp is that perfect or near-perfect conversion of those binaries is a base requirement for the continuous operation of workgroups and workflows. Anything less than perfect conversion fidelity is too disruptive and costly to try. Take a close look at the difficulties of major migration efforts such as Munich, Bristol and the Commonwealth of Massachusetts, and you’ll see that the first barrier of conversion fidelity is a show stopper for the second migration barrier: the workflow problem of MSOffice bound business processes, LOBs and assistive technology add-ons. The two barriers are intimately and inextricably linked at the document level.

So put me on record agreeing with Rob. Ecma 376 is not useful or even implementable unless and until Microsoft provides full documentation concerning the legacy binary file formats. Reserving this secret for themselves is simply cover while they leverage their desktop monopoly into the realms of server, device and Internet systems. An objective for which a controlled portable XML document/data model is indispensable.

The fact that the written specification doesn’t match its stated goal should really matter to a standard body. When in the process is this validation performed?

I suppose ISO would expect that when an ECMA standard reaches them through the fast track route, the proposed standard text reasonably matches the stated goal. Otherwise the ability of ECMA to perform its process should be questioned.

I have read on Andy Updegrove’s blog that South Africa has raised that issue but for a different reason.

One can level the same “enablement” criticism against ODF. The format is, in many parts, underspecified: the only way to replicate its behaviors is by examining programs that do implement it.

DRM is troubling, but it is NOT a major issue, for three reasons:

1. DRM-protected documents are not generally destined for sharing with others (why else would they be protected?)
2. It is a niche technology; DRM authoring is not included in the most widely used versions of MS Office.
3. In part because of the above reasons, MS’ DRM has not caught on!

Maybe I can persuade you to make up a pseudonym for yourself? I don’t mind anonymity, but I sure would like to know whether I’m debating with one person or a bunch of them.

In any case, I think you misunderstand the purpose of DRM in Office. It is very much intended for documents that are shared with others. What purpose would DRM possibly serve if the document was only on the author’s machine? Remember DRM is not a binary access/deny thing. It can be fine grained, such as only this person can read the document, or this person can read, but not write, or can read but not print. It is a way to allow the physical document to be freely mobile, via email, USB memory keys, etc. but still internally manage access controls.
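The “fine grained” point can be made concrete with a toy model. The sketch below is plain Python and bears no relation to Microsoft’s actual rights-management protocol, which is undisclosed; it simply illustrates rights as per-user capability sets, the kind of policy that travels with the document and must be enforced by every interoperable reader.

```python
# Toy model of per-user document rights. This illustrates fine-grained
# access control in general, not Microsoft's actual (unpublished) DRM scheme.
READ, WRITE, PRINT = "read", "write", "print"

policy = {
    "alice": {READ, WRITE, PRINT},  # full control
    "bob":   {READ},                # may read, but not edit or print
    "carol": {READ, PRINT},         # may read and print, but not edit
}

def allowed(user: str, action: str) -> bool:
    """A user absent from the policy gets no access at all."""
    return action in policy.get(user, set())

print(allowed("bob", READ))    # True
print(allowed("bob", PRINT))   # False
print(allowed("dave", READ))   # False
```

The document itself can move freely over email or a USB key; what an interoperable implementation needs, and what the OOXML specification does not provide, is the means to evaluate and enforce such a policy.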

Your comparison with ODF is not persuasive. Sure, there are things that ODF 1.0 does not include, for example a specification of spreadsheet formulas. However, ODF 1.0 does not claim to include that feature. Enablement is an issue when you claim more than you disclose. You do not see a flag in ODF that says “doFormulasLikeOpenOffice2” or anything like that, right? So the ODF 1.0 specification does what it claims to do. Compare this to OOXML, which claims to be 100% compatible with legacy Office file formats but doesn’t actually disclose how to achieve that.

MS Office DRM has several functions. I won’t claim to list them all but I can mention a few that are relevant to document standardization.

One of them is to protect secrets so only authorized persons can read them. These documents are very valuable to the organization that owns them, and they are locked into a proprietary format that may be obsoleted at Microsoft’s whim.

Another reason is to permit the document to be circulated across several parties while ensuring the owner of the document can control how it is used, for example allowing some designated person to read but not modify or print. This requires that the parties agree on a method to share the relevant certificates, as well as some federated Active Directory-based identity management. It can be done. This is a feature of Office DRM that is supported by Microsoft. But I haven’t seen the relevant protocol published, much less standardized.

In any event, if a corporation takes the trouble to implement DRM on some of its documents, it is a sign that this corporation gives much importance to the intellectual property included in such documents. The number of users or market share are not good metrics of the stakes. It only takes a few DRMed documents to lock a corporation into Microsoft products.

“It only takes a few DRMed documents to lock a corporation into Microsoft products.”

That’s somewhat exaggerated. If a corporation decides that protecting documents with MS’ DRM is no longer in its best interest, all it need do is decrypt them with the program it used to encrypt them in the first place. And unprotecting a “few” documents is a trivial task.

I.e., it only requires that the corporation not discard the software it had originally acquired.