Microsoft Office Open XML Formats: “We Already Got One”

First, the good news. As many of you have seen by now, at 12:01 AM ET last night, Microsoft announced rather extensive – and pretty much open – support for XML-based formats for its next-gen Office iteration, due next year. Yes, that includes PowerPoint and Excel (but not OneNote, sadly). Indeed, the default save format will be XML. For specifics on the announcement, head on over to the blog of one of the men behind it, Brian Jones.

According to Brian, the format is open, comprehensive, and backward compatible – how could there be bad news? Well, as they say in Monty Python and the Holy Grail, “It’s very nice-a, but we already got one.” As Tim Bray notes, many of the other heavyweights in the technology industry (including Adobe – how did that not get picked up more widely?) have been collaborating on just such an XML-based, open format, referred to with the unexciting but descriptive moniker Open Document Format.

The question then becomes why did Microsoft invest its time and energy into creating a duplicate format? While conventional wisdom might have us believe that it’s because big bad Redmond is all about lock-in, the open nature of the format undermines that argument (though I’ll concur with Tim Bray on the licensing, giving it a wait and see). As with its Metro format, the new formats do not accept outside contributions or guidance, and are therefore not an open standard, but by virtue of their documentation they can be considered an open format. Lock-in then seems to be an inadequate answer.

So why then did Microsoft pursue the course they did? Via Erwin Tenhumberg, whom you really ought to read if you care about Open Office, Star Office and the Open Document Format (and whom I had the pleasure of speaking with on related topics recently), comes one explanation in Beta News:

When asked why Microsoft did not use the OASIS (Organization for the Advancement of Structured Information Standards) OpenOffice.org XML file format, Paoli answered, “Sun standardized their own. We could have used a format from others and shoehorned in functionality, but our design needs to be different because we have 400 million legacy users. Moving 400 million users to XML is a complex problem.”

“This is a case of reality versus standards – this is reality. We can’t do (support) everything. Where does it stop?” Paoli is one of the authors of the original XML specification. The schemas flag all of the features in the current corresponding Office files formats.

So there’s your answer: legacy users. One certainly can’t discount the numbers and the volume of legacy users that Microsoft Office has, but would the Open Document Format really be inadequate for supporting those users? Well, the only people that really know the answer to that work for Microsoft. But the folks from KOffice, as Erwin points out, don’t seem to think so:

But I definitely think the OpenOffice.org file format was a very good basis for the OASIS format, since it was designed, from the start, as a file format that should be as independent as possible from the design of the application. It reuses standards like XSL/FO, CSS, HTML etc. as much as possible, so the goal is to make the OASIS format another one of those formats, where the application used to edit the document doesn’t matter.

But again, let’s give Microsoft the benefit of the doubt here; ensuring backward compatibility for that many users is no doubt a monumental task, and either way the Redmond folks should be given much credit for releasing what at first glance seems to be a real, genuine open format.

Good intentions and such aside, however, if Microsoft really wants to demonstrate that they’re open they’ll go one step further and support the Open Document Format alongside their own non-independent version. Wondering if customers want it? Check out the comments to the original post from Brian Jones.

The objections to such support typically run along these lines: Microsoft has never had to compete on implementation, so why would they choose to start now? Well, I reject that argument for a couple of reasons:

Office is clearly ahead of Open Office at the moment on the usability and feature/function fronts. Sorry OO.o advocates, but as a user of both, Microsoft Office is still out in front. Not by as much as people might think, but enough to easily compete on a level platform basis. I liken it to iTunes; IMO, if iTunes chose to compete on its merits rather than the closed nature of its platform, I’m a believer that it would win nine times out of ten because it’s simply better than alternatives.

Arguing this point assumes that the uneven playing field of a single, controlled format is sustainable indefinitely. Clearly, Microsoft has controlled the Office format landscape nearly forever, but times, as they say, are a-changin. Governments abroad, in particular, have been exerting increasing pressure on Microsoft to play fair by being more open, and in fact it’s difficult to imagine today’s announcement without some of this governmental pressure.

The PR win from such support is likely to more than cost justify the effort required to include support for the Open Document Format. Microsoft would all of a sudden be able to play the open card with ease, having by choice (rather than mandate, as could possibly become the case somewhere down the line) decided to support an open, independent document format.

On some level these arguments fly in the face of the “if it ain’t broke, don’t fix it” mindset that seems to be prevalent within Microsoft divisions that have generated quarter after quarter after quarter of sustained revenue growth, but I think that mindset is going to have to change in a world that’s increasingly driven by macro trends like open source and open standards. Whether we are at the tipping point in the office format equation was a question I debated with a vendor yesterday, and the answer, in my mind, is not yet. But the Open Document Format is still young, and I do think it ultimately has the power to fundamentally alter the context of discussions around office productivity software. Microsoft can be proactive and aggressively compete on that basis now, or they can ignore the demands of governments and ISVs abroad. I know what I’d choose.

11 comments

I disagree about functionality. Until recently one of my jobs was to get Microsoft Word to do what any bloody word processor ought to do – things like stably assemble pages, do multidocuments properly, things like that. Stuff that had been easy to do in Word Perfect, FrameMaker, TeX and so forth since the 80s. Practically all the "extra features" in Word are unusable or deformed in some way. If not for the fact that the medical research institute for which I worked needed access to EndNote, which works seamlessly only with Word (unless you are using version 8, in which case it works once every three times), I'd have insisted we convert everything to other formats.

I haven't seen the MS XML formats for a while, but as I recall they needlessly complicated the standard with O: tags. Do they still do that?

Unclear, and Sam Ruby's (http://www.intertwingly.net/blog/2005/06/02/Open-Office-Wars-Part-6) raising questions as to how extensible (or not) the formats will be, and whether or not the schemas will be provided. As to the O: tags, I have no idea; my only similar experience was with Word's HTML markup skills, which were simply awful – inserting Microsoft specific MSO: tags everywhere, making the HTML export facilities more or less useless. One presumes that because the formats will be completely open, they wouldn't expose themselves to potential criticisms in those areas, but maybe that's asking too much.

Grga: interesting. like Louis, i've never observed that myself, although the last time i was working with 25+ page docs was when i was an SI working on fairly monstrous deliverables. but maybe later versions do have that difficulty.

as for who's superior, i could point to the fact that we've been unable – even with the help of some OO.o experts – to duplicate our publication template in OO.o. but the point here shouldn't be on feature/functions, because IMO they're not what users care about: most of us only use 10 or 20% max of the features, anyhow.

either way, though, i like OO.o and it is in fact my primary authoring environment, and what we use internally at RedMonk for all non-publication literature. it's just not perfect.

Louis: i'll wait and see on the o: question, but my primary question still is whether or not said schemas will be extensible by third parties.

Craig Ringer says:

It seems to me that control over their own format is probably the most likely motive. In their position, I'd find it pretty compelling too (but so are the arguments against).

There's another format that works a bit like this that's been extremely successful. PDF. Adobe retains control over PDF's direction and development, but licenses the format in a way that makes it possible for others to implement without problems.

This has resulted in the development of a large number of independent and/or free PDF tools, some of which compete with Adobe's own products. In fact, OpenOffice.org itself supports PDF export.

I'd be much happier to see MS use OpenDocument, but a properly licensed and documented format under their control would be a massive step up from the mess we have now.

As for functionality, I'm with you on that. OO.o is useful, but has some really frustrating problems and limitations. I've been using Linux thin clients at work for quite some time and they've been very successful for our basic users. I recently trialled them with two of the journalists, but it was a miserable failure – almost entirely due to problems with OO.o. The biggest problem was the UI, which one of the users called "an awful ripoff of a really bad interface" (referring to MS Office). I'll wait and see what OO.o 2.0 or 2.1 bring.

With the MS format – IMO it really comes down to "wait and see what the license will bring."

So far i've found MS XML to be refreshingly open in a declarative sense, but totally lacking in participation. It's still a permission based model where a single vendor has proprietary control over access to and implementation of the MS XML file format. You can look, but you can't touch. Where's the participation? Where's the collaborative community of contributing parties? And where's the Open Standards Group that Microsoft promised the European Union back in November of 2004 when Jean Paoli triumphantly announced that the EU had accepted MS XML version 1 as both "open" and "standard"? Of course Jean Paoli then went on to explain that the current agreement with the EU was based on Microsoft's promise to collaborate with Sun on creating "filters" that would transform MS XML to OpenDoc XML. The future promise Microsoft made to the EU was that they would provide an Open XML Format submitted to an Open Standards Group.

Unlike the headline hunting hounds in Massachusetts, who no doubt were successful in extorting from Microsoft far more than the few meager licensing concessions made public, the EU stood tall and held their ground. They are determined to get an Open XML file format backed by a recognized Open Standards Group.

So how is this version 2 of MS XML any different from MS "shared source"?

Since MS XML looks to be a clone of OpenDoc XML, i think it's disingenuous to imply that Microsoft put so much time and effort into creating a duplicate XML file format to meet their "legacy" needs. This is a knockoff clear and simple. The work was done by OpenOffice.org, Sun Microsystems, and the OASIS OpenDocument Technical Committee.

So what about the legacy issue and the 400 million user base that's been tormented since time immemorial by designed in incompatibility, and the ever profitable promise that the next upgrade will finally deliver interoperability?

If the upgrade to get all your file formats compatible treadmill has finally ground to a halt, it will be because of the OASIS OpenDoc XML work.

The first 18 months of work at the OASIS OpenDoc TC (then called the OASIS Open Office TC – name changed in September of 2004 at the suggestion of the EU TAC/IDA group) was focused near entirely on legacy systems. Especially legacy systems wedded to Microsoft binary file formats.

The OpenDoc TC was very fortunate to have a wealth of expertise in reverse engineering the legacy maze of incompatible MS binary file formats. Experts from Corel Office, StarOffice, Boeing, Stellent, ArborText, and SpeedLegal among others had long made their living reverse engineering MS file formats. Phil Boutros, the legendary binary cracking wizard representing Stellent, near single handedly represented what would have otherwise been thought to be the full cooperation of Microsoft in solving these legacy issues. With over twenty years expertise in wrestling with legacy MS file formats, one couldn't help but feel Phil did a far better job at this than anyone Microsoft could have sent to the TC. To be honest, i don't know of any Microsoft worker bees old enough to be packing the level of expertise that was routine on the OpenDoc TC. If Jean Paoli has a MicroSerf in mind who can stand toe to toe with Phil Boutros, i would pay for a front row seat.

At any time Microsoft was and is able to jump into the TC discussions about their legacy file formats and the transformation issues that were eventually resolved in the OpenDoc XML specification. They did after all have an official membership on the OpenDoc TC. Rabih Filfili, a Software Engineer at Microsoft, has been the official observer of record for Microsoft since the inception of the OASIS OpenDoc TC. He can be reached at 650-693-2237.

Since the first 18 months of the OpenDoc TC's life was spent on legacy issues, most of which concerned MS file formats, it doesn't surprise me at all that they cloned the OpenDoc specification. But then to try to take credit for the enormity of work the OpenDoc TC put into the transformation process? Who are they kidding? The least they could do is send a thank you note and give credit to the real experts who actually did the work: Phil Boutros (Stellent), Paul Langille (Corel), Tom Magliery (Corel), Simon Davis (Australian National Archives), Jasson Harrop (SpeedLegal), Daniel Vogelheim (Sun StarOffice), Michael Brauer (Sun StarOffice), Doug Alberg (Boeing), Paul Grosso (ArborText), Patrick Durusau (Society of Biblical Literature), and David Faure (KOffice-KDE).

Where the first 18 months, or phase I, of the OpenDoc TC work was dedicated to MS legacy and transformation issues, the second phase took on the issues of emerging Open XML Technologies, and next generation collaborative Internet computing.

Yes there is still a need to expand the compound document capabilities of OpenDoc XML so that the specification covers far more of the desktop productivity environment than that defined by the typical desktop Office Suite. With both OpenOffice.org and KOffice now road testing implementations before they hit the specification, OpenDoc XML has considerable advantages towards developing a truly portable and structured, compound document file format that is completely independent of applications, platforms, and proprietary vendor permissions.

Some of the productivity environment expansion possibilities include the following: Michael Brauer has submitted a database proposal for consideration. David Wheeler is working on the OpenFormula Project. I'm hopeful that with their WorkPlace workflow integration and project management expertise, IBM will assist the TC to perfect a Contact Management-PIM-Project Management enhancement. And there is always the possibility that new member Adobe will assist the OpenDoc TC in expanding the metadata model based on the excellent cross enterprise work Adobe has done with XMP.

The core of the next generation work however is the embrace of W3C Open XML efforts such as XForms, SVG, and SMIL. I think when all is said and done, and we finally get to exercise our "shared source" rights and get to see the details of the MS XML clone, the most notable differences will not be those pertaining to the maze of legacy transformation issues Microsoft is now pointing to. No, the important differences will center around next generation collaborative computing capabilities where OpenDoc XML XForms, SVG, and SMIL implementations will compete directly against whatever embrace-extend variation Microsoft conjures up.

I sincerely appreciate your efforts Stephen to provide some balance and perspective to this discussion. Keep on keeping on,
~ge~

John Klein says:

This discussion of the new XML standards to be implemented by Microsoft is very interesting.
While the technical committee (TC) struggled for 18 months to incorporate existing and previous Microsoft (MS) formats, which formats are not being covered, and why? In particular, if you compare the two implementations, which formats are not covered by either the MS XML or the OASIS standard?

A secondary question is why is OneNote being deliberately orphaned by MS from the MS XML format?

The OpenDoc XML specification is eXtensible. Even if a legacy format is found to have some idiosyncrasies, transformations and conversions are still quite possible.

If you have direct access to the binary format specs, your OpenDoc XML transformation can be very exacting. Otherwise, you're rather limited to the quality of your reverse engineering.

I don't know if MS XML is eXtensible in the same way, keeping to the traditions of Open XML, as in "eXtensible Markup Language". The MS XML Reference License allows for royalty free use of MS XML, but i wonder about the legal risk of eXtending MS XML to meet non MS file format needs. Technically MS XML looks to be quite eXtensible. The question however is can you get Microsoft's permission to do what you need to do with your information? And what will that permission cost? MS XML is royalty free for the moment, but there is no guarantee as to how long that will last.

Keep in mind that when the OASIS OpenDoc XML TC opened shop, the OpenOffice.org Office Productivity Suite was already capable of high level two way conversions for most if not all popular legacy and current file formats. Not only could OOo read and write to 25 years worth of legacy file formats, including everything used by the 450 million MS monopoly base, but every one of those file formats could easily be converted to OpenDoc XML.

Still, even with the high level quality of reverse engineering OOo has achieved, one has to assume that Microsoft would be able to further the fine tuning. With a release date set for late 2006, we've got lots of time to argue about MS XML fine tuning.

The question is why wait? If the Microsoft arguments for MS XML are valid and compelling today, then why not start converting legacy files to OpenDoc now? Why wait?

Besides, if Microsoft did have any problems converting their legacy files to MS XML, they could always go to OpenOffice.org for help. This business about dismissing the truly open and standard OpenDoc XML in favor of a proprietary solution because of the difficulties of converting legacy files is nonsense. If OOo can do this through reverse engineering, providing a ready to use zero cost tool for converting those same legacy file formats to OpenDoc XML, what could possibly be Microsoft's problem (other than wanting to maintain iron fisted control)?

Let me try to cast the work of the first 18 months in a slightly different light than that of accommodating legacy file format idiosyncrasies and conversion demands. Yes, the main topic of discussion often seemed to be that of running OpenDoc XML against the legacy of MS Office, WordPerfect Office, and Lotus Office formats to see if we could truly "standardize" a perfect fit. Meaning a perfect mapping instead of having to rely on eXtending the standard specification as needed.

But the purpose of these efforts wasn't exactly to the point of trying to establish OpenDoc XML as some kind of global destiny for all information. (That's phase II 🙂)

Rather, the purpose seemed to be that of crafting OpenDoc XML as a useful "universal transformation layer" sitting between working legacy information systems and rapidly emerging enterprise level publication and content management systems.

The OASIS OpenDoc XML TC was loaded with SGML expertise and enterprise publication and content management systems expertise. For me however, some of the most eye opening examples of the problems faced by end users were provided by Boeing representative Doug Alberg. His story covered some of the most dramatic cost saving efficiencies and long term solutions obtainable through implementing OpenDoc XML as a common transformation layer.

Some things to note:

Boeing was also a client of Stellent, ArborText, and Documentum. In fact, they saw a future where any number of information management systems might have to interoperate at the file format level – reworking the same information in different, highly specialized ways.

Boeing also has hundreds of legacy information systems on line, many of which long ago lost their vendor support, but which are still being used. All of which use unique file formats. The investment in division level and workgroup specific legacy systems could continue to serve the company, if only the information could be digitally gathered and served up to enterprise level systems.

On top of these systems, Boeing had to develop their own mega enterprise publication and content management system – MODS, which Doug Alberg was charged with developing and implementing.

Finally, imagine the re-purposing, re-use, and re-publication synergies Boeing could tap into if they were able to get that legacy of unstructured information into a common XML structured format. Trading partners, supply chain participants, productivity groups (internal, external and mixed collaboration), massive government regulation at international proportions – they all need exacting access, at exactly the right time, to exactly the right details, explanations, and schematics. The enormity of what's at stake here is mind boggling. And as Doug was wont to say, "The paper work and documentation for a 747 couldn't possibly fit into a 747".

So instead of writing direct transformations between backend legacy systems and the many different enterprise level publication and content management systems that might be needed, wouldn't it be far easier to manage and maintain these transformations if there was a common intermediate transformation layer? By sinking the information repository into this common layer, Boeing could effectively preserve their investments in legacy systems while keeping their options wide open in regards to enterprise level publication and content management systems.
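To make the arithmetic behind that argument concrete, here is a minimal Python sketch of the hub idea. Everything in it is an illustrative assumption: the system names (`legacy_a`, `cms_x` and so on), the toy payload layouts, and the plain dict standing in for the common format are all hypothetical, and none of it reflects the real OpenDoc XML schema or anything Boeing actually built. It only shows why N legacy systems and M enterprise systems need N + M converters through a common layer rather than N × M direct ones.

```python
from typing import Callable, Dict

# Hypothetical intermediate form: a plain dict standing in for a document
# in the common format. This is NOT the real OpenDocument schema.
Hub = Dict[str, str]


def direct_converter_count(n_sources: int, n_targets: int) -> int:
    """Converters needed when every source maps straight to every target."""
    return n_sources * n_targets


def hub_converter_count(n_sources: int, n_targets: int) -> int:
    """Converters needed when each system only maps to or from the hub."""
    return n_sources + n_targets


def import_legacy_a(raw: str) -> Hub:
    """Toy legacy layout A: 'title|body' on a single line."""
    title, body = raw.split("|", 1)
    return {"title": title, "body": body}


def import_legacy_b(raw: str) -> Hub:
    """Toy legacy layout B: title on the first line, body after it."""
    title, body = raw.split("\n", 1)
    return {"title": title, "body": body}


def export_cms_x(doc: Hub) -> str:
    """Toy target X: a small XML-like string."""
    return f"<doc><t>{doc['title']}</t><b>{doc['body']}</b></doc>"


def export_cms_y(doc: Hub) -> str:
    """Toy target Y: a heading followed by the body text."""
    return f"# {doc['title']}\n\n{doc['body']}"


# One importer per legacy system, one exporter per downstream system.
importers: Dict[str, Callable[[str], Hub]] = {
    "legacy_a": import_legacy_a,
    "legacy_b": import_legacy_b,
}
exporters: Dict[str, Callable[[Hub], str]] = {
    "cms_x": export_cms_x,
    "cms_y": export_cms_y,
}


def convert(raw: str, source: str, target: str) -> str:
    """Route any source to any target through the shared hub form."""
    return exporters[target](importers[source](raw))
```

With, say, 100 legacy systems and 5 enterprise systems, `direct_converter_count(100, 5)` is 500 while `hub_converter_count(100, 5)` is 105, and adding a sixth enterprise system costs one new exporter instead of 100 new converters.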

One of the things that concerns me about MS XML is whether or not users will implement the file format as a repository? Or, will they choose to use MS XML as a transformation layer, able to quickly convert information to the globally open forever repository format, OpenDoc XML?

Because of the way the MS XML Reference License is worded, especially the use of a "link" to the critical section of the agreement describing possible patents, encumbrances and restrictions, Microsoft can change the rules at any time.

In the past MS Office users had no choice but to trust their information to Microsoft. When file formats are joined to specific applications and platforms, the only choices one has are those granted by the vendor. XML file formats are naturally highly portable, able to separate cleanly from applications and platforms. But the proof is still in the licensing. For MS XML, we're still looking at having to trust in Microsoft's by permission only business model.

By now everyone is quite familiar with the MS business model of "embrace, extend, extinguish". There are no assurances that i can see suggesting that somehow MS XML will forever be outside this insidiously reprehensible but very effective business approach. When Microsoft extends an emerging technology, especially one they have made much noise about embracing, the cascading entanglement of dependencies and interfaces can be breathtaking. They make a mockery of all the interoperability promises their embracing of these same open technologies were originally thought to have made possible. With MS XML we once again see Microsoft trading on these interoperability hopes. How many times do users have to get stripped and whipped before they learn to be wary of a Microsoft bearing open technology gifts?

The effort that goes into creating this never ending sprawl of entanglements and interdependencies goes a long way towards explaining why Microsoft can't deliver applications anywhere near on time. It goes a long way towards explaining why Windows is so insecure and unstable. But it also explains how Microsoft maintains their iron grip on a vast monopoly base.

Until Microsoft removes the MS XML Reference License restrictions and patent threats, and turns the entire file format over to a recognized Open Standards Group – one that will manage and maintain the format under a truly open, transparent, and unrestrictive license – MS Office users should think twice before trusting in MS XML as their information repository format. And since these users will have near 18 months or more to think about this, why not try OpenOffice.org and OpenDoc XML in the meantime?