Conversion factors

A few articles appeared online today, talking about the new Windows Office 2007 file format and converters. They asked a very reasonable question — “Where are the converters for Mac Office?” but wrapped the question in some conjecture and a little bit of hyperbole. Sheridan put up a post over on Mac Mojo to give a little explanation and make a concrete affirmation that the MacBU will release converters for the new formats. So far there have been several dozen comments in reply, many of them expressing some strong emotions ranging from anger and frustration to incredulity and disbelief that we either can, want to, or will actually develop these converters.

Much of the sentiment expressed seems to stem from a number of theories, such as

We have been sitting around doing nothing

We have not talked to the Win Office team

File format converters are trivial

None of these three beliefs are true, however. Let’s look at each one in turn.

1. The MacBU has been doing nothing
The reality is we have been working very hard on the underlying technologies we need in order to make these converters. The new file formats are XML-based. On the Windows side, there is a very complete and modern XML parser, MSXML, built into the operating system that the WinOffice team uses. That parser does not exist on the Mac. (There are certainly a variety of XML parsers out there, but the only one that ships with Mac OS by default is libxml, and it doesn’t support everything that the new file formats need.) So, we have spent significant development resources on porting MSXML over to the Mac and testing it to ensure it works. You may recall from earlier posts of mine that porting code from Windows to the Mac is not a trivial task, even when the code doesn’t use any OS-specific APIs. The compilers used by each team are very different, even the behavior of some of the standard runtimes is different, and of course we have to test in great detail any new code and especially any changes we make to it. Beyond that, there are changes that needed to be made to the Office 2004 applications to support the upcoming converters, and we’ve been busy doing that. I spent much of the month of October going through Excel 2004’s load and save code path to make it aware of the new converter. (I even went so far as to create a fake converter that pretends to handle the conversion commands so that I could actually test Excel 2004’s new behavior as much as possible in advance.)
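To illustrate the "fake converter" idea mentioned above, here is a minimal sketch in Python. It is not the actual Excel 2004 code: the class name, the two commands (`can_convert`, `convert`), and the host-side `open_workbook` function are all hypothetical, chosen only to show how a stand-in that records commands and reports success lets the host's load path be tested before a real converter exists.

```python
class FakeConverter:
    """Hypothetical test double: accepts conversion commands and records them
    without doing any real conversion."""

    def __init__(self):
        self.calls = []  # every command the host sent, in order

    def can_convert(self, path: str) -> bool:
        self.calls.append(("can_convert", path))
        return path.endswith(".xlsx")

    def convert(self, src: str, dst: str) -> bool:
        self.calls.append(("convert", src, dst))
        return True  # pretend success; no file is read or written


def open_workbook(path: str, converter) -> str:
    """Hypothetical host-side load path: new-format files are routed through
    the converter, and the path of the legacy-format result is returned."""
    if converter.can_convert(path):
        legacy_path = path.rsplit(".", 1)[0] + ".xls"
        if not converter.convert(path, legacy_path):
            raise IOError(f"conversion failed for {path}")
        return legacy_path
    return path  # already a legacy file; open it directly


fake = FakeConverter()
result = open_workbook("budget.xlsx", fake)
```

After the call, `fake.calls` shows exactly which commands the host issued and in what order, which is the part of the behavior that can be verified in advance.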

2. The MacBU never talks to the WinOffice team
Various people in MacBU talk to their counterparts on the WinOffice team pretty frequently. For example, I coordinated the work I did on Excel 2004 with a developer on the WinOffice side who did similar work for WinXL 2003. Our Program Managers have had weekly and monthly meetings with WinOffice folks for the past few years. We’ve known the general plan for the scope of work we’ve needed to do for some time, and have been doing that work. Knowing what to do, however, doesn’t make the work happen instantaneously.

3. File format converters are trivial
Let’s step back and look at the problem again. Office 2007 for Windows and Office 12 for the Mac both need to support the old and the new formats. They need to support the formats natively — it would be pretty silly to invoke an external converter for the new native file format, right? Ergo, the code for the new file format should be written directly into these newest versions of the suites. Ok, so now you’ve got that code in place, and you need to create a converter for the older versions of the suites. What do you do? Should you rewrite or duplicate that file code in the converter project, and risk the converter and the actual product getting out of sync? Or do you take that existing code and repackage it as a converter itself? Well, the WinOffice team decided to do the latter, so their converters for Office 2003 are based off of a significant portion of the Office 2007 code. They shipped their final converter versions after shipping the actual suite that natively implements the formats.

That’s exactly what we’ve chosen to do as well. Rather than duplicating code and having to test each set, we are writing it once and repackaging the appropriate parts as the converters. The problem for us is that we’re following along behind WinOffice in a temporal sense. Because we’re shipping after WinOffice, there is a very real time delta where WinOffice users will be creating files with the new formats and the Mac converters won’t be ready. We’ve been porting the WinOffice code over to MacOffice in stages for many months now, but since they just shipped their final bits in November, we’ve only had those final pieces of source code available for less than a month. We’re porting it over as fast as we can. However, as I mentioned earlier, porting code across is not a trivial task. Our apps have diverged from WinOffice over the last 10 years, so we have to deal with internal implementation conflicts as well as compiler peculiarities, OS differences, etc. Some parts even need to be rewritten: anything that deals with graphics on WinOffice probably uses the GDI+ library on Windows, which doesn’t exist on the Mac. So, we need to take time to rewrite it to use CoreGraphics, etc. Those changes need to be tested thoroughly! The converters even need that new graphics code, so that they can convert Office 12-style pictures with all the fancy new effects into some format recognizable by Office 2004.
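The "write it once, repackage it as a converter" approach can be sketched roughly as follows. This is a toy illustration in Python, not Office code: the XML layout (`<sheet>` and `<cell>` elements), the tab-separated "legacy" output, and all function and class names are assumptions made up for the example. The point is only that the suite and the converter call the same reader, so the two cannot drift out of sync.

```python
import xml.etree.ElementTree as ET
from typing import List


def read_new_format(xml_text: str) -> List[str]:
    """Shared core: the single implementation of the (toy) new-format reader."""
    root = ET.fromstring(xml_text)
    return [cell.text or "" for cell in root.iter("cell")]


def write_legacy_format(cells: List[str]) -> str:
    """Toy legacy writer: one tab-separated line."""
    return "\t".join(cells)


class NewSuite:
    """The new version of the app reads the format natively via the shared core."""

    def open(self, xml_text: str) -> List[str]:
        return read_new_format(xml_text)


def convert_for_old_suite(xml_text: str) -> str:
    """The converter repackages the same core and adds a legacy writer."""
    return write_legacy_format(read_new_format(xml_text))


sample = "<sheet><cell>a</cell><cell>b</cell></sheet>"
native_view = NewSuite().open(sample)
legacy_file = convert_for_old_suite(sample)
```

The trade-off is the one described above: the converter cannot be finished before the shared core is.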

So, you can see that our Office 2004 converters are very dependent on the progress we make on Office 12 itself. We can’t just drop work on Office 12, as some have suggested, to get the converters ready faster, because the converters use a core subset of the actual Office 12 code. The overall Office 12 project, however, is not yet ready for beta use. The WinOffice file format code changed in some pretty significant ways between their Beta 2 and their final shipping bits, and we had to wait for them to ship so we could get those bits. Our code (not necessarily the file format code, but the whole of Office 12) has some rough edges still; they need to get smoothed out before we can ship.

None of these issues are intractable, insolvable, or something that has surprised us. These issues are just part of the facts of developing and porting code cross-platform (especially code that was not written with cross-platform specifically in mind.) We’re making great progress, and can even see the light at the end of the tunnel (and it’s not a train!) As Sheridan said in a followup comment to her post, back in November we weren’t ready to talk about dates because we didn’t have the latest WinOffice code and couldn’t predict our progress based on missing data. Now that we’ve had the code for a few weeks and have made significant forward progress, I can actually tell you about where we are and when you can expect to see some of the fruits of our labors.

We plan to release a beta of the converters in late March or early April (roughly 3 1/2 months from now), and final versions of them after Office 12 ships. It is imperative that we do proper testing on even the beta converters, because a buggy converter that destroys your files would probably be even worse than not shipping a converter at all. I’m sorry that the wait is so frustrating to you. Many people in the MacBU are working very hard to make the converters available as soon as possible. I’m even helping out by putting on my old Excel developer hat (the one I put on the shelf back in 2001!) and doing some of the Excel port myself. The converters are coming, just not immediately.

41 thoughts on “Conversion factors”

Thanks! I didn’t quite understand the reason for the delta between winoffice and mac converters when Sheridan posted the original entry over on mac mojo.

Is there by chance a spot for super-special alpha testers? I’m currently compiling the “burning edge” nightly for Mozilla Camino and running the Beta of Adium X. Testing bugginess is where all the fun is!

The post you refer to, Troy, notes that Mac Word 12 was speaking XML as well as Win Word 2003. That’s not the same as Win Word 2007, unfortunately. At that time, Word could parse and understand generic XML, but not the specific archive format with cross-linked sections for the new Open XML file formats. That’s what we’re working on now.

Rick also says “the process is relatively simple.” That’s simple in concept, not in actual implementation reality. I too await his white paper on the whole process.

I think you said it yourself
“These issues are just part of the facts of developing and porting code cross-platform”
Did they plan on not releasing a Mac version or something? Or are you telling me that the 2007 XML file format is legacy code? Or is it just that the Win Office team are simply not capable of producing high quality work?
I hope Microsoft reinvents itself at some point. This is getting to be a joke. Don’t you get annoyed that your managers are (ab)using you?

I think the answer to your question is in the follow-up to the line you quoted: “especially code that was not written with cross-platform specifically in mind.” I doubt the Windows Office team gave much, if any, thought to writing code in a way that would be easy to port. They know they’re working on the most important application for Windows, and they probably see the MacBU as some whiny little kid who is always tugging on big brother’s shirt to let him play too. I imagine the team working on Quicktime for Windows faces a somewhat similar mentality (although at least in that case you don’t have most of the company wondering why it exists at all).

This article is one of the best written explanations I have ever seen of what happens inside Microsoft and why the Mac version of (in this case) the file converter is going to be so late.

However there is a big, enormous flaw in this argument.

If the reason for the delay is that, in order to properly understand the new XML formats and convert them to a form usable on the older Office 2004 for Mac, it is necessary to have significant chunks of Office 12 in place, THEN HOW COME OPENOFFICE WILL BE ABLE TO DO THIS BY THE END OF JANUARY 2007!!! That’s at least THREE MONTHS AHEAD OF THE MAC BU! OpenOffice, being written for Linux, would have all the same problems as Office for Mac but multiplied (after all, they, Novell, would not have the same supposedly intimate access to the Windows code that the Mac BU is supposed to have).

Quick question: if the parser requirements of the Office 2007 XML documents weren’t present in libxml, was there any investigation or feasibility/viability exercise to bring those capabilities to libxml rather than have to undertake the effort to port the entire MSXML engine? Is it possible to incorporate the necessary pieces of functionality into the libxml codebase and create a separate, thin MSXML parser API to call through? Just curious.

The biggest problem is that you are not (or aren’t allowed to be) honest here.

Take the XML parser, e.g.: While the exact specifications of the new file format may have been unknown until Office 2007 was released, the specifications of the needed XML parser weren’t.

If you’d say something like this:
“Ok, it was clear from the beginning that we would need 5 additional developers to port a good XML parser to the Mac if we don’t want to distract our core developers from doing work on Office 12. Unfortunately, the management decided that we wouldn’t get these 5 developers – even though the MacBU is so profitable that we could easily hire those 5 developers -, and that’s why we had to port the parser ourselves, and that’s one of the reasons why the converters aren’t finished yet”,
everyone would say “ok, the Microsoft management really isn’t interested in a good experience for Mac owners, but at least the MacBU itself couldn’t help it.”

Instead, you’re trying to imply that it simply was NOT POSSIBLE in any way to release these converters earlier. I’m sorry, but this is totally ridiculous given that your company (and even your BU) makes tens of millions of dollars profit a year and could easily afford more developers if the experience for their customers was more important to them than mere profits. (And don’t tell me that Mac developers are hard to find – even non-Mac developers can port an XML parser to the Mac if they aren’t totally dumb.)

The management decided that you wouldn’t get these developers, because the monopoly of Office (esp. on the Mac) guarantees that customers won’t switch to other products even if they are unhappy with Microsoft Office. Therefore you can make more profits without additional developers. The customer satisfaction is less important to the management. THAT’s the real reason why the Office converters are coming out that late.

If the reason for the delay is that, in order to properly understand the new XML formats and convert them to a form usable on the older Office 2004 for Mac, it is necessary to have significant chunks of Office 12 in place, THEN HOW COME OPENOFFICE WILL BE ABLE TO DO THIS BY THE END OF JANUARY 2007!!! That’s at least THREE MONTHS AHEAD OF THE MAC BU! OpenOffice, being written for Linux, would have all the same problems as Office for Mac but multiplied (after all, they, Novell, would not have the same supposedly intimate access to the Windows code that the Mac BU is supposed to have).

Run Open Office without X11 or the Neo Java port. Let me know how that works out for you. Open Office is not dealing with any OS X – specific techs, and the long – promised “native” aqua port is, from what I’ve seen, the minimum amount of work necessary to run without X11. I’ve yet to see any sign that it will be supporting one jot more of OS X – specific tech than it has to.

If libxml can’t “handle” the conversions it smells a lot to me like MS is once again solidifying their hold on the office tools platform by obfuscating things. XML is designed to be portable, not to require “special” parsers in order to be used. OpenXML and proprietary parsers seem like a bit of an oxy-moron to me.

I don’t dis what you guys are doing and I agree that the task is formidable and applaud your efforts, but in the end it’s the users that are left out in the cold– again, and for some reason I suspect it’s not for the last time. I imagine that the MBU has little to nothing to say about this in the end game, but the main MS Office team surely does.

There’s nothing nefarious about MSXML. As Rick said in this post (http://blogs.msdn.com/rick_schaut/archive/2005/06/01/424086.aspx) libxml didn’t handle the latest open standards that the XML spec details and that the converters rely on. If libxml did, we’d most likely have used it (and saved ourselves the time and resources put into porting yet more code from Windows.)

As for OpenOffice having their support done first, they aren’t also implementing an entire new version of Office simultaneously, as I noted that we are doing.

ChrisG, libxml is open-source, so I can’t contribute to its implementation.

It looks to this outsider like it’s an issue of priorities. Is there anything stopping you from writing the converter code first, then copying it to the new Office (instead of the other way around)?

Well, there is if your priority is the new version and not compatibility for the old. Err…forget I said anything.

But, there’s something else I don’t understand. You make it sound like the converter has to be hard-coded into Office 2004. My (and I would guess, most people’s) assumption was that it would be a plugin…At least, I always thought the file format readers/converters/savers in Office were modular. Guess I was wrong.

> ChrisG, libxml is open-source, so I can’t contribute to its implementation.

pardon?

It’s open source, so you CAN contribute.

And it’s MIT-licensed, so you don’t have to. If you want, you can add your needed features to it and bundle this enhanced libxml2 with Office without publishing your changes.

But probably you don’t even need that. Rick’s post (which you link above) explicitly refers to the version of libxml2 that shipped with Panther (2.5.4). Libxml2 2.6.16 (which is shipping with Tiger, and is also included in the latest Panther update, 10.3.9) supports SAX2. And if that’s still not enough, you can bundle a newer version with Office.

That post you link to about libxml and SAX2 is well over a year old (almost 18 months old). Also, libxml does support SAX2. Rumors on the internets (http://www.aeroxp.org/board/index.php?showtopic=5142&view=findpost&p=59726) say 10.5 supports Word 2007 saves from TextEdit, and since OS X uses libxml, libxml obviously has to have the support necessary in order to handle Word 2007 documents.

One of Mac Office’s drawbacks I think lies in the conservative approach you guys often take in decisions just such as the MSXML one we’re kind of talking about here. Bringing something like that to the table, porting it, testing it, and basically doing the implementation cha cha for something which surely you must itch could be done a simpler way … it delays release and it makes for bigger apps and more lag.

I appreciate the almost Sisyphean efforts the Mac BU undertake to bring us our premium suite. But sometimes it really does seem to even a sympathetic observer like me that for whatever reasons you forever end up doing things the hard way!

Fingers crossed for a great Office 12 … “2008” maybe? So long as it comes out in ’07!

Since Leopard isn’t shipping until some time in 2007 (late spring?) it would be impractical for us to require our users to have Leopard to do their file conversions, if we relied on its XML parser. As for SAX support as mentioned in Rick’s post, we were well into our development work at that stage having already determined that Panther didn’t give us the support we needed and Tiger, although a year old, was still not widely adopted enough to cover as much of our user base as we needed.

As much as we love it when Apple adds technologies to the OS so that we can remove code from Office, we can’t always make it a requirement too early.

Martina, perhaps I should have been more clear. Yes, I know that Open Source means that anyone in the greater public can contribute to it. However, Microsoft has a corporate policy that precludes us from even reading open source code, much less contributing to it. The basis behind that policy is so that I don’t accidentally incorporate open-source-licenced code or algorithms into Microsoft code and thus bind that particular Open Source licence to our products. That policy exists regardless of the type of Open Source licence attached to any particular project.

Having said that, I do not want my blog and comments page to become a referendum on Open Source and Microsoft’s corporate positions on it. I reserve the right to reject or remove comments that drift in that direction. Instead, please continue to focus on the topic at hand — namely the Mac Office 12 converters for the new Open XML file formats. Thanks for understanding.

The Open Office group has about 737 contributors according to their website.

I couldn’t find any definitive estimate on the size of the Windows Office development team size, but I would guess-timate somewhere in the 1500-2000+ range conservatively.

So to all the naysayers and pessimists I would say you need to take a step back and get some perspective. The MacBU is working on Office 12 (4 apps here), and at least 3 other applications I can think of, all in some stage of development. And all of this with less than a third of the team of the “closest” rival application suite.

They deserve a round of drinks for their work, not a bottle of whine. Keep up the great work guys.

I believe the Mac market moves to newer operating system (and processors ) far, far faster than the Windows market.

Also while your explanations of the difficulties in porting the file conversion plugin, VBA, and Office itself are extremely illuminating (and much appreciated), the sad reality is that Microsoft looks like a bunch of incompetents when faced with the reality that third parties have or are going to achieve the same goals far sooner. (I am referring to a perception rather than making an accusation.)

More recent comments above now lead to the logical conclusion that little old Text Edit is going to be able to read and write Word docx files before the real Microsoft Word for Mac!

Indeed it would be interesting to know if someone with access to the Leopard developer release could try this write [sic] now.

From the post:
“Our apps have diverged from WinOffice over the last 10 years”

This is, IM(Seldom)HO a mistake. A long time ago, in a galaxy far far away (about 1990) Excel had the same code base on Windows, Mac and even OS/2. Aldus also did this with PageMaker (tho perhaps they didn’t support OS/2). It requires effort and discipline but in the long run less effort than trying to keep some sync between different versions. Developer work is better leveraged. Testing is cheaper. Writing documentation is cheaper, assuming anyone still writes docs for apps. And finally it means you can release for multiple platforms at the same time.

If we maintained the same codebase as WinWord, we’d still be facing the Word 6 debacle. Users also would not have QuickTime integration, any part of Entourage, MERP, Quartz graphics, floating toolbars, the Scrapbook, or any other number of items we’ve implemented over the last 10 years.

With respect to libXml, there are some additional points worth considering:

1) 18 months ago, we hadn’t made the decision of the minimum OS version that Office 12 was going to support.

2) Shipping an updated version of an open source library is a non-starter, for one simple reason: if there’s a security flaw in the version we ship, we’re on the hook for that security issue without having ownership of the code. The potential legal issues are too risky to allow us to go down that route.

3) We need more than just SAX2 compliance. We also need DOM support on the write side.

4) LibXML’s APIs all use UTF8 for string arguments (or, at least, the libXML that shipped with Panther did). MSXML uses UTF16. Since Win Office is written to MSXML, we’d either have to rewrite considerable portions of the code we port from Win Office, or write UTF16 wrappers around the APIs and the callbacks.

All of those combined to make a difficult decision fall on the MSXML side of the fence.
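As a rough illustration of the wrapper option in point 4, the Python sketch below puts a UTF-16 front end on a UTF-8 API that reports results through a callback. Nothing here is libxml or MSXML: `utf8_parse` is a made-up stand-in that simply reports whitespace-separated tokens, and `utf16_parse` is a hypothetical wrapper. It only shows the two conversions such a wrapper has to perform, once on the way in and once on every callback on the way out.

```python
from typing import Callable


def utf8_parse(data: bytes, on_text: Callable[[bytes], None]) -> None:
    """Hypothetical stand-in for a UTF-8 API: reports each
    whitespace-separated token to the callback as UTF-8 bytes."""
    for token in data.split():
        on_text(token)


def utf16_parse(data: bytes, on_text: Callable[[bytes], None]) -> None:
    """Wrapper for callers written against a UTF-16 API: converts the input
    to UTF-8, then converts every callback argument back to UTF-16."""
    utf8_in = data.decode("utf-16-le").encode("utf-8")

    def bridge(token: bytes) -> None:
        # Callback direction: UTF-8 from the library, UTF-16 to the caller.
        on_text(token.decode("utf-8").encode("utf-16-le"))

    utf8_parse(utf8_in, bridge)


collected = []
utf16_parse("héllo wörld".encode("utf-16-le"), collected.append)
tokens = [t.decode("utf-16-le") for t in collected]
```

Every string crossing the boundary is transcoded twice, and every callback needs its own bridge, which hints at why wrapping a large API surface this way is not free.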

Rick, that’s what I always thought the issue was. Office couldn’t use libxml because the source code the MacBU was using was all based off the MSXML APIs and they’re just too different.

It’d be better if this was said flat out instead of saying there are features missing from libxml. Especially since the latter contradicts observed and reported behaviour. Namely, the fact that Leopard, OpenOffice, and other apps will support the “OpenXML” formats when the vast majority use libxml to do it.

Not sure about the security issue problem. Whether you ship your own version of libxml or a port of MSXML, the security issues will be the same. There’s been potential/actual security issues found in both. Furthermore, Apple is in the same boat. They ship more than one version of libxml with OS X and even though it’s third party code, they still stay on top of the issues.

I guess my point is that simply saying, “The MSXML and libxml APIs are just too different to make using libxml feasible,” would be the most rational reason for not going with libxml.

First, there wasn’t a single issue. That’s why I listed four issues that played a role in our decision. And, I’ll still stand by the statement that, at the time we made the decision, the version of LibXML that we could assume to be available on our minimum OS was not up to the task (which is actually what I said–I did not make a blanket statement about deficiencies in LibXML in general).

The difference WRT the security issue is that we own the source to MSXML. When, not if, a security issue is discovered, we can fix the issue and release a patch without having to worry about any encumberance due to various intellectual property rights (which will vary according to the specific licenses involved).

The point regarding security issues is not to claim that MSXML is more, or less for that matter, secure than LibXML. Rather, the point is the extent to which we control the ability to get a patch to our customers in a timely manner. Clearly, that’s far easier when we own the code.

It’s important to keep in mind that, unlike Apple, shipping open source components as part of our software isn’t a part of our business. So, while Apple can afford to keep a paid staff of developers whose work is dedicated to open source changes and who do not work on any of Apple’s proprietary code, we can’t afford to do that. It’s not as though anyone here can wade into some open source code, fix a problem, and we can still sleep at night knowing we have an ironclad defense should someone raise the spectre of some violation of their intellectual property rights.

By the way, simply saying, “The MSXML and libxml APIs are just too different to make using libxml feasible,” isn’t the whole truth. There were costs associated with porting MSXML, and it might well have been easier to simply write a wrapper around LibXML that did the UTF16 to/from UTF8 conversions. Of course, that’s a little hard to tell without the use of a time machine, but I’m not sure that issue alone would have been sufficient to get me to spend the time porting MSXML to the Mac.

Thanks, Rick, for chiming in on why we chose MSXML over libxml. Having not been involved in the scoping of the XML work at the time, I was not aware of the complete set of issues involved. My apologies to Rosnya and others if my comments above appeared misleading. As I’ve blogged about elsewhere, I spent most of Summer 2005 to Summer 2006 heavily immersed in the Xcode and Intel parts of the project.

As someone who does use NeoOffice over MS Office I can say quite happily that I have no intention of switching to Linux any time soon.

I may have this wrong, but it was my understanding that parts of the Windows networking stack are built on open-source code. IANAL, but if I understand it right, part of the beauty of the MIT license is that you aren’t going to have any property rights blow-back to worry about if you go ahead and use it in your proprietary software, so it’s disingenuous to say you can’t sleep at night for fear you’re going to get sued for patching libxml.

Realize I could be wrong on both of my above points. Frankly I’d consider using office again if it wasn’t the only mac app I’ve ever used besides Quark that can’t handle network shares. You know what I’m talking about. I’m much more worried about Office saving files to any format on a remote drive vs. what files it can open from the Win side of the world.

I believe that Mac Office 2007 will be better than the Windows one, as already happened with the 2004 release. You did a very good job. Many colleagues of mine bought a Mac also for Mac Office 2004.

I have two questions:
1)what can be done with Real-Basic to fill the gap left by VB?
2) Apple-script seems to be very powerful, since it would also provide full access to other OSX resources, which have been unavailable with VB. Then, you are going to leave the effort of “low-level programming” to apple-scripts, because of all the tech reasons explained in your post.

My second question is: are there chances to see the same script-code being interpreted by Apple script in OSX and by VB in Windows?

Mac Office is a very complex piece of software as far as I can see from the features offered. Developing it while the lead of the project (MacOffice) depends on a more important project (Office 2007) not using the same technologies is not an easy one.

Now if you add the fact that the new Office file formats are standardized by ECMA and possibly ISO, you can expect both projects to be slowed down.

I wish MS didn’t choose that .net path, it sounds like MS is gradually losing all its x-platform knowledge (now concentrated only in the MacBU).

I just wish that Windows and Mac could work together a bit so that there were fewer differences and thus fewer converters required. Could people like you put some pressure on them? (I think it has to come from industry, not individual consumers like myself.) Maybe we need to start a huge petition. I know that they are competitors, but competitors in lots of other industries have worked together to move from competing standards to common standards.

After reading all of these well articulated and insightful answers from MBU employees, I almost forgot I was reading about a Microsoft project… until I read this:

“The difference WRT the security issue is that we own the source to MSXML. When, not if, a security issue is discovered, we can fix the issue and release a patch without having to worry about any encumberance [sic] due to various intellectual property rights [related to libxml] (which will vary according to the specific licenses involved).”

Well, here it is April 2007. NeoOffice can open and save in .docx format. My fully updated Microsoft Office 2004 for the Mac cannot. Now what pathetic song and dance will be put forth to explain this piece of idiocy?

I’m sure you’re sick of hearing this, but any update on the converter release? Also, I’ve asked on the Office-for-Mac blog more than once about whether or not the converters will be available for Office X, and no one has replied.