The main text of this blog post is a letter I have sent to Tony Hey of Microsoft, asking him to use his influence to get specifications for older obsolete file formats published on Microsoft’s Open Specifications Page. If you support this, please leave me a comment below endorsing the letter (note, the spam filter may delete or refer for moderation any comments containing URLs).

Dear Tony,

Open Letter on specifications for obsolete file formats

I am writing to you, as the most senior person I know in Microsoft, to ask you to use your influence to ensure Microsoft adds specifications for older Office (and other) file formats to the Microsoft Open Specifications page. I have put this Open Letter on my blog (https://unsustainableideas.wordpress.com/), and (if you agree) would like to put any reply from you on that blog as well. I will also solicit further support for this letter on that blog, in the form of comments of endorsement.

Microsoft’s Open Specifications page and the accompanying Open Specification Promise were both very welcome developments, for which Microsoft is rightly applauded. However, the Specifications only go back to Office 97-2003 formats. I have some MS Word and MS Excel documents from earlier versions of Office that seem to open well in more modern software, so perhaps their file formats are compatible, at least to some extent. However, PowerPoint 4.0 files do not open at all in modern MS Office applications, and the file format is understood to be very different.

I have been attempting to convert some 50 or so PowerPoint 4.0 files to more modern formats (to migrate them, in digital preservation parlance), and have documented the process in a series of posts on my blog. The post at https://unsustainableideas.wordpress.com/2012/10/02/powerpoint-4-0-story-so-far/ sums up the exercise, and there is one further post about a small company that has succeeded in converting some files for me. At present there appear to be only two routes for migration: one relies on technology preservation (or emulation) in the form of systems that can (and are licensed to) run a sufficiently early version of MS PowerPoint, and the second is via this small company, Zamzar. Neither of these solutions can be relied on for the long term.

While my main focus in this letter is on older formats within the basic Office set, the specifications for related software such as Microsoft Works and early versions of Microsoft Access would also be helpful for preservation purposes.

You might ask: why should Microsoft put effort today into making these specifications available? I believe Microsoft’s software tools are not merely temporary mechanisms for profit in the marketplace, but (by dint of their flexibility and success) tools that the wider world has used to create billions of cultural artefacts that may be of lasting value. By declining to help make these obsolete file formats accessible, Microsoft is locking up this cultural content, and will eventually throw away the key.

Andrew Jackson of the British Library (who helped me with my initial attempts to convert my PowerPoint 4.0 files) has studied the population of older file formats in a dataset of 2.5B web resources from the UK Web Archive. He found that PowerPoint 4.0 has been persisting on the UK web until fairly recently. For ALL PowerPoint files with identifiable versions created from 1996 to 2010, PowerPoint 4.0 and PowerPoint 95 represent around 2.5%, and for PowerPoints created up to 2002 the proportion of the older formats was 27%. We can be confident that many, many more such resources will exist in private file systems.

Why should Microsoft act now? First, because the number of people within Microsoft who understand these formats must be declining. Second, the specifications themselves (to the extent that they exist as simple documents) must also be at risk of loss through accident or some grand tidy-up process that discounts older material as irrelevant. Third, because many of the early adopters who used these products in the 1990s are, like me, coming up to or past retirement. I believe there will be an increasing swell of documents from some of these people flowing into archives for preservation over the next several years. Many of these will be documents from people of much greater cultural and scientific importance than me, but who have less time and/or ability to pursue possible solutions to an obsolescence problem. Fourth, I think this is consistent with the direction you have helped Microsoft to take since joining them.

I’m also motivated by another factor: Jason Scott’s call for action to “Solve the File Format Problem” scheduled for this November (original post here and wiki page here http://www.archiveteam.org/index.php?title=Just_Solve_the_Problem_2012). Jason is a member of the Archive Team of “rogue archivists”, who attempt to save disappearing web sites, and is seeking a crowd-sourced solution to the lack of information on obsolete file formats. It would be wonderful if Microsoft could add to that information by making these specifications available in November.

What would this cost Microsoft? On the face of it, simply the staff effort to gather the relevant specifications and make them available. Of course, the documents may not exist as well-written specifications, in which case I would urge Microsoft to make as much information available as possible, allowing others to make sense of them against the “ground truth” of existing files. It would be wonderful if Microsoft could make available a migration tool, but this would obviously be a larger effort with longer term implications. Indeed, in the long run it might be more cost effective to support an open migration tool.

The benefit to Microsoft in doing this would be in enhancing its reputation as a responsible company that understands and acts on the implications of its past work.

Possible outcomes could include input filters for open software such as OpenOffice or Libre Office, input converters for SlideShare and others, and possible Microsoft or commercial 3rd party migration tools.

The societal benefits of this would include better preservation of a subset of cultural artefacts and a better understanding of the content of presentations in early days, which may document discoveries or encapsulate persuasion arguments for significant change programs. Ultimately, this is about a richer cultural heritage. My own presentations in PowerPoint 4.0 date from the time when I was Director of the JISC Electronic Libraries Programme, and document how we sought to persuade the community to go forward with that campaign, and some of the adjustments that were made to it.

103 Responses to “Open letter to Microsoft on specs for obsolete file formats”

Chris, we all know that this would be an important step that would facilitate digital preservationists in doing their jobs. And it is particularly timely, given that our community has recently lost our major advocate within Microsoft–Lee Dirks.

This is a great idea and I would be very impressed if Microsoft were to release their standards documentation.

I am, however, a little concerned about your statement that: “At present there appear to be only two routes for migration: one relies on technology preservation (or emulation) in the form of systems that can (and are licensed to) run a sufficiently early version of MS PowerPoint, and the second is via this small company, Zamzar. Neither of these solutions can be relied on for the long term.”
(emphasis added)

Emulation can be viable over the long term and therefore migration by emulation can be viable over the long term (e.g. this). Furthermore I fail to see why you believe that migration is likely to be any more viable. I can give countless examples of the use of emulation right now (such as for mobile phone software development), and examples of the use of emulation going back decades.

Nevertheless, to be able to have a viable long-term emulation solution we will need access to the software of yesteryear. As such I would love to see your open letter extended to include a request for access to old Microsoft software. It would not have to be without cost and perhaps could include a custom license for use only by memory institutions and/or with other restrictions.

Again, I don’t want to detract from the excellent request but rather to take the opportunity to add to it with the hope of adding even more value.

I wholeheartedly support this idea. File format specifications are a significant tool in our technological obsolescence armoury. They should be preserved wherever possible alongside information objects to ensure we can access content far into the future, regardless of when it was created.

I would like to suggest that you consider less far-reaching alternatives. It might very well be impossible for Microsoft to do what you ask, for instance because there is no authoritative file spec, or specs have been changed in a disorderly fashion, or there are anomalies in the specs that would not be understood by the world nowadays (and ridiculed). How to make this a safe journey for Microsoft?

What I can think of:
– Microsoft starts giving support to specific migration problems
– specs are given piecemeal, related to specific migration problems, on request, and after signing a confidentiality agreement
– Microsoft opens a migration service with limited responsibility (but an estimate of the success of the migration)
– ?

I agree that efforts to open up outdated file formats would greatly facilitate the preservation, conversion and re-use of older scholarly materials (e.g. didactic materials, conference presentations). Chris makes an especially valuable point about the potential loss of knowledge residing in the heads and notebooks of workers who are approaching retirement age.

It is not uncommon for documents in these older MS formats to enter university archives and institutional repositories. Publication of their specifications would help those of us charged with preserving them to better ensure their continued viability through forward migration or emulation.

Caroline Martin (The University of Manchester Library), 23 October, 2012 at 10:57

Well done for producing this open letter. At the University of Manchester Library we have started planning for the preservation of legacy digital material so the problems described in the letter are no doubt something that we will face ourselves sooner or later.

The ability to convert older formats such as this transcends the purely scholastic interest as Powerpoint has often been favoured for producing diagrams and even presentations used in evidential and also longitudinal study matters, and the ability to recall these into the future with full authentication is essential.
I therefore wholeheartedly endorse this letter.

I hadn’t realized Microsoft has documented as much as it has till I started looking around its Open Specifications pages. Expanding their scope as you suggest would certainly be beneficial, and I’d like to mention another benefit to Microsoft: It would let other people do their work for them. Microsoft has little interest in spending money to support formats from the nineties, but if other people take up the slack in open source they will add to the long-term value of the formats, and thus give people more confidence in the long-term viability of their current formats.

Microsoft alludes to patent licensing without getting into specifics. It would give even more confidence if we could be confident that open-source implementations wouldn’t have the threat of patent lawsuits hanging over their heads.

You can bet that in this case the code *is* the documentation*. So you’re asking MS to release the source for old versions of Office. I can’t even keep a straight face thinking that.

[* Really. Even now the MS Office document format is “what the software decides it is”. That’s why the Mac version never quite displayed documents the same way as the Windows version. There was no proper documentation. And MS will have no interest in writing it now. MS have however promised that the next version of Office will finally conform to the ISO standard they bought a few years ago. I’ll believe it when I see it. ]

This is an excellent idea and I fully support Chris’s letter and his aims. It’s a great initiative. If Microsoft were to release software specifications it could really be a significant contribution to the issues the digital preservation community have with preserving content in files created by obsolete software applications.

After reading this thought-provoking blog I immediately tried opening my oldest PowerPoint files. Luckily, they all were readable, but each took longer than anticipated to load which made me wonder if in a short time my files will be in the same situation. With that in mind, my support is provided to this dialogue.

Yes, it is critical that specifications for older formats be available. Like many long-term preservation repositories, the Florida Digital Archive relies on file format specifications for risk analysis, format analysis, and preservation planning.

Providing access to these older format specifications is simple, straightforward, and would be a tremendous boon to the preservation community. I sincerely hope Microsoft listens to Chris on this one and does the right thing.

Hi – as a former Microsoftie who worked in Office marketing in the late 90’s, I fear MetalSamurai is correct & that this is neither simple nor straightforward, as Jerome has described. The letter’s assertion that the number of people who might be familiar with these formats at Microsoft is declining is an understatement. It might be declining from 2 to 1 or 1 to none at this point. Euan’s answer suggests the only reliable – and already available – method for this preservation, which is emulation. In fact, it would probably be easier for Microsoft to set up a hosted instance of an older version of Windows & Office just for converting old files than it would be for them to reverse-engineer their own standard. I highly doubt there’s an old spec doc sitting around in their files. If there is, it’s full of errors. Not trying to be a spoilsport, just trying to suggest a solution that will get what you want – access to your files – reliably and quickly.

If emulation is the only practical option for accessing these files, then Microsoft should relax the licenses on its older products to allow installation and use specifically for digital preservation purposes. These older products would not compete with current ones, and preservationists would have relief from the single biggest non-technical problem with emulation — software licenses.

Hi Mark – I agree that Microsoft could do a lot to relax licensing standards, but there are plenty of old Office licenses and media available for ~$10 per. I’m going to purchase some older versions (Off95/Win95 or Off2000,Win2k) to support getting this set up in emulation. Even without some sort of hosted emulation, I believe most digital archive organizations could set up a legacy OS and file conversion PC for < $200. The harder part would be to get the various hardware components set up properly, particularly networking equipment or floppy drives.

I agree this sounds like a great idea. Lots and lots of OER is in the form of slides. Open formats will make innovations with that content possible.
Kathi Fletcher – Shuttleworth Foundation Fellow for an OER Roadmap.

This is an extremely important initiative; in my digital archiving class at the University of Texas School of Information we often work with Microsoft formats and have likewise struggled with the dramatic late 1990’s crevasse between versions of PowerPoint. Opening old specifications would be ideal for digital archives, but there is probably a deeper problem alluded to in some posts. Does Microsoft have an archives of its own where the old version control archives are kept? Might it consider creating one?

This is critical, and applauded. For data sharing, the ability to read and convert excel and access data is particularly important. Thanks for pushing this issue. Tony Hey is affiliated with DataONE, and so I will also approach him through that mechanism to push for support. Thanks for doing this!

This initiative would truly be groundbreaking and of immense value to science as we are rapidly moving into an era of open science, digital preservation, “big data”, and data-intensive science. I fully support the approach and hope that Microsoft will lead the way. Success would greatly enable science and is widely appreciated by those of us that have had to perform format conversions over time.

This is a really important issue; companies such as Microsoft and experts such as repository managers need to remember that file formats change over time and if we don’t have a way to make the files useable in newer versions of software or operating systems, then, as Mr. Rusbridge says, the cultural and scholarly record will be lost.

Long-term access to digital materials — we need it now and we will need it then. You know, in the future.

Microsoft can help set a standard of practice for others to emulate. However, as has been pointed out above, specifications may be missing, and knowledge already lost. If this is the case, then perhaps these formats need to be nominated for a rescue mission. It’s clear that the content they encode is at risk of permanent loss.

I strongly support this idea, and truly hope to see a positive response from Microsoft. Access to specifications of older formats is crucial to digital preservation and surely that is in Microsoft’s interests as well. However, I want to second Euan’s comment on emulation as a path of digital preservation. To achieve that, access to software, not only specifications, will be necessary. In some cases a working emulation platform may be the only way for a file rescue mission to overcome the holes in specifications.

Having spent a good part of my professional career dealing with data curation and data conversion issues, I am strongly of the opinion that having legacy formats documented in an open and accessible manner can only be a good thing and I am happy to support Chris’s open letter to Microsoft

Chris – I’m interested in figuring out a way to help, but am confused by many of the comments on this thread (maybe people aren’t reading the other comments?). Is the goal of having the specification for the sake of having the specification itself, or is it to reveal the content of potentially unreadable files? If the former, that may be searching for the nonexistent. If the latter, then you do not need an explicit specification document in order to do that. You just need working software that implements the specification.

Happy to support the essence of this too. As a de facto format for documents, a range of MS formats are going to dominate the historical record. Anything Microsoft can share in terms of documentation and information about them would be a boon for preservation.

The existence of open specifications allows for the development of open-source tools to read older files and to convert them to standardized, preservation-friendly formats such as Open Document Format or PDF/A (or others that might appear down the road). Without the specs, such conversions are based on reverse-engineering the format and are often unreliable. As others point out, open specs also support other preservation strategies such as emulation.

I would like to add my support to this initiative. I fully support Chris’ request and look forward to seeing a response from Microsoft to it.

Like many other memory institutions, the National Library of Australia has a load of files in legacy MS formats. Just a quick search in our small testing sample of files returned several PowerPoint 4 files which can’t be opened with the current PowerPoint version. Any initiative which would help to solve this problem is very welcome.

Jeff: Yes, the ultimate goal is (not only) to reveal the content, but more importantly to save it in a newer, working version. If you can have access to software which can do it, then you’re saved. But what if you don’t? How much trouble (and expense) do you have to go to in order to get there? And is this a long-term viable solution? I guess having the specifications available would give everyone greater confidence that such a solution can be developed, not only now but also anytime in the future. Having said that, I perfectly understand that it may not be viable either, but if it is, it would definitely be very much appreciated (as you can see from the comments).

I strongly support this initiative. It’s time to address long-term sustainability issues for born-digital materials, and to articulate problems and challenges related to vendor lock-in, software obsolescence, broken promises of backwards compatibility, and proprietary/open standards. Let’s start with Microsoft, whose Office software has dominated the modern office for the last two decades. Good luck, Chris!

If Microsoft were to do this, it would be a great help to those of us working to preserve our digital heritage. Strongly support this initiative and hope Microsoft see the benefit in responding positively.

I FULLY support this. There are probably BILLIONS of documents worldwide in older MS file formats. It would be greatly irresponsible of Microsoft to end their support for them. We are talking about the possible loss of major portions of our historical and cultural record.

I wholeheartedly agree with Chris’ statement that “Microsoft’s software tools are not merely temporary mechanisms for profit in the marketplace, but (by dint of their flexibility and success) tools that the wider world has used to create billions of cultural artefacts that may be of lasting value. By declining to help make these obsolete file formats accessible, Microsoft is locking up this cultural content, and will eventually throw away the key.”

I urge Microsoft to release whatever specification documents may still exist under a public license. In the likelihood that much of this documentation is no longer accessible (only underlining the need for improved systems to ensure digital longevity) I’d also urge Microsoft to establish a policy for issuing public licenses for its library of legacy applications and operating systems (wherever MS decides to draw this line, e.g. 10 years after release?) to allow for legal implementation of emulation strategies.

Very much in support of this. I hope Microsoft sets an example that other companies (ahem, Apple) might follow. Not only is this important for cultural heritage and memory institutions, but perhaps even more so for corporate assets in legacy formats that have business or legal reasons for preservation.

I am very much in support of this. Microsoft would make a great contribution to digital preservation, cultural heritage, and a whole variety of scholarly disciplines. It’s in their interest too, surely?

I wholeheartedly support this initiative and endorse your request. It would be so useful to users and in addition to advancing preservation and digital scholarship, it would be a great PR move on Microsoft’s part.

Thanks, Chris, for initiating this outreach to Microsoft. As you have identified, open access to the older Office specifications would greatly facilitate efforts at maintaining the long-term viability and usability of many important digital assets.

I endorse this letter to Microsoft as a representative of nestor (German competence network for digital preservation).

It is important to stress that Microsoft could benefit with little effort from a policy that makes outdated file format specifications available. Microsoft would gain trust amongst the users of its file formats if migration and preservation become easier.
