Posted by CmdrTaco on Saturday June 09, 2001 @12:50PM
from the now-who'da-thunk-of-cooperating dept.

An anonymous reader writes: "One important aspect of Free software is open collaboration and the pooling of efforts. There are several open source word processors available and they all need to import and export the ubiquitous MS Word format. To try and avoid duplicating efforts, developers from the Abiword, wvWare and Kword projects have been talking with regard to pooling their efforts in writing filters."

I don't understand why the makers of Office-like applications haven't done what the CAD business did. They created the OpenDWG [opendwg.org] alliance in order to reverse-engineer Autodesk's proprietary .dwg format for storing CAD drawings, and succeeded with the task. Maybe an OpenDOC (no pun intended, Apple) alliance would speed up the acceptance and usability of open alternatives to MS Office.

Most commonly used to parse (unambiguous computer) languages, but a Word file is a lot less complicated than a language, I can assure you :)

Actually, I would estimate the complexity of the Word format as greater than that of the English language (even including all the variants). It's the most incomprehensibly complicated, poorly documented (and frequently misdocumented) train wreck of a file format in the history of this universe, and no one could ever hope to make even hypothetical conjectures about its actual implementation. It is a manifestation of evil; there is no other explanation. (By the way, I have code in wvWare.)

While it was originally a LaTeX front end, LyX is now pretty much full-featured. It still prints by exporting X, but can output another couple of formats as well. It also imports almost all LaTeX (I don't think there are any known problems). There isn't even a vague interest among developers to import Word, though . . .

That is, when the different word processors from the different desktop environments save, they should save to the same file format.

Another advantage of this approach would be that all the word processors could use the same (or a very similar) filter to and from other non-*nix software. Sounds like a good idea to me.

BTW, doesn't Open Office use a compressed XML file to store its documents? I thought I read something about Word2K using XML as well, but I could be wrong there. All that being said to say this: Isn't one of the promises of XML supposed to be improved sharing of data? This could be a good use...

Am I the only Slashdotter here who knows that KWord has been using XML as its native format since the beginning? Honestly, you can try this yourself.

1. Create a file.kwd in KWord. Make it complex and add pictures and stuff.

2. Rename it to file.tgz

3. Uncompress and untar it and voila, you have an XML document and a bunch of picture files etc...
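
Step 3 can also be done from a script. What follows is only a minimal sketch in Python, assuming (as described above) that the .kwd file really is a gzip-compressed tar archive. The helper name is made up for illustration, and the member names inside a real KWord file may well differ.

```python
import tarfile


def list_archive_members(path):
    """Return the names of the files packed inside a gzip-compressed tar archive."""
    # Mode "r:gz" makes tarfile raise an error if the file is not gzip + tar,
    # so a wrong guess about the format fails loudly instead of silently.
    with tarfile.open(path, mode="r:gz") as archive:
        return archive.getnames()
```

Calling `list_archive_members("file.kwd")` on a file saved by KWord should then list the XML document and the picture files, if the description above holds for your version.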

The rest of KOffice works this way. Negotiations are still on to get all the Free office suites on Linux to unite on a single file format. I like the KOffice scheme because it inherently produces small files (already compressed). Others have favorites.

As for filters, I think we should have a separate program for importing the dreaded *.doc files and have all the office suites call this program for that task. Why should they all waste time redoing the same function that we would prefer not be needed at all? (I.e., if only MS Word were not so cumbersome and convoluted in its document formats.)

IMHO, antiword [demon.nl] is by far the best Word-to-ASCII converter out there. It even renders footnotes, can be used in pipes and is much faster than wvWare. The program is GPL and is available for a variety of OS platforms. As the moderator of a mailing list, I regularly use it to convert *.doc attachments. (One should patch majordomo so that it automatically filters *.doc attachments through antiword.) It has worked flawlessly for me for more than a year.
It surprises me quite a lot that such a superb program is so little known in the Free Software community.

By all means use a compressed save format, but don't just gzip XML, a tar file or some other standard format. Every application should have its own unique file format header that's easily parsable by file(1). Otherwise, we're headed down the Windows road, where the only way to identify a file is by its extension, and that's somewhere I really don't want to go. I'd be quite happy if gnumeric/kword used a header to say "the following block in this file is N bytes long and is a zlib compressed XML representation of the data". But just using gzip plain sucks.
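
The header idea can be sketched in a few lines. This is only an illustration in Python, not any application's real format: the four-byte signature `XDOC`, the big-endian length field and both function names are assumptions invented for the example.

```python
import struct
import zlib

MAGIC = b"XDOC"  # hypothetical 4-byte signature, not a real file format


def pack_document(xml_text):
    """Wrap XML as: magic bytes, 4-byte big-endian payload length, zlib payload."""
    payload = zlib.compress(xml_text.encode("utf-8"))
    return MAGIC + struct.pack(">I", len(payload)) + payload


def unpack_document(blob):
    """Check the magic bytes, read the length field, and decompress the XML."""
    if blob[:4] != MAGIC:
        raise ValueError("not an XDOC file")
    (length,) = struct.unpack(">I", blob[4:8])
    payload = blob[8:8 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return zlib.decompress(payload).decode("utf-8")
```

Because the signature sits at a fixed offset at the very start of the file, a tool that identifies files by their leading bytes can tell the format apart from a plain gzip stream without looking at the extension.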

Whoever sends or points at a .doc file more than likely has the capability to create a more useful version of the document. Simply state that the document should be published in a file format that is actually standardized.

This is all well and good (and I used to do it myself at school, where I am more or less justified in demanding that stuff be available to Unix users). However, using Linux at work is very different. If everyone you work with is using some MS crap, the only way you can reasonably expect people to let you keep using Linux is if you don't cause any hassle for anybody else. This means converting the docs yourself, for example using your own copy of MS Word (which was installed on the partition of your disk that you shrank to make room for a Linux or BSD system :).

I was in that position last summer, and I had to use Wine (for Lotus Notes, which is actually an interesting program), and boot into windoze every now and then to use excel. Lotus Notes mostly works with wine, and has an excel and word viewer, so that saved some rebooting. Converters like antiword are also useful. I sent stuff to other people in HTML format or just ASCII email, since the stuff I had to write was only stuff like short reports on technical stuff. I would have pulled out LaTeX and made a PDF if necessary.

It is also deep structure. AbiWord, for example, also uses XML to mark up its content, but the resulting file resembles HTML much more than it does "data marked up with meaning", which is the essence of XML/SGML.

The problem that needs solving is not how to replicate the type of replacement but how to alter the interface of word processors so that they let you mark structures such as chapters, subchapters, and so forth. None of them offers this at all.

The current free offerings perpetuate the visual only representation of data.

While it's true that with some farting around you can write additional XSL to transform simple markup into something else, in practice this is hard: you have to make assumptions like when you see this means insert another etc. It always breaks.

You may as well stick to html with embedded css if you don't allow external dtds.

XML is just _not_ "fancy". It is a step in the direction of machine-readable "meaning".

HTML is a visual markup language concentrating on the concept of bigger and larger fonts, laying out the page etc.

XML does not inherently define layout, but it does allow at least one further tool to make layout happen. Typically this is XSLT (there are others), and the output will probably be HTML but could equally be more XML, RTF, PDF, ASCII, CSV, SQL, etc.

You *use* the processing lang you already know to manipulate the xml data. Thus all the biggies have or are soon to get the tools needed to use xml.

Thus you transform xml using an xslt processor itself written in java, python or c.
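
To make that concrete, here is a small sketch that uses plain Python (the standard `xml.etree.ElementTree` module) rather than an XSLT processor. The element names `document`, `chapter` and `para` and the function name are invented for illustration; they are not taken from any real word processor's format.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured markup: the tags say what the text *is*,
# not how it should look.
SOURCE = """<document>
  <chapter title="Filters">
    <para>Word import is hard.</para>
    <para>Sharing code helps.</para>
  </chapter>
</document>"""


def to_html(xml_text):
    """Turn <chapter title=...>/<para> markup into <h1>/<p> HTML."""
    root = ET.fromstring(xml_text)
    body = ET.Element("body")
    for chapter in root.iter("chapter"):
        heading = ET.SubElement(body, "h1")
        heading.text = chapter.get("title", "")
        for para in chapter.iter("para"):
            p = ET.SubElement(body, "p")
            p.text = para.text
    return ET.tostring(body, encoding="unicode")
```

The layout decision (a chapter title becomes an `<h1>`) lives entirely in the transforming code, so the same source could just as well be turned into some other output format by a different function.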

XML's implementation often proves the point that standards are often best written slightly after the fact, but it really is usable now in a way that seemed very distant only a few years ago when the hype was *really* crazy.

They were talking about setting up an email mailing list, where they could talk about problems. They said that they could not do a library, because they use C or C++ and different technologies. They did mention that they would run into the same problems, and that they would discuss them on the list.

So there would be 3 different efforts still, but they would share knowledge with each other.

So what will they do when MS.net is up and open and people are using that?

Well, that's just a perfect example of how the so-called "lameness filter", while frustrating good users who want to post brief comments, lets crappy posts through like a sieve.

How can the Slashdot editors criticize web-porn filters and Napster filters for blocking the wrong people when they do it themselves?

Dump the braindead heuristics. If you really want to curb AC abuse, make it so that AC posts don't appear on the main page until a logged-in user "adopts" the post and any karmic moderation that gets done to it.

I can't believe that people still care about what kind of language they need to code in.

C/C++/Java, whatever. Doesn't it make sense to make something like a document transformer into a small CORBA service and talk XML for the result document, so that we don't have the nonsensical language wars? I don't need to know that the document transform was coded in language X. I just want it to work, dammit.

Playing catch-up doesn't help set standards or even acquire market share

You're right. What I'm imagining is similar to what happened when the IBM PC BIOS was reverse-engineered: once we have very good compatibility, we can set the new standard. People (avg office-worker people) are sick of Word's Feature-itis anyway, and wouldn't it be compelling for a company to get to stop paying for Office entirely -- and just use, oh, AbiWord? Fast, efficient, does what you want it to, compatible with the Word everyone uses (95, 97), and free.

Not to mention: I really have doubts about companies wanting to store their documents on Microsoft servers across the internet, which is what MS is apparently planning.

Side note: never thought about this before, but just imagine: with the DMCA, reverse-engineering the IBM BIOS would be illegal, wouldn't it? No PC clones! I have to admit, sometimes I'm only inches from becoming a Libertarian.

Yeah, imagine what a horrible world it would be if everyone used the same format and we could interchange documents without any problems.

Uh, only if we're all using the same very latest version of Word on the same very latest version of Windows, on the same Microsoft-approved Intel-supplied hardware -- and then we get to play a big game of Simon Says -- "Microsoft says: okay everybody, time to upgrade, please enter your credit card number here."

I think what a lot of people fail to realize is that Microsoft has just as much right as anyone else to set standards.

The problem is that their "standards" follow the form of "here's our magic new standard format, it'll sorta do most of what you need, but only if you use it with our software. Don't bother trying to figure out the details of the format, because we'll change it at our whim, every so often, just to make sure that no one else's software will work with it. Even older versions of our own software won't work with the latest format, so everyone in your company will have to upgrade."

Microsoft doesn't have standards, they have proprietary formats. They don't want to promote and use open standards, they want to own the "standards". If they were willing/able to play well with others, they wouldn't be as hated as they are today.

Yeah, imagine what a horrible world it would be if everyone used the same format and we could interchange documents without any problems.

Hey, if this were the case, I would be happy. My point is that you can't even interchange documents among different versions of Office.

Plus, speaking as somebody who has actually had to work on the DOC files, I'd much rather a common standard be due to some merit other than monopoly bullying.

If the unified document format needs to be extended, the desktop environment groups can get together and agree on something so the file format will remain consistent. Good luck getting that from Microsoft.

While it's great to see collaboration done for importing and exporting Word documents, if they really want interoperability, they should agree on a unified document format. That is, when the different word processors from the different desktop environments save, they should save to the same file format.

The reason why Word's DOC format is so important is that it's the de facto standard in the Windows world. I'm hoping we're not looking to make it the standard in the *nix world, too.

So, it just makes sense that all the developers get together and agree on a standard format so whether or not my coworkers and I are using Gnome or KDE or whatever, we don't have to go through yet ANOTHER set of filters.

This simply isn't the case. We've agreed on quite a few things, and we do hope to get OO people involved too. Some things that we've agreed upon:

1) We're going to use C++, and maybe do a C api too
2) We're going to use libole2
3) Should we need an XML parser, we'll use libxml2

There will be very little duplication of effort. Abi and KWord will both use libwv2.so and have their own filters that hook up to libwv, but the majority (98%) will be entirely shared, just as is the case with any shared library usage.

WordPerfect 2000 is its own can of worms, and frankly isn't worth the price. WINE is far too finicky to work reliably on that large an application. Many things flat out won't work for me. Trying to open up a PowerPoint file in the presentation app caused a crash every single time. So did exporting to PDF. On top of this, I've only seen a patch for fixing the install and uninstall problems present on some distributions, nothing else.

What about reverse engineering catdoc [davecentral.com] or Word2X [alcom.co.uk]? I've been able to open Word files without a problem with them, and when I need to save I download the files to my laptop as text to save them under Mickeysoft, otherwise I try to save them with StarOffice (which borks things out every here and there).

The program could use existing code with a tcl or Python shell to get it done, maybe someone should contact the authors of the programs (Word2X, Catdoc) and come up with a collaboration.

OK, this is probably way off topic (well, it is), but I'll put down some of my strong points in my arguments over XML, which are strongly opinionated (as is everyone's). One of the biggest problems I've seen with XML is that many have already created massive content in existing languages, whether it's XML, Python, Perl, or HTML, and many have invested a large amount of money in the already existing languages.

For a company to feasibly make the move from $INSERT_LANGUAGE_HERE to XML, their programmers would have to know it, meaning it would cost them more to pay for their education in it (even though they could learn online; please hear this out) or to hire someone familiar with XML.

Looking at the current scenario, many companies have done well without it, not to say it shouldn't be used, but just to give everyone a reminder on it. It's always going to be an extremely opinionated argument, and points/counterpoints could run on for years. The same arguments go for Java and others: you don't necessarily need them, for one, and just because someone uses X or X becomes a pseudo-standard should not mean that programmers should focus on X and forget the core basics of it all.

UML, XML, HTML, CSS, COOL, JAVA, it all boils down to needs, and XML is not really a necessity, and soon there'll be another acronym touting the same claims as the existing ones, "The Next Best (overhyped) Thing".

Sorry if I sound like a troll I'm trying to be as sincere as possible about my thoughts on it, without sounding anti-anything (XML, or other) just my notes on it. I think the programmers should stick with the basics without getting all fancy.

You're assuming things will move over to XML, and everyone is going to use it. Let us not forget about the standings when it comes to creating a so-called standard: shtml, WML, and all those other acronyms I care not to type.

said that they could not do a library, because they use C or C++ and different technologies

Why not just make a standalone app as a filter. It could accept word documents in, and output an XML formatted document and jpg images for images embedded in word. The XML doc could be an open standard, parseable by all open source word processors.

LaTeX has nothing to do with XML. LaTeX is a document preparation system based on the typesetting engine TeX which was developed in the 80s by Donald Knuth (who else). XML is a much more recent innovation.

XML is not an alternative to Perl or Java. Those are programming languages. XML is a markup meta-language - a set of very simple ground rules for defining markup languages. It is already very useful. I'm writing an app that receives messages from a custom Windows app. Although the Windows programmers and I have hardly anything in common (they don't know what fork means, for example) we were able to agree on an XML message format with no difficulty. And in case you're wondering, none of us really understands DTDs or the finer points of XML. If XML did not exist I'd probably be asking for messages formatted like RFC822 headers (Key: Value) and we'd run into endless problems with newlines, CRLF etc. For decades programmers have been making ad-hoc markup languages and writing cheesy parsers that work 98% of the time. XML, which has exactly five predefined entities, lets us save a lot of energy and use proven standardized parsers. There is very little to know about XML and it's nowhere near as complex as a programming language. If you've made a web page, you've written something close to well-formed XML. The only difference is that in XML every element must be matched by a closing element or contain a trailing slash. So <P> would become <P/>.
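
The closing-tag rule is easy to check with any standard parser. A minimal sketch in Python using the standard library's `xml.etree.ElementTree`; the helper name `is_well_formed` is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# HTML-style unclosed <P>: rejected by an XML parser.
html_style = "<doc><P>some text</doc>"
# Same content with the empty-element form <P/>: accepted.
xml_style = "<doc><P/>some text</doc>"
# Explicit closing tag: also accepted.
closed_style = "<doc><P>some text</P></doc>"
```

Here `is_well_formed(html_style)` is false while the other two are accepted, which is the whole difference described above: the parser needs no knowledge of what `<P>` means, only that it is closed.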

Odds are against you. Corel sells WordPerfect for money; if AbiWord and the rest become viable contenders, who wants to spend the money for WordPerfect? It is arguably in Corel's best interest for all the free word processors to have lousy filters for as long as possible.

How are the WordPerfect filters? If they suck, then Corel could rationally join the filter crew, since good filters would then benefit Corel as much as anyone else. At least in that scenario there is some clear benefit to Corel.

Of course, if the decision is made by a stereotypical boss figure, Corel will mind its own fish and stay out. Why do something new and different? Could be risky. Continuing to do the same thing is always seen as safe.

I like it. This nicely end-runs the problem of library compatibility for C++/C/whatever. And under Linux, at least, firing up a new process is fast, and you only run the import filter when opening a new document, so there would be no issues with speed.

Why should you need it? Word has always been able to read RTF. So if you write a document, export it as RTF and send it to a Word-addicted coworker, he should be able to import it into Word with no problems.
The problem is that he will then want to send you back the modified document. If he used full-power Word (e.g. using the change bars to highlight the changes), even if he is willing to convert the doc back to RTF, lots of formatting info will be lost.

Why not use an embeddable very high level language (Perl(?), Python, Ruby)?
A Word filter is something that will need to evolve fast (to keep up with changes in the original format) and will need to be very hackable (to cover the special cases you did not think of). It does not need to be super-fast. All this calls for a VHLL, IMO.

As another WP user from way back (WP 3.5 for the Mac is still my favorite werp of all time, with 6.0b for DOS running a close second) I say, "Hell, yeah."

Combining this with the above thread on XML... maybe what the world really needs, in addition to open source suite projects, is an open source file translation project, with the goal of being able to convert all the common (and some not so common) formats to and from XML? An OpenDataViz kinda thing... Something like this that really worked, and was genuinely open and cross-platform, with people contributing new modules to it for ever more obscure file formats (need to put your dBase files into a ClarisWorks spreadsheet? We can do that) would solve a lot of problems.

If you like micro-controls, try WordPerfect and open up the reveal codes screen. Not close to TeX of course... but you can fine-tune things you can't in Word. And just like TeX, WordPerfect is great for producing long and huge documents. One tends to mess things up in Word (especially itemization).

I don't think WordPerfect is based on a customized version of TeX, but I'd like to know the answer too, if someone knows it. The reveal codes screen sure looks like TeX commands in some ways (e.g., [BOLD]this is bold[Bold] vs { \bold this is bold}). Or rather, WordPerfect is closer to TeX and HTML, and its variants and descendants, than other word processors.

How about also enlisting Corel? Corel already has conversion routines for many formats, but in their new cash-strapped state you have to wonder how much it hurts their bottom line to keep doing all the reverse engineering. Maybe they are wrapped up in NDAs so that they could do nothing for a Free Software project, but if not, perhaps ALL the word processor producers should combine their efforts in creating a libwpfile which converts all participants' formats from/to an independent format AND holds the best reverse-engineered conversion for all formats that don't want to join.

Exporting to TeX is straightforward. Importing TeX is very tough, because TeX is a programming language, not a representation. It's hard to do anything with TeX except run it, which renders output. This loses the document structure. The same is true of PostScript.

In the old days (Windows days) every application seemed to be able to "talk to" and "incorporate with" each other.

Later I found the true way. There came Linux, and after a painful 6 months I was able to do most of my job on the command line. The applications were still cooperating with each other. Oh, yes, there was X too, with ugly but "cooperating" Motif applications.

Then the dark side of the code emerged. We were all bound to projects that do not like each other and all duplicate efforts (see: KDE, GNOME and 80 million media players). I was unable to understand all the *.desktop and *.nautilus horrors.

At last the sun starts to shine again. People start to realize that choices are good (vi/emacs/rhide) but code duplication (KDE/GNOME) and lack of cooperation are not (*.desktop, *.nautilus).

The OpenDWG effort is laudable, but last I checked, the public won't get source to the library. Apart from the library not being available for the platform I use, it's not very sustainable: what if they fold? What if you upgrade and the libraries are no longer compatible with your new OS?

The announcement linked didn't mention XML but I agree with you--this seems like the right thing to do. For almost anything that MS Word formats you could duplicate it exactly using html+css1, and I think this should be a priority. The thing is, this would make an excellent independent project; you don't need the gurus of free office suites to muck around with this. You don't even need to know anything about their particular software at all.

I wonder if the recent propaganda assault by Microsoft is drawing the open source/free software community closer? There have been a spate of these "new cooperation" stories lately. Perhaps differences in philosophy and direction start to seem pretty minor when Microsoft conspicuously brings its ion cannons to bear...

Another point -- are we talking filtering one way or both? I'm thinking the cleanest way to go back is RTF export (which presumably already exists on all platforms) but where can you get an rtf->Word filter (probably to Word 97?)?

If Microsoft had any interest at all in interoperability there'd be a .doc file standard on the shelf next to Adobe's PDF definition. This is like Samba -- Andrew Tridgell wrote the original using a packet-sniffer on a DEC Pathworks server, as I recall. That's reverse-engineering for you.

And what -- sit by and never be able to handle Word documents? Unfortunately, there are still a good number of people who want to see, for example, resumes in Word format. (Even tech HR people sometimes insist on that, though I'm inclined to write them off as clueless...)

It's like being a Mac user or, I don't know, a non-American. Your average Mac can read a PC disk, but it doesn't usually go the other way. Meanwhile, your average USian speaks English and *maybe* Spanish, which means the rest of the world has to learn English to communicate with us. Good, bad, it's the reality -- it's great that Sun eats its own dogfood by using StarOffice internally, but file exchange is pretty important, and MSWord is the number one format to translate.

That's what I mean about "not everywhere it should but everywhere it can". It would be nice to start legacy-free, but it's not practical to replace everything in the world. Scientific programmers still use Fortran, for example; you could argue that C is better just because of the compiler tech or whatnot, but the fact is that it's entrenched.

You are quite right about the Next Best Thing problem, of course. But somewhere in the alphabet soup someone does find something useful. Linux for example -- it was one of a decent-size handful of projects like it, but it had a few features that stuck out: GPL, open development model, etc. It worked. That's where the dotcom bubble came from too -- though many an investment manager can be faulted for losing all trace of common sense and throwing old economy rules out the window prematurely, the basic idea was sound (if hilariously sloppily implemented): if you have no seeds, throw water at dirt and see if anything edible will grow.

The things you mentioned... I still don't get the whole UML thing; it sounds like a bureaucratic construct of roughly the same nature as flowcharts (when was the last time you saw one of those in use?). HTML is a standard and it's not going to die out as long as the web is still in service. CSS... it's a seedling, if we follow the above metaphor. We don't know where it's going to lead (I'm a bit suspicious of it myself because I don't like complicated HTML formatting; some kind of server-side processing might be better). COOL/C#... another early-stage seed(ling?). Microsoft might yet cook up an open standard from it (though I wouldn't bet on it), but I don't think it's going to fully displace Java.

Lots of technologies do get thrown out there. I happen to think XML, while maybe not likely to become entrenched where it was intended, will still wind up being a very popular way of structuring data.

Okay, I was a little naive the way I stated it, *but* it's also reasonable to say that Fortran performance is what it is because it's entrenched. If it isn't necessarily the right tool, that doesn't mean it can't become the right tool. As Fortran did.

This actually makes a lot of sense, especially when file formats are starting to move to XML-based formats (see OpenOffice) -- just translate the Word format to XML (or whatever) as an intermediate format.

Come to think of it, this would make a great project; anyone know what would be needed to write msw2xml(1)? My perl skills are becoming a bit rusty...

Except that XML does seem to be an actual up-and-coming standard. OpenOffice is building their doc format off of it (thus my choice of XML as a hypothetical), Apple's using it to write config files for Darwin, Mozilla's UI is built using an XML variant (or so I've heard)... you get the general idea.

It's a question of where the nucleation sites develop -- WML is out there, but there's no call for it since the Wireless Web is a nonentity (at least in my social circles -- for all I know it might be vastly different in, say, Finland). And we've had browser implementors shoving extensions down our throat, but realistically... when was the last time you saw a tag?

So I think assuming XML is heading in the right direction. It won't show up everywhere it *should* (I remember someone on /. wondering why RFC2821 (new generation SMTP, in case I got the number wrong) wasn't XML-based, which is the Right Thing for the most part but would break every MTA in existence) but I think it will show up everywhere it *can* for the near future.

(I do think XML is a bit skanky, btw, but it's like C -- it's there, it works, and it's a good starting point for future designs.)

It's hard to do anything with TeX except run it, which renders output. This loses the document structure. The same is true of PostScript.

This needn't be true, though. Since you have the TeX source, you should be able to come up with an output method for TeX which will output to, say, an Abiword doc, and simply outputs the information for the document structure along with the text and so on, rather than generating, uh, whatever it generates. Is that PostScript? The same is true of Ghostscript and PostScript. I'm not saying it'd be trivial, or that I could do it, but it's certainly possible.

Also: why do libraries? Unix gives us STDIN and STDOUT for a reason. Just make any filter an executable. If you're filtering a document, and not a stream, you don't lose anything by that methodology, and the added beneficial side effect is that you end up with executables which anyone can use in any project, from a shell script to C++ to Java...
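
From the calling side this takes very little code. A rough sketch in Python, assuming only that the filter is an executable that reads the document on standard input and writes the converted result to standard output; the helper name `run_filter` is invented for the example.

```python
import subprocess


def run_filter(command, data):
    """Feed `data` (bytes) to an external filter on stdin and return its stdout."""
    result = subprocess.run(
        command,                 # e.g. a list such as ["some-filter", "--option"]
        input=data,              # sent to the filter's standard input
        stdout=subprocess.PIPE,  # capture what the filter writes
        check=True,              # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

A word processor written in any language can do the equivalent of `run_filter(["msw2xml"], doc_bytes)`, where `msw2xml` stands for a hypothetical converter and not an existing program. The filter itself can then be written in whatever language its authors prefer.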

Amen. One of the biggest issues I run into in tech support at my company is outside legal offices using ancient (or even not-so-ancient) versions of WordPerfect.

Since my company is standardized on Word, we actually have a couple of secretaries whose sole duty is to convert WP to Word, and then correct all the mistakes from the conversion. WP can't import Word properly, and Word can't import WP worth the dead snail on my porch, so the whole company ends up pissed at both. (Rumor has it that we paid for one of our primary outside counsels to convert to Word because of the sheer volume of documentation involved.)

There are several open source word processors available and they all need to import and export the ubiquitous MS Word format

What about WordPerfect's *.wpd format? Yes, I know -- WordPerfect is available for Linux, and for free. But a lightweight, open source word processor along the lines of AbiWord or KWord would be real nice if it supported wpd files.

Well, if you exclude ANSI, the ISO, and all the other public standards-making bodies from "anyone", maybe.

Think about what you're saying. Do you really want to wait for ANSI or ISO to set a standard before any software can be written? Should WordPerfect have told ANSI that they were about to release a word processor in 1985 that had such and so features, and then let them get back to them on a standard format before they implemented those features?

Or Netscape -- people have criticized them for extending HTML, but how much longer would things have taken if they had waited for some committee to take years to set a standard, rather than going ahead and implementing https?

When it comes to technologies, particularly software technologies, it is often best for standards bodies to be reactive rather than proactive. A good example is Standard C, when the standards bodies formalized existing practice.

Microsoft is often criticized for "embrace and extend", but every company does it. Because you often have to extend a standard in order to implement new features that simply don't work within the existing framework. Again, see Netscape/HTML.

I love AbiWord for reading MS Word documents and writing quick letters, etc. I think it's a great program, and it reads the Microsoft .doc format quite nicely. But one thing that all open source word processors have omitted, including Open Office, is WordPerfect document support! Sure, I can get WordPerfect for Linux [corel.com], but isn't the point of Open Source that you shouldn't need to be tied to a single proprietary piece of software? Isn't that what the freedom is all about?

For one reason or another, I can't get WordPerfect 8, the personal edition available for download, to install on my Linux box. Perhaps it doesn't like Mandrake 8, maybe it's my own ineptitude (I've been running Linux as my primary OS for about 4 months now), but it just won't cooperate. I wouldn't mind purchasing WP Office 2000 for Linux, but if I can't get WP8 to install, that tells me that WP2000 might suffer from the same problems. Given the average return policy of most software stores (i.e., no returns once it's open), I'm extremely hesitant to spend upwards of $100 on software that may or may not work on my machine. But I've been using WordPerfect for over 12 years now, and need WP file support. Right now, the only way I can get it is by booting my Windows partition and using WP2k for Windows.

So developers, if you're listening, Word support is great, but don't forget about those of us who haven't used Microsoft (at least for word processing) for a long time!

WordPerfect, because that's the word processor I like, and it prints well for hard copies.

html, because it's everywhere, and even M$ Word lusers can read it.

When I email my resume to someone, I attach the html version, and if the want ad specified a Word format, then I politely explain to them that I can't provide it as a .doc because I don't have that luser app on my hard drive.

Oh, and I know someone is going to protest by saying that WordPerfect can save to .doc or .rtf, but it really destroys the formatting, which to me is half the battle of getting potential employers to actually do more than glance at a resume. If they see something with the indenting trashed and different font sizes from one page to the next, all they are going to do is toss the resume in the round file.

Didn't know that, thanx for the info. Just checked it out and it looks quite ok (the exported file, when run through latex, looks quite nice :). Never seriously looked at kword before, will do that now. Kudos to them!

Cool, I hope they'll also start supporting importing and exporting to TeX. Maybe then the stuff will start looking professional*. Kidding aside, it's a shame that the word-processing crowd is ignoring the best type-setting system around. WYSIWYG documents just don't cut it compared with a doc prepared in LaTeX.

* professional as in 'professional publisher', not as in 'professional marketeer'

There are compiler-writing tools. There are GUI-building tools. There are class frameworks out there for just about everything. Maybe we need file-format-interpreting meta-tools and some codified domain-specific knowledge for this problem.

This collaboration is a good start, if they concentrate on not only coming up with filters, but discovering HOW to come up with a good filter.
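To make the "meta-tool" idea a bit more concrete, here is a minimal sketch in Python: the header layout is written down as data, and one generic routine interprets it. The field names and layout below are entirely made up for illustration; they are not taken from the Word format or any other real format.

```python
import struct

# Hypothetical header layout, described as data rather than code:
# (field name, struct format code). Not a real file format.
HEADER_SPEC = [
    ("magic", "4s"),        # 4-byte signature
    ("version", "H"),       # unsigned 16-bit
    ("flags", "H"),         # unsigned 16-bit
    ("body_offset", "I"),   # unsigned 32-bit
]

def parse_header(data: bytes, spec, byte_order: str = "<") -> dict:
    """Interpret the start of `data` according to a declarative field spec."""
    fmt = byte_order + "".join(code for _, code in spec)
    size = struct.calcsize(fmt)
    # struct.error is raised if `data` is shorter than the described header
    values = struct.unpack(fmt, data[:size])
    return dict(zip((name for name, _ in spec), values))

# Build a sample header with the same layout and read it back
sample = struct.pack("<4sHHI", b"DEMO", 2, 0, 12) + b"body"
header = parse_header(sample, HEADER_SPEC)
```

The point of the design is that what the projects learn about a format ends up in the spec table, which could be shared between filters even if each project keeps its own interpreter.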

Sure. If things keep going at this rate, Open Office could write a Word importer and exporter and finish it about the time MS is releasing the NEXT version of Office. Playing catch-up doesn't help set standards or even acquire market share.

I didn't know Word sucked at importing WP so much. Luckily I didn't have to import anything from WP to Word on my Mac when I bought it, because the only .WPD files I had were stuff from when I was like 5 years old messing around on the 286. Whenever I have to use a .DOC file on my parents' PC I just open the copy of Word that Compaq was so nice to install for us...

What KWord and the like need are filters for other formats, particularly WordPerfect 6-10. The thing about WordPerfect is that once you get WP6's format working you can open WP7, WP8, WP9 and WP10, because Corel never changes the format, unlike MS.

Without this my dad can't switch to KWord or anything else (I doubt he would want to, though; he likes WP8 too much), because he is an auto teacher and has about 10 years' worth of tests and stuff in WP format, dating back to WP 5.1 on a 286 and DOS 5.1. (I remember that 286. Orange and black monitor. Those were the good old days. :-) And I know WP runs on Linux, but everyone I know hates WP for Linux.

While it would be possible to convert them all to RTF or something, he has hundreds and hundreds of files, so it won't be easy or fast.

What RedHat and others need to focus on is telling the Average Consumer that Windows XP is violating their privacy, among other things. Every few days I tell my dad about Windows XP's evil features (such as the hardware ID stuff), and he considers switching to Macs or Linux more and more. But again, the biggest thing holding him back is the lack of ANY WordPerfect format compatibility (apart from WP itself). The biggest thing keeping me from switching from my Mac and Word is the lack of a good, consistent GUI.

I should stop rambling on and sum my post up: WordPerfect compatibility is important too!

I would like to agree with other posters. A common format between open source Linux WPs would be a big bonus! And it would make writing a Word2LinuxWP filter much easier.

What is the difference between LaTeX and XML? Aren't both specialized subsets of SGML?
Does anyone have any links?

One thing I will give XML: you could specify your WP as conforming to a given DTD version; then, as people add more features to the DTD, you can release a new version of the WP that supports the new DTD features. This would drive the market in a feature-oriented way without breaking much. Plus, if it's in XML you can verify the document is well formed even if you can't edit or display some advanced tag. Of course, the ultimate is that your WP would be a big component manager, and you could 'plug in' new document features when a new DTD is approved.
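A rough sketch of the "well formed even if you don't understand the tag" point, using Python's standard xml.etree.ElementTree. The tag names (document, para, sparkle) are invented for the example, and note that this only checks well-formedness; ElementTree does not validate against a DTD, so that part would need another tool.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if `text` parses as XML, whatever tags it happens to use."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def unknown_tags(text: str, known: set) -> list:
    """List tags in the document that this (hypothetical) WP does not support."""
    root = ET.fromstring(text)
    return sorted({el.tag for el in root.iter() if el.tag not in known})

# Tags an older word processor understands (made-up names)
KNOWN = {"document", "para"}

doc = '<document><para>Hi</para><sparkle level="3">new</sparkle></document>'
broken = "<document><para>Hi</document>"

ok = is_well_formed(doc)            # parses even though <sparkle> is unknown
skipped = unknown_tags(doc, KNOWN)  # tags the old version would have to skip
bad = is_well_formed(broken)        # mismatched tags are rejected
```

An older version could then load the file, warn about the tags it had to skip, and still keep them intact when saving.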

Aside from all of the usual "Microsoft is evil" banter, it is very true that such corporations are not at all concerned with interoperability and compatible formats. Seriously, what standard has Microsoft pushed? And by that I don't mean a de facto standard like "Win9X is installed by many OEMs so it is the standard as far as 'X' is concerned". What contributions has Microsoft made to specifications and standards organizations?

My guess is that, like all dominant entities, they will change when outward circumstances force them to do so (i.e., when it is economically feasible). They control > 90% of the desktop and office software, so don't count on standards or cooperation. Notice, however, that some time back they were pushing Internet chat standards merely because they did not have the market share AOL enjoys.

"what good does .doc format do for _anyone_?" I agree; however, over 90% of the market uses this format. Though not the best, it is the leader, and we must either recognize that or fight it. We can't pretend it does not exist.

I'm really happy to see this type of collaboration. It is only good for projects. I feel that Kword could benefit the most, as Abiword seems to do the .doc "thang" better for me. Glad to hear this is happening, and I hope to see more of this example.

I agree, and it shouldn't be too hard to implement WP filters, either, since Corel seems to have been able to keep the same file format since WordPerfect 6.0.

Incidentally, WordPerfect Office 2000 works fine on my Mandrake 8 box and it's wonderful. WP8 is suffering from some serious bitrot, and it relies on some really old libraries that haven't been shipped with any Linux distro I've used in the last 2 years.

StarOffice 5.1/5.2 will import WP documents, but they don't do any exporting, I'm afraid.

If I had a nickel for every .doc file I have seen that wasn't a Word document, I'd be rich.

A preamble embedded in the file is a better way to go and even better is to use some sort of universally recognized structure like XML or TLV. Either way, file browsers can read the first few bytes of the file and find out what it really is, instead of trying to guess based on the extension.
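For what it's worth, reading the first few bytes is easy to sketch in Python. The signatures below are the commonly cited ones as far as I know (the OLE2 container that Word .doc files normally live in, the WordPerfect prefix, RTF, gzip, XML); the table is illustrative and far from complete, and the function names are my own.

```python
# (leading bytes, description). A small illustrative table, not exhaustive.
MAGIC = [
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. MS Word .doc)"),
    (b"\xffWPC", "WordPerfect document"),
    (b"{\\rtf", "RTF"),
    (b"\x1f\x8b", "gzip"),
    (b"<?xml", "XML"),
]

def sniff(data: bytes) -> str:
    """Guess a format from its leading bytes, ignoring the file extension."""
    for magic, name in MAGIC:
        if data.startswith(magic):
            return name
    return "unknown"

def sniff_file(path: str) -> str:
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff(f.read(16))

# A ".doc" that is really RTF is identified by content, not by name
guess = sniff(b"{\\rtf1\\ansi Hello}")
```

So a file named resume.doc that actually starts with {\rtf gets reported as RTF, which is exactly the case the extension gets wrong.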

As far as compressed files are concerned, 'file' already has a '-z' flag to make it look inside compressed files. This should be expanded to include gzip'd files etc.
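Looking inside a compressed wrapper is the same trick applied twice. A small sketch with Python's standard gzip module, assuming a gzip wrapper only; the helper name is made up, and this is not how 'file' itself is implemented.

```python
import gzip
import io

def peek_inside_gzip(data: bytes, n: int = 16) -> bytes:
    """Decompress just enough of a gzip stream to see the payload's first n bytes."""
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as gz:
        return gz.read(n)

# Wrap an RTF payload in gzip, then look at what is really inside
payload = b"{\\rtf1\\ansi Hello}"
wrapped = gzip.compress(payload)
inner = peek_inside_gzip(wrapped, 5)
```

The bytes that come back can then be checked against whatever signatures you care about, without unpacking the whole file.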

By the way, Microsoft is moving towards embedding information with their CLI Metadata.