Oracle Blog

What's New, What's Cool

Online Document Collaboration

Wiki pages represent a form of online collaboration that is both easy to use and widespread. They therefore provide the perfect mental model for the kind of thing we want to do--except that a Wiki produces HTML pages, rather than structured documents.

But HTML has many limitations:

It is almost totally unstructured. There is no such thing as a "section"--a self-contained entity that can be picked up and reused, because there is no nesting at that level. A "heading" is a bit of text that stands alone. There is nothing that tells a parser how far it extends.

Presentation and content are totally intertwined, so something written in one location may not "fit" somewhere else--even if there were a way to do that. (There isn't. HTML has no notion of transclusions.)

Headings are numbered by level (h1 through h6). So once again, if you tried to reuse material at a different depth, the heading styles are likely to come out all wrong.

There are no conditionals or variable-substitutions you can use to single-source your documents.
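To make the first limitation concrete, here is a minimal sketch in Python of what a tool has to do to find a "section" in flat HTML. It uses the standard library's html.parser. The names HeadingScanner and infer_sections and the sample page are invented for illustration. The rule it applies (a heading extends until the next heading at the same or a higher level) is only a guess, which is the point: nothing in the markup states it.

```python
from html.parser import HTMLParser

HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class HeadingScanner(HTMLParser):
    """Flattens a page into ("heading", level, text) and ("text", None, text) items."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._heading_level = None  # set while we are inside an h1..h6 element

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_LEVELS:
            self._heading_level = HEADING_LEVELS[tag]

    def handle_endtag(self, tag):
        if tag in HEADING_LEVELS:
            self._heading_level = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._heading_level is not None:
            self.items.append(("heading", self._heading_level, text))
        else:
            self.items.append(("text", None, text))


def infer_sections(html):
    """Guess each heading's extent: everything up to the next heading
    at the same or a higher level. The markup itself never says so."""
    scanner = HeadingScanner()
    scanner.feed(html)
    scanner.close()
    items = scanner.items
    sections = []
    for i, (kind, level, text) in enumerate(items):
        if kind != "heading":
            continue
        body = []
        for later_kind, later_level, later_text in items[i + 1:]:
            if later_kind == "heading" and later_level <= level:
                break
            body.append(later_text)
        sections.append((level, text, body))
    return sections


# Hypothetical page, for illustration only.
page = (
    "<h1>Install</h1><p>Overview.</p>"
    "<h2>Solaris</h2><p>Run the script.</p>"
    "<h2>Windows</h2><p>Run the installer.</p>"
)
for level, title, body in infer_sections(page):
    print(level, title, body)
```

In a structured format, the section is an element that contains its title and body, so none of this guessing is needed.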

So okay, we like the idea of using DITA. But how are we going to collaborate on such documents? How are we going to edit them online?

The first thought is to use something that looks like a Wiki. But no matter how interactive the authoring becomes, structured authoring in a format like DITA presents a significantly more complex challenge, for a variety of reasons:

More tags. HTML has 80, DITA has 200, SolBook has 400, and DocBook has 800. The more tags you have, the harder it is for people to deal with them effectively. DITA provides the minimal set for structured documentation, but it's still harder than HTML.

More restrictions. In HTML, you can pretty much put anything anywhere. In a structured document, a new element is only legal in certain, precise locations. Often, you have to see the tags to get the cursor at the right location. That can be frustrating, until you learn the ropes.

More links. In addition to document and image links, a topic-based authoring system has maps that link to topics for assembly, conrefs that link to topic elements for transclusions, and metadata to conditionalize things. So there are many more links and attributes to deal with in a topic-oriented system.

More things to link to. Our 16 HTML install pages, inconsistently styled, heavily interlinked, and highly redundant though they were, were still only 16 files. Their 60 pages divided naturally into 40 topics, totaling 40 pages, with 3 maps to tie them together. But while the topic set is 16% smaller by virtue of eliminated redundancy, it also gives you 2.5 times more places to look for something you want to cross-reference.

The need for downstream verification. There is a downside to having a "build" process for documents. Topics provide a great deal of publication flexibility. But it comes at a cost. The ability to single-source a page for Solaris and Windows, for example, means that examples containing paths and filenames are conditionalized so the user only sees examples that are familiar. But when you add conditional metadata to a document, how can you be sure that every document that uses the topic, with every possible variation of metadata, will produce the right results?

In documentation, there is no automated way to ensure quality. There are no regression tests. Documents can only be validated by inspection. An author will probably build a version to make sure it comes out right. But that doesn't mean the build system has the same settings. If we have a change that only applies to version 7, we might add it and specify the metadata that assigns it to that version. But what if version metadata was not needed, previously? How can we be sure that every production script does the right thing with the metadata we just now found out that we needed?
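To get a feel for the size of that inspection problem, here is a minimal sketch in Python that enumerates every combination of conditional settings for one small topic and shows what a reader would see in each. The topic content, the condition values, and the function names filter_topic and all_variants are all invented for illustration, and the filtering is a simplified stand-in for whatever a real build does. It only lists the variants; someone still has to look at each one.

```python
import copy
import itertools
import xml.etree.ElementTree as ET

# Hypothetical topic: the content and attribute values are for illustration only.
TOPIC = """
<topic id="install">
  <title>Install</title>
  <body>
    <p platform="solaris">Unpack into /opt/java.</p>
    <p platform="windows">Unpack into C:\\Java.</p>
    <p product="v7">Version 7 adds a new option.</p>
    <p>Restart the shell.</p>
  </body>
</topic>
"""

# Assumed set of conditions that a build might select on.
CONDITIONS = {
    "platform": ["solaris", "windows"],
    "product": ["v6", "v7"],
}


def filter_topic(root, settings):
    """Return a copy of the topic without the elements the settings exclude.

    An element is excluded when it carries a conditional attribute whose
    value list does not contain the value selected for this build.
    """
    result = copy.deepcopy(root)
    for parent in list(result.iter()):
        for child in list(parent):
            for attr, wanted in settings.items():
                value = child.get(attr)
                if value is not None and wanted not in value.split():
                    parent.remove(child)
                    break
    return result


def all_variants(root, conditions):
    """Yield (settings, visible paragraph texts) for every combination of values."""
    names = sorted(conditions)
    for combo in itertools.product(*(conditions[name] for name in names)):
        settings = dict(zip(names, combo))
        filtered = filter_topic(root, settings)
        yield settings, [p.text for p in filtered.iter("p")]


root = ET.fromstring(TOPIC.strip())
variants = list(all_variants(root, CONDITIONS))
for settings, paragraphs in variants:
    print(settings, paragraphs)
print(len(variants), "variants to inspect for one topic")
```

Two conditions with two values each already give four variants of a single topic. Adding version metadata that "was not needed, previously" multiplies that count for every document that uses the topic.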

So there's the rub: you have a lot more flexibility at the back end of the system, and you have single-sourced docs so you never have to change anything in more than one place. But there is definite complexity to manage in order to achieve those benefits.

The additional complexity requires the tools to be smarter--and you are forced to use such tools to deal with the complexity. The tools have to know how to manage links. They have to know how to color-code conditional views and how to restrict the view of the document to the version you want to edit. They have to know enough to include content references in what you see, without letting you edit them, but give you a way to get to the source of that content so you can make changes. They have to manage dependencies so they produce all the outputs that depend on a given topic, whenever it changes.
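The last of those requirements, dependency management, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the file names, the USES table, and the function names are hypothetical, and a real CMS would extract the relationships from maps and content references rather than from a hand-written dictionary.

```python
from collections import defaultdict, deque

# Hypothetical dependency data: each key uses (maps, includes, or
# content-references) the items in its list.
USES = {
    "install-guide.pdf": ["install.ditamap"],
    "install-help.html": ["install.ditamap"],
    "install.ditamap": ["solaris-install.dita", "windows-install.dita"],
    "solaris-install.dita": ["common-warnings.dita"],
    "windows-install.dita": ["common-warnings.dita"],
    "release-notes.html": ["release-notes.dita"],
}
OUTPUTS = {"install-guide.pdf", "install-help.html", "release-notes.html"}


def build_reverse_index(uses):
    """Invert the 'uses' table so each item lists the things that use it."""
    used_by = defaultdict(set)
    for user, targets in uses.items():
        for target in targets:
            used_by[target].add(user)
    return used_by


def outputs_to_rebuild(changed, uses, outputs):
    """Walk upward from the changed file and collect every affected output."""
    used_by = build_reverse_index(uses)
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for user in used_by.get(current, ()):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return sorted(seen & outputs)


print(outputs_to_rebuild("common-warnings.dita", USES, OUTPUTS))
# prints ['install-guide.pdf', 'install-help.html']
print(outputs_to_rebuild("release-notes.dita", USES, OUTPUTS))
# prints ['release-notes.html']
```

A change to one shared topic reaches both install outputs through two different topics and a map, which is exactly the kind of bookkeeping nobody wants to do by hand.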

XDocs looks like a decent CMS for the back-end processing. (See XDocs: One Cool CMS.) That leaves the front-end editing.

At the moment, I'm thinking that XMetaL's online editor (XMAX) might make a good online editing solution for DITA documents. A lot depends on how the license works. If it's per server, great. If it's per user, that could be difficult, given that JavaSE has some 6 million users!

Another solution is to use a CMS like XDocs that lets you checkout files onto your local file system. Then you can use any editor you want. That system could work in a variety of scenarios.

Downstream, I'd love to see a solution based on NetBeans, in conjunction with a CMS like XDocs. That would be a powerful addition to the developer's arsenal.

Or maybe we can build an entirely new system based around a browser-based editor like Xopus, DITAStorm, or FCKeditor. (But the first thing we need to do is change the damn name. Hey, you! Stop swearing!) [note: Thanks to Dan Dupe for pointing me to Xopus.]

There are a number of ways to skin that cat, so I'm pretty sure we can get there. If we want really modular content that is truly reusable, we need to get there. (On the other hand, if we don't particularly care about achieving that goal, we could just install a Wiki.)

Resources

These sites provide updated lists of online, DITA-aware editors. There are many different price points and functional capabilities:

http://www.ditausers.org/tools/web_editors/ This one includes DITAStorm. (It was one of the first, but the version I saw was very limited. It doesn't seem to be getting much traction, which suggests that it hasn't been evolving very fast.)

- DocZone, *another* really cool CMS (although I am biased on this viewpoint, since I am one of the owners of the company). DocZone bundles an end-to-end DITA environment (authoring, CMS, workflow, translation memory, and multilingual publishing to any output format) as a 100% browser-based app that is a hosted application (pay by the month).

[Trackback] Eric Armstrong at Sun has made a great post summarizing what's cool about DITA that might be useful if you are trying to make the case for moving to structured authoring to your boss. Subsequent posts ...

As a further addition to Dan's comment, I should add that DocZone *is* a really cool CMS. It's ready to go, right now, with no configuration--and it already integrates with every translation memory system out there, which can save a bundle in translation costs. (That's one of the more significant cost-reduction benefits for those who aren't currently using structured documents.)

Sun is a server company with lots of bandwidth and a ton of content, so an externally-hosted CMS doesn't make a lot of sense for Sun. But there is a huge number of companies for which it makes a *lot* of sense. And DocZone is as highly recommended for that group as XDocs is for people who need to do their own thing.