Moving to XML-based documentation – Just what does that mean?

This will be a long explanation, but there are quite a few elements that determine the success or failure of a new XML implementation so I want to be thorough. Some of this you may already know, but just in case I wanted to cover everything.

A proper and valid XML workflow requires three things. First you need the “rules” file to control everything. The rules file is the SCHEMA file (or possibly a DTD for older XML implementations). The SCHEMA file is the final arbiter of just what constitutes a valid and legal file for a given publishing workflow. Public SCHEMA files can be adopted, or custom/proprietary SCHEMA files can be written for a specific purpose. The first hurdle when creating a custom SCHEMA file is to make sure that a LOT of planning goes into its architecture. Once a SCHEMA file is written it is pretty much set in stone. The reason is that once you have started using a SCHEMA file, the documents you create with it are permanently locked to it. If at some point it is determined that the SCHEMA file needs to be updated, there is a serious risk of rendering all previous documents created with it invalid and unusable. XML is a very unforgiving environment and it requires absolute precision. This is just one of the reasons that most companies choose to use publicly available SCHEMA files that have been tried and tested over time.
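To make the "locked to the SCHEMA" risk concrete, here is a rough sketch in Python. It does not use a real SCHEMA language: the two rule tables (RULES_V1, RULES_V2), the element names, and the violations() helper are all made up for illustration, and a real workflow would use a proper XSD or DTD validator instead. It only shows how a small rule change can invalidate a document that used to pass.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for a schema: element name -> allowed child element names.
RULES_V1 = {"topic": {"title", "para"}, "title": set(), "para": set()}
# A later revision of the rules that renames "para" to "p".
RULES_V2 = {"topic": {"title", "p"}, "title": set(), "p": set()}


def violations(xml_text, rules):
    """Return a list of rule violations; an empty list means the document conforms."""
    problems = []
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        if parent.tag not in rules:
            problems.append(f"unknown element <{parent.tag}>")
            continue
        for child in parent:
            if child.tag not in rules[parent.tag]:
                problems.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return problems


# A document written while the first rule set was in force.
OLD_DOC = "<topic><title>Install</title><para>Run setup.</para></topic>"

print(violations(OLD_DOC, RULES_V1))  # conforms to the original rules
print(violations(OLD_DOC, RULES_V2))  # the same file now fails
```

The document itself never changed. Only the rules did, and that was enough to make every existing file that uses the old element name fail.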

Second, you then create your content files in accordance with the SCHEMA document. For a proper XML workflow you must be working on the native XML documents, not working in an intermediary format and converting later. One misconception is that this means that the documents must have a .XML file extension. The reality is that valid XML documents can have any file extension, as long as the tools in the workflow accept it. You could have a file called document.fred and it could be a valid XML document. The important thing is that the contents of document.fred declare it to be an XML document and fully conform to the rules of the SCHEMA in place.
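A quick way to see that the extension is irrelevant is to hand a parser a file named document.fred. The sketch below uses Python's standard library. The helper name is_well_formed_xml and the sample content are made up for illustration, and it only checks that the file is well-formed XML, which is a weaker test than full validation against a SCHEMA.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def is_well_formed_xml(path):
    """Return True if the file parses as XML, whatever its extension is."""
    try:
        ET.parse(path)
        return True
    except ET.ParseError:
        return False


with tempfile.TemporaryDirectory() as folder:
    # Well-formed XML stored under an unusual extension.
    fred_path = os.path.join(folder, "document.fred")
    with open(fred_path, "w", encoding="utf-8") as handle:
        handle.write('<?xml version="1.0" encoding="UTF-8"?>\n'
                     "<topic><title>Hello</title></topic>")
    fred_ok = is_well_formed_xml(fred_path)

    # A .xml extension does not help if the content is broken (unclosed <title>).
    broken_path = os.path.join(folder, "broken.xml")
    with open(broken_path, "w", encoding="utf-8") as handle:
        handle.write("<topic><title>Hello</topic>")
    broken_ok = is_well_formed_xml(broken_path)

print(fred_ok, broken_ok)
```

The parser looks only at the content, so the .fred file is accepted and the .xml file with the unclosed tag is rejected.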

Third, the part of the process that can be the trickiest and most expensive (and the most overlooked by teams adopting XML for the first time) is the transformation of the raw XML content files you have created into a format that the customer can use (PDF, Microsoft .CHM, web or browser-based systems, etc.). This step is one of the reasons that a lot of companies choose to use a SCHEMA that is already publicly available, as this usually means that there will be pre-built transforms available to convert the raw documents into customer documents. If a team wants to build its own internal custom SCHEMA, then a significant amount of budget needs to be allocated to this part of the process, as it usually requires the services of programmers/developers, and often more than one. The reason is that the person creating a custom XML transform must not only be an absolute expert on the SCHEMA in use (including all legal combinations of elements within that SCHEMA and how to accommodate them during the transform process) but also an expert on the format being transformed into. As a result this tends to be a rather specialized field. You may need one developer to write a transform to create PDF and a different developer with a different set of skills to write a transform to create Microsoft .CHM files. This high level of expense deters many teams from writing a custom SCHEMA and leads them to adopt a tried and tested public SCHEMA instead.
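To give a feel for what a transform involves, here is a toy sketch in Python that turns a small XML topic into an HTML fragment. The element names, the TAG_MAP table, and the to_html() function are all invented for illustration. Production transforms are usually written in XSLT or a similar tool and must handle every legal combination of elements, which is where the expense comes from.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from source elements to HTML tags; a real schema has far more.
TAG_MAP = {"topic": "div", "title": "h1", "para": "p"}


def to_html(element):
    """Recursively convert one source element (and its children) to an HTML string."""
    tag = TAG_MAP.get(element.tag)
    if tag is None:
        # The transform author has to decide what to do with every element.
        raise ValueError(f"no transform rule for <{element.tag}>")
    parts = [f"<{tag}>", escape(element.text or "")]
    for child in element:
        parts.append(to_html(child))
        parts.append(escape(child.tail or ""))  # text that follows the child
    parts.append(f"</{tag}>")
    return "".join(parts)


SOURCE = "<topic><title>Install</title><para>Run setup &amp; reboot.</para></topic>"
print(to_html(ET.fromstring(SOURCE)))
# prints: <div><h1>Install</h1><p>Run setup &amp; reboot.</p></div>
```

Even this tiny example has to know both sides: which source elements exist and how they nest, and what the target format expects. A PDF or .CHM target would need a completely different back end.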

Those are the three pieces: SCHEMA (the rules), XML files (the content), and Transformations (the conversion from raw XML to customer deliverables).

What we do here at MadCap Software is adhere to all of the above rules AND include all of the transformations to the common output formats in a turn-key system. One item that is a little peculiar to our workflow is that we don’t use a single SCHEMA for validating our files; we actually use two SCHEMA files at the same time. For any visible content we validate against the W3C XHTML SCHEMA, and for all of our content management/single-sourcing rules and metadata we validate against a MadCap SCHEMA.

Some wonder why we use the W3C XHTML SCHEMA at all. Why didn’t we just write an all-encompassing MadCap SCHEMA that did everything? Honestly, that would have been much easier for us, but we wanted to create a workflow that never locked up our customers’ content in any proprietary manner. Since the W3C XHTML SCHEMA is the most commonly used XML SCHEMA on the planet, it seemed an obvious choice for maximum content portability in the future. Since the XHTML SCHEMA contains no content management or single-sourcing capabilities at all, we then augment it with the MadCap SCHEMA (called out in every file by namespace, a fully legal and valid XML technique). This provides a best-of-all-worlds solution: full support for state-of-the-art single-source publishing, content that is never tied up in any proprietary manner, and a complete suite of built-in transformations for creating the most common customer delivery formats without having to hire a programmer.
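For readers who have not seen two vocabularies mixed by namespace, the following Python sketch shows the general idea. The sample markup, the MadCap namespace URI, and the conditions/snippetBlock names are my assumptions for illustration rather than a specification of the actual file format, and split_by_namespace() is a made-up helper. The point is simply that a parser can tell which parts belong to XHTML and which belong to the second vocabulary.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Assumed URI for the second vocabulary; check a real file for the exact value.
MADCAP_NS = "http://www.madcapsoftware.com/Schemas/MadCap.xsd"

# Illustrative sample only: visible content in XHTML, single-sourcing data in MadCap names.
SAMPLE = f"""<html xmlns="{XHTML_NS}" xmlns:MadCap="{MADCAP_NS}">
  <body>
    <h1>Install</h1>
    <p MadCap:conditions="Default.PrintOnly">Printed guide only.</p>
    <MadCap:snippetBlock src="notes.flsnp" />
  </body>
</html>"""


def split_name(qualified):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if qualified.startswith("{"):
        namespace, _, local = qualified[1:].partition("}")
        return namespace, local
    return "", qualified


def split_by_namespace(xml_text):
    """Return (XHTML element names, MadCap element and attribute names)."""
    xhtml_elements, madcap_items = [], []
    for element in ET.fromstring(xml_text).iter():
        namespace, local = split_name(element.tag)
        if namespace == XHTML_NS:
            xhtml_elements.append(local)
        elif namespace == MADCAP_NS:
            madcap_items.append(local)
        for attribute in element.attrib:
            attr_namespace, attr_local = split_name(attribute)
            if attr_namespace == MADCAP_NS:
                madcap_items.append("@" + attr_local)
    return xhtml_elements, madcap_items


print(split_by_namespace(SAMPLE))
# prints: (['html', 'body', 'h1', 'p'], ['@conditions', 'snippetBlock'])
```

Because every name carries its namespace, a tool that only understands XHTML can still find all of the visible content and ignore the rest, which is what keeps the content portable.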

Hopefully this helps to dispel some of the XML myths I see floating around from time to time.