
DITA: The Darwin Information Typing Architecture

By Amber Swope and Michael Priestley, February 06, 2008

New content types for creating, managing, and publishing modular content

Level 4: Automation and Integration

Once content is specialized, you can leverage your investment in semantics by automating key processes and begin tying content together, even across different specializations or authoring disciplines. For example, you can share common content across marketing and training, or share common processes and infrastructure throughout your content life cycle.

Scenario

The software division of a large technology company stores its content in a CMS, which allows all the teams in the division to reuse it. At this level, the division has moved beyond single-sourcing and achieved multiway reuse. Product descriptions created by the marketing team can be reused by the technical publications group to create product overviews and by the training group to create product tours. At the same time, product architectural specifications created by the technical publications group can be reused by the training, technical support, and marketing teams.

The following figure illustrates how content created by different teams can be reused in multiple deliverables by multiple teams across the division.


Figure 6: Content reuse across teams

Because it reuses content across the teams in the division, the company can save a significant amount of money by translating the source content once rather than translating each deliverable that includes it.

Investment

Organizations need a CMS to effectively control and automate the content development life cycle. In addition to storing content and providing version control, the CMS provides workflow automation that assists authors in creating, reusing, and publishing content. However, implementing a CMS requires a non-trivial investment in both preparation and cost.

In preparation for a CMS implementation, you must understand the structure of the content and where it is appropriate to reuse it. This requires a significant amount of research, planning, and coordination to identify the reuse possibilities, requirements, and standards across disciplines. In addition, you need to define a robust metadata model that supports the content model, and apply it to all topics. Lastly, you must have agreed-upon content development processes in order to automate them with workflow control. This requires consensus and support from all stakeholders in the content life cycle. The cost of implementing the CMS includes the following items:

Price of the CMS software

Hardware to run it and store the content

Resource time to prepare and plan for implementation

Resources to customize and maintain the CMS

Resource time for training stakeholders to use it

Although such an undertaking may seem daunting, the initial implementation is a one-time cost, and the resulting improvements in speed and efficiency can allow you to recoup the investment quickly.

A translation management system is another key automation and integration investment for managing and automating content localization. If you are translating content into more than one language, you must have processes in place to handle this additional work. A translation management system provides automated process management for translating content and integrates with the CMS workflow support.

To implement a translation management system, you must have a defined translation process that can scale to meet your localization needs as they increase, and you must understand the requirements for a scalable system. In addition, you must build your translation memory, which is the library of localized content.
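At its simplest, a translation memory is a lookup from previously translated source segments to approved translations. The following Python sketch assumes exact matching on whole segments, keyed by target language. The `TranslationMemory` class and its methods are hypothetical. Production systems also perform fuzzy matching and store segment-level metadata.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TranslationMemory:
    """Hypothetical exact-match translation memory keyed by (segment, language)."""

    entries: dict[tuple[str, str], str] = field(default_factory=dict)

    def add(self, source: str, lang: str, translation: str) -> None:
        # Record an approved translation of a source segment.
        self.entries[(source, lang)] = translation

    def lookup(self, source: str, lang: str) -> str | None:
        # Exact match only; returns None when the segment has never been translated.
        return self.entries.get((source, lang))

    def untranslated(self, segments: list[str], lang: str) -> list[str]:
        # Segments with no stored translation are the only ones a translator must handle.
        return [s for s in segments if self.lookup(s, lang) is None]


tm = TranslationMemory()
tm.add("Click Save.", "de-DE", "Klicken Sie auf Speichern.")
print(tm.untranslated(["Click Save.", "Restart the server."], "de-DE"))
# ['Restart the server.']
```

The memory grows as translations are approved, so each new release sends only the segments that `untranslated` reports.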

Return

The return on investment in a CMS is the ability to reuse content across disciplines and automate the content development workflow. If content is not stored in a repository that provides easy retrieval through metadata, it will be impossible to reuse content across teams.

In addition to standard capabilities such as automated status-change notification and reporting, workflow support lets you see quickly which information is reused in which topics. This feature, crucial at the fourth level of adoption, enables true reuse and mitigates the risk of inadvertently propagating a change throughout the content set.
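You can approximate the "where used" view that workflow support provides by building a reverse index over the references in maps and topics. This Python sketch makes several assumptions. The repository is available as a dictionary that maps file names to XML strings; the file names and the `where_used` function are hypothetical. Any `href` or `conref` attribute counts as a reference, and the fragment after `#` is ignored.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical repository contents keyed by file name.
DOCS = {
    "overview.ditamap": """<map>
  <topicref href="product-desc.dita"/>
  <topicref href="install.dita"/>
</map>""",
    "tour.ditamap": """<map><topicref href="product-desc.dita"/></map>""",
    "install.dita": """<task id="install"><title>Install</title><taskbody>
  <steps><step conref="shared.dita#shared/unpack"><cmd/></step></steps>
</taskbody></task>""",
}


def where_used(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each referenced file to the set of files that reference it."""
    used_by: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for elem in ET.fromstring(text).iter():
            # @href (e.g. on topicref) pulls in a whole topic;
            # @conref pulls in a single element by reference.
            for attr in ("href", "conref"):
                ref = elem.get(attr)
                if ref:
                    target = ref.split("#", 1)[0]  # drop the element fragment
                    if target:
                        used_by[target].add(name)
    return dict(used_by)


index = where_used(DOCS)
print(sorted(index["product-desc.dita"]))  # ['overview.ditamap', 'tour.ditamap']
print(sorted(index["shared.dita"]))        # ['install.dita']
```

Before changing `product-desc.dita`, a writer can consult this index to see that both the overview and the tour will pick up the change.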

The following figure shows how users can share content stored in multiple repositories.


Figure 7: Multiple users sharing content from multiple repositories

Traditional publishing and translation processes involve sending each deliverable out for translation. Although you can leverage the translation memory for the content in each deliverable, the translation vendor must compare each deliverable to the translation memory to determine what content is new and what needs to be translated.

If you have multiple deliverables with the same content, you pay for each analysis pass. If you have multiple deliverables with similar but non-identical information, you pay for the analysis pass as well as the cost to translate each "version" of the information. Organizations that produce multi-language documentation can therefore incur large, unnecessary costs, because the translation effort for each release is multiplied by both the number of languages and the number of versions of the content.
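A little arithmetic makes the multiplication in the preceding paragraph concrete. The figures below are hypothetical. The model counts only the words processed per language and ignores the match-rate discounts that vendors usually apply.

```python
# Hypothetical figures for illustration only.
SHARED_WORDS = 10_000  # words of content that appear in every deliverable
DELIVERABLES = 3       # e.g. product overview, product tour, architecture guide
LANGUAGES = 8          # target languages


def words_per_deliverable(shared: int, deliverables: int, languages: int) -> int:
    # Each deliverable carries its own copy of the shared content,
    # so every copy is analyzed (and possibly paid for) in every language.
    return shared * deliverables * languages


def words_at_source(shared: int, languages: int) -> int:
    # The shared content exists once as source topics and is sent once per language.
    return shared * languages


print(words_per_deliverable(SHARED_WORDS, DELIVERABLES, LANGUAGES))  # 240000
print(words_at_source(SHARED_WORDS, LANGUAGES))                      # 80000
```

Under these assumptions, the volume handled per deliverable is larger by a factor equal to the number of deliverables that share the content.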

In contrast, because DITA is a topic-based XML architecture, you send only the source topics that contain changed content to the translation vendor. You control the content in smaller units, so the amount of content the vendor analyzes for each language is significantly reduced. In addition, if you reuse content rather than rewriting multiple versions of it, you pay to translate only the original source instead of multiple versions of the same information. Translating content at the source rather than at the level of each deliverable radically changes the translation cost structure. The ability to translate at the source, combined with the ability to identify changed content and to reduce the total volume of content through reuse, gives you greater control over the translation process and your overall localization costs.
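One simple way to identify which source topics changed since the last translation handoff is to compare content hashes. This sketch assumes that each topic is stored as text and that you keep the SHA-256 digest recorded at the previous handoff. A CMS would normally track this through its own version history instead. The function names and sample topics are hypothetical.

```python
import hashlib


def digest(text: str) -> str:
    # SHA-256 of the topic's UTF-8 bytes; any byte change yields a new digest.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_topics(current: dict[str, str], last_sent: dict[str, str]) -> list[str]:
    """Names of topics that are new or whose digest differs from the last handoff.

    current maps topic file names to their XML text;
    last_sent maps topic file names to the digest recorded when last sent.
    """
    return sorted(
        name for name, text in current.items() if last_sent.get(name) != digest(text)
    )


# Hypothetical topics: a.dita is unchanged, b.dita was edited, c.dita is new.
previous = {"a.dita": digest("<topic id='a'/>"), "b.dita": digest("<topic id='b'/>")}
current = {
    "a.dita": "<topic id='a'/>",
    "b.dita": "<topic id='b'><title>Revised</title></topic>",
    "c.dita": "<topic id='c'/>",
}
print(changed_topics(current, previous))  # ['b.dita', 'c.dita']
```

Because the comparison is byte-level, even a whitespace-only edit flags a topic. A real workflow might normalize the XML before hashing.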

By automating workflow support with a CMS and integrating the translation process, you can reuse content with confidence across teams and realize significant savings when localizing to multiple languages.

DITA Features Used

This adoption level uses the following DITA features:

Metadata. DITA provides basic metadata for all topics, including author, audience, resource ID, keywords, and index markers. Maps also have default metadata, including copyright information and critical dates. In addition, specializations can provide deliverable-specific metadata. For example, the bookmap specialization includes book-specific metadata such as book identification numbers and publication data.
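Topic metadata lives in the prolog: `author`, plus `audience` and `keywords` inside `metadata`. Those elements are what make metadata-based retrieval possible. This Python sketch parses them with the standard library and filters topics by a field value. The sample topics, file names, and the `metadata` and `find_topics` helpers are hypothetical. A real CMS would index this metadata rather than reparse the topics on every query.

```python
import xml.etree.ElementTree as ET

# Hypothetical topics keyed by file name.
TOPICS = {
    "install.dita": """<task id="install">
  <title>Installing the server</title>
  <prolog>
    <author>A. Writer</author>
    <metadata>
      <audience type="administrator"/>
      <keywords><keyword>installation</keyword></keywords>
    </metadata>
  </prolog>
</task>""",
    "tour.dita": """<concept id="tour">
  <title>Product tour</title>
  <prolog>
    <metadata>
      <audience type="user"/>
      <keywords><keyword>overview</keyword></keywords>
    </metadata>
  </prolog>
</concept>""",
}


def metadata(xml_text: str) -> dict[str, list[str]]:
    """Collect a few standard prolog fields from one topic."""
    root = ET.fromstring(xml_text)
    return {
        "author": [a.text or "" for a in root.findall("prolog/author")],
        "audience": [a.get("type", "") for a in root.findall("prolog/metadata/audience")],
        "keyword": [k.text or "" for k in root.findall("prolog/metadata/keywords/keyword")],
    }


def find_topics(topics: dict[str, str], key: str, value: str) -> list[str]:
    """Return names of topics whose metadata field `key` contains `value`."""
    return sorted(name for name, text in topics.items() if value in metadata(text)[key])


print(find_topics(TOPICS, "audience", "administrator"))  # ['install.dita']
print(find_topics(TOPICS, "keyword", "overview"))        # ['tour.dita']
```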

Translation and language attributes. DITA provides the translate and xml:lang attributes to support localization. The translate attribute "indicates whether the content of the element should be translated or not" (DITA Version 1.1 Architectural Specification). The xml:lang attribute identifies the language of the element's content. You can specify these attributes at the element, topic, or map level.
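The sketch below shows how a tool might honor the translate attribute. It walks a topic and collects only the text that should go to translators. It assumes that a value set on an element applies to its descendants unless a descendant overrides it, and it treats any value other than "yes" as "do not translate." It also reads xml:lang, which Python's ElementTree exposes under the XML namespace URI. The sample topic and helper names are hypothetical. A real extraction tool would keep inline markup as placeholders instead of splitting the text into fragments.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def translatable_text(elem: ET.Element, inherited: str = "yes") -> list[str]:
    """Collect non-empty text fragments whose effective translate value is "yes"."""
    # Assumption: an element without @translate inherits its parent's value.
    translate = elem.get("translate", inherited)
    out: list[str] = []
    if translate == "yes" and elem.text and elem.text.strip():
        out.append(elem.text.strip())
    for child in elem:
        out.extend(translatable_text(child, translate))
        # Text after a child's end tag belongs to this element, not the child.
        if translate == "yes" and child.tail and child.tail.strip():
            out.append(child.tail.strip())
    return out


TOPIC = """<concept id="c1" xml:lang="en-US">
  <title>Configuring <ph translate="no">WidgetServer</ph></title>
  <conbody><p>Set the <codeph translate="no">max_conn</codeph> value.</p></conbody>
</concept>"""

root = ET.fromstring(TOPIC)
print(root.get(XML_LANG))       # en-US
print(translatable_text(root))  # ['Configuring', 'Set the', 'value.']
```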

Generalization for cross-specialization reuse. When reuse happens across different content types, cross-type validation issues can quickly arise: some of the semantics in the source may not be valid in the context of reuse. For example, a <step> is allowed in a task topic but not in a concept topic. But because a <step> is just a specialized type of list item (<li>), you can reuse a <step> anywhere a <li> is allowed by stripping away the extra semantics that do not apply in the new context. In this way, you can reuse the content of a <step> between tasks and concepts, even though the specialized semantics and structure apply only in the source type.
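Generalization works because every DITA element carries a class attribute that lists its ancestry; a step, for example, has the class value `- topic/li task/step `. The Python sketch below renames each element to the base type listed first in that value and resets the class to match. It assumes the class values are present in the parsed XML. Normally the DTD supplies them as defaults, and ElementTree does not apply DTD defaults, so the sample writes them out explicitly. The `generalize` function is a hypothetical illustration, not an existing tool.

```python
import xml.etree.ElementTree as ET


def generalize(root: ET.Element) -> ET.Element:
    """Rename each element to the base (first-listed) type in its DITA class value."""
    for node in root.iter():
        tokens = node.get("class", "").split()
        # tokens look like ["-", "topic/li", "task/step"]; skip elements without one.
        if len(tokens) < 2 or "/" not in tokens[1]:
            continue
        base = tokens[1]                   # e.g. "topic/li"
        node.tag = base.split("/", 1)[1]   # e.g. "li"
        node.set("class", f"- {base} ")    # drop the specialized ancestry
    return root


STEP = """<step class="- topic/li task/step ">
  <cmd class="- topic/ph task/cmd ">Open the configuration file.</cmd>
</step>"""

li = generalize(ET.fromstring(STEP))
print(ET.tostring(li, encoding="unicode"))
```

The result is a plain <li> containing a <ph>, which is valid wherever a list item is allowed, such as inside a concept topic.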
