Abstract

This document provides a set of guidelines for developing XML documents and schemas that are internationalized properly. Following the best practices described here allows both the developers of XML applications and the authors of XML content to create material in different languages.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

Feedback about this document is encouraged. Send your comments to www-i18n-comments@w3.org. Use "[Comment on xml-i18n-bp WD]" in the subject line of your email, followed by a brief subject. The archives for this list are publicly available.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

1 Introduction

This document is a complement to the W3C Recommendation Internationalization Tag Set (ITS) Version 1.0[ITS]. However, not all internationalization-related issues can be resolved by the special markup described in ITS. The best practices in this document therefore go beyond application of ITS markup to address a number of problems that can be avoided by correctly designing the XML format, and by applying a few additional guidelines when developing content.

This document and Internationalization Tag Set (ITS) Version 1.0[ITS] implement requirements formulated in Internationalization and Localization Markup Requirements[ITS REQ].

This set of best practices does not cover all topics about internationalization for XML. Other useful reference material includes: Character Model for the World Wide Web 1.0: Fundamentals[CharMod], and Unicode in XML and other Markup Languages[Unicode in XML].

1.1 Who should use this document

This document is divided into two main sections:

The first one is intended for the designers and developers of XML applications (also referred to here as 'schemas' or 'formats').

The second is intended for the XML content authors. This includes users modifying the original content, such as translators.

1.2.2 Users and authors of XML content

Section 3: When Authoring XML Content provides a number of guidelines on how to create content with internationalization in mind. Many of these best practices are relevant regardless of whether or not your XML format was developed especially for internationalization.

Section 4.1: Writing ITS Rules provides practical guidelines on how to write ITS rules. Such techniques may be useful when applying some of the more advanced authoring best practices.

2 When Designing an XML Application

Designers and developers of XML applications should take into account the following best practices:

Make sure the its:translate attribute is defined for the root element of your documents, and for any element that has text content.

It is also recommended that you define the its:rules element in your schema, for example in a header if there is one, and within that the its:translateRule element. Content authors can then use these elements to globally change the default translate rules for specific elements and attributes.

Provide an ITS Rules document where you use its:withinTextRule elements to indicate which elements should be treated as either part of their parents, or as a nested but independent run of text. By default, element boundaries are assumed to correspond to segmentation boundaries.

Make sure the its:ruby element and its children are defined for all elements where there is text content.
It is also recommended to define the its:rules element in your schema, for example in a header if there is one. The its:rules element provides access to the its:rubyRule element which can be used to associate ruby information with elements and attributes globally.

It is also recommended that you define the its:rules element in your schema, for example in a header if there is one, and within that the its:locNoteRule element and its related markup. Content authors can use this markup to specify localization-related notes. Within the its:locNoteRule element, notes can be stored in the its:locNote element.

It is also recommended to define the its:rules element in your schema, for example in a header if there is one. The its:rules element provides access to the its:termRule element which can be used to override terminology-related information globally.

If authors can use a proprietary mechanism for this, make sure it is covered in the ITS Rules document provided for your schema.

Make sure you define a span-like element in your schema that will allow the authors to associate arbitrary content with language-oriented properties such as directionality, language information, etc.

If no span-like element already exists in your schema, you may be able to use its:span.

Make sure you document the internationalization and localization aspects of your schema by providing a set of relevant ITS rules in a single standalone ITS Rules document.

Where it says "How to implement this as a new feature", this section describes how to create new schemas or add new features to existing schemas. When doing this you may need to take into account the following:

Think twice before creating your own schema. Seriously consider using existing formats such as DITA, DocBook, Open Document Format, Office Open XML, XML User Interface Language, Universal Business Language, etc. Those formats have many useful insights already built in.

Check carefully whether an existing format comes with a built-in capability for modification. DocBook and DITA, for example, come with their own set of features for adapting their format to special needs.

The modification mechanisms available will depend on the schema language (DTD, XML Schema, RELAX NG, etc.). For example, namespace-based modularization of schemas is difficult to achieve with DTDs.

NVDL is an example of a meta-schema language that was designed especially to allow integration of several existing vocabularies into a single XML vocabulary without the need to know the details of source schemas. This means that with NVDL you can usually create a schema for compound documents more easily than with other schema technologies.

Each schema language provides different ways of extending or modifying existing schemas. Some examples are the include, import or redefine mechanisms in XML Schema.

Some processors do not implement support for all schema language constructs, due to erroneous implementations or differences in conformance profiles (e.g. see the conformance requirements to XML Schema part 1). Therefore a schema which works in one environment may not work in a different one.

What is possible also depends on the features of the schema which the modification is targeting. For example:

An XML Schema redefine is only possible if the modified schema has been created with named types.
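As a rough illustration of this point, the sketch below redefines a named complex type in order to add an identifier attribute to it. The file name base.xsd and the type name paraType are assumptions made for this example only; they do not come from any real schema.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Assumption: base.xsd declares a named complex type called paraType. -->
  <xs:redefine schemaLocation="base.xsd">
    <!-- The redefinition reuses the same name and extends the original type. -->
    <xs:complexType name="paraType">
      <xs:complexContent>
        <xs:extension base="paraType">
          <xs:attribute name="id" type="xs:ID"/>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:redefine>
</xs:schema>
```

If the original schema had declared the content model anonymously inside an element declaration, there would be no type name for the redefinition to refer to.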

Note: The considerations above are only a portion of what you need to take into account. You need to know a lot more when diving into schema modularization.

Some XML documents may be designed to store data without natural language content. In these cases, there is no need for the xml:lang attribute.

The scope of the xml:lang attribute applies to both the attributes and the content of the element where it appears, therefore one cannot specify different languages for an attribute and the element content. ITS does not provide a remedy for this. Instead, it is recommended that you avoid translatable attributes.

Make sure that the definition of the xml:lang attribute allows for empty values. That is:

In a DTD you must not use NMTOKEN as the data type, instead use CDATA.

In XML Schema the built-in data type language does not allow empty values. However, the declaration for xml:lang in the XML Schema document for the XML namespace at http://www.w3.org/2001/xml.xsd does allow for empty values and therefore can be used.
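A minimal sketch of the XML Schema case is shown below. It imports the schema document for the XML namespace and references its xml:lang declaration instead of declaring a new attribute of type language. The element name para is a placeholder chosen for this example.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Import the declarations for the XML namespace, including xml:lang. -->
  <xs:import namespace="http://www.w3.org/XML/1998/namespace"
             schemaLocation="http://www.w3.org/2001/xml.xsd"/>

  <!-- "para" is a hypothetical element used only for illustration. -->
  <xs:element name="para">
    <xs:complexType mixed="true">
      <!-- Referencing xml:lang from xml.xsd allows the empty value xml:lang="". -->
      <xs:attribute ref="xml:lang"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```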

It is not recommended to use your own attribute or element to specify the language of the content. The xml:lang attribute is supported by various XML technologies such as XPath and XSLT (e.g. the lang() function). Using something different would diminish the interoperability of your documents and reduce your ability to take advantage of some XML applications.

Note: If you need to specify language as data or meta-data about something external to the document, do it with an attribute different from xml:lang. For more information see the article xml:lang in XML document schemas.

Example 1: Language information not applicable to the content of the element where it is used

In XHTML the language of a file linked with the a element is indicated with a hreflang attribute because it does not apply to the content of the a element.
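A short fragment along these lines illustrates the distinction. The file name report-fr.html and the sentence itself are invented for this example.

```xml
<!-- xml:lang describes the paragraph text; hreflang describes the link target. -->
<p xml:lang="en">A <a href="report-fr.html" hreflang="fr">French version</a>
of this report is available.</p>
```

Here the link text is English, so xml:lang="fr" on the a element would be wrong; hreflang only states that the linked resource is in French.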

If you are working with an existing schema where there is a way to specify content language that uses something other than the xml:lang attribute (but still uses the same values as xml:lang), you should provide an ITS Rules document where you use the its:langRule element to specify what attribute or element is used instead of xml:lang. This can be done in the ITS rules elements in the head of a document, if your format supports that, or in a separate document.

Example 2: Dealing with a non-standard way of declaring language information

In this document the langcode element is used to specify the language of the text element. The langcode element has no inheritance behavior equivalent to the one of xml:lang.
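A minimal sketch of such a document and its rule could look like the following. The element names myRes and entry, and the sample string, are assumptions for illustration; only langcode and text come from the description above.

```xml
<myRes>
  <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
    <!-- For each text element, read the language from its sibling langcode element. -->
    <its:langRule selector="//text" langPointer="../langcode"/>
  </its:rules>
  <entry>
    <langcode>fr</langcode>
    <text>Le fichier est introuvable.</text>
  </entry>
</myRes>
```

The langPointer attribute holds an XPath expression relative to each node matched by selector, which is how the rule reaches the langcode element from the text element.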

Information about the language of content can be very important for correctly rendering or styling text in some scripts, applying spell-checkers during content authoring, appropriate selection of voice for text-to-speech systems, script-based processing, and numerous other reasons. You must provide a standard way to specify the language for the document as a whole, but also for parts of the document where the language changes.

Provide a way for authors to specify the direction of text using ITS markup, or document equivalent legacy markup in an ITS Rules document.

In scripts such as Arabic and Hebrew, characters may run from both left to right and right to left when displayed. Directional markup allows you to manage the flow of characters. For an example of how directional markup is used see Creating (X)HTML Pages in Arabic & Hebrew.

Make sure the its:dir attribute is defined for the root element of your document, and for any element that has text content.

Note: This example shows the directionality of the source text correctly. This is to ensure that you understand the concepts being described. For such display, you need a sophisticated editor that resolves directionality of the source text correctly. Many editors are not yet this sophisticated. See the related discussion about Problems with bidirectional source text in [Bidi in X/HTML].

Generally the Unicode bidirectional algorithm will produce the correct ordering of mixed directionality text in scripts such as Arabic and Hebrew. Sometimes, however, additional help is needed. For instance, in the sentence of Example 4 the 'W3C' and the comma should appear to the left side of the quotation. This cannot be achieved using the bidirectional algorithm alone.

Example 4: Sentence where bidirectional markup is needed for a proper display

The following will display incorrectly, since no directional markup has been used:

The title says "פעילות הבינאום, W3C" in Hebrew.

The text 'W3C' and the comma should be to the left of the quoted Hebrew text. If your browser supports bidirectional display, the following should appear correctly, since directional markup has been added to the element surrounding the quote:

The title says "פעילות הבינאום, W3C" in Hebrew.

The desired effect can be achieved using Unicode control characters, but this is not recommended (See Unicode in XML and other Markup Languages[Unicode in XML]). Markup is needed to establish the default directionality of a document, and to change that where appropriate by creating nested embedding levels.
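In source form, the markup for the sentence of Example 4 might be sketched as below. The element names text, p and quote are hypothetical; the its:dir attribute and its value rtl are the ITS markup for directionality.

```xml
<text xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0" xml:lang="en">
  <!-- its:dir="rtl" creates a nested right-to-left embedding for the quotation. -->
  <p>The title says "<quote xml:lang="he" its:dir="rtl">פעילות הבינאום, W3C</quote>" in Hebrew.</p>
</text>
```

Without the attribute, the bidirectional algorithm treats 'W3C' as part of the surrounding left-to-right sentence rather than as part of the Hebrew title.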

Markup is also occasionally needed to disable the effects of the bidirectional algorithm for a specified range of text.

Note: In many cases, using translatable element content instead of translatable attributes will result in one sentence being embedded within another one. For instance, in Example 5 the description of the image will be embedded inside the text of the paragraph that contains it. In such cases, do not forget to declare the relevant element (here image) as 'nested', as described in Best Practice 1: Providing information related to text segmentation.

There are a number of issues related to storing translatable text in attribute values. Some of them are:

The language identification mechanism (i.e. xml:lang) applies to both the content and to the attribute values of the element where it is declared. If the text of an attribute is in a different language than the text of the element content, one cannot set the language for both correctly.

It may be necessary to apply some language-related properties, such as directionality and language identification, to only part of the text in an attribute value. This requires the use of a span-like element, but elements cannot be used within an attribute value.

It is difficult to apply meta-information, such as no-translate flags, author's notes, etc., to the text of an attribute value.

The difficulty of attaching unique identifiers to translatable attribute text makes it more complicated to use ID-based leveraging tools.

It can be problematic to prepare translatable attributes for localization because they can occur within the content of a translatable element, breaking it into different parts, and possibly altering the sentence structure.

All these potential problems are less likely to occur when the text is the content of an element rather than the value of an attribute.
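The contrast can be sketched with a hypothetical image element. All element names, attribute names and file names below are invented for this illustration.

```xml
<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0" xml:lang="en">
  <!-- Attribute-based: the description cannot carry its own language, id or inline markup. -->
  <image src="metro.png" alt="Plan du métro"/>

  <!-- Element-based: the description has its own language, identifier and note. -->
  <image src="metro.png">
    <desc xml:id="img-metro-desc" xml:lang="fr"
          its:locNote="Keep the French caption in all translations."
          its:translate="no">Plan du métro</desc>
  </image>
</doc>
```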

Note: Where the language of an element is given as xml:lang="zxx", where zxx indicates content that is not in a language, the element in question is most likely not to be translated. You should provide an its:translateRule for this.

If you are working with a schema where there are translatable attributes (something that is not recommended), you should also provide an its:translateRule for them.
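One possible rule for content flagged as not being in any language is sketched here. It assumes the xml:lang="zxx" convention is actually used in the documents concerned.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- Elements declared as non-linguistic content are excluded from translation. -->
  <its:translateRule selector="//*[@xml:lang='zxx']" translate="no"/>
</its:rules>
```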

Example 6: Document where default ITS "Translate" rules do not apply

In the following document, the content of the head element should not be translated, and the value of the alt attribute should be translated. In addition, the content of the del element should not be translated.

By default, ITS assumes that the content of all elements is translatable and that all attributes have non-translatable values. If your XML document type does not correspond to this default assumption it is important to indicate what are the exceptions. Doing so can significantly improve translation throughput.

It is also recommended that you define the its:rules element in your schema, for example in a header if there is one, and within that the its:translateRule element. Content authors can then use these elements to globally change the default translate rules for specific elements and attributes.
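For a document like the one described in Example 6, the rules might be sketched as follows. The selectors assume that head, del and alt are not in a namespace.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- The content of head is not translatable. -->
  <its:translateRule selector="//head" translate="no"/>
  <!-- The value of every alt attribute is translatable. -->
  <its:translateRule selector="//@alt" translate="yes"/>
  <!-- Deleted text is not translatable. -->
  <its:translateRule selector="//del" translate="no"/>
</its:rules>
```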

For example, DITA offers a translate attribute, and Glade provides a translatable attribute. Both have the same semantics as its:translate, i.e. the translation information applies to element content, including child elements, but excluding attribute values.

Example 7: DITA translation information

The following rules indicate how to associate the DITA translate attribute with the ITS Translate data category. The order in which the rules are listed is important:

First translateRule: Indicates that the content of any element with a translate attribute set to no is not translatable.

Second translateRule: Indicates that any attribute value of any element with a translate attribute set to no is not translatable. This is needed because some attributes are translatable in DITA and we need to make sure they are not translated when translate="no" is used in the elements where they are.

Third translateRule: Indicates that the content of any element with a translate attribute set to yes is translatable. This takes care of the cases where translate="yes" is used to override a prior translate="no".
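The three rules described above could be written roughly as follows. This is a sketch derived from the descriptions, and the exact selectors are an assumption.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- 1: content of elements marked translate="no" is not translatable. -->
  <its:translateRule selector="//*[@translate='no']" translate="no"/>
  <!-- 2: attribute values on those elements are not translatable either. -->
  <its:translateRule selector="//*[@translate='no']/@*" translate="no"/>
  <!-- 3: translate="yes" re-enables translation inside a non-translatable ancestor. -->
  <its:translateRule selector="//*[@translate='yes']" translate="yes"/>
</its:rules>
```

The order matters because, when several global rules select the same node, the one that appears last takes precedence.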

Document in an ITS Rules document how elements should be handled with regard to segmentation.

Segmentation refers to how text is broken down, from a linguistic viewpoint, into units that can be handled by processes such as translation. Some element boundaries may not correspond to segment boundaries, and this information needs to be provided.

Whether you are creating a new schema or documenting legacy markup, provide an ITS Rules document where you use its:withinTextRule elements to indicate which elements should be treated as either part of their parents, or as a nested but independent run of text. By default, element boundaries are assumed to correspond to segmentation boundaries.

Example 8: A DITA document with formatting and footnote elements.

In the following DITA document:

The elements term and b should be treated as part of their parent.

The element fn should be treated as a nested and independent run of text.

<concept id="myConcept" xml:lang="en-us">
<title>Types of horse</title>
<conbody>
<ol>
<li>Palouse horse:<p><term>Palouse horses</term><fn>A palouse horse
is the same as an <b>Appaloosa</b>.</fn> have spotted coats.
The <term>Nez-Perce</term> Indians have been key in breeding this
type of horse.</p></li>
</ol>
</conbody>
</concept>
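The corresponding ITS rules for this document might be sketched as follows, assuming the DITA elements are not in a namespace.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- term and b are part of the text flow of their parent. -->
  <its:withinTextRule selector="//term | //b" withinText="yes"/>
  <!-- fn is a separate run of text nested inside its parent. -->
  <its:withinTextRule selector="//fn" withinText="nested"/>
</its:rules>
```

With these rules, a tool can extract the footnote as its own segment while keeping 'Palouse horses have spotted coats.' together as one sentence.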

Many applications that process content for linguistic-related tasks need to be able to perform a basic segmentation of the text content. They need to be able to do this without knowing the semantics of the elements.

While in many cases it is possible to detect mixed content automatically, there are some situations where the structure of an element makes it impossible for tools to know for sure where appropriate segmentation boundaries fall. For example, the li element in XHTML can contain text as well as p elements. Similarly, the boundaries of some inline elements, such as emphasis, do not typically correspond to segmentation boundaries; on the other hand, some inline elements embedded in a parent element, such as footnotes or quotations, may define segments that should be handled separately from the text in which they are embedded.

Intelligent segmentation is particularly important in translation to successfully match source text against translation-memory databases.

Make sure the its:ruby element and its children are defined for all elements where there is text content.

It is also recommended to define the its:rules element in your schema, for example in a header if there is one. The its:rules element provides access to the its:rubyRule element which can be used to associate ruby information with elements and attributes globally.

Ruby is a type of annotation for text. It can be used with any language, but is very commonly used with East Asian scripts to provide phonetic transcriptions of characters that are likely to be unfamiliar to a reader. For example it is widely used in educational materials and children’s texts. It is also occasionally used to convey information about meaning.

Because ruby annotation may be needed when localizing into Japanese or Chinese, it is a good idea to make provision for it in your schema, even if your original documents are to be developed in a language that does not use such markup.
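As a simple illustration, ruby markup giving the reading of a Japanese word might look like the sketch below. The enclosing p element is hypothetical; its:ruby, its:rb and its:rt are the ITS ruby elements.

```xml
<p xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0" xml:lang="ja">
  <its:ruby>
    <!-- Base text -->
    <its:rb>東京</its:rb>
    <!-- Ruby text: the phonetic reading of the base text -->
    <its:rt>とうきょう</its:rt>
  </its:ruby>
</p>
```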

Example 10: An illustration of how an author could point to localization notes with its:locNoteRef

The its:locNoteRule element specifies that the message with the identifier NotFound has a corresponding explanation note in an external HTML file. The URI for the exact location of the note is stored in the locNoteRef attribute.
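A rule of this kind might be sketched as follows. The msg element, the id attribute and the file name ErrorsInfo.html are assumptions made for this example.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- Points to an external explanation for the message whose id is NotFound. -->
  <its:locNoteRule selector="//msg[@id='NotFound']"
                   locNoteType="description"
                   locNoteRef="ErrorsInfo.html#NotFound"/>
</its:rules>
```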

It is also recommended that you define the its:rules element in your schema, for example in a header if there is one, and within that the its:locNoteRule element and its related markup. Content authors can use this markup to specify localization-related notes. Within the its:locNoteRule element, notes can be stored in the its:locNote element.

Example 11: An illustration of how an author could store localization notes in its:locNoteRule

The its:locNoteRule element associates the content of the its:locNote element with the message that has the identifier 'DisableInfo', and flags it as important. This would also work if the rule was in an external file, allowing content authors to provide notes without modifying the source document.

Note: The example includes its:translate="no" in the its:rules tag, to prevent translators from attempting to translate the notes themselves.
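A sketch of such a rule is given below. The msg element, the id attribute and the wording of the note are invented for illustration.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"
           its:translate="no">
  <!-- locNoteType="alert" flags the note as important. -->
  <its:locNoteRule selector="//msg[@id='DisableInfo']" locNoteType="alert">
    <its:locNote>The variable {0} is the name of the option being disabled.</its:locNote>
  </its:locNoteRule>
</its:rules>
```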

Storing notes as element content has advantages over storing notes as its:locNote attribute values: markup for such things as language and directionality can be associated with the text of the content of an element, or parts of the text when a span-like element is also available, but you cannot do these things with attribute text.

Storing notes in an its:locNote element can therefore offer these advantages as long as there is a mechanism to associate the notes with the relevant content. On the other hand, it can be easier to scan documents, in some cases, if the note text is stored in elements or attributes alongside the content it refers to.

Although ITS provides the its:locNote attribute to store note text, offering the possibility of closely associating the note with the relevant content, using this approach makes it difficult to annotate the notes themselves for language, directionality, etc.

It can be argued that notes, being metadata, have different requirements to the content itself. Schema developers should carefully consider which approach to use. If all notes will always be written by English-speaking content developers, it may be acceptable to use attribute values, but if notes may be written by content developers in Arabic or Hebrew, they are almost certainly going to want to use directional markup and span elements in the notes themselves, so an element-based approach would almost certainly be better.

Handling legacy markup not in the ITS namespace

If you are working with an existing schema where there is a way to provide notes to the localizers that is not implemented using ITS, you should provide an ITS Rules document where you use the its:locNoteRule element to associate your notes markup with its equivalent in ITS.

Example 12: Document with custom localization notes

In this document the comment element is a note for its sibling text element.

<messages>
<msg id="ERR_NOFILE">
<text>The file '{0}' could not be found.</text>
<comment>The variable {0} is the name of a file.</comment>
</msg>
</messages>
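The matching ITS rules might be sketched as follows. The second rule, which keeps the comments themselves out of translation, is an addition assumed for this example.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- The note for each text element is held in its sibling comment element. -->
  <its:locNoteRule selector="//msg/text" locNoteType="description"
                   locNotePointer="../comment"/>
  <!-- Assumption: the comments are for localizers and are not translated. -->
  <its:translateRule selector="//msg/comment" translate="no"/>
</its:rules>
```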

To assist the translator to achieve a correct translation, authors may need to provide information about the text that they have written. For example, the author may want to do the following:

Tell the translator how to translate part of the content (e.g. "Leave text in uppercase").

Expand on the meaning or contextual usage of a particular element, such as what a variable refers to or how a string will be used on the UI.

Clarify ambiguity and show relationships between items sufficiently to allow correct translation (e.g. in many languages it is impossible to translate the word 'enabled' in isolation without knowing the gender, number and case of the thing it refers to.)

Explain why text is not to be translated, point to text reuse, or describe the use of conditional text.

Provide a way for authors to assign unique identifiers to localizable elements.

How to do this

Make sure that elements with translatable content can be associated with a unique identifier.

It is strongly recommended that you define such identifiers as attributes of type ID, following the rules described in xml:id Version 1.0[xml:id]. This allows XML applications to take advantage of the built-in processes associated with that datatype, such as validation.

It is also recommended that you name such attributes xml:id to increase interoperability.

Note: Unique identifiers are most useful when their values are globally unique (i.e. unique across any documents) and persistent (i.e. ones which do not change over time).
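In a document instance, this could look like the sketch below. The element names and identifier values are hypothetical.

```xml
<messages>
  <!-- Each translatable unit carries its own stable identifier. -->
  <msg xml:id="msg-file-not-found">
    <text>The file could not be found.</text>
  </msg>
  <msg xml:id="msg-disk-full">
    <text>The disk is full.</text>
  </msg>
</messages>
```

Keeping the same xml:id value for a message across revisions is what allows a tool to recognize it as the same item later on.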

Why do this

In order to most effectively reuse translated text where content is reused (for example across updates) it is necessary to have a unique and persistent identifier associated with the element.

This identifier allows the translation tools to correctly track an item from one version or location to the next. After ensuring that this is the same item, the content can be examined for changes, and if no change has taken place the potential for reuse of the previous translation is very high.

Change analysis of this kind constitutes an extremely powerful productivity tool for translation when compared to the typical source matching techniques (a.k.a. translation memory). These techniques simply look for similar source text in a multilingual database without, most of the time, being able to tell whether the context of its use is the same.

Identifiers can also be helpful to track displayed text back to its underlying source. For example, when reviewing a translated user interface, the identifiers can be used as temporary prefixes to the text so that any correction can be efficiently done on the proper strings.

The capability of specifying terms within the source content is important for terminology management and beneficial to translation and localization quality. For example, term identification facilitates the creation of glossaries and allows the validation of terminology usage in the source and translated documents.

Term identification is also useful for change management and to ensure source language quality.

Terms may require various associated information, such as part of speech, gender, number, term types, definitions, notes on usage, etc. To avoid repeating associated information throughout a document, it should be possible for identified terms to link to externalized data, such as glossary documents and terminology databases.

It is also recommended to define the its:rules element in your schema, for example in a header if there is one. The its:rules element provides access to the its:termRule element which can be used to override terminology-related information globally.
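A global rule for terminology might be sketched as follows. The term element and its ref attribute are assumptions for this example; they stand for whatever term markup your schema defines.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- Every term element is a term; its ref attribute holds the URI of the term information. -->
  <its:termRule selector="//term" term="yes" termInfoRefPointer="@ref"/>
</its:rules>
```

With such a rule, an element like term carrying ref="glossary.xml#appaloosa" would be identified as a term whose associated information lives in an external glossary.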

Avoid document formats that store multiple localized versions of content within the same document.

This best practice refers specifically to situations where copies of the same content are stored in multiple languages in a single document. It is perfectly acceptable to have multilingual text in a document otherwise.

How to do this

For documents that need to go through some localization tasks, always store the localized version of the text in a separate document.

Example 14: Avoiding multilingual documents

This is an example of bad design. It shows a single document that contains multiple translations of the same content:
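A minimal sketch of the kind of structure to avoid is given here, with hypothetical element names and sample strings.

```xml
<!-- Discouraged: every language version of each message lives in the same file. -->
<messages>
  <msg id="greeting">
    <text xml:lang="en">Hello</text>
    <text xml:lang="fr">Bonjour</text>
    <text xml:lang="de">Guten Tag</text>
  </msg>
</messages>
```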

Note: It is admissible to store multilingual copies of content in a single document before the document is sent to localization, or after all localization tasks are done. For example, a final resource file could be constructed by collating the different language entries.

Note: It is admissible to provide the localizer with multilingual documents in XML formats that are specifically designed for localization, and are industry standards, like the XML Localisation Interchange File Format[XLIFF 1.2].

Why do this

There are two main reasons to avoid sending multilingual documents for localization:

It is difficult to manage concurrent translations in all languages. It is very likely that each translation will be done by a different translator, in a different location. To facilitate this, the document will have to be broken down into separate parts and reconstructed later on. This adds processing time, increases cost and provides more opportunities for the introduction of errors.

Depending on the point in the document's lifecycle, such a document may already contain translations, some up-to-date and some outdated (because the source material may have changed). In order to be able to identify what parts need to be localized and what parts should be left alone, the document would then also need to contain custom information about localization state, which may or may not be supported by localization tools.

The name of an element should indicate what its function is, not how its content will be presented, because presentation may vary depending on different factors such as language, script, medium, or accessibility.

Using documents where elements or attributes do not follow a predictable naming pattern may cause problems when using XSLT-driven processes. It may also be an issue for translation tools. This is especially true if not all parts of the document are to be translated, since it would be more difficult to specify rules to distinguish the translatable nodes from the non-translatable ones.

Provide a way for authors to annotate arbitrary content using its:span or equivalent markup.

A span-like element is an element that can be used to mark up arbitrary content and associate it with various properties such as directionality or language information. Examples of such an element include the span element in XHTML, or the phrase element in DocBook.

How to do this

Make sure you define a span-like element in your schema that will allow the authors to associate arbitrary content with language-oriented properties such as directionality, language information, etc.

If your schema does not already provide such an element, you could provide the its:span element.

The definition of the its:span element in the ITS Specification lists a set of ITS attributes that should be allowed on a span-like element.
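The sketch below shows how a span-like element might be used once it is available. The element names para and span belong to a hypothetical host vocabulary that allows the ITS local attributes; the its:span element could play the same role where no such element exists.

```xml
<para xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0" xml:lang="en">
  Press the <span its:translate="no">Ctrl</span> key, then type
  <span xml:lang="fr">Bonjour</span> in the box.
</para>
```

Neither run of text corresponds to an existing structural element, so without a span-like element there would be nowhere to attach the translate and language information.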

Why do this

Some properties of content are applied using attributes. Directionality, terminology, localization notes, translate information, and language identification are examples of such properties. There is a need for a neutral element to delimit the run of text to which such attributes apply, since the appropriate boundaries are sometimes not delimited by other markup that is present, or perhaps those attributes are not permitted on other markup that is present.

Although some XML vocabularies are easy to understand or process, it is often helpful or necessary to provide explicit information about a given vocabulary. If such a vocabulary is to be used in a multilingual context, it is of high importance to provide specific information such as which elements contain translatable content. This is needed because general information on purpose, general structure, and node types is very often not sufficient. In a way, this need for explicit information is related to the general good practice of documenting source code.

In XML it should come naturally to use a well-defined, structured format to capture such information. For information related to internationalization and translation, ITS Rules documents are a good choice for the following reasons:

They are designed to cover many important aspects of internationalization and translation.

They can easily be combined with additional structured information (e.g. related to version control, as shown in the example below).

Example 17: ITS rules embedded in a customized information file

An ITS processor should still be able to process a file as an external ITS rules file if the file contains your own customized information in addition to the ITS rules. The following is an example of that.
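A minimal sketch of such a file might look like this. The myFormatInfo, desc, rulesVersion and lastUpdate elements and their values are hypothetical stand-ins for whatever customized information you maintain; only the its:rules element and its children are ITS markup.

```xml
<myFormatInfo>
  <!-- Custom, non-ITS information (hypothetical element names) -->
  <desc>ITS rules for the myFormat document type</desc>
  <rulesVersion>1.4</rulesVersion>
  <lastUpdate>2007-11-12</lastUpdate>
  <!-- The ITS rules themselves, which an ITS processor can locate and apply -->
  <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
    <its:translateRule selector="//header" translate="no"/>
    <its:termRule selector="//term" term="yes"/>
  </its:rules>
</myFormatInfo>
```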

By default the text directionality in an XML document is assumed to be left-to-right. Use its:dir (or its equivalent in your schema) on the root element of any document where the text runs predominantly from right-to-left, and on elements where the Unicode bidirectional algorithm needs help to achieve proper display of bidirectional text.

Use unique identifiers in the way provided by your schema on each element that constitutes a segmentation boundary. If possible, use globally unique and persistent values as identifier values.

Use inserted text only when the text is self-contained and does not affect its surrounding context. For example, titles and quotations are inserted text that, usually, would not cause problems. Avoid using inserted text that has any dependence on the context where it is inserted.

Use xml:lang (or its equivalent in your schema) on the root element of the document, and on each element where the language of the content changes. The elements without declarations inherit the language information from their parents. Attribute values are deemed to be in the same language as the element where they are declared.
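The language practice can be sketched as follows. The element names and the sample text are hypothetical; the point is the placement of xml:lang on the root and on the one element where the language changes.

```xml
<doc xml:lang="en">
  <title>Proverbs</title>
  <!-- The language changes for the quotation only -->
  <p>The French say <q xml:lang="fr">Petit à petit, l'oiseau fait son nid</q>.</p>
  <!-- No declaration here: img inherits "en" from the root, and its alt value is deemed to be English too -->
  <img src="nest.png" alt="A bird building its nest"/>
</doc>
```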

In this example, the schema for this document type uses a non-standard way to specify language: a code attribute. Authors should use that mechanism, not xml:lang. This is possible because the developer of the stringList document type should provide, along with the schema, an ITS Rules document that declares code to be equivalent to xml:lang when used with the lang element.
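An ITS Rules document of that kind could be as small as the sketch below, which assumes that the language value is carried by a code attribute on lang elements, as described above. The its:langRule element maps the nodes matched by selector to the language value found through langPointer.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- For each lang element, the language is the value of its code attribute -->
  <its:langRule selector="//lang" langPointer="@code"/>
</its:rules>
```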

By default the text directionality in an XML document is assumed to be left-to-right. Use its:dir (or its equivalent in your schema) on the root element of any document where the text runs predominantly from right-to-left, and on elements where the Unicode bidirectional algorithm needs help to achieve proper display of bidirectional text.
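The following fragment is a sketch of the second case, with hypothetical text, body, par and quote elements. The document as a whole runs left-to-right, so no its:dir is needed on the root, but the quoted Hebrew title mixes Hebrew and Latin characters and is marked as a right-to-left run. How the source is displayed here depends on how your editor or browser resolves directionality.

```xml
<text xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0" xml:lang="en">
  <body>
    <!-- its:dir tells the user agent that the whole quote, including "W3C", is a right-to-left run -->
    <par>The title says <quote xml:lang="he" its:dir="rtl">פעילות הבינאום, W3C</quote> in Hebrew.</par>
  </body>
</text>
```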

Without the markup, the Hebrew title will display incorrectly. The text 'W3C' and the comma will be to the right of the quoted Hebrew text, rather than to its left. The markup provides the contextual information that tells the user agent that the comma and 'W3C' text are part of a right-to-left flow of text.

Note: This example shows the directionality of the source text correctly. This is to ensure that you understand the concepts being described. For such display, you need a sophisticated editor that resolves directionality of the source text correctly. Many editors are not yet this sophisticated. See the related discussion about Problems with bidirectional source text in [Bidi in X/HTML].

You also need to use dedicated markup to apply directional information, rather than just applying CSS direction properties to ordinary elements. See CSS vs. markup for bidi support for further information.

Why do this

User agents should use the Unicode Bidirectional (bidi) Algorithm and its knowledge of the directional properties of characters to decide whether a sequence of characters should flow to the left or to the right. The bidi algorithm can also handle simple cases where right-to-left and left-to-right text are mixed. However, situations commonly arise where higher level contextual information is needed to achieve the desired layout of bidirectional text. This contextual information can be provided by markup in XML. Such markup also affects page layout behavior. For example, in a right-to-left context, table columns are ordered right-to-left, list bullets appear to the right of text, the page is right-aligned, and so forth.

There is not necessarily a one-to-one match between a given language and what directionality to use. For example, Azerbaijani can be written using both right-to-left and left-to-right scripts, and the language code az is relevant for either.

The values of inline directional markup are not necessarily aligned with the values of markup about the language. For example, a part of a document might be declared as having right-to-left directionality, but there might be only a general language declaration for a left-to-right script language available, like fr.

Markup used to indicate directionality has values that indicate that the normal directionality should be overridden; it is not possible to indicate that using language related values.

CSS should not be used to define the semantics of elements.
In XML documents, using markup is more appropriate than using Unicode Bidi Embedding Controls.

The ITS default is that element content should be translated and attribute content should not. Developers of your schema should also have documented any schema-specific defaults for your document type where these differ from the ITS default.

How to do this

Use its:translate (or its equivalent in your schema) on each element for which the translatability property is different from the defaults set for your schema.

In the following document, although the content of the par elements should normally be translated, in this instance the last par should remain in English. Using its:translate the author can indicate that the last par should not be translated.

Note that the author does not need to specify that the head element should not be translated, because this is a setting defined for all documents of type myDoc by the ITS Rules document provided by the developer of the myDoc schema (see just below).

<myDoc xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
<head>
<lastRev>2007-10-23 041254Z</lastRev>
<docID>1A454AE4-7EB8-4ed2-A58E-1EC7F75BB0D5</docID>
</head>
<par>To apply these terms to you library, attach the following notice.
It is safest to attach it to the start of each source file to most
effectively convey the exclusion of warranty; and each file should
have at least the "copyright" line and a pointer to where the full
notice is found.</par>
<par>The notice should read (preferably in English):</par>
<par its:translate="no">This library is free software; you can
redistribute it and/or modify it under the terms of the GNU Lesser
General Public License as published by the Free Software Foundation;
either version 2.1 of the License, or (at your option) any later
version. This software is distributed as open source under LGPL.</par>
</myDoc>

First translateRule: The head element and its children should not be translated.

Second translateRule: The alt attribute of any img element should be translated.
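An ITS Rules document matching the two descriptions above might look like the following sketch. The selectors are assumptions derived from those descriptions, not a copy of the rules shipped with the myDoc schema.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- First rule: head is not translated; its children inherit this -->
  <its:translateRule selector="//head" translate="no"/>
  <!-- Second rule: attributes are not translated by default, so alt has to be selected explicitly -->
  <its:translateRule selector="//img/@alt" translate="yes"/>
</its:rules>
```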


To override translate information for attributes, you have to use an its:translateRule element in your document.

Example 22: Overriding default translation rules for attributes

This document is of the same type as the one in Example 21 and uses the same ITS rules, therefore the alt attribute should normally be translated. Because in this specific document the images refer to a user interface that will not be translated (whereas the document will be), the author needs to override the rule that all alt attributes should be translated. This is done at the top of the document, using an its:translateRule.

The its:translateRule element says that the alt text of images referring to UI buttons in the document should be left untranslated.
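A sketch of such an override is shown below. It assumes that images of UI buttons can be recognized by a type="ui" attribute; that attribute, the file names and the sample text are hypothetical. Because rules embedded in the document are the only way to address attributes, the exception is expressed as a rule rather than as a local its:translate attribute.

```xml
<myDoc xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <head>
    <its:rules version="1.0">
      <!-- Exception for this document: alt text of UI button images stays untranslated -->
      <its:translateRule selector="//img[@type='ui']/@alt" translate="no"/>
    </its:rules>
  </head>
  <par>Click <img src="btn-start.png" type="ui" alt="Start"/> to begin the installation.</par>
  <par><img src="overview.png" alt="Overview of the installation steps"/></par>
</myDoc>
```

The alt text of the second image is not matched by the document-level rule, so the schema-level rule still makes it translatable.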

Note: Authors should NOT use its:translate to tag single words or terms that (they think) are likely to remain the same as the source language when translated into a given target language (e.g. loan-words). This type of decision is normally made during translation.

This is an example of bad design. In this document its:translate is used to mark up a proper name and two loan words in an attempt to indicate that they should not be translated. You should not do this.

<book xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
<body>
<p>Everything started when <span its:translate="no">Zebulon</span>
discovered that he had a <span its:translate="no">doppelgänger</span>
who was a serious baseball <span its:translate="no">aficionado</span>.</p>
</body>
</book>

It may, however, be useful to mark up loan-words or any special words in this example as terms, as described in Best Practice 1: Identifying terms.

Why do this

Although the set of ITS rules provided with the schema should specify any exceptions to the default ITS translation rules for a given schema (see Best Practice 1: Indicating which elements and attributes should be translated), there are cases where these general rules need to be overridden for specific elements, in specific documents. It is up to the author of the content to indicate these cases using markup.

Avoid using CDATA sections for content that will be translated.

CDATA sections are often used to place programming code or other special vocabularies in XML with minimal effort. There are often better ways of including such content.

How to do this

Do not put content that will be translated into CDATA sections.

Example 24: Avoiding the use of CDATA sections

This is an example of bad design. In this document, part of the content is in a CDATA section. It is no longer possible to mark up that content for language changes, terms, text direction, translate information, or any of the other things that may be needed to facilitate localization.

<myData>
<item course="12" page="2">
<title>Accessing the R&amp;D facilities</title>
<body><![CDATA[The R&D facilities are located in the South wing
of Building 12-W, in the East quarter of the section Q.
IMPORTANT ==> These facilities are accessible only to personal with
Class Omega-45Q1 clearance.]]></body>
</item>
</myData>

Instead, use normal XML for your content. This allows you to tag the content as needed. For instance, here the author has added some terminology markup.

<myData xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
<item course="12" page="2">
<title>Accessing the R&amp;D facilities</title>
<body>The R&amp;D facilities are located in the South wing
of Building 12-W, in the East quarter of the section Q.
IMPORTANT ==&gt; These facilities are accessible only to personal with
<span its:term="yes">Class Omega-45-Q1</span> clearance.</body>
</item>
</myData>

Instead, you could use XInclude to store the example code in a separate file and include it at processing time. Note that you have to use parse="text" to treat the included file as plain text rather than markup.
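As a sketch, assuming a hypothetical codeSample element and a file named sample-code.txt stored next to the document, the inclusion could be written like this:

```xml
<doc xmlns:xi="http://www.w3.org/2001/XInclude">
  <p>The function is defined as follows:</p>
  <!-- parse="text" inserts the file as character data, so "<" and "&" in the code need no escaping in the file -->
  <codeSample><xi:include href="sample-code.txt" parse="text"/></codeSample>
</doc>
```

The translatable paragraph stays in normal XML, while the code lives outside the document and never needs a CDATA section.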

Note: Using CDATA does not affect whether white-space is preserved by XML processors. To preserve white-space use the xml:space attribute with the value preserve.

Why do this

The use of CDATA sections prevents the insertion of markup for internationalization or localization purposes. For example, tags to denote change of directionality, or language, or to add localization notes, cannot be used with the content inside CDATA sections.

Numeric character references and entity references are not supported in CDATA sections either. This could lead to a loss of data if the document is converted from one encoding to another, or when translating.

Mixing content in CDATA sections and content not in CDATA sections in the same document causes more work when doing some tasks with non-XML-aware tools. For example, when searching for the text "R&D" the user has to search both for R&D (for the CDATA sections) and R&amp;D (for the normal content).

In this document two ITS local attributes are used to annotate an XSLT template:

its:locNoteRef is used to point to an explanation of the acronym RFID.

its:locNote is used to indicate what kind of value the element <xsl:value-of select="PNum"/> corresponds to.

Note: When working with XSLT, you need to decide whether the ITS markup should be in the output or not, and may have to use different markup accordingly. In this example, the ITS attributes do not appear in the output.

There are many reasons to provide information to localizers. You may want to:

Expand on the meaning or contextual usage of a particular element, such as what a variable refers to or how a string will be used in the user interface.

Clarify ambiguity and show relationships between items sufficiently to allow correct translation. For example, in many languages it is impossible to translate the word "enabled" in isolation without knowing the gender, number and case of the thing it refers to.

Explain why text is not translated, point to text reuse, or describe the use of conditional text.

In this example, in the first message, the element var is used to insert the name of a printer. In the second message, it is used to insert a filename. The its:locNote attribute is used to provide a description of what the variables represent. This may help in deciding how to translate each message.
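A document of this kind could look like the sketch below. The messages and msg elements, the identifiers, the placeholder syntax and the message texts are hypothetical; only var and its:locNote come from the description above.

```xml
<messages xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <!-- Same element, different meaning: the note tells the translator what {0} stands for -->
  <msg id="PrinterOffline">The device <var its:locNote="Name of a printer">{0}</var> is not responding.</msg>
  <msg id="FileLocked">Cannot open <var its:locNote="Name of a file">{0}</var> because it is in use.</msg>
</messages>
```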

Examples of inserted text include:

Boilerplate text reused in different contexts.

Various parts of a compound sentence.

Variables replaced by their values during document processing.

Inserted text can be implemented in different ways in XML. Some of them are:

Using entity references.

Using XSLT processing.

Using XInclude mechanisms.

Using XLink mechanisms.

Using a custom mechanism specific to a given format (e.g. the conref attribute in DITA).

If not used properly, inserted text can cause important (and sometimes unresolvable) problems during localization. Consider the following:

Example 29: Using conref in DITA

This is an example of bad design. In this example, the author, working with the DITA format [DITA 1.0], decided to reference a term in a termbase by using the conref mechanism. In this case, the term t123 in termbase.xml has the value "hydraulic lift".

<p>Using a <term conref="termbase.xml#t123"/>, raise the vehicle from the ground.</p>

At first glance the example above seems to work fine in English. However, such a construction has several problems:

You should not separate the article from the noun. If "hydraulic lift" is replaced in the future by some other term, you may need to change the article to "an" or remove it.

The article/noun separation also causes trouble for the translators. Without any easy way to see the actual term when translating the paragraph, they may not be able to decide the gender or number of the article.

If it is used at the beginning of a different sentence, the term would need to be capitalized.

The term is singular in the termbase, but it may need to be plural somewhere else in the document.

In inflected languages the form required in the text may be different from the form stored in the termbase. For example, in Polish the term would be stored in its nominative form ("dźwignia hydrauliczna"), while it should be in its instrumental form once inserted in this context: "Używając dźwignię hydrauliczną podnieś pojazd z ziemi."

What constitutes a term depends on many factors specific to each organization and project. Terms may include for example names of features, programs, services, and so forth. They also may include words or expressions that are specific to the domain to which the content pertains, such as technical terms, or legal terms, and they may include terms that simply occur often and should be translated consistently.

How to do this

Use its:term and its:termInfoRef (or their equivalent in your schema) to mark terms and supply term-related information.

However, in this specific document, the author wants to indicate the following:

The content of any ui element should be seen as a term.

The text Vector Files in the title is a term.

In the first case, the author uses an its:termRule element in the header of the document to indicate that any ui element in this document is a term. This is more efficient than adding an attribute for each instance of ui in the body of the document.

In the second case, because the schema does not allow the element term to be used in title (an oversight of the developer), the author uses a simple span element with its:term and its:termInfoRef to associate Vector Files with its corresponding term information.
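The two cases can be sketched as follows. The myManual root name is taken from the document type mentioned in this section; the head, body, p and def elements, the identifier and the sample text are hypothetical.

```xml
<myManual xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <head>
    <its:rules version="1.0">
      <!-- First case: one rule marks every ui element in this document as a term -->
      <its:termRule selector="//ui" term="yes"/>
    </its:rules>
  </head>
  <body>
    <!-- Second case: term is not allowed in title, so a span carries the local ITS attributes -->
    <title>Importing <span its:term="yes" its:termInfoRef="#vectorFiles">Vector Files</span></title>
    <p>Choose <ui>Import</ui> from the <ui>File</ui> menu.</p>
    <def xml:id="vectorFiles">Graphics stored as geometric descriptions rather than as pixels.</def>
  </body>
</myManual>
```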

This ITS Rules document is the one created by the developer of the myManual document type (in implementing Best Practice 1: Identifying terminology-related elements). It provides one termRule element:
Rule indicating that any term element is a term and its associated information is located in the element that is identified with the value stored in the ref attribute of term.
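Such a rule could be written as in the sketch below. It assumes that the ref attribute holds the ID of the element carrying the term information, so the XPath id() function can be used in termInfoPointer to reach that element.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- Every term element is a term; its information is in the element whose ID equals the ref value -->
  <its:termRule selector="//term" term="yes" termInfoPointer="id(@ref)"/>
</its:rules>
```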

If you do not indicate what words are terms of interest in the content, the translators will not know that these terms need to be translated consistently. Often, multiple translators are working on different files in a given project, and the way they choose to translate specific words can be inconsistent with the way that other translators have translated them. If important terms are marked in the content, localizers can extract these terms before the content is translated, and pre-translate them in the form of a shared electronic dictionary. This ensures consistency of translation of important terms.

While markup denoting terms for a given schema should be specified in a set of ITS rules provided with the schema (see Best Practice 1: Identifying terminology-related elements), there are cases where these general rules need to be overridden or complemented for specific elements, in specific documents. It is up to the author of the content to provide such overriding markup.

In this document, the elements top and body both contain HTML markup coded as text. There is no easy way to make the distinction between the HTML markup and the HTML text content.

<pages>
<row>
<key>ENConvClasses</key>
<top>&lt;span class="h1"&gt;Elibur Library&lt;/span&gt; - Conversation Groups</top>
<body><![CDATA[<p>These small discussion groups meet <b>weekly</b> and are for
people learning English. Each group is led by a volunteer who is a native speaker
of American English. Groups converse about books, articles, and other materials.</p>
<p>Space is limited. Ask for availability to <a href="mailto:enconv@elibur-lib.com">
enconv@elibur-lib.com</a>.</p>]]></body>
</row>
</pages>

Instead, use the XML namespace mechanism. Here the content of top and body is now a mix of text and XHTML elements. This avoids any confusion between text and HTML tags.

<pages xmlns:h="http://www.w3.org/1999/xhtml">
<row>
<key>ENConvClasses</key>
<top><h:span class="h1">Elibur Library</h:span> - Conversation Groups</top>
<body><h:p>These small discussion groups meet <h:b>weekly</h:b> and are for
people learning English. Each group is led by a volunteer who is a native
speaker of American English. Groups converse about books, articles, and
other materials.</h:p>
<h:p>Space is limited. Ask for availability to <h:a
href="mailto:enconv@elibur-lib.com">enconv@elibur-lib.com</h:a>.</h:p></body>
</row>
</pages>

Another alternative to using markup as text is to store it externally and include it into the document using a mechanism such as XInclude or XLink.

If you must include markup as text content:

Make sure to document the type of content, for example with an attribute set to the appropriate MIME-type. This may help tools to use a more appropriate parser to process the given content.

Aim at having the content well-formed. This will allow parsers to process it more easily.

Why do this

Some XML documents are used to store different types of data for purposes such as exchange or export. In some cases such data is itself XML data. For example, some XHTML content stored in a database can be exported to an XML container file for localization and re-imported back into the database.

Note: The use of escaping for literal examples of markup is not a problem. The issue arises only for large volumes of XML/HTML data contained in another XML document.

Storing such XML data inside XML elements as text content (i.e. with its markup tags escaped) has several drawbacks:

Any handling of such content is made difficult by the impossibility to separate text from markup without extra processing.

If there is a process turning markup into escaping, there is the danger of double escaping.

4 Generic Techniques

This section provides a set of generic techniques that are applicable to various guidelines; for example, how to add ITS attributes to different types of schemas, or how to optimize XPath expressions for the ITS selector attribute.

4.1 Writing ITS Rules

Whether they are external or embedded, there are a few things you should take into consideration when writing ITS rules.

Try to keep the number of nodes to be overridden to a minimum. This improves performance. For example, if most of a document should not be translated, it is better to set the root element to be non-translatable than to set all elements. The inheritance mechanism will have the same effect for a much lower computing cost.

Because a rule has precedence over the ones before, you should start with the most general rules and progressively override them as needed. Some rules may be more complex if they need to take into account all the aspects of inheritance.
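The two recommendations above can be sketched with a pair of rules for a hypothetical document type whose root element is doc. The general rule comes first and touches a single node; the exception comes after it and therefore overrides it for the nodes it selects.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <!-- General rule: one node is selected; all descendant elements inherit "no" -->
  <its:translateRule selector="/doc" translate="no"/>
  <!-- Exception, placed later so that it wins: p elements (and their children, by inheritance) are translated -->
  <its:translateRule selector="//p" translate="yes"/>
</its:rules>
```

Selecting every element explicitly, for instance with selector="//*", would give the same result for the first rule but forces the processor to handle many more nodes.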

ITS information for a node can come from several sources, which are applied in the following order of precedence.

Local ITS markup on an element, such as an its:translate attribute, has the highest precedence.

Next are global rules, for example a set of its:translateRule elements for the ITS Translate data category. Individual rules in an its:rules element have an inherent precedence which depends on their position in the its:rules element: the rules at the bottom have a higher precedence than the rules at the top. In addition, the rules inside a given its:rules element have a higher precedence than the rules linked via an xlink:href attribute in that same its:rules element.

Inherited ITS information constitutes the third level of precedence. The kind of inheritance is data category specific. For example, if an element has been labelled as "do not translate" using local markup or a global rule, this information is inherited by its child elements, but not by attributes.

ITS information which originates in data category specific defaults has the lowest precedence. For example, the default for the ITS Translate data category is that element content is to be translated and attribute values are not to be translated.

The following example shows the usage of local and global ITS markup and how the precedence described above comes into play.

Example 32: Precedence and inheritance in ITS

In this document, all child elements within the <text> element are set to "do not translate" by the first its:translateRule element. However, the second and last its:translateRule element has higher precedence than the one before, so it can describe an exception: all <p> elements are still to be translated. This shows the interplay between different rules and demonstrates that the last one always "wins".

Another exception to the first its:translateRule element is expressed with the local its:translate attribute on the <notes> element. It specifies that the content of this element should be translated. Without the its:translate attribute, the information from the first its:translateRule element would be inherited, and this <notes> element would not be translatable.

Finally, the content of the <documentation> element within the <head> element is also translatable, but not the content of any attributes in the document. This demonstrates the role of defaults for the ITS Translate data category.
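A document with the behavior described for Example 32 might look like the sketch below. The root element name myDoc, the section and title elements, the version attribute on section and all sample text are hypothetical; the text, p, notes, head and documentation elements follow the description.

```xml
<myDoc xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <head>
    <!-- Translatable by default: no rule or local markup applies here -->
    <documentation>Rules used to prepare this document for translation.</documentation>
    <its:rules version="1.0">
      <!-- First rule: every child element of text is set to "do not translate" -->
      <its:translateRule selector="/myDoc/text/*" translate="no"/>
      <!-- Second and last rule: it has higher precedence, so p elements are translated after all -->
      <its:translateRule selector="//p" translate="yes"/>
    </its:rules>
  </head>
  <text>
    <!-- The version attribute is not translated: attribute values default to "no" -->
    <section version="3">
      <title>Internal reference</title>
      <p>This paragraph is translated.</p>
    </section>
    <!-- Local markup has the highest precedence and overrides the first rule -->
    <notes its:translate="yes">This note is translated as well.</notes>
  </text>
</myDoc>
```

In this sketch the title element is the only text content that stays untranslated: it inherits "no" from section, which the first rule selected.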

4.1.2 Dealing with namespaces

When writing rules for documents that use XML namespaces, you must declare the namespaces and use the relevant prefixes in the different XPath expressions.

Example 33: Applying ITS rules on a document containing namespaces

The first document uses several different XML vocabularies:

The host format is not associated with any namespace. Its elements have no prefix.

The "inventory-book" vocabulary is associated with the namespace http://www.example.com/inventory-book. The elements belonging to that namespace have a bk prefix.

The XHTML vocabulary is associated with the namespace http://www.w3.org/1999/xhtml. The elements belonging to that namespace have an h prefix.

The XLink vocabulary is associated with the namespace http://www.w3.org/1999/xlink. There is one attribute belonging to that namespace and it has an xlink prefix.

The ITS vocabulary is associated with the namespace http://www.w3.org/2005/11/its. There is one element belonging to that namespace and it has an its prefix.

<inventory xmlns:bk="http://www.example.com/inventory-book"
xmlns:h="http://www.w3.org/1999/xhtml"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:its="http://www.w3.org/2005/11/its">
<header>
<identity>3E039D7D-B416-47e8-83B3-3F4DF9EDDB87</identity>
<lastUpdate>2007-11-12</lastUpdate>
<desc>Inventory made by Joan, for shelves H to K only.</desc>
<its:rules version="1.0" xlink:href="EX-namespaces-2.xml" xlink:type="simple"/>
</header>
<list>
<bk:book xml:id="item00A83">
<bk:isbn>0312875819</bk:isbn>
<bk:quantity>2</bk:quantity>
<bk:type>HIST</bk:type>
<bk:author>Bradshaw, Gillian</bk:author>
<bk:pub>Forge Books; New Ed edition (June 2, 2001)</bk:pub>
<bk:title>The Sand-Reckoner</bk:title>
<bk:desc>
<h:p>Building on a few antique facts, Bradshaw ably recreates the extraordinary
life of Archimedes, the great mathematician and engineer who lived in Syracuse from
287 to 212 B.C. After a few years studying in Alexandria, Archimedes returns home
where his father is dying and his city at war with the Romans.
<h:img src="0312875819large.png" alt="The Sand-Reckoner (by Gillian Bradshaw)"/>
</h:p>
</bk:desc>
</bk:book>
</list>
</inventory>

The XLink and ITS namespaces are just used for associating this document with the external ITS rules file shown below.

The ITS Rules document contains several rules that determine what parts of the inventory document should be translated. The rules use XPath expressions where the elements are prefixed. These prefixes are associated with the namespaces used in the inventory. Here is a description of each its:translateRule, from top to bottom:

The first indicates that the inventory element should not be translated. This is inherited by all the children of inventory. Most of the content of the inventory is not to be translated, so the easiest way to define the proper rules for this type of document is to say that the root element should not be translated, and then list all the exceptions.

The second indicates that the desc element of the host format should be translated.

The third indicates that the title element of the http://www.example.com/inventory-book namespace should be translated.

The fourth indicates that the desc element of the http://www.example.com/inventory-book namespace should be translated.

The last indicates that the alt attribute in the HTML img element should be translated.
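A rules file consistent with the five descriptions above could look like this sketch. The selectors are reconstructed from the descriptions and are assumptions. The prefixes only need to be bound to the same namespace URIs as in the inventory document; they do not have to be the same prefixes.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"
           xmlns:bk="http://www.example.com/inventory-book"
           xmlns:h="http://www.w3.org/1999/xhtml">
  <!-- Root of the host format (no namespace): nothing is translated unless listed below -->
  <its:translateRule selector="/inventory" translate="no"/>
  <!-- desc of the host format: unprefixed, so it does not match bk:desc -->
  <its:translateRule selector="//desc" translate="yes"/>
  <its:translateRule selector="//bk:title" translate="yes"/>
  <its:translateRule selector="//bk:desc" translate="yes"/>
  <!-- Attributes never inherit, so the XHTML alt attribute is selected explicitly -->
  <its:translateRule selector="//h:img/@alt" translate="yes"/>
</its:rules>
```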

In environments where XSLT is used to process ITS-related XPath expressions, it is important to know about the subset of XPath which is termed "XSLT patterns" (see the note in the section Global Approach of the ITS Specification). Using only XSLT patterns in ITS selector attributes helps to avoid issues which may arise with respect to the match attribute in XSLT template elements.

Declare xml:lang directly in your schema.
There is no existing declaration of, or standardized
schema fragment defining, the xml:lang attribute in RELAX NG. You have to declare
xml:lang directly in your schema and specify the choice of values to be
either the XML Schema language datatype or an empty value.
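In the RELAX NG XML syntax, such a declaration can be sketched as follows. The doc element and the att.xml.lang pattern name are hypothetical; the attribute accepts either a value of the XML Schema language datatype or the empty string.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="doc">
      <optional>
        <ref name="att.xml.lang"/>
      </optional>
      <text/>
    </element>
  </start>
  <!-- Reusable declaration of xml:lang; the xml prefix is predefined and needs no declaration -->
  <define name="att.xml.lang">
    <attribute name="xml:lang">
      <choice>
        <data type="language"/>
        <!-- An empty value element matches the empty string -->
        <value/>
      </choice>
    </attribute>
  </define>
</grammar>
```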

5 ITS Applied to Existing Formats

This section presents several examples of how ITS can be used to enhance the internationalization readiness of some well-known XML document types. These examples are only illustrative and may have to be adapted to fit the needs of each specific user.

Two topics are covered for each format:

How should ITS be integrated in specific markup schemas? For example, in the case of XHTML it promotes the interoperability of ITS implementations if you specify that the ITS rules element will always be part of the content model of the head element.

How should ITS data categories be associated with existing markup declarations in a schema that have identical or overlapping purposes? For example, DITA [DITA 1.0] already has an attribute to indicate translatability of text, but doesn't have a global mechanism for indicating what parts of an XML document the ITS translate data category and its values should be applied to.

5.1 ITS and XHTML 1.0

XHTML [XHTML 1.0] is a reformulation of the three HTML 4 document types as applications of XML 1.0. HTML is an SGML (Standard Generalized Markup Language) application, widely regarded as the standard publishing language of the World Wide Web.

Use external ITS global rules, as shown in the following example. Even local information within the document that would be handled by ITS attributes can be set indirectly.

Use NVDL. See Using NVDL to integrate ITS into XHTML for details.

Example 39: ITS external rules for XHTML

These rules illustrate some of the ITS data categories you can associate with specific XHTML markup. The first its:translateRule indicates that the attribute content of the meta element should be translated if the attribute name is set to "keywords". The second its:translateRule indicates that no p with a class="notrans" should be translated. And the its:termRule indicates that any span element with class="term" is a term.
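The three rules described above can be sketched as follows. The selectors are assumptions derived from the description, and the h prefix is simply the prefix chosen here for the XHTML namespace.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"
           xmlns:h="http://www.w3.org/1999/xhtml">
  <!-- The content attribute of meta is translated when name is "keywords" -->
  <its:translateRule selector="//h:meta[@name='keywords']/@content" translate="yes"/>
  <!-- Paragraphs flagged with class="notrans" are not translated -->
  <its:translateRule selector="//h:p[@class='notrans']" translate="no"/>
  <!-- Any span with class="term" is a term -->
  <its:termRule selector="//h:span[@class='term']" term="yes"/>
</its:rules>
```

Because the information is derived from existing class and name values, the XHTML documents themselves need no ITS attributes.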

5.1.2 Using XHTML Modularization 1.1 for the Definition of ITS

This section describes how to use XHTML Modularization 1.1 [XHTMLMod1.1] for the definition of ITS. It first defines an ITS abstract module which is then implemented in the XML Schema format. The module is meant to be integrated in existing or new schemas which rely on XHTML Modularization 1.1.

5.1.2.1 Abstract Definition of ITS Markup

The following is the abstract definition of the elements for global ITS markup, which is consistent with the XHTML Modularization framework [XHTMLMod1.1]. Further definitions of XHTML abstract modules can be found in the XHTML Modularization specification [XHTMLMod1.1].

5.1.3 Using NVDL to integrate ITS into XHTML

As you have seen in the previous section it might sometimes be quite
laborious to integrate ITS into an existing vocabulary using only
modularization and the customization features of a particular schema
language. In such situations you can use the NVDL schema language
instead.

In NVDL you can create a sort of "meta-schema" which defines how to
combine, and provide additional rules for, already existing schemas.
An NVDL schema can be used in the same way as schemas written in other
languages, such as DTD, RELAX NG or XML Schema: you can use it to
validate your document instances, or an XML editor can use it to guide
you while you are editing documents. The NVDL.org site provides
additional information about the language. You can also find there a list
of applications which support the NVDL language.

Adding ITS to XHTML involves allowing the its:rules element
inside the head element and allowing the ITS local
attributes to appear on every existing XHTML element.

5.1.4 Associating existing XHTML markup with ITS

A number of XHTML constructs implement the same semantics as some of the ITS data categories. In addition, some of the attributes in XHTML need to be translated, which is not the default for XML documents according to the default translate settings in ITS. These attributes need to be identified as needing translation.

An external ITS rules element can summarize these relationships. Because XHTML use is widespread and covers a large amount of legacy material, the rules defined here may not be optimal for everyone.

Note 1: The script and style elements may contain text that needs translation, but their content needs to be parsed with, respectively, a script filter and a CSS filter. Depending on the capabilities of your translation tools you may want to leave these elements as needing translation.

Note 2: The value attribute of the input element may or may not need translation, depending on the way the element is used. The decision as to whether the value of this attribute needs translation will depend on its use in a given instance. Note, however, that it can often be undesirable to translate these values, since they are commonly used by scripts as identifiers: change the value of the attribute and the script will often fail. The values of the value attribute are not usually seen by a user of a web page.

Note 3: The del element indicates removed text and therefore, most often, its content would not be translated. Because ITS rules for elements are not inherited by attributes (the scope of translatability does not include attributes), and because this element may contain elements with attributes that need translation, such as img with an alt attribute, you need to: a) define this rule after defining how translation applies to attribute values, and b) use a rule with selector="//h:del/descendant-or-self::*/@*" to override any possibility of translation being applied to an attribute within a del element or any of its descendants.

Note 4: The dt element is defined by HTML as a "definition term" and can therefore be seen as a candidate to be associated with the ITS Terminology data category. However, for historical reasons, this element has been used for many other purposes. Whether or not dt is associated with the ITS Terminology data category will depend on its use in a given instance.
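The ordering requirement in Note 3 can be illustrated with a small hand-written emulation. The sketch below does not evaluate ITS selectors (Python's standard ElementTree has no attribute-axis XPath); it simply mimics two rules applied in order, where the later rule overrides the earlier one. The first rule (alt attributes of img are translatable) and the sample markup are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Invented sample: one image in live text, one inside deleted text.
SAMPLE = """\
<body xmlns="http://www.w3.org/1999/xhtml">
  <p><img src="a.png" alt="A lighthouse"/></p>
  <del><p>Old text <img src="b.png" alt="A windmill"/></p></del>
</body>
"""


def translatable_alt_texts(xhtml):
    """Emulate two ordered rules by hand; the later rule overrides the earlier.

    Rule 1 (assumed):     selector="//h:img/@alt"                         translate="yes"
    Rule 2 (from Note 3): selector="//h:del/descendant-or-self::*/@*"     translate="no"
    """
    root = ET.fromstring(xhtml)
    decision = {}  # img element -> is its alt attribute translatable?
    for img in root.iter(XHTML + "img"):        # rule 1
        if img.get("alt") is not None:
            decision[img] = True
    for deleted in root.iter(XHTML + "del"):    # rule 2, applied last so it wins
        for el in deleted.iter():               # the del element and its descendants
            if el in decision:
                decision[el] = False
    return [img.get("alt") for img, yes in decision.items() if yes]


print(translatable_alt_texts(SAMPLE))
```

If the two loops were swapped, the alt text inside del would be flagged as translatable again, which is why the overriding rule has to come last.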

5.2 ITS and TEI

The Text Encoding Initiative[TEI] is intended for literary and linguistic material, and is most often used for digital editions of existing printed material. It is also suitable, however, for general purpose writing. The P5 release of the TEI consists of 23 modules which can be combined together as needed.

5.2.1 Integration of ITS into TEI

The TEI is maintained as a single ODD document, and customizations of it are also written as ODD documents. ODD (One Document Does it all) is a literate programming language of the Text Encoding Initiative for writing XML schemas. These documents are processed using XSLT style sheets to make a tailored user-level schema in XML DTD, XML Schema or RELAX NG.

In addition, we load the ITS schema (in its RELAX NG XML format, the language used by the TEI for expressing content models), and overload the definition of the TEI content class model.headerPart to include the ITS rules:

The content class determines which elements are allowed as children of teiHeader. Lastly, we change the definition of the global attribute class att.global to reference the ITS local attributes (available from the ITS schema we loaded earlier):

Example 49: Addition of the ITS local attributes to the global attributes

5.3 ITS and XML Spec

The XML Spec format [XML Spec] is intended for W3C Working Drafts, Notes, Recommendations, and all other document types that fall under the category of technical reports. XML Spec is available in the following formats: XML DTD, XML Schema and RELAX NG.

5.3.1 Integration of ITS into XML Spec

The text below takes version 2.10 of XML Spec as an example and shows how you would integrate ITS into it. This version is available in DTD, XML Schema and RELAX NG formats.

The integration of ITS into the XML Spec DTD uses the files xmlspec-its.dtd (the XML Spec schema) and its.dtd (the ITS schema). To achieve the integration, the following modifications to the XML Spec DTD have been made:

External ITS definitions are integrated via the entity <!ENTITY % its SYSTEM "its.dtd"> and the entity call %its;.

The existing XML Spec entity %local.common.att; has been modified. It now includes the declarations %its.att.local.with-ns.attributes; and xmlns:its CDATA "http://www.w3.org/2005/11/its". The former allows the use of ITS local attributes; the latter is necessary to permit the use of the ITS namespace in the DTD.

The XML Spec entity %header.mdl; contains the content model of the header element. The ITS element its:rules has been added as the last element to this content model. In this way, its:rules can be used inside an XML Spec document.

The ITS elements its:ruby and its:span have been added to the XML Spec entity %p.pcd.mix;. In this way it is possible to use them as inline elements.

The integration of ITS into the XML Spec RELAX NG schema uses the files xmlspec-its.rnc (the XML Spec schema) and its.rnc (the ITS schema). The modifications to the RELAX NG schema have the same motivations as those for the DTD described above. The modifications are:

External ITS definitions are integrated via the statement include "its.rnc".

The pattern its-att.local.with-ns.attributes is referenced from the pattern local.common.att.

The pattern its-rules is referenced from the pattern header.mdl.

The patterns its-ruby and its-span are referenced from the pattern p.pcd.mix.

The integration of ITS into the XML Spec XML Schema schema uses the files xmlspec-its.xsd (the XML Spec schema), its.xsd (the ITS schema), xml.xsd (for declarations from the XML namespace) and xlink.xsd (for declarations from the XLink namespace). The modifications to the XML Spec XML Schema schema have the same motivations as those for the DTD described above. The modifications are:

External ITS definitions are integrated via an <xs:import> statement.

The attribute group its-att.local.with-ns.attributes is added to the attribute group common.att.

The element declaration its:rules is added to the element group header.mdl.

The element declarations its:ruby and its:span are added to the element group p.pcd.mix.

The following example shows an XML Spec document conforming to the XML Spec+ITS schemas. The its:translateRule element is used to indicate that elements for code, keywords and examples should not be translated. The w3c-doctype element is also marked as non-translatable using local ITS markup.

5.3.2 Associating existing XML Spec markup with ITS

A number of XML Spec constructs implement the same semantics as some
of the ITS data categories. In addition, some of the XML Spec
attribute values need to be translated, which is not the default for XML
documents according to the ITS default
settings for translatability. These attributes need to be
identified as needing translation, and some elements need to be identified as not needing translation.

Note: When you have the choice of using an XML Spec construct or an ITS
construct to express the same semantics, make sure you use the XML Spec
construct to ensure that XML Spec processing tools work properly. Use ITS local
markup only if XML Spec does not provide an equivalent.

An external ITS its:rules element can
summarize these relationships. The rules defined here are just examples and may need further
tailoring for specific use.

5.4 ITS and DITA

5.4.1 Integration of ITS into DITA

DITA offers some of the ITS features by default (see Section 5.4.2: Associating existing DITA markup with ITS for more information). In some cases, however, you may still want to allow the use of ITS markup directly in your DITA documents, for example the its:locNote attribute or the its:rules element. DITA provides a way to create a domain specialization based on the foreign element and attribute extension points.

For example, the DITA Concept DTD can be extended as follows:

First, create two files for the ITS domain specialization. The first one, itsDomain.ent, contains the entity definitions that will be used in the extended DTD.

All these changes allow you to include a new its element in different parts of the DITA document and use ITS-defined constructs where DITA may be missing support, such as for ruby text. This also allows you to use a selection of ITS-defined attributes to complement what DITA already provides.

Example 56: DITA document with ITS

This DITA document includes the following ITS constructs:

An its:rules element is added to the prolog element to specify that, in the scope of this document, the content of uicontrol elements is not to be translated.

The second p element includes an its:locNote attribute that applies to its content.

5.4.2 Associating existing DITA markup with ITS

There are several ITS data categories that are already implemented in DITA. For example, DITA offers a translate attribute that provides the same functionality as its:translate.

In the same way as for other formats, these existing features can be associated with ITS data categories, so ITS-enabled tools can seamlessly process DITA source documents.

Note: When you have the choice of using a DITA construct or an ITS construct to express the same thing, make sure to use the DITA construct to ensure that DITA processors work properly. Use ITS local markup only if DITA does not provide an equivalent.
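One way to picture this association is a pair of global rules that map DITA's own translate attribute onto the ITS Translate data category. The following sketch is illustrative only: the selectors and the sample topic are assumptions, and the tiny evaluator handles just these simple "//..." selectors using Python's standard ElementTree. It ignores inheritance and never matches the root element.

```python
import xml.etree.ElementTree as ET

# Assumed global rules associating DITA's translate attribute with ITS.
DITA_RULES = """\
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:translateRule selector="//*[@translate='no']" translate="no"/>
  <its:translateRule selector="//*[@translate='yes']" translate="yes"/>
</its:rules>
"""

# Invented sample topic using only the native DITA attribute.
DITA_TOPIC = """\
<concept id="c1">
  <title>Saving a file</title>
  <conbody>
    <p>Click <uicontrol translate="no">Save</uicontrol> to store your work.</p>
  </conbody>
</concept>
"""


def apply_translate_rules(rules_xml, doc_xml):
    """Return (tag, translatable) for each element explicitly selected by a rule."""
    rules = ET.fromstring(rules_xml)
    doc = ET.fromstring(doc_xml)
    flags = {}
    for rule in rules:
        # ElementTree needs a relative path, so "//x" is rewritten as ".//x".
        for el in doc.findall("." + rule.get("selector")):
            flags[el] = rule.get("translate") == "yes"
    return [(el.tag, flag) for el, flag in flags.items()]


print(apply_translate_rules(DITA_RULES, DITA_TOPIC))
```

The document itself carries no ITS markup; the rules alone tell an ITS-enabled tool how to read the DITA attribute.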

5.5 ITS and GladeXML

Glade [Glade] is a user interface builder system for GTK+ and Gnome. It uses XML files (GladeXML) to store the user-interface components. The library has been ported to different platforms and offers bindings in different programming languages.

5.5.1 Integration of ITS into GladeXML

The content of GladeXML files is mostly composed of data that should not be translated: properties of user-interface widgets. Text content is limited to titles, labels and a few other types of UI strings.

GladeXML does offer support for some of the ITS features, but not all of them. While it would be technically feasible to allow the use of additional ITS markup directly in your GladeXML resources, there is little point in doing so here, because these resources are closely tied to Glade's editors and compilers, which would have to be modified as well.

5.5.2 Associating Existing GladeXML Markup with ITS

GladeXML offers a translatable attribute that provides the same functionality as its:translate. The comments attribute can also be associated with localization notes.

As for other formats, existing features of GladeXML can be associated with ITS data categories using global rules, so ITS-enabled tools can seamlessly process GladeXML source documents.
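As a minimal sketch of such an association, the global rules below treat everything as non-translatable by default, mark elements carrying translatable="yes" as translatable, and point localization notes at the comments attribute. The rule set, the sample interface file and the helper function are assumptions for illustration. The Python function emulates the effect of the rules directly rather than evaluating the selectors.

```python
import xml.etree.ElementTree as ET

# Assumed global rules for GladeXML; selectors are illustrative.
GLADE_RULES = """\
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:translateRule selector="//*" translate="no"/>
  <its:translateRule selector="//*[@translatable='yes']" translate="yes"/>
  <its:locNoteRule selector="//*[@comments]" locNotePointer="@comments"
                   locNoteType="description"/>
</its:rules>
"""

# Invented sample interface description.
GLADE_SAMPLE = """\
<glade-interface>
  <widget class="GtkWindow" id="main">
    <property name="title" translatable="yes" comments="Main window title">Editor</property>
    <property name="visible">True</property>
    <child>
      <widget class="GtkButton" id="ok">
        <property name="label" translatable="yes">OK</property>
      </widget>
    </child>
  </widget>
</glade-interface>
"""


def extract_strings(glade_xml):
    """Return (text, localization note) for every translatable element."""
    root = ET.fromstring(glade_xml)
    return [
        (el.text, el.get("comments"))  # note is None when no comments attribute exists
        for el in root.iter()
        if el.get("translatable") == "yes"
    ]


print(extract_strings(GLADE_SAMPLE))
```

Only the two UI strings are extracted; widget properties such as visible stay out of the translation workflow.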

5.6 ITS and DocBook

DocBook is a general purpose XML schema particularly well suited to
books and papers about computer hardware and software (though it is by
no means limited to these applications). DocBook is maintained by the
DocBook
Technical Committee of OASIS.

5.6.1 Integration of ITS into DocBook

The DocBook V5.0 schema is maintained as a very modular and
easy-to-customize schema written in RELAX NG [RELAX NG 1.0]. General
techniques for schema customization are described in [DocBook V5.0 HOWTO].

The ITS additions involve the following changes to the DocBook schema:

For your convenience, a “flattened” schema stored in a single file
and converted to other schema languages is also available. Note that the
flattened versions are broken at this time.

dbits.rnc (RELAX NG compact syntax schema in one file)

dbits.rng (RELAX NG schema in one file)

dbits.dtd (DTD in one file)

dbits.xsd (W3C XML Schema)

There is no need to add the its:span element, as
DocBook provides a similar element called phrase, which can be
used for attaching ITS local attributes to an arbitrary piece of
text.

The following example shows a sample DocBook article conforming to
the DocBook+ITS schema. The its:translateRule element is used to indicate that
function names (marked up using the function element) should not be
translated. The first paragraph is also marked as non-translatable
using local ITS markup.

5.6.2 Associating existing DocBook markup with ITS

A number of DocBook constructs implement the same semantics as some
of the ITS data categories. In addition, some of the DocBook
attributes have values which should be translated, which is not the default for XML
documents according to the ITS default
settings for translatability. These attributes need to be
identified as needing translation.

Note: When you have the choice of using a DocBook construct or an ITS
construct to express the same thing, make sure you use the DocBook
construct to ensure DocBook processing tools work properly. Use ITS local
markup only if DocBook does not provide an equivalent.

An external ITS its:rules element can
summarize these relationships. Because DocBook use is widespread and
diverse, the rules defined here are just examples which may need further
tailoring for specific use.

B Revision Log (Non-Normative)

The following log records major changes that have been made to this document since the publication in June 2007.
Updated the section .
Updated the section .
Updated the section .
Updated the section .
Updated the section .
Updated the section .
Updated the section .
Updated the section .
Updated the section .
Created the content for the section .
Created the content for the section .
Created the content for the section .
Created the content for the section .
Created the content for the section .