Introduction

The mapping development is being transferred to the OASIS XLIFF TC where it is becoming a normative part of the planned XLIFF 2.1 release.
Parts that have been transferred to the XLIFF TC are marked as such and should not be developed further here.
This document provides a recommendation on how the ITS 2.0 data categories are represented in XLIFF 2.
For the mapping between ITS 2.0 and XLIFF 1.2 see the page "XLIFF 1.2 Mapping".

Implementing and testing the mapping

General implementation and testing considerations

This section is a stub. Feel free to complete it, for example by addressing these questions:

What input files are needed: XLIFF, general XML, HTML5?

XLIFF 2 documents and, I suppose (to check whether an extractor also supports ITS), HTML5 or XML documents.

What output is needed: XLIFF only?

For the extraction case: the XLIFF output

But I suppose some kind of comparable text format would be ideal. I'm not sure the XPath-based format we used for ITS would be best here, as XLIFF processors may process the document in very different ways. Maybe something using the IDs of objects rather than their paths would be better?

How would the conformance of the output to mappings be tested?

Ideally, by comparing the tool's output to the gold output.
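One possible shape for the ID-based, comparable text format suggested above is one line per ITS attribute, keyed by file, unit and marker IDs instead of an XPath. The sketch below is hypothetical: the line format and the function names are not agreed on anywhere, it only looks at <mrk> elements inside <source>, and it compares a gold file with a tool's output as two sets of lines.

```python
import xml.etree.ElementTree as ET

# Namespace URIs in ElementTree's "{uri}local" notation.
XLF = "{urn:oasis:names:tc:xliff:document:2.0}"
ITS = "{http://www.w3.org/2005/11/its}"


def its_lines(xliff: str) -> set:
    """Flatten ITS attributes on source <mrk> elements into ID-keyed lines.

    Hypothetical line format: 'f=FILE/u=UNIT/MRK<TAB>its:name=value'.
    """
    root = ET.fromstring(xliff)
    lines = set()
    for file_el in root.iter(f"{XLF}file"):
        for unit in file_el.iter(f"{XLF}unit"):
            for source in unit.iter(f"{XLF}source"):
                for mrk in source.iter(f"{XLF}mrk"):
                    key = f"f={file_el.get('id')}/u={unit.get('id')}/{mrk.get('id')}"
                    for name, value in mrk.attrib.items():
                        if name.startswith(ITS):
                            lines.add(f"{key}\tits:{name[len(ITS):]}={value}")
    return lines


def compare(gold: str, tool: str):
    """Return (lines missing from the tool output, unexpected lines in it)."""
    expected, actual = its_lines(gold), its_lines(tool)
    return expected - actual, actual - expected


TEMPLATE = (
    '<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" '
    'xmlns:its="http://www.w3.org/2005/11/its" version="2.0" srcLang="en">'
    '<file id="f1"><unit id="u1"><segment><source>'
    '<mrk id="m1" its:locQualityIssueType="{t}">teh</mrk>'
    '</source></segment></unit></file></xliff>'
)
missing, extra = compare(TEMPLATE.format(t="misspelling"), TEMPLATE.format(t="grammar"))
print(sorted(missing), sorted(extra))
```

Keying on IDs keeps the comparison independent of how a given tool happens to nest or order elements, which is the concern raised about XPath-based output.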

What would be a good location for the test files: a GitHub repository or an XLIFF/ITS group-specific location? Advantage of GitHub: many people can contribute.

GitHub would be fine. I'm guessing there may also be some call for hosting this in the OASIS SVN.

Would we need to require preprocessing of XLIFF files so that general ITS processors can understand them? See the related thread.

The semantics of the attributes are analogous to those of their counterparts in the W3C ITS namespace, where such counterparts exist. The main semantic difference between its and itsm attributes is that itsm attributes can apply to non-well-formed spans delimited by the empty boundary markers <sm/> and <em/>.

Note YS: it is also because the ITS namespace at times needs to be supplemented: when a data category uses XLIFF markup and is missing some features (we would not be able to use the ITS namespace for this), and when the ITS local markup is missing something, such as a domain attribute.

Handling of ITS Tools Annotation

ITS 2.0 provides a tools annotation mechanism. It identifies the processor that generates ITS information. This information is mandatory for the MT Confidence data category and optional for other data categories. It is mandatory for Terminology and Text Analysis if these provide confidence information.

tbd: what is special about handling this in XLIFF?

Note YS: nothing really, only that it has to handle the sm/em case too.
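As an illustration of how a processor might find the tool in scope for a given piece of XLIFF content, the sketch below parses its:annotatorsRef values (space-separated "data-category|IRI" pairs) and walks from a node up through its ancestors. It assumes that the nearest declaration for a given data category takes precedence over inherited ones; the function names and example IRIs are hypothetical, and the sm/em case mentioned above is not covered.

```python
import xml.etree.ElementTree as ET

ANNOTATORS_REF = "{http://www.w3.org/2005/11/its}annotatorsRef"


def parse_annotators_ref(value: str) -> dict:
    """Split 'category|IRI category|IRI ...' into {category: IRI}."""
    refs = {}
    for item in value.split():
        category, _, iri = item.partition("|")
        refs[category] = iri
    return refs


def annotator_for(root: ET.Element, node: ET.Element, category: str):
    """Return the annotator IRI in scope of `node` for `category`, or None.

    The node itself is checked first, then its ancestors (nearest wins).
    """
    # ElementTree has no parent pointers, so build a child -> parent map once.
    parents = {child: parent for parent in root.iter() for child in parent}
    current = node
    while current is not None:
        value = current.get(ANNOTATORS_REF)
        if value:
            refs = parse_annotators_ref(value)
            if category in refs:
                return refs[category]
        current = parents.get(current)
    return None


doc = ET.fromstring(
    '<unit xmlns="urn:oasis:names:tc:xliff:document:2.0" '
    'xmlns:its="http://www.w3.org/2005/11/its" id="u1" '
    'its:annotatorsRef="mt-confidence|http://example.com/mt terminology|http://example.com/t1">'
    '<segment><source><mrk id="m1" its:annotatorsRef="terminology|http://example.com/t2">'
    'term</mrk></source></segment></unit>'
)
mrk = doc.find(".//{urn:oasis:names:tc:xliff:document:2.0}mrk")
print(annotator_for(doc, mrk, "terminology"))    # declared on the <mrk> itself
print(annotator_for(doc, mrk, "mt-confidence"))  # inherited from the <unit>
```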

Handling of overlap

In XLIFF, ITS information may be applied to, among others, mrk elements. If the ITS information is applied to pairs of sm and em elements, it may overlap with other elements. In that case, the normal ITS mechanism of data category inheritance for element nodes cannot be applied, because it would apply to the empty content of sm or em, not to the content between sm and the corresponding em.
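To make the difference concrete, the content an sm/em pair annotates has to be collected in document order rather than by looking at an element's children. The Python sketch below is a minimal illustration, assuming the <em> closing a span references its <sm> through startRef; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

XLF = "{urn:oasis:names:tc:xliff:document:2.0}"


def walk(elem):
    """Yield ('start', element) and ('text', string) events in document order."""
    yield "start", elem
    if elem.text:
        yield "text", elem.text
    for child in elem:
        yield from walk(child)
        if child.tail:
            yield "text", child.tail


def sm_span_text(source: ET.Element, sm_id: str) -> str:
    """Collect the text between <sm id=sm_id/> and the <em startRef=sm_id/> closing it."""
    inside = False
    chunks = []
    for kind, value in walk(source):
        if kind == "start" and value.tag == f"{XLF}sm" and value.get("id") == sm_id:
            inside = True
        elif kind == "start" and value.tag == f"{XLF}em" and value.get("startRef") == sm_id:
            break
        elif kind == "text" and inside:
            chunks.append(value)
    return "".join(chunks)


# The span m1 starts outside the <pc> and ends inside it, so it overlaps the <pc>.
source = ET.fromstring(
    '<source xmlns="urn:oasis:names:tc:xliff:document:2.0">'
    'A <sm id="m1" translate="no"/>B <pc id="1">C<em startRef="m1"/> D</pc></source>'
)
print(sm_span_text(source, "m1"))
```

Here the annotated text ("B " followed by "C") crosses the <pc> boundary, which is exactly what element-based inheritance cannot express.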

Before processing an XLIFF file, an ITS processor needs to perform the following steps.

1) Change all pc elements to sc and ec elements. This is needed for the proper inheritance of ITS 2.0 information.
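Step 1 can be sketched with Python's ElementTree. This is an illustration under simplifying assumptions, not a complete converter: it carries over only id and dataRefStart/dataRefEnd (mapped to the dataRef of <sc> and <ec>), links each <ec> to its <sc> through startRef, and ignores the other <pc> attributes.

```python
import xml.etree.ElementTree as ET

XLF = "urn:oasis:names:tc:xliff:document:2.0"
PC, SC, EC = (f"{{{XLF}}}{name}" for name in ("pc", "sc", "ec"))


def pc_to_sc_ec(parent: ET.Element) -> None:
    """Recursively replace each <pc> under `parent` with an <sc>/<ec> pair."""
    new_children = []
    for child in list(parent):
        pc_to_sc_ec(child)  # flatten nested <pc> elements first
        if child.tag != PC:
            new_children.append(child)
            continue
        sc = ET.Element(SC, {"id": child.get("id")})
        ec = ET.Element(EC, {"startRef": child.get("id")})
        if child.get("dataRefStart") is not None:
            sc.set("dataRef", child.get("dataRefStart"))
        if child.get("dataRefEnd") is not None:
            ec.set("dataRef", child.get("dataRefEnd"))
        sc.tail = child.text  # text at the start of the <pc> content
        ec.tail = child.tail  # text that followed the closing </pc>
        new_children.extend([sc, *child, ec])
    for old in list(parent):
        parent.remove(old)
    parent.extend(new_children)


doc = ET.fromstring(
    '<source xmlns="urn:oasis:names:tc:xliff:document:2.0">'
    'Press <pc id="1">Save <pc id="2">now</pc></pc>!</source>'
)
pc_to_sc_ec(doc)
print(ET.tostring(doc, encoding="unicode"))
```

After the conversion the text content is unchanged, but all of it sits directly in <source>, so no code element can cut across an sm/em span.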

Use type="its:term-no" to denote instances where the original content has its:term="no".

its:termInfoRef is mapped to the XLIFF ref attribute.

its:termConfidence is mapped to itsxlf:termConfidence.

When itsxlf:termConfidence is used, the annotated text MUST be contained within an element with a relevant its:annotatorsRef.

The value attribute can be used to store the information pointed to by the global rule attribute its:termInfoPointer.

WARNING: TBD: the XLIFF 2 specification allows ref and value to be set at the same time, whereas ITS 2.0 does not allow an info and an info-ref to be set at the same time. So we have to decide how to handle this case.

Note: If needed, the value of the ITS termInfoRef attribute is to be adjusted to point to a resource accessible from the XLIFF document. The location and format of this resource is decided by the tool creating the XLIFF document.
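The Terminology rules above can be summarized as a small mapping function. This is a hedged sketch: it assumes its:term="yes" maps to the XLIFF core type="term", uses "termInfo" as a hypothetical key for the text selected by its:termInfoPointer, and, because the ref/value conflict is still TBD, simply rejects that case rather than picking a resolution.

```python
def term_to_mrk_attrs(its: dict) -> dict:
    """Map ITS 2.0 Terminology information to XLIFF 2 <mrk> attributes.

    `its` holds ITS values keyed by attribute name; "annotatorsRef" stands for
    an its:annotatorsRef that is in scope of the annotated text.
    """
    term = its.get("term")
    if term not in ("yes", "no"):
        raise ValueError("its:term must be 'yes' or 'no'")
    attrs = {"type": "term" if term == "yes" else "its:term-no"}
    if "termInfoRef" in its:
        attrs["ref"] = its["termInfoRef"]
    if "termInfo" in its:  # hypothetical key: text selected by its:termInfoPointer
        attrs["value"] = its["termInfo"]
    if "ref" in attrs and "value" in attrs:
        # Still TBD: XLIFF 2 allows both, ITS 2.0 does not; this sketch rejects it.
        raise ValueError("termInfoRef and termInfo cannot both be set")
    if "termConfidence" in its:
        if "annotatorsRef" not in its:
            raise ValueError("termConfidence requires an its:annotatorsRef in scope")
        attrs["itsxlf:termConfidence"] = its["termConfidence"]
    return attrs


print(term_to_mrk_attrs({"term": "yes", "termInfoRef": "http://example.com/termbase#e1"}))
print(term_to_mrk_attrs({"term": "no"}))
```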

Language Information [TRANSFERRED TO XLIFF 2.1 Draft]

An XLIFF document is bilingual: it defines the source and target language of its payload using the srcLang and trgLang attributes on the <xliff> element. By default, those languages apply to the content of <source> and <target>, respectively.

Structural Elements

Because XLIFF documents are normally source-monolingual, whole paragraphs in the source document that are not in the main source language are generally not to be extracted.

If there is a need to extract such content, the XLIFF output has to use an inline <mrk> element to enclose the content that is in a language other than the main source language of the document.

MT Confidence [TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS]

Structural Elements

It is not recommended that MT Confidence be used at a structural level.

If a structural element of the original document has an MT Confidence annotation, it is recommended to represent that annotation using a <mrk> element that encloses the whole content of the <source> element. See the Inline Elements section below for details.

The MT Confidence score must be within the scope of a corresponding its:annotatorsRef attribute.
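Representing a structural annotation inline means moving the whole content of <source> into one <mrk>. A minimal ElementTree sketch, assuming the ITS attributes (here its:mtConfidence plus the its:annotatorsRef it needs) are carried directly on that <mrk>; the mrk id and the annotator IRI are hypothetical.

```python
import xml.etree.ElementTree as ET

XLF = "{urn:oasis:names:tc:xliff:document:2.0}"
ITS = "{http://www.w3.org/2005/11/its}"


def wrap_source_content(source: ET.Element, mrk_id: str, attrs: dict) -> ET.Element:
    """Move all content of <source> into a single new <mrk> carrying `attrs`."""
    mrk = ET.Element(f"{XLF}mrk", {"id": mrk_id, **attrs})
    mrk.text, source.text = source.text, None
    for child in list(source):
        source.remove(child)
        mrk.append(child)  # each child keeps its tail text
    source.append(mrk)
    return mrk


source = ET.fromstring(
    '<source xmlns="urn:oasis:names:tc:xliff:document:2.0">'
    'Click <ph id="1"/> to continue.</source>'
)
wrap_source_content(source, "m1", {
    f"{ITS}mtConfidence": "0.8",
    # Hypothetical annotator IRI: the score must be in scope of an annotatorsRef.
    f"{ITS}annotatorsRef": "mt-confidence|http://example.com/mt-engine",
})
print(ET.tostring(source, encoding="unicode"))
```

The same wrapping applies to the other data categories that recommend a <mrk> enclosing the whole content of <source>.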

In the match element

The MT Confidence data category can also be used on the <match> element of the Translation Candidates module.
In that case, use the matchQuality attribute to store the value. The value must be multiplied by 100, because the scale of matchQuality is [0.0 to 100.0] while the scale of MT Confidence is [0.0 to 1.0].
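The rescaling can be sketched as follows. The range check and the use of type="mt" for a machine-translation candidate are assumptions of this sketch, and the ref value "#m1" is a hypothetical pointer to the matched source span.

```python
import xml.etree.ElementTree as ET

MTC = "{urn:oasis:names:tc:xliff:matches:2.0}"


def mt_confidence_to_match_quality(mt_confidence: float) -> float:
    """Rescale an ITS MT Confidence score [0.0, 1.0] to matchQuality [0.0, 100.0]."""
    if not 0.0 <= mt_confidence <= 1.0:
        raise ValueError(f"MT Confidence out of range: {mt_confidence}")
    return mt_confidence * 100.0


def make_mt_match(ref: str, mt_confidence: float) -> ET.Element:
    """Build an <mtc:match> element for a machine-translation candidate."""
    quality = mt_confidence_to_match_quality(mt_confidence)
    return ET.Element(f"{MTC}match", {
        "ref": ref,
        "type": "mt",
        "matchQuality": f"{quality:g}",
    })


match = make_mt_match("#m1", 0.5)
print(match.get("matchQuality"))
```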

Text Analysis [TRANSFERRED TO XLIFF 2.1 DRAFT]

Structural Elements

Text Analysis is not to be used at a structural level.

If a structural element of the original document has a Text Analysis annotation, it is RECOMMENDED to represent that annotation using a <mrk> element that encloses the whole content of the <source> element.

When the Target Locale in XLIFF is Defined

Use the translate attribute (yes if the target locale applies, no if it does not).
It is also recommended to keep the original ITS attributes, so that the file can potentially be re-purposed (even if it already has a target).

When the Target Locale in XLIFF is Defined

Use the <mrk> element with translate='yes' if the target locale applies, or translate='no' if it does not. It is also recommended to keep the original ITS attributes, so that the file can potentially be re-purposed (even if it already has a target).

If the content does not apply to the defined target locale, you can also simply replace it with an inline code.
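Whether the target locale "applies" comes from the ITS Locale Filter information: its:localeFilterList (a comma-separated list of extended language ranges, where "*" matches everything) and its:localeFilterType (include or exclude). The sketch below assumes the RFC 4647 extended filtering algorithm for matching the trgLang value against those ranges and derives the XLIFF translate value; the function names are hypothetical.

```python
def extended_match(language_range: str, tag: str) -> bool:
    """RFC 4647 extended filtering: does `language_range` match `tag`?"""
    rng = language_range.lower().split("-")
    sub = tag.lower().split("-")
    # The primary subtags must match, unless the range starts with '*'.
    if rng[0] != "*" and rng[0] != sub[0]:
        return False
    i, j = 1, 1
    while i < len(rng):
        if rng[i] == "*":
            i += 1                      # wildcard: skip this range subtag
        elif j >= len(sub):
            return False                # the tag ran out of subtags
        elif rng[i] == sub[j]:
            i += 1
            j += 1                      # subtags match: advance both
        elif len(sub[j]) == 1:
            return False                # cannot skip past a singleton
        else:
            j += 1                      # skip a non-matching tag subtag
    return True


def locale_filter_applies(filter_list: str, filter_type: str, target_lang: str) -> bool:
    """Evaluate its:localeFilterList / its:localeFilterType for a target locale."""
    ranges = [r.strip() for r in filter_list.split(",") if r.strip()]
    matched = any(extended_match(r, target_lang) for r in ranges)
    return matched if filter_type == "include" else not matched


def translate_value(filter_list: str, filter_type: str, trg_lang: str) -> str:
    """Value for the XLIFF translate attribute: 'yes' if the locale applies."""
    return "yes" if locale_filter_applies(filter_list, filter_type, trg_lang) else "no"


print(translate_value("de-*-DE, fr", "include", "de-Latn-DE"))
print(translate_value("fr", "exclude", "fr-CA"))
```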

Provenance [TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS]

Communicates the identity of agents that have been involved in the translation of the content or the revision of the translated content.
See http://www.w3.org/TR/its20/#provenance for more details.

Structural Elements

The Provenance data category can be used on <file>, <group> and <unit>.

If a stand-off element is needed (because the annotated element has more than one set of provenance attributes), the <its:provenanceRecords> element must be located in the same element as the one where the reference is declared.

Inline Elements

For annotating the source or the target content, use the <mrk> element with the ITS attributes.
If a stand-off <its:provenanceRecords> element is used, it must be located in the same <unit> as the one where the inline reference is declared.
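The same-unit constraint lends itself to a simple check. This sketch assumes that the stand-off element carries its identifier in xml:id and that references use the XLIFF fragment identifier form "#its=ID" (prefix its for the ITS module); both are assumptions, as are the function names and the sample markup.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

ITS = "{http://www.w3.org/2005/11/its}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
PREFIX = "#its="


def its_fragment_id(ref: str) -> Optional[str]:
    """Return 'pr1' for a reference such as '#its=pr1', else None."""
    return ref[len(PREFIX):] if ref.startswith(PREFIX) else None


def unresolved_provenance_refs(unit: ET.Element) -> List[str]:
    """Return provenanceRecordsRef values in `unit` that do not point to an
    <its:provenanceRecords> element located in the same unit."""
    local_ids = set()
    for records in unit.iter(f"{ITS}provenanceRecords"):
        if records.get(XML_ID):
            local_ids.add(records.get(XML_ID))
    unresolved = []
    for el in unit.iter():
        ref = el.get(f"{ITS}provenanceRecordsRef")
        if ref is not None and its_fragment_id(ref) not in local_ids:
            unresolved.append(ref)
    return unresolved


unit = ET.fromstring(
    '<unit xmlns="urn:oasis:names:tc:xliff:document:2.0" '
    'xmlns:its="http://www.w3.org/2005/11/its" id="u1">'
    '<its:provenanceRecords xml:id="pr1"><its:provenanceRecord tool="Example MT"/>'
    '</its:provenanceRecords>'
    '<segment><source>Text one</source><target>'
    '<mrk id="m1" its:provenanceRecordsRef="#its=pr1">Texte</mrk> '
    '<mrk id="m2" its:provenanceRecordsRef="#its=pr9">un</mrk>'
    '</target></segment></unit>'
)
print(unresolved_provenance_refs(unit))
```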

Localization Quality Issue [TRANSFERRED TO XLIFF 2.1 DRAFT]

Structural Elements

Localization Quality Issue annotations may be used on the source or the target content within a <unit> element.
This is done using the <mrk> element. See below for details.

Inline Elements

The ITS attributes for Localization Quality Issue may be used inline on a <mrk> element within the <source> or <target> element of a <unit>, for example to record a single instance of the Localization Quality Issue data category.
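A single inline issue can be built as shown in this sketch. It enforces two ITS 2.0 constraints (at least a type or a comment must be given, and the severity lies between 0 and 100); the function name and the sample values are hypothetical, and the attributes are assumed to sit directly on the <mrk>.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XLF = "{urn:oasis:names:tc:xliff:document:2.0}"
ITS = "{http://www.w3.org/2005/11/its}"


def lqi_mrk(mrk_id: str, text: str, issue_type: Optional[str] = None,
            comment: Optional[str] = None,
            severity: Optional[float] = None) -> ET.Element:
    """Build an inline <mrk> carrying one Localization Quality Issue."""
    if issue_type is None and comment is None:
        # ITS 2.0 requires at least a type or a comment for each issue.
        raise ValueError("need an issue type, a comment, or both")
    attrs = {"id": mrk_id}
    if issue_type is not None:
        attrs[f"{ITS}locQualityIssueType"] = issue_type
    if comment is not None:
        attrs[f"{ITS}locQualityIssueComment"] = comment
    if severity is not None:
        if not 0 <= severity <= 100:
            raise ValueError("severity must be between 0 and 100")
        attrs[f"{ITS}locQualityIssueSeverity"] = f"{severity:g}"
    mrk = ET.Element(f"{XLF}mrk", attrs)
    mrk.text = text
    return mrk


mrk = lqi_mrk("m1", "teh", issue_type="misspelling", comment="should be 'the'", severity=50)
print(ET.tostring(mrk, encoding="unicode"))
```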

When needed, a stand-off notation can be used; it is placed at the unit's extension point (before the first <segment> element).
Note that the reference must use XLIFF's fragment identifier syntax. The fragment identifier prefix for the ITS module/extension is its.

Allowed Characters [TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS]

Structural Elements

dF: I do not see why the allowed characters could not be set for whole units, groups or files. If a project comes from a legacy system, the restrictions are likely to be structural.

If a structural element of the original document has an Allowed Characters annotation, it is recommended to represent that annotation using a <mrk> element that encloses the whole content of the <source> element.
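The value of its:allowedCharacters is a regular-expression character class, and every character of the annotated content must match it. The check below is a sketch that assumes the class means the same thing in Python's re module; XML Schema regular expressions have constructs (such as character class subtraction and some \p{...} block names) that Python does not support, so a real tool would need a proper translation.

```python
import re


def conforms_to_allowed_characters(char_class: str, text: str) -> bool:
    """Check that every character of `text` matches the `char_class` regex."""
    return re.fullmatch(f"(?:{char_class})*", text) is not None


print(conforms_to_allowed_characters("[a-z ]", "hello world"))
print(conforms_to_allowed_characters("[a-z ]", "Hello"))
```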

Element Within Text [TRANSFER TO XLIFF 2.1 DRAFT IN PROGRESS]

This data category is not used directly in XLIFF, but it determines which XLIFF element is used to represent the original element in the extracted document:

withinText='no': Use <unit>

withinText='yes': Use an inline element such as <pc>, <sc>/<ec> or <ph>.

withinText='nested': Use a separate <unit>.
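An extractor's decision can be sketched as a small function. It assumes the ITS default of withinText='no', picks <pc> for an inline element with content and <ph> for an empty one; <sc>/<ec> would be used instead of <pc> when the start and end of the code do not form a well-formed span in the extracted content. The element names in the usage loop are hypothetical.

```python
def xliff_construct(within_text: str = "no", has_content: bool = True) -> str:
    """Pick the XLIFF 2 construct for an original element from its ITS
    withinText value ('no' is the ITS default)."""
    if within_text == "no":
        return "unit"            # the element's content becomes its own <unit>
    if within_text == "yes":
        # Inline code: paired <pc> for an element with content,
        # standalone <ph> for an empty element.
        return "pc" if has_content else "ph"
    if within_text == "nested":
        return "separate unit"   # extracted apart from the surrounding text
    raise ValueError(f"invalid withinText value: {within_text!r}")


# Hypothetical elements with their withinText values and content flags.
for name, value, content in [("p", "no", True), ("b", "yes", True),
                             ("br", "yes", False), ("fn", "nested", True)]:
    print(name, "->", xliff_construct(value, content))
```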

Target Pointer [TO REVIEW]

Provides a way to associate the node of a given source content (i.e. the content to be translated) and the node of its corresponding target content.
See http://www.w3.org/TR/its20/#target-pointer for more details.

This data category is not mapped to XLIFF, but is used by extraction and merging tools to get the source content from the original document and to put the translated content back at its proper location.

Note that ITS processors working on XLIFF documents should use a Target Pointer rule to locate the source and target content.
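As a hedged illustration, assuming the rule selects each <source> and points to its sibling <target> (the equivalent of a targetPointerRule with selector //xlf:source and targetPointer ../xlf:target), the pairing an ITS processor would obtain can be sketched in Python. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XLF = "{urn:oasis:names:tc:xliff:document:2.0}"


def source_target_pairs(root: ET.Element):
    """Return (source, target) pairs; target is None when it is absent."""
    pairs = []
    for container in root.iter():
        if container.tag in (f"{XLF}segment", f"{XLF}ignorable"):
            pairs.append((container.find(f"{XLF}source"),
                           container.find(f"{XLF}target")))
    return pairs


doc = ET.fromstring(
    '<unit xmlns="urn:oasis:names:tc:xliff:document:2.0" id="u1">'
    '<segment><source>Hello</source><target>Bonjour</target></segment>'
    '<segment><source>World</source></segment></unit>'
)
for src, tgt in source_target_pairs(doc):
    print(src.text, "->", tgt.text if tgt is not None else None)
```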

Id Value [TO REVIEW]

Note that identifiers in XLIFF are not unique across a document, so using the Id Value data category to specify IDs in an XLIFF document is largely useless, except in very specific contexts that cannot be expressed with ITS rules. See the Fragment Identifier section for details on IDs in XLIFF 2.