1 Scope

This best practice document explains how the provRef attribute of the ITS 2.0 Provenance data category can be used in conjunction with external provenance records that conform to the W3C PROV recommendation.

The ITS 2.0 Provenance data category allows inline identification of the people, organisations and tools/services that were involved in the translation or translation revision of the annotated content. The inline provenance annotation does not support recording the timing of translation or translation revision, or additional attributes related to those activities, nor does it record provenance information for other types of internationalization and localization activity. For such use cases the provRef attribute can be used to point to this information in external provenance records. The ITS specification recommends the use of the W3C PROV specification for such records. This note therefore describes best practice for structuring PROV-conformant external records.
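As an illustration of how a consumer might locate the pointers to external records, the following minimal sketch collects provRef values from HTML5 content using only the Python standard library. It assumes the HTML5 serialization of provRef is the `its-prov-ref` attribute and that a value may hold several space-separated IRIs; the sample markup and the example.org IRIs are hypothetical placeholders.

```python
from html.parser import HTMLParser


class ProvRefCollector(HTMLParser):
    """Collect the values of its-prov-ref attributes in document order."""

    def __init__(self):
        super().__init__()
        self.prov_refs = []  # list of (tag name, [IRI, ...])

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "its-prov-ref" and value:
                # Assumption: the value may list several space-separated IRIs.
                self.prov_refs.append((tag, value.split()))


# Hypothetical markup; the IRIs do not point at real provenance records.
SAMPLE = (
    '<p its-prov-ref="http://example.org/prov/e1">Bonjour</p>'
    '<span its-prov-ref="http://example.org/prov/e2 '
    'http://example.org/prov/e3">monde</span>'
)

collector = ProvRefCollector()
collector.feed(SAMPLE)
collector.close()

for tag, iris in collector.prov_refs:
    print(tag, iris)
```

Each collected IRI could then be dereferenced or looked up in a triple store to obtain the external PROV record for the annotated content.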

As there is a growing interest in the use of RDF by the L10N and I18N community, the rest of this document will focus on the use of the RDF mapping of PROV.

The data category identifies the selected content as corresponding to an entity in a provenance record by specifying the provenance URI of that entity, as defined in PROV-AQ. Such an entity provenance record can carry additional attributes that characterize the content it represents. Entities in a provenance record can be associated with provenance activities, which represent processes that either made use of or generated the entity. Example activity types include named entity recognition, source QA, machine translation, postediting and target QA. Provenance records can also specify agents that play a role in an activity. An agent bears some responsibility for the activity having taken place, and that responsibility can be expressed by attributing the resulting entity to the agent. Examples of agent types are: people acting as translators or posteditors; pieces of software such as machine translation engines, text analytics services or CAT tools; and organizations such as Language Service Providers. Provenance records can also associate timings with entity generation and usage events, and can express derivation or collection relationships between entities.
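The entity, activity and agent relationships described above can be sketched as a small set of RDF triples. The example below is illustrative only: it uses PROV-O property names (`prov:used`, `prov:wasGeneratedBy`, `prov:wasAssociatedWith`, `prov:wasAttributedTo`, `prov:wasDerivedFrom`, `prov:startedAtTime`, `prov:endedAtTime`), while the `http://example.org/prov/` namespace, the resource names and the timestamps are hypothetical. The triples are serialized by hand as N-Triples to avoid any dependency on an RDF library.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
EX = "http://example.org/prov/"  # hypothetical namespace for this sketch


def iri(value):
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def date_time(value):
    """Format an xsd:dateTime literal as an N-Triples term."""
    return f'"{value}"^^<{XSD_DATETIME}>'


def build_record():
    """Return (subject, predicate, object) terms for one postediting step."""
    mt_output = iri(EX + "mt-output")
    postedited = iri(EX + "postedited-text")
    postediting = iri(EX + "postediting")
    posteditor = iri(EX + "posteditor-1")
    return [
        (mt_output, iri(RDF_TYPE), iri(PROV + "Entity")),
        (postedited, iri(RDF_TYPE), iri(PROV + "Entity")),
        (postediting, iri(RDF_TYPE), iri(PROV + "Activity")),
        (posteditor, iri(RDF_TYPE), iri(PROV + "Person")),
        # The activity consumed the MT output and produced the postedited text.
        (postediting, iri(PROV + "used"), mt_output),
        (postedited, iri(PROV + "wasGeneratedBy"), postediting),
        (postedited, iri(PROV + "wasDerivedFrom"), mt_output),
        # The agent took part in the activity and is credited with the result.
        (postediting, iri(PROV + "wasAssociatedWith"), posteditor),
        (postedited, iri(PROV + "wasAttributedTo"), posteditor),
        # Timing, which the inline ITS annotation cannot express.
        (postediting, iri(PROV + "startedAtTime"), date_time("2013-05-01T09:00:00Z")),
        (postediting, iri(PROV + "endedAtTime"), date_time("2013-05-01T09:20:00Z")),
    ]


def to_ntriples(triples):
    """Serialize already-formatted terms, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


record = build_record()
print(to_ntriples(record))
```

The provRef value on the postedited content would then be the IRI of the `postedited-text` entity, from which the activity, agent and timings can be reached.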

2 External Provenance Usage Scenarios

This best practice document introduces the following ITS usage scenarios that can be complemented by use of external provenance records.

translation and translation agent review using the ITS provenance category

This document also describes how external provenance records can be used with ITS mapped onto XLIFF.

It also indicates how external provenance records can be used with content that doesn't correspond to inline ITS markup. This is accomplished by using elements of the NLP Interchange Format (NIF).

Finally, it explains how to interlink external provenance records that relate to the same content in an L10N workflow but are stored in different triple stores.

2.1 Extended translation and translation review provenance

2.2 Localization Quality Assurance Provenance

3 Summary

This implementation demonstrates how the Resource Description Framework (RDF) can be used to integrate quality information from multiple sources. It is a prototype used to assess the viability of RDF for this role.

This use case addresses the following problem:

Integrating quality information from different QA tools that is only available in different data schemas

Benefit: Provide a single, but flexible, data schema so that QA data silos from different tools can be integrated and then queried as a whole. This decouples the cost of generating job-level QA reports from the design of the value chain, the associated tool choices and the resulting data silos.

Benefit: This offers the potential for linking live quality data sources across the value chain, specifically being able to link customer quality assessments to service provider assessments.
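The integration idea can be sketched in a few lines: records from two QA tools with different schemas are mapped onto one triple-shaped representation and then queried together. Everything specific here is an assumption made for illustration, including the two tool formats, the `http://example.org/qa#` vocabulary and the segment identifiers; it is not the prototype's actual schema.

```python
from collections import defaultdict

QA = "http://example.org/qa#"  # hypothetical vocabulary for this sketch


def from_tool_a(records):
    """Map tool A output (a list of dicts) onto common triples."""
    for i, rec in enumerate(records):
        issue = f"{QA}toolA-issue-{i}"
        yield (issue, QA + "segment", rec["segment"])
        yield (issue, QA + "type", rec["issue"])
        yield (issue, QA + "reportedBy", "toolA")


def from_tool_b(rows):
    """Map tool B output (a list of (segment, category) tuples) onto common triples."""
    for i, (segment, category) in enumerate(rows):
        issue = f"{QA}toolB-issue-{i}"
        yield (issue, QA + "segment", segment)
        yield (issue, QA + "type", category)
        yield (issue, QA + "reportedBy", "toolB")


def issues_by_segment(triples):
    """Query the merged data: issue types per segment, regardless of source tool."""
    segment_of, type_of = {}, {}
    for subject, predicate, obj in triples:
        if predicate == QA + "segment":
            segment_of[subject] = obj
        elif predicate == QA + "type":
            type_of[subject] = obj
    grouped = defaultdict(list)
    for issue, segment in segment_of.items():
        grouped[segment].append(type_of[issue])
    return {segment: sorted(types) for segment, types in grouped.items()}


# Hypothetical outputs of two QA tools with different schemas.
tool_a_output = [
    {"segment": "s1", "issue": "terminology"},
    {"segment": "s2", "issue": "typographical"},
]
tool_b_output = [("s1", "mistranslation")]

merged = list(from_tool_a(tool_a_output)) + list(from_tool_b(tool_b_output))
report = issues_by_segment(merged)
print(report)
```

Because both sources end up as triples over the same vocabulary, adding a third tool only requires another mapping function; the query is unchanged.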

3.1 PROV using NIF

3.2 Example: PROV from XLIFF/ITS Roundtrip

This example shows how the use of ITS within an XLIFF-based workflow should map onto the PROV model. It shows the PROV model in Turtle format at each of the following stages of localizing an English [EX=xliff-prov-rt-1-src.html|HTML5 file] into French.

3.2.1 Extraction

Extracting the localizable content from the source file results in the XLIFF file EX-xliff-prov-rt-post-extract.xlf.

3.2.8 Translation Quality Assurance

3.2.9 Reassembly

Reassembly of the content in the target language results in the HTML5 file EX-xliff-prov-rt-post-reassembly.html.

3.3 Interlinking PROV records across triple stores

It is possible for multiple entity provenance records pertaining to the same content to co-exist. This may be because two organizations record differing views of the provenance of the same content. For example, a localization client may view the whole localization workflow that results in translated content as a single step, whereas a language service provider may record details of the QA process conducted before that same content is delivered. Therefore, document content may be associated with more than one entity provenance URI, each potentially from a different provenance store.
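One way to picture the interlinking is to treat each store as a set of triples and to add an explicit link between the two entity URIs that describe the same content. The sketch below uses `prov:alternateOf` for that link, which is an assumption made for illustration rather than a practice stated in this document; the store contents and the client.example.org and lsp.example.org URIs are hypothetical.

```python
PROV = "http://www.w3.org/ns/prov#"

CLIENT_DOC = "http://client.example.org/prov/doc-fr"  # hypothetical entity URI
LSP_DOC = "http://lsp.example.org/prov/doc-fr"        # hypothetical entity URI

# The client records the whole localization workflow as a single step.
client_store = {
    (CLIENT_DOC, PROV + "wasGeneratedBy", "http://client.example.org/prov/localization"),
}

# The language service provider records the QA step before delivery.
lsp_store = {
    (LSP_DOC, PROV + "wasGeneratedBy", "http://lsp.example.org/prov/target-qa"),
    ("http://lsp.example.org/prov/target-qa", PROV + "used",
     "http://lsp.example.org/prov/doc-fr-draft"),
}


def interlink(entity_a, entity_b):
    """Assumption: link the two views of the same content with prov:alternateOf."""
    return {
        (entity_a, PROV + "alternateOf", entity_b),
        (entity_b, PROV + "alternateOf", entity_a),
    }


def describe(entity, triples):
    """Return statements about an entity and about its declared alternates."""
    subjects = {entity} | {
        o for s, p, o in triples if s == entity and p == PROV + "alternateOf"
    }
    return sorted(
        t for t in triples if t[0] in subjects and t[1] != PROV + "alternateOf"
    )


merged = client_store | lsp_store | interlink(CLIENT_DOC, LSP_DOC)
for statement in describe(CLIENT_DOC, merged):
    print(statement)
```

Starting from either provenance URI, a consumer can then reach both the client's coarse-grained view and the provider's detailed view of the same content.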

4 Extension to PROV Schema

The basic PROV schema is structured according to the figure below (taken from the PROV Primer).