Preparing half-translated bilingual XML for Trados Studio – with XSLT

More and more translation clients, especially in the web industry but also in application I18N/L10N, use the versatile XML standard for translation purposes. The market leader among computer-aided translation (CAT) tools, SDL’s Trados Studio, lets you translate XML with an “Any XML” input filter, which includes an assistant that lets you choose which XML tags and attributes will be visible in the editor as “translatables”. Unfortunately, this means that the source strings will be overwritten with the translation, which is a bad idea if the source file is already bilingual XML that contains source and target language strings in matched tags.

If the target strings are empty, you can easily copy the content over and translate right away. But if the file is already partly translated, things get trickier, since you don’t want to overwrite existing translations. Worse, if the client happily announces that the source of some of the translated strings has changed, things get more than just a bit tricky. Let’s have a look at how to prepare those files with XSLT!

Alright, you have seen XML already, haven’t you? Right. It looks similar to HTML, but you get to define the valid structure and tags in your own DTD. This basically means that while HTML is mainly used to display structured information to the human user, XML’s primary purpose is to contain structured information of any kind for humans and machines alike, and let separate stylesheets worry about how it will be displayed (e.g., as XHTML, PDF, LaTeX, CSV tables, plain text, you name it). If you want to know more, have a look at the XML and XSLT pages over at W3Schools.

The XML file

Let’s first have a look at the file we want to translate with Trados (or the free/open source OmegaT plus Okapi Rainbow combo, or any other CAT tool):
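A minimal sketch of such a file, reconstructed from the description that follows. The element names uistrings, string, text and translation come from the article; the attribute names sourcelang and targetlang and all string contents are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Root element; the attribute names sourcelang/targetlang are assumed -->
<uistrings sourcelang="de" targetlang="en">
  <!-- Case 1: already accurately translated -->
  <string id="001">
    <text>Zurück zur Übersicht</text>
    <translation>Back to overview</translation>
  </string>
  <!-- Case 2: untranslated, the translation element is empty -->
  <string id="002">
    <text>Einstellungen &amp; Optionen</text>
    <translation></translation>
  </string>
  <!-- Case 3: translated, but the company name in the source has changed -->
  <string id="003">
    <text>Willkommen bei der Beispiel AG</text>
    <translation>Welcome to Muster GmbH</translation>
  </string>
</uistrings>
```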

Sometimes, clients will wrap HTML into those tags as character data (<![CDATA[ ]]>), which means you will get to see every tag in the translation environment as plain text. Be careful with those tags! Dear clients: Using CDATA may lead to messed-up code during translation. Please try to use namespaces instead to enclose HTML in XML. The HTML tags will then be correctly parsed and displayed as immutable tags, and the translator is less likely to forget or mangle a tag somewhere.
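To illustrate the difference, here is a minimal sketch of the same sentence wrapped both ways. The string IDs, the sample text and the h prefix are made up for illustration; only the XHTML namespace URI is standard.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<uistrings xmlns:h="http://www.w3.org/1999/xhtml">
  <!-- CDATA: the b tags are plain text to the XML parser, so they
       reach the translator as ordinary, editable characters -->
  <string id="004">
    <text><![CDATA[Klicken Sie auf <b>Speichern</b>.]]></text>
  </string>
  <!-- Namespaced XHTML: h:b is a real element that the parser sees,
       so a CAT tool can treat it as an inline tag -->
  <string id="005">
    <text>Klicken Sie auf <h:b>Speichern</h:b>.</text>
  </string>
</uistrings>
```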

The file starts with the XML declaration, including version and encoding. The mandatory “root element” uistrings encompasses all other tags and also holds the source and target languages as attributes. Inside, we can see three string tags with their IDs as attributes, each with one text and one translation tag holding the actual source and target content. Attention: If the file is saved as ANSI instead of UTF-8, the umlauts might throw parsing errors and should be replaced with character entities; a literal ampersand must be written as the entity &amp; in any case.

I have inserted three use cases: the string is already accurately translated, the string is untranslated (empty translation tag), and the string is translated but the translation doesn’t match the text (here: the company name has changed). Unfortunately, our virtual client has not marked that string as modified, for example by setting something like a new="yes" or modified="yes" attribute on the string or text element.
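Such a marker might look like the following fragment. The attribute name modified is the hypothetical one suggested above, and the sample strings are invented; nothing in the client’s format actually defines it.

```xml
<!-- Hypothetical: the client flags a string whose source text has changed -->
<string id="003" modified="yes">
  <text>Willkommen bei der Beispiel AG</text>
  <translation>Welcome to Muster GmbH</translation>
</string>
```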

So, we have already translated strings, empty strings and strings that need to be edited. Usually, you would want to write your translations into the translation elements. However, telling Trados to parse the translation elements as translatables will lead to English text in TagEditor’s German source column for strings 001 and 003, and you won’t get to see string 002 at all, because it’s empty and nobody would ever need to translate “nothing”, right? And on top of that, you won’t ever get to see the German source text.

File preparation

So apparently, what we need to do before translating the translation elements is to copy the source text over, preferably without destroying existing translations. One way to achieve this is to use a text editor with regular-expression search-and-replace to turn the whole XML file into a tab-separated table, save it as .TXT, read and translate it with Trados’ text table input filter, and turn it back into an XML document with another regex. Been there (article in German): it works quite nicely, and you automatically have the source text in TagEditor’s source column and any existing translations in the target column. But let’s try using only XML this time, shall we?

XML and XSL are like HTML and CSS on steroids. Not only can XSL present XML data in a number of other languages, it also lets you convert one XML file into another, use variables, copy and move elements, and even use control structures such as if. One (good) use is to convert our XML file into an HTML file showing us three columns: ID, text and translation – and tell Trados in the file type options to use that .xsl stylesheet to display the preview window. Trados will even mark the currently edited segment with a red box in that preview, and we have our source and target sitting nicely side by side instead of having to stare at XML code. Example:
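A minimal sketch of such a preview stylesheet, assuming the element names used in this article (uistrings, string, text, translation) and an id attribute on each string. The table layout is an illustration, not the exact sheet used with Trados.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Produce HTML for the preview window -->
  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <!-- Start at the root element of the bilingual XML file -->
  <xsl:template match="/uistrings">
    <html>
      <body>
        <table border="1">
          <tr><th>ID</th><th>Text</th><th>Translation</th></tr>
          <!-- One table row per string element -->
          <xsl:for-each select="string">
            <tr>
              <td><xsl:value-of select="@id"/></td>
              <td><xsl:value-of select="text"/></td>
              <td><xsl:value-of select="translation"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```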

But it gets even better: As I have said already, such an XSL sheet can also transform one XML file into another XML file, and that’s where we can make that whole CAT translation thingy work, because Trados actually supports a special bilingual XML file type that is read, displayed and edited correctly: XLIFF, the XML Localization Interchange File Format, which is used by Trados and almost all other major CAT tools (as an import/export format if not natively). XLIFF is for bilingual texts what TMX is for translation memories.

How it works: We begin with our usual XML declaration, followed by the declaration that this is going to be an xsl:stylesheet, including the XSL version and the namespaces for XSL and XLIFF. We also state that we don’t want to see xliff: prepended to any element in the output file. Then we specify the desired output, which is going to be XML and shall be indented for better readability. To define what our XLIFF file should look like, we begin our xsl:template at the uistrings root element (the one which holds all other elements) of our sample XML file.

The first thing written into the new file is its own root element (xliff), together with its namespace, followed by a file and a body element. Then, we begin iterating through our strings from the XML file: For each string, we write one trans-unit carrying the id as its attribute. Each one will contain one source and one target element with the content of the original text and translation elements. Then we end the loop, neatly close our body, file and xliff tags and end the xsl:template. And that’s also the end of our xsl:stylesheet. Easy, isn’t it? You just need to know what your desired file must look like and insert the content into the corresponding places; the for-each statement does the rest.
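The steps above can be sketched as the following stylesheet. It is a reconstruction under assumptions, not the original listing: it targets XLIFF 1.2, it declares the XLIFF namespace as the default namespace (one way to keep prefixes out of the output), and it assumes the root attributes are called sourcelang and targetlang and that the input file is named test.xml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <!-- Output is XML, indented for readability -->
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Start at the uistrings root element of the source file -->
  <xsl:template match="/uistrings">
    <xliff version="1.2">
      <!-- sourcelang/targetlang are assumed attribute names -->
      <file original="test.xml" datatype="xml"
            source-language="{@sourcelang}"
            target-language="{@targetlang}">
        <body>
          <!-- One trans-unit per string, carrying the original id -->
          <xsl:for-each select="string">
            <trans-unit id="{@id}">
              <source><xsl:value-of select="text"/></source>
              <target><xsl:value-of select="translation"/></target>
            </trans-unit>
          </xsl:for-each>
        </body>
      </file>
    </xliff>
  </xsl:template>
</xsl:stylesheet>
```

Note that xsl:value-of copies only the text content, so any inline markup inside text or translation would be flattened; a file with nested tags would need xsl:copy-of or dedicated templates instead.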

Convert to XLIFF, translate, convert back

Now let’s see if it works! You can download Apache’s Xalan XSLT processor either as a C binary or as a Java app (Xalan-C / Xalan-J). Personally, I find Xalan-C less hassle: You download the xalan-comb-… package for your system (usually x86-windows or amd64-windows) right from here, extract the archive, drop the contents of the Xerces directory into the Xalan directory (merging the folders bin, include and lib) and there you go. There are other XSLT processors, but Xalan is open source, free, libre and easy to work with.

Once you are done extracting (no real “installation” required), copy the above XML file code (the first code box) and paste it into an empty text file. Save that as test.xml. Likewise, copy the code from the last sample XSL sheet and save that as test2xliff.xsl. From the Xalan “bin” directory, run: xalan.exe -o testoutput.xlf test.xml test2xliff.xsl – be sure to include the full path to where you saved the test files, e.g. xalan.exe -o C:\Users\Me\Documents\XMLtest\testoutput.xlf C:\Users\Me\Documents\XMLtest\test.xml C:\Users\Me\Documents\XMLtest\test2xliff.xsl

Subsequently, you can open the .XLF (short form of .XLIFF) with Trados’ File/Open command and translate it. For me, it worked without a hitch.

Wham!

Now you know how to write an XSL transformation into XLIFF. Will you be able to write a similar XSL transformation to convert the XLIFF back into the original XML file format? Try it out and tell me!
Cheers,
Christopher Köbel from DeFrEnT

13 comments

Tommi Nieminen Posted at 12:12 pm - Nov 13, 2014

That’s a good trick Christopher. I’m looking for a solution for importing Qt Linguist XML files (.ts) into Studio, and so far the best option seems to be making a new filetype with the Studio SDK, which is quite a lot of work. However, that way the whole process of conversion is invisible to the person preparing the files. I just wish it was possible to create custom bilingual XML file types inside Studio in the same way as normal XML file types.

Christopher Köbel Posted at 5:04 pm - Oct 1, 2015

Hello Dark(ness my old friend?),

I was just short of not allowing that comment because it’s obviously an advertisement… but since it is actually a relevant ad that might interest some of the readers here, I’ll allow it. This time.
As to your proposed SaaS cloud solution: well, I am like most Germans nowadays, wary of cloud solutions. Not only do they require a ’net connection to work at all (a clear plus for locally installed software: it only needs a power plug or, in the case of laptops, enough battery power), they also raise confidentiality issues. I have to trust the cloud service provider not to siphon off any client data, plus all the intermediate server operators and the SSL protocol, which doesn’t look as safe as it once did. That is why I have come to like working offline a lot more in the last few years, and why I won’t use it, as good as it might be on the UI/UX and functionality sides. Likewise, I will never use Dropbox or another file hoster to transfer data from/to clients, and wherever I can, I’m nudging my clients to get themselves email certificates to close that gaping security hole, too. For others who don’t mind, PoEditor dot com might work very well, so good luck!

Bill White Posted at 11:18 pm - Aug 14, 2017

We are new to Trados. I see you have written a very nice explanation in this post, and I was wondering if you could provide a similar example for the following:

We would like to “hide” certain nodes in our XML that represent items that should not be translated, such as a radio button being set to “yes”. Some of our translators continue to translate this node despite being told not to.

Christopher Köbel Posted at 10:28 am - Aug 15, 2017

Hello Bill,

Depending on your XML structure and DTD, you can either use existing attributes (XLIFF) to separate translatables from non-translatables or you will have to make up and insert attributes that allow for this distinction (but if the XML is being validated, these new attributes must be in the relevant DTD or validation will fail). With Trados, a custom XML file filter is what you are looking for, because if nodes either have a unique name (such as <button>) or are marked with a recognizable attribute (e.g. <text editable="no">), correctly set parser rules will either hide non-translatables from translators or lock them in the Editor.

So, assuming all your translators are using Trados, there are two cases:

1. You are sending your translators XLIFF files, created, for example, by the above procedure. In this case, the standard setting of Trados’ XLIFF file filter is to lock segments whose XLIFF status is set to state='final' as well as those with a translate='no' attribute. If those are set on the relevant tags, translators should not be able to edit them in the Editor. I haven’t tested it, but I could imagine that translators can manually unlock the relevant segments. At that point, though, the whole situation shifts from negligent to malicious behaviour, and I’d seriously reconsider the collaboration. Nevertheless, this would be my preferred way, since XLIFF is a standard for translation and has widely understood built-in “switches” to set translatables apart from non-translatables.

2. You are sending them the original XML file. In this case, your best bet is to create a custom file type definition based on “XML: Any XML” that includes instructions not to translate certain tags (or tags with a certain attribute). More on setting the relevant parser rules can be found in the Trados Help for Trados 2017 or this older help article on parser rules. Then export the definition so that you can send it along with the XML, and tell translators to use that definition only (by importing it into their Trados and disabling the other XML file filters before loading the file).
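For the XLIFF case, a minimal sketch of the two markers in XLIFF 1.2 could look like this. The file name, IDs and strings are invented for illustration; in XLIFF 1.2, translate is an attribute of trans-unit and state is an attribute of target. Whether a given Trados version locks these segments depends on its XLIFF filter settings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="form.xml" datatype="xml"
        source-language="en" target-language="de">
    <body>
      <!-- Not to be translated at all, e.g. a radio button value -->
      <trans-unit id="radio1" translate="no">
        <source>yes</source>
      </trans-unit>
      <!-- Already translated and signed off -->
      <trans-unit id="label1">
        <source>Save</source>
        <target state="final">Speichern</target>
      </trans-unit>
    </body>
  </file>
</xliff>
```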

You will have to decide for yourself which workflow will be easier / safer / better for your setting. Hope these hints have pointed you in the right direction.
Christopher
