Unless you only plan to support a translated user manual, the localization work typically involves two parts -- the software user interface and the online help. Both need to be translated, and each requires a quite different approach. It would be possible to write an entire article about software localization, so I will provide only a brief overview here.

When you localize your software's EXE file you change the strings in the program. This can be done in the compiled EXE file or at an earlier stage, before compilation. In most cases, strings will be separated from the core EXE by the compiler, putting all language-dependent text into a separate language DLL. Most modern programming IDEs have this functionality and will also extract a string table of the translatable text. Whether the strings are compiled into the EXE or located in a separate DLL, they are resource strings -- a standard Windows data type that can be read and written independently of the compiler. Translation software for the localization of your program typically reads those string tables and builds a database of matching pairs -- the original language and the translation. Once this database is built, it can more or less automatically translate software updates and will only stop at those strings it doesn't already know.
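The update behavior described above can be sketched in a few lines of Python. This is only an illustration: the resource IDs, the sample strings and the function name translate_table are assumptions made for this example, and a real localization tool stores far more context per string.

```python
def translate_table(strings, memory):
    """Translate a string table from a memory of known source/target pairs.

    strings: hypothetical string table, resource ID -> source-language text.
    memory:  known pairs, source-language text -> translation.
    Returns the translated entries and the IDs a human still has to handle.
    """
    translated, pending = {}, []
    for res_id, source in strings.items():
        if source in memory:
            translated[res_id] = memory[source]   # known string: reuse it
        else:
            pending.append(res_id)                # unknown string: stop here
    return translated, pending


# Pairs collected while translating version 1 (sample data)
memory = {"Open": "Öffnen", "Save": "Speichern"}

# String table of version 2: one new string has been added
v2_strings = {101: "Open", 102: "Save", 103: "Export"}

translated, pending = translate_table(v2_strings, memory)
```

After this run, the two known strings are translated automatically and only resource 103 is left for the translator.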

This approach works quite reliably, once the initial preparations on the development side are in place. The trickiest parts are the separation of the language-dependent text and the preparation of your program's dialog windows for other languages. Here is a quick checklist of things you need to consider:

Is the translatable text in your program hard-coded?

Does your program retrieve output text from a database?

Do you generate information and error messages dynamically?

Do you use formatting strings? (A string like "File %s has been changed! Do you want to save it?" is one string to translate. If you compose this message from two strings, you've got two separate strings to translate. Remember that in some languages the structure of the entire sentence may be different, requiring a different position for the break and the variables.)

Do you use images that are language-dependent?

Are your dialog boxes and text labels large enough to fit other languages? (German or French text is significantly longer than the same text in English.)

Do you plan to localize your software into languages which require Unicode and does your development tool support this?
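The formatting-string item in the checklist above is easier to see with a small example. The sketch below uses Python's named placeholders, so the translator can move the variables to wherever the target language needs them. The messages, the MESSAGES table and the render function are made up for illustration.

```python
# One complete sentence per language. The placeholders are named, so their
# order may differ between languages without any change to the program code.
MESSAGES = {
    "en": "{user} deleted {count} files.",
    "de": "{count} Dateien wurden von {user} gelöscht.",
}


def render(lang, **values):
    """Fill the placeholders of the message for the given language."""
    return MESSAGES[lang].format(**values)


english = render("en", user="Anna", count=3)
german = render("de", user="Anna", count=3)
```

Had the program glued the sentence together from fragments such as the user name plus " deleted ", the German word order could not have been expressed at all.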

Since you are reading this, you are most likely using Help & Manual for your software documentation. One of the most powerful features of Help & Manual is its ability to output to many different formats. And you don't want to lose this feature for the translated documentation, do you? So, in order to keep this feature, you need to translate the source format. A help project not only contains translatable text, it also has a logical structure (topic IDs, associative keywords, context numbers) and meta data associated with the text. But before we go into the details of the pros and cons of the different translation workflows, let's discuss some basic considerations -- these considerations will tell you which translation workflow is necessary or appropriate.

Which languages can you handle?

When you localize your software, even if you only localize the documentation of the software, keep in mind that users will expect you to offer support in this language as well. This is especially important for small independent software vendors (ISVs). You not only need a one-time translation, you will most probably face customer support requests in that language, too. Imagine having to say, "Yes we have a French version, but for support, please write in English". All you will get will be angry customers, because providing the software in their language indicates that you understand it. So, if you plan to localize into languages which you cannot handle yourself, it might be a good idea to look for a local partner who can handle the support. If you make the software and the documentation translation-ready, the partner might also be able to take on the translation. However, this requires a different workflow than your own in-house translation.

Let's have a look at three basic translation workflow models:

The ISV model: You are the developer and translator. Alternatively, you have employees who speak the language and the translation is done internally, ideally with the tools you already have. If you are a native German speaker, you might offer your software and support for it in German and English.

The partner model: You prepare your software to make it ready for translation, but do not translate it yourself. This model is often combined with the first.

The corporate model: You work with professional translators on a project basis. Support might be handled by local distributors, but you are responsible for delivering the localized software. The software gets translated into several different languages -- ideally simultaneously, since you want to be able to release the localized versions without much delay.

I will discuss the three models in the summary at the end of this article. First, let's have a look at the tools available for doing this work. Basically, you can translate text manually with Help & Manual, MS Word, an XML editor, or even with Notepad. Or you could use a computer-aided translation tool (CAT).

2.1 Computer-aided translation (CAT)

There are some myths about automated translation that need to be clarified. First of all, there is no such thing as fully automated translation, at least not yet and not in the next couple of years. Computers still don't understand text. When you translate a website using http://babelfish.altavista.com, this may help you to get an idea of what the content is about, if you don't understand the language used at all. But apart from that, these machine-generated translations are only good for laughs, like this example (sorry, German speakers only):

You don't want to be laughed at, do you? For translation, you need a human translator who understands the target language. Computer-aided translation means translating with a semi-automated approach. Most translation tools on the market fall into this category. A human translator starts with an empty database. He or she imports the source-language content, splits it into manageable pieces (words, special terms, phrases, complete sentences or even groups of sentences) and manually translates the text. The translation software saves both -- the source and the translation -- in a database, a so-called Translation Memory or TM. The idea is that if the same piece of text occurs again in another place, the translator can simply pick the matching entry from the translation memory. Translation tools offer a fuzzy search, so the translated content can at least be used as a basis, even if the source text is slightly different.
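To make the idea of a translation memory with fuzzy search more concrete, here is a minimal sketch that uses the difflib module from the Python standard library. The similarity cutoff of 0.75, the sample sentences and the function name lookup are assumptions for this example. Commercial CAT tools use their own matching algorithms.

```python
import difflib


def lookup(segment, memory, cutoff=0.75):
    """Look up a source segment in a translation memory.

    Returns (matched_source, translation) for an exact or sufficiently
    similar entry, or None if nothing in the memory is close enough.
    """
    if segment in memory:
        return segment, memory[segment]           # exact match
    close = difflib.get_close_matches(segment, list(memory), n=1, cutoff=cutoff)
    if close:
        return close[0], memory[close[0]]         # fuzzy match: needs review
    return None


memory = {
    "Click OK to save the file.": "Klicken Sie auf OK, um die Datei zu speichern.",
}

exact = lookup("Click OK to save the file.", memory)
fuzzy = lookup("Click OK to save the files.", memory)
missing = lookup("Printer not found.", memory)
```

A fuzzy hit returns the translation of the similar sentence, which the translator then adapts by hand. A segment without any close match comes back empty and has to be translated from scratch.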

Translation memories now use standardized formats. Most CAT tools can import and export TMX (Translation Memory eXchange, http://www.lisa.org/standards/tmx/) or XLIFF (XML Localization Interchange File Format, http://www.xliff.org). This makes TMs independent of the translation tool. In theory at least, because there are several revisions of these formats and not all CAT tools can handle the latest revisions. However, a translation memory ensures consistency throughout a translation, which is especially important if you have more than one translator working on a large project. The TM is an integral part of the translation and should be delivered with the translated document.
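Because TMX is plain XML, a translation memory can be read with any XML parser. The following sketch loads the language pairs from a small hand-written TMX sample with Python's standard library. The sample is deliberately simplified and was not produced by a real CAT tool, and the function name load_pairs is an assumption for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the xml:lang attribute under the XML namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand-written, simplified sample: each <tu> (translation unit) holds
# one <tuv> (variant) per language
TMX_SAMPLE = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save</seg></tuv>
      <tuv xml:lang="de"><seg>Speichern</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Open</seg></tuv>
      <tuv xml:lang="de"><seg>Öffnen</seg></tuv>
    </tu>
  </body>
</tmx>"""


def load_pairs(tmx_text, src="en", tgt="de"):
    """Return a dict of source text -> target text from a TMX document."""
    pairs = {}
    for tu in ET.fromstring(tmx_text).iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        if src in segs and tgt in segs:
            pairs[segs[src]] = segs[tgt]
    return pairs


pairs = load_pairs(TMX_SAMPLE)
```

Segments in real TMX files may contain inline formatting elements, which this sketch ignores.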

CAT tools are highly specialized pieces of software, and since translation always requires human labor it is generally expensive. So are the tools. The prices range from $300 for simple source code localization tools to several thousand dollars for translation software able to handle all aspects of the translation workflow including teamwork functionality. The market leader Trados, for instance, starts at $895 for a single user license. Here is an incomplete list of translation tools which specialize in document translation (as opposed to the list of software localization tools above):

- For small projects, setup costs are often too high and it's cheaper to translate manually

- Many freelance translators do not use CAT tools

The kind of results you can achieve in practice depends on a number of factors. The more simply your text is formatted and the more you use styles, the easier it is to translate, and hence the lower the costs. Computer-aided translation works very well for software localization, since the source text found in EXE files and language DLLs is usually just that: unformatted plain text. Furthermore, the strings have language-independent IDs that make them easy to identify and the localization tool can handle large parts automatically, once the translation memory for a particular application has been built.

Automated translation of formatted documents such as help files is a bit more difficult. Because the text is formatted, the translation tool has to return a formatted translation of the text. While text attributes such as bold, italic and underlined might be simple to retain, elements like hyperlinks are much more problematic: a hyperlink contains meta data -- the link address, a link caption, a title and perhaps a target window. The translation software must take this into account: it must translate the text including caption and title, while leaving the target address in place.
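The hyperlink problem can be illustrated with a small Python sketch. It replaces the caption and the title of a link from a lookup table and never touches the address or the target window. The HTML fragment, the lookup table and the function name translate_links are assumptions for this example. A real translation tool hides such tags from the translator rather than replacing text blindly.

```python
import xml.etree.ElementTree as ET


def translate_links(fragment, memory):
    """Translate caption and title of every link, keep href and target."""
    root = ET.fromstring(fragment)
    for link in root.iter("a"):
        if link.text in memory:
            link.text = memory[link.text]           # visible link caption
        title = link.get("title")
        if title in memory:
            link.set("title", memory[title])        # tooltip text
        # href and target are meta data and stay exactly as they are
    return ET.tostring(root, encoding="unicode")


SAMPLE = ('<p>See <a href="install.htm" title="Installation guide" '
          'target="_blank">Installing the program</a>.</p>')

memory = {
    "Installing the program": "Programm installieren",
    "Installation guide": "Installationsanleitung",
}

translated = translate_links(SAMPLE, memory)
```

The sketch handles only the link itself. The surrounding sentence would go through the normal segment translation.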

How do translation tools do that?

Appropriate source formats: feeding CAT tools

Most translation tools accept standard formats such as DOC, RTF, HTML and XML for input and output. Plain text is not appropriate, unless the topics in your help project are unformatted text without images and links.

DOC and RTF as input

Help & Manual creates RTF manuals and most translators are familiar with Microsoft Word. Even if they use a CAT tool, the software will most probably be able to handle this format. However, if you export with Help & Manual to RTF, you lose much of the original content. The RTF export is designed for creating printed manuals and Help & Manual will export only those parts of your help project that go into the user manual. Conditional text? Forget it!

HTML as source format

HTML is slightly better. It's a tagged standard format, Help & Manual separates styles from content and can import HTML very well. But there is still no way to include all the variants of your conditional text in the HTML output. Specialized topics, such as context-sensitive popups, may not be recognized when you re-import the translated HTML.

Furthermore, Help & Manual internally separates the topic templates from the topic content but the two get combined when you export to HTML. The translation tool doesn't know which part belongs to the topic body and which to the header and it can't distinguish the template from the content. This will hit you when you re-import the translated HTML pages: the separation of topic content and templates is gone.

XML for translation

The XML support in Help & Manual was designed -- not only but primarily -- for translation. With XML it is possible to describe the content and structure of a help project completely, so that no data is lost in the translation process. Moreover, Help & Manual's XML schema is clearly structured so that it is immediately obvious what is translatable text and what is not. If you plan to get the translated manual back into Help & Manual, use XML.

2.2 Manual translation with external editors

So far we have only discussed computer-aided translation (CAT). But what if you don't use translation tools at all? Which editor is appropriate for translating the content? Theoretically, you can use any XML editor. An XML editor displays the XML structure and lets you edit the tags. XML is a standard format, but that doesn't mean that a particular XML schema is automatically understood.

The XML format tells the XML editor which tags it contains and which attributes they have. The additional XSD schema file that Help & Manual automatically creates tells the XML editor how to handle the tags, which tags are required, which are optional and which values are allowed for a particular attribute. XSD schema files are the successor of DTDs (Document Type Definition), an older standard for content description. XSD files do the same as the former DTDs, but they are themselves XML documents and support a much more detailed description of the content.

However, even a schema file doesn't tell the editor how to display the content. In other words: what you take for granted in Help & Manual -- the WYSIWYG display -- is something an XML editor knows nothing about. It cannot, because there is no standard for that.

(A common misconception is that XML describes the content, so an XML editor must know how to display it. Actually, XML is completely abstract and only defines how data is described. It contains no information at all about what the data is about or how to display it.)

However, the more powerful XML editors like XMLSpy do have WYSIWYG editing capabilities. Since there is no standard for that, each vendor creates their own solution for WYSIWYG display and how useful the results are depends entirely on the capabilities of the tool. Also, configuring an editor for a particular XML schema is always a lot of work and usually involves creating a special customized template of some kind.

Help & Manual also exports an XSL stylesheet along with the XML data files. Now, you might wonder if this stylesheet can be used by an XML editor. It can, but only for display. The XSL file does an on-the-fly transformation of the XML data to HTML. That's why you can open an XML topic in your browser and see formatted text. But XSL transformations are a one-way street, they transform the original XML into something else. What you see in the browser is "instant HTML", it's not XML anymore. So this XSL stylesheet is useful for proofreading, but that is really all. First, the translator gets an idea of what the content looks like. Also, you can use the XSL stylesheet to visualize the translated topic in the same way, to check if everything is correct. The editor, however, needs to edit the original XML data, since you want it to return XML again, and not HTML. Let me repeat this: the XSL stylesheet is for proofreading, not for editing.

Pros and Cons of external XML editors

+ XML is a standard data format

+ You can use any XML editor to edit it

+ Help & Manual supports XSL style sheets for WYSIWYG proofreading

- For WYSIWYG editing you still need to tell the editor how to handle the tags (if this is possible, which isn't always the case)

- License costs for the XML tools

- No explicit support for synchronization of updates

- Translator may not be familiar with the tool

- Damage of XML data by the editor is possible (testing cycles necessary)

- Microsoft Word is not appropriate for XML editing

Synchronization of updates

One point on my pros & cons list -- support for synchronization -- is less obvious when you start localizing your documentation. But if you are not prepared, it will hit you as soon as you want to produce an updated version of your project.

Imagine you have completed version 1 of your software, the online help in English is finished and has been translated into French as well. You got the French version back into Help & Manual and everything went smoothly. That's all fine. One year later, you bring out version 2. You have updated the English help file which contains a few dozen new topics, many topics have changed and some have been removed. The structure of the table of contents has changed, too. The remaining point on your to-do-list is to update the changes in the French user manual. Now what?

You certainly don't want the translator to start all over again. Since many topics remained unchanged, it makes sense to use the old translation. Even those topics which have changed aren't completely new, so most of the translated content is reusable.

To update the French user manual, you need the old French v1 source. If you have that in Help & Manual, you can simply export it to XML. The new English v2 manual also gets exported to XML and both will be shipped to the translator. Help & Manual nicely creates a ZIP compressed localization kit for both manuals, ready for deployment. If you look inside that localization kit, you notice that the XML topic files have different time stamps! The time stamp represents the time when you last edited (or created) the topic in Help & Manual -- unchanged topics have an old time stamp which makes them easy to identify. If something goes wrong, the time stamp is also encoded inside the XML file, in the <topic> tag.

The changed topics are the most difficult part, not the new ones. A CAT tool would find that most of the text in a topic actually remains unchanged, since it has a translation memory to look this up. Without a TM, you need to compare English v1 with English v2 to find the differences. It was such a great idea to backup the old help project once it was finished, wasn't it? Now you can restore that old version, create another localization kit and add it to the distribution for comparison. XML files are text and can be easily compared. So, when you update a translation, you are probably dealing with 3 different help projects, a translation triple jump... Keep that in mind when a revision cycle is completed and archive it properly.
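Since the XML files are plain text, the comparison of English v1 with English v2 can be automated with very little code. The sketch below sorts topics into added, removed and changed, and prints a line-by-line diff for a changed topic. The file names and the XML content are invented for this example and do not reproduce Help & Manual's actual schema. In practice you would read the files from the two unpacked localization kits.

```python
import difflib


def classify_topics(old, new):
    """Compare two exports given as dicts: topic file name -> XML text."""
    common = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(name for name in common if old[name] != new[name]),
    }


def topic_diff(name, old, new):
    """Return a unified diff of one topic between the two exports."""
    return list(difflib.unified_diff(
        old[name].splitlines(), new[name].splitlines(),
        fromfile="v1/" + name, tofile="v2/" + name, lineterm=""))


# Invented sample data standing in for the English v1 and v2 exports
english_v1 = {
    "intro.xml": "<topic>\n<para>Welcome.</para>\n</topic>",
    "export.xml": "<topic>\n<para>Export to HTML.</para>\n</topic>",
    "legacy.xml": "<topic>\n<para>Old feature.</para>\n</topic>",
}
english_v2 = {
    "intro.xml": "<topic>\n<para>Welcome.</para>\n</topic>",
    "export.xml": "<topic>\n<para>Export to HTML or PDF.</para>\n</topic>",
    "batch.xml": "<topic>\n<para>Batch mode.</para>\n</topic>",
}

report = classify_topics(english_v1, english_v2)
diff = topic_diff("export.xml", english_v1, english_v2)
print("\n".join(diff))
```

The report tells the translator which French v1 topics can be reused untouched, and the diff shows exactly which sentences of a changed topic need attention.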

Instead of using an external comparison tool, you can also synchronize the content in Help & Manual directly. This requires some preparations when you start translating version 1. More on that in the next chapter: Synchronization in Help & Manual.

Late binding

Almost every software project involves late modifications that make it necessary to update the online help. And as we all know, this often happens less than 24 hours before release. If the documentation is already translated when it happens, a complete XML export/import cycle might be overkill. However, this isn't necessary because Help & Manual can also export and import single topics: in the Topics menu, select Save Topic to File and choose the H&M XML topic format to save a single XML file. You can then use the Load Topic from File option in the same menu to insert an updated single topic into the localized help project.

2.3 Translating in Help & Manual

The pros and cons of external editors inevitably raise the question of whether it would make sense to do the translation work in Help & Manual itself. This is not a plug for our own product, just a description of what is possible -- it's up to you to decide which route you want to take. The fact is, Help & Manual is an XML editor that you already have and are familiar with, and it offers a full WYSIWYG editing environment. WYSIWYG editing in XML editors is theoretically possible but it means creating customized templates for the editor you are using, which is a complex programming task. Let's see what Help & Manual offers as a translation tool:

H&M has a very quick learning curve, since its editor is very similar to MS Word. The help project is clearly structured and simple to maintain. So, to get started you could simply make a copy of your original help project and start translating it. On the down side, there is no support for computer-aided translation within Help & Manual. So if your translators are already using a CAT tool then by all means go the XML route, this is definitely the better option if it is available. If you or your translators are not using a CAT tool then doing your translation in Help & Manual is a viable alternative to using external XML editors, providing a rich and familiar editing environment.

Getting started

When you start translating in Help & Manual it is important to keep in mind that you will want to update the translated help file later on. Always remember that you will need to synchronize the language pairs once you have updated the original source.

H&M has a dedicated function for that: Project Synchronization in the Tools menu. This tool performs two tasks: First, it creates a copy of your original project for translation. Later, when you come to the update cycle, it will update the translation copy to synchronize it with the updated source.

It is important to use the Project Synchronization tool to create the copy of your original project for the translator. This ensures that all the IDs are the same in both projects (the visible topic IDs and hidden IDs for TOC entries and the unchanging internal topic identifiers). Help & Manual needs these IDs to be able to detect changes and synchronize them correctly. For instance, if you add a new topic in the English project and do the same (manually) in the French twin project, you will end up having different IDs in the TOC and this will cause synchronization problems.
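Why matching IDs matter can be shown with a simplified model. The sketch below pairs the topics of two projects purely by their ID, which is the general idea behind any synchronization of this kind. It is not a description of how Help & Manual works internally, and the IDs, captions and the function name sync_plan are invented for this example.

```python
def sync_plan(source, translation):
    """Pair topics of two projects by ID (dicts: topic ID -> caption)."""
    return {
        "paired": sorted(source.keys() & translation.keys()),
        "missing_in_translation": sorted(source.keys() - translation.keys()),
        "unknown_in_translation": sorted(translation.keys() - source.keys()),
    }


# English project: a third topic was added for version 2
english = {"t001": "Introduction", "t002": "Installation", "t003": "Exporting"}

# French twin: the same topic was added by hand and received its own ID
french = {"t001": "Introduction", "t002": "Installation", "x917": "Exportation"}

plan = sync_plan(english, french)
```

The captions play no role in the pairing, which is why translated topics still match their originals. The manually added French topic, however, cannot be recognized as the twin of t003: it shows up as an unknown topic while t003 is reported as missing.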

Synchronization in Help & Manual

When the first release is done and you are going to update the French version 1 from the new English version 2, use the Project Synchronization tool again. If you created the initial copy from the original project, Help & Manual will properly synchronize the language twin, so that -- technically -- it exactly matches the source file. It makes sure that all topic IDs are available and named correctly, that help context numbers are correctly assigned and that A-keywords are properly in place.

The Project Synchronization tool also highlights topics and TOC entries that have changed since the last revision, making it easy to identify topics which need a review or need to be translated from scratch. For more information on this topic, please also read the chapter The Project Synchronization in the online help: http://www.helpandmanual.com/help/index.html?hm_advanced_tools_projsynch.htm

Pros and Cons of using Help & Manual for translation

+ Integrated "WYSIWYG" editor

+ Can be instantly used (no configuration of external editor)

+ Quick learning curve

+ Damage to the XML code and project structure is unlikely

- No support for computer-aided translation

- Extra license for external translator required

- Translator may not yet be familiar with the tool

3 Summary - recommendations for the translation work flow

Unless Microsoft updates MS Word to make it a true XML editor that can handle any XML document with some kind of "WYSIWYG-like" display we only have the options described above. To summarize: If you or your translators already have a CAT tool then use it, this is the best option. If you don't already have a CAT tool, you are left with 3 options: Consider a CAT tool, use an external XML editor or use Help & Manual as your translation tool.

Back to our three basic translation work flow types:

The ISV model

If you do your translation in-house and without computer-aided translation tools then I strongly recommend that you stay with Help & Manual. H&M has everything you need to translate and synchronize the documentation. An external XML editor simply doesn't make any sense in this case. The cheap ones don't have WYSIWYG editing capabilities and the advanced tools are expensive -- considerably more expensive than an additional H&M license for the translator. (Note that the translator only needs the Standard version of H&M, which can also be used to edit all projects created with the Pro version.)

If you are already using a CAT tool, then export your help project to XML, translate it with the CAT tool and re-import the data into Help & Manual.

The partner model

Here everything depends on the tools your partner uses. If they have a CAT tool, then they should use it, working on the XML output generated by H&M. If they don't have a CAT tool but are familiar with XML and have the tools to edit the help project in XML format, this would also be a viable alternative. If your partner does not yet have any suitable tool then Help & Manual is also an alternative worth considering, with the added benefit of zero configuration -- it can edit your projects directly.

The corporate model

If you are going to translate your project into several languages simultaneously using the services of professional translators or translation agencies, you can generally assume that these people already know how to deal with XML. If you are translating multiple projects or multiple technical documents rather than a single help project then CAT tools should really be considered a must. They help to ensure consistency of your terminology and style across multiple documents.