Introduction to Software Translation for MFC

An introduction to software localization and translation with issues specific to MFC development.

Introduction

Recently, I have been involved in localization of software applications for global markets. Although software localization and translation is usually (and hopefully) less complex and less expensive than the original development of the application, it is still a complex issue, and it can be difficult knowing how to get started.

In this article, I am summarizing some of the information I would have liked to have immediately available when I first considered localizing applications. This article is primarily targeted to programmers considering translation and localization of applications. It is intended to help you make the decision to proceed, and to point out some of the unexpected pitfalls you may encounter. It is by no means all the information you need on software translations and localization.

The information comes from research and my own experiences. As my experiences are incomplete, and limited to Latin languages, this article is limited to these areas as well. As I gain more experience with more complex translations, I hope to post updated articles on this subject. Of course, other programmers are welcome to add their comments and insights. I will apologize in advance for anything I say that is incorrect. None of this information is guaranteed accurate. After all, it is free!

It should be noted that there is a difference between translation and localization.

Translation is translating the application text to a different language, e.g. English to French.

Localization is customizing an application for a specific country or location, e.g. American English to U.K. English.

Of course, if you are translating software from English to French, you are probably localizing it as well.

Finally, I am using a writing style where the original language is assumed to be English. I hope this won't offend the rest of the planet... it is just easier!

When should you do the translation?

If at all possible, you should thoroughly complete and test the English version before beginning translation. It is much easier. Of course, the English version should have all strings in resources, and all date/time values should be formatted using system settings. This helps to eliminate bugs that could appear when moving strings into resources.

However, there may be good business reasons to develop English and translated versions simultaneously. Just remember that last minute changes to the user-interface will be even more difficult to accommodate than if you were developing in English alone. In this case, use of a translation toolkit is even more helpful.

How much translation do you want to do?

There are different degrees of software translation. Before beginning the translation, you should decide exactly how much of the application will be translated. For example, you may just want to translate the interface (menus, dialogs and resource strings), but not the documentation. Is it OK to have some obscure error messages in English? Maybe you want to translate on-line docs, but not printed docs. What about the installation program? All these components must be considered. You may even want to ship one version of the program with the resources for each language in a separate DLL, allowing the user to select the language for the interface.

Of course, if you only translate the interface, you end up with documentation that says "Select Import from the File menu", and the user has no "File" menu and no "Import" option. I think this is one reason that Microsoft has gone to "What's This" help, even though users never seem to use it or know that it is there.

Preparing Your Application for Translation

Writing an application that you know will be translated is not as easy as writing one for a single language. String issues are the most complex, but there are also issues with dialog sizes, icons and accelerator keys.

String Resources

You probably know that one of the main reasons for using string resources is to simplify translation of the product, so make use of this feature. All strings displayed to the user, including formatting strings, should be retrieved from the resource string table.

Here are some other string handling tips:

Try to use consistent error messages. For example, the following error messages all say the same thing in principle, but could all be worded exactly the same to be more efficient:

"The file could not be opened."

"Failed to open file."

"Failed to open the selected file."

Even if you have to generalize an error message somewhat, it may be worth it to simplify translation. Most programmers write vague and cryptic error messages anyway (but that's a whole other topic).

Try to minimize the number of %d and %s formatting items in each string. It is possible that a string with two %s strings will require the strings to be displayed in reverse order when translated. Of course, this is very difficult to handle within the code. Minimizing these complex strings reduces your chances of this occurrence.

Try to make strings that make sense on their own. Your translator is not always going to understand the context in which each string is used. Short and vague strings are sure to generate a call from your translator asking for clarification about what it is trying to say.
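The word-order problem mentioned in the tips above is easier to live with when the placeholders are numbered, so the translator can reorder them inside the string without any code change. The following is only a minimal sketch of that idea: FormatPositional is a hypothetical helper, not an MFC or CRT function. On Windows, the FormatMessage API and CString::FormatMessage support a similar numbered-insert style.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: replaces %1..%9 in 'pattern' with args[0]..args[8].
// "%%" yields a literal '%'. Placeholders without a matching argument are
// left in the output unchanged.
std::string FormatPositional(const std::string& pattern,
                             const std::vector<std::string>& args)
{
    std::string result;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '%' && i + 1 < pattern.size()) {
            const char next = pattern[i + 1];
            if (next == '%') {
                result += '%';
                ++i;            // skip the second '%'
                continue;
            }
            if (next >= '1' && next <= '9') {
                const std::size_t index = static_cast<std::size_t>(next - '1');
                if (index < args.size()) {
                    result += args[index];
                    ++i;        // skip the digit
                    continue;
                }
            }
        }
        result += c;
    }
    return result;
}
```

With this approach, an English resource string such as "Copied %1 to %2." can be translated with %2 appearing before %1, and the calling code stays the same.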

Dialog Sizes

Translating English to another language can increase text size by 30% to as much as 100%, based on my experience. You will need to allow for extra space in your dialog boxes. Of course, you will be able to resize your dialogs before rebuilding the translated application, but it will be much easier if most of the dialogs and controls have sufficient size to begin with. You will have to balance funny-looking English dialogs against the amount of work involved in resizing translated dialogs.

Localizing MFC Print Preview

When localizing resources, don't forget about the print preview dialog bar and MFC print dialogs. Microsoft provides these resources in a variety of languages. Look in the VisualC\Mfc\Src directory. There is a sub-directory for each language. French resources are in l.fra.

These print preview resources are #include-ed into your RC file with the following code:

Open your RC file as a text file to find this code section. In the example shown above, I am including the French resources for standard components and print preview. This is indicated in the following lines:
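The listing itself is not reproduced in this copy of the article. As a rough guide only, the section that AppWizard generates near the end of an RC file typically looks something like the fragment below when switched to French. Treat the details as assumptions: MyApp is a placeholder project name, and the exact guards may differ between Visual C++ versions.

```rc
#if !defined(AFX_RESOURCE_DLL) || defined(AFX_TARG_FRA)
#ifdef _WIN32
LANGUAGE 12, 1
#pragma code_page(1252)
#endif //_WIN32
#include "res\MyApp.rc2"        // non-Microsoft Visual C++ edited resources (MyApp is a placeholder)
#include "l.fra\afxres.rc"      // Standard components
#include "l.fra\afxprint.rc"    // printing/print preview resources
#endif
```

The two lines that matter for print preview are the includes of l.fra\afxres.rc and l.fra\afxprint.rc. LANGUAGE 12, 1 is the numeric form of LANG_FRENCH, SUBLANG_FRENCH.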

Localizing Property Pages

The Good News: The text on the standard property sheet and wizard buttons (such as OK, Cancel, Back, Next and Finish) appears in the language of the operating system, so you don't have to translate it at all.

The Bad News: If your application is in a different language than the operating system, property pages and wizards will appear in mixed languages.

In one of my applications, I chose to derive a class from CPropertySheet and use it for all of my wizards and property sheets. Then I set each button caption to the correct text for the language of the application, not of the operating system. It helps to give a more consistent look when the OS and app language are not the same.

Sorting and Strings

Here is a question for you... If Chinese has no alphabet, how do you sort strings?

Well, I have asked several Chinese speakers and never gotten a good answer to this question, but I do know this... sorting strings is the bane of many software translation projects, and it is not just a problem with Chinese. Almost every western European language makes use of accented characters, characters with hats, umlauts, or the funny German double s. Most of these characters fall in the 128 to 255 range of the standard Western (ANSI) code page, above the 7-bit ASCII range. Consequently, words with these characters may not get sorted correctly by a plain byte-value comparison.

The solution is to make sure that all of your string comparison routines make use of the current user's system locale information. The documentation is a little weird, but as far as I can tell, CString's Compare function and its comparison operators do not account for the locale setting, while CString::Collate does. Otherwise, you must use strcoll and related functions. You had best test any sorting algorithms for success with each locale.
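As a minimal sketch of the strcoll approach, the function below sorts a list of strings with the collation rules of the current C locale. SortByLocale is a hypothetical name used only for illustration, and the sketch assumes narrow (char) strings in the encoding of the active locale.

```cpp
#include <algorithm>
#include <cstring>
#include <string>
#include <vector>

// Sorts 'items' in place using the collation rules of the current
// C locale (category LC_COLLATE) instead of raw byte values.
void SortByLocale(std::vector<std::string>& items)
{
    std::sort(items.begin(), items.end(),
              [](const std::string& a, const std::string& b) {
                  return std::strcoll(a.c_str(), b.c_str()) < 0;
              });
}
```

Note that a program starts in the "C" locale, where strcoll behaves like strcmp. To pick up the user's settings, call setlocale(LC_ALL, "") once at startup before sorting.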

Handling Date and Time

Even if your application is only in English, you should display date and time values using the Windows system settings for that user. Just because the user has an English application does not mean they are on English Windows, and there is nothing worse than trying to work with dates and times that are not formatted the way you like.

I try to use COleDateTime or COleDateTimeSpan whenever possible. Formatting COleDateTime with system settings is very easy.

Keep in mind that Windows allows the user to set system settings for both the long date and the short date. Here is a function to convert a COleDateTime to a string in the long date format set by the user's system settings.
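The COleDateTime listing is not reproduced in this copy. As a rough, portable stand-in for the same idea (format with the user's settings rather than a hard-coded pattern), the sketch below uses only the C++ standard library. It is not the author's function: FormatDateForLocale and UserLocaleOrClassic are hypothetical names, and the standard library's "%x" does not distinguish long and short dates the way Windows does. On Windows itself, the GetDateFormat API with the DATE_LONGDATE flag is the call that honors the user's long date setting.

```cpp
#include <ctime>
#include <iomanip>
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// Formats the date part of 'when' using the preferred date
// representation ("%x") of the given locale.
std::string FormatDateForLocale(const std::tm& when, const std::locale& loc)
{
    std::ostringstream out;
    out.imbue(loc);
    out << std::put_time(&when, "%x");
    return out.str();
}

// Returns the user's preferred locale, or the classic "C" locale
// if the environment names a locale that is not available.
std::locale UserLocaleOrClassic()
{
    try {
        return std::locale("");
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}
```

A call such as FormatDateForLocale(someTm, UserLocaleOrClassic()) then produces whatever date layout the user's environment specifies.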

Error Messages

Bad error messages are one of my biggest complaints about software, even when written in English. But who wants to translate a lot of cryptic error messages that the user will probably not see anyway? Well, you can receive many system error messages in the language of the operating system by using the FormatMessage function, passing to it the result of a call to GetLastError. Refer to the SDK on these functions for further information.

How and Why to Build a Unicode App

What are Unicode and MBCS?

Unicode and MBCS are character encodings that allow for more than 256 characters, for languages such as Chinese and Japanese. In Unicode as Windows uses it, almost every character is 2 bytes. In MBCS, some characters are one byte, and some are two, and I think some may be more. This makes it very difficult to determine how many characters are in a string. In short, you want to use Unicode, not MBCS!

When using Unicode, the big thing to be careful of is making sure that you pass the number of characters in a string to functions that require the number of characters, and the size in bytes of a string to functions requiring the size in bytes. With ANSI, these values are the same, but not with Unicode.
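The difference between the two measurements can be sketched with a wide string. CharCount and ByteCount are hypothetical helper names; the point is only that the byte size is the character count multiplied by sizeof(wchar_t), which is 2 on Windows.

```cpp
#include <cstddef>
#include <string>

// Number of wchar_t units in the string: the value to pass to
// functions that ask for a length in characters.
std::size_t CharCount(const std::wstring& text)
{
    return text.size();
}

// Number of bytes occupied by those units (terminating null excluded):
// the value to pass to functions that ask for a size in bytes.
std::size_t ByteCount(const std::wstring& text)
{
    return text.size() * sizeof(wchar_t);
}
```

Passing the result of CharCount where ByteCount is expected truncates the data, and the reverse mistake overruns the buffer.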

How to Build A Unicode App

Building a Unicode app is not really difficult. You will have to install the Unicode MFC libraries with Visual C++. I don't believe these are installed by default.

You need to read the article "Unicode Programming Summary" on MSDN. Search for the term wWinMainCRTStartup. It also explains how to handle strings in Unicode, and how to use generic-text functions like _tcslen instead of strlen.

I wrote a program that searches through source code for non-Unicode safe function calls and writes the info to a log file. It also replaces all string literals such as TRACE("Hello there"); with Unicode-safe string literals such as TRACE(_T("Hello there"));. It is not clean enough to post here, but I hope to someday. If you have a lot of code to convert to Unicode, developing such a utility is worth your effort.
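The author's utility is not shown, but the first half of the idea (flagging calls that are not Unicode-safe) can be sketched in a few lines. Everything here is an assumption made for illustration: FindAnsiOnlyCalls is a hypothetical name, the list of functions is deliberately short and incomplete, and the scan does not skip comments or string literals.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical, incomplete list of ANSI-only CRT calls to flag.
static const char* const kAnsiOnlyCalls[] = {
    "strlen", "strcpy", "strcat", "strcmp", "sprintf"
};

static bool IsIdentChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Returns the names from kAnsiOnlyCalls that occur in 'line' as whole
// identifiers, so "strlen" matches but "my_strlen" does not.
std::vector<std::string> FindAnsiOnlyCalls(const std::string& line)
{
    std::vector<std::string> hits;
    for (const char* name : kAnsiOnlyCalls) {
        const std::string needle(name);
        std::string::size_type pos = line.find(needle);
        while (pos != std::string::npos) {
            const std::string::size_type end = pos + needle.size();
            const bool startOk = (pos == 0) || !IsIdentChar(line[pos - 1]);
            const bool endOk = (end == line.size()) || !IsIdentChar(line[end]);
            if (startOk && endOk) {
                hits.push_back(needle);
                break;  // report each name once per line
            }
            pos = line.find(needle, pos + 1);
        }
    }
    return hits;
}
```

Run over each line of a source file, the result can be written to a log together with the file name and line number.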

Why to Build a Unicode App

Unicode applications will not run on Windows 95 or 98. They only run on Windows NT and 2000 (actually, an app can use Unicode internally on Win9x, but cannot pass Unicode strings to Windows API calls). The most recent book from Microsoft on internationalization (see the Books section below) recommends that all applications written for Windows 2000 be written in Unicode, no matter what the language.

There is a good reason to make a Unicode version of your English applications available, even if you are not going to localize to Chinese. Suppose you make a graphics application. With Unicode, on Japanese Windows 2000, the user can make the graph titles and text appear in Japanese, even if the application is in English.

Where do I get that Czech version of Windows 98?

You really do need to see and test your final application on Windows of the same language as your translated application. But you can't just go to Wal-Mart to buy Japanese Windows 98, or even order it from MicroWarehouse. It does not seem to be well publicized, but Microsoft offers an MSDN subscription with foreign language versions of all of its operating systems.

You can buy the MSDN subscription through a vendor such as Programmer's Paradise. You basically receive an empty box from them which tells you to call Microsoft to place your order. There are three international packs in MSDN. Last I knew, if you buy Pack 3, you get Packs 1 and 2, so Pack 3 is the one you want. MSDN online has a listing of all the CDs included in each of the international packs. Make sure that you are getting the languages you want. You are well advised to call Microsoft a couple of times to make sure you get the same answer from several different representatives. Even so, they never sent me Russian Windows. After a couple of months, I called to find out why. They said I had to call them to request it. It was not sent automatically. But after I called, they sent it at no additional cost. The bottom line is: make sure you receive all the disks you expect, and call MS if you don't!

How about localized hardware?

MSDN is the source for your Japanese Windows 2000, but what about that Japanese keyboard? This is one of those questions to which I have no good answer. I was able to locate Russian and French keyboards here. Also, last time I talked to Micron, they said they were coming out with computers made for Arabic. As more and more companies are going "e", it should be easier to locate foreign language hardware on the Internet. You are just going to have to look for it.

But does the hardware matter? Testing on localized hardware is probably the least important aspect of the localized testing cycle. However, I understand that Japanese computers have some hardware nuances that may cause problems. Don't take my word for it, do your own research and evaluate the importance of localized hardware testing for your own applications.

What about right-to-left reading languages?

Arabic and Hebrew read right to left, adding another layer of complication for your translated applications. I have never worked with these languages, so I have no further information. I know that there are companies that specialize in Arabic software translation. Search the Internet for more information (coming here was a good start)!

Grammatical Issues with Non-English Languages

The technical issues are not the end of your translation adventure. Most other languages have grammatical complexities that must be considered in translation.

Noun Gender

Almost every application has a "New" command on the File menu. New what? Well, new file, of course. In English, it doesn't matter what the new item is; New is always New. But in Spanish, is it Nuevo or Nueva? Here it depends on what the "what" is. If the "what" is masculine, we want Nuevo. If the "what" is feminine, we want Nueva. In many of my applications, there are "New" buttons on many different dialogs. It is important that the translator understand what the new item is so that the correct gender is used.

Of course, you could just decide to use masculine by default. This is a design decision that you will have to make.

Is it a command or a description?

In English, suppose you have a phrase such as "Open this Document". This could be a command, telling the user what to do. It could also be a description of a menu task, such as that which appears in the status bar when you highlight a recently opened file in the menu. In English, the phrase is the same for both the command and the description. But this may not be true for other languages.

I was recently working with a French translator on translating string resources for a product. My string resources included descriptions of menu items that appear in the status bar when you highlight a menu item. String resources also include captions for my Open and Save common dialog boxes, such as "Select A File to Import". The former is a description of a command, the latter is a command. In English, the same text could be a command or a description of a command, but in French, the text would be different depending on the context. It is important that your translator understands the context of each term.

Just recently, we began implementing a possible solution to this problem. Instead of prefixing all string identifiers with IDS_, we are implementing command identifiers with IDSC_ and description identifiers with IDSD_ as shown below:

#define IDSC_OPEN_TEXT_FILE "Open Text File" // this is a command
#define IDSD_OPEN_TEXT_FILE "Open Text File" // this is a description

With this naming scheme, the translator can look at the identifier for the string to determine the context in which the string is used.

Is that a noun or a verb?

In English, many nouns and verbs are the exact same word. For example, the verb "to call" will appear as "Call" on a menu item or button. In this case, it is used as a verb. But "Call" is also a noun. In English, they are the same. But in almost any Romance language, the noun and verb are different. Your translator will always have to know the context in which each term is used.

A potential solution to this problem is to use different string identifier prefixes for nouns or verbs as described above for commands and descriptions.

Books

There are only a few books available on software translations. I have three, and I got them all from amazon.com. Fortunately, they are all quite different and there is very little overlap in subject matter. They are:

International Programming for Microsoft Windows by David A. Schmitt, published by Microsoft, 2000.

This one just came out at the time of writing, so I haven't read it yet, but it looks pretty good. It spends a huge amount of time on locales. It also covers Unicode well, and the new localization issues with Windows 2000.

A Practical Guide to Software Localization by Bert Esselink, published by John Benjamins Publishing Company, 1998

This book spends quite a bit of time covering translation tools, translating on-line help and translating documentation. It also covers Macintosh translation, project management and Visual Basic.

The authors came out of Borland when the company was in its prime (OWL used to be preferred over MFC!). This book spends a lot of time discussing how to create your own locale, which most people don't need to do, but which is very useful if you need to do it. It also discusses European and Asian localization specifically, keyboard configurations and Unicode. Don't let the publishing date fool you, the information is still useful.

Finding a Translator

Finding someone to do the translation may be the most difficult aspect of the translation. Not only do you need someone fluent in both languages, but they need to be comfortable with computers as well. If your application is targeted to a specific market, such as the medical industry, you will need a translator who is bilingual in medical terminology as well.

Translation Services: One method is to contract to a translation service. The advantage is that you get experienced and professional translators. The disadvantage is the high cost. There are many companies around the world that offer translation services. Most have web sites that can be located from your favorite search engine.

Independent Translators: Possibly less expensive than a translation company is an independent professional translator. Many can be located on the Internet from web sites that specialize in translation resources, and in translation newsgroups.

The Local University: For low budget translations, you can advertise at a local university. The advantage is that most large universities have many bilingual students and student translators may work cheap. The disadvantage is that the students may be unreliable and will probably give their schoolwork higher priority than your translation.

Keep in mind that many translators may want to charge by the number of words. This may be to your disadvantage because of the high levels of repetition of words and phrases typical of software. Also, use of software translation tools can automate much of the translation of the repetitive words and phrases. This also helps to maintain consistency within the translated application.

Glossaries -- What is Microsoft's French term for "status bar" or "print preview"?

Naturally, you want your French application to have a similar look and feel to Microsoft's French applications (OK, maybe you don't). So you will want to know what Microsoft calls the status bar in French, and the French menu item for Print Preview.

Microsoft is nice enough to provide glossaries of all the words and phrases used in their applications, along with the translated term in many different languages. The files are available for free from Microsoft's Web site (but they are quite large), and they are available on MSDN (international pack at least). What you get are comma-separated value (CSV) files for each application (Word, Excel, Outlook, Visual C++, Windows 98, Windows 2K, IE, etc.) in many different languages. You can easily search these files (using VC++ "Find in Files" search, works great) to see, for example, Microsoft's French term for "Disk I/O Error".

The better translation tools will be able to import these glossaries and use them to assist in translation of terms in your application. Be careful about licensing though. The glossaries are the intellectual property of Microsoft, and you cannot just replace all your text with all of theirs. Be sure to read the licensing agreement.

International Marketing

Just because you can write a Thai version of your slick new calendar program doesn't mean you should! Before you jump and start translating all your applications to other languages, ask yourself how you are going to market them. It may not be as easy as you think.

Before spending a lot of money on translation, be sure to investigate the market for your product in that language, and how you will reach that market.

Summary

Software localization is a hugely complex issue. There are only a few books available on the topic, they have very little overlap of information, and they still don't answer all of the questions you may have.

This article is intended to provide answers to only the most basic questions, to help you decide if you want to localize your product, and to tell you just what is involved. I don't even discuss such monumental topics as locales and code pages. Even so, this is information that I wish was easily available to me when I began my first localization projects.

I hope you find it useful, and I hope to post more detailed information on this subject in the future.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

About the Author

Bob Pittenger is founder and President of Starpoint Software Inc. He holds a B.A. degree from Miami University, M.S. and Ph.D. degrees from Purdue University, and an MBA from Xavier University. He has been programming since 1993, starting with Windows application development in C++/MFC and moving to C# and .NET around 2005 and is a .NET Microsoft Certified Professional Developer.

Bob is the author of two books:
Billionaire: How the Ultra-Rich Built Their Fortunes Through Good and Evil and What You Can Learn from Them
and
Wealthonomics: The Most Important Economic and Financial Concepts that Can Make You Rich Fast.
Visit http://www.billionairebook.net for more information.

Comments and Discussions

In Visual Studio, when I create a new project, set the "Character Set" to Unicode, and then try to change the caption of a button to Russian like this:

m_btn1.SetWindowText("ываавыавыаыевраполрлоао");

the button's text is changed correctly.

However, when I set the "Character Set" to multibyte, the button's text appears like "???????????????"
I have a huge project which now has to be localized, and it's all written in multibyte!!
Please help!!!

After compiling and running, I get this dialog:
http://www.is.svitonline.com/pan193/misc/incorrect_dialog.PNG
The menu shows the correct characters, but the body itself has no special characters; they are converted to English.
I need the language-specific characters. So the question is how to fix this? Where could the problem be?

I have a set of resources, like path strings etc., in separate .rc files which I don’t want to be translated. Let me call these Shared resources.

On the other hand I have a set of resources that need to be translated in a resource dll project. I have separate resource dll projects for different languages.

I want to include these Shared resources in every resource dll project’s .rc file. Or, put simply, I need the resulting dll to contain both sets of resources (the Shared resources plus the resources that need to be translated from the dll project’s .rc file).

The problem is when in Shared .rc file the LANGUAGE is LANG_ENGLISH,SUBLANG_ENGLISH_US and in the dll project’s .rc file if the LANGUAGE is LANG_FRENCH, SUBLANG_FRENCH, these two resources are not merged together. If I just comment the LANGUAGE in Shared.rc this works fine.

But every time I edit the resource of Shared.rc through Visual Studio this comment will be replaced.

What about functions that need a specially formatted string, such as the extension filter or multiple file selection in a file selection box?

The file selection needs a string with NUL character separators, like ("all files\0*.*\0\0"), but resources cut the string at the first NUL character. How can I manage this?
If I can store it in resources, I can translate it... Thank you in advance.

Assuming I want to use AfxMessageBox, how can I force my MFC dialog-based application (using MFC in a shared DLL) to show French buttons instead of English buttons on an English WinXP OS?

In fact, I want to be sure that my buttons will always look the same (Oui, Non instead of Yes, No, Ja, Nein...).

My problem is that if I make an application which uses AfxMessageBox, then on a French OS it will show Oui, Non buttons, on a German WinXP OS it will show Ja, Nein, Abbrechen, and on a Japanese version of WinXP, God knows what will be shown.

On my English version of WinXP, I could not call AfxMessageBox with anything other than English buttons.
Although I have succeeded in loading a French text:
CString strText;
strText.LoadString(AFX_IDS_OPENFILE); // loads "Ouvrir" instead of Open.

but the AfxMessageBox buttons are still the English versions.

I did all the steps mentioned in this article and in the related Microsoft articles, but I have not succeeded with AfxMessageBox. Where does the compiler or the OS look for the texts to be shown on the AfxMessageBox buttons?

I appreciate any answers.
Claudiu.

PS: The problem is more complex. AfxMessageBox is just an example. Imagine that I also use CFileDialog or wizard-style property sheets (with Next, Back, Finish buttons) and many other MFC-inherited controls in my application.
I do my best to take care of my resources, but when I call a simple AfxMessageBox warning message or want to open a file, I have no control over the language used to show them. It will always be in the OS language, no matter what I do.

Hello Claudiu,
I would also like to display AfxMessageBox with Oui/Non buttons in place of Yes/No on an English XP system.
Have you found a solution for that? If yes, could you give it to me? Does anyone know the solution?
Thanks a lot in advance.

Can you please explain why LANGUAGE is 12? I have bolded the line below. I know that we want to load French strings and that 12 is the code for French, but what does this LANGUAGE statement specify? For that matter, please explain to me what each and every line means.
I would be very grateful to you!!

I am working on internationalization of an application and tried using some of the functions that I could see as relevant in this context:

1) IsValidLocale() always returns true, with my 2nd argument being LCID_INSTALLED
2) I also tried using EnumUILanguages() and the callback it requires, BUT could not find an example anywhere of what this callback should have inside it.

So the locale identifier specified by the first parameter is a valid locale identifier.

Alex Evans wrote: Tried also using the pair EnumUILanguages() and the associated callback required, BUT could not find anywhere an example of what this call back should have inside it

EnumUILanguages calls the given EnumUILanguagesProc for each language identifier (first parameter of EnumUILanguagesProc). You can either store these identifiers in a list or add them to a combobox. Please note, that EnumUILanguages is not supported in Windows 95/98/Me/NT

"Even if you have to generalize an error message somewhat, it may be worth it to simplify translation."

In my experience this is a very bad practise, as error messages are not just for the end user. You can of course return a user message such as:

"An error occurred while saving"

But this doesn't offer the user any chance of fixing the problem. You could of course change this to:

"An error occurred while saving"
"Error: 0x802000201"

This way the error can be identified for the programmer/support engineer. An error number can help when supporting customers who speak a different language as well, but let's face it:

"There was insufficient disk space to save your work."
"The document was NOT saved. Please free up disk space
before retrying"

Such a specific error message tells both the programmer and the user what the specific problem is and how it may be resolved. It should only be displayed when there really isn't enough disk space; i.e. don't display this every time your CSomeDocumentClass::Save() function returns false. Have your CSomeDocumentClass::Save() function return an enum of error codes and choose your message based on that, with no duplicates!!
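A minimal sketch of that suggestion, with hypothetical names throughout (SaveResult, MessageFor, and the AccessDenied case are invented for illustration). In a real application the literals would be string-table resource IDs, so that each message is translated exactly once.

```cpp
#include <string>

// Hypothetical result codes for a document's Save() function.
enum class SaveResult { Ok, DiskFull, AccessDenied, Unknown };

// Maps each result code to one specific message. Only the codes that
// have no specific text fall through to the generic message.
std::string MessageFor(SaveResult result)
{
    switch (result) {
    case SaveResult::Ok:
        return "";
    case SaveResult::DiskFull:
        return "There was insufficient disk space to save your work.";
    case SaveResult::AccessDenied:
        return "The document could not be saved because access was denied.";
    case SaveResult::Unknown:
        break;
    }
    return "An error occurred while saving.";
}
```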

If you are working in the games industry programming for the PS2 this is mandatory, you cannot have generalised error messages.

Of course specific error messages increase the job of translation, but to not do it properly is something of a cop out and may make supporting/debugging your software much harder!

I also recommend doing rough translations of a group of languages you intend to support as early as possible. Some languages (e.g. German) can be especially difficult to lay out and you may want the GUI for your app to either not change (i.e. all dialogs and buttons live in the same space with the same dimensions) or you may not be ABLE to change your GUI. If you don't find out how long the text "Click Here" is, you may find an equivalent string for a target language doesn't fit on your button until too late!!!

Anyway, my 2 peneth on localisation
Ian

p.s. don't put ANY text on graphics, unless you are happy NOT to translate that text

We have developed a MFC application, which is supported in English and Japanese. In Japanese XP the folder names like "My Documents" are displayed in Japanese in the windows Explorer, but in our application when we try to get the path for the same folder, we are getting it in English.

Can anyone please tell us how Windows translates the folder names into Japanese when the path is in English?

Hi all,
I have installed the Japanese language support tool on my system.
After setting a language (say Japanese) in the Windows Control Panel -> Regional Settings,
how do I get the default language setting in a VC++ 6.0 program?
Is there any function to get the default language settings?

Hello,
I have to make my MFC application work in French and English.
First I made it in French (I am French). Then I read several articles dealing with software translation and, among others, this Microsoft KB article: Q147149 HOWTO: Localize Application Resources with Foundation Classes

Finally, I modified my RC file. At the end this text is written (see end of this message). For the moment I decided to keep one resource file.

For my tests I have a French Windows XP and an English (US) Windows XP.
On French OS, everything is OK.
On English OS, I see several problems.
1. The whole "Print Preview" screen is in French.
2. Some of the strings stored in the afxres.rc file cannot be found, for example AFX_IDS_OPENFILE (title for the Open File dialog) and AFX_IDS_SAVEFILE (title for the Save File dialog), and the application crashes on ASSERTs in MFC code.
3. Another string stored in the afxres.rc file appears in French: AFX_IDP_ASK_TO_SAVE ("Save changes to... ?" message for unsaved files).
No problems for menu commands, though.

Did I make a big mistake? Or perhaps it cannot work this way and I have to make one resource DLL for each language?
Could you please show me a working example (or a part of it) of a multilanguage application with one or several resource files?