Internationalization Introduction

Internationalization is the process of designing an application so that it can be adapted to various languages and regions without engineering changes. Sometimes the term internationalization is abbreviated as i18n, because there are 18 letters between the first "i" and the last "n."

An internationalized program has the following characteristics:

◈ With the addition of localized data, the same executable can run worldwide.
◈ Textual elements, such as status messages and the GUI component labels, are not hardcoded in the program. Instead they are stored outside the source code and retrieved dynamically.
◈ Support for new languages does not require recompilation.
◈ Culturally-dependent data, such as dates and currencies, appear in formats that conform to the end user's region and language.
◈ It can be localized quickly.

Localization is the process of adapting software for a specific region or language by adding locale-specific components and translating text. The term localization is often abbreviated as l10n, because there are 10 letters between the "l" and the "n."

The primary task of localization is translating the user interface elements and documentation. Localization involves not only changing the language interaction, but also other relevant changes such as display of numbers, dates, currency, and so on. Other types of data, such as sounds and images, may require localization if they are culturally sensitive. The better internationalized an application is, the easier it is to localize it for a particular language and character encoding scheme.

Internationalization may seem a bit daunting at first. Reading the following sections will help ease you into the subject.

A Quick Example

If you're new to internationalizing software, this lesson is for you. This lesson uses a simple example to demonstrate how to internationalize a program so that it displays text messages in the appropriate language. You'll learn how Locale and ResourceBundle objects work together and how to use properties files.

1. Before Internationalization

Suppose that you've written a program that displays three messages, as follows:

You've decided that this program needs to display these same messages for people living in France and Germany. Unfortunately your programming staff is not multilingual, so you'll need help translating the messages into French and German. Since the translators aren't programmers, you'll have to move the messages out of the source code and into text files that the translators can edit. Also, the program must be flexible enough so that it can display the messages in other languages, but right now no one knows what those languages will be.

It looks like the program needs to be internationalized.

2. After Internationalization

The source code for the internationalized program follows. Notice that the text of the messages is not hardcoded.

3. Running the Sample Program

The internationalized program is flexible; it allows the end user to specify a language and a country on the command line. In the following example the language code is fr (French) and the country code is FR (France), so the program displays the messages in French:

% java I18NSample fr FR
Bonjour.
Comment allez-vous?
Au revoir.

In the next example the language code is en (English) and the country code is US (United States) so the program displays the messages in English:

% java I18NSample en US
Hello.
How are you?
Goodbye.

4. Internationalizing the Sample Program

If you look at the internationalized source code, you'll notice that the hardcoded English messages have been removed. Because the messages are no longer hardcoded and because the language code is specified at run time, the same executable can be distributed worldwide. No recompilation is required for localization. The program has been internationalized.

You may be wondering what happened to the text of the messages or what the language and country codes mean. Don't worry. You'll learn about these concepts as you step through the process of internationalizing the sample program.

1. Create the Properties Files

A properties file stores information about the characteristics of a program or environment. A properties file is in plain-text format. You can create the file with just about any text editor.

In the example the properties files store the translatable text of the messages to be displayed. Before the program was internationalized, the English version of this text was hardcoded in the System.out.println statements. The default properties file, which is called MessagesBundle.properties, contains the following lines:

greetings = Hello
farewell = Goodbye
inquiry = How are you?

Now that the messages are in a properties file, they can be translated into various languages. No changes to the source code are required. The French translator has created a properties file called MessagesBundle_fr_FR.properties, which contains these lines:

Notice that the values to the right side of the equal sign have been translated but that the keys on the left side have not been changed. These keys must not change, because they will be referenced when your program fetches the translated text.

The name of the properties file is important. For example, the name of the MessagesBundle_fr_FR.properties file contains the fr language code and the FR country code. These codes are also used when creating a Locale object.

2. Define the Locale

The Locale object identifies a particular language and country. The following statement defines a Locale for which the language is English and the country is the United States:

aLocale = new Locale("en","US");.

The next example creates Locale objects for the French language in Canada and in France:

caLocale = new Locale("fr","CA");
frLocale = new Locale("fr","FR");

The program is flexible. Instead of using hardcoded language and country codes, the program gets them from the command line at run time:

Locale objects are only identifiers. After defining a Locale, you pass it to other objects that perform useful tasks, such as formatting dates and numbers. These objects are locale-sensitive because their behavior varies according to Locale. A ResourceBundle is an example of a locale-sensitive object.

3. Create a ResourceBundle

ResourceBundle objects contain locale-specific objects. You use ResourceBundle objects to isolate locale-sensitive data, such as translatable text. In the sample program the ResourceBundle is backed by the properties files that contain the message text we want to display.

The ResourceBundle is created as follows:

messages = ResourceBundle.getBundle("MessagesBundle", currentLocale);

The arguments passed to the getBundle method identify which properties file will be accessed. The first argument, MessagesBundle, refers to this family of properties files:

The Locale, which is the second argument of getBundle, specifies which of the MessagesBundle files is chosen. When the Locale was created, the language code and the country code were passed to its constructor. Note that the language and country codes follow MessagesBundle in the names of the properties files.

Now all you have to do is get the translated messages from the ResourceBundle.

4. Fetch the Text from the ResourceBundle

The properties files contain key-value pairs. The values consist of the translated text that the program will display. You specify the keys when fetching the translated messages from the ResourceBundle with the getString method. For example, to retrieve the message identified by the greetings key, you invoke getString as follows:

String msg1 = messages.getString("greetings");

The sample program uses the key greetings because it reflects the content of the message, but it could have used another String, such as s1 or msg1. Just remember that the key is hardcoded in the program and it must be present in the properties files. If your translators accidentally modify the keys in the properties files, getString won't be able to find the messages.

Checklist

Many programs are not internationalized when first written. These programs may have started as prototypes, or perhaps they were not intended for international distribution. If you must internationalize an existing program, take the following steps:

Identify Culturally Dependent Data

Text messages are the most obvious form of data that varies with culture. However, other types of data may vary with region or language. The following list contains examples of culturally dependent data:

Translation is costly. You can help reduce costs by isolating the text that must be translated in ResourceBundle objects. Translatable text includes status messages, error messages, log file entries, and GUI component labels. This text is hardcoded into programs that haven't been internationalized. You need to locate all occurrences of hardcoded text that is displayed to end users. For example, you should clean up code like this:

Compound messages contain variable data. In the message "The disk contains 1100 files." the integer 1100 may vary. This message is difficult to translate because the position of the integer in the sentence is not the same in all languages. The following message is not translatable, because the order of the sentence elements is hardcoded by concatenation:

Whenever possible, you should avoid constructing compound messages, because they are difficult to translate. However, if your application requires compound messages, you can handle them with the techniques described in the section Messages.

Format Numbers and Currencies

If your application displays numbers and currencies, you must format them in a locale-independent manner. The following code is not yet internationalized, because it will not display the number correctly in all countries:

You should replace the preceding code with a routine that formats the number correctly. The Java programming language provides several classes that format numbers and currencies. These classes are discussed in the section Numbers and Currencies.

Format Dates and Times

Date and time formats differ with region and language. If your code contains statements like the following, you need to change it:

Watch out for code like this, because it won't work with languages other than English. For example, the if statement misses the character ü in the German word Grün.

The Character comparison methods use the Unicode standard to identify character properties. Thus you should replace the previous code with the following:

char ch;
// ...
if (Character.isLetter(ch))

Compare Strings Properly

When sorting text you often compare strings. If the text is displayed, you shouldn't use the comparison methods of the String class. A program that hasn't been internationalized might compare strings as follows:

The String.equals and String.compareTo methods perform binary comparisons, which are ineffective when sorting in most languages. Instead you should use the Collator class, which is described in the section Comparing Strings.

Convert Non-Unicode Text

Characters in the Java programming language are encoded in Unicode. If your application handles non-Unicode text, you might need to translate it into Unicode.