In case you don’t know what Fluent is, it’s a localization system designed and developed by Mozilla to overcome the limitations of the existing localization technologies. If you have been around Mozilla Localization for a while, and you’re wondering what happened to L20n, you can read this explanation about the relation between these two projects.

With Firefox 58 we started moving Firefox Preferences to Fluent, and today we’re migrating the last pane (Firefox Account – Sync) in Firefox Nightly (61). The work is not done yet: there are still edge cases to migrate in the existing panes and subdialogs, but we’re on track. If you’re interested in the details, you can read the full journey in two blog posts from Zibi (2017 and 2018), covering not only Fluent, but also the huge amount of work done on the Gecko platform to improve multilingual support.

At this point, you might be wondering: do we really need another localization system? What’s wrong with what we have?

The truth is that there is a lot wrong with the current systems. In Gecko alone, we support 4 different file formats to localize content: .dtd, .properties, .ini, .inc. And since none of them supports plural forms, we built hacks on top of .properties to support pluralization.

Here are a few practical examples of why Fluent is a huge improvement over existing technologies, and will allow us to improve the quality of the localizations we ship.

DTDs and Concatenations

You want to localize this simple fragment of XUL code without using JavaScript.

This turns into 2 separate strings in a DTD file, and a long localization comment:

<!-- LOCALIZATION NOTE (signedInLoginFailure.beforename.label,
signedInLoginFailure.aftername.label):
these two string are used respectively before and after the account
email address. Localizers can use one of them, or both, to better
adapt this sentence to their language. -->
<!ENTITY signedInLoginFailure.beforename.label "Please sign in to reconnect">
<!ENTITY signedInLoginFailure.aftername.label "">

Why the empty string at the end? Because, while English doesn’t need it, other languages might need to change the structure of the sentence, adding content after the email address. On top of that, some localization tools don’t support empty strings correctly, not allowing localizers to mark an empty translation as a “translated” string.

In Fluent, this is simply:

sync-signedin-login-failure = Please sign in to reconnect { $email }

One single string, full visibility on the context, flexibility to move around the email address.
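To make the difference concrete, here is a minimal Python sketch of the two approaches. It is not Gecko code: the dictionaries, the function names and the made-up "xx" pattern are assumptions for illustration, and the placeholder is replaced with a plain string substitution rather than a real Fluent resolver.

```python
# Illustrative only: plain Python values stand in for DTD entities
# and Fluent messages.

def concat_message(strings, email):
    """DTD style: the code glues 'before' + email + 'after' in a fixed order."""
    parts = [strings["beforename"], email, strings["aftername"]]
    # Skip empty fragments so we don't leave a trailing space.
    return " ".join(part for part in parts if part)


def pattern_message(pattern, email):
    """Fluent style: one string, the placeholder can sit anywhere."""
    return pattern.replace("{ $email }", email)


en_dtd = {"beforename": "Please sign in to reconnect", "aftername": ""}
en_pattern = "Please sign in to reconnect { $email }"
# Hypothetical locale that wants the address at the start of the sentence.
xx_pattern = "{ $email }: please sign in to reconnect"
```

With the concatenated version, a localizer has to guess the final sentence from two fragments; with the single pattern, the position of the address is part of the translation itself.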

Plural Forms

Plural forms are supported in Gecko only for .properties files. Fluent supports plural forms natively, and with a lot of additional flexibility.

First of all, if you’re not familiar with the complexity of plurals across languages (limiting the discussion to cardinal integer numbers):

English, like many other European languages, only has 2 plural forms: n=1 uses one form (“1 page”), all other numbers (n!=1) use a different form (“2 pages”). Sadly, this makes a lot of people think about plural in terms of “1 vs many”, while that’s not really the case for most languages.

French still has 2 plural forms, but uses the same form for both 0 and 1.

Other languages have only one form (e.g. Chinese), or have up to 6 different plural forms (e.g. Arabic). Fluent uses the CLDR categories (zero, one, two, few, many, other) to match a number to the correct plural form. For example, in Russian 1 and 21 will use the form “one”, but 11 will use “many”.

The behavior might change depending on whether the actual number is present. For example, Turkic languages don’t need to pluralize a noun after a number (“1 page”), but need plural forms in sentences referring to one or more elements (“this” vs “them”).
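As a rough illustration of how a number maps to a CLDR category, this is a small Python sketch of the Russian cardinal rule for non-negative integers. The function name is made up for this example, and real code should rely on a CLDR-backed library instead of hand-written rules.

```python
def russian_plural_category(n):
    """Return the CLDR cardinal category for a non-negative integer in Russian."""
    mod10, mod100 = n % 10, n % 100
    if mod10 == 1 and mod100 != 11:
        return "one"
    if 2 <= mod10 <= 4 and not 12 <= mod100 <= 14:
        return "few"
    # Every remaining integer is "many"; in Russian, "other" only
    # covers non-integer numbers, which this sketch does not handle.
    return "many"
```

Note how 21 goes back to “one” while 11 does not: the category depends on the last digits, not on “1 vs many”.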

Consider for example this use case: in Firefox, the button to set the home page changes from “Use Current Page” to “Use Current Pages”, depending on the number of open tabs.

If you want to use a proper plural form, you need to add the number of tabs to the string. In .properties, it would look like this (plural forms are separated by a semicolon):

use-current-pages = Use Current Page;Use Current #1 Pages

This will force languages to create all plural forms for their locales, even if they might not be needed. If your language has 6 forms, you need to provide all 6 forms, even if they’re all identical. Fun, isn’t it? Note that this is not just a limitation of the plural system used in .properties, the same happens in GetText (.po files).
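For readers who have never seen this format consumed, here is a simplified Python model of the idea: split on the semicolon, pick a form by index, replace #1 with the number. The function names are hypothetical and this is not the actual Gecko implementation; the rule shown is the English one only.

```python
def english_rule(n):
    """Index of the plural form for English: 0 for n == 1, 1 otherwise."""
    return 0 if n == 1 else 1


def get_plural(value, n, rule=english_rule):
    """Pick the n-appropriate form from a semicolon-separated string."""
    forms = value.split(";")
    index = rule(n)
    if index >= len(forms):
        # In this sketch a missing form is an error: the rule for the
        # locale decides how many forms the string must contain.
        raise ValueError(f"expected at least {index + 1} plural forms")
    return forms[index].replace("#1", str(n))


label = "Use Current Page;Use Current #1 Pages"
```

The number of forms is dictated by the locale’s rule, not by what the sentence needs, which is exactly the limitation described above.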

Here’s how Fluent improves things: first, you don’t need to add all plural forms, you can rely on the fallback to the default value (indicated by *), without raising any error:
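The Fluent snippet itself is not reproduced here. As a rough Python model of the selection logic (not Fluent syntax and not the Fluent API; the function, the dictionaries and the key names are assumptions), a selector tries an exact number first, then the plural category, and finally falls back to the variant marked as default:

```python
def select_variant(variants, default_key, number, category):
    """Pick a variant: exact number, then plural category, then the default."""
    for key in (str(number), category):
        if key in variants:
            return variants[key]
    # No match is not an error: the default variant is used instead.
    return variants[default_key]


# English: a special case for exactly 1, everything else uses the default.
english = {"1": "Use Current Page", "other": "Use Current Pages"}
# Hypothetical locale whose rules have several categories, but where this
# label never changes: a single default variant is enough.
minimal = {"other": "Use Current Pages"}
```

The plural category is passed in by the caller in this sketch; in a real implementation it would come from the CLDR rules for the locale.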

Variants

This is one of the most exciting changes introduced to the localization paradigm.

Consider this example: “Firefox Account” is a special brand within Firefox. While “Firefox” itself should not be localized or declined, “account” can be localized and moved. In Italian it’s “Account Firefox”, “Cuenta de Firefox” in Spanish.

A special entity is defined in order to be reused in other strings:

<!ENTITY syncBrand.fxAccount.label "Firefox Account">

For example:

<!ENTITY signedOut.accountBox.title "Connect with a &syncBrand.fxAccount.label;">

In Italian this results in “Connetti &syncBrand.fxAccount.label;”. It’s not natural, and it looks wrong, because we don’t capitalize nouns in the middle of a sentence.

My only option to improve the translation, and make it sound more natural, would have been to drop the entity and just add the translated name. That defies the entire concept of having a central definition for the brand.

Here’s what I can do in Fluent. The brand is defined as a term, a special type of message that can only be referenced from other strings (not code), and can have additional attributes.

While uppercase vs lowercase is a trivial example, variants can have a much deeper impact on localization quality for complex languages that use declensions, where the word “account” changes based on its role within the sentence (nominative, accusative, etc.).
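To give an idea of how a brand with variants behaves, here is a loose Python analogy. It is not Fluent syntax; the names, the variant keys and the Italian sentence are my own illustration, not the strings shipped in Firefox.

```python
# The brand is defined once, with one variant per capitalization.
FXACCOUNT_BRAND = {
    "uppercase": "Account Firefox",
    "lowercase": "account Firefox",
}
DEFAULT_VARIANT = "uppercase"


def fxaccount_brand(capitalization=None):
    """Return the requested variant, or the default one if none matches."""
    return FXACCOUNT_BRAND.get(capitalization, FXACCOUNT_BRAND[DEFAULT_VARIANT])


def account_box_title():
    """A message that needs the brand in the middle of a sentence."""
    return f"Connetti il tuo {fxaccount_brand('lowercase')}"
```

The brand still lives in one place, but each message can ask for the form that fits its sentence.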

This is only the tip of the iceberg: there’s more you can do with Fluent, and the new localization API will allow us to drastically improve the experience for non-English users in Firefox. Here are some additional links if you want to learn more about Fluent:

If you never noticed the menu item in this blog, I’m the developer of a small add-on for Firefox called BBCodeXtra: it’s an extension, started about 10 years ago, that makes posting on forums and other places (e.g. GitHub) a little less painful.

For the first time in years, I’m going to release a new version that includes new features, and therefore new strings. While I obviously love localization and localizers, I don’t intend to work with localization platforms for an add-on with a very limited set of strings and infrequent updates.

The new version (v0.5.0) will be released with all the existing languages enabled, but the new strings will be left in English (excluding Italian). Starting from the next version, I will drop locales that go below 60% unchanged strings (currently it’s about 70% for all locales).

If you want to receive an email before the next release, in case I have to add new strings and you want to localize them before release, send me an email at flod(at)lodolo(dot).net and I’ll organize a mailing list.

“Subtitles” assume the viewer can hear but cannot understand the language or accent, or the speech is not entirely clear, so they only transcribe dialogue and some on-screen text. “Captions” aim to describe to the deaf and hard of hearing all significant audio content—spoken dialogue and non-speech information such as the identity of speakers and, occasionally, their manner of speaking – along with any significant music or sound effects using words or symbols.

Sample and Reference

I’m using mozilla-beta as a reference, and comparing each locale against en-GB. Why not en-US? The reason is simple: en-US strings are scattered across the entire mozilla-central repository, so I would have to resort to tricks like the ones Transvision uses in order to create a pseudo en-US string-only repository. Using en-GB leads to less precise results (see below), but for the sake of this analysis I considered it an acceptable compromise.

I’m not checking all folders, only the main ones (‘browser’, ‘dom’, ‘mail’, ‘mobile’, ‘netwerk’, ‘security’, ‘services’, ‘suite’, ‘toolkit’, ‘webapprt’). This still generates an archive of almost 18,000 strings for locales translating all products, so it seems a decent sample.

Not sure if this is the best choice, but I couldn’t think of an alternative. Note also that I’m ignoring single character strings (access keys, shortcuts).

In the table you’ll see a global column (average results) and “buckets”, with string groups based on en-GB original length. Too bad these groups are often unreliable because of the “concatenation conundrum”, where one string could be created by concatenating 3 different labels.

Typical example to create a sentence with a link (note that concatenation should always be avoided):

Nine months ago I wrote this post. Are things better now? Not at all, they keep getting worse.

When people asked me “how can you be happy with the rapid release cycle?”, I always answered “because finally I have a clear schedule”. Now imagine how I feel about the rapid release cycle.

I’m not a developer, I’m not an engineer either, but guess what: if you’re breaking things every single cycle, you’re doing it wrong. I think it would be a good time to start thinking about it, maybe before localizers start giving up.

My personal short memo for the next meeting, even if I’m sure Axel is already on this:

Aurora is supposed to be string frozen, so that localizers have a full cycle to update their localization, test their work and sign off on the best changeset available for Beta. This worked quite well for 5 releases, so why did everything go wrong this time? We’re just a couple of days away from the end of this cycle (Firefox 10 release, Jan 30th), and a backout on toolkit broke everything [1], then a bug on devtools added even more confusion.

Being a Mozilla localizer already requires an awful amount of technical skills, please don’t even think of adding more stuff on top of that (“why can’t we or localizers just retrieve the previous string from hg blame?”).

Working on two different repositories is painful (see Native Fennec). I realized that I can’t transplant changesets around, because they often change more strings than I need, so I have to move text around manually. I’m scared of seeing what will happen when I merge my work from central to aurora.

[1] Thanks to our l10n logic this is not literally true, since products fall back to the English string. From my point of view, this still means “breaking things”: exposing a partially translated UI means lowering the quality of our work, and that’s not something I like to do.

I’m aware that l10n can be a nuisance for a lot of developers – and some localizers (e.g. me) can be a real pain in the *** – but when reviewing a patch that involves a change to existing strings you only have a short and quick checklist to follow:

Does the patch fix a typo or does it make a substantial change to the string? In the first case just fixing the string is fine, in the latter case you need to change the string ID, since not all localization tools (or localizers who simply use a text editor) can catch this kind of change.

Are you changing a string ID? Always check if there’s an associated access key and maintain the relation STRINGID.label <-> STRINGID.accesskey (again, localization tools rely on this kind of structure).
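The second check in the list is easy to automate. This is a minimal Python sketch, assuming string IDs follow the STRINGID.label / STRINGID.accesskey convention; the function name and the sample IDs are made up.

```python
def orphaned_accesskeys(string_ids):
    """Return the .accesskey IDs that have no matching .label ID."""
    ids = set(string_ids)
    suffix = ".accesskey"
    orphans = []
    for string_id in sorted(ids):
        if string_id.endswith(suffix):
            base = string_id[: -len(suffix)]
            if base + ".label" not in ids:
                orphans.append(string_id)
    return orphans


# "open.label" was renamed to "open2.label", but the accesskey was forgotten.
sample_ids = ["save.label", "save.accesskey", "open2.label", "open.accesskey"]
```

Run against the IDs touched by a patch, a check like this would flag a renamed label whose accesskey was left behind.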

Once in a while a mistake can happen, but three times in a few days seems a bit above average 😉

This is the dialog window that appears when you try to run a Java Applet on Mac OS X 10.5.7 with the last Java update (I’m running Java 1.5.0_19 according to this test).

Take a look at the checkbox:

In Italian it’s “l’accesso” (definite article+noun), not “laccesso”. The same error appears in the first label, so I suppose they have some difficulties dealing with apostrophes. This problem was already there before the Java update.

Applet’s name and author are gone, replaced by {0} and {1} (this started with the last Java update).

As usual, before the final release we’re doing a lot of QA work on our localized Firefox builds, and this includes a careful check on accesskeys. There are two different issues with accesskeys:

use of a character not available in the label. For example: using “F” as accesskey for “Shiretoko” creates a label “Shiretoko (F)”. This can easily happen if you update the label and forget to correct the corresponding accesskey.

duplicated accesskeys (two or more labels with the same scope share the same accesskey).

In the last 24 hours we found two duplicated accesskeys in the Italian build: the first one is quite hidden (you have to check for updates in the Extension manager and then click on the “More information” button), while the second one is located in the main window (Toolbar search). This last issue affects the en-US build (see bug 498840) and probably also other locales.

I think that we should really start to think about accesskeys and how to introduce automated tests.

The first step should be to create a standard naming convention (it’s not even mandatory, but it would make things easier): right now you can find accesskeys named like “label_accesskey”, “labelaccesskey” or “label.accesskey”. At that point, checking for accesskeys that use a character not available in the label shouldn’t be a problem.

The real challenge would be to find accesskey conflicts – using different tables to store all the accesskeys with the same scope – in particular in pop-up menus. Have you ever tried to select different parts of a web page (create a selection with images, links, images with links, text, etc.) and check how the context menu changes? Doing these kinds of checks manually is simply crazy 😉
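Just to show that the two checks are simple once the data is available, here is a small Python sketch. The hard part, knowing which labels share a scope, is assumed to be solved already: each entry is a hypothetical (scope, label, accesskey) tuple, and the sample data is invented.

```python
from collections import defaultdict


def missing_from_label(entries):
    """Entries whose accesskey character does not appear in the label."""
    return [
        (scope, label, key)
        for scope, label, key in entries
        if key.lower() not in label.lower()
    ]


def duplicated_accesskeys(entries):
    """Map (scope, accesskey) to the labels sharing it, when more than one."""
    labels_by_key = defaultdict(list)
    for scope, label, key in entries:
        labels_by_key[(scope, key.lower())].append(label)
    return {k: v for k, v in labels_by_key.items() if len(v) > 1}


sample_entries = [
    ("file-menu", "Shiretoko", "F"),   # "F" is not in the label
    ("file-menu", "Save", "S"),
    ("file-menu", "Send Link", "S"),   # same scope, same accesskey
    ("edit-menu", "Select All", "S"),  # different scope, no conflict
]
```

The comparison is case-insensitive in this sketch, which is an assumption about how accesskeys are matched.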