Managing localization in Rails

Published Jan 13, 2018

In large applications, managing localization files can become a behemoth of a task, especially if you're working with external translation teams who provide translations of your YAML files (or XML files, Ruby hash files, etc.).

Most Rails applications with an international user base have localization features built in. This means that in your views, rather than having something like this:

<h1>Hello world!</h1>

You'll have something like this instead:

<h1><%= t('hello_world') %></h1>

This t('key_name') method looks up the matching key in your localization files and displays the matching string based on your app's locale setting. How to set your app-wide locale is easy to find by Googling, so this blog post will instead cover the lessons I've learned about managing localization in a more organized way while working on a large Rails application.
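To make the lookup concrete, here is a simplified sketch of what t('hello_world') does conceptually, written in plain Ruby with the standard YAML library rather than the real I18n gem. The translate helper and the sample locale data are hypothetical; Rails' actual backend also handles fallbacks, pluralization, interpolation, and more.

```ruby
require "yaml"

# Toy locale data; a real app keeps this in config/locales/*.yml.
LOCALES_YAML = <<~YAML
  en:
    hello_world: "Hello world!"
  es:
    hello_world: "¡Hola mundo!"
YAML

TRANSLATIONS = YAML.safe_load(LOCALES_YAML)

# Hypothetical helper: walk "locale.key.path" through the nested hash,
# roughly what t('hello_world') does for the current locale.
def translate(translations, locale, key)
  path = [locale.to_s, *key.to_s.split(".")]
  value = path.reduce(translations) { |node, part| node.is_a?(Hash) ? node[part] : nil }
  # Imitates the style of Rails' "translation missing" message for absent keys.
  value.is_a?(String) ? value : "translation missing: #{path.join('.')}"
end

puts translate(TRANSLATIONS, :en, "hello_world") # prints Hello world!
puts translate(TRANSLATIONS, :es, "hello_world") # prints ¡Hola mundo!
puts translate(TRANSLATIONS, :en, "goodbye")     # prints translation missing: en.goodbye
```

The same key resolves to a different string depending only on the locale, which is why the view never needs to change when you add a language.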

Set team-wide formatting rules

The first and most important thing when working in a team on a large Rails application is to agree on formatting rules for adding new localization text. There are several reasons for this:

Consistent naming conventions for localization keys
Eliminates potential duplicate strings
Reduces the chance of your Rails app failing to boot due to string formatting errors
Establishes a clear set of rules when working with third-party vendors
Let's go over these one by one.

#1 - Consistent Naming Conventions

It's not uncommon to see thousands of localization keys in a large Rails application. Duplicate strings may be easy to spot when your application is small, but when your localization files contain thousands of strings, it can be difficult to find the correct key to use when building out your HTML pages. Also, not having a team-wide naming convention can lead to duplicate strings, which can cause hard-to-catch bugs in your views.

Let's say, for example, that you have three keys in your YAML file like this:

en:
  welcome_back: "Welcome back"
  label_welcome_back: "Welcome back"
  text_welcome_back: "Welcome back!"

All three keys will display the text "Welcome back" with one difference. The keys "welcome_back" and "label_welcome_back" will both display "Welcome back", while the key "text_welcome_back" will display the same text with an exclamation point.

Let's say you have an application with multiple developers working on it. If you have a localization file like the one above, different developers may start using different keys based on what their "Find" function in their editor will locate first. Depending on what text you're supposed to display, some developers may end up displaying the incorrect version of the text. For example, what if you're not supposed to display the exclamation point in the view, but you end up doing so because you accidentally used "text_welcome_back"?

The issue becomes even more apparent when you have to start changing the actual text in the views. To change the text, you'll have to go into your localization files, figure out which key you're supposed to change, and hope that changing one of the "welcome_back" texts won't break other views that might be incorrectly using that key. For example, if you change the text for "welcome_back" to close a ticket for your specific page, you risk creating a bug on every other page that wasn't supposed to be using that specific "welcome_back" key.

Therefore, it's important to set team-wide rules for naming localization keys. It doesn't really matter which scheme you pick; the key is to set a ground rule and have your team follow it consistently when adding new keys. I've seen it done in different ways. Some teams allow duplicate texts but give them different keys based on where each version is supposed to be used. For example, "text_welcome_back" may be a simple piece of text while "label_welcome_back" is only used as a label on a form. I've also seen teams stick with "welcome_back" as the only key they're allowed to have for that text.
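Whichever convention you choose, a small check can enforce it so it doesn't depend on everyone remembering. A minimal sketch, assuming a hypothetical rule that every leaf key is snake_case and starts with "label_" or "text_" (one of the conventions described above); KEY_PATTERN, each_leaf_key, and naming_violations are illustrative names, not part of Rails.

```ruby
require "yaml"

# Hypothetical team rule: every leaf key is snake_case and starts with an
# agreed prefix. Replace this pattern with whatever convention you choose.
KEY_PATTERN = /\A(label|text)_[a-z0-9_]+\z/

# Yield each leaf key together with its full dotted path (e.g. "en.text_welcome_back").
def each_leaf_key(node, path, &block)
  node.each do |key, value|
    if value.is_a?(Hash)
      each_leaf_key(value, path + [key], &block)
    else
      yield key.to_s, (path + [key]).join(".")
    end
  end
end

# Return the dotted paths of all keys that break the naming rule.
def naming_violations(yaml_text, pattern = KEY_PATTERN)
  violations = []
  YAML.safe_load(yaml_text).each do |locale, translations|
    each_leaf_key(translations, [locale]) do |key, full_path|
      violations << full_path unless key.match?(pattern)
    end
  end
  violations
end

SAMPLE = <<~YAML
  en:
    welcome_back: "Welcome back"
    label_welcome_back: "Welcome back"
    text_welcome_back: "Welcome back!"
YAML

p naming_violations(SAMPLE) # prints ["en.welcome_back"]
```

Running this in CI turns the convention from a wiki page into a failing build, which is usually what keeps it consistent.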

One final important rule you should set (because I always have trouble keeping this consistent) is whether to include punctuation in your localization text, and if you do, how to name those keys. For example, if you really do need "Welcome back!" as the text, would you keep the exclamation point in your localization text or hardcode it into your HTML? If it lives in your localization file, should the key be named differently, like "welcome_text_exclamation"? Personally, I think the exclamation point belongs in your localization files, because certain languages like Spanish also put an upside-down exclamation point before the first word. Keeping punctuation in the localization files makes translating your app into other languages much easier, since languages that treat punctuation differently are already accounted for.

#2 - Eliminates potential duplicate strings

I mentioned one issue that inconsistent naming conventions can cause: it becomes difficult to figure out which locale string you're supposed to be using and/or changing as you code. Another benefit of consistent naming conventions is that you eliminate duplicate key/value pairs in your localization files. For example, when I look for a text to use, I usually just use my text editor's search feature to find which key I should be using. Sometimes I'll find two key/value pairs with exactly the same text but different keys; other times I'll find similar texts under different keys. This always makes me go, "Okay... which one am I supposed to be using?" Having no duplicate keys and texts in your localization files removes this unnecessary mental roadblock.
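One way to find exact duplicates before they pile up is a small script that flattens a locale file and groups keys by their text. This is a sketch under assumptions: flatten_translations and duplicate_texts are hypothetical helpers, and the optional normalize lambda is just one possible way to also surface near-duplicates such as "Welcome back" and "Welcome back!".

```ruby
require "yaml"

# Flatten nested locale data into {"en.some.key" => "text"}.
def flatten_translations(node, prefix = nil, out = {})
  node.each do |key, value|
    full_key = prefix ? "#{prefix}.#{key}" : key.to_s
    if value.is_a?(Hash)
      flatten_translations(value, full_key, out)
    else
      out[full_key] = value
    end
  end
  out
end

# Map each text that appears under more than one key to the keys that use it.
# `normalize` lets you also catch near-duplicates (case, punctuation).
def duplicate_texts(yaml_text, normalize: ->(text) { text })
  flatten_translations(YAML.safe_load(yaml_text))
    .group_by { |_key, text| normalize.call(text) }
    .select { |_text, pairs| pairs.size > 1 }
    .transform_values { |pairs| pairs.map(&:first) }
end

SAMPLE = <<~YAML
  en:
    welcome_back: "Welcome back"
    label_welcome_back: "Welcome back"
    text_welcome_back: "Welcome back!"
YAML

# Exact duplicates: the two keys whose text is plain "Welcome back".
p duplicate_texts(SAMPLE)
# Near-duplicates: ignoring case and ! ? . groups all three keys together.
p duplicate_texts(SAMPLE, normalize: ->(text) { text.to_s.downcase.delete("!?.") })
```

The exact-match report is safe to fail a build on; the normalized one is better treated as a hint for review, since some near-duplicates are intentional.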

#3 - Reduces the chance of your Rails app failing to boot due to formatting errors

If your localization files contain improperly formatted strings, your Rails app will fail to boot. Worst of all, the logs won't tell you exactly which localization file is causing the issue, nor which line. This is incredibly frustrating to debug and always has me performing a manual binary search (temporarily delete the first half of a YAML file, see if the app boots, and if not, repeat) through one or more large localization files to figure out which key/value pair is causing the issue. And yes, sometimes it's literally one key/value pair that's blocking the entire app from booting.
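Rather than bisecting files by hand, you can parse each locale file on its own and let Psych, the YAML parser that ships with Ruby, report where it got stuck. A minimal sketch, assuming your locale files are plain .yml files under config/locales; yaml_syntax_errors is a hypothetical helper name. It uses YAML.parse, which only builds the syntax tree, so it flags syntax problems without tripping over symbols or aliases in Rails locale files.

```ruby
require "yaml"

# Parse each locale file separately so a syntax error is reported with the
# offending file and the position where the parser gave up.
def yaml_syntax_errors(paths)
  paths.sort.each_with_object([]) do |path, errors|
    begin
      YAML.parse(File.read(path, encoding: "UTF-8"))
    rescue Psych::SyntaxError => e
      errors << "#{path}: line #{e.line}, column #{e.column}: #{e.problem}"
    end
  end
end

# Run from the app root, e.g. saved as a hypothetical check_locales.rb.
puts yaml_syntax_errors(Dir["config/locales/**/*.yml"])
```

The reported position is where the parser noticed the problem, which can be a line or so after the actual culprit, but it narrows the search to one file and a few lines.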

The causes of improper string formatting vary. Sometimes it's even caused by inconsistent string formatting between key/value pairs. It's therefore much saner to keep a consistent rule for formatting the string values of your texts. Here's the rule I like to follow in Rails apps:

Always wrap your strings in double quotes
You'll see that some values for certain locale keys don't have any quotation marks around them. While this may work, if some of your texts contain double quotes, you may run into parsing errors in the key/value pairs that follow.
If you wrap your strings in double quotes, you can add single quotes inside them without any issues.
Double quotes keep interpolation placeholders such as %{name} safe, even at the very start of a value, where an unquoted % is not valid YAML.
If your text needs to display double quotes, escape them with a backslash.
For example, if your text needs to display Hi "Benjamin", write hi_benjamin: "Hi \"Benjamin\"". This displays the double quotes and lets the next line of your localization file parse without any issues.
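A quick way to check these rules is to load a small double-quoted file with Ruby's YAML library and look at the values. This sketch needs nothing beyond the standard library; the keys are examples, and Kernel#format only stands in for Rails' I18n interpolation, which uses the same %{name} placeholder syntax.

```ruby
require "yaml"

# The single-quoted heredoc keeps the backslashes, so YAML sees \" escapes.
LOCALE = <<~'YAML'
  en:
    its_here: "It's here"
    hi_benjamin: "Hi \"Benjamin\""
    greeting: "%{name}, welcome back!"
YAML

strings = YAML.safe_load(LOCALE)["en"]

puts strings["its_here"]    # prints It's here
puts strings["hi_benjamin"] # prints Hi "Benjamin"
# I18n fills %{name} itself; format shows the same substitution in plain Ruby.
puts format(strings["greeting"], name: "Benjamin") # prints Benjamin, welcome back!
```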
If you follow these rules consistently, you will have zero (or very few) boot errors caused by your localization files. In addition, having this set of rules will make working with third-party translation vendors much easier.

#4 - Establishes a clear set of rules when working with vendors

As mentioned in the last point, if you're working with professional third-party translation vendors, having a clear set of formatting rules will make your life 10 times easier. If you don't set ground rules with your translation vendor, they may send you huge YAML files with inconsistent formatting, causing your Rails app to crash. If your team already has a clearly defined set of rules, working with third-party vendors will be a much smoother experience.

Autogenerated translations

It is entirely possible to have your app translated into different languages automatically via services like Google Translate and Yandex. Sounds magical? Well, there are advantages and disadvantages to this method.

Advantages:

Fast and easy way to have your app translated into multiple languages
Cheaper than hiring real human translators
You can write a rake task to auto translate any new strings you have added into the languages that are made available in your app
The way this works is that you write a script, run as a rake task, that grabs all the key/value pairs of your locale texts, loops through them, sends each text to a translation API (Google Translate, Yandex, etc.), and builds up a new YAML file for the language of your choice. This sounds great until you see the results, which brings us to the disadvantages.
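Here is a minimal sketch of the translation step such a rake task might perform. It assumes a hypothetical translator (any callable that takes a text and a target locale) in place of a real Google Translate or Yandex client, so it runs offline; the HTTP call, credentials, and the rake task wrapper are left out. translate_tree and build_locale_yaml are illustrative names.

```ruby
require "yaml"

# Recursively translate every leaf string while keeping the key structure.
# `translator` is any callable taking (text, target_locale) and returning text.
def translate_tree(node, target_locale, translator)
  node.each_with_object({}) do |(key, value), out|
    out[key] =
      if value.is_a?(Hash)
        translate_tree(value, target_locale, translator)
      else
        translator.call(value, target_locale)
      end
  end
end

# Turn {"en" => {...}} into YAML text for {"ko" => {...}}.
def build_locale_yaml(source_yaml, source_locale, target_locale, translator)
  source = YAML.safe_load(source_yaml).fetch(source_locale)
  { target_locale => translate_tree(source, target_locale, translator) }.to_yaml
end

# Hypothetical offline stand-in for a translation API client.
fake_translator = ->(text, locale) { "[#{locale}] #{text}" }

EN_YAML = <<~YAML
  en:
    hello_world: "Hello world!"
    nav:
      home: "Home"
YAML

puts build_locale_yaml(EN_YAML, "en", "ko", fake_translator)
```

In a real task you would write the result to something like config/locales/auto_generated/ko.yml. Watch out for %{name} placeholders, which a machine translator may happily translate or mangle; protecting them before the API call is worth the extra step.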

Disadvantages:

The translation is almost always terrible
Yep, the resulting translation is usually terrible, especially for languages that are difficult to translate into. For example, machine translation from English to Spanish may work "ok", but from English to something like Chinese it probably will not. This is where professional translation services come in, if your app requires proper translations.

Working with external vendors

If you want your app to read well (or even make sense) in other languages, you'll have to bring in a professional translation vendor. Most of the time, you'll hand over your main YAML file (or XML, Ruby hash file, etc.) and they'll send you back YAML files in the languages you requested. You then merge these YAML files back into your codebase.

I've already mentioned that you should set text formatting rules with your vendor so that your app doesn't have any problems booting up. Assuming you have that part taken care of, here are a few things that can make working with vendors much easier.

Setting up a workflow to receive your translated files

Falling back on your autogenerated translation keys if there's no match

#1 - Setting up a workflow for your translation files

There are multiple ways to work with translation vendors, but the way I've worked with them involved me providing the English YAML files (usually named "en.yml") and having the translation vendor send the translated files back to me (for example "ko.yml", "es.yml", "fr.yml", and more). I would then merge these files back into the codebase and start the local Rails server to check that the app boots properly.

There are different ways to manage sending and receiving the YAML files. I'm sure platforms like Dropbox and Google Drive work fine (heck, you could even email them back and forth, although I wouldn't recommend it since managing that would get confusing real quick), but I found that using GitHub repos works even better. With GitHub repos, you can easily move files back and forth between your main code repository and your "locale files" repository, and use git to resolve any differences.

With GitHub repos, any time the main YAML file (in my case, English) changed, I would sync that file into the locale files repository on GitHub. The translation vendor would then pull down the latest "en.yml" file, translate the texts, and push the updated YAML files in each language to the repository. I would then pull down the latest YAML files and merge them back into the app's codebase.

Of course, there are software solutions out there to manage this process, but I found git to be the simplest and most economical way to manage localization files when working with translation vendors.

#2 - Falling back on autogenerated translation keys

Sometimes you might display a text whose key exists in your default YAML file (en.yml for me) but not in the YAML files provided by your translation vendor. This can happen because translation vendors are human at the end of the day, and they might miss some texts. When this happens, it's possible to "fall back" on your autogenerated locales if you happen to have them lying around.
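Before relying on a fallback, it helps to know exactly which keys the vendor missed. A minimal sketch, assuming one locale per file with the locale as the top-level key; flat_keys and missing_keys are hypothetical helper names. If the vendor file has the wrong top-level key, every key shows up as missing, which is itself a useful warning.

```ruby
require "yaml"

# Flatten nested locale data into dotted keys such as "nav.home".
def flat_keys(node, prefix = nil)
  node.flat_map do |key, value|
    full_key = prefix ? "#{prefix}.#{key}" : key.to_s
    value.is_a?(Hash) ? flat_keys(value, full_key) : [full_key]
  end
end

# Keys present in the default locale file but absent from a vendor's file.
def missing_keys(default_yaml, default_locale, vendor_yaml, vendor_locale)
  default_keys = flat_keys(YAML.safe_load(default_yaml).fetch(default_locale))
  vendor_keys = flat_keys(YAML.safe_load(vendor_yaml).fetch(vendor_locale, {}))
  default_keys - vendor_keys
end

EN = <<~YAML
  en:
    hello: "Hello"
    nav:
      home: "Home"
YAML

KO = <<~YAML
  ko:
    hello: "안녕하세요"
YAML

p missing_keys(EN, "en", KO, "ko") # prints ["nav.home"]
```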

To do this, simply keep two sets of translation files for each language. The first set is the custom one from the translation vendor, and the second is the autogenerated one that you built by making API calls to services like Google Translate.

Then organize your translation files in your config/locales folder.

I like to put all of the autogenerated localization files under config/locales/auto_generated and the custom translated localization files under config/locales/custom.

What makes this work is adding both folders to config.i18n.load_path (for example with a Dir glob over config/locales) so that the files are loaded in alphabetical order. Since "auto_generated" sorts before "custom", the autogenerated files are loaded first. Rails merges translations in the order the files are loaded, so for any key defined in both sets, the custom translation loaded later overwrites the autogenerated one. And if a key is missing from the custom translations, the autogenerated value that was already loaded is used instead.
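A sketch of this setup, assuming a standard Rails app: the commented line shows one common way to put every YAML file under config/locales on config.i18n.load_path, with an explicit sort so the alphabetical order this trick relies on doesn't depend on how Dir orders results. The runnable part is a plain-Ruby model of the merge behaviour; deep_merge and load_in_order are hypothetical helpers, not Rails APIs.

```ruby
require "yaml"

# In a Rails app, something like this in config/application.rb adds every YAML
# file under config/locales (including subfolders) in alphabetical order:
#
#   config.i18n.load_path += Dir[Rails.root.join("config", "locales", "**", "*.yml")].sort
#
# The code below models what happens next: files are merged in load order,
# so a later file wins for a shared key, and keys that exist only in earlier
# files are kept as fallbacks.

def deep_merge(base, override)
  base.merge(override) do |_key, old_value, new_value|
    old_value.is_a?(Hash) && new_value.is_a?(Hash) ? deep_merge(old_value, new_value) : new_value
  end
end

# `files` maps a path to its YAML text; paths are merged in sorted order.
def load_in_order(files)
  files.keys.sort.reduce({}) { |merged, path| deep_merge(merged, YAML.safe_load(files[path])) }
end

FILES = {
  "config/locales/custom/ko.yml" => "ko:\n  hello: \"안녕하세요\"\n",
  "config/locales/auto_generated/ko.yml" => "ko:\n  hello: \"안녕\"\n  goodbye: \"안녕히 가세요\"\n"
}.freeze

merged = load_in_order(FILES)
puts merged["ko"]["hello"]   # prints 안녕하세요 (custom file is loaded last)
puts merged["ko"]["goodbye"] # prints 안녕히 가세요 (only in auto_generated)
```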

This way, you can always have a fallback option in case there are missing locale keys in your custom localization files.

Writing tests

When working with translation vendors, there is a chance that the localization files you get back will contain typos. The typo I've seen most often is a wrong language key at the top of the YAML file.

For example, here's a typical YAML file that would represent a Korean translation.

---
ko:
  hello: "안녕하세요"

See that "ko" key? Sometimes I would get back files with a different language key instead, like "es", which is the code for Spanish. I've had times where I merged this typo into the codebase, deployed, and then had all the Korean users see their version of the app in Spanish.

This is one of those typos that's easy to miss and can cause all sorts of annoyance for your international users. That's why I like to write tests that check the integrity of the YAML files before merging and deploying the app.

A typical test of this kind loads each YAML file individually and checks its top-level key. For example, it loads the French YAML file and makes sure its top-level key is "fr", so that French users won't accidentally end up seeing Russian (it's happened once...). This is just one example of a test you can write to check the integrity of your localization files; there are other kinds of tests you can write depending on your situation.
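A sketch of such a check, assuming each vendor file is named after its locale (fr.yml, ko.yml) and has exactly one top-level key; files named differently, such as devise.en.yml, would need their own rule. It reads only the parsed syntax tree via YAML.parse, so symbols or aliases in Rails locale files don't get in the way. top_level_keys and locale_key_mismatches are hypothetical helpers you could call from your test suite.

```ruby
require "yaml"

# Top-level keys of a YAML document, read from the syntax tree.
def top_level_keys(yaml_text)
  document = YAML.parse(yaml_text)
  return [] unless document && document.root.is_a?(Psych::Nodes::Mapping)

  # Mapping children alternate: key node, value node, key node, ...
  document.root.children.each_slice(2).map do |key_node, _value_node|
    key_node.value if key_node.is_a?(Psych::Nodes::Scalar)
  end
end

# Report files whose only top-level key doesn't match their name,
# e.g. fr.yml must contain exactly one top-level key, "fr".
def locale_key_mismatches(paths)
  paths.sort.each_with_object([]) do |path, problems|
    expected = File.basename(path, ".yml")
    actual = top_level_keys(File.read(path, encoding: "UTF-8"))
    next if actual == [expected]

    problems << "#{path}: expected top-level key #{expected.inspect}, got #{actual.inspect}"
  end
end

puts locale_key_mismatches(Dir["config/locales/{auto_generated,custom}/*.yml"])
```

In a Minitest test this could be a single assertion such as assert_empty locale_key_mismatches(Dir["config/locales/custom/*.yml"]), which fails with the list of offending files.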

Final Notes

I wrote in the first sentence of this post that managing localization in large apps can be a behemoth of a task. It is, if you don't have a process set up for managing it. However, if you set up an organization-wide process for managing it systematically, it becomes one of those painless five-minute tasks. I hope this post is helpful for anyone with apps that need to support multiple languages.