
Previewing Morfik’s support for internationalization – Part 2

This is the second part of the preview of Internationalization Support, and today we are going to take a look at database localization.

Localizing database-driven content is not an easy task, and there are only a few approaches to choose from. Consider a simple Blog application that uses a very basic table with only two fields: PostDate and Content.

Let’s quickly go through the existing methods of localizing that table:

Using different columns

In this method, a separate field in the same row stores the localized version of the string. For example, if we decide to provide a Spanish version of our blog, we need to add another column, Content_es.

This method is relatively simple to implement. The biggest issue with it is that you will have to restructure your table every time you add a new language. You will also have to be careful not to exceed the maximum row size limit that many databases have.
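To make the trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name Posts, the sample rows and the French column Content_fr are assumptions made for illustration; only PostDate, Content and Content_es come from the example above. It shows the fallback to the original text and the schema change needed for every new language.

```python
import sqlite3

# Hypothetical "Posts" table from the Blog example; Content_es holds the Spanish text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (PostDate TEXT PRIMARY KEY, Content TEXT, Content_es TEXT)")
conn.executemany("INSERT INTO Posts VALUES (?, ?, ?)", [
    ("2009-03-01", "Hello world", "Hola mundo"),
    ("2009-03-02", "Untranslated post", None),  # no Spanish version yet
])

# Whitelist of language-specific columns; a column name cannot be a bound parameter.
COLUMNS = {"en": "Content", "es": "Content_es"}

def get_posts(lang):
    column = COLUMNS[lang]
    # COALESCE falls back to the original Content when the translation is missing.
    sql = f"SELECT PostDate, COALESCE({column}, Content) FROM Posts ORDER BY PostDate"
    return conn.execute(sql).fetchall()

# Supporting another language (here, hypothetically, French) means changing the table structure.
conn.execute("ALTER TABLE Posts ADD COLUMN Content_fr TEXT")
COLUMNS["fr"] = "Content_fr"
```

Calling get_posts("es") returns the Spanish text for the first post and the English original for the second, because the second row has no Content_es value.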

Using different rows

To be able to add languages without restructuring your database, you can make a language identifier part of the key. You will still need to add a field to your table, but only once. In our example, the table now has three fields: Language, PostDate, and Content.

While it avoids the need for restructuring, this method is far from perfect. Any non-localizable content in the row gets duplicated for every language. The implementation is also more involved: every query needs an additional filter on language, falling back to the default value when a translation isn’t found might require another query, and maintaining referential integrity or any other relationship between tables can become problematic once rows are duplicated. Finally, if the data in the table is updated, you have to keep multiple rows in sync.
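The sketch below, again using Python's sqlite3 module, shows the extra query logic this method requires. The table name Posts, the sample rows and English as the default language are assumptions. Here the fallback is done in a single query with a self LEFT JOIN rather than a second query; that is just one possible way to write it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Language is now part of the key, so one post occupies one row per language.
conn.execute("""CREATE TABLE Posts (
    Language TEXT NOT NULL,
    PostDate TEXT NOT NULL,
    Content  TEXT,
    PRIMARY KEY (Language, PostDate))""")
conn.executemany("INSERT INTO Posts VALUES (?, ?, ?)", [
    ("en", "2009-03-01", "Hello world"),
    ("es", "2009-03-01", "Hola mundo"),       # duplicated row for the Spanish version
    ("en", "2009-03-02", "Untranslated post"),
])

DEFAULT_LANG = "en"  # assumption: English is the default language

def get_posts(lang):
    # Every query must filter on Language; the LEFT JOIN falls back to the
    # default-language row when no translation exists for the requested language.
    sql = """
        SELECT d.PostDate, COALESCE(t.Content, d.Content)
        FROM Posts AS d
        LEFT JOIN Posts AS t
          ON t.PostDate = d.PostDate AND t.Language = ?
        WHERE d.Language = ?
        ORDER BY d.PostDate"""
    return conn.execute(sql, (lang, DEFAULT_LANG)).fetchall()
```

Even in this tiny example the query is noticeably more complex than a plain SELECT, and any other table referring to a post now has to deal with one row per language.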

Using different tables

Yet another approach is to use a separate table for each language. The benefit of this method is that no structural changes are required: every localized table is just a clone of the original one. However, it shares most of the issues of the previous method, and any change to the table structure has to be applied to every clone.
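A minimal sqlite3 sketch of the per-table layout follows. The table suffixes and the extra Author column are hypothetical, chosen only to show that queries barely change while every structural change has to be repeated for each clone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "" is the original table; "_es" and "_ru" are hypothetical suffixes for the clones.
SUFFIXES = ["", "_es", "_ru"]

for suffix in SUFFIXES:
    # Every localized table is an exact clone of the original Posts table.
    conn.execute(f"CREATE TABLE Posts{suffix} (PostDate TEXT PRIMARY KEY, Content TEXT)")

conn.execute("INSERT INTO Posts VALUES ('2009-03-01', 'Hello world')")
conn.execute("INSERT INTO Posts_es VALUES ('2009-03-01', 'Hola mundo')")

def get_posts(suffix):
    # Queries are unchanged apart from the table name, which must come from the whitelist.
    if suffix not in SUFFIXES:
        raise ValueError(f"unknown language suffix: {suffix!r}")
    return conn.execute(f"SELECT PostDate, Content FROM Posts{suffix}").fetchall()

# A structure change (a hypothetical Author column) must be repeated for every clone.
for suffix in SUFFIXES:
    conn.execute(f"ALTER TABLE Posts{suffix} ADD COLUMN Author TEXT")
```

Note that PostDate is stored again in every clone, and nothing here falls back to the original table when a translated table has no row for a post.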

Using different databases

The most extreme method is to use a separate database for each language. The code changes are minimal, since it all comes down to connecting to the right database. However, maintaining multiple database connections could cause performance issues, and data duplication is at its maximum with this approach. Synchronizing changes becomes really problematic, and database maintenance in general gets complicated. All in all, this approach is not very practical.

Dictionary-based translation

While one of the above methods may be suitable in some cases, we were not happy offering any of them as a general-purpose solution. What we have come up with is an approach somewhat similar to the one used in the GNU gettext library.

The idea behind it is quite simple: the original value of a string is used as the lookup key in a translation dictionary.
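In its simplest form, such a lookup might look like the following Python sketch. The TRANSLATIONS dictionary and its contents are hypothetical; the point is that the original text itself is the key, and a missing entry simply returns the original, much like gettext returns the untranslated message.

```python
# Hypothetical per-language dictionaries keyed by the original (default-language) text,
# in the spirit of GNU gettext: the source string itself is the lookup key.
TRANSLATIONS = {
    "es": {"Hello world": "Hola mundo"},
    "ru": {"Hello world": "Привет, мир"},
}

def translate(text, lang):
    # Fall back to the original text when the language or the entry is missing.
    return TRANSLATIONS.get(lang, {}).get(text, text)
```

For example, translate("Hello world", "es") gives "Hola mundo", while translate("Untranslated post", "es") returns the English text unchanged.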

Here is what you have to do to have your application support localized text retrieved from a database: nothing! That’s right, with this approach you don’t have to change your code or your database. All the work required to perform the localization is done at the framework level, so you don’t have to worry about it.
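To illustrate what "done at the framework level" can mean, here is a Python sketch that hooks translation into the data-access layer through sqlite3's row_factory, so the application's query stays untouched. This is not Morfik's implementation, only an analogous technique; the TRANSLATIONS dictionary and sample data are hypothetical.

```python
import sqlite3

TRANSLATIONS = {"es": {"Hello world": "Hola mundo"}}  # hypothetical dictionary

def make_translating_factory(lang):
    table = TRANSLATIONS.get(lang, {})
    def factory(cursor, row):
        # Translate every string value; non-strings and unknown strings pass through unchanged.
        return tuple(table.get(v, v) if isinstance(v, str) else v for v in row)
    return factory

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (PostDate TEXT PRIMARY KEY, Content TEXT)")
conn.executemany("INSERT INTO Posts VALUES (?, ?)", [
    ("2009-03-01", "Hello world"),
    ("2009-03-02", "Untranslated post"),
])

# The application query is unchanged; only the row factory knows about languages.
conn.row_factory = make_translating_factory("es")
rows = conn.execute("SELECT PostDate, Content FROM Posts ORDER BY PostDate").fetchall()
```

Translating every string column is the simplest possible rule; a real framework would more likely restrict translation to columns marked as localizable, so that values such as dates or codes are never looked up.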

To localize the database, you will still need to give a translator a file containing all the original text. Exporting database content will be made trivial by the upcoming Import/Export package, but that’s a topic for another blog post. Once you get the translated file back, you place it in the same location as your design and code translation files; there is no need to update the database.
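The format of the translation file is not described here, so the following Python sketch simply assumes a gettext-style PO layout (msgid/msgstr pairs, header entry omitted) to show the idea of exporting each distinct original string once for the translator. The table and column names are hypothetical.

```python
import sqlite3

def po_escape(text):
    # Minimal escaping for a PO string literal: backslash, double quote, newline.
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def export_strings(conn, table, column):
    # Each distinct, non-empty original string is exported once, however many rows use it.
    # table and column are trusted identifiers here, not user input.
    rows = conn.execute(
        f"SELECT DISTINCT {column} FROM {table} "
        f"WHERE {column} IS NOT NULL AND {column} <> '' ORDER BY {column}")
    return "\n".join(f'msgid "{po_escape(text)}"\nmsgstr ""\n' for (text,) in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (PostDate TEXT PRIMARY KEY, Content TEXT)")
conn.executemany("INSERT INTO Posts VALUES (?, ?)", [
    ("2009-03-01", "Hello world"),
    ("2009-03-02", 'She said "hi"'),
    ("2009-03-03", "Hello world"),  # duplicate text, exported only once
])
po_text = export_strings(conn, "Posts", "Content")
```

The translator fills in the empty msgstr values; because the key is the original text, the database itself never has to be touched.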

Automatic translation is possible, and it is quite easy to implement using services such as the Google AJAX Language API (http://code.google.com/apis/ajaxlanguage/). It’s not something we would recommend for everyone, though, since the quality of the translation is not always perfect.
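One cautious way to combine machine translation with a dictionary is to fill only the missing entries and keep track of them for human review. In this Python sketch, machine_translate is a placeholder for whatever service is used; it does not correspond to any real API.

```python
from typing import Callable, Dict, Iterable, Set

def fill_missing(dictionary: Dict[str, str],
                 originals: Iterable[str],
                 machine_translate: Callable[[str], str]) -> Set[str]:
    """Add machine translations for strings that have no translation yet.

    machine_translate is a hypothetical callable standing in for a translation
    service. Existing (human) translations are never overwritten. The returned
    set lists the automatically filled keys so they can be reviewed later.
    """
    filled = set()
    for text in originals:
        if text not in dictionary:
            dictionary[text] = machine_translate(text)
            filled.add(text)
    return filled
```

For example, with es = {"Hello world": "Hola mundo"}, calling fill_missing(es, ["Hello world", "Good night"], some_translator) leaves the existing entry alone and returns {"Good night"} as the only entry to review.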

As for your question about updating the article description in several languages, the answer is yes. In your case, when a new record is added and the text is provided in both languages, the dictionary simply gets a new EngText=RusText entry.
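A rough Python sketch of that flow follows. The Articles table, the add_article and get_description helpers, and the in-memory dictionaries are all hypothetical; they only illustrate storing the English text in the table and recording the Russian text as an EngText=RusText dictionary entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Articles (Id INTEGER PRIMARY KEY, Description TEXT)")
dictionaries = {"ru": {}}  # hypothetical per-language dictionaries

def add_article(description_en, translations):
    # Only the original (English) text goes into the table...
    cur = conn.execute("INSERT INTO Articles (Description) VALUES (?)", (description_en,))
    # ...while each translation becomes an EngText=TranslatedText dictionary entry.
    for lang, text in translations.items():
        dictionaries.setdefault(lang, {})[description_en] = text
    return cur.lastrowid

def get_description(article_id, lang):
    (text,) = conn.execute(
        "SELECT Description FROM Articles WHERE Id = ?", (article_id,)).fetchone()
    # Look up the stored English text; fall back to it if no translation exists.
    return dictionaries.get(lang, {}).get(text, text)

article_id = add_article("Wooden chair", {"ru": "Деревянный стул"})
```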

Great exploration. As far as I understand, Morfik will use the dictionary-based translation method.
Right now I use the different-columns method.
I have never used the dictionary-based translation method, so I have a few questions:
My customer wants to sell his products in several countries, so several languages must be supported.
After entering the English data for an article, he wants to switch to Russian internally and enter the corresponding data for the same article.
Can this be done with the Morfik method?
Another question: will automatic translation be possible when the dictionary method is used? (I suppose yes.)
And the last question: does your team plan a beta release? If so, we can give some feedback there.
Best Regards,
Ivailo