Context Navigation

Internationalization and Localization

Genshi provides comprehensive supporting infrastructure for internationalizing
and localizing templates. That includes functionality for extracting
localizable strings from templates, as well as a template filter and special
directives that can apply translations to templates as they get rendered.

This support is based on gettext message catalogs and the gettext Python
module. The extraction process can be used from the API level, or through
the front-ends implemented by the Babel project, for which Genshi provides
a plugin.

The simplest way to internationalize and translate templates would be to wrap
all localizable strings in a gettext() function call (which is often
aliased to _() for brevity). In that case, no extra template filter is
required.

<p>${_("Hello, world!")}</p>

However, this approach results in significant “character noise” in templates,
making them harder to read and preview.

The genshi.filters.Translator filter allows you to get rid of the
explicit gettext function calls, so you can (often) just continue to write:

<p>Hello, world!</p>

This text will still be extracted and translated as if you had wrapped it in a
_() call.

Note

For parameterized or pluralizable messages, you need to use the
special template directives described below, or use the
corresponding gettext function in embedded Python expressions.

You can control which tags should be ignored by this process; for example, it
doesn't really make sense to translate the content of the HTML
<script></script> element. Both <script> and <style> are excluded
by default.

Attribute values can also be automatically translated. The default is to
consider the attributes abbr, alt, label, prompt, standby,
summary, and title, which is a list that makes sense for HTML
documents. Of course, you can tell the translator to use a different set of
attribute names, or none at all.

Sometimes localizable strings in templates may contain dynamic parameters, or
they may depend on the numeric value of some variable to choose a proper
plural form. Sometimes the strings contain embedded markup, such as tags for
emphasis or hyperlinks, and you don't want to rely on the people doing the
translations to know the syntax and escaping rules of HTML and XML.

In those cases the simple text extraction and translation process described
above is not sufficient. You could just use gettext API functions in
embedded Python expressions for parameters and pluralization, but that does
not help when messages contain embedded markup. Genshi provides special
template directives for internationalization that attempt to provide a
comprehensive solution for this problem space.

To enable these directives, you'll need to register them with the templates
they are used in. You can do this by adding them manually via the
Template.add_directives(namespace, factory) (where namespace would be
“http://genshi.edgewall.org/i18n” and factory would be an instance of the
Translator class). Or you can just call the Translator.setup(template)
class method, which both registers the directives and adds the translation
filter.

After the directives have been registered with the template engine on the
Python side of your application, you need to declare the corresponding
directive namespace in all markup templates that use them. For example:

These directives only make sense in the context of markup templates. For
text templates, you can just use the corresponding gettext API calls as needed.

Note

The internationalization directives are still somewhat experimental
and have some known issues. However, the attribute language they
implement should be stable and is not subject to change
substantially in future versions.

2.1.1 i18n:msg

This is the basic directive for defining localizable text passages that
contain parameters and/or markup.

For example, consider the following template snippet:

<p>
Please visit <ahref="${site.url}">${site.name}</a> for help.
</p>

Without further annotation, the translation filter would treat this sentence
as two separate messages (“Please visit” and “for help”), and the translator
would have no control over the position of the link in the sentence.

However, when you use the Genshi internationalization directives, you simply
add an i18n:msg attribute to the enclosing <p> element:

Genshi is then able to identify the text in the <p> element as a single
message for translation purposes. You'll see the following string in your
message catalog:

Please visit [1:%(name)s] for help.

The <a> element with its attribute has been replaced by a part in square
brackets, which does not include the tag name or the attributes of the element.

The value of the i18n:msg attribute is a comma-separated list of parameter
names, which serve as simplified aliases for the actual Python expressions the
message contains. The order of the paramer names in the list must correspond
to the order of the expressions in the text. In this example, there is only
one parameter: its alias for translation is “name”, while the corresponding
expression is ${site.name}.

The translator now has complete control over the structure of the sentence. He
or she certainly does need to make sure that any bracketed parts are not
removed, and that the name parameter is preserved correctly. But those are
things that can be easily checked by validating the message catalogs. The
important thing is that the translator can change the sentence structure, and
has no way to break the application by forgetting to close a tag, for example.

So if the German translator of this snippet decided to translate it to:

Note how the translation has changed the order and even the nesting of the
tags.

Warning

Please note that i18n:msg directives do not support other
nested directives. Directives commonly change the structure of
the generated markup dynamically, which often would result in the
structure of the text changing, thus making translation as a
single message ineffective.

2.1.2 i18n:choose, i18n:singular, i18n:plural

Translatable strings that vary based on some number of objects, such as “You
have 1 new message” or “You have 3 new messages”, present their own challenge,
in particular when you consider that different languages have different rules
for pluralization. For example, while English and most western languages have
two plural forms (one for n=1 and an other for n<>1), Welsh has five
different plural forms, while Hungarian only has one.

The gettext framework has long supported this via the ngettext()
family of functions. You specify two default messages, one singular and one
plural, and the number of items. The translations however may contain any
number of plural forms for the message, depending on how many are commonly
used in the language. ngettext will choose the correct plural form of the
translated message based on the specified number of items.

Genshi provides a variant of the i18n:msg directive described above that
allows choosing the proper plural form based on the numeric value of a given
variable. The pluralization support is implemented in a set of three
directives that must be used together: i18n:choose, i18n:singular, and
i18n:plural.

The i18n:choose directive is used to set up the context of the message: it
simply wraps the singular and plural variants.

The value of this directive is split into two parts: the first is the
numeral, a Python expression that evaluates to a number to determine which
plural form should be chosen. The second part, separated by a semicolon, lists
the parameter names. This part is equivalent to the value of the i18n:msg
directive.

For example:

<pi18n:choose="len(messages); num"><i18n:singular>You have <b>${len(messages)}</b> new message.</i18n:singular><i18n:plural>You have <b>${len(messages)}</b> new messages.</i18n:plural></p>

All three directives can be used either as elements or attribute. So the above
example could also be written as follows:

<i18n:choosenumeral="len(messages)"params="num"><pi18n:singular="">You have <b>${len(messages)}</b> new message.</p><pi18n:plural="">You have <b>${len(messages)}</b> new messages.</p></i18n:choose>

When used as an element, the two parts of the i18n:choose value are split
into two different attributes: numeral and params. The
i18n:singular and i18n:plural directives do not require or support any
value (or any extra attributes).

2.2.1 i18n:comment

The i18n:comment directive can be used to supply a comment for the
translator. For example, if a template snippet is not easily understood
outside of its context, you can add a translator comment to help the
translator understand in what context the message will be used:

<pi18n:msg="name"i18n:comment="Link to the relevant support site">
Please visit <ahref="${site.url}">${site.name}</a> for help.
</p>

This comment will be extracted together with the message itself, and will
commonly be placed along the message in the message catalog, so that it is
easily visible to the person doing the translation.

This directive has no impact on how the template is rendered, and is ignored
outside of the extraction process.

2.2.2 i18n:domain

In larger projects, message catalogs are commonly split up into different
domains. For example, you might have a core application domain, and then
separate domains for extensions or libraries.

Genshi provides a directive called i18n:domain that lets you choose the
translation domain for a particular scope. For example:

The Translator class provides a class method called extract, which is
a generator yielding all localizable strings found in a template or markup
stream. This includes both literal strings in text nodes and attribute values,
as well as strings in gettext() calls in embedded Python code. See the API
documentation for details on how to use this method directly.

This functionality is integrated with the message extraction framework provided
by the Babel project. Babel provides a command-line interface as well as
commands that can be used from setup.py scripts using Setuptools or
Distutils.

The first thing you need to do to make Babel extract messages from Genshi
templates is to let Babel know which files are Genshi templates. This is done
using a “mapping configuration”, which can be stored in a configuration file,
or specified directly in your setup.py.

The Genshi extraction plugin for Babel supports the following options:

3.2.1 template_class

The concrete Template class that the file should be loaded with. Specify
the package/module name and the class name, separated by a colon.

The default is to use genshi.template:MarkupTemplate, and you'll want to
set it to genshi.template:TextTemplate for text templates.

3.2.2 encoding

The encoding of the template file. This is only used for text templates. The
default is to assume “utf-8”.

3.2.3 include_attrs

Comma-separated list of attribute names that should be considered to have
localizable values. Only used for markup templates.

3.2.4 ignore_tags

Comma-separated list of tag names that should be ignored. Only used for markup
templates.

3.2.5 extract_text

Whether text outside explicit gettext function calls should be extracted.
By default, any text nodes not inside ignored tags, and values of attribute in
the include_attrs list are extracted. If this option is disabled, only
strings in gettext function calls are extracted.

Note

If you disable this option, and do not make use of the
internationalization directives, it's not necessary to add the
translation filter as described above. You only need to make sure
that the template has access to the gettext functions it uses.

If you have prepared MO files for use with Genshi using the appropriate tools,
you can access the message catalogs with the gettext Python module. You'll
probably want to create a gettext.GNUTranslations instance, and make the
translation functions it provides available to your templates by putting them
in the template context.

The Translator filter needs to be added to the filters of the template
(applying it as a stream filter will likely not have the desired effect).
Furthermore it needs to be the first filter in the list, including the internal
filters that Genshi adds itself:

If you're using TemplateLoader, you should specify a
callback function in which you add the filter. That ensures
that the filter is not added everytime the template is rendered,
thereby being applied multiple times.

Use unicode internally, not encoded bytestrings. Only encode/decode where
data enters or exits the system. This means that your code works with characters
and not just with bytes, which is an important distinction for example when
calculating the length of a piece of text. When you need to decode/encode, it's
probably a good idea to use UTF-8.

If your application uses datetime information that should be displayed to users
in different timezones, you should try to work with UTC (universal time)
internally. Do the conversion from and to "local time" when the data enters or
exits the system. Make use the Python datetime module and the third-party
pytz package.