
What is lingua?

Lingua is a package with tools to extract translatable texts from
your code, and to check existing translations. It replaces the use
of the xgettext command from gettext, or pybabel from Babel.

Message extraction

The simplest way to extract all translatable messages is to point the
pot-create tool at the root of your source tree.

$ pot-create src

This will create a messages.pot file containing all found messages.

Specifying input files

There are three ways to tell lingua which files you want it to scan:

Specify filenames directly on the command line. For example:

$ pot-create main.py utils.py

Specify a directory on the command line. Lingua will recursively scan that
directory for all files it knows how to handle.

$ pot-create src

Use the --files-from parameter to point to a file with a list of
files to scan. Lines starting with # and empty lines will be ignored.

$ pot-create --files-from=POTFILES.in
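The three input-selection rules above can be sketched in a few lines of Python. This is an illustration, not lingua's own code: the helper names read_files_from and scan_directory are hypothetical, and the set of handled extensions is assumed from the default configuration described under Configuration.

```python
import os

# Hypothetical set of handled extensions, assumed from lingua's default
# configuration; a real run also depends on the configuration file.
KNOWN_EXTENSIONS = {".py", ".pt", ".zpt", ".zcml"}


def read_files_from(text):
    """Parse a --files-from list such as POTFILES.in.

    Blank lines and lines starting with '#' are skipped; surrounding
    whitespace is stripped from each remaining entry.
    """
    files = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        files.append(entry)
    return files


def scan_directory(root, extensions=KNOWN_EXTENSIONS):
    """Recursively collect files under root whose extension is handled."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            if os.path.splitext(name)[1] in extensions:
                found.append(os.path.join(dirpath, name))
    return found


if __name__ == "__main__":
    sample = "# Python sources\nsrc/main.py\n\nsrc/utils.py\n"
    print(read_files_from(sample))  # ['src/main.py', 'src/utils.py']
```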

You can also use the --directory=PATH parameter to add the given path to the
list of directories to check for files. This may sound confusing, but it can be
useful. For example, this command will look for main.py and utils.py in
the current directory and, if they are not found there, in the ../src
directory:

$ pot-create --directory=../src main.py utils.py
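The lookup order in this example can be expressed as a small function. It is a hypothetical sketch (resolve_input is not part of lingua), assuming extra directories are tried in the order they are given, after the current directory.

```python
import os


def resolve_input(filename, search_dirs=()):
    """Find an input file following the --directory lookup order.

    The name is tried as given (relative to the current directory) first;
    if it does not exist there, each extra directory is tried in order.
    Returns the path that exists, or None if the file is not found anywhere.
    """
    if os.path.exists(filename):
        return filename
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Roughly what: pot-create --directory=../src main.py utils.py
    # has to decide for each named file.
    for name in ["main.py", "utils.py"]:
        print(name, "->", resolve_input(name, ["../src"]))
```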

Configuration

In its default configuration lingua will use its Python extractor for .py
files, its XML extractor for .pt and .zpt files, and its ZCML extractor
for .zcml files. If you use different extensions, you can set up a configuration
file which tells lingua how to process files. This file uses a simple ini-style
format.

There are two types of configuration that can be set in the configuration file:
which extractor to use for a file extension, and the configuration for a single
extractor.

File extensions are configured in the extensions section. Each entry in
this section maps a file extension to an extractor name. For example, to
tell lingua to use its XML extractor for files with a .html extension,
you can use this configuration:

[extensions]
.html = xml

To find out which extractors are available, use the --list-extractors option.

A section named extractor:<name> can be used to configure a specific
extractor. For example, to tell the XML extractor that the default language
used for expressions is TALES instead of Python:

[extractor:xml]
default-engine = tales

Either place a global configuration file named .config/lingua in your
home directory or use the --config option to point lingua to your
configuration file.

$ pot-create -c lingua.cfg src
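To illustrate how such a file maps extensions to extractors, here is a minimal sketch using Python's standard configparser module. It is not lingua's loader: the function names are hypothetical, and the default extractor names python and zcml are assumptions (only xml appears in the examples above).

```python
import configparser
import os

# Defaults from the text; the names "python" and "zcml" are assumed.
DEFAULT_EXTENSIONS = {
    ".py": "python",
    ".pt": "xml",
    ".zpt": "xml",
    ".zcml": "zcml",
}


def load_config(text):
    """Return (extension map, per-extractor options) from ini-style text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names such as ".html" as written
    parser.read_string(text)

    extensions = dict(DEFAULT_EXTENSIONS)
    if parser.has_section("extensions"):
        extensions.update(parser.items("extensions"))

    extractor_options = {}
    for section in parser.sections():
        if section.startswith("extractor:"):
            name = section[len("extractor:"):]
            extractor_options[name] = dict(parser.items(section))
    return extensions, extractor_options


def extractor_for(path, extensions):
    """Pick the extractor name for a file from its extension, or None."""
    return extensions.get(os.path.splitext(path)[1])


if __name__ == "__main__":
    CONFIG = """
[extensions]
.html = xml

[extractor:xml]
default-engine = tales
"""
    extensions, options = load_config(CONFIG)
    print(extractor_for("templates/page.html", extensions))  # xml
    print(options["xml"])  # {'default-engine': 'tales'}
```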

Domain filtering

When working with large systems you may use multiple translation domains
in a single source tree. Lingua can support that by filtering messages by
domain when scanning sources. To enable domain filtering use the -d option:

$ pot-create -d mydomain src

Lingua will always include messages for which it cannot determine the domain.
For example, in Python code a message passed to a plain translation function
such as _() does not specify its domain and will always be included. A call
to dgettext explicitly specifies the domain, and lingua will use this
information when filtering domains.
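The filtering rule can be sketched as follows. The (domain, msgid) pairs and the filter_by_domain helper are assumptions made for illustration; lingua's internal message representation is not described here.

```python
def filter_by_domain(messages, domain):
    """Keep messages that belong to `domain` or whose domain is unknown.

    `messages` is a list of (domain, msgid) pairs, where domain is None when
    the call did not say which domain it uses. This data shape is an
    assumption made for the sketch.
    """
    return [(d, msgid) for d, msgid in messages if d is None or d == domain]


if __name__ == "__main__":
    messages = [
        (None, "Hello"),              # e.g. _('Hello'): no domain given
        ("mydomain", "Hello again"),  # e.g. dgettext('mydomain', 'Hello again')
        ("otherdomain", "Goodbye"),   # e.g. dgettext('otherdomain', 'Goodbye')
    ]
    print(filter_by_domain(messages, "mydomain"))
    # [(None, 'Hello'), ('mydomain', 'Hello again')]
```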

Including comments

You can add comments to messages to help translators, for example to explain
how a text is used, or provide hints on how it should be translated. For
Chameleon templates this can be done using the i18n:comment attribute:

<label i18n:comment="This is a form label" i18n:translate="">Password</label>

Comments are inherited, so you can put them on a parent element as well.

For Python code you can tell lingua to include comments by using the
--add-comments option. This will make lingua include all comments on the
line(s) immediately preceding a translation call (there may be no empty
line in between).

# This text should address the user directly.
return _('Thank you for using our service.')

Alternatively, you can put a comment at the end of the line that starts your
translation function call.

return _('Thank you for using our service.') # Address the user directly

If you only want specific comments to be included, you can add a keyword to
the --add-comments option, for example --add-comments=I18N. Only comments that
start with that keyword will then be included.

# I18N This text should address the user directly, and use formal addressing.
return _('Thank you for using our service')
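A rough sketch of how comments could be collected under these rules, using only the standard library. translator_comments and trailing_comment are hypothetical helpers, not lingua's implementation; the sketch assumes that a tag only has to match the start of the comment text.

```python
import io
import tokenize


def trailing_comment(line):
    """Return the text of a '#' comment at the end of a line, or None.

    Uses the tokenize module so that a '#' inside a string literal is not
    mistaken for a comment.
    """
    try:
        readline = io.StringIO(line.strip()).readline
        for tok in tokenize.generate_tokens(readline):
            if tok.type == tokenize.COMMENT:
                return tok.string[1:].strip()
    except (tokenize.TokenError, SyntaxError):
        pass  # e.g. the call continues on the next line
    return None


def translator_comments(lines, call_index, tag=None):
    """Collect translator comments for the call on lines[call_index].

    Takes the run of comment lines directly above the call (stopping at the
    first blank or non-comment line) plus a trailing comment on the call
    line. With a tag (as in --add-comments=I18N) only comments starting
    with that tag are kept.
    """
    collected = []
    i = call_index - 1
    while i >= 0 and lines[i].lstrip().startswith("#"):
        collected.insert(0, lines[i].lstrip()[1:].strip())
        i -= 1
    comment = trailing_comment(lines[call_index])
    if comment:
        collected.append(comment)
    if tag is not None:
        collected = [c for c in collected if c.startswith(tag)]
    return collected
```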

Setting message flags in comments

Messages can have flags. These indicate what format a message uses, and are
typically used by validation tools to check that a translation does not break
variable references or template syntax. Lingua does a reasonable job of
detecting strings that use C and Python formatting, but sometimes you may need
to set flags yourself. This can be done with a [flag, flag] marker in a comment.

# I18N [markdown,c-format]
header = _(u'# Hello *%s*')
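Parsing such a marker is straightforward. The sketch below is hypothetical (split_flags and po_flag_line are not lingua functions) and assumes the first bracketed group in a comment is the flag marker; the output line uses the standard "#," flags syntax of PO files.

```python
import re

# A "[flag, flag]" marker inside a comment. Treating the first bracketed
# group as the marker is an assumption made for this sketch.
FLAG_MARKER = re.compile(r"\[([^\]]*)\]")


def split_flags(comment):
    """Split a comment into (flags, remaining text)."""
    match = FLAG_MARKER.search(comment)
    if match is None:
        return [], comment.strip()
    flags = [flag.strip() for flag in match.group(1).split(",") if flag.strip()]
    rest = (comment[:match.start()] + comment[match.end():]).strip()
    return flags, rest


def po_flag_line(flags):
    """Render flags the way PO/POT files list them, e.g. '#, c-format'."""
    return ("#, " + ", ".join(flags)) if flags else ""


if __name__ == "__main__":
    flags, text = split_flags("I18N [markdown,c-format]")
    print(po_flag_line(flags))  # #, markdown, c-format
```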

Specifying keywords

When looking for messages, a lingua parser uses a default list of keywords
to identify translation calls. You can add extra keywords via the --keyword
option. If you have your own mygettext function which takes the string
to translate as its first parameter, you can use this:

$ pot-create --keyword=mygettext

If your function takes more parameters you will need to tell lingua about them.
This can be done in several ways:

If the translatable text is not the first parameter, you can specify the
parameter number with <keyword>:<parameter number>. For example, if
you use i18n_log(level, msg) the keyword specifier would be i18n_log:2.

If you support plurals, you can specify the parameter used for the plural message
by specifying the parameter number for both the singular and plural text. For
example, if your function signature is show_result(single, plural) the
keyword specifier is show_result:1,2.

If you use message contexts, you can specify the parameter used for the context
by adding a c to the parameter number. For example, the keyword specifier for
pgettext is pgettext:1c,2.

If your function takes the domain as a parameter, you can specify which parameter
is used for the domain by adding a d to the parameter number. For example,
the keyword specifier for dgettext is dgettext:1d,2. This is a
lingua-specific extension.

You can specify the exact number of parameters a function call must have
using the t postfix. For example if a function must have four parameters
to be a valid call, the specifier could be myfunc:1,4t.
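The specifier grammar described above can be captured by a small parser. This is a sketch for illustration, not lingua's parser: KeywordSpec and parse_keyword are hypothetical names, and rejecting more than two plain positions is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeywordSpec:
    """Parsed form of a --keyword value; positions are 1-based."""
    name: str
    msgid: int = 1                 # singular text
    plural: Optional[int] = None   # plural text, if any
    context: Optional[int] = None  # "c" suffix
    domain: Optional[int] = None   # "d" suffix (lingua-specific)
    total: Optional[int] = None    # "t" suffix: required argument count


_PART = re.compile(r"(\d+)([cdt]?)")


def parse_keyword(spec):
    """Parse a specifier such as 'pgettext:1c,2' or 'myfunc:1,4t'."""
    name, sep, args = spec.partition(":")
    result = KeywordSpec(name=name)
    if not args:
        return result
    positions = []
    for part in args.split(","):
        match = _PART.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"invalid argument specifier {part!r} in {spec!r}")
        number, suffix = int(match.group(1)), match.group(2)
        if suffix == "c":
            result.context = number
        elif suffix == "d":
            result.domain = number
        elif suffix == "t":
            result.total = number
        else:
            positions.append(number)
    if len(positions) > 2:
        raise ValueError(f"too many message positions in {spec!r}")
    if positions:
        result.msgid = positions[0]
    if len(positions) == 2:
        result.plural = positions[1]
    return result
```

With a parsed spec, the message text of a call is the argument at index spec.msgid - 1, for example the second argument for i18n_log:2.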

xml: old name for the chameleon extractor. This name should not be used
anymore and is only supported for backwards compatibility.

Babel extractors

There are several packages with plugins for Babel’s message extraction tool. Lingua can use those
plugins as well. The plugin names will be prefixed with babel- to
distinguish them from lingua extractors.

For example, if you have the PyBabel-json package installed you can
instruct lingua to use it for .json files by adding this to your configuration
file:

[extensions]
.json = babel-json

Some Babel plugins require you to specify comment tags. This can be set with
the comment-tags option.

Note: the registered extractor must be a class derived from the Extractor
base class.

After installing mypackage, lingua will automatically detect the new custom
extractor.

Helper Script

There is a helper shell script named i18n.sh in docs/examples for managing
translations of packages. Copy it to the root of the package where you want to
work on translations, edit the configuration parameters inside the script and use:

3.11 - August 6, 2015

3.10 - May 1, 2015

Update i18n.sh example to show statistics when compiling catalogs. This
reveals catalogs with fuzzy messages. This fixes issue 59.

Fix handling of line number parameter in the Python extractor. This fixes
invalid line numbers generated for Python code embedded in other files,
for example in Mako templates. This fixes issue 58 based on a fix from
Laurent Daverio.

Warn when using a function call instead of a string as parameter in a
gettext keyword in Python code. This fixes issue 57.

3.9 - February 19, 2015

Fix line number reporting for XML/zope/Chameleon extractors.
Pull request 53
from Florian Schulze.

3.8 - January 20, 2015

Add options to sort messages by either location or message id when creating a
POT file. Based on pull request 51
from Emanuele Gaifas.

3.7 - December 17, 2014

Include used lingua version in POT metadata.

Add support for message contexts in translationstring instances.

Add support for i18n:comment attributes in ZPT templates.

3.6.1 - November 11, 2014

Restore Python 2.6 compatibility.

3.6 - November 11, 2014

Extend automatic context-comments for ZPT templates to also show the
canonical text for sub-elements. For example this markup:

#. Used in sentence: "This is just ${wonderful}!"
msgid "wonderful"
msgstr ""

This extra context information can be very important for translators.

3.4 - November 3, 2014

Add support for the i18n:context attribute in ZPT templates. This is
supported by Chameleon 2.17 and later to set the translation context.

3.3 - September 14, 2014

Modify the message format-checker to not consider a space after a
percent character as a format flag. Space is a valid flag but is almost never used,
and this was creating a lot of false positives (for example a sentence like
“take a sample of 5% of all candidates”).

Do not try to extract a message from N_() calls: these are explicitly
intended to be used for situations where you pass in a variable instead of
a string.

3.2 - August 26, 2014

Refactor the extractor API a little bit to make it easier for extractors
to call each other. This is particularly useful when an extractor needs to
call the Python extractor to handle local Python expressions.

2.1 - April 8, 2014

Do not break when encountering HTML entities in Python expressions in XML
templates.

Show the correct line number in error messages for syntax errors in Python
expressions occurring in XML templates.

Fix bug in parsing of tal:repeat and tal:define attributes in the
XML parser.

Tweak ReST-usage in changelog so the package documentation renders correctly
on PyPI.

2.0 - April 8, 2014

Lingua is now fully Python 3 compatible.

Add a new pot-create command to extract translatable texts. This is
(almost) a drop-in replacement for GNU gettext’s xgettext command and
replaces the use of Babel’s extraction tools. For backwards compatibility
this tool can use existing Babel extraction plugins.

Define a new extraction plugin API which enables several improvements to
be made:

You can now select which domain to extract from files. This is currently
only supported by the XML and ZCML extractors.

Format string checks are now handled by the extraction plugin instead of
being applied globally. This prevents false positives.

Message contexts are fully supported.

Format string detection has been improved: both C and Python format strings
are now handled correctly.

The XML/HTML extractor has been rewritten to use the HTML parser from Chameleon.
This allows lingua to handle HTML files that are not valid XML.

1.5 - April 1, 2013

1.4 - February 11, 2013

1.3 - January 28, 2012

XLS->Po conversion failed for the first language if no comment or
reference columns were generated. Reported by Rocky Feng.

Properly support Windows in the xls-po convertors: Windows does not
support atomic file renames, so revert to shutils.rename on that
platform. Reported by Rocky Feng.

1.2 - January 13, 2012

Extend XML extractor to check python expressions in templates. This
fixes issue 7. Thanks to
Nuno Teixeira for the patch.

1.1 - November 16, 2011

Set ‘i18n’ attribute as default prefix where there was no prefix found.
This fixes issues 5 and 6. Thanks to Mathieu Le Marec - Pasquet for the patch.

1.0 - September 8, 2011

Update XML extractor to ignore elements which only contain a Chameleon
expression (${....}). These can be used to give the template engine
a hint that it should try to translate the result of an expression. This
fixes issue 2.