About

Pology is a Python framework for custom processing of PO files. It aims to make the creation of scripts for problems encountered "in the field", in everyday translation work, easy, fast, and robust, and to collect all sorts of specific, narrow-purpose tools written in this spirit. It does not aim to be a collection of a few well-rounded, monolithic, general-purpose tools (though it may contain some that could qualify). In particular, it does not attempt to handle any translation format other than PO: all of Pology's end-user tools and programming interfaces are geared towards the PO format and its conventions. The name itself should be parsed as PO-logy, "the study of POs".

At the moment, Pology has not yet reached a release state, but it can already be used effectively for day-to-day work, especially through its end-user scripts. Obtaining and preparing Pology for use is simple: fetch its code repository, add its scripts to PATH, and, if you want to write your own code based on Pology, add it to PYTHONPATH. The following commands should suffice:

(Of course, for continuous use, the environment variables should rather be set in ~/.bashrc, or in the configuration file of whichever shell you are using.) Once these steps are completed, Pology is fully prepared for use and scripting.

Ready-Made Tools

Pology provides a number of end-user tools of varying specificity, in the form of scripts in the scripts/ subdirectory of Pology's source. Details of each script's operation are given in the Pology documentation; the following sections give an overview and some examples of their functionality.

Sieving

((To be done.))

Diffing and Patching

Line-oriented diffing and patching, as performed by the diff(1) and patch(1) commands, is not well suited to PO files. Because PO content consists of variably formatted multiline entries that combine translator-controlled, programmer-controlled, and automatically generated elements, a line-oriented diff may indicate a difference where semantically there is none, or fail to show a real difference in a useful form. One could even claim that line diffing of PO files is almost useless, especially for sending translation patches.
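A small, self-contained Python sketch of the first problem may help. It uses only the standard difflib module and a hypothetical PO entry (the source reference and the Serbian translation are made up for illustration): rewrapping msgstr changes several lines, so a line diff reports a change, yet the translation text is identical once the quoted segments are joined. The msgstr_text helper is a deliberately simplified, hypothetical parser (no escape sequences, no plural forms), not part of Pology.

```python
import difflib

# Two renderings of the same hypothetical PO entry. Only the wrapping of
# msgstr differs; the translation text itself is identical.
old_entry = [
    '#: main.c:42',
    'msgid "Hello, world!"',
    'msgstr ""',
    '"Zdravo, "',
    '"svete!"',
]
new_entry = [
    '#: main.c:42',
    'msgid "Hello, world!"',
    'msgstr "Zdravo, svete!"',
]


def msgstr_text(lines):
    """Concatenate the quoted segments of msgstr (simplified: no escapes, no plurals)."""
    text = ""
    in_msgstr = False
    for line in lines:
        if line.startswith("msgstr "):
            in_msgstr = True
            line = line[len("msgstr "):]
        elif not line.startswith('"'):
            in_msgstr = False
            continue
        if in_msgstr:
            text += line[1:-1]  # strip the surrounding quotes
    return text


# Line-oriented view: several lines are reported as changed.
line_diff = list(difflib.unified_diff(old_entry, new_entry, lineterm=""))
print("\n".join(line_diff))

# Message-oriented view: the translations are the same.
print(msgstr_text(old_entry) == msgstr_text(new_entry))
```

The line diff removes three lines and adds one, although a reader of the translation would see no change at all.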

For this reason, Pology contains two scripts, poediff and poepatch, which create message-oriented, embedded diffs of PO files. These diffs can be used both for reviewing changes and for applying patches to PO files. The concept of embedded diffing and the details of these scripts' operation are described in a separate article.
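To give a rough feel for what "message-oriented" and "embedded" mean, here is a minimal Python sketch, not the algorithm or output format of poediff. It assumes catalogs reduced to hypothetical dictionaries mapping (msgctxt, msgid) to msgstr, pairs messages by that key rather than by line position, and embeds word-level changes directly into the translation text. The {-…-} and {+…+} markers are chosen here purely for illustration.

```python
import difflib


def inline_word_diff(old, new):
    """Embed word-level changes in one string: {-removed-} and {+added+}.

    The marker notation is chosen for illustration only.
    """
    a, b = old.split(), new.split()
    out = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b).get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
        if tag in ("delete", "replace"):
            out.append("{-" + " ".join(a[i1:i2]) + "-}")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)


def message_diff(old_cat, new_cat):
    """Pair messages by (msgctxt, msgid) and report only changed translations.

    Messages present in just one catalog are ignored in this sketch.
    """
    changes = []
    for key in sorted(old_cat.keys() & new_cat.keys()):
        if old_cat[key] != new_cat[key]:
            changes.append((key, inline_word_diff(old_cat[key], new_cat[key])))
    return changes


# Hypothetical catalogs: {(msgctxt, msgid): msgstr}, "" meaning no context.
old = {("", "Hello, world!"): "Zdravo, svete!", ("", "Open file"): "Otvori fajl"}
new = {("", "Hello, world!"): "Zdravo, svete!", ("", "Open file"): "Otvori datoteku"}

for (ctxt, msgid), diff in message_diff(old, new):
    print(f"{msgid}: {diff}")
```

Because messages are matched by identity, reordering or rewrapping entries produces no noise, and the only reported change is the one word that was actually retranslated. Splitting on whitespace also normalizes spacing, which a real tool would have to handle more carefully.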

Reformatting

((To be done.))

Heavy Artillery

((To be done.))

Writing Own Tools

Pology comes with detailed API documentation, but for a quick start on writing custom tools based on Pology, the following sections describe and illustrate some of its more salient elements.

Catalogs and Messages

For the obligatory hello-world demonstration, let us create a PO template named hello.pot containing a single greeting message:

Most of these few lines are self-explanatory, except the last one: modifications to catalogs in Pology are never written to disk automatically; instead, the sync() method must be called to initiate the write. The catalog does not go away after this, and you can continue to use it normally, including syncing it again. In this example, after syncing, the file hello.pot will be created in the current working directory.
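The explicit-sync behavior described above can be illustrated with a toy class in plain Python. This is a hypothetical stand-in written for this explanation, not Pology's Catalog class or its API: changes stay in memory until sync() is called, the object remains usable afterwards, and a sync with nothing modified writes nothing. The output format is heavily simplified (no header entry, no escaping of quotes), and the file is written to a temporary directory to keep the example self-contained.

```python
import os
import tempfile


class ToyCatalog:
    """Hypothetical stand-in illustrating explicit syncing; NOT Pology's Catalog API."""

    def __init__(self, path):
        self.path = path
        self.messages = []   # (msgid, msgstr) pairs, kept in memory only
        self.modified = False

    def add(self, msgid, msgstr=""):
        # Changes only the in-memory state; nothing touches the disk here.
        self.messages.append((msgid, msgstr))
        self.modified = True

    def sync(self):
        """Write the catalog to disk if it was modified; return True if written."""
        if not self.modified:
            return False
        with open(self.path, "w", encoding="utf-8") as f:
            for msgid, msgstr in self.messages:
                # Simplified output: no header entry, no escaping of quotes.
                f.write(f'msgid "{msgid}"\nmsgstr "{msgstr}"\n\n')
        self.modified = False
        return True


workdir = tempfile.mkdtemp()
cat = ToyCatalog(os.path.join(workdir, "hello.pot"))
cat.add("Hello, world!")
print(os.path.exists(cat.path))  # False: nothing written before sync()
print(cat.sync())                # True: file created
cat.add("Goodbye!")              # the catalog remains usable after syncing
print(cat.sync())                # True: written again
print(cat.sync())                # False: no changes since the last sync
```

Deferring writes this way lets a script make many modifications cheaply and decide once, at the end, whether anything needs to reach the disk.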

In practice, however, templates and catalogs are usually created with the Gettext tools, and a much more common use of Pology is to iterate over existing catalogs. The following code will open a catalog with various greetings, look for all messages that contain "hello" in the original text but do not contain "zdravo" in the translation, and report their content to standard output:

# Body of the loop over the catalog's messages
# (msg is the current message, cat is the catalog).
if "hello" in msg.msgid.lower():
    matched = False
    for text in msg.msgstr:
        if "zdravo" in text.lower():
            matched = True
            break
    if not matched:
        report_msg_content(msg, cat)

Note how msgstr is represented as a list regardless of whether the message is plural or not, the only difference being the number of elements. This removes the special case of singular versus plural translations and makes the programmer always account for plural messages (the plural form of the original text is accessed through the msgid_plural instance variable). The function report_msg_content outputs the message to standard output, nicely formatted and preceded by a line stating the originating catalog and the line and entry number of the message within it. report_msg_content can also do much more, such as highlighting parts of the message in the terminal or adding notes and delimiters (its API documentation provides all the details). Since the catalog was not modified, it is perfectly fine, even appropriate, not to call sync() at the end.
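Why a list-valued msgstr removes the singular/plural special case can be shown with a self-contained sketch. The Msg dataclass below is hypothetical: it only mirrors the attribute names mentioned in the text (msgid, msgstr, msgid_plural) and is not Pology's message class. The filter applies the same "hello" without "zdravo" test to singular and plural messages through one code path; the sample translations are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Msg:
    """Hypothetical stand-in using the attribute names from the text; not Pology's Message."""
    msgid: str
    msgstr: List[str] = field(default_factory=list)  # one element per plural form
    msgid_plural: Optional[str] = None


def greetings_without_zdravo(messages):
    """Select messages whose msgid contains 'hello' but no msgstr form contains 'zdravo'."""
    return [
        msg
        for msg in messages
        if "hello" in msg.msgid.lower()
        and not any("zdravo" in text.lower() for text in msg.msgstr)
    ]


msgs = [
    Msg("Hello, world!", ["Zdravo, svete!"]),   # singular: one-element list
    Msg("Hello", ["Ćao"]),                       # greeting translated without "zdravo"
    Msg("Goodbye", ["Doviđenja"]),               # not a greeting at all
    Msg("Hello to %d user", ["Zdravo svima (%d)"] * 3,
        msgid_plural="Hello to %d users"),      # plural: three forms, same code path
]

for msg in greetings_without_zdravo(msgs):
    print(msg.msgid, msg.msgstr)
```

Here any() replaces the matched flag and the break of the loop shown earlier, with the same result; only the "Hello" message is selected.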

Of course, such code is only an illustration of iterating through catalogs and examining messages; in practice it is superfluous, since the find-messages sieve already provides this functionality: