Why Review Translations?

Especially to new translators, it may not be obvious to what extent a translation needs to be reviewed. If the translator has exercised due diligence, how "wrong" can the translation be? Even if the translator has a good command of the source language (English, in the context of this article), the answer is "very wrong", when all aspects are considered. Here are some of them.

Given the comparatively simple grammar of English, the meaning of a short English sentence -- as typically encountered in application user interfaces -- is very dependent on the surrounding context. This context may not be obvious when the translator is going through isolated messages in the translation file, so he may commit the worst of errors from the user's viewpoint, the senseless translation. An experienced reviewer will have developed a sense for troublesome contexts, and will have several means to decisively determine the context (including, for example, running the development version of the application).

Even if the context is correctly established, the translator may use "wrong" terminology, which is the next worst thing for the user. A term used in translation does not need to be wrong by itself; in fact, it may be exactly the correct term -- in another translation project. The reviewer will have more experience with the terminology of the present project, and will be able to bring the translation in line with it.

Style in the technical sense is a consistent choice between several perfectly valid constructs in target language when applied to text in the given technical context. For example, how to translate menu titles and items, button labels, or tooltips. The choices may include noun or verb forms, particular grammar categories, tone of address, and so on. There may be a style guide to the project which details such choices, and the reviewer will know it well.

Style in the linguistic sense is especially applicable to longer texts, such as tooltips in user interfaces, and passages in documentation. A typical error of a new translator is to closely adhere to English style and grammar. This may produce a translation which is semantically and grammatically valid in the target language, but very out of style -- the "translationese". The reviewer is there to naturalize such passages.

Finally, the reviewer may be an experienced translator, but that does not mean that his own translations need no review. Immersion in the source language, distraction, and fatigue will lead the reviewer into any of the above errors in translation, only with less frequency. So reviewers should also review one another's translations.

Classical Reviewing by Stages

The classical review workflow by stages seems simple enough. A translator translates a PO file (or updates an existing translation), and declares it ready for review. A reviewer reviews it, and declares it ready to "commit". Committing here should be understood generally, as inclusion into the pool from which translations are periodically shipped to end users. A committer finally commits the file. The process is iterative: the reviewer may return the file to the translator, and the translator may later declare it ready for review again. There may be several stages of review (such as proof-reading, approving), each of which may return the translation to a previous stage, or forward it to some special stage. The process can also be finer grained, where each message in the file goes through the stages separately.

Regardless of the particularities, workflows of this kind all have the following in common. Members of the translation team are assigned roles -- such as translator, reviewer, approver, committer -- by which they enter into the workflow (a single person can have several roles). The later review stages must wait for the earlier stages to complete, and the translation cannot be updated again before the current version clears the pipeline (or the pipeline is aborted). Most importantly, once the translation is committed, it becomes part of simply "admitted" translations, with no further qualifiers.

The system of prescribed roles requires that team members assign them between themselves, stick to them, and shuffle them along the way. The prescribed review pipeline requires a tool to enforce and keep track of the stages in which translations are. This makes the review workflow complex and rigid, most probably with choke points for efficiency. The distribution of roles may become unbalanced by people coming and going, or the workflow tool may be prohibitive to some scenarios (e.g. a single translator making small adjustments in dozens of files across the project, but having to upload each manually through a web interface).

Of course, "rigid", "complex", "inefficient", are comparative qualifications, so what is it that the classical review by stages can be compared to in this way?

Reviewing by Ascriptions

Reviewing by ascriptions is conceptually even simpler, and yet less rigid, less complex, and much more efficient than review by stages. It works on the message level, rather than the file level. Anyone can simply translate some messages and directly commit the modified files, without any review, while ascribing the modifications to their own name. Anyone can review any committed messages at any moment, commit the modifications made on review, and ascribe the reviews to their own name and (possibly) to a certain class -- full review, review of context, of terminology, of style, etc. Only when the translation is to be shipped to end users are the insufficiently reviewed messages automatically omitted from the package, by evaluating the ascription history of each message.

Most importantly, based on the ascription history, the reviewer can select only some particular messages, and review only the difference between their historical and current versions. For example, Alice can select to review only messages modified since she or Bob had last reviewed them for style; she could see the difference from that last review to current version, e.g. if in the whole paragraph only a single word has changed by Charlie when he reviewed the terminology. In terms of PO workflow, the ascription history propagates through merges, so the reviewer can compare the change in original and the change in translation since the last review, to judge if one fits the other.

Since everyone just commits, translations can be efficiently kept in a version control repository, with the ascription system added on top. After having done some translating, the team member simply substitutes commit command of the version control system (VCS) with ascribe-modifications command of the ascription system (AS, which calls the underlying VCS internally). After reviewing, the team member uses ascribe-reviews command of the AS to commit reviews to ascription history (as well as modifications made during the review). To select messages for review, the team member issues diff-for-review command of the AS (with suitable parameters to narrow the set) and selected messages are marked in-place in PO files and embedded with differences, and possibly popped open in a PO editor.

When the translations are to be released, the team coordinator issues filter-for-release command of the AS, which takes the working PO files and creates final PO files with insufficiently reviewed messages removed. "Release time" is used here only figuratively: this should be a fully automatic process, so it can be performed at any interval of convenience.

What constitutes "sufficient review" can be defined in fine detail. It could be specified that messages modified by Alice need to have only review for terminology, but not necessarily for style; Charlie may belong to the group which needs to be reviewed on style, but not necessarily on context; Bob's reviews for style may be nice to have, but never blocking if missing. These decisions do not preclude released messages from being reviewed later on the missing points, after higher priority reviews have been completed. The definition of sufficiency may be changed at any point, e.g. as team members get more experienced and require less review, without interfering with direct translation and review work.

In summary, with reviewing by ascriptions the lean efficiency of raw VCS operation is preserved while providing for great flexibility of review. All team members can be given commit access, no web or email detours are needed. There are no prescribed roles, but an equivalent of role assignment happens automatically at last possible moment, and can take into account both translators' and reviewers' abilities. There is no staging between completing and committing the translation, which enables translator to keep on polishing the translation undisturbed until the reviewer comes around. There is no inefficiency in handling small changes throughout many files, since single AS command commits all changes just as single VCS command would. AS in effect abstracts VCS, so general team members do not have to know the particularities of the underlying VCS. On commit operations, AS can also apply checks (e.g. decline to commit syntactically invalid PO files) and modifications (e.g. update translator's data in the PO header).

Ascription System in Pology

Pology is a collection of various modular tools for supporting translation based on PO files. Among them is the script poascribe, which implements an ascription system (AS); at present, it can use Subversion or Git as the underlying VCS. poascribe is still in experimental stage, so what follows is a brief description of how to use it in context of the KDE translation project. However, very little is truly specific to KDE; the only major assumption is that there exists a VCS repository with PO files of a given language grouped together, and that the translation team can use it without special restrictions.

Very important for the AS is how branches are handled (in KDE, the rolling trunk and stable branches). AS can in principle be deployed by branch, but then there is the added complexity of porting translations between branches, which ascriptions should follow. Therefore, the AS implemented by poascribe is currently limited to the assumption that there is a single branch of translations at all times. The article "Translating in Summit" explains how a KDE translation team can set up and operate such a single branch, the summit, and this is the prerequisite for the following instructions. (Note that the summit system is useful on its own, and should be conducive to any kind of review workflow.)

Setting Up

The summit branch for the language LANG is positioned like this in the KDE repository:

$KDEREPO/
    trunk/
        l10n-support/
            LANG/
                summit/
                    messages/
                    docmessages/

The team coordinator already has this part of the repository tree locally due to regular summit operations. For the same reason Pology is already set up. Setting up the ascription system is now simple. The file ascription-config is created in the root of the summit:

The ascript-root setting should be exactly ../summit-ascript, for the reason mentioned later.

commit-message field, if defined, allows team members to commit without providing a commit message. The value given by this field will be used by default, with translator's user name appended to the end in special syntax. For example: Translation updates. [>alice]. (Translator's user name is also appended to manually supplied commit messages.) Translators can still supply a commit message when they wish, as shown later. If this field is not set, the commit message is supplied as usual on committing.
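The rule for building the final commit message can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not poascribe's actual code; the function name and its parameters are hypothetical.

```python
def compose_commit_message(user, message=None, default=None):
    """Return the commit message with the ascription user name appended.

    `message` is what the translator supplied manually, `default` stands
    for the value of the commit-message field (both names are hypothetical).
    """
    text = message if message else default
    if not text:
        # Neither a manual nor a default message: it has to be asked for.
        raise ValueError("no commit message given and no default configured")
    # The user name is appended to default and manual messages alike.
    return f"{text} [>{user}]"


print(compose_commit_message("alice", default="Translation updates."))
# prints: Translation updates. [>alice]
```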

Team members are defined by [user-USERNAME] sections. Ascription user names can be any valid ASCII identifier: ASCII letters, digits and underscores only, digit cannot be the first character. Ascription user names have no technical relation to the underlying VCS accounts, though it is mnemonically convenient if they are the same (in case of SVN). This means that a translator who does not have a VCS account (yet) can and should be added here, with assigned user name (best one suitable as SVN account name later); why this should be done will be explained later.
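The naming rule for ascription user names is easy to check mechanically. The following Python sketch encodes it as a regular expression; the helper name is made up for this illustration and is not part of Pology.

```python
import re

# ASCII letters, digits and underscores only; a digit cannot come first.
USER_NAME_RX = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_user_name(name):
    """Check a candidate ascription user name against the rule above."""
    return USER_NAME_RX.fullmatch(name) is not None


for candidate in ("alice", "bob_2", "2fast", "јунак"):
    print(candidate, is_valid_user_name(candidate))
```

Only the first two candidates pass: the third starts with a digit, and the fourth is not ASCII.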

original-name field in user sections is there in case the preferred renderings of the name in English and in target language are not the same. When this is not the case, original-name can be omitted.

As soon as the ascription-config file is committed, the ascription system is ready for operation. Only regular modifications to this file are those of adding new team members. (On the other hand, team members should never be removed, because even after they no longer contribute, their ascription records remain in the system.)

Initial Ascription

The most common situation at start of ascription workflow is that there already exists a body of translations, contributed to by many different people over time. The coordinator should ascribe all existing translations as initial modifications, but to whom? It cannot be said precisely who translated what. The solution is to introduce a generic user in ascription-config, suitably known as "Unknown Hero" (or "Lost Translator", you can be inventive):

[user-uhero]
name = Unknown Hero
original-name = Незнани јунак

and ascribe all existing translations as modified and reviewed by this user. The coordinator does this with the following command:

The argument reviewed is the ascription mode, and the -u option provides the user name to which ascriptions are made. This is an important point: ascriptions are made to a user defined in ascription configuration, and have nothing to do with VCS accounts; someone who has the account can commit in the name of someone who does not. The -C option prevents automatic committing by poascribe, which is useful for this initial step. Finally the paths which contain all summit catalogs are given.

When the poascribe command is issued, a progress bar will appear, and the following output will start to unfold:

The number in parentheses indicates how many messages have been ascribed in the given PO file, and at the end the totals are given. Catalogs are processed twice, first to ascribe modifications and then reviews, because a review cannot be ascribed before the modification has been ascribed. Ascribing the complete summit for the first time will take quite some time (say 15-30 minutes).

After the initial ascription has been made, the ascription tree will appear next to the summit tree. This tree will contain one ascription PO file for each summit PO file, with the same name and relative location within the tree:

The ascription tree can now be committed as usual (poascribe will have already added it):

$ svn commit LANG/summit-ascript/ -m "Initial ascription."

Daily Use for Translators

Team members other than the coordinator, whether translators or reviewers, need to keep around only the trunk/l10n-support/LANG/ directory. But they always need to update this directory fully (rather than just one particular module or file under .../*messages/), so that the summit tree and the ascription tree (and configuration) are kept in sync.

In order not to have to issue their own user name (-u option to poascribe) all the time, translators can set it in Pology user configuration ~/.pologyrc, in [poascribe] section:

poascribe will add ascription records into ascription catalogs corresponding to summit catalogs to be committed, and commit them all. Like svn commit, poascribe mo can take any number of file or directory paths, and can be issued from any working directory (it will always find ascription catalogs). If default commit message has not been set in the ascription configuration, poascribe will ask for it; or it can be given in command line through -m option.

Translators Without Commit Access

With the ascription system in place, every regular team member should have commit access. But, there may be some period of time before new translators are given accounts, revision control may be too technical for some, and even those with the account may not be able to commit temporarily for some reason.

These translators may send in their work by email, to any team member with commit access (not necessarily the coordinator or a reviewer); this team member can commit received files without any review, as review can be conducted at any later time. If Bob sends some files to Alice, she can commit them immediately by stating Bob's user name:

$ poascribe mo -u bob ...

For this to work, the translator who sent in the files has to be defined in the ascription configuration. There are no hidden costs or security issues to this (as opposed to opening a VCS account), so every new translator should be defined there before any work of that person is committed.

Daily Use for Reviewers

The ascription system opens up all sorts of possibilities for concrete review patterns. Reviewers should keep in mind that for each message the full modification and review history is available, so that the team can think about how to make good use of it. Therefore, what follows are some examples to illustrate the review facilities that poascribe provides.

Basic Reviewing

At the very basic level (which is the only level in classical review by stages), messages can be classified into simply unreviewed and reviewed, without further qualifiers. Alice now wants to review all unreviewed messages in a group of PO files, say kdetoys module. She issues (di is short for diff):

Unreviewed messages have now been marked and diffed, inside the listed PO files. What is this about "diffing"? If the files had already been reviewed before, some of the messages modified since then (those marked for review) may have changed very little (e.g. a few words in a paragraph-length message, or even just punctuation). Therefore, for each message marked for review, Alice also wants to see the diff since last review to current version. Here are two messages in typical review states added by poascribe di:

In the first message, the first thing to note is the #. ascto: comment. This comment succinctly lists who did what with the message since the last review; here charlie:m means that Charlie is the one who modified it. Then, there is the ediff flag, which Alice can use to jump through messages marked for review. Finally, the original and translation have been diffed; here they show that, since the last review, the message was fuzzied by changing "You won" to "Tie", and what Charlie did in translation to unfuzzy it. Even on a message as short as this, the diff tells something useful to Alice: the phrase "Game over" likely has a formulaic translation, and the fact that it is not part of the diff means that the earlier reviewer had made sure it is consistent, so Alice does not have to check that.

The #. ascto: comment of the second message reveals that both Charlie and Bob had been translating it. ediff-total flag instead of plain ediff means that this message had no review at all up to now, so there are no embedded diffs in text fields.

Alice can now go through marked files and messages, review translations, and possibly make modifications. When making changes in a message with embedded diffs, she can freely edit text outside of difference segments and within {+...+} segments (as these are the ones which belong to the current version of the text). While reviewing, Alice does not remove any of the added message elements (save for an occasional difference segment, when the translation should be modified), as these elements are needed later. If a message is particularly hard and Alice wants to defer its review for later, she can add the unreviewed (or urev or nrev for short) flag to it.
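To give a feeling for what an embedded diff is, here is a minimal word-level sketch in Python using the standard difflib module. It is not Pology's own diffing code: the function name is hypothetical, and the {-...-} notation for removed text is assumed here as the counterpart of the {+...+} segments mentioned above.

```python
import difflib


def word_ediff(old, new):
    """Embed a word-level diff from the old to the current text."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    pieces = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            pieces.append(" ".join(old_words[i1:i2]))
            continue
        segment = ""
        if i2 > i1:  # words found only in the old version
            segment += "{-" + " ".join(old_words[i1:i2]) + "-}"
        if j2 > j1:  # words found only in the current version
            segment += "{+" + " ".join(new_words[j1:j2]) + "+}"
        pieces.append(segment)
    return " ".join(pieces)


print(word_ediff("Game over. You won.", "Game over. Tie."))
# prints: Game over. {-You won.-}{+Tie.+}
```

The unchanged words stay outside of any difference segment, which is what lets a reviewer skip over them.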

Once the review is complete, Alice commits the reviewed files in reviewed mode (short re):

Three things have happened here. First, all review states (flags, embedded diffs, etc.) have been removed, restoring the PO file to normal. Then, any modifications that Alice has made during review are ascribed to her (here 3 out of 21 messages). Finally, all marked messages are ascribed as reviewed by Alice (any with unreviewed/urev/nrev flags would have been omitted here). When committing, the only summit catalog that got committed is the one with modifications made during review, and all the ascription catalogs were committed because of the reviews recorded in them.

When many files with few changes in each are to be reviewed, it becomes burdensome to manually open each and every file diffed for review, and then to make sure that all are committed with poascribe re. To make this easier, the -w toreview.out option can be added to poascribe di, which requests that paths of all diffed PO files are written into the toreview.out file. This file can then be used to batch open POs for review in the editor, as well as fed back to poascribe re with -f toreview.out. There is also the -E option which causes poascribe to directly open PO files in a PO editor, though this is currently applicable only to Lokalize. Putting it together, to efficiently review a whole bunch of small changes throughout many files, Alice can:

Selecting Messages for Review

Invocations of poascribe di without any options, as in the previous section, were actually equivalent to this:

$ poascribe di -s modar PATHS...

Option -s issues a message selector. modar is the default selector for diff mode, and stands for MODified-After-Review: it selects the earliest historical modification of the message after the last (or no) review of that message, if there is any such. By selecting a historical modification of the message, the diff from it to the current version can be computed and embedded into the PO file, as in previous examples.

There are various specialized selectors, and they fall into two groups: shallow selectors and history selectors. Shallow selectors look only into the current version of the message, and cannot select historical versions, which means that they cannot provide embedded diffs. History selectors (modar is of this type) can select messages from history and provide diffs. Several selectors can be issued on the command line, and the message is selected only if all selectors select it. Shallow selectors are then normally used as a pre-filter for history selectors. For example, to select messages modified after the last review, but only those found in the stable branch, the branch and modar selectors are chained:

$ poascribe di -s branch:stable -s modar PATHS...

It is important that the history selector is given last, because the last selector determines which historical message is selected. If the ordering had been reversed here, same messages would get selected, but they would not have embedded diffs, because branch is a shallow selector.
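Why the ordering matters can be shown with a toy model in Python. Everything in it is hypothetical: the message is a plain dictionary, the history selector reads a precomputed index instead of real ascription records, and the names only mimic those of poascribe selectors.

```python
REJECT = None          # the selector did not select the message
CURRENT = "current"    # a shallow selector can only point at the current version


def branch(name):
    """Toy shallow selector: is the message present in the given branch?"""
    def selector(message):
        return CURRENT if name in message["branches"] else REJECT
    return selector


def modar(message):
    """Toy history selector: the index of the historical version is
    precomputed in the sample data, not derived from ascription records."""
    return message.get("modified_after_review", REJECT)


def chain(selectors, message):
    """All selectors must select the message; the last one decides the version."""
    selected = REJECT
    for selector in selectors:
        selected = selector(message)
        if selected is REJECT:
            return REJECT
    return selected


message = {"branches": {"trunk", "stable"}, "modified_after_review": 3}
print(chain([branch("stable"), modar], message))   # prints: 3
print(chain([modar, branch("stable")], message))   # prints: current
```

In both orderings the message is selected, but only with the history selector last does the result point at a historical version from which a diff could be made.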

Selectors can take parameters themselves, like branch:stable in the previous example. Parameters are separated from the selector name by any non-alphanumeric character; this is colon by convention, but if a parameter contains a colon, something like slash, tilde, etc. can be used. Number of parameters can be variable, and modar in particular can take from none to three. If Alice wants to review only those messages modified by Charlie since last review, she states this by first argument to modar:

$ poascribe di -s modar:charlie PATHS...

If Alice does not give too much credit to other reviewers, she can request selection of messages modified after last review by her with second parameter to modar:

$ poascribe di -s modar::alice PATHS...

Here the first parameter ("modified by..."), which is not needed, must be explicitly skipped, before going to the second parameter ("reviewed by..."). The third optional parameter of modar will be mentioned in the next section.

When a selector parameter is a user name, normally it can also be a comma-separated list of user names (modar:bob,charlie) or prefixed with tilde to negate, i.e. select all other users (modar:~alice).
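The selector syntax described in this section can be summarized by a small parser. The Python sketch below only follows the rules as stated in the text (selector name, then parameters separated by the first non-alphanumeric character, with optional tilde negation and comma-separated user lists); it is not how poascribe itself parses selectors, and both function names are hypothetical.

```python
import re


def parse_selector(spec):
    """Split a selector specification into its name and parameter list."""
    match = re.match(r"[A-Za-z0-9]+", spec)
    if not match:
        raise ValueError(f"selector name missing in {spec!r}")
    name, rest = match.group(0), spec[match.end():]
    if not rest:
        return name, []
    separator = rest[0]  # colon by convention, but any such character will do
    return name, rest[1:].split(separator)


def parse_users(param):
    """Return (negated, user names) for a user-name parameter."""
    negated = param.startswith("~")
    names = param[1:] if negated else param
    return negated, [name for name in names.split(",") if name]


print(parse_selector("modar::alice"))    # prints: ('modar', ['', 'alice'])
print(parse_selector("modar:::lstyle"))  # prints: ('modar', ['', '', 'lstyle'])
print(parse_users("bob,charlie"))        # prints: (False, ['bob', 'charlie'])
print(parse_users("~alice"))             # prints: (True, ['alice'])
```

An empty string in the parameter list corresponds to a parameter that was explicitly skipped.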

Any selector can be negated by prepending n to its name. For example, the history selector modafter:DATE selects first modification after the given date; to select messages modified after last review, but only if modified during June 2010:

Negating a history selector produces a shallow selector: while modafter is history selector, nmodafter is shallow. But the order of the two in the previous command line is not important, as the last selector is the usual modar.

Selectors can be issued in other modes too. If the PO file is big and Alice has reviewed messages up to and including entry 246 when she has to pause until another day, she can commit reviews only up to this entry by issuing the espan selector:

$ poascribe re -s espan::246 PATHS...

(the first parameter to espan is the first entry number, given if messages are not to be selected from the first). There is also the counterpart lspan selector, which works with referent line numbers (those of msgid keywords) instead of entry numbers.
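As a rough Python illustration of an entry span with open ends, consider the following sketch. It assumes that entry numbers are counted from 1 and that an empty bound means "from the first" or "to the last" entry; the function only borrows the name of the selector and is not taken from Pology.

```python
def espan(first="", last=""):
    """Build a check on entry numbers; an empty bound leaves that end open."""
    lower = int(first) if first else 1
    upper = int(last) if last else None

    def selector(entry_number):
        return entry_number >= lower and (upper is None or entry_number <= upper)

    return selector


up_to_246 = espan("", "246")   # corresponds to espan::246
print(up_to_246(246), up_to_246(247))   # prints: True False
```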

Fine-Grained Reviews

In the introduction, several distinct types of what can go wrong in translation were described. Not all reviewers may be able to check translation against all those problems. Here is a typical scenario of this kind:

Alice is very computer-savvy and knows the translation project inside and out, which means that she can review well for context, terminology, and technical style. But, her language style leaves something to be desired, which shows through longer sentences and passages. Dan, on the other hand, is a very literary person, but not that much into the technical aspects. Dan's style reviews would thus be a perfect complement to Alice's general reviews.

poascribe can support this scenario in the following way. A review type tag for language style is defined in the ascription configuration, using the review-tags field:

[global]
...
review-tags = lstyle

(The value to review-tags is a space-separated list of identifiers, when more than one special review type is needed.) With this addition to configuration, Alice can continue to review as she did before, without any changes to her workflow.

Dan selects messages for review quite similarly to Alice, with the exception of giving the lstyle tag as third parameter of modar:

$ poascribe di -s modar:::lstyle PATHS...

When committing reviews, Dan must also state this tag, in order to ascribe reviews as of language style type:

$ poascribe re -t lstyle PATHS...

If Dan is always going to review the language style, in order not to have to issue the selector and tag in the command line all the time, he can make them default per mode in ~/.pologyrc:

With this Dan can use plain poascribe di and poascribe re, just like Alice does.

The important point of review tags is that they make reviews by types independent. For example, Dan may come around to review the language style of the given message after several modifications and general reviews have been ascribed to it -- modar:::lstyle will simply ignore all reviews except for lstyle reviews. This is going to be reflected in the ascto: comment to marked messages:

...
#. ascto: charlie:m alice:r bob:m
...
msgid "..."
msgstr "..."

Here Alice has made one review between Charlie's and Bob's modifications, and that review, being general instead of lstyle, did not cause modar to stop at it. After Dan reviews this message for language style, Alice runs selection for review and gets this:

...
#. ascto: bob:m dan:r(lstyle)
...
msgid "..."
msgstr "..."

Again, since lstyle reviews do not mix with general reviews, Dan's review did not hide Bob's modification that Alice did not check so far.
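The independence of review types can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions, not poascribe's implementation: the ascription history is a plain list of records, the general review carries the empty tag, and the ascto rendering is modelled only on the two comments shown above.

```python
from collections import namedtuple

# One ascription record: who, what ("m" = modified, "r" = reviewed), review tag.
Ascription = namedtuple("Ascription", "user kind tag")


def modar(history, tag=""):
    """Return the records from the earliest modification after the last
    review carrying `tag`, or [] if nothing was modified after that review."""
    start = 0
    for i, asc in enumerate(history):
        if asc.kind == "r" and asc.tag == tag:
            start = i + 1  # reviews with other tags are ignored
    for i in range(start, len(history)):
        if history[i].kind == "m":
            return history[i:]
    return []


def ascto(records):
    """Render records in the style of the ascto: comment."""
    return " ".join(
        f"{asc.user}:{asc.kind}" + (f"({asc.tag})" if asc.tag else "")
        for asc in records
    )


history = [
    Ascription("charlie", "m", ""),
    Ascription("alice", "r", ""),
    Ascription("bob", "m", ""),
]
print(ascto(modar(history, "lstyle")))   # prints: charlie:m alice:r bob:m

history.append(Ascription("dan", "r", "lstyle"))
print(ascto(modar(history, "")))         # prints: bob:m dan:r(lstyle)
```

After Dan's review, selecting by lstyle on the same history yields nothing, while the general selection still surfaces Bob's modification.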

(General review too has a tag assigned, the empty string, in case the reviewer needs to explicitly issue it in some context.)

Daily Use for The Coordinator

After setting up the ascription system, the team coordinator should have to do very little to maintain it.

Ascribing Merges

Modifications made to summit catalogs by merging with templates must also be ascribed. This ascription is made in the name of the reserved fuzzy user, which exists by default and is not defined in ascription-config. Therefore, after merging the summit (posummit ... merge ...) the coordinator substitutes the VCS command:

$ svn commit LANG/summit/ -m "Merged summit."

with the poascribe command in modified mode:

$ poascribe mo -u fuzzy LANG/summit/ -m "Merged summit."

Since -C option is not issued, poascribe will automatically commit all modified summit and ascription catalogs when done.

Shuffling Ascription Catalogs

Sometimes summit catalogs are shuffled in the repository: moved to another module, renamed, one catalog split into two, two catalogs merged into one. Such shuffling should be exactly mirrored in the ascription tree, and this too is done on the repository side, at the same time. This relies on the ascription root being set exactly to ../summit-ascript in the ascription configuration. So the team coordinator has nothing special to do here.

If the translation team is working in an external repository instead of in the central KDE repository, the ascription system must consequently be set up in that repository. But so long as the process_orphans.sh script from trunk/l10n-support/scripts/ is used to shuffle catalogs in the external repository as well, the ascription catalogs will be properly handled.

Filtering for Release

The last component of the ascription system is how to prevent insufficiently reviewed messages from leaking into a release. In the context of Pology and the summit workflow, poascribe itself is not used directly to this end. Instead, in the summit configuration (as opposed to the ascription configuration), the team coordinator defines filters which pass messages by applying selectors.

Each top level PO tree has its own summit configuration file, named MSGTREE.extras.summit:

For the simple case of all reviews being general reviews, the filter is added to summit configuration like this (anywhere within *.extras.summit file):

S.ascription_filters = [
    ("regular", ["nmodar"]),
]

Here the filter is named regular, and is defined as application of nmodar selector, the negation of modar. This simply means: pass all messages not modified after the last review.

When the team coordinator scatters to branches (executes posummit scatter), messages from summit POs which do not pass this filter will not be sent to branch POs. The count of stopped messages by branch PO will be reported in the output as scattering proceeds.

Why did we have to name the filter regular? (Those knowing some Python will also notice that it is defined as a list element.) Because it is possible to define more than one filter, and select which one is used on each scattering. For example, the coordinator may wish that, when the release is near and time is short to review everything, messages from a few experienced translators can be passed into release without review. If those translators are Alice and Bob, an "emergency" filter can be defined like this:

S.ascription_filters = [
    ("regular", ["nmodar"]),
    ("emergency", ["nmodar:~alice,bob"]),
]

By default, posummit uses the first filter in the list. When the coordinator needs to do emergency scattering, he requests the emergency filter by the -a option:

What if several selectors are needed to pass the message? For example, the language style review (the earlier example with Alice and Dan) too may be requested for regular scattering, but omitted from emergency scattering. The filter setup for this scenario looks like this:

The regular filter now reads: pass the message if it has not been modified after the last (general) review and has not been modified after the last style review.

Simple combination of predefined selectors by AND-conditions may not be sufficient for more involved scenarios. When this is the case, the coordinator may write (or ask someone to write) a custom selector in Python, and plug it in as the second element in the filter tuple (instead of the list of predefined selectors).