What is the problem?

Starting in September 2008, Marta Stojanovic and Alain Désilets of the National Research Council of Canada will begin a 12-month R&D effort around the Cross Lingual Wiki Engine Project.

As many of you know, choosing a good research question is a very difficult task, so please help us by reading the possible ideas below, providing comments, and rating them. A good research question is one for which:

The answer is not known already, and cannot be found easily.

The answer matters and has large practical consequences for a particular community.

Thanks for your help. We are aiming to choose one of them by mid-September.

Note: We're doing this partly as an experiment in the spirit of « The wisdom of crowds ».

Q1: What is the current state of the art in collaborative translation?

Why is this question important?

This is important so we know what has been done already, so we can figure out what the important unresolved problems are, and can focus on solving those instead of re-inventing the wheel.

What makes this a research question?

This is not hardcore quantitative research, but it falls in the category of qualitative research. It will involve gathering information, writing and analysing surveys, and synthesizing the information into a big picture.

Proposal assessment (please prefix your scores with your initials)

Please help us by providing your own assessment of this research question, on three levels.

Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance

Workload: How many person-months do you think it will take to answer that question?

Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.

Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).

Your assessment

Q2: How to best integrate a collaborative translation platform with existing Computer-Assisted Translation tools?

Description

Professional translators have all sorts of Computer-Assisted Translation (CAT) tools at their disposal (ex: terminology databases, translation memories), which amateur translators working in a collaborative fashion often do not have.

Should some of these tools be integrated into collaborative translation platforms, and if so, which ones, and how?

In this project, we would integrate open source CAT tools into TikiWiki, have them used in an actual environment involving amateur translators, and gather feedback about their usefulness, limitations, and suggested improvements.
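To make one of the simplest such CAT features concrete, here is a minimal sketch of a fuzzy translation-memory lookup using Python's standard difflib. The memory entries and the 0.7 similarity threshold are made up for the example, and are not taken from any actual TikiWiki integration.

```python
import difflib

# Toy translation memory: source segment -> stored translation.
# The entries and the 0.7 similarity threshold are illustrative only.
TM = {
    "Save your changes before leaving the page.":
        "Enregistrez vos modifications avant de quitter la page.",
    "This page is out of date.":
        "Cette page n'est pas à jour.",
}

def tm_lookup(segment, threshold=0.7):
    """Return (best matching source, its translation, score), or None."""
    best = None
    for source, target in TM.items():
        score = difflib.SequenceMatcher(None, segment, source).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best

match = tm_lookup("This page is out of date!")  # a near-exact fuzzy match
```

Real CAT tools use much more sophisticated matching, but even this level of reuse is what volunteer translators currently go without.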

Another related issue is that some organisations, like UN and EU agencies, want to use collaborative translation to outsource translation to communities. But they already have expensive CAT infrastructure, which includes workflow management systems and large terminology databases and translation memories. How can these proprietary tools be integrated into an open source collaborative platform?

Why is this question important?

This is important because CAT tools have great potential for increasing the productivity of volunteer translators in a collaborative environment.

What makes this a research question?

CAT tools are pretty mature, and we know how to build them for professional translators. We also know that they have a good impact on productivity.

But it's not clear to what extent tools need to be different to help amateur translators, and the extent to which it will actually improve their productivity.

Moreover, the more open and unpredictable technical environment in which amateur translators work poses a number of design questions that will be interesting research from a Human Computer Interaction perspective.

Proposal assessment (please prefix your scores with your initials)

Please help us by providing your own assessment of this research question, on three levels.

Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance

Workload: How many person-months do you think it will take to answer that question?

Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.

Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).

Your assessment

Q3: How can Machine Translation help collaborative translation communities?

Description

Collaborative translation communities often do not have sufficient human resources to cover all language pairs, and to provide translation of all content in a timely fashion.

Machine Translation might help in several ways:

Automatically provide a "gist quality" translation of new content. This would only be a temporary measure until a human translator finds the time to fix it.

Allow volunteer translators to translate content from a source language that they can't read. For example, the MT system would provide a "bad" English translation of a page written in Japanese, and the user could fix that bad English without having to actually read the original Japanese.

Why is this question important?

This is important because communities don't want to spend most of their human resources and energy on translation as opposed to creation of original content. MT has the potential of providing "good enough" translation at a fraction of the human cost of fully manual translation.

What makes this a research question?

MT is still a bleeding-edge technology, so any application that uses it definitely involves research.

While there have been studies of the use of MT outputs for the purpose of gisting, and as first drafts to be post-edited by human translators, those have focused on translation of whole documents.

In the context of a collaborative community, we are more likely to want to apply MT to updates to pages. There are some interesting new issues with that context.

For example, consider a French page that is perfectly translated by a human. Someone adds two sentences to the English page. Wouldn't it be nice to be able to insert an MT translation of just those two sentences into the French page, maybe highlighting them in yellow with a warning saying that they were MT-translated? Could it be that two potentially poorly MT-translated sentences are more easily understandable when presented in the context of a perfectly translated document? Also, how do we go about reliably inserting those two sentences at the right place in the French page (ex: using alignment technology)?
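The first half of this scenario (finding the newly added sentences so that only they get machine-translated) can be sketched with a plain sentence-level diff. In the sketch below, `machine_translate` is a placeholder that merely tags its input, not a real MT API:

```python
import difflib

def added_sentences(old_sentences, new_sentences):
    """Return the sentences present in the new version but not the old."""
    matcher = difflib.SequenceMatcher(None, old_sentences, new_sentences)
    added = []
    for op, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if op in ("insert", "replace"):
            added.extend(new_sentences[j1:j2])
    return added

def machine_translate(sentence):
    # Placeholder for a real MT call; just tags its input.
    return "[MT] " + sentence

old = ["The wiki supports many languages.", "Edits are tracked."]
new = ["The wiki supports many languages.",
       "Each page keeps a history.",
       "Edits are tracked."]

# Only the newly added sentence gets (placeholder) machine translation.
translated_additions = [machine_translate(s) for s in added_sentences(old, new)]
```

The hard part left out of this sketch is the second half of the scenario: deciding where in the French page those translated sentences belong, which is where alignment technology would come in.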

Also, suppose I have an English page that is initially translated entirely by MT into French. Then, I manually correct the bad MT translation to make it perfect. In particular, I modify the structure of sentence 2 to make it sound more like a French sentence (the MT translation used an English-like sentence structure). Then, someone changes English sentence number 2. What should the MT system do? Should it replace French sentence 2 with an MT translation of the newly modified English sentence 2? If so, chances are that I will have to redo the structure modification in French sentence 2. Is there a way that the MT system could learn from my correction to the original French sentence 2, and use the same sentence structure to retranslate the updated English sentence 2?

There may also be some "softer" Human Computer Interaction types of issues. For example, how best to entice readers of a bad MT translation (either of a whole page, or just part of one) to become active participants in the community by fixing the translation?

Proposal assessment (please prefix your scores with your initials)

Please help us by providing your own assessment of this research question, on three levels.

Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance

Workload: How many person-months do you think it will take to answer that question?

Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.

Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).

Q4: How useful is the current implementation of CLWE?

Description

How useful is this to end users as it is now? What are the remaining problems to be addressed?

Why is this question important?

CLWE is still at beta stage, and it is crucial to evaluate it in real-use situations, in order to improve it.

What makes this a research question?

This is not a hardcore, quantitative style of research, but it falls within the realm of more qualitative Human Computer Interaction research.

Proposal assessment (please prefix your scores with your initials)

Please help us by providing your own assessment of this research question, on three levels.

Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance

Workload: How many person-months do you think it will take to answer that question?

Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.

Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).

Your assessment

Q5: How to better isolate textual elements in a page that need translation?

Description

The CLWE system does a pretty good job of knowing when, say, the French page is missing some edits that have been made in the English and Spanish pages.

But it does not do a great job of identifying the actual textual elements in the English and Spanish pages that need to be reproduced in French.

The actual issues are complex and a bit hard to explain, but they are described in the paper entitled "The Cross-Lingual Wiki Engine: Enabling Collaboration Across Language Barriers" (soon to be available on the web... Google for the title). See the Limitations section of that paper for a description of the problem, and the Future research section for a description of potential solutions.

Why is this question important?

The current implementation of displaying what needs to be translated is based on diff technology, which can cause a lot of confusion when new page changes are interleaved with translations from another language. For example, when translating a change from English to French, if there are interleaved modifications to the English page, the system might indicate that certain portions of the English page need translation into French when, in fact, those passages were originally written in French and then translated into English. This can cause users to completely lose faith in the system.

What makes this a research question?

While diff technology is pretty straightforward, patching technology isn't, and often requires that the human be kept in the loop. The main challenge of this project is to find a way to:

Take a diff between, say, versions v5 and v6 of the English page.

Show those diffs in the context of the current version of the English page, say v9.

As far as we know, this is not a trivial problem. More advanced isolation of textual elements in a page that need translation significantly complicates the range of possible translation workflows.
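A naive, line-based sketch illustrates why this is hard: the v5 to v6 changes can be computed with a standard diff, but locating them in v9 fails as soon as the changed text has been edited again. All version contents below are invented for the illustration.

```python
import difflib

def changed_lines(v_old, v_new):
    """Lines added or replaced going from v_old to v_new."""
    matcher = difflib.SequenceMatcher(None, v_old, v_new)
    out = []
    for op, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if op in ("insert", "replace"):
            out.extend(v_new[j1:j2])
    return out

def locate_in_current(lines, current):
    """Naively map each changed line to its position in the current
    version; None means the line was edited again since, and can no
    longer be found verbatim (exactly the hard case)."""
    return [(line, current.index(line) if line in current else None)
            for line in lines]

v5 = ["Intro.", "Old fact."]
v6 = ["Intro.", "New fact."]
v9 = ["Intro.", "New fact, reworded.", "Extra section."]

mapping = locate_in_current(changed_lines(v5, v6), v9)
# The v5->v6 change cannot be located in v9, because it was reworded.
```

A real solution would need fuzzy anchoring rather than verbatim matching, which is precisely where patching stops being straightforward.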

Proposal assessment (please prefix your scores with your initials)

Please help us by providing your own assessment of this research question, on three levels.

Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance

Workload: How many person-months do you think it will take to answer that question?

Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.

Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).

Your assessment

Q6: What is the value of supporting cross-lingual searching, and how best to implement it?

Description

In a site that is collaboratively translated, some of the information may be available only in particular languages and not in others.

When searching for information, users probably want to find it no matter which language it is in. But obviously, they don't want to write the same query in different languages.

There are experimental technologies for doing cross-lingual search. For example, writing a query in English and having the system search for it in all languages (usually by automatically translating the query into the different languages). Combined with a Machine Translation system for translating the hits found in different languages, this might be good enough for people to find and understand information in pages written in languages that they can't read.

Does such a feature have value for collaborative translation communities? If so, where does it lie? How can we best implement such features?
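The query-translation approach described above can be sketched minimally as follows; the tiny bilingual dictionary stands in for a real query-translation component, and the toy in-memory index stands in for a real search engine, all invented for the example.

```python
# Tiny stand-ins for a query-translation component and per-language
# search indexes; all contents are invented for the example.
QUERY_TRANSLATIONS = {
    ("backup", "fr"): "sauvegarde",
    ("backup", "es"): "respaldo",
}

INDEX = {
    "en": {"Backup guide": "how to make a backup"},
    "fr": {"Guide de sauvegarde": "comment faire une sauvegarde"},
}

def cross_lingual_search(query, languages=("en", "fr", "es")):
    """Translate the query into each language, then search that
    language's index with the translated query."""
    hits = []
    for lang in languages:
        translated = QUERY_TRANSLATIONS.get((query, lang), query)
        for title, text in INDEX.get(lang, {}).items():
            if translated.lower() in text.lower():
                hits.append((lang, title))
    return hits

results = cross_lingual_search("backup")
```

Even in this toy form, the design question is visible: a single English query finds the French page, but the user then still needs MT (or a translator) to read it.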

Why is this question important?

This is another way to deal with the fact that in collaboratively translated sites, it is not always possible to translate all relevant information into all languages in a timely fashion.

What makes this a research question?

Cross Lingual Search technology is still bleeding edge, so it's not clear whether it will work well enough to provide value to end users. We plan to find out by building it and trying it out with real end users.

Proposal assessment (please prefix your scores with your initials)

Please help us by providing your own assessment of this research question, on three levels.

Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance

Workload: How many person-months do you think it will take to answer that question?

Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.

Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).

Q7: Use bilingual alignment technology to sanity-check translation completeness

Description

The CLWE system currently relies heavily on the user to tell it when a particular translation task is complete (Complete Translation versus Partial Translation buttons for saving). If the user mistakenly pushes the wrong button, this may result in changes not being propagated to other languages, or in substantial confusion for subsequent translators of the same page.

One way to alleviate this problem would be to use automatic bilingual sentence alignment technologies to perform a basic sanity check on the alignment of the saved target page with the source page. The system could then notify the user when the alignment does not seem to correspond to his choice of Complete Translation versus Partial Translation button.
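Short of full sentence alignment, even a crude length-ratio check could catch the most blatant mismatches. The sketch below is an illustration only; the 1.5 ratio bound is arbitrary, not a value any alignment literature prescribes.

```python
def looks_complete(source_sentences, target_sentences, max_ratio=1.5):
    """Crude sanity check: a complete translation should have roughly
    as many sentences as the source (the 1.5 bound is arbitrary)."""
    if not target_sentences:
        return False
    ratio = len(source_sentences) / len(target_sentences)
    return 1 / max_ratio <= ratio <= max_ratio

src = ["One.", "Two.", "Three.", "Four."]
tgt = ["Un."]  # the user clicked "Complete Translation" far too early

warn = not looks_complete(src, tgt)  # True: this save looks partial
```

A real alignment-based check would compare sentence content, not just counts, but even this level of checking would catch the "wrong button" case described above.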

Why is this question important?

Users are currently confused about which button to press, and often press the wrong button at the wrong time. As pointed out above, this can have dire consequences.

Even if you know what to push when, it's easy to get into a groove where you always click on the Partial Translation button to do partial saves, and then forget to get out of that groove for the final save. Finally, it's easy to forget to translate, say, a sentence, or to accidentally delete one, so having the Complete Translation button do a sanity check would be useful.

What makes this a research question?

Although bilingual alignment is a fairly mature technology, it's still not 100% robust. Also, it's usually not employed to do sanity checks on translations; it's used more in the context of producing parallel sentences to train MT systems, or to populate translation memories.

So figuring out how to make it work well for that particular context will involve some amount of applied research.

Proposal assessment (please prefix your scores with your initials)

Please help us by providing your own assessment of this research question, on three levels.

Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance

Workload: How many person-months do you think it will take to answer that question?

Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.

Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).

Your assessment

Q8: Design the translation editing interface to prevent original contributions within a translation transaction

Description

The CLWE system currently requires that users not mix translation and original contributions within the same transaction. In our limited experience using the system, this can be hard to do, especially when one notices an important mistake in the source text, while in the midst of translating it. Unfortunately, if a user makes an original edit while in the midst of a translation dialog, that original edit may never be propagated to other languages.

There does not seem to be an easy way to allow users to mix original edits and translations in the same transaction. However, we can constrain the translation user interface in such a way as to prevent the temptation. For example, instead of displaying the full text of the source page in an edit box, we could display most of it in a read-only text box, and only display those parts that need to be translated in editable text boxes.

This constrained user interface may also help track translations at a sentence-by-sentence level, which in turn may help perform sanity checks on translation alignments (as per the previous section). Or, conversely, it could be that automatic bilingual alignment technology is needed in order to identify which sentences the user should be able to edit in the target text (that is, which sentences in the target text correspond to changed sentences in the source text).
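The constrained interface described above can be sketched as a simple form model in which only the segments flagged as needing translation are rendered as editable text boxes; the segment texts and flags below are invented for the example.

```python
def build_edit_form(source_segments, translated_flags):
    """Build a form model in which only segments still needing
    translation are presented in editable text boxes."""
    return [{"text": segment, "editable": not done}
            for segment, done in zip(source_segments, translated_flags)]

segments = ["Welcome to the wiki.",
            "New paragraph about backups.",
            "See also the FAQ."]
flags = [True, False, True]  # only the middle segment lacks a translation

form = build_edit_form(segments, flags)
editable = [field["text"] for field in form if field["editable"]]
```

The open question, as noted above, is how the flags themselves get computed: by tracking edits sentence-by-sentence, or by running an aligner after the fact.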

Why is this question important?

As pointed out above, mixing original content with translated content in the same translation transaction is fairly common, and it results in original content not being translated into other languages. That's bad.

What makes this a research question?

It may require the use of bilingual alignment technology, which, while relatively mature, is still not 100% accurate and has never been used for this particular purpose.

Proposal assessment (please prefix your scores with your initials)

Please help us by providing your own assessment of this research question, on three levels.

Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance

Workload: How many person-months do you think it will take to answer that question?

Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.

Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).

Your assessment

Q: Experiment with alternative up-to-dateness measures

Description

The current CLWE system uses a measure of up-to-dateness that is acceptable and provides useful information as-is. However, it is imprecise and could certainly be improved. Potential solutions include:

Changing the unit used for counting changes, employing words or characters instead of sentences.

Changing the insertion/deletion weights.

Dynamically adapting the change counting unit, as well as the insertion/deletion weights, based on the length of the page.

Performing deeper content analysis to determine whether an edit actually modified the meaning of a sentence.

Presenting the measure graphically instead of numerically (ex: an up-to-dateness gauge), to better convey the imprecise nature of the value to the end user.
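To make the first three options concrete, here is a sketch of a parameterized up-to-dateness score. The counting unit and the insertion/deletion weights are arbitrary illustrations, not the values CLWE actually uses.

```python
def up_to_dateness(total_units, inserted, deleted,
                   w_insert=1.0, w_delete=0.5):
    """Return a 0..1 score, where 1.0 means fully up to date.
    Units may be sentences, words or characters; the default
    weights are arbitrary illustrations."""
    if total_units == 0:
        return 1.0
    penalty = w_insert * inserted + w_delete * deleted
    return max(0.0, 1.0 - penalty / total_units)

# A 100-word page with 10 untranslated inserted words and 4 deletions:
score = up_to_dateness(100, inserted=10, deleted=4)  # 0.88
```

Experimenting with the counting unit then amounts to changing what `total_units`, `inserted` and `deleted` count, while the weights control how much deletions matter relative to insertions.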

Why is this question important?

Providing readers and volunteer translators with a measure of up-to-dateness is important to help them:

Figure out which version to read to get the most up-to-date information.

Figure out which language needs the most translation work.

What makes this a research question?

It's not clear how best to measure up-to-dateness. It will require lots of trial and error to come up with something that makes the most sense.

Proposal assessment (please prefix your scores with your initials)

Please help us by providing your own assessment of this research question, on three levels.

Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance

Workload: How many person-months do you think it will take to answer that question?

Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.

Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).

Your assessment

AD: I think this is low priority. The current measure is not perfect, but it seems to do the trick.

Q: Incorporate translation management tools

Description

Although the current features of CLWE allow users to find out what translation work needs to be done for any given page, users have no way of easily assessing which pages, among all those on a given site, are in most need of translation work. To deal with this issue, we could implement simple reporting and visualization tools to help users answer questions such as:

What urgent translation requests need to be fulfilled in my native language?

What highly-visited pages in my native language are currently severely out of date?

What is the average state of up-to-dateness for pages in my native language?
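The second question above could be answered by a report as simple as the following sketch, which ranks pages by visits weighted by staleness. The page data and the scoring formula are both invented for the example.

```python
# Toy page metadata: (title, monthly visits, up-to-dateness from 0 to 1).
# Both the pages and the scoring formula are invented for the example.
PAGES = [
    ("Installation", 5000, 0.95),
    ("Backup guide", 3000, 0.40),
    ("FAQ", 8000, 0.70),
]

def translation_priorities(pages):
    """Rank pages by visits weighted by how out of date they are."""
    return sorted(pages,
                  key=lambda page: page[1] * (1.0 - page[2]),
                  reverse=True)

report = translation_priorities(PAGES)
# Highest-priority pages for translators come first.
```

The interesting research question is less the ranking formula itself than which such views translators and coordinators actually find useful.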

Why is this question important?

Such tools could increase participation by translators by making it easy for them to find important and relevant translation work. They may also allow people to act as volunteer coordinators and motivate translators by telling them where their contributions are most needed.

What makes this a research question?

This may require a bit of trial and error, and some usability research, to figure out the optimal mix of features.

Proposal assessment (please prefix your scores with your initials)

Please help us by providing your own assessment of this research question, on three levels.

Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance

Workload: How many person-months do you think it will take to answer that question?

Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.

Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).

Your assessment

AD: While these seem like important features, I don't think they really qualify as research.

Q: ???

Description

Why is this question important?

What makes this a research question?

Proposal assessment (please prefix your scores with your initials)

Please help us by providing your own assessment of this research question, on three levels.

Importance: To what degree do you feel that the answer to this question has important concrete consequences for the community of people doing collaborative translation?
1 = Not important, 5 = Critical importance

Workload: How many person-months do you think it will take to answer that question?

Research level: To what degree do you feel that this qualifies as research?
1 = This is not research at all, 5 = This is definitely research.

Make sure you make up your own mind before looking at other people's assessments (you can view them by clicking on the minus sign below).