
Wednesday, February 29, 2012

This is a summary of what I think are some interesting recent articles on the web on subjects relating to MT.

The Big Wave, an Italian initiative that focuses on the changes happening in language technology, released details and proceedings papers from its conference held in Rome in the summer of 2011. There are many interesting papers on MT, controlled language and collaborative translation. These papers provide a balance of practitioner, academic and user perspectives on these subjects and are worth a close examination. Some highlights include:

Linguistic resources and MT trends for the Italian language by Isabella Chiari discusses the implications of various kinds of data and their value for building data-driven MT systems, and provides some specifics for EN <> IT MT systems. The paper is a great overview of the kinds of data that can be used, and also provides insight on what data to use and where to use it, with summary implications. It also makes a great case for the inevitability of corpus-driven approaches in MT (without meaning to) by providing the theoretical rationale for them and pointing to the rising momentum of the data-driven approach.

“In this context, it seems logical to think that if prices, quality and times are already established for TMs according to different level of fuzzy matches then we just need to compare MT segments with TM segments, rather than comparing MT to human translation.”

This study also helps to establish that in reality MT is just a new kind of TM fuzzy match. Even though the test only involved a small number of translators and a small amount of work, it was done with care to ensure the translators saw a mixture of MT, TM and new segments in a way that was “blind”, and it then carefully measured the productivity of the translators in processing these different segments.
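To make the idea of treating MT output like a TM fuzzy match concrete, here is a minimal sketch of how a segment could be scored and placed in a match band. It is not the method used in the study: commercial TM tools use their own matching algorithms, so the word-level similarity from Python's `difflib` stands in for them here, and the band boundaries and sample sentences are hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_match_percent(suggestion: str, final: str) -> float:
    """Word-level similarity (0-100) between a suggested segment
    (from TM or MT) and the translator's final version.
    Assumption: difflib's ratio stands in for a real TM tool's score."""
    suggestion_words = suggestion.lower().split()
    final_words = final.lower().split()
    return 100.0 * SequenceMatcher(None, suggestion_words, final_words).ratio()


def match_band(score: float) -> str:
    """Map a score to a pricing-style band.
    The boundaries below are hypothetical, chosen only for illustration."""
    if score >= 100:
        return "100% match"
    if score >= 95:
        return "95-99% match"
    if score >= 85:
        return "85-94% match"
    if score >= 75:
        return "75-84% match"
    return "new segment"


# Hypothetical example: raw MT output versus the post-edited final segment.
mt_output = "the printer is out of paper"
post_edited = "the printer is out of ink"

score = fuzzy_match_percent(mt_output, post_edited)
print(f"{score:.1f}% -> {match_band(score)}")
```

Scoring MT segments on the same scale as TM segments is what would let existing fuzzy-match price and effort tables be reused for MT post-editing.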

The results show that, on average, MT segments yielded higher productivity than either TM or new segments. (We are certain these results would have been more pronounced with an Asia Online customized system.) Interestingly, this study also shows that weaker translators seem to benefit more from MT and TM than the “best” translators. There are also some interesting observations in the error analysis, which showed that TM segments produced the greatest number of final errors.

I would hypothesize that a test with more translators in the pool, and a bigger set of test data would be useful to do, as the results would establish the benefits of the use of customized MT much more clearly. It may even be useful to include “bad” or free MT to show how differently translators react to a segment that looks like it is an 85% match and to one that looks obviously like raw free MT or instant customization (50% TM match) that some use today.

The growing importance of open collaboration in B2C relationships

The Rise of Asia and BRICI which requires huge amounts of new content in new languages

Together, these forces amount to a shift towards more dynamic content and increase the need to handle streaming flows of information, which simply cannot be done without more automation and MT.

MT: the new 'lingua franca' is a fascinating perspective by Nicholas Ostler, a historian of world languages, on how MT is enabling linguistic diversity on the Internet.

“Between 2000 and 2009, Arabic on the internet grew twentyfold, Chinese x20, Portuguese x9, Spanish x7 and French x6, while content in English ‘only’ tripled. Proportionally, then, English is declining in importance relatively quickly.” “The main story of growth on the Internet … is of linguistic diversity, not concentration.”

Ostler sees a key role for MT in this new environment. Just as the print revolution changed the ‘ground rules of communication’ in 16th century Europe, he expects that language and translation technology will revolutionize global communications tomorrow, removing the need for a ‘single lingua franca for all who wish to participate directly in the main international conversation.’

Translation errors or nuances in both humans and computers can naturally have an important impact. But there is no point in dismissing MT by judging it by some presumed norm of ‘perfect’ human translation. MT is a revolutionary tool that can help the world communicate better. TAUS will be welcoming Nicholas Ostler as a speaker at the upcoming TAUS European Summit on May 31 – June 1 in Paris.

This article provides some interesting feedback for those who insist that MT only has value when it approaches human quality and that, since MT rarely reaches human quality, it has very limited value. In this study, English news was translated into FIGS by MT, but users were always given access to the English source. The study measures the usefulness of the MT in the context of assessed translation quality, as shown below, and, interestingly, MT is considered useful even when the quality falls short of excellence. Since this study was performed some time ago, we would assume that the usefulness curve continues to shift upwards, driven by improving MT quality, whatever some translators may think about the quality.

The graph shows that although the machine translation quality was evaluated as being far from perfect, the translation’s usefulness was regarded as higher than its quality. However, this applies only when translation quality is above a certain threshold. Poor-quality machine translations are naturally deemed useless.

This result confirms what many MT proponents have themselves experienced. Pure MT can be rough – often obscure, frequently humorous – but it can be useful. If one really has little facility in the source language, pure MT translations, however clumsy, can be a boon to understanding and, by extension, to productivity.

The graph below illustrates the breakdown of responses to the question, “How would you rate the overall quality of the newsletter translation?” by language group. Note that Germans felt the quality was more lacking, possibly because the MT was poorer in quality or possibly because they had higher expectations. It is actually well known in the MT community that German <> English is more difficult than English <> Romance languages.

When we segment answers to the question, “How would you rate the usefulness of the newsletter translation?” by the respondents’ English ability, we see an even stronger vote in favor of MT by the two lower groups. Thus, users who rated their own English ability as poorer found the MT much more useful. In fact, even many who reported having “Good” English ability found the MT very useful or essential.

There have also been some interesting discussions on LinkedIn that cover the dialogue and tension between translators and MT advocates and also expose some of the hyperbole that some MT enthusiasts are prone to. While the discussion does meander between translator emotions about plans to “eliminate” them and less than scrupulous business practices by some MT vendors, it is an interesting thread. In their rush to get on the technology bandwagon, some LSPs may overlook the privacy and data security terms that they inadvertently agree to when they use instant Moses and DIY kits. So caveat emptor.

Interview with Translator David Bellos: To say that author and award-winning translator David Bellos knows a thing or two about translation would be an understatement. With over 40 years of experience, he has achieved international recognition for his work as a translator and biographer and has an impressive list of acclaimed publications to his name.

Some interesting excerpts from the interview:

“What I expect is that machines will allow the demand for translation to carry on growing, and for translation to become an ever more integral part of the world we live in.

However, since there are almost 49 million translation directions between all the languages in the world and there is never going to be a 49-million-fold community of translators, machines might well be a useful adjunct to actual translation for many of the under served directions that exist.
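The “almost 49 million” figure can be sanity-checked with simple arithmetic. The excerpt does not say how the number was derived, so the sketch below assumes roughly 7,000 living languages (a commonly cited estimate, not stated in the interview) and counts each ordered source-to-target pair as one translation direction.

```python
def translation_directions(num_languages: int) -> int:
    """Number of ordered (source, target) pairs among n languages.
    Each language can be translated into every other one: n * (n - 1)."""
    return num_languages * (num_languages - 1)


# Assumption: about 7,000 languages; this count is not given in the excerpt.
ASSUMED_LANGUAGE_COUNT = 7000

directions = translation_directions(ASSUMED_LANGUAGE_COUNT)
print(f"{directions:,} translation directions")  # 48,993,000
```

Under that assumption the count comes to just under 49 million, which matches the figure quoted.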