MT & MAT

Machine Translation

The perfect translation system, be it human or machine, does not exist. Moreover, a well-trained human translator still produces better material than the most expensive, specially trained computer-based translation system. However, the gap between the two is narrowing, so the question becomes whether machine translation equivalent to what an expert human translator can produce is possible in principle.

This question is not often asked, except in certain research laboratories and amongst philosophers of artificial intelligence, and it might seem pointless, or impossible to answer. But given that developing MT systems will involve hundreds or thousands of people working for years or perhaps even decades and spending billions of dollars in the process, a little theory seems like a good idea.

The arguments against machine translation hold that language is too subtle and complex for a computer to understand and translate. There are just too many variables to consider in any given sentence. Linguistic communication relies too heavily on deep context and real-world knowledge to be handled by a computer. Computers will never be fast enough or powerful enough to deal with the immense requirements of language translation. Computers will have to understand what they read in order to translate, and therefore will have to be sentient themselves, in some fashion similar to what we humans experience as self-awareness. And perhaps the most fundamental argument against machine translation lies in the claim that the human brain is capable of actions and behaviors that cannot be reduced to algorithms.

However, there is an argument for machine translation being possible in theory, and it is sufficiently powerful and compelling to obviate all the above arguments. In simple terms, the argument goes like this: "If that three-pound piece of meat in your head can do it, why not a hunk of technology?" In essence, the proof that machine translation is possible in principle is sitting in every translator's head. That three-pound pulpy grayish mass that we call the brain allows a translator to translate. A brain is an organic machine consisting of roughly one hundred billion cells (neurons and glial cells), the neurons each having a multitude of connections to other neurons and communicating chemically with each other through synapses whose activities are modulated by neurotransmitters. Regardless of how little is actually understood about the brain, and regardless of the obvious deficiencies of my description above, the brain remains a finite object, its individual neurons can assume only two states (firing or not firing), and there is no research or even theory that suggests the brain cannot be modeled algorithmically. As such, the brain can be considered a machine, or if you prefer a less mechanistic metaphor, a piece of organic technology, which can be understood and reproduced. Therefore, a computer that translates as well as a human translator is in principle possible.

But So What?

Although theory is important, what can actually be done in the real world is ultimately what matters. Right now, and for the foreseeable future, machine translation is only viable for certain uses with certain types of material in certain language pairs. Although a handful of companies around the world actively create and market machine translation software, it is worth noting that Microsoft and the other major players are staying away, suggesting that the technology is just not mature enough.

But it is maturing. Systran, possibly the largest producer of MT packages, provides much of the automated online translation we see on the Web and is said to supply the National Security Agency with its machine translation systems. Simple automated translation software is now available for under $100 in most major languages, though the output is at best useful for getting the gist of source material, and certainly will not replace a human translator.

Many of the problems originally predicted to delay or derail machine translation have, for the most part, been solved. Computers are plenty fast, and processing speed will continue to double roughly every 18 months, per Moore's Law, giving us a computer with raw processing power equivalent to a human brain in less than two decades. Memory, in the form of RAM or hard disk space, is so cheap as to be virtually limitless for ordinary business applications. OCR is now fast and accurate, at least with alphabet-based languages, eliminating the problem of getting a text into a computer on the rare occasions when the document is not already available in electronic form.
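To make the compounding behind that estimate concrete, here is a small calculation. It is a sketch that treats the doubling as exact every 18 months, as the paragraph states; real-world gains are less regular, and the function name is purely illustrative.

```python
# Compounding growth under a fixed doubling period.
# Assumption: processing power doubles exactly every 18 months, as the text states.

DOUBLING_PERIOD_MONTHS = 18


def growth_factor(years: float, doubling_months: float = DOUBLING_PERIOD_MONTHS) -> float:
    """Return how many times more processing power is available after `years`."""
    doublings = years * 12 / doubling_months
    return 2 ** doublings


if __name__ == "__main__":
    for years in (5, 10, 15, 20):
        print(f"after {years:>2} years: about {growth_factor(years):,.0f}x")
```

Twenty years works out to about 13 doublings, roughly a ten-thousand-fold increase, which is the kind of growth the two-decade estimate rests on.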

Furthermore, research in the past decade has produced new, viable approaches for the difficult aspects of machine translation. Statistical modeling of natural language, large corpus-based reference databases, and improved syntax generation mean that output from today's machine translation systems is no longer so easily dismissed as useless.

Ultimately, the market decides what is good enough for the market. Although professional translators may decry the low quality of machine translation systems, using such phrases as "word salad" or quoting famous stories from the 1950s and '60s about early U.S. government attempts to translate Russian material, machine translation already has a place in the translation industry, and that place is likely to expand.

Good enough means acceptable to those who want the translation. Consider this: a company wants all the specifications for an automobile translated from English into French, Spanish, German, Italian, Dutch, Portuguese, Chinese, and Japanese. The specifications total over 5,000 pages, approximately 1 million words. Assume that a translator can do 5,000 words per day (I realize this is high, but assume it anyway). It will therefore take 200 days of work to produce the translation for each language. A team of ten translators will still take 20 days, plus the time to unify the text after the translators are finished. At $0.25 per word (what the agency might charge the automobile company), the total cost per language would be $250,000, and these figures apply to each of the eight languages, for a total of $2 million. By contrast, if a machine system can translate the information at 20,000 words per hour, the job for each language might be done in 50 hours, a little over two days of continuous running, plus clean-up time. And the computer plus software will cost considerably less, maybe $3,000 for the computer and $4,000 for the software for each language pair.
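The arithmetic in this example can be laid out as a short script. It is a back-of-the-envelope sketch using only the figures given above, with one added assumption, labeled in the code: a single computer is shared across all language pairs while the software is bought once per pair. Clean-up time for the machine output is deliberately left out.

```python
# Back-of-the-envelope comparison using the illustrative figures from the text.
# Assumption (not stated in the text): one computer is shared across all
# language pairs, while the software is bought once per pair.
# Post-editing (clean-up) time for the machine output is NOT included.

WORDS = 1_000_000              # ~5,000 pages of specifications
LANGUAGES = 8                  # FR, ES, DE, IT, NL, PT, ZH, JA
HUMAN_WORDS_PER_DAY = 5_000    # per translator (optimistic, as the text notes)
TEAM_SIZE = 10
RATE_PER_WORD = 0.25           # dollars, as charged by the agency
MT_WORDS_PER_HOUR = 20_000
COMPUTER_COST = 3_000
SOFTWARE_COST_PER_PAIR = 4_000


def human_estimate():
    """Return translator-days, team-days, and cost per language, plus total cost."""
    translator_days = WORDS / HUMAN_WORDS_PER_DAY
    team_days = translator_days / TEAM_SIZE          # before unifying the text
    cost_per_language = WORDS * RATE_PER_WORD
    return translator_days, team_days, cost_per_language, cost_per_language * LANGUAGES


def machine_estimate():
    """Return hours and days of nonstop running per language, plus total outlay."""
    hours = WORDS / MT_WORDS_PER_HOUR
    days = hours / 24
    outlay = COMPUTER_COST + SOFTWARE_COST_PER_PAIR * LANGUAGES
    return hours, days, outlay


if __name__ == "__main__":
    t_days, team_days, per_lang, total = human_estimate()
    print(f"Human: {t_days:.0f} translator-days, {team_days:.0f} days with a team "
          f"of {TEAM_SIZE}, ${per_lang:,.0f} per language, ${total:,.0f} in all")
    hours, days, outlay = machine_estimate()
    print(f"Machine: {hours:.0f} hours (~{days:.1f} days) per language, "
          f"${outlay:,.0f} up front")
```

Under these assumptions the human route costs $2,000,000 across eight languages, against a one-time outlay of $35,000 for the machine route. Note that on a single computer the eight languages would run one after another, about 400 hours in total; the gap in cost is what clean-up time has to eat into before machine translation stops paying off.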

Of course, clean-up time is where the real argument lies these days. In carefully prepared documents with controlled language, well-defined subject matter, and good existing terminology references, the amount of clean-up is small enough that a machine translation system would be very efficient. Conversely, if the material is colloquial, with cultural nuances, slang, and neologisms, the output from a machine translation system will be useless. So quality remains the major issue for machine translation, because the other two factors important to a business, cost and speed, are where machine translation excels.

It is important to remember that the majority of material translators work on is information, ideas, or beliefs on a particular subject, and most often the material is nothing more than instructions, directions, or explanations, with a minimum of style or literary content. The material is generally bland and dry: software or hardware manuals, engineering specifications, scientific or other technical research material, financial or corporate reports, fiscal analyses, clinical trial reports, patents, and so forth. Accurately rendering the subtle style of a source text is rarely an issue that translators struggle with, or even discuss much amongst themselves. So if human translators today don't have to deal with the subtleties and nuances of well-written literary prose, then neither will the machine systems.

So for businesses considering the use of machine translation, the decision becomes a cost-benefit analysis. Although the initial cost of introducing a well-designed, customizable machine translation system such as those offered by Systran is still prohibitive for many businesses, those that make the initial investment often recoup it fairly quickly, given the cost of human translation. Further, the cost of such systems will come down, making them accessible to the majority of businesses and putting greater pressure on human translators.
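One way to frame that cost-benefit analysis is as a break-even volume: how many words must pass through the system before its up-front cost is recovered. The sketch below is hypothetical throughout; the function name, the $50,000 system cost, and the post-editing rates are illustrative assumptions, not real vendor pricing. Only the $0.25 human rate comes from the example earlier in the text.

```python
# Hypothetical break-even sketch for adopting an MT system.
# All figures below are illustrative assumptions, not real vendor pricing.


def break_even_words(system_cost: float, human_rate: float, post_edit_rate: float) -> float:
    """Number of words after which the MT investment has paid for itself.

    system_cost:    one-time cost of hardware, software, and customization
    human_rate:     cost per word of translating from scratch
    post_edit_rate: cost per word of a human cleaning up machine output
    """
    saving_per_word = human_rate - post_edit_rate
    if saving_per_word <= 0:
        raise ValueError("post-editing costs as much as translating; MT never pays off")
    return system_cost / saving_per_word


if __name__ == "__main__":
    SYSTEM_COST = 50_000   # hypothetical cost of a customized system
    HUMAN_RATE = 0.25      # dollars per word, the rate used earlier in the text
    for post_edit_rate in (0.05, 0.10, 0.20):
        words = break_even_words(SYSTEM_COST, HUMAN_RATE, post_edit_rate)
        print(f"post-editing at ${post_edit_rate:.2f}/word: "
              f"break-even after {words:,.0f} words")
```

The numbers matter less than the shape: as the cost of clean-up approaches the cost of translating from scratch, the break-even volume grows without bound, which is why output quality, rather than purchase price, usually decides the question.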

At some distant point in the future, I believe, translation will be performed by machines in all but the most esoteric, obscure cases. We are at this time, however, nowhere near that point. The transition will be steady, providing many opportunities for translators and linguists to earn a living developing, testing, deploying, and supporting such systems. Humans still make much clearer, more informed, and more accurate decisions about the meaning of written language than computers do, and so will remain an important part of the translation process for the foreseeable future.

MAT

Currently, Machine-Assisted Translation (MAT) is the hot topic in the translation industry, particularly in the localization field. No longer an esoteric application with a steep learning curve and little real-world value, MAT is now part of the everyday toolset used by most translators, so much so that translators entering the industry without the skills, and freelancers entering it without the actual software, are at a great disadvantage.

Products like Trados, Corel Catalyst, and Atril Software's Déjà Vu lead the MAT market at present. Each tool has its relative strengths, though Trados currently holds about 80% of the market overall. All of the tools support all major languages, including double-byte languages like Japanese and Chinese; offer terminology and glossary management tools; provide document version control; work with a variety of common document formats; and use the TMX standard for translation memories. Trados seems more focused on the localization sector, while Catalyst offers built-in features resembling machine translation, with fuzzy matches and suggestions offered to the translator to approve or reject.
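The shared translation-memory format these tools exchange is TMX (Translation Memory eXchange), an XML format in which each translation unit (`tu`) holds one variant (`tuv`) per language. The sketch below reads such a file into a simple source-to-target lookup using only Python's standard library. The sample content and the `load_tm` helper are invented for illustration, and real TMX files carry many more attributes (dates, tool names, properties) that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# TMX 1.4 marks each variant's language with xml:lang; ElementTree exposes
# that namespaced attribute under this expanded key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented two-segment memory for illustration only.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en-US" srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Check the oil level.</seg></tuv>
      <tuv xml:lang="fr-FR"><seg>Vérifiez le niveau d'huile.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Replace the air filter.</seg></tuv>
      <tuv xml:lang="fr-FR"><seg>Remplacez le filtre à air.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def load_tm(tmx_text: str, src_lang: str, tgt_lang: str) -> dict[str, str]:
    """Map source segments to target segments for one language pair.

    Language codes are matched exactly here; a fuller reader would compare
    them case-insensitively and also handle the older TMX `lang` attribute.
    """
    root = ET.fromstring(tmx_text)
    memory = {}
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags.
                segs[tuv.get(XML_LANG)] = "".join(seg.itertext())
        if src_lang in segs and tgt_lang in segs:
            memory[segs[src_lang]] = segs[tgt_lang]
    return memory


if __name__ == "__main__":
    tm = load_tm(SAMPLE_TMX, "en-US", "fr-FR")
    for source, target in tm.items():
        print(f"{source} -> {target}")
```

Because every tool can read and write the same structure, a memory built in one product can, in principle, be reused in another.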

This is just the beginning. Future systems will offer much more. Not only will they come with vast pools of sample translations mined from the terabytes of such material already available, along with extensive terminology and glossary listings, but they will also offer intelligent matching of untranslated text that far outperforms today's best 'fuzzy' guesses, real-time collaboration between remote sites via the Internet, constant and automatic updating of sample translations and word lists via bot searches of the Web, and so forth.

The future translator will not sit at a desk with a printed copy of a text to one side of the keyboard and some dictionaries or other resources to the other. In fact, many translators already work primarily, if not exclusively, with electronic source material and use at least some Web-based resources for terminology research. Future translators will likely go further, with a live link to their client's website, working directly and in real time with the other translators and the project manager involved in the project. They will prepare the source material for 'translation' by the MAT system, then monitor the output and work on the parts that the system cannot handle. They will also perform considerable editing, proofreading, and QA work, along with developing and maintaining glossaries, sample translation databases, and other necessary resources for the MAT system.
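The pre-translation step described above, in which the MAT system fills in what it can and leaves the rest to the translator, can be sketched as a pass over a translation memory. This is a minimal sketch under stated assumptions: the similarity score is Python's `difflib.SequenceMatcher.ratio()`, used as a stand-in (commercial tools use their own fuzzy-match scoring), the 75% threshold is arbitrary, and the `best_match` and `pretranslate` helpers are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Optional


def best_match(segment: str, memory: dict[str, str]) -> Optional[tuple[float, str, str]]:
    """Return (score, source, target) for the most similar TM entry, or None if empty."""
    best = None
    for src, tgt in memory.items():
        # Case-insensitive character-level similarity between 0.0 and 1.0.
        score = SequenceMatcher(None, segment.lower(), src.lower()).ratio()
        if best is None or score > best[0]:
            best = (score, src, tgt)
    return best


def pretranslate(segments: list[str], memory: dict[str, str], threshold: float = 0.75):
    """Label each segment as an exact match, a fuzzy match to review, or new work."""
    results = []
    for seg in segments:
        if seg in memory:
            results.append((seg, memory[seg], "exact"))
            continue
        match = best_match(seg, memory)
        if match is not None and match[0] >= threshold:
            # Offer the stored translation, but flag it for the translator to edit.
            results.append((seg, match[2], f"fuzzy {match[0]:.0%}: review"))
        else:
            results.append((seg, None, "no match: translate"))
    return results


if __name__ == "__main__":
    memory = {  # invented sample entries
        "Check the oil level.": "Vérifiez le niveau d'huile.",
        "Replace the air filter.": "Remplacez le filtre à air.",
    }
    source = ["Check the oil level.", "Check the oil level daily.", "Rotate the tires."]
    for seg, suggestion, status in pretranslate(source, memory):
        print(f"[{status}] {seg} => {suggestion}")
```

Exact matches pass straight through, near matches arrive as drafts for the translator to correct, and everything else is left for human translation, which is the division of labor the paragraph describes.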

There are, however, several problems. The first is cost. Not only is the software itself quite expensive for freelance translators to add to their office arsenal, but it also requires more RAM, more hard disk space, and a large monitor to be used efficiently. In addition, a scanner with good OCR software is extremely useful. This whole bundle could run as much as $3,000, depending on which combination of hardware and software one opts for. This is a substantial investment for a freelance translator, particularly since many translation vendors prefer to pay translators who use MAT or MT software less than they otherwise would. In fact, some translators who use MAT go so far as not telling their clients about it, to avoid the issue of reduced rates. In sum, cost reduction has to become a focus for MAT vendors, particularly Trados, whose product is currently the most expensive.

Second is the question of content rights. Translators are independent contractors who translate on a work-for-hire basis. They do not own what they produce. If a translator creates a glossary or terminology list in an MAT package while doing a translation for a client, who owns that list? If the translator cannot recycle or reuse such lists, much of the value of MAT will be lost. The same question matters just as much to the organizations that commission the translations. Moreover, how would a translation vendor know if I were reusing a terminology list that I created while working for them? And should they care? Such problems are common with Internet and computer technologies; just consider the issues surrounding MP3 if you are uncertain about the arguments on both sides. Further, anyone who can access the translation memory for a particular translation essentially has the whole translation, for free. This creates considerable problems when working on proprietary or secret material. Solutions to these problems are forthcoming, and will allow far greater collaboration between translators and sharing of resources.

The third and final problem is translators themselves. Many translators seem resistant to MAT because of the paradigm outlined above. They see translation as a highly intuitive, creative process, one which involves careful analysis of the source text, meticulous research in "quaint and curious volumes of forgotten lore," and then creative writing to formulate a target text that balances form and function. MAT takes much of this away, they believe; it is too automated, too computerized. Such translators are not necessarily Luddites; many are resisting a tendency in the industry to put speed above everything else. Translators thrive on the challenge of creating a high-quality translation, and many perceive MAT as a way to crank out translations of at best marginal quality in very short time frames. "Good enough so that we don't get sued" is how one localization manager put it. So translators are being forced to adapt, and many won't. The good people whom the industry loses will be replaced by those who can accept working with MAT software. It's the way the industry is going, and there's no turning back.

Evolution

Because the translation profession is intimately connected with the high-tech sector, all of the rapid changes we have seen with the advent of the Web, the spread of powerful home computers, and the development of expert systems are directly impacting translators. Unfortunately, many translators have a liberal arts background with little interest in or comfort with technology. Many feel frustrated as they realize that the only path to a stable future in the translation profession involves daily use of computers with sophisticated software tools to work on the translation of generally technical material. This is how the industry is evolving, and the individual who opposes it is doomed to extinction.

That said, there are esoteric areas within the translation profession that are currently not subject to the pressures of technology. The most obvious is literary translation, typically done by university professors with doctorates in language and literature. Also, the intelligence community relies heavily on human translators for certain types of work, because the machine systems simply are not good enough. Last, original research in the sciences is often unavailable in electronic format, involves many new terms and concepts, and bears little resemblance to prior material, so MAT software has little to offer it.

As the MT and MAT technologies discussed in this article evolve, fewer areas within the translation profession will remain untouched. Translators who have not already begun mastering such technology and adapting to it in their daily work should begin immediately. Finally, the most lucrative jobs with the greatest long-term stability will go to individuals who can not only use such systems to expedite the translation process, but also train, evaluate, and support them. If you are planning on a long stint in the translation profession, these are skills you will have to develop.