The author focuses on one particular type of idiomatic expression, idiomatic Verb Phrases (iVPs) in German, and their translation into English. The author shows that the METIS-II system performs the automatic translation with the help of a bilingual dictionary, a monolingual corpus in the target language, and four types of manually constructed morphosyntactic rules. Three corpora from three different resources are used to evaluate the results. The first corpus consists of 80 sentences sampled from Europarl (EP). The second has 275 sentences filtered from the web (MDS), and the last consists of 131 sentences constructed from a part of the digital lexicon of the German language in the 20th Century (DWDS). With a German-English idiom dictionary of 871 entries, the system achieves over 80% precision, recall and F1 on all three evaluation corpora.
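For readers less familiar with the three evaluation measures named above, the following is a minimal sketch of how precision, recall and F1 are conventionally computed from match counts. The function name and the counts in the usage note are hypothetical illustrations; they are not figures from the book.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Standard definitions; returns 0.0 for a measure whose denominator is zero."""
    predicted = true_positives + false_positives   # idioms the system flagged
    actual = true_positives + false_negatives      # idioms actually present
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts, chosen only to illustrate the arithmetic.
p, r, f = precision_recall_f1(true_positives=8, false_positives=2, false_negatives=2)
```

With these made-up counts, all three measures come out at roughly 0.8, which is the threshold the summary refers to.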

The book consists of eleven chapters, which can be categorized into five sections. The first chapter introduces the definition of translation, and the motivation and contribution of the current research. The next three chapters review the literature on Machine Translation (MT). Chapter five extensively reviews the theories of idiomatic expressions. From chapter six to ten, the author explains her experiments on MT for idiomatic expressions. Chapter eleven is the conclusion and discussion of further research.

Chapter two of this book describes the history of MT from the perspective of projects, companies and patents related to MT technology. In chapter three, the author introduces a brief history of Example-based Machine Translation (EBMT) and compares it to two other popular MT frameworks, Rule-based Machine Translation (RBMT) and Statistical Machine Translation (SMT). The author introduces EBMT as a system between RBMT and SMT. As in RBMT, its translation rules are manually extracted. However, unlike RBMT, such translation knowledge usually serves as templates and can be used repeatedly in the system. EBMT is similar to SMT in the sense that EBMT uses bilingual or monolingual corpora to extract knowledge about sentence formation. However, it does not use statistical models to decode the alignment or generate the translation.

In chapter five, the author reviews the broad literature on theories of idioms. In line with various previous works, she concludes that idioms are mainly multi-word expressions (MWEs) and that no single universal definition works for all of them. Idioms can be compositional or non-compositional, continuous or non-continuous. In addition, idioms form an open-ended set, since new idioms appear in languages daily. These properties of idioms pose a substantial challenge for recognizing and translating them automatically.

Chapters six to ten explain the idiom treatment experiments conducted. The source idioms are iVPs in German and the target language is English. These idioms are either continuous or discontinuous within a sentence. In chapter seven, the author introduces experiments with three commercial MT systems and concludes that these systems cannot identify discontinuous idioms. In chapter eight, she describes an RBMT system, CAT2, and conducts a small-scale experiment with 58 sentences. Since her evaluation achieves 100% precision and recall, she concludes that CAT2 can handle iVP translation successfully. Finally, in chapters nine and ten, the author discusses how the EBMT system, METIS-II, treats iVP idioms with a German-English bilingual dictionary, four manually constructed morphosyntactic rules and a monolingual corpus in English. The system assumes that the idioms are listed in the bilingual dictionary. For a continuous idiom, only one rule is necessary to identify it within the sentence and then look up its translation in the dictionary. The other three rules are used to handle the cases where the iVPs are discontinuous within the sentence. Sentences containing discontinuous idioms are constructed manually according to the German topological field model in order to be identified by the morphosyntactic rules. The author conducted three small-scale evaluations on three different data sets to evaluate the system, and the experiments show more than 80% precision and recall for all experiments and for both continuous and discontinuous iVPs.
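The difference between continuous and discontinuous iVPs can be illustrated with a small sketch. This is not the METIS-II rule set, which the book defines over the German topological field model; it is a deliberately simplified stand-in that assumes pre-lemmatized input and treats a discontinuous match as "all idiom lemmas occur somewhere in the sentence". The dictionary entry, the function names and the example sentences are hypothetical.

```python
# Hypothetical one-entry idiom dictionary: German lemma sequence -> English rendering.
IDIOMS = {("in", "Gras", "beißen"): "bite the dust"}


def find_continuous(lemmas, idiom):
    """Return the start index if the idiom lemmas occur adjacently and in order."""
    n = len(idiom)
    for i in range(len(lemmas) - n + 1):
        if tuple(lemmas[i:i + n]) == idiom:
            return i
    return None


def find_discontinuous(lemmas, idiom):
    """Return True if every idiom lemma occurs in the sentence, in any position.

    Simplifying assumption: no syntactic constraints are checked here.
    """
    remaining = list(lemmas)
    for lemma in idiom:
        if lemma not in remaining:
            return False
        remaining.remove(lemma)
    return True


def match_idiom(lemmas):
    """Return (match type, dictionary translation), or None if no entry matches."""
    for idiom, translation in IDIOMS.items():
        if find_continuous(lemmas, idiom) is not None:
            return ("continuous", translation)
        if find_discontinuous(lemmas, idiom):
            return ("discontinuous", translation)
    return None


# Subordinate clause, verb-final: "... dass er ins Gras beißt"
subordinate = match_idiom(["dass", "er", "in", "Gras", "beißen"])
# Main clause, verb-second: "Er beißt heute ins Gras"
main_clause = match_idiom(["er", "beißen", "heute", "in", "Gras"])
```

In the verb-final clause the idiom's parts are adjacent, so a single contiguous look-up suffices. In the verb-second main clause the finite verb is separated from the rest of the idiom, which is why additional rules are needed. A bag-of-lemmas check like the one above would also fire on literal uses, so it should be read only as an illustration of the problem, not as a workable solution.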

EVALUATION

This book is structured clearly, from theoretical review to system description and finally to system comparison and evaluation. It offers the reader a relatively comprehensive view of theories of idioms, provides a brief history of EBMT and introduces the different stages of identifying and translating idioms in one of these EBMT systems. The author lists ample iVP examples in German and shows systematically how the EBMT system can translate them automatically. However, the method offered in this book focuses on only one specific idiom type, iVPs, and the evaluation corpora used in this study are all very small. The whole thesis would be significantly strengthened if the author showed how the method used in the system to translate iVPs can be adapted to translate other idiomatic phrases, and evaluated it with larger corpora.

The book identifies several key challenges in MT for idiom translation. However, the method described in this book does not seem to provide a general approach to tackle these challenges. The first key challenge is the Out of Vocabulary (OOV) problem related to idioms. As mentioned in chapter five of this book, new idioms are constantly appearing in languages through various communication channels, and updating these OOV idioms within any MT system is a non-trivial task. However, the method provided in this book assumes the existence of all idioms in the bilingual dictionary. To update OOV idioms, constant labor-intensive manual maintenance of electronic dictionaries is required within the system. In addition, the morphosyntactic rules within the system are also manually constructed, and different types of idioms need different rules. This constraint also limits the scalability and adaptability of the proposed method. The second challenge mentioned in this book is to distinguish the literal and idiomatic usage of idioms, and the author suggests manually constructing simple heuristics and matching rules to handle this phenomenon. As with the approach the author offers for the OOV problem, manually constructing rules for each idiom usage is hard and very labor-intensive. The author neglects solutions to these challenges proposed in the SMT literature, which offer more robust alternatives in this field.

One final note: there are some incongruities between certain chapters of this book. For example, chapter four about Translation Memory, which is only remotely related to the main thesis, could be incorporated in the previous chapter on the history of EBMT. Chapter six, which is related to a historical view on idiom treatment within MT systems, could also be included in the chapter on the history of EBMT. In addition, chapter six lists several schemes on the translation equivalence between source and target language. However, there is no clear description in later chapters to show which scheme is used in the current study.

''Idiom Treatment Experiments in Machine Translation'' offers a specific approach to handling a specific type of idiom within the framework of EBMT. It provides valuable resources such as heuristics and rule templates for EBMT. However, the proposed method, which consists of manually constructing rules and heuristics for only one type of idiom in German, is not flexible enough to be adapted to other types of idioms, and is labor-intensive to maintain as well. If the book surveyed some of the techniques used in SMT to tackle the challenges posed by idioms, it would have a bigger impact and give readers a more comprehensive view of automatic idiom translation.

ABOUT THE REVIEWER:
Yuancheng Tu is a PhD student in the Department of Linguistics at the
University of Illinois at Urbana-Champaign. Her primary research interests
are Natural Language Processing (NLP), machine learning and computational
lexical semantics. She is also interested in structure learning in NLP and
Text Mining. She is now working on her PhD dissertation on recognizing and
learning of complex verb predicates, such as factive/imperative verbs,
light verb constructions and other inference rules with instantiated or
typed predicates. Her dissertation proposes a general approach to handle
these complex verb predicates within the framework of lexical and
relational similarities and to use them in real NLP applications such as
the task of Textual Entailment.