LINGUIST List 14.2534

Wed Sep 24 2003

Review: Computational Ling: Ravin & Leacock (2002)

Editor for this issue: Naomi Ogasawara <naomi@linguistlist.org>

What follows is a review or discussion note contributed to our Book
Discussion Forum. We expect discussions to be informal and
interactive; and the author of the book discussed is cordially invited
to join in.
If you are interested in leading a book discussion, look for books
announced on LINGUIST as "available for review." Then contact
Simin Karimi at simin@linguistlist.org.

Ravin, Yael and Leacock, Claudia, ed. (2002, paperback ed., 1st ed.
2000) Polysemy: Theoretical and Computational Approaches. Oxford
University Press.
Announced at http://linguistlist.org/issues/13/13-3251.html
Eleni Koutsomitopoulou, Georgetown University, Washington DC, and
LexisNexis Butterworths Tolley London, UK.
DESCRIPTION OF THE BOOK
This book is a broad survey of the issue of polysemy in theoretical
and computational linguistics. It is a collection of 11 papers
including an overview of the subject by Ravin & Leacock.
Each paper in this volume sheds light on a different aspect of this
multifarious issue. The theoretical approaches treat polysemy as part
of semantics (see the papers by Pustejovsky and Dowty), cognitive
semantics (Fillmore & Atkins, and Goddard), discourse (Cruse), and
grammar (Fellbaum). The computational approaches cover almost the
entire spectrum of computational methodologies, from lexical
solutions à la WordNet to NLP and connectionism.
Ravin & Leacock's overview is a thorough survey of the issue and a
preliminary introduction to the various approaches that are presented
in the book. For instance, in the editors' review, polysemy is
discussed vis-à-vis homonymy and indeterminacy. Also discussed is the
role of context in sense disambiguation, as well as the various
underlying (formal as well as cognitive semantic) theories of meaning
and computational practices for word sense disambiguation.
Cruse's paper focuses on the role of context in polysemy, degrees of
word dependency on context, and semantic discontinuity and
''distinctness''. Words that are semantically self-standing (enough to
be relatively unaffected by context) are called ''discrete words'',
whereas words of lower ''semantic density'' are more easily affected
and defined by context. Polysemy in this paper is explored from a
lexical-semantic point of view as the result of a ''wide spectrum of
possibilities for context-dependency'' for individual words. The paper
is a great typology of context-word relationships with plenty of
examples. An interesting ramification of the lexical-semantic
perspective is that antonyms and hyponyms cannot assert
context-independent meaning or, worse, that there is no such thing as
an absolute hyponymic or antonymic sense or term. Cruse jokingly
calls this new realization about word meaning the ''soft semantics''
which is definitely on a par with structuralism and perhaps even
formalism. At the same time, Cruse also appears pessimistic about
prototype theories of meaning, as prototypes are again representations
of a chaotically behaving system of word meaning.
Fellbaum discusses ''autotroponymy'', which she treats as a form of
polysemy (from the Greek ''tropos'', meaning ''manner''), in the
English verb and noun
systems. She argues that in English some verbs refer to specific
ways/manners of performing actions denoted by other verbs
(''stammering'' for ''talking'', ''sneaking'' for ''walking''
etc). She points out that the ''manner'' relation between verbs is
highly polysemous in the English verb system when compared, for
instance, to the semantic relation of causative verbs to the
corresponding inchoatives (John opened the door. The door opened.)
This paper is a typology of changes in syntactic behavior in alignment
with the various meanings of polysemous verb and noun forms (The kids
behaved. vs. The kids behaved badly.). An interesting aspect of this
study is that it examines polysemy/autotroponymy as the conflation
between a ''semantically specified sense'' and its ''more general
superordinate''. The troponyms (i.e. the polysemous terms) differ from
their homophonous superordinates in their syntactic arguments. They
also differ from their co-troponyms either in their syntactic
properties or in the way they are lexicalized (or both). For
instance, compare ''behave'' (semantically specific sense) with
''behave well/badly/etc.'' (superordinate/troponym) and ''be a
good/bad/etc. boy'' (co-troponym).
Pustejovsky zooms in on the issue of argument structure vis-à-vis
polysemy within his generative lexicon theory. He argues that the
known phenomenon of lexical shadowing typically occurring in the case
of cognate object verbs such as ''butter'' (butter the bread) and
''dance'' (dance a dance) also shows up in other classes of verbs such
as those noted in Fillmore and Atkins (1992) where the expression of
an argument completely shadows the expression of another argument to
the verb (risk my health/life -- risk illness/death).
Pustejovsky also discusses various types of relations as denoted by
verb argument structures, such as ''containment relation'' ((in a)
book, (on a) disc etc) and ''complex relation'' (read the book, read
the articles, read the articles in the book, read the book of
articles). Since polysemy from this point of view refers to the
semantic nuances that are due to the presence and various
configurations in the argument structure, Pustejovsky also proposes a
typology of ''optionality'' of arguments, which defines the types of
arguments that are optionally expressed in a predicate. The article
includes a general overview of the basic premises of the author's
theoretical framework. Although this is a highly technical discussion
that presupposes a fair amount of familiarity with Pustejovsky's
theory of the Generative Lexicon (1991, 1995), the article is
relatively simple conceptually and the points it makes are well known
in the literature.
Fillmore and Atkins provide a lexicographic analysis of word sense
variety by examining the contents and structure of four British
English language dictionaries (CIDE, COBUILD, LDOCE, OALD). They make
the point that the number of different senses corresponding to a single
term in actual corpora far exceeds the number of sense variations
identified in the dictionaries. Also absent from the dictionaries,
according to Fillmore & Atkins, are metaphorical senses of terms. This
study also includes crosslinguistic data, examining ''matching
senses'' of a term through its equivalents in bilingual corpora. They
close by criticizing traditional lexical-semantic attempts at word
sense disambiguation and proposing the methods of word sense analysis
of the Berkeley FrameNet project (see general info about the project at:
http://www.icsi.berkeley.edu/~framenet/book/FNIntro.html ).
Dowty casts doubt on the traditional view, which he calls the
''fallacy of argument alternation''. According to this fallacy,
differing constructions (syntactic forms) may express identical
intended meanings and correspond to identical propositions, an
argument for the universal nature of semantic structure in natural
language. Dowty instead points out that syntactic permutations serve
to convey significant semantic or conceptual variations, and hence
they should not be discounted in the name of propositional
equivalence. To prove his point he examines a number of argument
permutation phenomena such as passivization, tough-construction,
middle construction, raising etc. In particular, he focuses on
comparing constructions such as the intransitive ''swarm''-alternation
(Bees swarm in the garden. The garden swarms with bees.)
and the transitive ''spray-load''-alternation (Mary sprayed paint on
the wall. Mary sprayed the wall with paint. Mary loaded hay onto the
truck. Mary loaded the truck with hay.) from Fillmore 1968. After
presenting the superficial commonality between these two different
types of constructions, Dowty argues that they are fundamentally
different and focuses on the former. The author goes so far as to
claim that the intransitive ''swarm''-alternation is a phenomenon
of semantic extension and offers some pertinent historical linguistic
evidence from German and French.
Goddard is a proponent of Wierzbicka's Natural Semantic
Metalanguage theory (NSM). He points out the capacity of NSM theory to
tackle both word-level and syntactic-level polysemy. The entire theory
is based on the notion of semantic primes that supposedly safeguard
the lexicon from obscurity and circularity in lexical sense
definition. Substitution is one of the tests for the validity of
periphrases used to express alternate meanings corresponding to a
unique term in the lexicon. The paper claims to offer a ''semantic
methodology'' for lexical definition and consequently for
polysemy. The paper makes the interesting point that grammatical
constructions may also manifest polysemy, and it proposes a treatment
for figurative language (within the same NSM framework) in relation to
polysemy. A drawback of the NSM approach seems to be that meaning is
treated as a tractable phenomenon and hence it is considered
''accessible, concrete, and determinate'', a perception that classical
meaning typologies have repeatedly failed to prove true.
In computational linguistics, the treatment of polysemy falls into the
class of issues that are tackled under the term ''word sense
disambiguation''. Unlike their theoretical counterparts, the
computational approaches are more interested in developing
efficient methods for word sense disambiguation than in accounting for
the various historical, stylistic and theoretical issues surrounding
polysemy.
Miller & Leacock focus on lexical representations for sentence
processing. They argue that what is missing from dictionaries and
semantic theories is a ''satisfactory treatment of the lexical aspects
of sentence processing''. They reduce this problem to an examination
of various methods for a more efficient representation of
context. ''Local context'' is defined primarily by the syntactic
categories of a term, i.e. the noun category of contexts, the verb
category of contexts etc. Some terms may belong to more than one
syntactic category and hence to more than one local context. Simple
rule-based systems may address this issue. Miller & Leacock recognize
the role of semantic information in determining the local context of a
term's sense, and the fact that semantic information is not always
present in the local context. For this reason they define a broader or
''topical context''. Topical context is defined as the general topic
of a text or discourse, and the same term may mean different things
under different topics.
For instance, consider the different meanings of ''shot'' in
marksmanship, in a chat with a bartender, or a photographer, in a
hospital, or in the context of a game of golf or basketball. The basic
hypothesis of the authors is that if the linguistic context provides a
clue about the primary discourse topic we can easily decide on the
intended meaning of ''shot'' in the particular linguistic
context. They then proceed to consider how people determine the topic
of a discourse, and present some theories that determine the topic
based on a statistical classification of the vocabularies and
sub-vocabularies of a polysemous word in a discourse (although initial
attempts have been applied to homonymous terms such as ''crane'' and
''bass''). The problem is then that polysemy allows for finer
distinctions between senses than homonymy does (for instance, ''bass''
involves not only a distinction between fish and deep voice but also
between deep voice and the man who has it, the lowest frequencies in
musical harmony, a bass horn or a bass violin and so on). In other
words, in the case of polysemy the information of topical context
alone may not always be sufficient.
An additional experimental comparison of three statistical
classifiers (a Bayesian classifier, a content-vector classifier and a
back-propagation neural network) showed that as the number of
senses of a term increases, so does the difficulty the algorithm has
in making accurate distinctions between them. In addition, some
contexts seem to be inherently harder to identify than others.
Compared to humans the three tested classifiers performed at about the
same level of accuracy. In addition, topical information proved to be
useful when the polysemous terms were presented in sentences rather
than in the context of co-occurring terms. Combined local and topical
information methods may yield better results but still not as good as
those obtained in human comprehension tests. The authors suggest that
research in sentence processing, in particular on argument structure
and coreference, would help elucidate sense disambiguation issues.
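
As an illustration only, the topical-context idea can be sketched as a
small naive Bayes classifier over the words surrounding ''shot''. The
training sentences, sense labels and function names below are invented
for this sketch and are not taken from Miller & Leacock; the add-one
smoothing is likewise an assumption.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (context words, sense label) pairs for "shot".
TRAINING = [
    ("the bartender poured a shot of whiskey".split(), "drink"),
    ("he ordered a shot and a beer at the bar".split(), "drink"),
    ("the nurse gave the patient a flu shot".split(), "injection"),
    ("the doctor recommended a booster shot at the hospital".split(), "injection"),
    ("she sank the shot from the free throw line".split(), "sports"),
    ("his tee shot landed on the green".split(), "sports"),
]


def train(examples):
    """Count sense frequencies and per-sense word frequencies."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        for word in words:
            word_counts[sense][word] += 1
            vocabulary.add(word)
    return sense_counts, word_counts, vocabulary


def classify(words, sense_counts, word_counts, vocabulary):
    """Return the sense with the highest log posterior under naive Bayes."""
    total_examples = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        score = math.log(count / total_examples)
        sense_total = sum(word_counts[sense].values())
        for word in words:
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing (an assumption) keeps unseen pairs above zero.
            p = (word_counts[sense][word] + 1) / (sense_total + len(vocabulary))
            score += math.log(p)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


MODEL = train(TRAINING)
```

Calling classify("the nurse at the hospital gave him a shot".split(),
*MODEL) selects the ''injection'' sense on this toy data. With only
topical words as evidence, finer polysemous distinctions of the kind
noted for ''bass'' would remain out of reach.
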
Stevenson and Wilks are concerned with polysemy (or Word Sense
Disambiguation, WSD) in large corpora. They point out in particular
that evaluation methods for WSD are usually based on small trial
selections of text rather than large corpora, which leaves the
generality of results and performance in doubt. Another problem with
current approaches that Stevenson and Wilks point out is the increased
chance of meeting novel word senses in large corpora, senses not yet
lexicalized in existing dictionaries. Finally, the authors recognize
that most research in NLP may use different ways of encoding or
conceptualizing information, but in the case of WSD the variety of
tools and techniques applied seems to be taken as representing
different types of WSD information themselves. These three issues
render WSD a hard problem to solve.
For their experiments the authors used the machine-readable version of
the LDOCE dictionary in order to make use of both a large-scale
inventory of senses and a broad knowledge base for sense
disambiguation. In the process of analyzing the lexical knowledge
sources they faced the question of what counts as context, which they
answered by selecting ''larger linguistic structures'', such as
sentences and/or entire discourses, that offer the pertinent
topics. For their experiments they also focused on the issue of
combining various knowledge sources: they used a ''memory-based
learning algorithm'' that provided a filter that removed senses from
consideration, thereby simplifying the WSD task, and also made use of
various partial taggers, which use different knowledge sources from
the lexicon in order to suggest a set of possible senses for an
ambiguous term. For the evaluation of their experimental results they
merged a WordNet list of manually tagged content words with the
ontological hierarchy of the LDOCE dictionary, which they used as a
''gold standard'' of texts.
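
The general shape of such a pipeline (filter the sense inventory, let
several partial taggers propose senses, then combine their proposals)
can be sketched roughly as follows. This is not the authors' system:
the sense inventory, the two taggers and their weights are
hypothetical, the filter is assumed to work on part of speech, and a
simple weighted vote stands in for the memory-based learning algorithm.

```python
from collections import Counter

# Hypothetical mini sense inventory for "bank"; real LDOCE entries are far richer.
SENSES = {
    "bank_n_finance": {"pos": "noun", "gloss": {"money", "account", "loan", "deposit"}},
    "bank_n_river": {"pos": "noun", "gloss": {"river", "water", "shore", "edge"}},
    "bank_v_tilt": {"pos": "verb", "gloss": {"aircraft", "turn", "tilt", "wing"}},
}


def pos_filter(senses, pos):
    """Filter step: drop senses whose part of speech conflicts with the tag."""
    return {name: info for name, info in senses.items() if info["pos"] == pos}


def gloss_overlap_tagger(senses, context):
    """Partial tagger: propose the senses whose gloss overlaps the context most."""
    overlaps = {name: len(info["gloss"] & context) for name, info in senses.items()}
    best = max(overlaps.values(), default=0)
    return {name for name, score in overlaps.items() if score == best and best > 0}


def first_sense_tagger(senses, context):
    """Partial tagger: propose the first listed sense as a default guess."""
    return set(list(senses)[:1])


# Hypothetical weights; the gloss tagger is trusted more than the default.
TAGGERS = ((gloss_overlap_tagger, 2), (first_sense_tagger, 1))


def disambiguate(pos, context, taggers=TAGGERS):
    """Filter the inventory, collect weighted proposals, return the top sense."""
    candidates = pos_filter(SENSES, pos)
    votes = Counter()
    for tagger, weight in taggers:
        for sense in tagger(candidates, context):
            votes[sense] += weight
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

For example, disambiguate("noun", {"river", "water", "fishing"}) first
discards the verb sense and then prefers the river sense because the
gloss-overlap proposal outweighs the first-sense default.
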
The authors conclude that both high-level (word-level) and
fine-grained (sense-level) WSD can be achieved at over 90%
accuracy, with the high-level WSD tests obtaining the higher accuracy
of the two.
Dolan, Vanderwende and Richardson present MindNet, part of MS-NLP,
which is a ''broad-coverage, application-agnostic'' NLP system
developed by Microsoft Research. MindNet ''provides the representation
capabilities needed to capture sense modulation''. Acquisition of new
senses and new words is also possible via MindNet. Context is crucial
in MindNet, and understanding the meaning of a term amounts to
''producing a response that has been tied to linguistically similar
occurrences of that word.'' The system learns by example (it is
characterized as a ''highly processed example base''). Inferencing via
structured representations is also possible. These representations are
''directed labelled graphs'' that help overcome the limitations of
word order and take advantage of hierarchical relationships outside
the realms of syntactic relations (e.g. in order to show the indirect
relationship between ''car'' and ''truck'', they use the graph:
car -Hypernym-> vehicle <-Hypernym- truck, where ''car'' and ''truck''
are connected by virtue of their relationship to the same hypernym
''vehicle''). The paths between the terms are weighted in a way that
reflects their salience. A known current weakness of MindNet is that
it is a static representation of relations with fixed weights that
depend on the current associations the system contains. This means
that anything beyond the level of individual words and at best
sentences (for instance, inter-sentential relations and hence context
and discourse) lies beyond the capabilities of MindNet.
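
To make the ''car''/''truck'' example concrete, here is a minimal
sketch of a directed labelled graph with weighted edges and a search
for a connecting path. It is not MindNet's actual data structure or
API: the edge list, the weights and the choice to score a path by the
product of its edge weights are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical labelled, weighted edges: (source, relation, target, weight).
EDGES = [
    ("car", "Hypernym", "vehicle", 0.9),
    ("truck", "Hypernym", "vehicle", 0.8),
    ("car", "Part", "wheel", 0.6),
    ("rose", "Hypernym", "flower", 0.9),
]


def build_graph(edges):
    """Index edges by node so each can be followed forwards or backwards."""
    graph = {}
    for source, relation, target, weight in edges:
        graph.setdefault(source, []).append((target, f"-{relation}->", weight))
        graph.setdefault(target, []).append((source, f"<-{relation}-", weight))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns (path string, product of weights) or None."""
    queue = deque([(start, start, 1.0)])
    visited = {start}
    while queue:
        node, path, score = queue.popleft()
        if node == goal:
            return path, score
        for neighbour, label, weight in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, f"{path} {label} {neighbour}", score * weight))
    return None


GRAPH = build_graph(EDGES)
```

Here find_path(GRAPH, "car", "truck") recovers the path through the
shared hypernym ''vehicle'', while find_path(GRAPH, "car", "rose")
returns None because the toy graph has no connecting path. The fixed
weights in EDGES also mirror the static character noted above.
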
Schütze's paper offers us a glimpse at the phenomenon of polysemy from
a connectionist point of view. The author reminds us that models such
as those of Rumelhart et al. 1986 and McClelland et al. 1986 aim
first of all at designing a disambiguation algorithm that is
psychologically plausible and is also applicable on a large scale. The
author explains the notion of semantic priming (''flower'' will be
read more quickly after the sentence ''They held the rose'' has been
presented than after a sentence like ''They all rose.'', which contains
a homonymous term) for sentence
processing and connectionist methods for disambiguation. He then
proceeds to explain word vectors, context vectors and sense vectors of
activations in his proposed algorithm. He concludes that similarity in
contexts is a crucial factor for determining word-level similarity and
hence a reliable guide for grouping (clustering) and disambiguation of
word-level senses. This paper presents promising work in polysemy in a
manner that is psychologically plausible, but it fails to view
polysemy as a generalized phenomenon affecting natural language not
only at the word level but also at the level of sentences and discourses.
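
The relationship between word vectors, context vectors and sense
vectors can be illustrated with a small sketch. It assumes that a
context vector is the sum of the vectors of the words in the context,
that a sense vector is the centroid of a group of context vectors, and
that similarity is measured by cosine; the three-dimensional vectors
and the hand-made grouping for ''rose'' are invented here and do not
reproduce the algorithm in the paper.

```python
import math

# Hypothetical 3-dimensional word vectors; real ones would be derived from corpora.
WORD_VECTORS = {
    "petal": [1.0, 0.0, 0.0],
    "garden": [0.9, 0.1, 0.0],
    "thorn": [0.8, 0.0, 0.2],
    "stood": [0.0, 1.0, 0.0],
    "prices": [0.0, 0.8, 0.3],
    "slowly": [0.1, 0.9, 0.0],
}


def context_vector(words):
    """Sum the vectors of the known words in a context."""
    total = [0.0, 0.0, 0.0]
    for word in words:
        for i, value in enumerate(WORD_VECTORS.get(word, [])):
            total[i] += value
    return total


def centroid(vectors):
    """Average a group of context vectors into one sense vector."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


# Sense vectors for "rose". The grouping is given by hand here; a clustering
# step would be needed to induce it from unlabelled contexts.
SENSE_VECTORS = {
    "flower": centroid([context_vector(["petal", "garden"]),
                        context_vector(["thorn", "garden"])]),
    "rise": centroid([context_vector(["stood", "slowly"]),
                      context_vector(["prices", "slowly"])]),
}


def disambiguate(words):
    """Pick the sense whose vector is closest to the context vector."""
    vector = context_vector(words)
    return max(SENSE_VECTORS, key=lambda sense: cosine(vector, SENSE_VECTORS[sense]))
```

On this toy data disambiguate(["thorn", "petal"]) selects the
''flower'' sense and disambiguate(["prices", "stood"]) the ''rise''
sense, purely on the basis of similarity between contexts.
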
Finally, each paper comes with a wealth of bibliographical references
pertinent to the particular model and strategy of analysis.
CRITICAL EVALUATION
The book contains a wealth of useful information (principles, data
configurations, methods, strategies and viewpoints) on the manifold
problem of word sense ambiguity.
The theoretical linguistics papers focus on offering a typology of
linguistic data, which they examine and then group in order to
pinpoint the apparent regularities in them. At times (as in
Pustejovsky's work) theoretical approaches additionally offer a formal
descriptive representation of the regularities in the data. The
undoubted merit of such a theoretical approach to polysemy lies in the
precision of the description and analysis of an otherwise
not-so-systematic and homogeneous phenomenon in natural language. The obvious
defect of such an approach lies in the nature of polysemy in natural
language. Rules cannot adequately describe future behavior or
presently undetected patterns in the data, since new rules need to be
invented to encompass new data. In addition, rules have no explanatory
power or value and describing natural language phenomena such as
polysemy in formal rules offers no real understanding of the way
language works.
The computational linguistics papers in this book focus on the
applicability of various methods and tools proposed from various
theoretical and computational sources and they render any pertinent
issues of performance and evaluation prominent in research. Word sense
disambiguation traditionally has been examined within the realms of
word-level analysis, lexica, corpora, thesauri, knowledge bases and
related tools and representations for paradigmatic relations. Most
researchers in polysemy point out this obvious inadequacy of the
computational approach, i.e. that it fails to take into account
crucial factors such as (linguistic and pragmatic) context, and
instead it is tied to a word-level partial solution of the
problem. Computational systems and theories that incorporate
disambiguation efforts as part of their tool set
are usually more successful for this reason. In such systems, broader
linguistic context is taken into account during the disambiguation
process.
ABOUT THE REVIEWER
Eleni Koutsomitopoulou is a PhD candidate in Computational Linguistics
at Georgetown University (Washington DC) and a senior Indexing Analyst
at LexisNexis Butterworths Tolley in London, Great Britain, where she
currently lives. Her main research interests include neural network
models for Natural Language Processing, cognitive linguistics,
indexing and pattern recognition applications for natural language.