LINGUIST List 12.970

Fri Apr 6 2001

Review: Learnability in Optimality Theory

Editor for this issue: Terence Langendoen <terry@linguistlist.org>

What follows is another discussion note contributed to our Book Discussion
Forum. We expect these discussions to be informal and interactive; and
the author of the book discussed is cordially invited to join in.
If you are interested in leading a book discussion, look for books
announced on LINGUIST as "available for discussion." (This means that
the publisher has sent us a review copy.) Then contact Simin Karimi at
simin@linguistlist.org or Terry Langendoen at terry@linguistlist.org.

Tesar, Bruce and Paul Smolensky. 2000. Learnability in
Optimality Theory. Cambridge, MA: MIT Press. 138 pp.
$25.00/£16.95
Reviewed by: Larry L. LaFond,
University of South Carolina, Columbia.
This small but intriguing book focuses on a particular
problem in language learning--learners, who often receive
ambiguous overt language data, face a serious paradox:
they cannot determine a grammar's hidden structure until
they have constructed a grammar based upon their
interpretation of the overt forms they hear, but they
cannot construct a grammar without some analysis of the
hidden structure. To address this paradox,
Tesar and Smolensky (T & S) have proposed a learning
procedure where learners' first guesses at a structural
analysis are used to improve their grammar, and this
improved grammar is then able to improve the analysis.
In other words, through successive approximation,
learners acquire progressively better interpretations and
a progressively better grammar simultaneously. T & S
look to Optimality Theory (OT) for the core principles
that inform this learning strategy, and in this book they
evaluate their proposed model, Robust Interpretive
Parsing/Constraint Demotion (RIP/CD), both for accuracy
and computational efficiency, through a series of
computer simulations and a set of formal proofs.
The organization of the book:
Chapter 1 (18 pp.) presents the central claim of the
book--that OT provides the learning mechanism (RIP/CD)
through which the interdependence of grammars and
structural descriptions is overcome, allowing the
learner both to assign structure and to learn grammar
at the same time. This chapter also gives a broader
context for this claim through a review of the issues
surrounding learnability and Universal Grammar, and
through a terse introduction to the tenets of OT and a
decomposition of the learning problem into several
parts--deducing hidden structure in language data,
using the data to improve the existing model, assigning
an improved hidden structure to the original overt data,
and once again learning the grammar (using a 'robust'
parser). This divides the problem into the two subtasks
of parsing and grammar learning.
Chapter 2 (19 pp.) develops the overview of OT begun in
Chapter 1, and illustrates the tenets of OT through a
phonological example, an analysis of basic CV syllable
structure, and a syntactic example, based on the
analysis of null subjects given by Grimshaw and
Samek-Lodovici (1995).
Chapter 3 (20 pp.) is devoted to a discussion of
'Constraint Demotion', i.e., that constraints violated
by grammatical structural descriptions must be demoted
(not promoted), in the total ranking of constraints,
below constraints violated by competing (ungrammatical)
structural descriptions. The same phonological and
syntactic examples used in Chapter 2 are again employed
here to demonstrate how learners use the interaction of
violable principles to converge upon the target
structure. Chapter 3 includes an important discussion
of the relationship between data complexity, the number
of constraints, and the learnability of a grammar.
T & S demonstrate that, although the total number of
possible rankings in an OT system may be quite high
with even a limited number of constraints, the
restrictiveness of the structure OT places on the
grammar permits learners to efficiently arrive at a
target grammar in a reasonable number of learning steps.
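The scale of this contrast is easy to make concrete. The sketch
below (mine, not T & S's; only the N(N-1) worst-case bound on
informative examples, proved in chapter 7, comes from the book)
compares the size of the ranking space with that bound:

```python
from math import factorial

# With N constraints there are N! total rankings, yet T & S's
# worst-case bound on the number of informative examples that
# Constraint Demotion needs is only N(N-1).
for n in (5, 10, 20):
    rankings = factorial(n)   # size of the hypothesis space
    bound = n * (n - 1)       # worst-case informative examples
    print(f"N={n}: {rankings} rankings, at most {bound} examples")
```

For instance, 10 constraints yield 3,628,800 possible rankings,
yet at most 90 informative examples are needed.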
Chapter 4 (22 pp.) applies the proposed iterative
learning algorithm to the domain of metrical stress. The
goal of the chapter is to present an empirical test
demonstrating that the RIP/CD algorithm can overcome
ambiguity in overt forms. To accomplish this, T & S use
a computer simulation where 124 languages are presented
with 62 overt forms from a target language. These forms
were processed by the languages via the learning algorithm
T & S propose. Each performance of Constraint Demotion
was considered a learning step.
The results showed that 120 of the 124 simulations
successfully learned the grammar, in an average of 7
learning steps--a number T & S note is well below the
number of constraints.
Chapter 5 (11 pp.) addresses, albeit very briefly, two
central issues in language learning: first, how the learner
is constrained to select the most restrictive language
consistent with the data (subset principle) and, second,
how the language-specific inventory of lexical underlying
forms is learned. With regard to the subset principle,
T & S propose that in learners' initial hierarchies, all
markedness constraints dominate all faithfulness
constraints. With regard to the lexicon, T & S attempt to
extend the same iterative strategy used for grammar
learning/parsing to also encompass the simultaneous
learning of constraint rankings and underlying
representations of the lexicon.
Chapter 6 (6 pp.) is entitled 'Learnability and
Linguistic Theory' and serves as a concise apologetic for
the use of an OT approach to address issues of language
learning. In so doing, T & S argue that OT learning
algorithms are derived directly from the general structure
of OT grammars and are thus informed by a specific theory
of grammar.
They contrast this to other generic search procedures, or
to theories such as Principles and Parameters, where T & S
see conflicting needs for parameters to be independent
with restricted effects, but also explanatory, with
wide-ranging effects.
Chapter 7 (20 pp.) is quite dense, consisting primarily of
formal proofs regarding the correctness and data complexity
of CD. The focus of these proofs is to show, first, that
given an adequate data set, the RIP/CD algorithm is
guaranteed to converge upon correct ranking and, second,
that the amount of data needed to form an adequate data set
is never more than N(N-1) informative examples, where N is
the number of constraints.
Chapter 8 (18 pp.) examines production-directed parsing,
the process T & S consider responsible for learners' ability
to efficiently compute informative competing structural
descriptions. In this chapter, T & S argue that
production-directed parsing uses the same computational
procedure as robust interpretive parsing, discuss its
use in language learning, and supply algorithms for
performing production-directed parsing. In so doing, they
address 'parsing' as an issue not solely related to
comprehension, but as a process that more generally assigns
structure to an input and, thus, a process important for
both comprehension and production.
The book concludes with 4 pages of endnotes, divided by
chapter, a list of references (6 pp.) and an index (2 pp.).
Comments
This book represents the long-anticipated result of years
of collaborative research between the authors, previously
reported in a series of papers (Tesar and Smolensky 1993,
1995, 1996, 1998). The end result is a much clearer and
more accessible presentation than any of the previous
treatments. The book
now represents a solidly presented application of formal
learning theory to the problem of language acquisition.
T & S clearly view their proposal as proceeding from the
central principles of OT and, in turn, supporting OT as a
theory of language. The closeness of the ties between this
learning proposal and OT is an asset from the standpoint
of producing a coherent account of how linguistic theory
and the issue of learnability relate, but this same
closeness may be a liability for the broader acceptance
of T & S's ideas among audiences not already amenable
to OT.
The RIP/CD algorithm requires constraint interaction (and,
hence, violable constraints) for its operation, since
evaluation of whether a form is 'best' in comparison to
its competitors proceeds through an operation that
assesses the violations incurred by a pair of candidates,
cancels the marks common to the winning and losing
candidates, and demotes the remaining constraints violated
by the winner down in the hierarchy so they are dominated
by the constraints violated by the losing candidate.
algorithm demotes constraints only as far as necessary and,
although the learning process operates within a hypothesis
space consisting of stratified hierarchies, the end result
is a total ranking of a hierarchy that correctly converges
on the target grammar. Since the operation of the grammar
and learnability are so closely connected in this process,
T & S's proposal is intuitively appealing.
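For readers who want to see the mechanics, the demotion step just
described can be sketched in a few lines of Python. This is an
illustrative reconstruction, not T & S's own code; the constraint
names and the representation of a stratified hierarchy as a list
of sets (highest stratum first) are my own choices:

```python
from collections import Counter

def constraint_demotion(strata, winner_marks, loser_marks):
    """One Constraint Demotion step on a stratified hierarchy.

    strata: list of sets of constraint names, strata[0] highest.
    winner_marks / loser_marks: constraints violated by the
    winning and losing candidates.  Assumes an informative pair,
    i.e. the loser keeps at least one mark after cancellation.
    """
    # Mark cancellation: discard violations shared by both candidates.
    w, l = Counter(winner_marks), Counter(loser_marks)
    common = w & l
    w, l = w - common, l - common

    def stratum_of(c):
        return next(i for i, s in enumerate(strata) if c in s)

    # Highest stratum (smallest index) containing a loser mark.
    top_loser = min(stratum_of(c) for c in l)

    # Demote each remaining winner-mark constraint to the stratum
    # just below the highest loser-mark stratum -- but only if it
    # is not already ranked lower ("only as far as necessary").
    new = [set(s) for s in strata] + [set()]
    for c in w:
        i = stratum_of(c)
        if i <= top_loser:
            new[i].discard(c)
            new[top_loser + 1].add(c)
    return [s for s in new if s]   # drop any now-empty strata
```

Starting from a single stratum {ONSET, NOCODA, PARSE} (names
purely illustrative), a pair whose winner violates NOCODA while
the loser violates PARSE demotes NOCODA into a new stratum below
PARSE; a constraint already ranked low enough is left in place.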
As with many aspects of OT, research on learnability in OT
is still in its infancy, and T & S's proposal represents
not only a pioneering effort, but also one of the most
fully developed proposals to date. Other proposals for
ranking algorithms (e.g., Broihier 1995; Pulleyblank and
Turkel 1995; Boersma 1997) have addressed various
problems encountered in this line of research. For
example, the 'Gradual Learning Algorithm' developed by
Boersma (1997) claims certain advantages over T & S's
proposal here, namely that it can handle free variation
and noisy learning data, and that it can account for
gradient well-formedness judgments.
All of these approaches, especially insofar as they also
wish to account for syntactic data within an OT
framework, still require further explanation concerning
the nature of input in OT. Under this system, learners
must have access to input data and GEN; McMahon (2000:52)
notes this must also involve access to both candidates
and their violation marks, a problem given the paucity
of negative evidence normally available in learning data
(Kager 1999:302). Still, it is not yet fully clear
whether learners assume everything outside the input is
suboptimal, whether they begin with an unranked set of
constraints, or how variable input may provide data rich
enough for grammar learning.
These issues are not a unique challenge for the present
volume, however, and T & S have succeeded in providing a
clear presentation of OT applied to language acquisition
problems. T & S's major claim is tightly argued, with few
departures and fewer annoyances to distract the reader
from the main point (one exception readers will note is a
typographical error on p. 27: the reference to 'table 1.1'
in the second paragraph should read 'table 2.1'). It is
perhaps the stringency of the argumentation that sometimes
leaves the reader wanting more, particularly in the less
developed 'Learnability and Linguistic Theory' chapter of
only 6 pages.
This book may find use in introductory courses, especially
as an introduction to OT (since it provides an introduction
to the theory, an explanation of its usefulness for issues
of learnability, and illustrative examples of its
application), although many will find parts of Chapters 7
and 8 rather inaccessible. The book is certainly
appropriate for graduate linguistics courses or seminars.
In such courses, it will no doubt serve a discussion-
provoking purpose as robust as the interpretive parser
it proposes.
References
Boersma, P. 1997. How we learn variation, optionality,
and probability. Ms. University of Amsterdam. ROA-221.
Broihier, K. 1995. Optimality-theoretic rankings with
tied constraints: Slavic relatives, resumptive pronouns
and learnability. Ms., Department of Brain and Cognitive
Sciences, MIT. ROA-46.
Grimshaw, J. and V. Samek-Lodovici. 1995. Optimal
subjects. University of Massachusetts Occasional Papers
in Linguistics (UMOP), 589-605.
Kager, R. 1999. Optimality Theory. Cambridge: Cambridge
University Press.
McMahon, A. 2000. Change, chance, and optimality. Oxford:
Oxford University Press.
Pulleyblank, D. and W. J. Turkel. 1995. Traps in constraint
ranking space. Paper presented at Maryland Mayfest 95:
Formal Approaches to Learnability, University of Maryland,
College Park.
Tesar, B. and P. Smolensky. 1993. The learnability of
Optimality Theory: An algorithm and some basic complexity
results. Technical Report CU-CS-678-93. Department of
Computer Science, University of Colorado, Boulder. ROA-2.
Tesar, B. and P. Smolensky. 1995. The learnability of
Optimality Theory. In Proceedings of the 13th West Coast
Conference on Formal Linguistics, ed. R. Aranovich, W.
Byrne, S. Preuss, and M. Senturia, 122-137. Stanford, CA:
CSLI Publications.
Tesar, B. and P. Smolensky. 1996. Learnability in
Optimality Theory. Johns Hopkins University Technical
Report JHU-CogSci-96-3.
Tesar, B. and P. Smolensky. 1998. Learnability in
Optimality Theory. Linguistic Inquiry 29:229-268.
--------------------
Larry LaFond, a Ph.D. candidate at the University of
South Carolina, has research interests in second language
acquisition theory, discourse analysis, and intercultural
pragmatics. His dissertation research employs an OT
framework in a developmental account of the acquisition
of null subjects, inversion, and that-trace effects by
native speakers of English learning Spanish.