LINGUIST List 14.2139

Tue Aug 12 2003

Review: Computational Ling: Androutsopoulos (2002)

Editor for this issue: Tomoko Okuno <tomoko@linguistlist.org>

What follows is a review or discussion note contributed to our Book
Discussion Forum. We expect discussions to be informal and
interactive; and the author of the book discussed is cordially invited
to join in.
If you are interested in leading a book discussion, look for books
announced on LINGUIST as "available for review." Then contact
Simin Karimi at simin@linguistlist.org.

Androutsopoulos, Ion (2002) Exploring Time, Tense and Aspect in
Natural Language Database Interfaces. John Benjamins (Natural Language
Processing Series, Vol. 6), x+307pp, hardback, ISBN 9027249903 and
1588112691, US$104, EUR116.
http://www.linguistlist.org/issues/13/13-2681.html

Pablo A. Duboue, Computer Science Department,
Columbia University, USA

SYNOPSIS

Natural Language Interfaces to Databases (NLIDBs) are the focus of
this book, an unabridged version of Dr. Androutsopoulos'
doctoral dissertation (Edinburgh, 1996). He deals specifically with
mapping the temporal aspects of English questions onto temporal
extensions of the most widespread database query language (SQL).
Even though his dissertation precedes the book by a good six years,
the book is still worth reading for a wide audience.
Researchers working on NLIDBs, and especially on their temporal
aspects, are the readers who will profit most from the book as a whole.
Active researchers in that area, however, will surely be better
acquainted with leading-edge advances. The book is therefore best
considered fundamental reading, ideal for researchers new to the
field.
Beyond this obvious audience, the book can be of particular
interest to linguists working on representing temporal issues (time,
tense and aspect) and to instructors of HPSG courses. The theoretical
nature of the book makes it accessible to people with no computer
science background, while its empiricist tendency greatly helps ground
certain abstract concepts such as aspectual taxonomies and meaning
representation languages. Moreover, the author's example HPSG grammar
plugs directly into Pollard and Sag (1994), allowing an instructor
teaching HPSG grammar to include a ''real world'' example.
Finally, a developer working on adding temporal capabilities to an
NLIDB may want to look at the book. A word of caution is in order in
this case: ''Exploring...'' can be rather terse reading for
developers, and the prototype system is coded directly in Prolog and
generates TSQL2, a temporal extension of SQL that did not prove
persuasive enough to the standards committees.

DETAILED ANALYSIS

The book can be roughly divided into three parts:
* Linguistic temporal issues (Chapters 1-3)
* From English to a meaning representation (TOP) (Chapter 4)
* From the meaning representation to SQL (Chapters 5-6)
Each of these parts has a different focus and can be of interest to
different audiences. However, they normally use material from earlier
chapters, so isolated reading may be difficult.
The discussion in the book is grounded in a fictional ''airport''
domain. The domain contains 20 relations, expressing the different
temporal phenomena the book is interested in covering.
The first part describes the aspectual taxonomy employed by the
author. The four classes (states, activities, culminating activities
and points) are inspired by the work of Vendler (1967) and are similar
to the classes employed by Moens (1987), Passonneau (1988), and
Blackburn, Gardent and de Rijke (1994).
From there, the English tense system is introduced, explaining at
some length how the different aspectual classes interact with it. The
discussion is well motivated and this part of the book is by far the
easiest to understand on a first reading. Moreover, in a classroom
setting, it can be of interest to present the interaction between
aspect and tense by means of questions taken from the ''airport''
domain. Different linguistic phenomena are presented and the
discussion of them seems very comprehensive. However, only a subset of
them is actually implemented in the rest of the book. This decision
is sensible, in the sense that computational systems are always of
limited coverage. However, the rationale behind each decision is not
empirically motivated. At times the choices seem driven more by
simplicity, or by the fact that the state of the art captures some
phenomena more easily than others. The author seems to work on the
assumption that full coverage of every linguistic phenomenon is an
attainable and desirable goal. A more bottom-up approach, working
from actual questions posed to real temporal databases, would be
extremely interesting for the sake of comparison.
The second part of the book defines a meaning representation language
(TOP) and a methodology for transforming English questions into it. The
meaning representation language is very rich and is designed to
capture temporal expressions easily. It is a language based on
temporal operators, similar to those of Prior (1967). However, TOP
is a formal language, not a logic. No inference rules for TOP are
provided and the author claims the language is only suitable as an
intermediate language for translation into SQL or other database access
languages. Nevertheless, the language is thoroughly defined, it seems
easy to understand when written down (it is unclear whether the same
holds for temporal logics, for instance) and a considerable part of the
book is devoted to transforming English into TOP. I would like to think TOP can
be applicable in other settings as a means to capture the temporal
meaning of English expressions. If that were the case, then the
book's potential contribution would be much more significant, as the
description of both TOP and the transformation from English to TOP is
very detailed. Such effort seems worth reusing whenever possible.
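To make the idea of an operator-based representation concrete, here is a
minimal Python sketch of a Prior-style "past" operator evaluated over
facts that hold during intervals. It is only an illustration under my own
assumptions: it does not reproduce TOP's actual syntax or semantics, and
the names Fact, atom and past, the integer time points, and the predicate
string are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Fact:
    """A predicate that holds over the closed interval [start, end] (integer time points)."""
    predicate: str
    start: int
    end: int


# A formula maps an evaluation time to a truth value.
Formula = Callable[[int], bool]


def atom(facts: List[Fact], predicate: str) -> Formula:
    """True at time t if some stored fact for `predicate` covers t."""
    return lambda t: any(
        f.predicate == predicate and f.start <= t <= f.end for f in facts
    )


def past(phi: Formula, earliest: int = 0) -> Formula:
    """Prior-style P operator: phi held at some time strictly before t.

    `earliest` bounds the search; it is an assumption of this sketch.
    """
    return lambda t: any(phi(u) for u in range(earliest, t))


# Hypothetical data: flight BA737 circles from time 3 to time 5.
facts = [Fact("circling(BA737)", 3, 5)]
is_circling = atom(facts, "circling(BA737)")
was_circling = past(is_circling)
```

With this data, is_circling(4) holds while is_circling(10) does not, and
was_circling(10) holds because the flight circled at an earlier time.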
In particular, the book provides a very nice example of applying
and extending off-the-shelf HPSG theory, as defined by Pollard and
Sag (1994). The author relies on a simplified semantic analysis,
without the situation-theoretic approach of Pollard and Sag
(1994). A new ''aspect'' feature is added to the HPSG signs and an
''aspect principle'' is added to the theory. Domain information is
represented as an extension to the sort hierarchy. The author
analyzes the grammar in full detail, together with the mechanism to
extract the TOP sign from it. All in all, Chapter 4 is a fitting
synthesis of computational linguistics: a sound linguistic discussion,
carried to the level of detail required by a computational implementation.
The third part is the tersest segment of the book. It is mostly
intended for computer scientists, as it defines the methodology for
transforming TOP expressions into machine-executable instructions (SQL).
The actual transformation rules are fairly easy to grasp, but the
discussion is very thorough and a formalization of both Temporal SQL
and the transformation mechanism is provided. Moreover, the mechanism
is proved correct. At first glance, this level of formality seemed
unnecessary, but closer inspection shows that it is a requirement,
given that the generated SQL is not plugged into a real
system (ruling out any actual empirical evaluation of
correctness).
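For readers unfamiliar with temporal databases, the following Python
sketch shows the kind of query a temporal question ultimately has to
become. It is an illustration under my own assumptions, not the book's
translation rules: it uses plain SQL with explicit valid-time columns
rather than TSQL2, runs on an in-memory SQLite database, and the table
circling, its columns, the function circling_during and the flight UK160
are all hypothetical.

```python
import sqlite3


def circling_during(conn, flight, start, end):
    """Return the valid-time periods in which `flight` was circling
    and that overlap the closed interval [start, end]."""
    rows = conn.execute(
        "SELECT valid_from, valid_to FROM circling "
        "WHERE flight = ? AND valid_from <= ? AND valid_to >= ? "
        "ORDER BY valid_from",
        (flight, end, start),
    )
    return rows.fetchall()


# Hypothetical valid-time relation: one row per period of circling.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE circling (flight TEXT, valid_from INTEGER, valid_to INTEGER)"
)
conn.executemany(
    "INSERT INTO circling VALUES (?, ?, ?)",
    [("BA737", 3, 5), ("BA737", 20, 22), ("UK160", 4, 6)],
)

# "Was BA737 circling at some point between time 0 and time 10?"
periods = circling_during(conn, "BA737", 0, 10)
```

The overlap test (the period starts no later than the query end and ends
no earlier than the query start) is the piece a temporal extension of SQL
would let one state more directly.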
This third part should be of interest to implementers of NLIDBs, more
precisely, researchers developing experimental NLIDBs, given the
complexity of the section.
With regard to real systems, the author points out several extra
requirements that are missing from the system as described in the
book:
* An input pre-processing module, in the form of appropriate
tokenization and domain terminology detection and conflation.
* A disambiguation module among parse trees, as the HPSG grammar
returns two or three parse trees for a good number of cases.
* A Natural Language Generation module to generate cooperative
responses in the event of ambiguity or to present the right answer to
the user when false implications may be detected (e.g., the user may
pose a question such as ''Does plane BA737 circle?'' that presupposes
a habitual reading of 'circle', but if that is not the case in
the domain, the system will respond just 'no' instead of explaining to
the user that the question is not possible).
As generation is of particular interest to this reviewer, I include
here some specific observations on the author's treatment of this
issue. Cooperative response generation is discussed in a good dozen
places throughout the book. While the need for it is noted and the places
where it is needed are highlighted, it is unclear whether the overall
framework can be easily extended to cope with response generation.
In particular, it may be the case that not enough information is kept
after parsing in the form of a TOP formula to build such a cooperative
response. It would have been interesting for the author to
investigate this issue further, but from the very beginning he made
clear that response generation was not going to be dealt with in this
work.
The book concludes with a discussion of related and further work.
Since Dr. Androutsopoulos finished his dissertation at
Edinburgh, Dr. Nelken defended a related dissertation in 2001 at the
Technion Institute (Israel). Dr. Nelken made several comments and
observations (sometimes negative) on the author's work. The book is
one of those rare opportunities to read in print comments made a
posteriori on a work done a priori. That discussion, together with
the roughly 15% of new bibliographic items added since the dissertation
was defended, renders the book up-to-date.
The book is accompanied by all the source code (in the ALE grammar
workshop) on a companion website. The website has no broken links and
contains not only the necessary source code, but also pointers to the
language (SWI Prolog) and grammar interpreter (ALE workshop).
Downloading the language, interpreter and code was a matter of minutes and
all the examples from the book executed correctly.

OVERALL ANALYSIS

The book is true to its name; it explores the issues of time, tense
and aspect, keeping NLIDBs as an empirical grounding for an otherwise
theoretical discussion. However, the discussion in the book is
leading the state of the art in NLIDBs towards more empirical and less
exploratory work. Moreover, the book contains clear placeholders for
research on issues such as evaluation and cooperative response
generation. In that respect, the book provides a necessary milestone
on that worthy path.

REFERENCES

Blackburn, P., Gardent, C., and de Rijke, M. (1994). Back and forth
through time and events. In D.M. Gabbay (Ed.), Proceedings of the
First International Conference on Temporal Logic (pp. 225-237). Bonn,
Germany. Springer-Verlag.
Moens, M. (1987). Tense, Aspect and Temporal Reference. Ph.D. thesis,
Centre for Cognitive Science, University of Edinburgh, U.K.
Passonneau, R.J. (1988). A computational model of the semantics of
tense and aspect. Computational Linguistics, 14(2), 44-60.
Prior, A. (1967). Past, Present and Future. Oxford University Press.
Pollard, C. and Sag, I.A. (1994). Head-Driven Phrase Structure
Grammar. University of Chicago Press and Center for the Study of
Language and Information, Stanford.
Vendler, Z. (1967). Verbs and times. In Linguistics in Philosophy,
Chapter 4 (pp. 97-121). Cornell University Press.

ABOUT THE REVIEWER

Pablo Ariel Duboue is a PhD candidate working under the supervision of
Dr. Kathleen McKeown at the Natural Language Processing group,
Columbia University in the City of New York (USA). His research
interests lie in the area of Natural Language Generation, mainly on
the automatic construction of content planners from aligned corpora.
More information about Pablo is available at
http://www.cs.columbia.edu/~pablo