LINGUIST List 14.226

Wed Jan 22 2003

Review: Computational Ling: Jackson & Moulinier (2002)

Editor for this issue: Naomi Ogasawara <naomi@linguistlist.org>

What follows is a review or discussion note contributed to our Book
Discussion Forum. We expect discussions to be informal and
interactive; and the author of the book discussed is cordially invited
to join in.
If you are interested in leading a book discussion, look for books
announced on LINGUIST as "available for review." Then contact
Simin Karimi at simin@linguistlist.org.

Jackson, Peter and Isabelle Moulinier (2002) Natural Language
Processing for Online Applications: Text Retrieval, Extraction and
Categorization. John Benjamins Publishing Company, x+226pp, paperback
ISBN 1-58811-250-0, $29.95, Natural Language Processing series.
Book Announcement on Linguist:
http://linguistlist.org/get-book.html?BookID=4059
http://linguistlist.org/issues/13/13-2579.html
Zhongdong Zhang, Novator Systems Ltd., Toronto
SYNOPSIS
The growth of online applications and the World Wide Web has caused
intense interest in Natural Language Processing. More and more Natural
Language Processing techniques have been applied in commercial
systems. The book provides a theoretical and practical introduction
to several Natural Language Processing related technologies: document
retrieval, information extraction, text categorization, named entity
extraction, text summarization, and topic detection. It gives a clear
introduction to, and explanation of, various approaches to the selected
techniques. General principles and best practices as well as in-depth
discussions are given, based both on current research results and on
the authors' own experience with these technologies. Every chapter
ends with an evaluation of the techniques discussed. The authors
succeed in providing a good, concise reference book for technology
practitioners in the Internet space. The explanations of most
techniques are clear enough that readers can implement them directly
from the text. Furthermore, the book provides a comprehensive
bibliography for the techniques it covers.
Throughout the book readers will find two things very useful: sidebars
and pointers. Sidebars provide a clear explanation or demonstration of
the techniques being discussed and thus help readers grasp them
easily. Pointers provide supplementary bibliographical resources.
Unlike some well-known books on Natural Language Processing techniques
(e.g., Allen 1995, Manning & Schuetze 1999, Cole et al. 1997), which
deal with core theories, approaches and techniques as well as general
applications of Natural Language Processing, this book focuses on
several selected technologies which the authors identify as main tasks
and super-tasks of language processing applications on the Web
(page 8). The book doesn't pay much attention to the relationship
between NLP and these tasks, as discussed in Allan 2000 and Voorhees
1999; rather, it focuses mainly on the technical aspects of the
selected tasks.
As the authors emphasize in the abstract, the book is neither a vendor
guide nor a recipe for building applications, although it does deal
with general principles and practical issues of building applications
with the selected techniques. Issues like the architecture, design and
implementation of robust, efficient, and scalable systems with
embedded Natural Language Processing techniques (e.g., Kowalski 1997,
Basili et al. 1999) are not discussed explicitly in this book. The
book does offer some general discussion of these issues, and it
mentions software and toolkits throughout.
Chapter 1. Natural Language Processing. In this chapter the authors
give an overview of Natural Language Processing. Key theories and
techniques of Natural Language Processing, such as tokenization,
tagging, grammars, parsing, and named entity recognition, are
introduced. The advantages, drawbacks and pitfalls of various
approaches and options are discussed, which can help readers choose
appropriate techniques. This chapter also gives a relatively complete
theoretical and practical resource guide. Many useful software tools
are discussed or referred to as well. This introduction and resource
guide not only serve as a foundation for the rest of the book, but
also let readers and developers get started on building natural
language processing systems quickly.
Chapter 2. Document Retrieval. In chapter 2 the authors focus on
document retrieval. After introducing indexing technology and
different query processing techniques such as Boolean search, the
vector space model, probabilistic retrieval and language modeling,
they give an in-depth discussion of search engines and Web search. The
application of natural language processing techniques in document
retrieval is also discussed, and the authors offer some useful
observations.
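To give a flavor of one of the query processing techniques named
above, the vector space model can be sketched roughly as follows. This
is a minimal illustration and is not taken from the book: the
whitespace tokenizer, the particular TF-IDF weighting, and the function
names (tokenize, build_index, cosine, search) are all assumptions made
for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for real tokenization.
    return text.lower().split()


def build_index(docs):
    """Return one TF-IDF vector (a dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Assumption: a smoothed log IDF; other weighting schemes are possible.
    idf = {t: math.log(n / count) + 1.0 for t, count in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs):
    """Return indices of matching documents, best match first."""
    vectors, idf = build_index(docs)
    # Query terms that never occur in the collection are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in ranked if score > 0.0]
```

Documents that share more of the query's rarer terms rank higher; a
production system would add an inverted index rather than scoring
every document.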
Chapter 3. Information Extraction. In this chapter the authors present
many results from the Message Understanding Conferences, which have
been dedicated to information extraction. Different approaches and
systems, and the theories behind them, are introduced and reviewed. An
evaluation of current information extraction technology is given. The
authors conclude in the chapter that information extraction technology
has come of age.
Chapter 4. Text Categorization. In this chapter the authors first give
a general analysis of applications and tasks as well as key issues
regarding text categorization technology. Various methods of text
categorization, from handcrafted rule-based methods and statistical
methods to combinations of multiple classifiers, are then explained
and discussed. The chapter also gives a detailed introduction to the
evaluation of text categorization systems.
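As an illustration of what evaluating a categorization system
involves, the sketch below computes precision, recall and F1 for a
single category. The review does not say which measures the book uses;
these three are assumed here because they are commonly used, and the
function name and example labels are hypothetical.

```python
def precision_recall_f1(predicted, actual, category):
    """Score one category from parallel lists of predicted and true labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == category and a == category)
    fp = sum(1 for p, a in pairs if p == category and a != category)
    fn = sum(1 for p, a in pairs if p != category and a == category)
    # Guard against division by zero when a category is never predicted or never occurs.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels for four documents.
predicted = ["sports", "sports", "politics", "sports"]
actual = ["sports", "politics", "politics", "sports"]
p, r, f1 = precision_recall_f1(predicted, actual, "sports")
```

Here "sports" is predicted three times and is correct twice, with no
"sports" document missed, so precision is 2/3 and recall is 1.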
Chapter 5. Towards Text Mining. In this chapter the authors describe
several promising applications of Natural Language Processing, namely,
named entity recognition, reference resolution, automatic text
summarization and topic detection. Various approaches for these tasks
are introduced and evaluated. The authors finish this chapter, as well
as the book, by giving their thoughts on future prospects of Natural
Language Processing.
DISCUSSION
As mentioned above, the book covers document retrieval, information
extraction, text categorization, named entity extraction, text
summarization, and topic detection, the techniques the authors
identify as main tasks and super-tasks of language processing
applications on the Web. Question Answering (QA) and natural language
conversation are thus not discussed. However, demand for Question
Answering and conversational systems is growing very quickly across
channels such as the Web, email and phone/voice, so a discussion of
these topics would have been very interesting (e.g., Voorhees 2001,
Allen et al. 2001).
In addition, the book uses endnotes instead of footnotes and a
bibliography. Although plenty of valuable thoughts can be found in the
endnotes of each chapter, it might be more convenient for readers to
have a separate list of bibliographical references.
In general, the book is a very good, concise reference book filled
with many theoretical principles and practical guidelines. I recommend
this book to anyone who wants to build applications related to text
retrieval, information extraction and categorization.
REFERENCES
J. Allan (2000) Natural Language Processing for Information Retrieval.
Tutorial presented at the NAACL/ANLP Language Technology Joint
Conference in Seattle, Washington, April 29, 2000.
J. Allen (1995) Natural Language Understanding, 2nd ed. Benjamin/Cummings.
J. F. Allen, D. K. Byron, M. Dzikovska, G. Ferguson & L. Galescu
(2001) Toward Conversational Human Computer Interaction. AI Magazine,
Winter 2001, pp. 27-37.
R. Basili, M. Di Nanni & M. T. Pazienza (1999) Engineering of IE
Systems: An Object-Oriented Approach. In M. T. Pazienza, ed.,
Information Extraction: Towards Scalable, Adaptable Systems,
pp. 134-164. Springer.
R. Cole, J. Mariani, H. Uszkoreit, A. Zaenen & V. Zue (1997) Survey of
the State of the Art in Human Language Technology. Cambridge
University Press.
G. Kowalski (1997) Information Retrieval Systems: Theory and
Implementation. Kluwer Academic Publishers.
C. D. Manning & H. Schuetze (1999) Foundations of Statistical Natural
Language Processing. MIT Press.
E. M. Voorhees (1999) Natural Language Processing and Information
Retrieval. SCIE, pp. 32-48.
E. M. Voorhees (2001) Overview of the TREC 2001 Question Answering Track.
ABOUT THE REVIEWER
Zhongdong Zhang is a Senior Software Developer at Novator Systems Ltd.
in Toronto. He is currently working on automated customer service
solutions (Question Answering and Natural Language Conversation) using
various techniques of natural language understanding, information
retrieval and text categorization.