Thesis Proposal

Date: May 1st, 2003, 4:30 p.m.

Thesis Committee

Abstract

A content planner is a major component in a generation system,
responsible for determining the content and structure of the generated
output. It takes a knowledge base and communicative goals as input
and provides a document plan as output. It can use content planning
schemata to guide the construction of the document plan; the task of
building such schemata is normally recognized as tightly coupled with
the semantics and idiosyncrasies of each particular domain. In the
thesis outlined in this proposal, I investigate the automatic
construction of schemata from a resource consisting of texts and
associated knowledge bases. This resource is a collection of
human-produced texts together with the data a generation system is
expected to use to construct texts that fulfill the same communicative
goals. Schemata are better suited for descriptive texts with a strong
topical structure and little intentional content. I therefore focus on
domains of this kind whose texts are also rich in anchors (pieces of
information copied directly from the input knowledge base). My
methods involve the application of shallow understanding techniques to
obtain information about the aggregative behavior of the texts. My
proposed learning process involves matching knowledge within the
text, mining order constraints on the knowledge side, and using
those constraints to build the schema. Evaluation criteria throughout
the process are also discussed.
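
To make the three steps of the learning process concrete (matching,
mining order constraints, building a schema), here is a minimal
sketch in Python. It is an illustration under strong simplifying
assumptions, not the method developed in the thesis: a knowledge base
is assumed to be a flat mapping from attribute names to strings,
anchors are found by exact substring match, and the "schema" is
reduced to a single linear order over attributes. All function names,
parameters, and thresholds (anchor_order, mine_order_constraints,
linear_schema, min_support, min_ratio) are hypothetical. The sketch
requires Python 3.9 or later for graphlib.

```python
from collections import Counter
from graphlib import TopologicalSorter
from itertools import combinations


def anchor_order(text, kb):
    """Matching step (assumed form): list the KB attributes whose values
    appear verbatim in the text, ordered by position of first mention."""
    positions = []
    for attr, value in kb.items():
        idx = text.find(str(value))  # exact substring match only
        if idx >= 0:
            positions.append((idx, attr))
    return [attr for _, attr in sorted(positions)]


def mine_order_constraints(pairs, min_support=2, min_ratio=0.8):
    """Mining step (assumed form): keep (a, b) when attribute a precedes
    attribute b in at least min_ratio of the texts mentioning both, and
    both are mentioned together in at least min_support texts."""
    before = Counter()
    for text, kb in pairs:
        order = anchor_order(text, kb)
        # combinations() preserves list order, so a is mentioned before b.
        for a, b in combinations(order, 2):
            before[(a, b)] += 1
    constraints = set()
    for (a, b), n in before.items():
        total = n + before[(b, a)]  # Counter returns 0 for missing keys
        if total >= min_support and n / total >= min_ratio:
            constraints.add((a, b))
    return constraints


def linear_schema(constraints):
    """Schema-building step, reduced to one linear order consistent with
    the constraints. Raises graphlib.CycleError if they are cyclic."""
    sorter = TopologicalSorter()
    for a, b in constraints:
        sorter.add(b, a)  # a must come before b
    return list(sorter.static_order())


# Toy text/knowledge-base pairs, invented for illustration only.
toy_pairs = [
    ("John Smith was born in 1950 in Boston.",
     {"name": "John Smith", "birth_year": "1950", "birth_place": "Boston"}),
    ("Mary Jones, born in 1962 in Chicago, is a painter.",
     {"name": "Mary Jones", "birth_year": "1962", "birth_place": "Chicago"}),
]
print(linear_schema(mine_order_constraints(toy_pairs)))
```

Exact substring matching will miss paraphrased values and can match
by accident (a year that also occurs inside a longer number, for
example), which is one reason the abstract restricts attention to
domains whose texts are rich in anchors. A real schema also carries
more structure than a single linear order, so the last function
should be read only as a placeholder for the schema-building step.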