Answering complex questions: supervised approaches

Abstract:

The term “Google” has become a verb for most of us. Search engines, however, have
certain limitations. For example, ask one about the impact of the current global financial crisis
in different parts of the world, and you can expect to sift through thousands of results for
the answer. This motivates the research in complex question answering where the purpose
is to create summaries of large volumes of information as answers to complex questions,
rather than simply offering a listing of sources. Unlike simple questions, complex questions
cannot be answered easily, as they often require inferring and synthesizing information
from multiple documents. Hence, this task is handled by query-focused multi-document
summarization systems. In this thesis, we apply different supervised learning
techniques to confront the complex question answering problem. To run our experiments,
we consider the DUC-2007 main task.
Supervised training requires a large amount of labeled data, but manual labeling by humans
is expensive and time-consuming. Automatic labeling can be a good remedy for this problem.
We employ five different automatic annotation techniques to build extracts from human
abstracts: ROUGE, Basic Element (BE) overlap, a syntactic similarity measure, a semantic
similarity measure, and the Extended String Subsequence Kernel (ESSK). The representative
supervised methods we use are Support Vector
Machines (SVM), Conditional Random Fields (CRF), Hidden Markov Models (HMM) and
Maximum Entropy (MaxEnt). We annotate the DUC-2006 data and use it to train our systems,
whereas 25 topics of the DUC-2007 data set are used as test data. The evaluation results
reveal the impact of automatic labeling methods on the performance of the supervised approaches
to complex question answering. We also experiment with two ensemble-based
approaches that show promising results for this problem domain.
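
To illustrate the automatic labeling idea described above, the following Python sketch marks document sentences as summary (+1) or non-summary (-1) according to their word overlap with a human abstract. It is a simplified stand-in under stated assumptions, not the procedure used in the thesis: the score is a plain clipped unigram recall in the spirit of ROUGE-1 rather than the official ROUGE toolkit, and the function names, whitespace tokenization, and `top_n` cutoff are hypothetical choices made only for this example.

```python
from collections import Counter


def unigram_recall(candidate, reference):
    """Clipped unigram overlap divided by reference length (ROUGE-1-style recall).

    Assumption: lowercased whitespace tokens; no stemming or stopword removal.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[token]) for token, count in ref.items())
    return overlap / sum(ref.values())


def label_sentences(doc_sentences, abstract_sentences, top_n=2):
    """Label the top_n document sentences closest to the abstract +1, the rest -1.

    Each document sentence is scored by its best match against any
    abstract sentence; top_n is a hypothetical cutoff for illustration.
    """
    scores = [
        max(unigram_recall(sentence, ref) for ref in abstract_sentences)
        for sentence in doc_sentences
    ]
    ranked = sorted(range(len(doc_sentences)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:top_n])
    return [1 if i in chosen else -1 for i in range(len(doc_sentences))]
```

The resulting labels could then serve as training targets for a sentence classifier such as an SVM. Substituting a different scoring function (for instance, a syntactic or semantic similarity measure) for `unigram_recall` would yield a different labeling of the same data.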