Article Structure

Abstract

Introduction

Human communication in real-world situations interlaces multiple topics that are related to each other in conversational contexts.

Dialog Topic Tracking

Dialog topic tracking can be considered as a classification problem to detect topic transitions.

Wikipedia-based Composite Kernel for Dialog Topic Tracking

The classifier f can be built on the training examples annotated with topic labels using supervised machine learning techniques.

Evaluation

To demonstrate the effectiveness of our proposed kernel method for dialog topic tracking, we performed experiments on the Singapore tour guide dialogs, which consist of 35 dialog sessions collected from real human-human mixed-initiative conversations related to Singapore between guides

Conclusions

Topics

tree kernel

In A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain Knowledge from Wikipedia

Our composite kernel consists of a history sequence kernel and a domain context tree kernel, both of which are composed based on textual units in Wikipedia articles that are similar to a given dialog context.

Page 1, “Introduction”

Our composite kernel consists of two different kernels: a history sequence kernel and a domain context tree kernel.

Page 2, “Wikipedia-based Composite Kernel for Dialog Topic Tracking”
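As a rough illustration of how two kernels can be merged into one, the sketch below uses a non-negative weighted sum, which is one common way to combine kernels because such a sum of valid kernels is itself a valid kernel. The excerpts on this page do not spell out the combination rule or any weights, so the weighting scheme, the function names (`composite_kernel`, `toy_sequence_kernel`, `toy_tree_kernel`), and the dictionary fields are all assumptions made for the example.

```python
from typing import Callable, Sequence

Kernel = Callable[[object, object], float]


def composite_kernel(kernels: Sequence[Kernel], weights: Sequence[float]) -> Kernel:
    """Return k(x, y) = sum_i weights[i] * kernels[i](x, y).

    Assumption: a weighted sum is used here for illustration only.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative to keep a valid kernel")

    def combined(x, y) -> float:
        return sum(w * k(x, y) for k, w in zip(kernels, weights))

    return combined


# Hypothetical stand-ins for the two component kernels.
def toy_sequence_kernel(x, y) -> float:
    # Counts history items shared by the two instances.
    return float(len(set(x["history"]) & set(y["history"])))


def toy_tree_kernel(x, y) -> float:
    # 1.0 when the two instances share the same (hypothetical) tree root label.
    return 1.0 if x["tree_root"] == y["tree_root"] else 0.0


a = {"history": ["A", "B"], "tree_root": "R"}
b = {"history": ["B", "C"], "tree_root": "R"}
k = composite_kernel([toy_sequence_kernel, toy_tree_kernel], [0.5, 2.0])
print(k(a, b))
```

In practice the weights would be tuned on held-out data; the values above are placeholders.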

3.2 Domain Context Tree Kernel

Page 3, “Wikipedia-based Composite Kernel for Dialog Topic Tracking”

Since this constructed tree structure represents semantic, discourse, and structural information extracted from the Wikipedia paragraphs similar to each given instance, we can explore these richer features to build the topic tracking model using a subset tree kernel (Collins and Duffy, 2002) which computes the similarity between each pair of trees in the feature space as follows:

Page 3, “Wikipedia-based Composite Kernel for Dialog Topic Tracking”
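The formula that the sentence above introduces is not reproduced on this page. As a minimal sketch, the following implements the standard subset tree kernel recursion of Collins and Duffy (2002): the kernel sums, over all pairs of internal nodes, a score that is zero when the two productions differ and otherwise multiplies a decay factor by one plus the score of each pair of matching children. The tuple encoding of trees, the function names, the decay parameter `lam`, and the node labels in the example are assumptions for illustration, not the paper's actual tree design.

```python
def _is_leaf(node):
    return isinstance(node, str)


def _production(node):
    """A node's label plus the (label, is_leaf) signature of each child."""
    children = tuple(
        (c if _is_leaf(c) else c[0], _is_leaf(c)) for c in node[1:]
    )
    return (node[0], children)


def _internal_nodes(tree):
    """All non-leaf nodes of a tree encoded as (label, child, child, ...)."""
    if _is_leaf(tree):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(_internal_nodes(child))
    return nodes


def delta(n1, n2, lam=1.0):
    """Weighted count of common subset trees rooted at n1 and n2."""
    if _production(n1) != _production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not _is_leaf(c1):  # equal productions imply c2 is internal too
            score *= 1.0 + delta(c1, c2, lam)
    return score


def subset_tree_kernel(t1, t2, lam=1.0):
    """Sum delta over every pair of internal nodes of the two trees."""
    return sum(
        delta(n1, n2, lam)
        for n1 in _internal_nodes(t1)
        for n2 in _internal_nodes(t2)
    )


# Hypothetical toy trees; the labels are placeholders.
tree_a = ("CONTEXT", ("PREV", "Food"), ("CURR", "Hawker_centre"))
tree_b = ("CONTEXT", ("PREV", "Attraction"), ("CURR", "Hawker_centre"))
print(subset_tree_kernel(tree_a, tree_a), subset_tree_kernel(tree_a, tree_b))
```

With `lam=1.0` the value is simply the number of tree fragments the two trees share; a smaller `lam` down-weights larger fragments.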

In this work, a composite kernel is defined by combining the individual kernels including history sequence and domain context tree kernels, as well as

feature space

Appears in 3 sentences as: feature space (3)

In A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain Knowledge from Wikipedia

Since our hypothesis is that the more similar the dialog histories of the two inputs are, the more similar aspects of topic transitions occur for them, we propose a subsequence kernel (Lodhi et al., 2002) to map the data into a new feature space defined based on the similarity of each pair of history sequences as follows:

Page 3, “Wikipedia-based Composite Kernel for Dialog Topic Tracking”
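The formula that the sentence above introduces is not reproduced on this page. As a sketch under stated assumptions, the code below implements the gap-weighted subsequence kernel of Lodhi et al. (2002) with its usual dynamic program: it scores every common subsequence of length `n`, penalizing gaps through a decay factor `lam`. Applying it to lists of article labels, the function name, and the default parameter values are illustrative choices; the exact items in the paper's history sequences and its parameter settings are not given in these excerpts.

```python
def subsequence_kernel(s, t, n, lam=0.5):
    """Gap-weighted subsequence kernel of order n (Lodhi et al., 2002).

    s, t: sequences of comparable items (characters, labels, ...).
    lam:  decay factor in (0, 1]; each position spanned costs one factor.
    """
    ls, lt = len(s), len(t)
    if n < 1 or min(ls, lt) < n:
        return 0.0

    # kp[i][a][b] holds the auxiliary value K'_i(s[:a], t[:b]).
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0

    for i in range(1, n):
        for a in range(i, ls + 1):
            kpp = 0.0  # running K''_i(s[:a], t[:b])
            for b in range(i, lt + 1):
                match = lam * lam * kp[i - 1][a - 1][b - 1] if s[a - 1] == t[b - 1] else 0.0
                kpp = lam * kpp + match
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp

    # Final kernel: close each common subsequence at a matching pair.
    total = 0.0
    for a in range(n, ls + 1):
        for b in range(n, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[n - 1][a - 1][b - 1]
    return total


# Hypothetical history sequences of article labels.
h1 = ["Orchard_Road", "Shopping", "Food"]
h2 = ["Orchard_Road", "Shopping", "Transport"]
print(subsequence_kernel(h1, h2, n=2))
```

Dividing by `sqrt(K(s, s) * K(t, t))` is a common way to normalize the value so that longer histories do not dominate.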

The other kernel incorporates more varied types of domain knowledge obtained from Wikipedia into the feature space.

Page 3, “Wikipedia-based Composite Kernel for Dialog Topic Tracking”

Since this constructed tree structure represents semantic, discourse, and structural information extracted from the Wikipedia paragraphs similar to each given instance, we can explore these richer features to build the topic tracking model using a subset tree kernel (Collins and Duffy, 2002) which computes the similarity between each pair of trees in the feature space as follows:

manually annotated

In A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain Knowledge from Wikipedia

All the recorded dialogs, with a total length of 21 hours, were manually transcribed; these transcribed dialogs, comprising 19,651 utterances, were then manually annotated with the following nine topic categories: Opening, Closing, Itinerary, Accommodation, Attraction, Food, Transportation, Shopping, and Other.

Page 4, “Evaluation”

For the linear kernel baseline, we used the following features: n-gram words, previous system actions, and current user acts which were manually annotated.

Page 4, “Evaluation”

All the evaluations were done in five-fold cross-validation against the manual annotations with two different metrics: one is the accuracy of the predicted topic label for every turn, and the other is precision/recall/F-measure for each topic transition event that occurred either in the answer or in the predicted result.
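The two metrics described above can be sketched as follows. The matching rule is an assumption: a transition event is taken to be a turn whose topic label differs from the previous turn's label, and a predicted event counts as correct only when the answer has a transition at the same turn to the same topic. The excerpt does not state the exact matching criterion, and the function names and label sequences are hypothetical.

```python
def turn_accuracy(answer, predicted):
    """Fraction of turns whose predicted topic label matches the answer."""
    if len(answer) != len(predicted) or not answer:
        raise ValueError("label sequences must be non-empty and equally long")
    return sum(a == p for a, p in zip(answer, predicted)) / len(answer)


def transition_events(labels):
    """Set of (turn index, new topic) pairs where the topic changes.

    Assumption: a transition is any turn whose label differs from the
    label of the previous turn.
    """
    return {
        (i, labels[i])
        for i in range(1, len(labels))
        if labels[i] != labels[i - 1]
    }


def transition_prf(answer, predicted):
    """Precision, recall and F-measure over topic transition events."""
    ref, hyp = transition_events(answer), transition_events(predicted)
    true_pos = len(ref & hyp)
    precision = true_pos / len(hyp) if hyp else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical five-turn dialog using topic categories from the annotation scheme.
answer = ["Opening", "Food", "Food", "Attraction", "Closing"]
predicted = ["Opening", "Food", "Attraction", "Attraction", "Closing"]
print(turn_accuracy(answer, predicted), transition_prf(answer, predicted))
```

In this toy dialog a single mislabeled turn costs one turn of accuracy but produces both a spurious and a missed transition, which is why the event-based scores are reported alongside per-turn accuracy.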