Brisbane, Australia
September 22-26, 2008

Xiaodan Zhu, Xuming He, Cosmin Munteanu, Gerald Penn

University of Toronto, Canada

This paper studies the automatic detection of topic transitions in
recorded presentations. One way to detect them is to match slide
content directly against presentation transcripts using a similarity
metric. Such literal matching, however, misses domain-specific
knowledge and is sensitive to speech recognition errors. In this
paper, we incorporate relevant written materials, e.g., textbooks
for lectures, which convey semantic relationships, in particular
domain-specific relationships, between words. To this end, we
train latent Dirichlet allocation (LDA) models on these materials
and measure the similarity between slides and transcripts in the
acquired hidden-topic space. This similarity is then combined with
the literal-matching score. Experiments show that the proposed approach
reduces the errors in slide transition detection by 17.41% on
manual transcripts and 27.37% on automatic transcripts.
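
As a rough illustration of combining the two similarity scores, the
sketch below assumes that each slide and each transcript segment already
has a topic distribution inferred by an LDA model trained elsewhere. The
use of cosine similarity, the linear interpolation, the function names,
and the `weight` parameter are all assumptions made for this example;
the abstract does not specify how the scores are computed or combined.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def literal_similarity(slide_text, transcript_text):
    """Literal matching: cosine over whitespace-tokenized word counts."""
    slide_counts = Counter(slide_text.lower().split())
    transcript_counts = Counter(transcript_text.lower().split())
    return cosine(slide_counts, transcript_counts)


def topic_similarity(slide_topics, transcript_topics):
    """Similarity in the hidden-topic space.

    Each argument is a list of topic probabilities, assumed to come
    from an LDA model trained on related written material.
    """
    return cosine(dict(enumerate(slide_topics)),
                  dict(enumerate(transcript_topics)))


def combined_similarity(slide_text, transcript_text,
                        slide_topics, transcript_topics, weight=0.5):
    """Linear interpolation of the two scores (an assumed scheme)."""
    literal = literal_similarity(slide_text, transcript_text)
    topical = topic_similarity(slide_topics, transcript_topics)
    return weight * literal + (1.0 - weight) * topical
```

With this scheme, a slide and a transcript segment that share no words
can still receive a nonzero score when their topic distributions are
close, which is the effect the hidden-topic similarity is meant to
capture.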