Title: Automatically Segmenting Oral History Transcripts

Abstract: Dividing oral histories into topically coherent segments can make them more
accessible online. People regularly make judgments about where coherent
segments can be extracted from oral histories. But making these judgments can
be taxing, so automated assistance could help speed up the task of extracting
segments from open-ended interviews. When different people are
asked to extract coherent segments from the same oral histories, they often do
not agree about precisely where such segments begin and end. This low agreement
makes the evaluation of algorithmic segmenters challenging, but there is reason
to believe that for segmenting oral history transcripts, some approaches are
more promising than others. The BayesSeg algorithm performs slightly better
than TextTiling, while TextTiling does not perform significantly better than a
uniform segmentation. BayesSeg might be used to suggest boundaries to someone
segmenting oral histories, but this segmentation task needs to be better
defined.
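The abstract compares segmenters against a uniform baseline but does not say how segmentations are scored. The following is a minimal sketch under stated assumptions, not the paper's evaluation code. It builds a uniform segmentation, which splits a transcript into equally sized segments, and scores it with the Pk metric, a standard window-based segmentation error measure. The use of Pk, the representation of a segmentation as one segment label per unit (for example, per sentence), and the function names `uniform_segmentation` and `pk` are all hypothetical choices made for illustration. BayesSeg and TextTiling themselves are not implemented here.

```python
from typing import List, Optional


def uniform_segmentation(num_units: int, num_segments: int) -> List[int]:
    """Hypothetical helper: assign each of num_units units (e.g. sentences)
    a segment label so that the segments are as equal in size as possible."""
    if num_segments < 1 or num_units < num_segments:
        raise ValueError("need 1 <= num_segments <= num_units")
    return [i * num_segments // num_units for i in range(num_units)]


def count_segments(labels: List[int]) -> int:
    """Count segments as 1 + the number of label changes between neighbours."""
    return 1 + sum(a != b for a, b in zip(labels, labels[1:]))


def pk(reference: List[int], hypothesis: List[int], k: Optional[int] = None) -> float:
    """Pk error (assumed metric): slide a window of width k over the units and
    count positions where reference and hypothesis disagree about whether the
    two window ends fall in the same segment. 0.0 means perfect agreement."""
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same units")
    n = len(reference)
    if k is None:
        # Conventional default: half the mean reference segment length.
        k = max(1, round(n / count_segments(reference) / 2))
    if k >= n:
        raise ValueError("window k must be smaller than the number of units")
    errors = 0
    for i in range(n - k):
        same_ref = reference[i] == reference[i + k]
        same_hyp = hypothesis[i] == hypothesis[i + k]
        errors += same_ref != same_hyp
    return errors / (n - k)


if __name__ == "__main__":
    # Hypothetical reference: 12 sentences in segments of 4, 6 and 2.
    reference = [0] * 4 + [1] * 6 + [2] * 2
    baseline = uniform_segmentation(len(reference), count_segments(reference))
    print("uniform baseline Pk:", pk(reference, baseline))
```

Giving the uniform baseline the same number of segments as the reference mirrors a common evaluation setup. Because annotators often disagree about exact boundary positions, a window-based score such as Pk penalizes near misses less harshly than exact boundary matching.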