EUROSPEECH '97
5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Semi-Automatic Phonetic Labelling of Large Corpora

O. Mella, D. Fohr

This paper presents a methodology for semi-automatically labelling large
corpora. The methodology rests on three main points: using several
concurrent automatic stochastic labellers, decomposing the labelling of
the whole corpus into an iterative refinement process, and building a
labelling comparison procedure that takes into account phonological and
acoustic-phonetic rules to evaluate the similarity of the various
labellings of one sentence. After detailing these three points, we
describe our HMM-based labelling tool and the application of the
methodology to the Swiss French POLYPHON database.
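
To make the idea of comparing concurrent labellings concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes each labeller outputs a list of (phone, start, end) segments in seconds. The equivalence table, the 20 ms boundary tolerance, and the function names are all hypothetical choices made for illustration; they stand in for the phonological and acoustic-phonetic rules mentioned above.

```python
from itertools import combinations

# Hypothetical equivalence table: phones treated as interchangeable when
# comparing labellings. Illustrative only, not taken from the paper.
EQUIVALENT = {"e": "E", "o": "O"}


def normalise(phone):
    """Map a phone to its (assumed) equivalence-class representative."""
    return EQUIVALENT.get(phone, phone)


def similar(lab_a, lab_b, tol=0.020):
    """Return True if two labellings of one sentence agree.

    Agreement here means: same number of segments, equivalent phones,
    and start/end boundaries within `tol` seconds (an assumed threshold).
    """
    if len(lab_a) != len(lab_b):
        return False
    for (pa, sa, ea), (pb, sb, eb) in zip(lab_a, lab_b):
        if normalise(pa) != normalise(pb):
            return False
        if abs(sa - sb) > tol or abs(ea - eb) > tol:
            return False
    return True


def accept(labellings, tol=0.020):
    """Accept a sentence only if every pair of labellings agrees."""
    return all(similar(a, b, tol) for a, b in combinations(labellings, 2))


# Usage: three hypothetical labellers on the same sentence.
lab_1 = [("b", 0.00, 0.08), ("o", 0.08, 0.20)]
lab_2 = [("b", 0.00, 0.09), ("O", 0.09, 0.21)]
lab_3 = [("b", 0.00, 0.08), ("a", 0.08, 0.20)]

agree_two = accept([lab_1, lab_2])
agree_three = accept([lab_1, lab_2, lab_3])
```

Under these assumptions, a sentence on which all labellers agree could be kept as labelled, while a sentence with disagreement could be passed to a later iteration or to manual checking; how the paper actually routes such sentences is not stated in this abstract.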