In this paper, we propose semi-supervised learning of acoustically
driven phrase breaks and examine its usefulness for text-to-speech
systems. We first derive a set of initial hypotheses of
phrase breaks in a speech signal using pauses as an acoustic cue.
As these initial estimates are obtained based on knowledge of
speech production and speech signal processing, one could treat
the hypothesized phrase break regions as labeled data. Features
such as duration, F0, and energy are extracted from these labeled
regions, and a machine learning model is trained to classify these
acoustic features as either a phrase break or not a phrase break.
We then attempt to bootstrap
the machine learning model using unlabeled data (i.e., the
rest of the data).
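
The bootstrapping step described above can be illustrated with a minimal self-training sketch. This is an assumption-laden illustration, not the method used in this work: the nearest-centroid classifier, the confidence threshold of 0.9, the function names (fit_centroids, predict_proba, self_train), and the three-column feature layout (duration, F0, energy) are all hypothetical choices made only to keep the example self-contained. It also assumes the initial labeled set contains both break and non-break examples.

```python
import numpy as np

def fit_centroids(X, y):
    """Return the mean feature vector of each class (0 = no break, 1 = break)."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict_proba(centroids, X):
    """Soft class scores derived from Euclidean distance to each centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    w = np.exp(-d)
    return w / w.sum(axis=1, keepdims=True)

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Grow the labeled set with confidently classified unlabeled points.

    threshold and max_rounds are illustrative values, not taken from the paper.
    """
    X_lab, y_lab, pool = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(max_rounds):
        if len(pool) == 0:
            break
        centroids = fit_centroids(X_lab, y_lab)
        proba = predict_proba(centroids, pool)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # nothing left that the model is sure about
        # Move confident points into the labeled set with their predicted class.
        X_lab = np.vstack([X_lab, pool[confident]])
        y_lab = np.concatenate([y_lab, proba[confident].argmax(axis=1)])
        pool = pool[~confident]
    return fit_centroids(X_lab, y_lab), X_lab, y_lab

# Toy data: columns stand in for (duration, F0, energy) after normalization.
X_lab = np.array([[0.0, 0.0, 0.0], [0.2, 0.1, 0.0],
                  [5.0, 5.0, 5.0], [5.2, 4.9, 5.1]])
y_lab = np.array([0, 0, 1, 1])  # initial labels, e.g. from pause-based hypotheses
X_unlab = np.array([[0.1, 0.0, 0.1], [5.1, 5.0, 5.0], [2.5, 2.5, 2.5]])

centroids, X_new, y_new = self_train(X_lab, y_lab, X_unlab)
```

In this toy run, the two unlabeled points that lie close to a class centroid are absorbed into the labeled set, while the ambiguous midpoint stays unlabeled because its score falls below the threshold. Any probabilistic classifier could replace the centroid model in the same loop.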