On the Global F0 Shape Model using a Transition Network for Japanese Text-to-Speech Systems

Yasushi Ishikawa, Takashi Ebihara

In this paper, we describe a model of fundamental frequency (F0) control.
In general, a two-stage model consisting of a global model and a
local model is used for F0 control in Japanese text-to-speech
systems. As the global model, we propose a transition network that
generates the parameters of a local pitch model from the linguistic
parameters of a sentence. In the proposed model, syntactic
analysis and generation of F0 parameters are integrated, and the nodes
of the network represent the linguistic and prosodic state of a sentence.
The parameters of the local model are generated each time a transition is taken.
We also propose a training method for the network. The prediction results
showed that our model can predict the phrasal accent parameters with
satisfactorily high accuracy. We also show that the model can be applied
to the prediction of pause positions.
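
The idea of emitting local-model parameters on each transition can be illustrated with a minimal sketch. It is not the authors' implementation: the state names, the input labels, the single `accent_param` value per arc, and all numeric values are hypothetical and chosen only to show the mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """One transition of the (hypothetical) network."""
    next_state: str
    accent_param: float  # assumed local-model parameter emitted on this arc


# Hypothetical network: (current state, linguistic label) -> Arc.
# States stand for the combined linguistic and prosodic state of the sentence.
NETWORK = {
    ("START", "noun_phrase"): Arc("AFTER_NP", 1.0),
    ("AFTER_NP", "noun_phrase"): Arc("AFTER_NP", 0.7),
    ("AFTER_NP", "verb_phrase"): Arc("END", 0.5),
}


def generate_parameters(labels, network=NETWORK, start="START"):
    """Walk the network over the input labels and collect one parameter
    per transition taken. Returns the final state and the parameter list."""
    state = start
    params = []
    for label in labels:
        arc = network.get((state, label))
        if arc is None:
            raise ValueError(f"no transition from {state!r} on {label!r}")
        params.append(arc.accent_param)
        state = arc.next_state
    return state, params


if __name__ == "__main__":
    final_state, params = generate_parameters(
        ["noun_phrase", "noun_phrase", "verb_phrase"]
    )
    print(final_state, params)
```

In this sketch the same input label yields a different parameter depending on the current state, which is how a network of this kind can tie the generated values to sentence-level context.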