The effect of normalization -- a case study in speech synthesis

If you have a question about this talk, please contact Shakir Mohamed.

Undirected graphical models are ubiquitous in application domains of machine learning. However, the normalization constants in these models are often difficult to compute and, as a result, are frequently dropped altogether. In this talk we’ll look at the qualitative effect of this lack of normalization in the domain of statistical speech synthesis.

Specifically, we’ll compare the predictive distributions of the standard unnormalized speech synthesis model, its globally normalized undirected counterpart, and a more tractable directed graphical model. Along the way, we’ll highlight some of the general issues surrounding the choice between undirected and directed graphical models for sequence data.

The introduction to speech synthesis I will give is aimed entirely at machine learners with no background in modelling speech. It will, I hope, be realistic (close to the state of the art), self-contained, and framed in terms of probabilistic modelling.