Analysis of Duration: Are Emphases Achieved Differently in Different Languages in Terms of Duration?

To achieve a more expressive, personalised speech-to-speech translator, considerable paralinguistic information can be extracted from natural input speech and transferred to synthesised output speech, in particular the user’s focus as carried by emphasised words in the input speech. To examine how differently emphasis is realised across languages in terms of duration, this technical report presents an analysis of sentence-level and word-level durations extracted from parallel English and German speech data. The analysis found that it was reasonable to consider the durations of neutral parts of the utterances unaffected by nearby emphasised words. The realisation of emphasis in terms of word-level duration was speaker-dependent to some extent and language-independent for most speakers. Although the duration of an emphasised word in a source language could not give a definite hint about the duration of its spoken translation in a target language, the durations of emphasised words in the source language, taken collectively, could still provide a clear hint as to a speaker’s own style of emphasising words in the target language. The research work presented in this technical report was funded by the SIWIS project (SNSF Grant CRSII2-141903).