The animatronic is the doppelgänger of the German author Thomas Melle, and it (or can a robot be addressed as "he"?) "acts" on stage for one hour, speaking about himself and other meta-related topics.

An interesting part of the programming was automating the lip sync. We wanted to take Thomas Melle's voice, feed it into a piece of software as input, and get the positions of the animatronic's mouth motors as output. To do that, we first needed to learn some theory.

The sounds of a spoken sentence and the mouth positions don't have a one-to-one correspondence. The sounds we hear are described by phonemes, and the mouth shapes we see are visemes. For example, the sentences "You have salad" and "You have talent" look the same when lip-read. The YouTube channel BadLipReading is a good example of this; it uses this property of language as the technique behind its videos.

Then we need rules to map the phonemes to the visemes. The simplest method is to create groups of phonemes mapped to their respective visemes, ignoring coarticulation rules.
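As a minimal sketch of this idea, a grouping can be a plain lookup table. The phoneme symbols and viseme names below are illustrative assumptions, not the actual groups we used:

```python
# Hypothetical phoneme groups, each collapsing to one viseme.
PHONEME_TO_VISEME = {
    # bilabials all close the mouth
    "P": "closed", "B": "closed", "M": "closed",
    # rounded vowels
    "UW": "round", "OW": "round",
    # open vowels
    "AA": "open", "AE": "open",
    # lip-teeth consonants
    "F": "teeth_lip", "V": "teeth_lip",
}

def phonemes_to_visemes(phonemes):
    """Map a phoneme sequence to visemes, falling back to a rest shape."""
    return [PHONEME_TO_VISEME.get(p, "rest") for p in phonemes]

print(phonemes_to_visemes(["B", "AA", "F"]))  # ['closed', 'open', 'teeth_lip']
```

Because many phonemes share one viseme, this table is also why "salad" and "talent" collapse to the same lip movements.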

With these groups we already get pretty believable animations. Other phoneme groups, like the one the Hanna-Barbera studios invented, are described in the repository of rhubarb-lip-sync, a really nice tool created by Daniel Wolf. This tool can generate the visemes directly from an audio file; internally it uses CMU Sphinx to align the phonemes in time.
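rhubarb-lip-sync can emit its result as JSON: a list of timed mouth cues, where each cue value is one of the Hanna-Barbera-style mouth shapes (A-H, plus X for rest). A small sketch of consuming that output, with made-up timing values:

```python
import json

# Shape of rhubarb-lip-sync's JSON output ("mouthCues"); the timings and
# values here are invented for illustration.
sample = '''
{
  "mouthCues": [
    {"start": 0.00, "end": 0.15, "value": "X"},
    {"start": 0.15, "end": 0.30, "value": "B"},
    {"start": 0.30, "end": 0.52, "value": "E"}
  ]
}
'''

def viseme_at(cues, t):
    """Return the mouth shape active at time t (seconds), or 'X' (rest)."""
    for cue in cues:
        if cue["start"] <= t < cue["end"]:
            return cue["value"]
    return "X"

cues = json.loads(sample)["mouthCues"]
print(viseme_at(cues, 0.2))  # B
```

Sampling this function at the animation frame rate gives one viseme per frame, which can then drive the mouth motors.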

If we want more realistic animation, we can use coarticulation rules. A viseme can be influenced by the adjacent ones, so we need to analyse the viseme sequence and adjust some of them accordingly. For example, the T of "this" looks more closed than the T of "that", because the T in "this" is "preparing" itself for the I sound, while in "that" it is "preparing" itself for the A: sound.
We didn't want to write complex coarticulation rules, so we used a blending method: we sequentially blend adjacent visemes together by some percentage. The mouth shape for the T blended with the I is narrower than the T blended with the A:.
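The blending idea can be sketched as linear interpolation between mouth shapes. The motor axes, values, and the 30% blend ratio below are all hypothetical, just to show the mechanism:

```python
# Hypothetical visemes as normalised motor positions (0..1).
VISEMES = {
    "T":  {"jaw": 0.2, "lips_width": 0.6},
    "I":  {"jaw": 0.1, "lips_width": 0.8},
    "A:": {"jaw": 0.9, "lips_width": 0.5},
}

def blend(a, b, ratio):
    """Linearly interpolate two mouth shapes: ratio=0 -> a, ratio=1 -> b."""
    return {axis: a[axis] * (1 - ratio) + b[axis] * ratio for axis in a}

def coarticulate(sequence, ratio=0.3):
    """Blend each viseme with its successor to approximate coarticulation."""
    shapes = [VISEMES[v] for v in sequence]
    # The last viseme has no successor, so it blends with itself.
    return [blend(cur, nxt, ratio)
            for cur, nxt in zip(shapes, shapes[1:] + [shapes[-1]])]

# The T before I keeps the jaw more closed than the T before A:
t_before_i = coarticulate(["T", "I"])[0]
t_before_a = coarticulate(["T", "A:"])[0]
print(t_before_i["jaw"], t_before_a["jaw"])
```

The nice property of this approach is that a single blend ratio replaces a whole rule set: each viseme automatically "prepares" for the next one.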

This method is still a work in progress. I will try different and more specific coarticulation rules, and also try to use features like timbre, volume, and pitch to tweak different mouth shape strategies. A paper that describes how to add some dynamics to the mouth shapes: http://www.dgp.toronto.edu/~elf/JALISIG16.pdf.