The first DiVA was built with GRASSP (Gesturally Realized Audio Speech and Song Performance), a system that includes a Holmes parallel formant speech synthesizer. This first DiVA will refine the Holmes formant synthesizer based on work by Fels and Hinton, adding new types of sound sources and adaptive adjustment of various synthesis parameters.
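As a rough illustration of how a parallel formant synthesizer of the Holmes type produces sound, the sketch below feeds a periodic glottal source through several second-order resonators in parallel, one per formant, and sums the branch outputs. All function names, formant targets, and filter details here are illustrative assumptions, not the project's actual implementation.

```python
import math

def formant_resonator(signal, freq, bandwidth, sr):
    """Second-order IIR resonator (two-pole filter) tuned to one formant."""
    r = math.exp(-math.pi * bandwidth / sr)        # pole radius from bandwidth
    theta = 2.0 * math.pi * freq / sr              # pole angle from centre frequency
    a1, a2 = -2.0 * r * math.cos(theta), r * r     # feedback coefficients
    gain = 1.0 - r                                 # rough amplitude normalization
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = gain * x - a1 * y1 - a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

def synthesize_vowel(formants, f0=110.0, sr=16000, dur=0.3):
    """Drive parallel formant resonators with an impulse-train glottal source."""
    n = int(sr * dur)
    period = int(sr / f0)
    source = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    branches = [formant_resonator(source, f, bw, sr) for f, bw in formants]
    # Parallel structure: branch outputs are summed sample by sample.
    return [sum(b[i] for b in branches) for i in range(n)]

# Illustrative formant targets for an /a/-like vowel: (frequency, bandwidth) in Hz.
samples = synthesize_vowel([(700, 80), (1200, 90), (2600, 120)])
```

The parallel topology is what makes per-formant amplitude control straightforward, which is one reason it suits real-time gestural control.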

The second of the three speech synthesis methods in this project uses a two-dimensional acoustic tube model, in which facial movements and configurations are coordinated directly with sound production. Data mapping from the hardware controllers will allow the DiVA performers to use the same gestures learned for the formant synthesis method to control the faces and create similar speech with the acoustic tube model. This is the first step towards a complete articulatory speech synthesis model.
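The project's tube model is two-dimensional; as a simplified sketch of the underlying idea, the classic one-dimensional Kelly-Lochbaum ladder below treats the vocal tract as a chain of cylindrical sections, with reflection coefficients derived from the area ratios of adjacent sections. The section areas, end-reflection values, and function interface are illustrative assumptions, not the project's model.

```python
def kelly_lochbaum(areas, excitation):
    """1D Kelly-Lochbaum ladder: forward and backward travelling waves
    scatter at each junction between adjacent cylindrical tube sections."""
    # Reflection coefficient at each junction, from the area ratio.
    k = [(a1 - a2) / (a1 + a2) for a1, a2 in zip(areas, areas[1:])]
    n = len(areas)
    fwd = [0.0] * n   # right-going (towards the lips) wave components
    bwd = [0.0] * n   # left-going (towards the glottis) wave components
    out = []
    for x in excitation:
        out.append(fwd[n - 1])                  # radiated output at the lip end
        fwd_new = [0.0] * n
        bwd_new = [0.0] * n
        fwd_new[0] = x + 0.9 * bwd[0]           # glottal end: partial reflection
        for j in range(n - 1):
            # Lossless scattering at junction j between sections j and j+1.
            fwd_new[j + 1] = (1 + k[j]) * fwd[j] - k[j] * bwd[j + 1]
            bwd_new[j] = k[j] * fwd[j] + (1 - k[j]) * bwd[j + 1]
        bwd_new[n - 1] = -0.9 * fwd[n - 1]      # lip end: inverted reflection
        fwd, bwd = fwd_new, bwd_new
    return out

# Illustrative area function tapering from glottis to lips, driven by an impulse.
out = kelly_lochbaum([2.6, 2.6, 2.0, 1.6, 1.3, 1.0, 0.7, 0.5],
                     [1.0] + [0.0] * 49)
```

Because the tract shape enters only through the area function, a gesture that deforms the tract geometry maps naturally onto the same controller data used for formant targets.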

The final speech synthesis method involves 3D articulatory synthesis. Three-dimensional models of the vocal tract and of the facial muscles (including the lips) will be combined with acoustic tube models to produce vowels and consonants. Again, the learned control gestures will map onto the same speech results, allowing the DiVA performers to move easily between the three speech synthesis modes.
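The shared-gesture idea that ties the three modes together can be sketched as one gesture record driving two different parameter sets. Everything below is hypothetical: the feature names, ranges, and formulas are invented for illustration and do not describe the project's actual controller mapping.

```python
def gesture_to_formants(g):
    """Map a gesture to (F1, F2) targets for the formant synthesizer:
    a more open hand raises F1, a more fronted hand raises F2.
    The numeric ranges are illustrative, loosely vowel-like in Hz."""
    f1 = 250.0 + 550.0 * g["openness"]
    f2 = 800.0 + 1400.0 * g["frontness"]
    return f1, f2

def gesture_to_areas(g, sections=8):
    """Map the same gesture to an area function for a tube model:
    openness widens the tract overall, frontness places a constriction."""
    areas = []
    for i in range(sections):
        pos = i / (sections - 1)                  # 0 = glottis, 1 = lips
        dist = abs(pos - g["frontness"])          # distance from constriction
        areas.append(0.5 + 3.0 * g["openness"] * min(1.0, dist + 0.3))
    return areas

# One hypothetical glove gesture drives both back ends.
g = {"openness": 0.8, "frontness": 0.2}
f1, f2 = gesture_to_formants(g)
areas = gesture_to_areas(g)
```

Keeping the gesture vocabulary fixed while swapping the mapping target is what lets a performer carry learned skills from one synthesis mode to the next.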