Jeju Island, Korea
October 4-8, 2004

Audiovisual Perceptual Evaluation of Resynthesised Speech Movements

Matthias Odisio, Gérard Bailly

INPG, France

We have previously presented a system that can track the 3D speech
movements of a speaker's face in a monocular video sequence. For
that purpose, speaker-specific models of the face have been built,
including a 3D shape model and several appearance models. In this
paper, speech movements estimated using this system are
perceptually evaluated. These movements are re-synthesised using a
Point-Light (PL) rendering. They are paired with the original audio
signals degraded with white noise at several SNRs. We study to what
extent such PL movements enhance the identification of logatoms,
and how they influence the perception of incongruent audio-visual
logatoms. In a first experiment, the PL rendering is evaluated per
se. The results are consistent with previous studies: though less
effective than actual video, PL speech enhances intelligibility and
can reproduce the McGurk effect. In the second experiment, the
movements were estimated with our tracking framework using various
appearance models. No salient differences emerge between the
performances of these appearance models.
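A Point-Light display reduces the face to a set of bright dots placed
at the positions of tracked flesh points. As a minimal sketch of how
one such frame could be produced (an illustrative reconstruction, not
the renderer used in the paper; the pinhole camera model and all
parameter values below are assumptions):

    import numpy as np
    import matplotlib.pyplot as plt

    def render_pl_frame(points_3d, focal=800.0, cam_z=2000.0):
        """Project 3D flesh points with a pinhole camera and draw them
        as white dots on a black background (one point-light frame)."""
        pts = np.asarray(points_3d, dtype=float)
        z = pts[:, 2] + cam_z            # assumed camera offset along z
        u = focal * pts[:, 0] / z        # perspective projection
        v = focal * pts[:, 1] / z
        fig, ax = plt.subplots(facecolor="black")
        ax.set_facecolor("black")
        ax.scatter(u, v, s=20, c="white")
        ax.set_aspect("equal")
        ax.axis("off")
        return fig

    # Example: 60 random points standing in for tracked facial markers.
    fig = render_pl_frame(np.random.randn(60, 3) * 50)
    fig.savefig("pl_frame.png", dpi=100, facecolor="black")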
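The audio degradation step can be sketched in the same spirit. The
snippet below adds white Gaussian noise scaled to reach a requested
SNR; the SNR values in the example loop are placeholders, since the
paper states only that several levels were used:

    import numpy as np

    def degrade_with_white_noise(signal, snr_db, rng=None):
        """Add white Gaussian noise to `signal` so that the result has
        the requested signal-to-noise ratio (in dB)."""
        rng = np.random.default_rng() if rng is None else rng
        signal = np.asarray(signal, dtype=float)
        signal_power = np.mean(signal ** 2)
        noise_power = signal_power / (10.0 ** (snr_db / 10.0))
        noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
        return signal + noise

    # Example: degrade a 1 s, 16 kHz dummy signal at several SNRs.
    sr = 16000
    t = np.arange(sr) / sr
    clean = np.sin(2 * np.pi * 440 * t)  # stand-in for a recorded logatom
    for snr in (-12, -6, 0, 6):          # hypothetical SNR levels
        noisy = degrade_with_white_noise(clean, snr)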