MMASCS – Multi-Modal Annotated Synchronous Corpus of Speech

MMASCS is a multi-modal annotated synchronous corpus of audio, video, facial motion and tongue motion data for normal, fast and slow speech, collected for research purposes. It consists of a total of 770 sentences spoken by a male Austrian German speaker, recorded at slow, normal and fast speaking rates, in the following modalities:

A combined video showing a frontal-view rendering of the marker data, a side-view rendering of the marker data, the grayscale video, the color video, and a plot of the waveform with a time indicator (H.264/AAC MPEG-4, 1200x900 pixels, 100 fps)

A text file listing the phones spoken in the utterance including begin and end times of all phones

An HTK quin-phone full-context label file
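
As a rough illustration of how the label files might be read, the following Python sketch parses one line of an HTK label file and maps its times onto frames of the 100 fps video. It relies on the standard HTK convention that label times are integers in units of 100 ns. The quin-phone layout `ll^l-c+r=rr` is an assumption borrowed from HTS-style full-context labels and has not been checked against the files in this package. The names `parse_label_line`, `centre_phone` and `frame_index` are hypothetical helpers, not part of the supplied software.

```python
import re
from typing import NamedTuple, Optional

# HTK label files store times as integers in units of 100 ns.
HTK_UNITS_PER_SECOND = 10_000_000


class Segment(NamedTuple):
    start: float  # segment start in seconds
    end: float    # segment end in seconds
    label: str    # full-context label string, unparsed


# ASSUMPTION: HTS-style quin-phone prefix "ll^l-c+r=rr", optionally followed
# by further context fields introduced by "@" or "/". The actual layout used
# in the corpus may differ.
QUINPHONE = re.compile(
    r"^(?P<ll>[^\^]+)\^(?P<l>[^-]+)-(?P<c>[^+]+)\+(?P<r>[^=]+)=(?P<rr>[^@/]+)"
)


def parse_label_line(line: str) -> Segment:
    """Split a 'start end label' line and convert the times to seconds."""
    start, end, label = line.split(None, 2)
    return Segment(
        int(start) / HTK_UNITS_PER_SECOND,
        int(end) / HTK_UNITS_PER_SECOND,
        label.strip(),
    )


def centre_phone(label: str) -> Optional[str]:
    """Return the centre phone of a quin-phone label, or None if no match."""
    match = QUINPHONE.match(label)
    return match.group("c") if match else None


def frame_index(seconds: float, fps: int = 100) -> int:
    """Map a time in seconds to the nearest video frame at the given rate."""
    return int(round(seconds * fps))


if __name__ == "__main__":
    # Made-up example line, not taken from the corpus.
    seg = parse_label_line("0 2500000 x^sil-g+u=t@1_2/A:0")
    print(seg.start, seg.end, centre_phone(seg.label), frame_index(seg.end))
```

Because the video runs at 100 fps, one frame spans 10 ms, so a segment boundary at 0.25 s falls on frame 25.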

Furthermore, the package contains:

The 320 sentences in plain text

Software that plays back the pickle files mentioned above (Python, OpenGL).
This video shows an example: