AMI Meeting Corpus

The AMI Meeting Corpus contains 100 hours of meetings captured using many synchronized recording devices, and is designed to support work in speech and video processing, language engineering, corpus linguistics, and organizational psychology. It has been transcribed orthographically, with annotated subsets for everything from named entities, dialogue acts, and summaries to simple gaze and head movement.

Two-thirds of the corpus consists of recordings in which groups of four people played different roles in a fictional design team that was specifying a new kind of remote control. Controlling the data in this way gives us better measures of how well the groups are doing, and lets us compare against new data in which groups use our technologies, so that we can show whether they help. However, it also limits the range of things people talk about.

The remaining third of the corpus contains recordings of other types of meetings.

The AMI Meeting Corpus is available for free download, after registering, under a Creative Commons license. For more information, visit the corpus website.