Corpus: SIARAD

The Siarad corpus consists of 69 recordings and transcripts of conversations from 151 speakers, totalling 40 hours and containing 460,000 word tokens. The conversations were collected over the period 2005-07 and transcribed over the period 2006-08. A detailed documentation file is available in PDF format, and a spreadsheet contains the output from the questionnaires.

Information about the participants in the conversations is given below. Click on the filename to examine the conversation in more detail.