LECTRA (LECture TRAnscriptions in European Portuguese)


ID: ELRA-S0366

The LECTRA corpus (classroom LECture TRAnscriptions in European Portuguese) consists of audio recordings and their manual transcriptions. It covers seven one-semester university courses. All lectures were taught at the Technical University of Lisbon (IST) and recorded in the presence of students, except IICT, which was recorded at another university, in a quiet office environment, for an Internet audience. The corpus contains a total of 28 hours of speech, manually transcribed by several trained annotators.

Two files are provided per lecture:
a) a RAW file containing the audio
b) a TRS file containing the manual transcriptions

The TRS format is an XML-based format that standard transcription software such as Transcriber can open. Annotations in the TRS files are at word level: they are fine-grained transcriptions that include disfluencies. The text files are encoded in ISO-8859-1 (Latin-1).
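
As an illustration of how the TRS files might be read programmatically, the following is a minimal Python sketch, not part of the corpus distribution. It assumes the files follow the usual Transcriber layout, in which the text of each Turn element is split into time-stamped segments by empty Sync elements; the function name read_trs_segments is hypothetical, and the actual files may contain further tags (for example for events or comments) that this sketch simply skips over. The character encoding is taken from the XML declaration of each file, so ISO-8859-1 text is decoded without extra handling.

```python
import xml.etree.ElementTree as ET


def read_trs_segments(source):
    """Return a list of (start_time, text) pairs from a TRS file.

    `source` is a file path or a binary file object. Assumes the usual
    Transcriber structure: <Turn> elements whose text is divided into
    segments by empty <Sync time="..."/> elements.
    """
    root = ET.parse(source).getroot()
    segments = []
    for turn in root.iter("Turn"):
        # Text before the first <Sync> (if any) starts at the turn start.
        current_time = float(turn.get("startTime", "0"))
        parts = [turn.text or ""]
        for child in turn:
            if child.tag == "Sync":
                # A new Sync closes the segment collected so far.
                text = " ".join("".join(parts).split())
                if text:
                    segments.append((current_time, text))
                current_time = float(child.get("time"))
                parts = []
            # Text following any child element belongs to the open segment.
            parts.append(child.tail or "")
        text = " ".join("".join(parts).split())
        if text:
            segments.append((current_time, text))
    return segments
```

Called with the path to one TRS file, the function returns the transcription as a list of time-stamped text segments, with whitespace normalised and the content of any inline markup elements ignored.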