PELCRA time-aligned spoken corpus of Polish (CC-BY-NC)

PELCRA-SP-2

ID: 510

A subset of the PELCRA corpus of conversational Polish, time-aligned at the utterance level and released under the CC-BY-NC license. The resource contains 386,744 words in 73 transcriptions covering over 43 hours of recordings made in 2008-2010. The texts are provided both as TEI P5-compliant XML files with custom PELCRA extensions and in the XLIFF format.
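As a rough illustration of working with the TEI P5 files, the sketch below counts whitespace-separated words inside utterance elements. It assumes the transcriptions use the standard TEI `<u>` element in the TEI namespace; the custom PELCRA extensions and the time-alignment markup are not described here and are not handled. The function name and the sample document are hypothetical and are not taken from the corpus.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; whether the PELCRA files declare it is an assumption.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def count_utterance_words(tei_xml: str) -> int:
    """Count whitespace-separated tokens inside all <u> (utterance) elements."""
    root = ET.fromstring(tei_xml)
    total = 0
    for utterance in root.iterfind(".//tei:u", TEI_NS):
        # itertext() also collects text nested in child elements of <u>.
        text = "".join(utterance.itertext())
        total += len(text.split())
    return total


# Made-up sample, not an excerpt from the corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <u who="#spk1">no dobrze</u>
    <u who="#spk2">to idziemy</u>
  </body></text>
</TEI>"""

if __name__ == "__main__":
    print(count_utterance_words(SAMPLE))
```

Counts obtained this way may differ from the published word total, since the tokenization used by the corpus authors is not specified in this record.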

Capturing device type details: The conversations were captured using a digital voice recorder.

Capturing device type: Microphone

Capturing environment: Complex

Capturing details: Whenever possible, the recordings were made without the speakers knowing that they were being recorded. Afterwards, all participants were asked for permission to use the recordings.