Transcriber (Review)

This is a review of Transcriber written by Thomas Schmidt for the working group "Transcription and annotation of primary data" at the E-MELD workshop 2006.

Transcriber is “a tool for segmenting, labeling and transcribing speech [...] assisting the manual annotation of speech signals” (quote from the website). It is open source software, and the development is ongoing. I tested version 1.5.2 of the software. The website offers a lot of documentation which answered more or less all of the questions that came up during testing.
Transcriber can handle a large number of sound file formats (wav, mp3, au and snd among them). Apparently, the Macintosh version is also able to handle video, but I was not able to test this (my Windows version did not support video).

Transcriber organizes a transcription into one or several sections. Each section consists of one or several speech turns, and each speech turn consists of one or several transcription lines. Background noise conditions can be transcribed independently of the section/turn/line organization of the transcript. All of these units can be timestamped, and navigation in the transcript is synchronized with navigation in (a waveform representation of) the recording.
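The section/turn/line organization described above can be pictured as a small nested data structure. The following is a minimal sketch in Python, not Transcriber's own implementation: all class and method names (`Transcript`, `Section`, `Turn`, `Line`, `Background`, `locate`) are hypothetical and chosen only for illustration. The `locate` method shows one way in which a position in the recording could be mapped back to the transcription line that is active at that moment.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Line:
    start: float  # timestamp in seconds at which the line begins
    text: str


@dataclass
class Turn:
    speaker: str
    start: float
    end: float
    lines: List[Line] = field(default_factory=list)


@dataclass
class Section:
    topic: str
    start: float
    end: float
    turns: List[Turn] = field(default_factory=list)


@dataclass
class Background:
    # Noise conditions live outside the section/turn/line hierarchy.
    time: float
    kind: str


@dataclass
class Transcript:
    sections: List[Section] = field(default_factory=list)
    background: List[Background] = field(default_factory=list)

    def locate(self, t: float) -> Optional[Tuple[Section, Turn, Line]]:
        """Return the (section, turn, line) active at time t, or None."""
        for section in self.sections:
            if not (section.start <= t < section.end):
                continue
            for turn in section.turns:
                if not (turn.start <= t < turn.end):
                    continue
                current = None
                for line in turn.lines:
                    # The last line that starts at or before t is the active one.
                    if line.start <= t:
                        current = line
                if current is not None:
                    return section, turn, current
        return None


transcript = Transcript(
    sections=[
        Section("interview", 0.0, 10.0, [
            Turn("A", 0.0, 4.0, [Line(0.0, "hello"), Line(2.0, "how are you")]),
            Turn("B", 4.0, 10.0, [Line(4.0, "fine, thanks")]),
        ])
    ],
    background=[Background(1.5, "traffic")],
)

hit = transcript.locate(3.0)
if hit is not None:
    section, turn, line = hit
    print(section.topic, turn.speaker, line.text)
```

Note that in this sketch each turn has exactly one speaker and turns do not overlap in time, which is the simplification that matters for the discussion of overlapping speech.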

Installation of Transcriber was no problem at all. After I had understood the section/turn/line distinction and the basic elements of the GUI, I was quickly able to use Transcriber efficiently for a transcription of a multi-party discourse. Navigation in the recording was smooth and error-free, and I encountered no bugs while using the software (although dialog boxes would sometimes hide a button that was needed to complete an action).

My general impression is that Transcriber will be most useful for a broad, speech-only transcription of relatively “well-behaved” spoken interaction (e.g. interviews, scripted monologue). For this, it may even be the most suitable tool on the market. It also offers some support for special tasks like the annotation of named entities, which I have not tested.
For users who are dealing with spontaneous multi-party conversation with many overlaps etc., the tool's data model is probably not quite as intuitive (it requires assigning multiple speakers to an overlapping stretch of speech instead of allowing single-speaker turns that overlap in time). For detailed multi-modal or other multi-level annotation of speech data, I suspect that Transcriber is not the right tool (again, because of its relatively simple data model, which makes it difficult to represent complex temporal relations between transcription entities).
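The difference between the two ways of representing overlap can be made concrete with a short sketch. It is an illustration only and does not reproduce Transcriber's code: the `SpeakerTurn` class and the `to_multi_speaker_segments` function are hypothetical names. The function takes single-speaker turns that may overlap in time and cuts them into consecutive, non-overlapping segments, each of which carries all the speakers active in it, which is the kind of multiple speaker assignment described above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpeakerTurn:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def to_multi_speaker_segments(
    turns: List[SpeakerTurn],
) -> List[Tuple[float, float, Tuple[str, ...]]]:
    """Cut the timeline at every turn boundary and list the speakers per segment."""
    bounds = sorted({t.start for t in turns} | {t.end for t in turns})
    segments = []
    for lo, hi in zip(bounds, bounds[1:]):
        # A turn is active in [lo, hi) if it starts before hi and ends after lo.
        active = tuple(sorted({t.speaker for t in turns if t.start < hi and t.end > lo}))
        if active:  # skip silent gaps between turns
            segments.append((lo, hi, active))
    return segments


# Two turns that overlap between seconds 4 and 5.
overlapping = [SpeakerTurn("A", 0.0, 5.0), SpeakerTurn("B", 4.0, 8.0)]
for segment in to_multi_speaker_segments(overlapping):
    print(segment)
```

In the single-speaker representation the overlap is implicit in the timestamps, whereas in the segmented representation one speaker's continuous turn is split across several units. This splitting is what tends to feel less intuitive for conversation with frequent overlaps.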

Transcriber “natively” uses an XML format, which I was able to understand without further documentation. It also offers the possibility to import from other formats (e.g. CHAT, TIMIT, UTF, two NIST formats and ESPS/XWaves) and to export into other formats (e.g. CHAT, HTML, STM). All of this suggests that interoperability is taken very seriously in the development of Transcriber. Transcriber also flawlessly supports Unicode: it is possible to switch to a UTF-8 encoding for the data, and I was able to enter Greek, IPA and Chinese characters and save and retrieve the transcription with these characters without any problems.
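Because the native format is XML, a transcription can in principle be read with any standard XML library. The sketch below uses Python's `xml.etree.ElementTree` on a small hand-written sample. The sample is an assumption, not a file produced by Transcriber: the element and attribute names (`Trans`, `Episode`, `Section`, `Turn`, `Sync`, `speaker`, `time`) are a simplified guess at the layout and should be checked against a real file or the DTD that ships with the tool. The `read_lines` helper is likewise hypothetical. The sample text includes Greek, IPA and Chinese characters to show that a UTF-8 encoded file survives a parse and re-serialization unchanged.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; element and attribute names are assumptions.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<Trans>
  <Episode>
    <Section type="report" startTime="0" endTime="6.0">
      <Turn speaker="spk1" startTime="0" endTime="6.0">
        <Sync time="0"/>
        καλημέρα
        <Sync time="2.5"/>
        [həˈləʊ] 你好
      </Turn>
    </Section>
  </Episode>
</Trans>
""".encode("utf-8")


def read_lines(xml_bytes):
    """Return (speaker, start_time, text) for every timestamped line."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for turn in root.iter("Turn"):
        for sync in turn.iter("Sync"):
            # The text of a line follows its empty Sync element, so it is
            # stored in the element's tail rather than in its text.
            text = (sync.tail or "").strip()
            rows.append((turn.get("speaker"), float(sync.get("time")), text))
    return rows


rows = read_lines(SAMPLE)
for row in rows:
    print(row)

# Round trip: serialize back to UTF-8 bytes and read the lines again.
round_tripped = ET.tostring(ET.fromstring(SAMPLE), encoding="utf-8")
print(read_lines(round_tripped) == rows)
```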