SWC Transcription

The annotation and transcription are performed on the 4-channel headset audio recordings
using the XTrans tool (figure above). The default output file is in ".tdf" format, which is
converted into ".mlf" and ".stm" formats for the release. Information about the strips used for
dataset definition is added to the ".stm" files. Details of each format are given below for
readers who are not familiar with them.

The .stm file starts with label information lines. This information is read by the NIST scoring
tool SCTK so that WER is analysed for each category and each label during scoring. Label
information lines start with ";;", while the main transcription lines do not.
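
As a minimal sketch of the distinction above, the following Python snippet separates the label
information lines from the main transcription lines using only the ";;" prefix. The function name
and the sample lines are hypothetical placeholders, not content from the release.

```python
def split_stm_lines(lines):
    """Separate label information lines (';;' prefix) from transcription lines."""
    label_info, transcription = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith(";;"):
            label_info.append(line)
        else:
            transcription.append(line)
    return label_info, transcription


# Hypothetical placeholder lines, only to exercise the prefix check.
sample = [
    ";; a label information line",
    "a main transcription line",
]
info, main = split_stm_lines(sample)
print(len(info), "label information line(s),", len(main), "transcription line(s)")
```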

For the main transcription, i.e. the lines without ";;", each line gives several pieces of
information for each annotated speech utterance in the following order:

It is worth emphasizing that the second column (microphone channel) differs between evaluating
ASR output based on individual headset microphone (IHM) recordings and evaluating ASR output
based on single or multiple distant microphones (SDM/MDM). For SDM and MDM, the value in
the second column should be the same for all utterances. The NIST scoring tool accepts many
values for this column as long as the string is the same across all utterances. However, the
default Kaldi setup includes a validation script that only accepts "A" or "B" as the value for
this column.
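
The constraint above can be checked before scoring. The sketch below assumes the columns of a
main transcription line are separated by whitespace, so that the second field is the microphone
channel. The function names and the sample lines are hypothetical; the allowed values "A" and "B"
come from the Kaldi restriction described above.

```python
def channel_values(transcription_lines):
    """Collect the distinct values found in the second column."""
    values = set()
    for line in transcription_lines:
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"line has fewer than two columns: {line!r}")
        values.add(fields[1])
    return values


def is_valid_for_distant_mics(transcription_lines, allowed=("A", "B")):
    """True if every utterance shares one channel value that is in `allowed`."""
    values = channel_values(transcription_lines)
    return len(values) == 1 and values <= set(allowed)


# Hypothetical placeholder lines: only the second field matters here.
print(is_valid_for_distant_mics(["rec A rest", "rec A rest"]))
print(is_valid_for_distant_mics(["rec A rest", "rec B rest"]))
```

Passing a different `allowed` tuple relaxes the check for tools, such as the NIST scorer, that
only require the string to be constant.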

The label column encloses one or more labels in "< >". This column indicates the recording
ID (swc1/swc2/swc3) and the strip ID (A/B/C), both of which are used to decide which dataset
an utterance belongs to. Below is a piece of transcription for SWC1 for IHM in .stm format.

Each segment name is followed by its word transcription, with one word per line.
Below is a piece of transcription for SWC1 in .mlf format.

#!MLF!#
"*/SWC1-00001_mn0001_000110_000180.lab"
YOU\'RE
NUMBER
TWO
.
"*/SWC1-00001_mn0003_000185_000274.lab"
I\'M
NUMBER
ONE
.
"*/SWC1-00001_mn0002_000279_000424.lab"
THAT\'S
OVER
THERE
.
"*/SWC1-00001_mn0001_000429_000536.lab"
DAN
IS
NUMBER
ONE
.
"*/SWC1-00001_mn0001_000827_000950.lab"
NOT
PAYING
ATTENTION
.
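
The layout shown above can be read with a few lines of Python. This is a sketch based only on what
the excerpt shows: a "#!MLF!#" header, quoted segment names ending in ".lab", one word per line,
and a single "." closing each segment. The function name is hypothetical, and unescaping "\'" to
an apostrophe is an assumption about how the words should be used downstream.

```python
def parse_mlf(lines):
    """Map each segment name to its list of words."""
    segments = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line == "#!MLF!#":
            continue  # skip blank lines and the file header
        if line.startswith('"') and line.endswith('"'):
            # Drop the quotes, the "*/" prefix and the ".lab" extension.
            name = line.strip('"').split("/")[-1]
            if name.endswith(".lab"):
                name = name[:-4]
            current = name
            segments[current] = []
        elif line == ".":
            current = None  # end of the current segment
        elif current is not None:
            segments[current].append(line.replace("\\'", "'"))
    return segments


# The first two segments of the excerpt above.
sample = r"""#!MLF!#
"*/SWC1-00001_mn0001_000110_000180.lab"
YOU\'RE
NUMBER
TWO
.
"*/SWC1-00001_mn0003_000185_000274.lab"
I\'M
NUMBER
ONE
.
""".splitlines()

for name, words in parse_mlf(sample).items():
    print(name, " ".join(words))
```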

The .tdf format is the default output format of XTrans. It looks similar to the .stm format,
but the field separator is a tab ("\t") rather than a space (" "). The .tdf files are included
in the release package so that users can edit and verify the transcription or annotation
alongside the audio (.wav files) if needed.
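
Because the fields are tab-separated, a field may itself contain spaces. The sketch below reads
tab-separated rows with Python's standard csv module. The function name and the sample row are
hypothetical and do not reflect the actual column layout of the released .tdf files.

```python
import csv
import io


def read_tdf_rows(stream):
    """Return the non-empty rows of a tab-separated stream as lists of fields."""
    # QUOTE_NONE keeps quotes and apostrophes in the text as literal characters.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]


# Hypothetical row: three tab-separated fields, the last one containing spaces.
sample = io.StringIO("field1\tfield2\ta field with spaces\n")
for row in read_tdf_rows(sample):
    print(len(row), row)
```

To read a real file, pass an open file handle instead, for example one created with
`open(path, newline="", encoding="utf-8")`, where the encoding is an assumption to check against
the release.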