SUMMARY OF THE SECOND PROSODIC TRANSCRIPTION WORKSHOP:
THE TOBI (TOnes and Break Indices) LABELING SYSTEM
(NYNEX Science & Technology, Inc., 5-6 April 1992)
This is a summary of the second prosodic transcription workshop. It is
intended to inform others of the activities and plans arising from the
workshop. The most important outcome of the workshop was a prosodic
labeling scheme. Below we briefly outline the motivations for the
workshop and the labeling scheme. Details will be
presented by Kim Silverman at the upcoming ICSLP meeting in Banff (2nd
morning session of Friday Oct 16 -- FR.sAM.2 -- room 2).
(attending: James Allen, Gayle Ayers, Mary Beckman, Lin L. Chase, Rene Collier,
Nancy Daly, Donna Erickson, Julia Hirschberg, Bob Ladd, Christine H. Nakatani,
Mari Ostendorf, John F. Pitrelli, Patti Price, Kim Silverman,
Stefanie Shattuck-Hufnagel, Liz Shriberg, Judith Spitz, David Talkin,
Jacques Terken, Nanette Veilleux, Colin Wightman)
A common notational system enables the sharing of corpora and other
data. Shared corpora not only provide the important scientific
benefit of promoting reproducibility and enabling comparative
evaluation, but also make far more data available than any single
site could collect. This is important when automatic training
techniques are used. It is also important for anyone who wishes to
observe naturally occurring speech (as opposed to laboratory speech)
while still controlling contextual variability. A
standard prosodic notation is critical to advances in prosody
research. Since prosody lies at the intersection of a variety of
disciplines (from speech signal processing through discourse analysis),
it is not surprising that a common notation suiting the needs of
these diverse groups has not yet been formed. However, with growing
interest in prosody in spoken language processing, and with growing
cross-disciplinary interaction, the time is right for forging consensus
on this issue. In this spirit, there have now been two workshops
aimed at reaching agreement on prosodic notation: in July 1991 Victor
Zue hosted a workshop at MIT, and in April 1992 Kim Silverman hosted a
second workshop at NYNEX.
The goal of the workshops was to produce a prosodic notation system to
meet the following criteria:
- Since no one system will suit all the needs of the diverse groups working
on prosody, the consensus transcription should form a common core to which
individual users may add further detail within the format of the system;
- Since the system will be used by different people at different sites and
times, it should be relatively easy to train people to use, and it should
yield good consistency within and across labelers;
- Since we are not yet prepared to meet the needs of prosodic transcription for
all languages, the system should focus on English; however, we
note that the two key aspects transcribed (word groupings and prominences) are
likely to be fairly universal.
TOBI consists of four transcription tiers:
1. an orthographic tier, for specifying the words in the utterance
using ordinary English orthography;
2. a tone tier, for specifying the tonal properties of the f0 contour
of the utterance (this tier has a shorthand notation that marks pitch
accents with an asterisk but does not label the tonal attributes);
3. a break-index tier, for specifying the degree of disjuncture
between words in the orthographic transcription; and
4. a miscellaneous tier, for additional TOBI notations and for
individual or local additions.
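The four-tier organization can be pictured as a simple data structure. The following is a minimal sketch, not part of the TOBI conventions themselves: the class and field names are hypothetical, and it assumes, as a simplification, one tone-tier label and one break index per orthographic word (in an actual TOBI transcription, tone labels are time-aligned events rather than per-word slots). The tone tier uses the shorthand in which an asterisk marks a pitch accent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToBITranscription:
    """Hypothetical container for the four TOBI tiers of one utterance."""
    words: List[str]   # orthographic tier: ordinary English orthography
    tones: List[str]   # tone tier (simplified: one label per word, "" = none)
    breaks: List[int]  # break-index tier: disjuncture after each word
    misc: List[str] = field(default_factory=list)  # miscellaneous tier

    def __post_init__(self) -> None:
        # Assumed simplification: the word, tone, and break tiers align one-to-one.
        if not (len(self.words) == len(self.tones) == len(self.breaks)):
            raise ValueError("words, tones, and breaks must align one-to-one")

    def accented_words(self) -> List[str]:
        # Shorthand convention: a pitch accent is marked with an asterisk.
        return [w for w, t in zip(self.words, self.tones) if "*" in t]


# Usage with hypothetical labels: accents on the first and last words,
# and a larger break index at the end of the utterance.
utt = ToBITranscription(
    words=["Marianna", "made", "the", "marmalade"],
    tones=["*", "", "", "*"],
    breaks=[1, 1, 1, 4],
)
print(utt.accented_words())
```

Keeping the tiers as parallel lists mirrors the idea of separate, independently labeled tiers over the same utterance, while the miscellaneous tier is left free-form for individual or local additions.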
The conventions also include diacritics for marking certain disfluencies.
Conventions are specified both for simple text-based transcription
using this system and for WAVES(tm) label files and formats to
accompany a speech file and associated time-aligned analysis records
for the utterance. Sample WAVES(tm) scripts are available.
As they are completed, we propose to make available the results of our
analyses of how easily the system can be learned and of consistency
within and across labelers, including its use for non-American
varieties of English.
We hope to place the description of the TOBI system and supporting
materials in a location from which sites could easily FTP these
materials. For those without FTP access, we hope to make available a
cassette tape with sample waveforms. The transcription conventions
will be revised in response to feedback from users and updated in the
common location. We will use the prosody mailing list and other
mechanisms for announcing availability of additional materials or
tools. In addition, we hope to obtain funding for organizing a
workshop for training people in the use of the proposed system. There
will be an ICSLP 1992 paper that will provide further details.