For each syllable in the database bearing a Tilt event label, a set of
40 features was extracted. The features include the number of
syllables, stressed syllables, and accented syllables preceding and
succeeding the syllable within the phrase; distance, in syllables,
from the previous and to the next event; the number of non-major
phrase breaks since the last major break; onset and rhyme length
[11, 8]; percent of the syllable which
is unvoiced; and position of the syllable within a word (e.g. initial,
final, medial). The features also include, with a two-syllable window
on either side, accentedness, lexical stress, onset and coda types
(cf. [11]), Tilt event type, and
syllable break values. These are specifically the features that are
available at F0 generation time during synthesis from raw text.
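
As an illustration of the windowed and positional features described
above, the following is a minimal sketch, not the extraction code used
for this work. The Syllable record, its field names, the feature key
names, and the PAD value for positions outside the phrase are all
assumptions made for the example; only stress and accent are shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Syllable:
    # Hypothetical record; the field names are illustrative only.
    stressed: bool
    accented: bool


PAD = "NONE"  # assumed filler when the window runs past the phrase edge


def window_features(phrase: List[Syllable], i: int, width: int = 2) -> Dict[str, object]:
    """Collect stress/accent flags for syllable i and its neighbours,
    plus counts of preceding and succeeding syllables in the phrase."""
    feats: Dict[str, object] = {}
    for offset in range(-width, width + 1):
        j = i + offset
        inside = 0 <= j < len(phrase)
        feats[f"stress[{offset:+d}]"] = phrase[j].stressed if inside else PAD
        feats[f"accent[{offset:+d}]"] = phrase[j].accented if inside else PAD
    feats["syls_before"] = i
    feats["syls_after"] = len(phrase) - i - 1
    feats["stressed_before"] = sum(s.stressed for s in phrase[:i])
    feats["stressed_after"] = sum(s.stressed for s in phrase[i + 1:])
    return feats
```

Keys such as "stress[-2]" and "accent[+1]" name the window position
relative to the current syllable, so every syllable yields a feature
vector of the same shape regardless of where it sits in the phrase.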

Once the features have been extracted, training sets are created on
the basis of event type (accent, boundary, connection, silence), and
individual models are built for each Tilt parameter (starting F0,
amplitude, duration, tilt, peak position).
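
The partitioning step can be sketched as follows. This is a minimal
illustration under stated assumptions: the dictionary layout of an
event ('type', 'features', 'params') and the parameter key names are
hypothetical. The sketch also skips any parameter an event does not
carry, on the assumption that not every event type defines all five.

```python
from collections import defaultdict

EVENT_TYPES = ("accent", "boundary", "connection", "silence")
# Key names below are assumed spellings of the five Tilt parameters.
TILT_PARAMS = ("start_f0", "amplitude", "duration", "tilt", "peak_position")


def build_training_sets(events):
    """Group (features, target) pairs by (event type, Tilt parameter).

    Each event is a dict with 'type', 'features' and 'params' keys
    (a hypothetical layout). Parameters missing from an event are skipped.
    """
    sets = defaultdict(list)
    for ev in events:
        for name in TILT_PARAMS:
            if name in ev["params"]:
                sets[(ev["type"], name)].append((ev["features"], ev["params"][name]))
    return dict(sets)
```

Each key of the returned dictionary identifies one training set, and
hence one model to be trained.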

A CART training algorithm [3] is used to develop a
decision tree for each parameter, using an optimised subset of the
features extracted. The twelve decision trees are used to generate an
intonation description file composed of Tilt events and their
parameters. The description files are processed to generate the final
F0 contours.
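
To make the tree-building step concrete, the following is a toy
CART-style regression tree. It is a sketch only, not the training
algorithm of [3]: it assumes numeric features, splits greedily to
minimise the summed squared error of the two children, and omits
categorical features, pruning, and the feature-subset optimisation
mentioned above. The function names and the min_split parameter are
introduced here for illustration.

```python
from statistics import mean


def sse(ys):
    """Sum of squared errors around the mean (the impurity measure)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(rows, ys):
    """Return the (feature index, threshold) that most reduces SSE, or None."""
    best, best_cost = None, sse(ys)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, ys) if r[f] <= t]
            right = [y for r, y in zip(rows, ys) if r[f] > t]
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best, best_cost = (f, t), cost
    return best


def grow(rows, ys, min_split=4):
    """Recursively grow a tree; nodes with fewer than min_split examples,
    or with no split that reduces SSE, become leaves holding the mean."""
    split = best_split(rows, ys) if len(ys) >= min_split else None
    if split is None:
        return mean(ys)
    f, t = split
    left = [(r, y) for r, y in zip(rows, ys) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, ys) if r[f] > t]
    return (f, t,
            grow([r for r, _ in left], [y for _, y in left], min_split),
            grow([r for r, _ in right], [y for _, y in right], min_split))


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's mean target."""
    while isinstance(tree, tuple):
        f, t, low, high = tree
        tree = low if row[f] <= t else high
    return tree
```

In this sketch, one such tree would be grown per training set, and at
synthesis time predict would supply the value of the corresponding
Tilt parameter from the syllable's feature vector.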