Modeling phonetic context is one of the keys to natural-sounding
concatenative speech synthesis. In this paper, a new
quantitative method to model context is proposed. In the proposed method,
the context is measured as the distance between leaves of the top-down
likelihood-based decision trees that are grown during the
construction of the acoustic inventory. Unlike other context modeling
methods, this method allows the unit selection algorithm to borrow unit
occurrences from other contexts when the distance between contexts is small.
This is done by incorporating the measured distance as an element in the
unit selection cost function. The motivation behind this method is that
it reduces the amount of speech modification required by using better unit
occurrences from nearby contexts. This method also makes it easy to use long
synthesis units, e.g. syllables or words, in the same unit selection
framework.
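
The idea of a leaf-to-leaf context distance entering the selection cost can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: leaves are identified by their root-to-leaf path in a binary tree (a string of "0"/"1" branch choices), the distance is taken to be the number of tree edges between two leaves, and the cost is a simple weighted sum. The names leaf_distance, Candidate, unit_cost, select_unit, and the weights are hypothetical.

```python
from dataclasses import dataclass


def leaf_distance(path_a: str, path_b: str) -> int:
    """Edges between two leaves, given their root-to-leaf branch paths.

    Assumption: distance is the path length through the lowest common
    ancestor. The paper may define the distance differently.
    """
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


@dataclass
class Candidate:
    name: str                 # label of the unit occurrence
    leaf_path: str            # leaf (context cluster) the occurrence belongs to
    modification_cost: float  # hypothetical cost of the signal modification needed


def unit_cost(target_leaf: str, cand: Candidate,
              w_context: float = 1.0, w_modification: float = 1.0) -> float:
    """Weighted sum of context distance and modification cost (illustrative)."""
    context = leaf_distance(target_leaf, cand.leaf_path)
    return w_context * context + w_modification * cand.modification_cost


def select_unit(target_leaf: str, candidates: list[Candidate],
                w_context: float = 1.0, w_modification: float = 1.0) -> Candidate:
    """Pick the candidate with the lowest combined cost."""
    return min(
        candidates,
        key=lambda c: unit_cost(target_leaf, c, w_context, w_modification),
    )


if __name__ == "__main__":
    candidates = [
        Candidate("same_context", "010", modification_cost=5.0),
        Candidate("near_context", "011", modification_cost=1.0),
        Candidate("far_context", "1", modification_cost=0.5),
    ]
    best = select_unit("010", candidates)
    print(best.name, unit_cost("010", best))
```

With equal weights, the occurrence from the neighboring leaf wins because its smaller modification cost outweighs its context distance; raising w_context pushes the selection back toward the exact context.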