Currently the only publicly distributed signal processing method in
Festival is residual excited LPC. To use this, you must extract LPC
parameters and LPC residual files for each file in the diphone
database. Ideally, the LPC analysis should be done
pitch-synchronously, which requires that pitch marks be created
before the LPC analysis takes place.

A script suitable for generating the LPC coefficients and residuals
is given in festvox/src/general/make_lpc.
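The details vary between releases, but the core of such a script,
built on the Edinburgh Speech Tools programs sig2fv and sigfilter,
looks roughly like the following sketch; the option values, the lpc/
and pm/ directory layout, and the loop itself are illustrative
assumptions rather than a verbatim copy of make_lpc.

for i in wav/*.wav
do
   fname=`basename $i .wav`
   # Pitch-synchronous LPC analysis, driven by the pitch marks in pm/
   $ESTDIR/bin/sig2fv wav/$fname.wav -o lpc/$fname.lpc \
       -otype est -coefs "lpc" -lpc_order 16 \
       -pm pm/$fname.pm -preemph 0.95 -factor 3 -window_type hamming
   # Inverse filter the wave with those coefficients to get the residual
   $ESTDIR/bin/sigfilter wav/$fname.wav -o lpc/$fname.res \
       -otype nist -lpcfilter lpc/$fname.lpc -inv_filter
done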

Note the (optional) use of ch_wave to attempt to normalize the
power in the wave to a percentage of its maximum. This is a very crude
method for making the waveforms have reasonably equivalent power.
Wild fluctuations in power between segments are likely
to be noticed when they are joined. Differing power in the nonsense
words may occur if not enough care has been taken in the recording.
Either the settings on the recording equipment were changed (bad)
or the speaker changed their vocal effort (worse). It is important
that this be avoided, as the above normalization does not make the
problem of differing power go away; it only makes the problem slightly
less bad.
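The crude normalization referred to above can be done with ch_wave's
-scaleN option, which scales the waveform so that its maximum
amplitude is the given fraction of the representable range; for
example (the 0.65 value and the file names are illustrative only):

$ESTDIR/bin/ch_wave -scaleN 0.65 wav/t0001.wav -o norm/t0001.wav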

A more elaborate power normalization has also been used successfully,
though it is a little harder; it definitely helped the KED US
American voice, which had major power fluctuations over different
recording sessions. The idea is to find the power during vowels in
each nonsense word, then find the mean power for each vowel over all
files. Then, for each file, find the average factor by which each
actual vowel differs from the mean for that vowel, and scale the
waveform according to that value. We now provide a basic script which
does this:

bin/find_powerfacts lab/*.lab

This script creates (among others) the file etc/powfacts which, if it
exists, is used to normalize the power of each waveform file during
the making of the LPC coefficients.
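The exact mechanism is in make_lpc itself; assuming etc/powfacts
holds simple fileid/factor pairs, one per line (an assumption about
its format), the lookup and rescaling might be sketched as:

if [ -f etc/powfacts ]
then
   factor=`awk '$1 == "'$fname'" {print $2}' etc/powfacts`
   $ESTDIR/bin/ch_wave -scale $factor wav/$fname.wav -o /tmp/norm$$.wav
fi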

We generate a set of ch_wave commands that extract the parts of
the wave that are vowels (using the -start and -end
options), make the output be ascii (-otype raw -ostype
ascii), and use a simple script to calculate the RMS power. We then
calculate the mean power for each vowel with another awk script, using
the result as a table. Finally, we process the fileid and actual vowel
power information to generate a power factor, by averaging the ratio
of each vowel's actual power to the mean power for that vowel. You may
still wish to modify the power further after this if it is too low or
high.
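For instance, the RMS power of a single vowel region might be
computed as follows, where the start and end times would come from
the label files and the awk one-liner is an illustrative stand-in for
the script used by bin/find_powerfacts:

$ESTDIR/bin/ch_wave wav/t0001.wav -start 0.385 -end 0.520 \
    -otype raw -ostype ascii |
awk '{ sumsq += $1*$1; n++ } END { print sqrt(sumsq/n) }'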

Note that power normalization is intended to remove artifacts caused by
differences in the recording environment, e.g. the speaker moved
relative to the microphone, or the levels were changed; it should not
modify the intrinsic power differences between the phones themselves.
The above techniques try to preserve the intrinsic power, which is why
we take the average over all vowels in a nonsense word, though you
should listen to the results and make the ultimate decision yourself.

If all has been recorded properly, of course, individual power
modification should be unnecessary. Once again, we can't stress
enough how important it is to have good and consistent recording
conditions, so as to avoid steps like this.

If you want to generate a database using a different sampling rate from
that of the recordings, this is the time to resample. For
example, an 8kHz or 11.025kHz database will be smaller than a 16kHz
one. If the eventual voice is to be played over the telephone, for
example, there is little point in generating anything but 8kHz. It will
also be faster to synthesize 8kHz utterances than 16kHz ones.
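Resampling itself can be done with ch_wave; for example, to convert a
16kHz recording down to 8kHz (the wav16/ directory name is
illustrative):

$ESTDIR/bin/ch_wave -F 8000 wav16/t0001.wav -o wav/t0001.wav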

The number of LPC coefficients used to represent each pitch period can
be changed depending on the sample rate you choose. Hearsay, reasonable
experience, and perhaps some theoretical underpinning suggest the
following formula for calculating the order

(sample_rate/1000)+2

This should only be taken as a rough guide, though a higher sample
rate deserves a greater number of coefficients.
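For example, a 16kHz database would use an LPC order of
(16000/1000)+2 = 18, while an 8kHz one would use (8000/1000)+2 = 10.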