In Epos, nearly all of the TTS processing is controlled by a rule file;
there is one rule file per language and it usually has the .rul
suffix. The rule file for the German language, for instance, resides
by default in lng/german/german.rul. The rules may also vary slightly
for the individual voices by means of soft options.

The text being processed by Epos is internally stored in a multi-level data
structure suitable for the application of transformational rules. Every phonetic
unit (or an approximation of one) is represented by a single node in the
structure. The nodes are organized into layers corresponding to linguistic
levels of description, such that a unit of level n can list its
immediate constituents, that is units of level n-1. Every layer
also has a symbolic name, which is used to refer to it in the rules.

The number and symbolic names of individual levels can be specified
with the unit_levels option before the languages are defined.
An example is given in the table below.

Level name   written TSR semantics    spoken TSR semantics
text         the whole text           the whole text
sent         sentence construction    terminated utterance
colon        sentence/clause/colon    intonational unit
word         word                     stress unit
syll         word                     syllable
phone        letter                   sound
segment      --                       segment

Available Text Structure Representation layers (an example)

Every unit, be it at the segmental level or not, may contain a character. The TSR,
as generated by the text parser, contains the appropriate punctuation
at suprasegmental levels (that is, levels above the phone level):
spaces at the word level, commas at the
intonational unit level; periods, question marks and such will
become the contents of units at the sentence (terminated utterance) level.
Some suprasegmental units will have no content, because they have
been delimited only implicitly; for example, a colon-final word
is delimited by a comma, but the comma is actually a colon
level symbol, so the last word itself will have no content. This content
may be modified by the rules, and it often is. This allows
marking up a unit for later use: changing its content into an
arbitrary character (such as a digit or anything else), then applying
some rules only within units having this content, using a
rule of type inside.

The rules are applied sequentially, unless stated otherwise.
Each rule operates on units of a certain level within a unit
of some other level; for instance, a rule may assimilate
phones within a word, another rule may change the syllabic
prosody within a colon. The smaller units being manipulated
are called target units, the larger unit is referred
to as a scope unit; the respective levels are
called scope and target. Each scope unit
is always processed separately (from any other scope units)
as if no other text ever existed. For example, if the scope of some
assimilation happens to be "word", every word will have the rule
applied in isolation and the assimilation will never apply across
the word boundary, nor will it be able to distinguish a word boundary from
a sentence boundary.

Any line of the rules file may contain at most one rule and
possibly some comment.
The rule begins with an operation code specifier (what to do),
followed by the parameter (one word, opcode specific), and possibly
by scope and target specification, if the defaults (usually word
and phone, respectively) are not suitable.

The scope and the target can be one of the
available levels of linguistic description as defined
with the unit_levels option. If target or even scope
for a rule is not specified, the default_target or
default_scope option value, respectively, will be used.
The typical defaults are phone and word, respectively.

Every rule is evaluated within a certain unit, and the scope specifies
what kind of unit that should be.
The meaning of the target is somewhat opcode specific,
but generally, this is the level which is affected by that rule,
or the lowest level affected by that rule within the scope.
See the individual rule descriptions in this section
in conjunction with the real world rule files for exact interpretation
of the target level.

The opcode, scope and target identifiers are not case sensitive, but the parameter
usually is.

You can use the backslash to escape any special character including the
backslash itself anywhere in the rules just as in the configuration files.
See the
corresponding section for details.

Notice especially the possibility of referring to several internal
pseudocharacters, which have the nice property that they can never occur
in the input text and are therefore suitable as temporary markers
of all kinds in the rules. See the
raise rule
example.

Any text starting with a semicolon or a # not in the middle of a word, up to the
end of the line, is a comment. It will be properly ignored. If a line
doesn't contain anything except whitespace and/or a comment, it is also
ignored. The @include directive can be used to nest the rule
files. The same rules apply within .ini files; for more
details, see
the @include directive in configuration files.

A line which doesn't contain a rule may contain a macro definition instead.
It is specified as identifier = replacement, for example,

$vowel = aeiouy

Alternatively, the keyword external may follow an identifier instead of
the equality sign and the replacement:

$some_pathname external

This way the macro identifier is assigned the value of its corresponding
configuration parameter (for the current language if possible).

The macros get expanded anywhere they occur except at their own
point of definition. Therefore, $vowel = $short$long will be a valid macro
definition, provided that $short and $long have already been defined. The
expansion is performed at definition time and is not iterated, because
the replacement is not expected to contain the dollar sign.

Macros can later be redefined if you wish and they can be local to a block
of rules as described below.

If there is any uncertainty concerning the exact length of the identifier,
you can use braces to delimit it: ${name} is usually equal to $name, but
$nameaeiou is not equal to ${name}aeiou. It is also possible to use
a colon or an ampersand as a delimiter: $name&aeiou.

It is a good practice to use macros extensively for classes of symbols
so that the same sets and subsets of characters are listed only once
in the rules and therefore are kept consistent throughout. The exact
values of the macros are however always language specific and so Epos
doesn't specify any built-in macros. If any macros are used
in the examples for specific rules below, reasonable definitions of the
macros are assumed to precede the rule.
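For instance, the voicing classes used in the regress examples below
might be defined like this (the values are illustrative, loosely following
Czech; note that each voiceless consonant is listed at the same position
as its voiced counterpart):

$voiceless = ptkfsSx
$voiced = bdgvzZh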

Whenever an unordered list of tokens should be specified within the parameter
to some rule (use common sense and/or the individual rule descriptions),
you can also make negative specifications, such as "all consonants except
l and r". To do this, use the exclamation mark serving as an "except" operator:
$consonants!lr (the right operand is subtracted from the left one).
If there is no left operand, say in !x, the semantics is "all but x".
A consequence is that ! alone means "everything".

The operator is right-associative; !$vowels!ou means "all excluding vowels,
but o and u don't count as vowels just now". Therefore,
o and u are included in this unordered list.
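For instance, given the definition from the macro example above:

$vowel = aeiouy
; !$vowel!ou denotes everything except a, e, i and y
; (o and u are subtracted from the vowel set, hence included)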

This operator never works for ordered lists, not even
for the syll rule sonority groups. But there is
a similar usage associated with rule types if, with,
prep and postp, where the exclamation mark can be
used to negate the condition; see the respective rule types.

The rule types described in this subsection operate in some way
on a list of words (or other strings), which can range from a few items
up to machine-generated megabytes of data. These strings are usually listed
in a separate file, while the parameter of such a rule is the file name.
Alternatively, the strings can be quoted inside the rule file, especially
if only a few are listed. Such a collection of strings
is called a dictionary and obeys the same format for any rule type
which needs external data (except for the neural networks).

The dictionary consists of multiple lines, each of which contains a single
dictionary item. An item consists of two whitespace separated words,
the former being the item itself, the latter being some string associated with
the item. Often, the second string is used to replace every occurrence of the
first string in the text being processed. That's why the strings are called
replacee and replacer, respectively. The order of dictionary items
is not significant.
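A minimal dictionary file might look like this (the items are invented
for illustration):

; replacee   replacer
colour       color
centre       center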

We use adaptive hash tables, with balanced, optionally depth-bounded
AVL trees for collision resolution, to represent the dictionary in memory;
this achieves near-instant lookups of
any item, even in a huge dictionary.

The replacee cannot contain whitespace (unless escaped with a backslash),
but the replacer can. That is, if more than two words are found on a line,
the first one is the replacee and the rest of the line, except for any
post-replacee and/or trailing whitespace, becomes the replacer. However, some
rule types may not allow multiple word replacers.

The dictionaries follow the
same conventions
for character encoding, escaping special characters,
inclusion directives and comments as the rule files and other text files.

Instead of a file name reference, it is possible to quote the contents
of the dictionary directly; this is done by encapsulating the contents
in double quotes. Dictionary items are in this case whitespace-separated,
and every replacee and its replacer are separated by a comma.
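For instance, the same two items could be quoted inline in a hypothetical
rule:

subst "colour,color centre,center"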

The dictionary may either be parsed and loaded into memory at Epos startup
or at the moment of the first use. The former option's advantage is
early error reporting, while the latter can sometimes completely avoid
loading a huge unused dictionary. Use the option paranoid to choose
your preference.

Type subst

Substring substitution. The replacers replace every occurrence
of their respective replacees; longer matches are matched first; the
process is iterated until no replacee occurs in the string.
If there is a tie between several matches of equal length, the rightmost
match is chosen.
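As a toy illustration of the matching order (the dictionary is invented;
the phone target allows the replacer to be shorter than the replacee,
as explained below):

subst "aa,b" word phone
; aaaa -> aab (the rightmost match is replaced first) -> bb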

This rule type requires
either a phone target, or that each replacer have the same length
as its respective replacee (because of the descendants
of the units affected). Note also that to be considered a match
in the former case (target phone), all characters other than phones also
have to match (they must be found, or not found, at the same positions in both
the replacee and the occurrence in question). The only exception is
the terminating scope-level separator (if any), which is ignored and
preserved.

Any replacee may begin with a ^ or end with a $. That forces
the substring being replaced to be at the beginning or the end
of the scope unit, respectively. This ^ or $ also counts
as a character when determining the longest match.

The replacer should not contain units of the scope level or higher.
Unless the paranoid option is set, this is tolerated, but the
replacer is truncated at the first such character.

With the phone target, this rule type will drop the
internal structure of the replaced text as soon as a match is found.
In other words, an affected scope unit with the replacer substituted
is re-parsed like any other plain text. With any other target the original
structure is always kept.

Infinitely looping substitutions are currently reported as an error
condition.

As this rule type should not be used for trivial tasks with short
and often matching dictionaries, the example we shall now give
is somewhat involved:

The purpose of this example sequence of rules is to form stress units out of
graphical words, based on the following assumptions for the given
language: polysyllables are retained as stress units, but following
monosyllables may be merged to them; monosyllables which are colon-final
should be retained; other monosyllables may merge to each other and/or
to the preceding polysyllable; the merges should not produce too long
stress units.

The first part of the example is used to mark all non-colon-final
(more exactly: space delimited) words with the letters m,d,t,q,p
based on the number of syllables; note that the p is used
not only for pentasyllables, but also for all words of more than five
syllables. Then the substitution rule is used to relabel some
monosyllables (destined to head stress units consisting solely
of monosyllables) with x. Finally, all monosyllables that
haven't been relabeled to x are merged, using the postp rule,
to the preceding stress word if there is one.

The substitution rule in the example has 21 dictionary items,
the first three being applicable only at the colon-initial position.
Mostly it directly lists the resulting labeling for the whole
colon, but with extremely long sequences of monosyllables it
relies on the facts that the longest matching replacee (and the
rightmost one if there are multiple) is chosen and that the
substitution process is iterated. For example,
pmmmmmmmmm would be first relabeled to pmmmmmxmmm
using the last item as listed in the dictionary and then once more
using a different item to pxmmxmxmmm.
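The dictionary itself is not reproduced here, but the two items involved
in the relabeling just quoted would have to look like this (a
reconstruction inferred from the quoted intermediate results):

mmmmmmmmm mmmmmxmmm
mmmmm xmmxm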

Type prep

Preposition. If the scope unit is identical to some replacee,
it gets replaced with its respective replacer and merged to its right-hand
neighbor. If there is no such neighbor, nothing happens. As with the
subst rule, the target
must currently be phone, or else every replacer must have the same
length as its respective replacee.

Let us take a typical example:

prep preps.dic

where the referenced file contains a list of prepositions for
the language, e.g. for Czech:

bez
do
k
ke
ku
na
nad
o
od
po
pod
pro
pRed
pRes pRez
s
u
v
ve
z
za
ze

You can see that most of the prepositions have replacers
identical to their replacees, so that the preposition doesn't
change except for being merged to its right-hand neighbor when found.
There is however one irregular monosyllabic preposition
in Czech which does change its behavior with regard to
voicing assimilation, and as shown, this can be expressed
too. Notice also that the unit
(here: the word) must match the dictionary item
exactly, as opposed to the mere substring matching performed
by the subst rule.

As a special case, if the parameter begins with
an exclamation mark, then the rest of the parameter
is parsed as usual and any substitutions are performed
exactly as usual, but the scope units which finally get
merged to their respective right-hand neighbors are
exactly those which are not found in the dictionary.

A typical example can be the following rule whose purpose
is to abolish all syllable boundaries (within each word).
The rule defines an empty dictionary and then merges each
word which is not found in the dictionary and which has
something to be merged to.

prep !"" syll

Type postp

Postposition. See rule type
prep for
the description and examples, but the resultant unit
is merged to its left-hand neighbor instead of the right-hand neighbor.

Type analyze

This rule type analyzes a scope-level unit into a sequence of units
at the level immediately below the scope, based on a dictionary
of known contents of the new units at the target level
and priorities assigned to them. We will explain the
operation of this rule in terms of morphematic analysis, i.e.
its most common use, where the scope level is the word,
the result of the analysis is the morpheme level
(just below the word level) and the target is the phone.

Each item of the dictionary corresponds to a single
morpheme (some linguists would prefer to say "morph" here).
The replacee is the form of the morpheme expected
within the word; the replacer is a numeric value
which expresses the "badness" of this particular string.

In addition to these values, the dictionary must also
include two additional items, !META_unanal_unit_penalty
and !META_unanal_part_penalty. These serve as
global parameters of the analysis process.

The rule will split every affected word into morphemes
so as to minimize the sum of badnesses of the new
morphemes. Each possible analysis may contain
parts (morphemes) which have been found in the dictionary
and which incur the badness specified there, and also
parts which failed to be found in the dictionary. For
each part not found, both a per-letter penalty
(as specified with !META_unanal_unit_penalty)
and a per-part penalty (as specified with
!META_unanal_part_penalty) are added to the total
badness.

If two alternative analyses are available with the
same total badness, the first (leftmost) part
which is not of identical length in both is considered
and the analysis with the longer one is chosen.

Usually the global parameters are set so
high that the algorithm will resort to an analysis
into morphemes found in the dictionary whenever at least
one is possible; among such analyses, it prefers one which
consists of a smaller number of morphemes and one which avoids
morphemes which are known to the dictionary but
repelled by larger badness values.
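The dictionary discussed below might look like this (the badness values
are reconstructed so as to be consistent with the totals quoted; they
are illustrative only):

!META_unanal_unit_penalty 100
!META_unanal_part_penalty 150
cow 5
cows 5
lip 5
slip 5
s 5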

Given this dictionary, the word cowslip will not be analysed
as cow-s-lip, because that would yield badness 15,
while there are two analyses with badness 10, namely
cows-lip and cow-slip. Here the first (and wrong) one
will be taken due to the tie-breaking rule, which will
probably bring about a voiced pronunciation of the fricative.
Several alternative changes to the dictionary can be proposed
to avoid the misguided morpheme boundary inside slip:
removing cows from the dictionary, adding an explicit cowslip item
with badness less than 10, or decreasing the badness of
slip, for example.

With the dictionary unchanged, the word gas will be analysed
as ga-s with total badness 355, since the only alternative
analysis, gas, has total badness 450: that is, one unanalysed
part consisting of three unanalysed units (letters).
A solution would be, for example, to increase the badness of the
miniature morpheme s to a value somewhere between 101 and 249;
with this adjustment it will still be cheaper than an isolated
unanalysed letter s in an otherwise known context,
but it will not be recognized at the border of an unanalysed context.

Type prosody

This rule type is a prosody modeling rule which uses a dictionary
of prosodic adjustments to be applied.
More details below.

Type segments

You don't want to read about this rule type, unless you are
preparing a new voice for a synthesizer with the traditional
segment-level interface based on a newly structured
speech segment inventory.

Set up the segment layer below the phone layer.
The parameter names a file which contains
phone-to-segment mappings, again in the dictionary format.
Each replacee represents a three-character
segment identifier; the replacers are the respective segment
codes (decimal).
It is possible, and indeed typical, to include multiple identifiers
for the same segment number.

The middle character denotes the phone the resulting segment will
be assigned to. The left-hand and right-hand characters may either
be question marks, or they may require the left-hand and/or right-hand
neighbor to match a specific character. The question mark is
therefore a kind of wildcard.

If both fully specified and partly specified segments exist for
a given triplet of phones, they will be placed from left to right
in this order: lt?, ?t?, ?tr, ltr.

A sentence may contain these segments with the Czech diphone inventory
by Tomas Dubeda:

p l o u t e f
0p? pl? ?lo ?o? ou? ?u? ut? ?te ?e? ef? ?f0

or, with the traditional Czech segment inventory:

p l o u t e f
0p? ?pl pl? ?lo ?o? ou? ?u? ut? ?te ?e? ef? ?f0

(In this second example, for instance, the diphones ?pl
and ?pt would actually share the segment number and
would correspond to the "p + any consonant" diphone.)

There are more possibilities for representing a segment
inventory; it is necessary to decide, for the major diphone
types, whether they should live in their initial or their
final sound. That is unfortunate, but it is the way it is.

It is possible to repeat a segment a few times. This effect
can be controlled by adding 10000 times the number of extra
repetitions to the segment number. Therefore,

?e? 20241

generates three identical segments (number 241) for the stationary
part of the specified vowel.

Type with

This is actually a conditional rule, though it also uses
a dictionary. It applies an arbitrary rule upon the units
(words) listed in the dictionary.
More details below.

The contentual rules manipulate unit contents. That is, they're suitable
for implementing the more regular letter-to-sound rules, character replacement
and other transformations. They are an order of magnitude faster than e.g. the more
general (and more heavyweight) subst rule, so they should be used
whenever possible.

Type regress

Assimilation, elision or other mutation of phones or other units
depending on their immediate environment. The parameter is of the form
o>n(l_r), where o, n, l, r are arbitrary strings. The semantics is "change tokens
in o to their corresponding tokens in n
whenever the left neighbor is in l
and the right one is in r". The first two strings should therefore either be of
equal length, or n should be a single character, with the obvious
interpretations of "corresponding".

The zero character (00) may be included in any of the strings; it means
"no element", and it can be used to insert new units, to delete old ones,
and to limit the change to the beginning or the end of the scope unit,
respectively. On the other hand, if the content of some unit is a literal 0
before the application of this rule, it will stay untouched. This special
meaning of 0 with this rule type can be suppressed by escaping.

Examples:

regress 0>'(0_aeiou) word phone

inserts the apostrophe before the vowels listed at the beginning of a word.

regress $voiceless>$voiced(!_$voiced) word phone

assimilates voiceless consonants to their voiced counterparts (assuming
$voiced and $voiceless have been defined previously), when they're followed
by a voiced consonant. The change proceeds from the right to the left,
therefore ppb will change to bbb. See
the except operator above
for the explanation of the exclamation mark (here: "everywhere").

Type progress

As above, but the change proceeds from left to right. In the second
example for the regress rule, the result would be pbb
if progress were employed.

The structural rules can be used to restructure the text. They usually interact
with multiple levels of description simultaneously.

Type raise

Move a unit to a higher level of description, e.g. when a segment
level unit should directly affect the prosody. The parameter is of the form
from:to, where from and to are arbitrary strings,
and they can employ the
except operator (exclamation mark).
The tokens in from, if found at the target
level, are copied to the scope level, provided the original scope token is listed
in to. It is also possible to omit the colon and the to string; the default
interpretation is "everywhere".

This rule is usually found as a link between rules operating on
different levels. For example, suppose we want to split every
colon before any occurence of one of the words nebo
and anebo:
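The original listing is not reproduced here; using the rule semantics
described in this section, it might look roughly like this (a
reconstruction; $alphabet is an assumed macro listing all ordinary
phone tokens):

with "nebo anebo" word
    regress 0>\X(0_!) word phone   ; insert \X at the beginning of the word
raise \X                           ; copy \X up to the word level
syll "\X<$alphabet" colon word     ; split the colon before each \X word
regress \X>\ (!_!) colon word      ; word level: turn \X into a space
regress \X>0(!_!) word phone       ; phone level: delete \X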

Having inserted an internal pseudocharacter \X at the phone
level at the beginning of each of the words listed in the dictionary
used by the with rule, we raise this pseudocharacter to the
word level and treat it as the least "sonorous" element with the
following "syllabification" (splitting) rule. The last two rules
perform a simple clean-up: they change all word level occurrences
of the pseudocharacter to a space and delete all phone level occurrences
thereof.

Type syll

Roughly speaking, this rule type can be used to split words into
syllables according to the theory of sonority, i.e. at the least sonorous
phones.

More generally, it can be used to insert unit
boundaries of any sort, depending on local extremes of a simple metric
defined on the target units.
A split occurs at the scope level unit, and, whenever necessary,
at all levels between the scope and the target units.

The parameter is an ordering of the target units (typically, phones),
starting from the extremal (least sonorous) ones, with groups of
equal status (equal sonority) delimited by the < character.

Example:

syll "0<ptkf<bdgv<mnN<lry<aeiou" syll phone

inserts the following (and other) syllable boundaries:

a|pa ap|pa ap|ppppa arp|pa ar|pra a|pr|pa

Tokens not listed are considered least sonorous, order of tokens within
the same sonority group (see the example) is irrelevant. It is not
possible to use the
except operator with
this rule type.

As you can see from the example, the syllable boundaries are inserted
exactly once per every sequence of equivalent target units (e.g. equisonorous
phones) such that both preceding and following target units of the group
have higher sonority, and they're inserted either between the first and second
element of the group, or, if the group consists of a single unit, before
that unit.

This semantics is suitable for the syllabification task in all languages
known to us where syllabification is not primarily morphologically based,
but this rule type can also be used for other tasks involving a unit split
at some point defined by its content, e.g. splitting a higher level
prosodic unit before or after certain words, as shown in the example
to the
raise rule. The authors are eager to
hear from you if you'd prefer an extension or simplification of this rule
type or if you can comment on automated syllabification issues over a wide
range of languages.

The utterance prosody is modeled in Epos by assigning
values for the following prosodic quantities to individual text
structure units (possibly at multiple levels of description):

pitch (fundamental frequency)

volume (intensity) and

duration (time factor)

Currently, these are percentage values, 100 being the neutral
value.

Epos doesn't currently provide sets
of segment inventories for multiple pitch ranges, therefore
extreme values, such as 15 or 1500 may sound very unnatural.

The prosody adjustments at different levels
add up to the actual values assigned to the generated
segments. For example, a phone with the frequency (pitch)
value of 130 in a word with the value of 120 will contain
segments (after the segments rule is applied) with
frequency of 150. Alternatively, it is possible to multiply
the values for pitch, volume and duration instead, by setting the
pros_eff_multiply_f, pros_eff_multiply_i and
pros_eff_multiply_t options, respectively.
It is also possible to change the neutral value of 100
to a different base value with the f_neutral, i_neutral
and t_neutral options.

Type contour

This rule assigns a specified prosody contour to units at some level
of description within a unit which consists of them. For example,
the rule can be used to assign pitch contours to stress units;
individual values will probably be assigned to syllables.

The parameter describes a single prosody contour. The first letter
denotes the prosodic quantity (frequency, intensity or duration)
to be specified; the second is a slash; the adjustments follow
as colon-separated decimal integers. For example,

contour f/+2:+0:-2 word syll

assigns a falling pitch contour to a trisyllabic word. The number
of syllables in a word, or, more generally, of the target units
in a scope unit, must match the number of adjustments specified
in a contour rule, otherwise an error occurs; consider
the
length-based selection of rules
to ensure that. As an exception to that, it is possible
to specify padding in the contour. At most one
adjustment may be immediately followed by an asterisk. This
adjustment will be used for zero or more consecutive target
units as necessary to stretch the contour over the scope unit.
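For instance, a padded variant of the rule above (a sketch):

contour f/+2:+0*:-2 word syll
; the middle adjustment is repeated as needed, so this contour
; stretches over any word of at least two syllables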

Type prosody

Typically, there will be many instances of this rule in the rules
file, each of which will use a different configuration file for
different purpose (e.g. one may handle word stress, another
one the sentence-final melody of wh- questions, another one semantic
emphasis corresponding to an exclamation mark). The parameter
of this rule is the name of a file formatted as a dictionary
(see
dictionary-oriented rules);
its format is further specified here.

Each prosodic adjustment occupies one line; it affects exactly one
of frequency, intensity and duration (F, I, or T, respectively)
of units positioned among others as specified. The ordering of the
adjustments is insignificant, because each of them affects different
units or a different quantity thereof.

The structure of an adjustment is very simple, so let's just
pick an example: i/3:4 -20. The first letter must be one
of T, I, F and specifies the quantity that is to be adjusted;
the first number denotes the position within a unit
whose length equals the second number: here, the
rule applies to the third syllable of every tetrasyllable,
provided that the target of the rule is the syllable, while
the scope is the word (this is specified in the rules file as
usual, not in the prosody file). The last number, separated
by whitespace, is the intensity adjustment to be added
everywhere this specification applies. It is an integer value.
It is also possible to have an adjustment applied for any
length of the scope unit (in the example above, for words
of any number of syllables). To do this, use "*" as the
second number of the adjustment. Also, it may make sense
to count the target unit starting at the end of the scope
unit; in this case append the word "last" to the first number.
An example could be f/1last:* -30, or "drop the pitch by 30
for the last syllable of every word". Consequently, at most three
distinct rules may affect a unit; if that happens, only one is
chosen: the more specific one, or, if both contain the
asterisk, the one counting from the beginning.
An example, in order of decreasing precedence:

t/1:2 +30
t/1:* +20
t/2last:* +5

You can therefore override general adjustments with exceptions
for some lengths which have to be handled separately.

If multiple prosodic rules (using their own files) supply
adjustments for a certain unit, the adjustments are summed.

It is important to understand the difference between
e.g. a syllable and its phones: the syllable can have an entirely
different prosodic value than its phones; for every given segment,
the value for any prosodic quantity is obtained by totalling
the values for all the higher-level units it is contained in.
This independence of levels of description might theoretically
be useful for modeling tone languages.

Type smooth

Smoothing out of one of the F,I,T quantities. The parameter is

quantity/left_weights/base_weight\right_weights

where the left_weights,
if there are multiple ones, are slash separated, and the right_weights are
backslash separated. The new value of the quantity specified for any
target is computed as a weighted average of the values for the surrounding
units at the same level. If the target is too near to the scope boundary
to have enough neighbors in some direction, the value for the last unit
in that direction is used instead.

Example:

smooth i/10/20/40\20\10 word syll

applied to the word un-ne-ce-ssa-ry will adjust the intensity values
for all of its syllables. E.g. the new value for the second syllable will be
computed as 0.3 x i("un") + 0.4 x i("ne") + 0.2 x i("ce") + 0.1 x i("ssa")
(both left weights fall on "un", the last available unit in that direction).

The computations for different units do not interfere. The weights can
also be specified as negative quantities and/or as sums of more values.
This permits linear parameterization of the rules.

The smooth rule also has an unavoidable side effect. If (some of)
the prosodic adjustments are assigned at the word level, for example, and smoothing
should take place at the syllable level, it is first necessary to move
the prosodic information down to the syllable level. This is done by adding
the quantity found at the word level to every contained syllable and by
removing it from the word level altogether. The unit::project method
is responsible for that; it is called before the actual smoothing.
Prosodic adjustments existing at levels lower than the one being smoothed
are ignored by the smooth rule.

Multiple rules are occasionally necessary where there is a syntactical
placeholder for a single rule only. Or, several rules have to be
grouped in a certain way -- for example, when one rule has to be chosen
nondeterministically out of a set of rules. To satisfy these needs, Epos
rules include three types of composite rules with different semantics.
A composite rule is syntactically treated as a single rule.

Blocks of Rules

A block is a sequence of rules enclosed within braces ("{" and "}").
Both the opening and the closing brace follow the rule syntax, but
they take no parameters except for an optional scope specification.
The block is treated as a single rule, which is useful especially
with conditional rules:

if condition
{
do this
do that
}

The rules are applied sequentially, as you would expect, for every
unit of the proper size as given by the scope of the opening brace.
This means that every word (if the scope is word) is processed
separately throughout all the rules in the block. This involves
some splitting of execution on entering the block. If no scope is
specified, no such splitting is done and the block inherits its scope from its
master rule (a conditional rule, a block it is encapsulated in,
or the global implicit block which covers all the rules altogether).
Consequently, the scope of any enclosed rule may not be larger
than the scope of the block.

Any macros defined in the block are local to the block. The semantic
details are C-like and are by no means important.

Choices of Rules

A choice is a sequence of rules enclosed within brackets ("[" and "]").
Both the opening and the closing bracket follow the rule syntax, but
they take no parameters except for possible scope specification.
The choice is treated as a single rule.

Whenever the choice is applied, one of its subordinate rules is chosen
at random for every unit of the proper size as given by the
scope of the opening bracket, and only this rule is applied.

Generally, choices behave like blocks; the main difference is that with
blocks, all of the rules are applied, whereas with choices, exactly one
of them gets applied (possibly different rules for different pieces of
the text processed).

Empty choices (with no rules within) are not tolerated, contrary
to empty blocks.

Length-based Selection of Rules

A (length-based) switch is a sequence of rules enclosed within angle
brackets ("<" and ">"). Both the opening and the closing bracket follow
the rule syntax, but they take no parameters except for possible scope
and target specification. The switch is treated as a single rule.

Whenever the switch is applied to a scope unit, the target units contained
within are counted. If n units are found, the n-th subordinate rule
in sequence is applied.

If there are fewer than n rules available, the last one will be used.
You can avoid this behavior by specifying "nothing" after the last rule.
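A sketch of a switch assigning a pitch contour according to the number
of syllables in a word (the contour values are illustrative):

< word syll
    contour f/+0 word syll
    contour f/+2:-2 word syll
    contour f/+2:+0:-2 word syll
    nothing
>

Monosyllables get the first rule, disyllables the second, trisyllables
the third, and longer words are left untouched.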

The conditional rules execute the following rule if and only if
a condition is met. The condition is specified as the parameter,
the following (conditioned) rule is given on a separate line
(or lines, if e.g. a
composite rule
follows). (Comments, whitespace and empty lines may intervene as
usual.) It is not syntactically necessary to indent the conditioned
rules with whitespace, but it is strongly recommended for readability.

The conditioned rule is syntactically considered to be a part
of the conditional rule.

Type inside

Apply a rule or a block of rules within certain units only.
The parameter is a list of content values at the scope level within which the
following rule should be applied; the
except
operator may be used.

Every unit (a sentence, for example), which fulfills the criterion,
is processed separately, therefore the scope of the following rule may
be at most that of the inside rule itself.

The following example takes action only if the phr_break variable
is set. The action is to insert a hash character (representing
a pause) at the phone level at the end of every colon, and to
adjust the prosodic values of the new character so that the pause
is sufficiently short. Notice the necessary escaping of the hash
character so as not to confuse it with a comment-out character.
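A rough sketch of such a fragment (a reconstruction; the rules adjusting
the prosodic values of the new phone are elided):

if phr_break
{
    regress 0>\#(!_0) colon phone   ; append an escaped hash to every colon
    ; ...rules shortening the pause represented by \# would follow here
}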

Type near

Apply a rule or a block of rules within units which contain
at least one of the specified units. The parameter is a list
of values at the target level, which are looked up in a unit
of the scope level; the
except
operator may be used. If an occurrence is found, the following
rule gets applied to the scope level unit.

If the parameter begins with an asterisk, the asterisk is treated
as an except operator and the test is negated. In other words,
the following rule gets applied, if every target level unit contained
meets the set description with the leading asterisk ignored.
You can combine asterisk and an extra except operator to get tests of
the "contains no characters of this class" type.

A fragment of this kind can be used to spell out all
words which contain no vowels (and are thus supposedly
unpronounceable). The referenced dictionary spellout.dic
should contain the spelled out equivalents for each upper case
letter. The shift of the word to upper case may look
puzzling, but it is actually only a technical trick to
prevent the spell-out phrases (which are supposedly listed
in lower case) from being spelled out themselves.
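Such a fragment might look roughly like this (a sketch; the rule
shifting the word to upper case is elided):

near *!$vowel
{
    ; shift the word to upper case here
    subst spellout.dic
}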

Likewise,

near *$vowel

operates only on words consisting solely of vowels;

near $vowel

operates on words which contain at least one vowel and

near !$vowel

operates on words which contain at least one non-vowel.

Type with

Apply a rule or a block of rules for listed units.
In contrast with the preceding rule type, this refers not to the
token at the scope level (such as a space), but to the whole
structure (such as the string of phones delimited by the space).

The parameter is a dictionary filename or a quoted dictionary;
it should list the strings subject to the following rule,
such as special words. All the details
concerning the syntax of the parameter are exactly the same
as with other
dictionary oriented rules
and a simple example is given at the
raise rule.

(Advanced users: replacers can be specified in the dictionary
and they will be used to replace the replacee as with any
other dictionary-oriented rule, but the replacement process
will not be iterated.)

The parameter can optionally be prefixed by an exclamation mark,
in which case the subordinate rule will be applied exactly to
those units which did not match instead of those which did.

An example of how to apply a block of rules to all words except
the words "exception" and "resistant":

with !"exception resistant" word
{
...
}

Type if

Apply a rule or a block of rules only if a condition (given
by the parameter) is met. The condition must currently be specified
as a boolean voice configuration option (possibly a soft option)
or its negation (i.e. prefixed with an exclamation mark).

Example:

if !colloquial
{
...
}

The rules within the block will be applied only if the colloquial
option is not set.

This if rule inherits its scope from its parent rule
if not specified explicitly.

Again, the scope of a subordinate rule may not be larger than that of
the if rule itself.

Type regex

Regular expression substitution. The parameter is of the form
/regular_expression/replacement/. This rule type is similar to subst
with only one dictionary item, but it is far more powerful and more arcane;
it is intended neither for end users nor for trivial tasks.
For an overview of regular expressions, UNIX users can consult e.g. the grep
manual page, whereas Windows users can telnet to a nearby UNIX machine and
write man grep there.

Epos uses the extended regular expression syntax with the following difference:
in "regular" regular expressions, parentheses match themselves, while
the open group and close group operators are \( and \), respectively.
As we use groups heavily and next to no real parentheses, we decided
to do it the other way round. Also, sed users may be surprised
by the iterative behavior of the regex rule type in Epos.

The replacement may contain escape sequences referring to the match of
the n-th group within the regular expression: \1 to \9.
\0 represents the entire match, but this is probably unusable under the
current design, as this would cause an infinite substitution loop.
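As a hypothetical illustration, the following rule inserts hyphens
between adjacent digits; thanks to the iterative behavior, longer runs
of digits are handled too:

regex /([0-9])([0-9])/\1-\2/
; 1234 -> 1-234 -> 1-2-34 -> 1-2-3-4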

In order to use this type of rule, you need to have the rx or regex
library already installed and have WANT_REGEX enabled in common.h.
This is because we don't actually implement the regex parsing stuff; we leave it
to your OS libraries. In case you don't have such libraries installed, we use
the glibc implementation (rx.c in the Epos distribution).

Note that if your system doesn't support locale setting nor provides
a usable regex library, you can't use named character classes such
as [:upper:] in your regular expressions. This is the case
on Windows CE.

Type debug

Debugging information during the application of the rules.
Scope and target are ignored, the parameter is parsed lazily.

Parameter "elem": dump the current state of the text being processed
Parameter "pause": wait until keypress