The W3C Voice
Browser Working
Group aims to develop
specifications to enable access to the Web using spoken
interaction. This document is part of a set of requirements studies
for voice browsers, and provides details of the requirements for
markup used for specifying application specific pronunciation
lexicons.

Application specific pronunciation
lexicons are required in many
situations where the default lexicon supplied with a speech
recognition or speech synthesis processor does not cover the
vocabulary of the application. A pronunciation lexicon is a
collection of words or phrases together with their pronunciations
specified using an appropriate pronunciation alphabet.

Status of this Document

This section describes the status of this document at the
time of its publication. Other documents may supersede this
document. The latest status of this document series is maintained
at the W3C. A list of current W3C publications and
the latest revision of this technical report can be
found in the W3C technical
reports index at http://www.w3.org/TR/.

This document describes the requirements for markup used for
pronunciation lexicons, as
a precursor to starting work on the Speech Interface Framework.
This new requirements list replaces the old requirements.
The new requirements are now in line with the VoiceXML 2.0 Recommendation
and other Voice Browser Working Group
specification requirements. Changes between these two versions are described in a diff document.
You are encouraged to subscribe to
the public discussion list <www-voice@w3.org> and to mail us
your comments. To subscribe, send an email to <www-voice-request@w3.org>
with the word subscribe in the subject line
(include the word unsubscribe if you want to unsubscribe).
A public
archive is available online.

Publication as a Working Draft does not imply endorsement by the
W3C Membership. This is a draft document and may be updated,
replaced or obsoleted by other documents at any time. It is
inappropriate to cite W3C Working Drafts as other than "work in
progress". A list of current W3C Recommendations and other
technical documents can be found at http://www.w3.org/TR.

Why do we need such a markup language?

In voice browsing applications there is often a need to use
proper nouns or other unusual words within speech recognition
grammars and in text to be read out by Text-to-Speech processors.
These words may not be present in the platforms' built-in
lexicons. In
such cases voice browsers typically resort to automatic
pronunciation generation algorithms, which may
be improved by manually specified
pronunciations. The goal of the pronunciation lexicon markup is to
provide a mechanism for application developers to supply high
quality additional pronunciations in a platform independent
manner.

In many cases application developers will need to provide only
one or two additional pronunciations inline within other voice
markup languages, but there are
other cases where an application may make
use of large pronunciation
lexicons
that cannot conveniently be
specified inline and will have to be provided as separate
documents. The pronunciation lexicon markup will address both
communities.

The markup language for pronunciation
lexicons
will be developed
within the following broad design criteria. They are ordered from
higher to lower priority. In the event that two goals conflict, the
higher priority goal takes precedence. Specific technical
requirements are addressed in the following sections.

1. The pronunciation lexicon markup language will enable
consistent, platform independent control of pronunciations for use
by voice browsing applications.

2. The pronunciation lexicon markup language should be
sufficient to cover the requirements of speech recognition and
speech synthesis systems within a voice
browser.

3. The pronunciation lexicon markup language will be an XML
language and shall be interoperable with relevant W3C
specifications (see section 2: Interoperability Requirements
for details).

4. The pronunciation lexicon markup language will be
internationalized to be usable in a large number of
human languages (see requirements 3.4 and 3.5).

5. It should be easy and computationally efficient to
automatically generate, author by hand and process documents using
the pronunciation lexicon markup language.

6. All features of the pronunciation lexicon markup language
should be implementable with existing, generally available
technology. Anticipated capabilities should be considered to ensure
future extensibility (but are not required to be covered in the
specification).

7. The pronunciation lexicon markup language specification
should be easy to author,
where appropriate deriving from
existing pronunciation
lexicon
formats and using existing pronunciation alphabets.

8. The pronunciation lexicon markup language
should allow the specification of character
encoding for text data to ensure proper support for
internationalization.

ISSUE #1: Point 5 above - Is it clear? Better wording ...

ISSUE #2: Point 8 above - Do we need it? It seems to be implied by XML.

The pronunciation lexicon markup must be interoperable with
other relevant specifications developed by the W3C Voice Browser
Working Group. In particular the pronunciation lexicon markup must
be compatible with the Speech Synthesis
Markup Language
[SSML], the
Speech Recognition Grammar Specification [SRGS], and the (unpublished) dialog markup language.

The pronunciation lexicon markup
may be embedded in the Speech
Synthesis Markup Language
[SSML], the
Speech Recognition Grammar Specification [SRGS], and the (unpublished) dialog markup language.

The pronunciation markup may provide a mechanism to allow the
specification of multiple independent pronunciation lexicons within
a single document. This may be useful for separating lexicons into
application specific classes of pronunciation e.g. all city
names.

The pronunciation lexicon markup may provide
named groupings of lexicon entries within a single lexicon
document. This may be useful for separating lexicons into
application specific classes of pronunciation e.g. all city
names.

The pronunciation lexicon markup must provide the ability to
specify the pronunciation alphabet for use by all entries within a
document, such as the phonetic
alphabet defined by the International Phonetic Association
[IPA].

The pronunciation lexicon markup must provide the ability to
specify language identifiers for use by all entries within a
document. Each language identifier must be expressed following
RFC 3066 [RFC3066].

The pronunciation lexicon markup may support the ability to specify
language identifiers for an individual entry within a document,
thereby allowing multilingual entries within a single lexicon.
Each language identifier must be expressed following
RFC 3066 [RFC3066].
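
The interaction between a document-level language identifier and a per-entry override can be sketched as follows. This is only an illustration under stated assumptions: the element names lexicon and entry and the orthography attribute are hypothetical and are not defined by these requirements; only xml:lang is a standard XML attribute.

```python
import xml.etree.ElementTree as ET

# xml:lang belongs to the reserved XML namespace; ElementTree exposes it under this key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical markup: <lexicon>, <entry> and "orthography" are illustrative names only.
SAMPLE = """
<lexicon xml:lang="en-GB">
  <entry orthography="colour"/>
  <entry orthography="croissant" xml:lang="fr-FR"/>
</lexicon>
"""


def entry_languages(xml_text):
    """Map each orthography to its effective language identifier."""
    root = ET.fromstring(xml_text)
    default_lang = root.get(XML_LANG)
    # An entry-level identifier, when present, overrides the document-level one.
    return {
        entry.get("orthography"): entry.get(XML_LANG, default_lang)
        for entry in root.iter("entry")
    }


print(entry_languages(SAMPLE))
```

In this sketch an entry without its own identifier simply inherits the document-level one; whether the final markup behaves this way is for the specification to decide.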

The pronunciation lexicon markup should provide a mechanism for
specifying metadata within pronunciation lexicon documents. This
metadata can contain information about the document itself rather than
document content. For example, it can record the purpose of the
lexicon document, the author, etc.

ISSUE #4: Do we actually need this requirement if we
use metadata in a standard way?

The pronunciation lexicon markup must allow multi-word
orthographies. This is particularly important for natural speech
applications where common phrases may have significantly different
pronunciations from the concatenation of the individual word
pronunciations, requiring a phrase level pronunciation. An example
would be "how about", often pronounced "how 'bout".
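
One way a processor might make use of multi-word orthographies is a greedy longest-match lookup, sketched below. The lexicon contents, the function name lookup_phrases and the max_len parameter are assumptions made for illustration, and the pronunciation strings are placeholders rather than a real pronunciation alphabet.

```python
# Hypothetical lexicon: keys are orthographies (possibly multi-word),
# values are placeholder pronunciation strings.
LEXICON = {
    "how about": "how 'bout",
    "how": "how",
    "about": "about",
}


def lookup_phrases(words, lexicon, max_len=3):
    """Greedy longest-match lookup over a list of words."""
    result = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then progressively shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in lexicon:
                result.append(lexicon[candidate])
                i += n
                break
        else:
            # No entry found: keep the word so a later stage can
            # fall back to automatic pronunciation generation.
            result.append(words[i])
            i += 1
    return result


print(lookup_phrases(["how", "about", "lunch"], LEXICON))
```

Because the phrase entry is tried before the single-word entries, "how about" receives its phrase level pronunciation instead of the concatenated word pronunciations.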

The pronunciation lexicon markup should provide a mechanism to
indicate the broad syntactic category of the orthography, e.g.
noun, verb, pronoun etc. This is required to enable recognisers and/or
synthesizers to select the lexicon entry appropriate for the
context. The markup may define these categories. These categories
may be based upon existing standards such as EAGLES.

The pronunciation lexicon markup must provide a mechanism for
lexicon developers to associate miscellaneous additional
information with an orthography, for example to store more detailed
syntactic/part-of-speech tags.

In some situations lexicon entries will be explicitly addressed
from other voice markups; at other times, however, markups may import
entire pronunciation lexicon documents. In these cases the voice
browser will need to look up and match words within, for example,
the Speech Synthesis Markup
Language [SSML] and
Speech Recognition Grammar
Specification [SRGS]
against the orthographies present in the lexicon. It is
likely that a certain degree of textual variability will need to be
allowed in order to ensure that the pronunciation lexicon is
useful.

The pronunciation lexicon markup specification must
provide a mechanism to indicate
the allowable textual variability in the
orthography. Types of variability include, but are not limited
to:

Whitespace handling

Case sensitivity

Unicode sequence variation

Valid character sets

Diacritics within languages such as Arabic or Farsi

Accent matching within languages such as French.
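
Some of the variability types listed above can be illustrated with a small matching helper. This is a sketch under stated assumptions, not a proposed normalisation scheme: the function name normalise and its two switches are hypothetical, and the choice of Unicode normalisation forms is only one possibility.

```python
import unicodedata


def normalise(text, fold_case=True, strip_accents=False):
    """Reduce an orthography to a key suitable for tolerant matching."""
    # Unicode sequence variation: compose so that "e" + combining acute
    # compares equal to the precomposed character.
    text = unicodedata.normalize("NFC", text)
    # Whitespace handling: trim and collapse runs of whitespace.
    text = " ".join(text.split())
    # Case sensitivity: optional case folding.
    if fold_case:
        text = text.casefold()
    # Accent matching: optionally drop combining marks after decomposition.
    if strip_accents:
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return text


print(normalise("  New\tYork "))
print(normalise("Caf\u00e9", strip_accents=True))
```

Each switch corresponds to a decision that a lexicon author would want to control, which is why the requirement asks for a mechanism to indicate the allowed variability instead of fixing one behaviour.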

The definition of a standard text normalisation scheme is beyond
the scope of this specification.

The pronunciation lexicon markup specification may provide a
mechanism for specifying
homographs (words with
the same
spelling, but potentially
different meanings and
pronunciations) within
the same document.
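
A possible way of distinguishing homographs is to key each entry on a broad syntactic category, as sketched below. The Entry class, the category labels and the fallback behaviour are assumptions made for illustration, and the IPA strings are approximate examples only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    orthography: str
    pronunciation: str
    category: Optional[str] = None  # broad syntactic category, e.g. "noun"


# Two entries sharing one spelling (illustrative IPA strings).
ENTRIES = [
    Entry("record", "ˈrɛkɔːd", "noun"),
    Entry("record", "rɪˈkɔːd", "verb"),
]


def select(entries, orthography, category=None):
    """Return candidate pronunciations, narrowed by category when possible."""
    matches = [e for e in entries if e.orthography == orthography]
    if category is not None:
        narrowed = [e for e in matches if e.category == category]
        # Assumed fallback: if no entry carries the category, keep all candidates.
        if narrowed:
            matches = narrowed
    return [e.pronunciation for e in matches]


print(select(ENTRIES, "record", "verb"))
```

When the caller cannot supply a category, every candidate is returned and the choice is left to the processor.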

The pronunciation lexicon markup may provide a mechanism for
indicating the dialect or language variation
for each pronunciation, as described in
RFC 3066 [RFC3066],
such as "en-scouse". Examples in UK
English include rhotic Irish, London Cockney and North British.

The pronunciation lexicon markup must enable indication of
which pronunciation is the preferred form for use by a speech
synthesizer where there are multiple pronunciations for a lexicon
entry. The pronunciation lexicon markup
language specification
must
define the default selection behaviour for the situations where
there are multiple pronunciations but no indicated preference.
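
The selection behaviour described above can be sketched as follows. The default shown here (the first pronunciation in document order) is purely an assumption for illustration; the requirement leaves the actual default to the specification. The function name and the tuple layout are likewise hypothetical.

```python
def choose_for_synthesis(pronunciations):
    """Pick one pronunciation for a synthesizer.

    pronunciations: list of (pronunciation, preferred) pairs in document order.
    """
    if not pronunciations:
        raise ValueError("lexicon entry has no pronunciations")
    for pronunciation, preferred in pronunciations:
        if preferred:
            return pronunciation
    # Assumed default when nothing is marked preferred: first in document order.
    return pronunciations[0][0]


print(choose_for_synthesis([("pron-a", False), ("pron-b", True)]))
print(choose_for_synthesis([("pron-a", False), ("pron-b", False)]))
```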

The pronunciation lexicon markup may allow for relative
weightings to be applied to pronunciations. These weightings
indicate the relative importance of the pronunciations within a
single lexicon entry. This can be useful for speech recognition
systems.
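
As an illustration of how a recogniser might consume relative weightings, the sketch below rescales them so that they sum to one within a single entry. The function name and the interpretation of weights as proportions are assumptions; the requirements do not say how weightings are to be interpreted.

```python
def normalise_weights(weighted):
    """Rescale relative weights within one lexicon entry to sum to 1.

    weighted: list of (pronunciation, weight) pairs with non-negative weights.
    """
    total = sum(weight for _, weight in weighted)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {pronunciation: weight / total for pronunciation, weight in weighted}


print(normalise_weights([("pron-a", 3.0), ("pron-b", 1.0)]))
```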

The pronunciation lexicon markup may allow for an indication of
pronunciation quality. This can be useful for providers of
pronunciation lexica and for users of external lexica such as
Onomastica and COMLEX. Examples of such quality levels may include
manually generated and checked, manually generated, and automatically
generated.

The pronunciation lexicon markup should allow the specification
of the pronunciation of an orthography in terms of other
orthographies with previously defined pronunciations, for example,
the pronunciation for "W3C" specified as the concatenation of
pronunciations of the words "double you three see".
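
The idea of defining one pronunciation through other orthographies can be sketched as below. The dictionaries BASE and ALIASES and the function expand are hypothetical, and the IPA strings are approximate examples. The sketch also shows that when a referenced word has several pronunciations, the number of resulting combinations multiplies.

```python
from itertools import product

# Hypothetical base lexicon: each orthography has one or more pronunciations.
BASE = {
    "double": ["dʌbəl"],
    "you": ["juː"],
    "three": ["θriː"],
    "see": ["siː"],
}

# Orthographies defined in terms of other orthographies.
ALIASES = {"W3C": "double you three see"}


def expand(orthography, base, aliases):
    """Return every pronunciation obtainable for an orthography."""
    if orthography in base:
        return list(base[orthography])
    words = aliases[orthography].split()
    per_word = [base[word] for word in words]
    # One result per combination of the referenced words' pronunciations.
    return [" ".join(combo) for combo in product(*per_word)]


print(expand("W3C", BASE, ALIASES))
```

With a single pronunciation per referenced word the result is unambiguous; adding a second pronunciation for any one of the four words would double the number of candidates.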

ISSUE #7: A note is needed that this is dangerous in the case of
multiple pronunciations.

The pronunciation lexicon markup may provide the ability to
specify a different pronunciation alphabet to be used for each
pronunciation of a lexicon entry. For example this would allow a
lexicon entry to have two pronunciations for a particular
word/phrase, each pronunciation being in a different pronunciation
alphabet. This may be useful when merging pronunciation lexicons
from different sources. This may also be useful for enabling
platform specific optimised pronunciations.
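
A platform could use per-pronunciation alphabets to keep only the pronunciations it can interpret, as in the sketch below. The alphabet names "ipa" and "x-vendor", the function name and the tuple layout are assumptions made for illustration.

```python
def usable_pronunciations(pronunciations, default_alphabet, supported):
    """Keep the pronunciations whose alphabet the platform supports.

    pronunciations: list of (text, alphabet) pairs, where alphabet is None
    when the pronunciation inherits the document-level alphabet.
    """
    return [
        text
        for text, alphabet in pronunciations
        if (alphabet or default_alphabet) in supported
    ]


entry = [("pron-in-default-alphabet", None), ("pron-in-vendor-alphabet", "x-vendor")]
print(usable_pronunciations(entry, "ipa", {"ipa"}))
print(usable_pronunciations(entry, "ipa", {"ipa", "x-vendor"}))
```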

The pronunciation lexicon markup should reuse standard
pronunciation alphabets, in particular the pronunciation alphabets
recommended by the pronunciation alphabet subgroup. We will
standardize on at least
one existing pronunciation alphabet, such as the phonetic
alphabet defined by the International Phonetic Association [IPA].
We do not plan
to develop a new standard pronunciation alphabet.

The pronunciation alphabet must provide a mechanism for
indicating suprasegmental structure such as word/syllable
boundaries and stress markings. The specification may address other
types of suprasegmental structure.

The specification must address the issue of compliance by
defining the sets of features that must be implemented for a system
to be considered compliant with the specification. Where
appropriate, compliance criteria may be defined with variants for
different contexts or environments.

The pronunciation lexicon markup should aim for a compact
representation to minimise network bandwidth requirements when
transferring lexica between server and voice browser. Where this
conflicts with the generic requirement for human readability then
readability takes precedence.

This section contains issues that were identified during
requirements capture but which have not been directly incorporated
in the current set of requirements.

7.1 More powerful
addressing for Lexicon Entries

It may be desirable to provide an addressing scheme for lexicon
entries that is more flexible than the document and fragment URI
schemes currently listed in the requirements. An example of a more powerful
addressing mechanism could be XPath.

7.2 Prefix/Suffix morphological rules

In some situations the explicit specification of all the
morphological variants of a word can lead to extremely large
lexicons. A standard scheme for providing prefix and suffix
morphological rules would enable more compact lexicon documents. However it
is felt that the most common use of the pronunciation lexicon
markup will be for proper nouns where morphological variance is
less of an issue, and that standardisation of morphological rules
will be too difficult to achieve in a first draft. Off-line tools
may provide mechanisms for generating morphological variants.

7.3 Context Dependent orthographies

In some languages the pronunciation of an orthography and the
orthography itself are dependent upon the context in which this
orthography is used. The requirements do not address this issue. It
may not be possible to resolve this issue in a vendor independent
manner. It is possible that the additional information field could
be used to handle this situation in a platform dependent
manner.

7.4 Compound words

In languages such as German and Dutch words can occur as part of
compound words and in some cases may only occur within compound
words. The requirements do not say how compound words will be
handled. In the future, the pronunciation lexicon
markup should address handling compound words.

The editor wishes to thank the previous author
of this document, Frank Scahill, and the old and new
members of the pronunciation
lexicon subgroup of the W3C Voice
Browser Working
Group involved in this activity (listed in alphabetical order):