http://www.w3.org/ -- 8 September 2004 -- Strengthening
the voice of the Web, the World Wide Web Consortium (W3C) has
published the Speech
Synthesis Markup Language (SSML) 1.0 as a W3C Recommendation. SSML 1.0, a
fundamental specification in the W3C Speech
Interface Framework, elevates the role of high-quality synthesized speech
in Web interactions. Application designers for mobile phones, personal
digital assistants (PDAs), and a host of emerging
technologies use SSML 1.0 to achieve both coarse- and fine-grain control of
important aspects of speech synthesis, including pronunciation, volume, and
pitch. Like its companion W3C Recommendations, VoiceXML 2.0 and the Speech Recognition Grammar
Specification (SRGS), both published by the W3C Voice Browser Working Group, SSML 1.0
is built to integrate with other Web technologies and to promote
interoperability across different synthesis-capable platforms.

"I am excited about the progress the Voice Browser Working Group has made
in providing improved access to services over the telephone through the use
of Web technologies," said W3C Director Tim Berners-Lee, who will be
delivering a keynote address at the SpeechTEK Conference next
week. He added, "Companies can now offer Web access to their customers via
the telephone as well as from a personal computer."

Aimed at the world's estimated two billion fixed-line and mobile phones,
W3C's Speech Interface Framework, a collection of specifications for
building voice applications for the Web, will allow an unprecedented
number of people to use any telephone to interact with appropriately designed
Web-based services using keypads and spoken commands, and by listening to
pre-recorded speech, synthetic speech, and music.

A World Wide Web Consortium (W3C) Recommendation
is understood by industry and the Web community at large as a Web standard.
Each Recommendation is a stable specification developed by a W3C Working
Group and reviewed by the W3C
Membership. Recommendations promote interoperability of Web technologies
by explicitly conveying the industry consensus formed by the Working
Group.

A Rich Vocabulary for High-Quality Speech

Pronunciation is one of the primary challenges that SSML addresses in
strengthening the voice of the Web. For example, how do you pronounce "1/2"? The
SSML 1.0 specification uses this simple example to illustrate some of the
challenges of turning general purpose text into meaningful synthesized
speech. Without additional context, one would not know whether to say "one
half" or "January second" or "February first" or "one divided by two". SSML
1.0 constructs help eliminate this sort of ambiguity. The SSML vocabulary
allows word-level, phoneme-level, and even waveform-level control of the
output to satisfy a wide spectrum of application scenarios and authoring
requirements.
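
As a rough illustration of the "1/2" ambiguity, the sketch below builds a small SSML document that marks the same text for two different readings. The sub element and its alias attribute come from SSML 1.0. The say-as values shown (interpret-as="date", format="md") are an assumption: the Recommendation leaves those values to the processor, so a given engine may expect something else. The helper names ssml and build_half_examples are hypothetical.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize SSML elements in the default namespace (no prefix).
ET.register_namespace("", SSML_NS)


def ssml(tag):
    """Return the namespace-qualified name of an SSML element."""
    return f"{{{SSML_NS}}}{tag}"


def build_half_examples():
    """Build one SSML document that marks "1/2" for two different readings."""
    speak = ET.Element(
        ssml("speak"), {"version": "1.0", f"{{{XML_NS}}}lang": "en-US"}
    )

    # Reading 1: a fraction. <sub> substitutes a spoken alias for the text.
    s1 = ET.SubElement(speak, ssml("s"))
    s1.text = "Add "
    sub = ET.SubElement(s1, ssml("sub"), {"alias": "one half"})
    sub.text = "1/2"
    sub.tail = " cup of sugar."

    # Reading 2: a date. The attribute values are assumptions, since
    # SSML 1.0 does not fix the set of interpret-as and format values.
    s2 = ET.SubElement(speak, ssml("s"))
    s2.text = "The meeting is on "
    say_as = ET.SubElement(
        s2, ssml("say-as"), {"interpret-as": "date", "format": "md"}
    )
    say_as.text = "1/2"
    say_as.tail = "."
    return speak


if __name__ == "__main__":
    print(ET.tostring(build_half_examples(), encoding="unicode"))
```

With markup like this, the author states the intended reading and the synthesis processor no longer has to guess it from context.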

"SSML builds on the work of the pioneers in speech synthesis to provide
application developers with a powerful and flexible means to deliver a high
quality mix of synthetic and pre-recorded speech as part of interactive voice
response services," said Dave Raggett, Activity Lead for W3C's work on voice
browsers, and a W3C Fellow from Canon. He added, "SSML allows VoiceXML-based
services to be accessed via textphones for people with speaking or hearing
impairments. In addition, SSML has great promise beyond its use with
VoiceXML, as we look forward to emerging standards for multimodal
interaction."

Like XHTML, SSML is a markup
language based on the widely deployed XML standard. SSML
content can stand alone or be included in other XML content in order to
improve rendering as synthesized speech. Naturally, SSML is particularly
well-suited for use with a VoiceXML wrapper when building an interactive
voice response application.
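
Because SSML is ordinary XML, a stand-alone document can be produced and checked with any XML toolkit. The following is a minimal sketch, not a definitive implementation: it wraps a sentence in a prosody element to set pitch and volume, then returns a complete document. The function name standalone_ssml and its defaults are hypothetical; the element names, attribute names, and namespace URI are those of SSML 1.0.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize SSML elements in the default namespace (no prefix).
ET.register_namespace("", SSML_NS)


def standalone_ssml(text, lang="en-US", pitch="medium", volume="medium"):
    """Return a stand-alone SSML 1.0 document as a string.

    pitch and volume are passed through unchanged, so callers should use
    values the specification allows, such as "low", "high", "soft", "loud".
    """
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    prosody = ET.SubElement(
        speak, f"{{{SSML_NS}}}prosody", {"pitch": pitch, "volume": volume}
    )
    prosody.text = text
    body = ET.tostring(speak, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(standalone_ssml("Welcome to the service.", pitch="high", volume="loud"))
```

The same speak content could instead be carried inside another XML vocabulary; how the host language embeds it is defined by that language, not by this sketch.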

SSML 1.0 is built for Web integration in other ways as well. The Voice
Browser Working Group worked closely with other W3C groups to ensure that the
design of SSML 1.0 is consistent with principles of accessibility,
internationalization, and general Web architecture. Indeed, one important
application of SSML involves "text phones" that may be used by people with
some hearing disabilities. The same content can also be output as speech
through a common telephone. SSML 1.0 is also consistent with previous work at
W3C on describing pronunciation with Cascading Style
Sheets (CSS). W3C's CSS Working Group is developing a speech module in
CSS3 for rendering XML documents with SSML-based speech engines.

Early Industry Adoption

W3C's Voice Browser Working Group has been particularly successful at
ensuring adoption of its specifications before they reach Recommendation
status. A test suite (discussed in the July 2004 SSML
implementation report) has helped ensure consistent behavior and quality
among the already numerous implementations of SSML 1.0. Vendors that have
already implemented SSML 1.0 and are participating in the Working Group
include Aspect Communications, France Telecom, Hewlett-Packard, IBM,
Loquendo, Microsoft, MITRE, Nuance Communications, SAP, ScanSoft, Sun
Microsystems, VoiceGenie Technologies, Voxeo, and Voxpilot.

The Working Group will now focus its energies on the remainder of the
Speech Interface Framework. "After VoiceXML 2.0 and Speech Recognition Grammar
Specification (SRGS), SSML is the third language of the W3C Speech Interface
Framework to become a full W3C Recommendation," said Jim Larson, manager,
advanced human input/output, for Intel and also co-chair of W3C's Voice
Browser Working Group. "We are working to complete work on other languages of
the W3C Speech Interface Framework, including VoiceXML 2.1, Semantic
Interpretation, and the Call Control eXtensible Markup Language
(CCXML)."