Rounding out his series on Voice XML, Frank Coyle takes a look at how voice can add a rich new dimension to your Web applications, especially those centered around XML. With Web 2.0 and mashups on the rise, adding Voice XML to the mix lets you pull and push Web-based information to your users wherever they may roam (as long as they take their cell phones).

This is the last article in our series on Voice XML, so let’s wrap
things up by expanding your thinking about how you can use voice as an enabler
for your applications. When we look at the classic success stories of Voice XML
(for example, the American Airlines flight service or 1.800.DOMINOS), we see
Voice XML as a front end to large database-driven applications, where voice input
plays a role similar to that of the classic HTML form.
Data is collected from a user and delivered to a server, where some back-end
database processing occurs. The resultant information is then delivered back to
the voice client end user. This is what I call the classic voice application.

But let’s expand our thinking a bit in light of the ubiquity of cell
phones and the emergence of Web 2.0, where applications are assembled by
connecting software components in new and often exciting ways. The latest term
for this is mashup, a word intended to reflect the kind of guerrilla
assembly process that’s driving the creation of a new generation of web
apps that are built around web services.

So where does Voice XML fit in? The answer is that it fits wherever you need
a voice component to drive or augment your application. The ease of setting up
free developer accounts with Voice XML providers such as Voxeo enables you to
begin experimenting with voice for your own apps, hopefully leading to the next
great mashup idea.

To give you some food for thought, let’s look at how we can use
JavaScript to increase the intelligence of our voice apps and then explore how a
little XML data on a server can go a long way toward helping generate dynamic
Voice XML content for you or your users.

JavaScript and Voice XML

JavaScript has been getting a lot of attention lately in the Web 2.0 zone as
the key ingredient for doing client-side AJAX. The good news is that much of
that JavaScript expertise can be leveraged in your Voice XML apps. Technically,
we’ll be looking at ECMAScript, the international "JavaScript"
standard that has been adopted as the scripting language for Voice XML.

NOTE

When I refer to JavaScript in this article, I technically mean
ECMAScript.

One of the benefits of ECMAScript is that you can access Voice XML variables
within ECMAScript. Elements that accept the expr attribute can use
arbitrary ECMAScript code to generate a value at runtime. And you can abstract
your commonly used ECMAScript code into functions or libraries to support
reuse in your Voice XML pages.

Some key things to note about JavaScript include the following:

Voice XML variables are equivalent to ECMAScript variables. Voice XML
variables can be passed to JavaScript functions. Values returned from functions
can be stored in Voice XML variables.

The expr attribute available with many tags can refer not only to
Voice XML or ECMAScript variables but also can include ECMAScript function call
expressions.

ECMAScript can be placed inline in the Voice XML document using the <script> element, or scripts can be loaded from a URI.
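
To make these points concrete, here is a minimal sketch of the kind of small helper library a Voice XML page might load. The function names (toNumber, orderTotal) and the pizza-order scenario are hypothetical and only for illustration. The sketch assumes that field values may arrive as strings, so it converts them before doing arithmetic, and it sticks to var and plain functions on the assumption that a voice platform's ECMAScript engine may be an older one.

```javascript
// Hypothetical helper library; these names are illustrative only.

// Convert a value (possibly a string collected from a voice field)
// into a number, and fail loudly if it is not numeric.
function toNumber(value) {
  var n = Number(value);
  if (isNaN(n)) {
    throw new Error("Not a number: " + value);
  }
  return n;
}

// Return the total cost of an order, rounded to two decimal places.
function orderTotal(quantity, unitPrice) {
  var total = toNumber(quantity) * toNumber(unitPrice);
  return Math.round(total * 100) / 100;
}

// A page could then store the result in a Voice XML variable through
// an expr attribute, for example:
//   <var name="total" expr="orderTotal(quantity, price)"/>
// where quantity and price are assumed to be variables filled in earlier.
```

Keeping the conversion in one place means every function in the library can accept either strings or numbers, which is convenient when the arguments are Voice XML variables.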

Let’s begin by writing a simple JavaScript function called multiply that returns the product of two numbers. Imagine that
you’re driving along the highway, wearing your cell phone headset, and you
need to do a quick calculation. You trigger a call to your voice application and
the following dialogue ensues:

As you can see, the JavaScript part of the code is quite simple. It’s
embedded inside a CDATA block so we can use characters that might upset an XML
parser. For example, if we want to use a less-than sign (<) as part
of our script, the parser will assume that we are starting a
new element and report a parsing error. By enclosing our scripts in CDATA,
we’re free to use JavaScript constructs without fear of parser
retaliation.
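
As a minimal sketch of what such a script might contain, the block below shows one plausible version of the multiply function, together with a hypothetical range check (isSmallNumber, not part of the article) whose less-than sign is exactly the kind of character that calls for the CDATA wrapper.

```javascript
// In a Voice XML page, this code would sit inside a <script> element,
// wrapped in a CDATA section.

// Return the product of two numbers.
function multiply(a, b) {
  return a * b;
}

// Hypothetical helper: the "<" operator below would be read as the
// start of a new element if the script were not inside CDATA.
function isSmallNumber(n) {
  return n < 100;
}

// An element that accepts expr could then call the function directly,
// for example:
//   <value expr="multiply(first, second)"/>
// where first and second are assumed to be the names of two fields.
```

Because the multiplication operator converts numeric strings to numbers, multiply gives the same result whether the spoken values reach it as numbers or as strings of digits.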