I’ve got a thing about screen pops.
I’ve written before about using Asterisk and XMPP to enable IM-based screen pops, but the recent release of Asterisk 1.8 creates a whole new reason to be excited about this topic.

Now that you can both send and receive XMPP messages via the dialplan, it is possible to build sophisticated CTI applications using standards-based XMPP servers and clients with nothing but extensions.conf. Here’s how.

You’ll need an XMPP server with (at least) two accounts. One for you, as a user. One for Asterisk. You’ll also want to fire up your XMPP client and add the Asterisk user to your buddy list.

Set up jabber.conf with the details of the Asterisk account on your XMPP server (make sure you run jabber reload in the Asterisk CLI after modifying the file):

Once you’ve done that, you’ll need to add some dialplan logic to use both JabberSend() and JABBER_RECEIVE (run dialplan reload in the Asterisk CLI after adding this logic):

In this simple example, anytime a call comes into the default context, a set of IM messages is sent to the XMPP account user@xxx.xxx.xxx.xxx (where xxx.xxx.xxx.xxx represents the host name/IP for your XMPP server). The following line in the dialplan will cause Asterisk to wait 10 seconds to receive a response from user@xxx.xxx.xxx.xxx.

When a response is received, it is read into the variable OPTION. Subsequent dialplan logic will either send the call to the extension that was dialed, or simply hang up (you could just as easily add options and logic to route the call to one of several different phone numbers or to voicemail).
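The branching that the dialplan performs on OPTION can be sketched outside Asterisk. The snippet below is only an illustration in Python, not dialplan code: the function name route_call, the reply value "1" for accepting the call, and the tuple-shaped result are all assumptions made for the example, and an empty reply stands in for the 10-second wait expiring.

```python
def route_call(option, dialed_extension):
    """Decide what to do with a call based on the IM reply.

    `option` mirrors the OPTION dialplan variable; it is assumed to be
    empty (or None) when no reply arrives before the timeout.
    """
    reply = (option or "").strip()
    if reply == "1":
        # Hypothetical convention: replying "1" accepts the call.
        return ("dial", dialed_extension)
    # Any other reply, or no reply at all, ends the call.
    return ("hangup", None)


print(route_call("1", "100"))
print(route_call("", "100"))
```

Adding more reply values (for voicemail, or for alternate numbers) would just mean more branches in the same place.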

That’s it!

This powerful new addition to Asterisk makes building sophisticated, interactive XMPP-based screen pops easy. Just imagine what other juicy little nuggets await in the new version of Asterisk.

Node.js is a framework for building server-based applications in JavaScript. Node.js is event driven, so if you’ve got a fairly good understanding of state-driven development frameworks you’ll probably get it quickly. If not, start here.
I wanted to learn more about Node.js, so I decided to build a module. There are lots of modules out there, but I wanted to do something very specific with mine. I wanted to use Node.js to build voice applications. (Not a shocker, it’s what I do.)

Turns out Node.js is a very nice match for the Tropo WebAPI, a cloud-based API for building sophisticated speech and communication applications. The Tropo WebAPI speaks JSON, and I can’t think of any more natural way of creating and consuming JSON than with good ol’ JavaScript. Really, you can see why this gets me excited.

The Node.js module that I’ve been working on for interacting with the Tropo WebAPI is now available on GitHub. It comes with some very nice examples, and even a set of unit tests (yes, Virginia, you can write unit tests with Node.js). It has everything you need to get started using Node.js to write voice apps in JavaScript.

If you decide to give it a try (which I hope you do), there are some additional ingredients I would recommend adding to the mix:

CouchDB – The wonderfully powerful document-oriented storage engine that uses both JavaScript (for map/reduce and views) and JSON (for storing documents). There are also many fine Node.js modules available for interacting with CouchDB.

With these ingredients you’ve got a pretty powerful foundation on which to build robust, sophisticated multi-channel communication apps.

But why would you want to build a voice application with JavaScript?
Pretty much all of the voice application development tools and technologies that have been developed over the last decade or so have one essential unifying characteristic – each of them seeks to leverage easy to understand, low cost web technologies to build phone applications.

This principle can be seen very clearly in the approach embodied by the new Node.js library for the Tropo WebAPI. If you can write JavaScript, you can build sophisticated, cloud-based communication applications that not long ago required specialized skills, training, software and hardware (Big bucks, people. Big bucks).

Cloud-based telephony services based around simple to use APIs that employ widely supported standards like HTTP and JSON are democratizing phone and voice application development.

It’s really exciting to be a part of this trend and to contribute tools that others can use to build powerful applications.

One of those factors – support for speech recognition – is a good differentiator for developers to use when choosing a cloud telephony platform.

Speech recognition is becoming increasingly important in our everyday lives. Smartphones and powerful handheld devices enable multimodality, and there are more and more restrictions placed on our use of phones while doing other things (like driving).

Plus, I can’t think of a more deflating concept than a cloud telephony provider that allows developers to build sophisticated apps and mashups in the language of their choice but that chains users of those apps to a telephone keypad. No fun.

To give an example of how powerful speech recognition can be, and how easy it is to use with a cloud telephony provider that supports it, I worked up a small demo to illustrate the point. The sample code for this demo is on GitHub, and we’ll dive into it in more detail below.

This demo uses two PHP libraries that are designed to work with the Tropo platform (one of the only cloud telephony providers to support speech recognition):

Let’s take the example of a company directory that allows callers to dial a single number, select a person or department at the company and then be transferred to the person they select.

With cloud telephony, there is no need to have such a system live on a machine in the server room – it can be hosted externally in the cloud, making it easier to manage and to scale. In addition, with the Tropo Platform, it doesn’t have to be the same tired old DTMF-based menu telling callers to press an extension number or to “dial by name…”.

This script is pretty self-explanatory, but there are some key points I want to emphasize. First, note the $options array that holds the reference to an external grammar file (more on that in a bit). Tropo seems to need this reference to be an absolute URL rather than a relative reference to the file (not hard to do with PHP – you just need to be aware of it).

Also, the file reference needs to include a trailing parameter indicating that this is an XML grammar (;type=application/grammar-xml). This seems to be true even if the grammar file is served with the correct MIME type by whatever is serving it.
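Both requirements (an absolute reference, plus the trailing type parameter) are easy to get wrong, so here is a small sketch of how such a reference might be assembled. It is written in Python rather than PHP purely for illustration; the function name grammar_reference, the host name and the file names are made up for the example.

```python
from urllib.parse import urljoin

# Trailing parameter that marks the reference as an XML grammar.
GRAMMAR_TYPE_SUFFIX = ";type=application/grammar-xml"


def grammar_reference(script_url, grammar_path):
    """Build an absolute grammar reference from the URL of the current
    script and a (possibly relative) path to the grammar file."""
    absolute = urljoin(script_url, grammar_path)
    return absolute + GRAMMAR_TYPE_SUFFIX


# Hypothetical URLs, used only to show the shape of the result.
print(grammar_reference("http://somehost.com/app/index.php", "grammar.php"))
```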

Now let’s have a look at this grammar file.

This simplistic example demonstrates how to use the PHPGrammar library. Note the simple array structure that is being used to hold the details of employees for our fictitious company. This could very easily be replaced with a dip into a data source of pretty much any kind, like an LDAP directory or database holding employee details.

Also note in this example that we want to do something referred to as Semantic Interpretation. Our grammar file is a set of rules that will be applied to what the caller says – Semantic Interpretation (SI) dictates the value that is given to our application from the grammar when a successful match occurs.

In this example, we want the caller to be able to say the name of the person they want to be transferred to. We make the first name optional so they may either say the last name of the person or (optionally) the full name. Obviously this may need to be changed based on the size of the directory to render in a grammar file (e.g., multiple employees with the same last name).

Do note that the Tropo platform seems to require the “Script” syntax for returning SI values on a successful match as opposed to the “String Literal” syntax. (More on these alternatives here.)

Works on Tropo (Script syntax): <item>foo<tag>out="bar";</tag></item>

Does not work on Tropo (String Literal syntax): <item>foo<tag>bar</tag></item>
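To make the shape of such a grammar concrete, here is a rough sketch that builds one from a simple employee list, with the first name optional and the phone number returned using the Script syntax shown above. This is not the PHPGrammar library and not the original demo code: it is plain Python string building, and the employee names, phone numbers, rule name and function name are all invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Fictitious directory; in practice this could come from LDAP or a database.
EMPLOYEES = [
    {"first": "Jane", "last": "Smith", "phone": "2125550100"},
    {"first": "John", "last": "Jones", "phone": "2125550101"},
]


def build_grammar(employees, root="directory"):
    """Return an SRGS XML grammar with one alternative per employee."""
    items = []
    for person in employees:
        first = escape(person["first"])
        last = escape(person["last"])
        phone = escape(person["phone"])
        # repeat="0-1" makes the first name optional; the <tag> uses the
        # Script syntax to hand the phone number back to the application.
        items.append(
            f'      <item><item repeat="0-1">{first}</item> {last}'
            f'<tag>out="{phone}";</tag></item>'
        )
    lines = [
        '<?xml version="1.0"?>',
        '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"',
        f'         xml:lang="en-US" mode="voice" root="{root}"',
        '         tag-format="semantics/1.0">',
        f'  <rule id="{root}" scope="public">',
        '    <one-of>',
        *items,
        '    </one-of>',
        '  </rule>',
        '</grammar>',
    ]
    return "\n".join(lines)


grammar = build_grammar(EMPLOYEES)
ET.fromstring(grammar)  # raises ParseError if the XML is not well-formed
print(grammar)
```

With many employees sharing a last name, the optional first name would probably have to become required for those entries.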

So, when a caller says the name of a person in our company directory we want to return the number for that person to our Tropo script so we can transfer the call to them. This can clearly be seen when we examine the Result object that is delivered by the Tropo platform.

Tropo’s Result object includes the full grammar engine output, and lots of very detailed information about the recognition. As you can see, the utterance that the speech recognition engine heard was the name of one of our faux employees. The value that was returned is the number of that person.

We use this value in the transfer_call() method of our Tropo script.

// Create a new instance of the Result object.
$result = new Result();

// Get the value of the selection the caller made.
$phone = $result->getValue();

// Create a new instance of the Tropo object and transfer the call.
$tropo = new Tropo();
$tropo->transfer('+1'.$phone);

// Write out the JSON for Tropo to consume.
$tropo->RenderJson();

Using the PHP WebAPI library, it takes just 5 lines of code (excluding comments) to get the value of the grammar result and transfer the call. How cool is that?!

Obviously there are lots of things that can be done to enhance this script, to make it more robust, but it illustrates the essential concepts of speech recognition in the cloud.

What’s more, because of all of the great functionality provided by the Tropo cloud platform we can really push the envelope on the tired old company directory:

We could take an inbound call from a Skype user and transfer to a cell phone (or a SIP endpoint).

We could let our caller select a department in our company and then ring several different numbers at once, transferring the call to the first one answered (sort of a “hunt group in the cloud”).

We could use Tropo’s built in IM capabilities to send a screen pop to the person receiving the call.

Just a few days ago, CouchDB version 0.11 was released – this new version is packed full of cool new features as outlined on the Couch.io blog. It’s also the first release without the Alpha or Beta label attached to it.
What’s more exciting, CouchDB version 0.11 is a feature-freeze release candidate for the upcoming version 1.0. So if you’ve played around with CouchDB and have an old instance lying around, now is the time to upgrade.

If you’ve read my previous series on using CouchDB to build cloud telephony applications with Voxeo’s Tropo platform, and you used my instructions for setting up CouchDB on Ubuntu 8.04, then upgrading to CouchDB version 0.11 will be a piece of cake. (Note – the mirror you download from may be different than below. Go to the download page to find the best one):

Before upgrading, make sure that any customizations you’ve made to the CouchDB configuration are in /usr/local/etc/couchdb/local.ini. The upgrade process will overwrite any changes you have made in default.ini.

$ sudo /usr/local/etc/init.d/couchdb stop

You should probably run make uninstall on the previous version of CouchDB before starting.
To check for leftover files in /usr/local:
$ find /usr/local -name '*couch*' | wc -l

Note – if you see an error that says {"error":"error","reason":"eacces"} when trying to create a database or insert documents, you may need to re-run some commands listed in the previous install instructions:

This post will provide a quick overview of how the Cloudvox JSON API can be paired with PHP and the delightfully awesome Limonade Framework. If you’re not a PHP developer don’t despair – this example can easily be ported to other languages like Ruby (using Sinatra) or C# (using Kayak).

Cloudvox API Helper Classes

When writing apps for the Cloudvox JSON API, there are two things that we need to manage – the JSON that we will send to Cloudvox (using plain old HTTP) and the response Cloudvox sends back to our app (this will include any user input collected from the caller, and also things like a unique identifier for the call, caller ID and other information about the call).

To make managing both sides of our exchange with Cloudvox easier, I’ve created a set of PHP classes that can be used with any standard PHP IDE to make writing Cloudvox JSON and parsing Cloudvox responses simple and easy. You can download this class library from GitHub.

Using these classes is pretty straightforward:

Will render:

[{"name":"Speak","phrase":"Hello world!"},{"name":"Hangup"}]
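The helper library itself is PHP, and its listing is not reproduced here. As a rough illustration of the pattern (one small class per Cloudvox action, with required properties passed to the constructor and the whole list serialized to JSON), here is a Python sketch. The class and function names mirror the action names in the JSON above but are otherwise assumptions, not the library's actual API.

```python
import json


class Action:
    """Base class: stores the action name plus any required properties."""

    def __init__(self, name, **required):
        self._fields = {"name": name, **required}

    def to_dict(self):
        return dict(self._fields)


class Speak(Action):
    def __init__(self, phrase):
        # The phrase is required, so it lives in the constructor.
        super().__init__("Speak", phrase=phrase)


class Hangup(Action):
    def __init__(self):
        super().__init__("Hangup")


def render(actions):
    """Serialize a list of actions to compact JSON."""
    return json.dumps([a.to_dict() for a in actions], separators=(",", ":"))


output = render([Speak("Hello world!"), Hangup()])
print(output)
```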

Required properties are included in the constructor for each class – in most IDEs this means that you can simply use Shift + Control + Space Bar when you create a new instance of the object to see what properties are required.

Optional properties on all classes are handled by using the PHP __set() method in the base class. This effectively lets you overload the object and set properties which are not declared in the class definition. So for example, if we wanted to collect input from the caller (e.g., their zip code) we would use the GetDigits class, and overload it to add a URL to post the results to:

url = "http://somehost.com/myscript.php";

?>

The problem with overloading in PHP is that you don’t get the benefit of having your IDE display overload options (it can’t because the properties that we wish to set are not declared in the class definition). There also isn’t any way to control what overloaded properties get set, so it’s possible to add things that Cloudvox won’t understand.

I’m not sure there is any way around this given the way that PHP has implemented overloading. I do plan to work on a set of Cloudvox classes using another language that handles overloading a bit better, like C#, but for now you should only overload to set the url and method properties for classes that can use them (see the Cloudvox docs for more details).
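To make the overloading pattern (and its weakness) concrete, here is a rough Python analogue. It is a sketch, not a port of the PHP classes: __setattr__ plays the role of PHP's __set(), although unlike __set() it intercepts every attribute assignment, and the GetDigits constructor argument and the "max" field name are hypothetical placeholders rather than documented Cloudvox properties.

```python
import json


class Action:
    def __init__(self, name, **required):
        # Bypass our own __setattr__ while creating the storage dict.
        object.__setattr__(self, "_fields", {"name": name, **required})

    def __setattr__(self, key, value):
        # Any property assigned after construction is simply stored and
        # later rendered; nothing checks whether Cloudvox understands it.
        self._fields[key] = value

    def to_json(self):
        return json.dumps(self._fields, separators=(",", ":"))


class GetDigits(Action):
    def __init__(self, max_digits):
        # "max" is an assumed field name, used only for illustration.
        super().__init__("GetDigits", max=max_digits)


digits = GetDigits(5)
digits.url = "http://somehost.com/myscript.php"  # optional property
print(digits.to_json())

digits.not_a_real_option = True  # accepted just as silently
print(digits.to_json())
```

The second assignment shows the issue described above: a misspelled or unsupported property ends up in the rendered JSON without any warning.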

Sipping on some Limonade

If you know of the excellent Ruby Framework Sinatra but you want to code your project in PHP, fear not – check out the equally excellent PHP framework called Limonade. It’s the functional equivalent of Sinatra for PHP.

Using Limonade with our set of Cloudvox JSON classes makes building cloud telephony applications very simple. The biggest benefit is that you don’t have to split up the different steps in your application (i.e., collecting input, validating input, re-prompting, etc.) into different PHP files (which can be kind of a pain) – all of your steps can be contained within a single file.

Limonade lets you set a URL “route” that Cloudvox can send HTTP requests to with results, and to get rendered JSON for another application step. For example:

This “Hello World” example – defined in a script named sample.php – could be accessed by hitting http://somehost.com/index.php?/start. To make things easier, we’ll use a little Apache magic to allow URL rewriting:

A simple demo app using the Cloudvox JSON helper classes and the Limonade Framework can be seen below. You can use this sample application with a new Cloudvox account to get started building cloud telephony applications.

This simple app demonstrates how powerful the Cloudvox JSON API is for creating cloud telephony apps. When coupled with an elegant framework like Limonade, sophisticated, cloud-based telephony applications are readily available to any developer that wants to build one.

The helper classes for the Cloudvox API are obviously a work in progress, so if anyone reading this has comments or suggestions feel free to let me know – mheadd [at] voiceingov.org.

Facebook has been busy lately doing all sorts of interesting things to the PHP scripting language. Although most of the recent PR hype was centered around HipHop for PHP (a name that, quite frankly, makes me want to use it less), Facebook also released another very interesting and potentially useful extension for PHP – XHP. From the XHP wiki on Github:

XHP is a PHP extension which augments the syntax of the language such that XML document fragments become valid PHP expressions. This allows you to use PHP as a stricter templating engine and offers much more straightforward implementation of reusable components.

Not sure where I read it, but one commenter compared XHP to E4X in JavaScript. It’s a neat idea, and it’s actually not all that hard to start playing around with XHP (if you’re comfortable installing PHP extensions).

After thinking about XHP and playing around with it a bit, I started to think it might be useful as a way of generating not just HTML, but also other XML-based languages like VoiceXML.

The core XHP classes that are used for generating HTML are fairly easy to understand, once you get used to the syntax – extending these core classes to generate VoiceXML (or any other XML-based language) is not all that hard. But before we do that, let’s install XHP as a PHP extension and kick the tires a bit.

Installing XHP on Ubuntu

I’ve tried the following instructions on both Ubuntu 8.04 and 8.10, and I’m pretty sure they’ll work with just about any recent Ubuntu version.

Depending on your system, you may need to add some of the prerequisites for building this extension:

$ sudo apt-get install flex bison

I tried this approach on both Ubuntu 8.04 and 8.10 and in both cases only version 0.13.5 of re2c worked (the earlier version obtained via apt-get did not cut the mustard). If you’re running a later version of Ubuntu, you may be able to get this version of re2c from the standard repos.

When you run make install, the new extension file is placed in a directory where Apache can load it – however, we need to modify the php.ini file so that Apache is aware of the extension. Open php.ini and add the following:

extension=xhp.so
extension_dir=directory_where_xhp.so_is_located

When setting this last option, use the directory where xhp.so was placed by make install. Now we just restart Apache:

$ sudo /etc/init.d/apache2 restart

Easy right? Unfortunately, things get a little less clear at this point. It turns out that to make XHP work properly, some PHP libraries need to be included in any of the XHP scripts we write. These files are located in the php-lib directory of the XHP source code.

To make things easier (especially when we start to write our own extension to XHP for VoiceXML), let’s move these files to a convenient local directory that we can include in our scripts:

$ cp php-5.X.X/ext/xhp/php-lib/* my/local/directory/php-lib/

So now we can do some interesting stuff like this:

What’s especially interesting about XHP is that it enforces proper syntax at compile time, so if your markup isn’t syntactically correct an exception gets tossed.

Generating VoiceXML with XHP

The XHP libraries we just discussed implement the HTML spec out of the box. However, if you try to render tags that are not part of the HTML spec an exception will occur. I wanted to find out how hard it would be to extend the concepts behind XHP for other markup languages, like VoiceXML. Turns out, it’s not hard at all.

If you look at the init.php file that is included in the example above, you’ll see it’s in turn including a file called html.php, which defines all of the HTML elements that can be rendered by XHP, the attributes that each can have and also the parent-child relationship between elements.

Using this as a guide (the syntax is new but fairly easy to follow), I knocked out a quick class file for some basic VoiceXML elements – just to illustrate the concept:

This is admittedly rough, and it only covers a few basic VoiceXML elements, but it demonstrates that extending XHP to render VoiceXML is actually quite easy. To use this file, simply edit the init.php file mentioned previously and add an include statement:

Now we’re ready to use XHP to generate VoiceXML:

How cool is that?
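The underlying idea (each element declares which attributes it accepts and which children it may contain, and anything else raises an exception) is not specific to XHP. Here is a loose sketch of that idea in Python. It is not XHP syntax and not the class file described above; the class names and the small subset of VoiceXML elements and attributes covered are choices made for this example.

```python
from xml.sax.saxutils import escape, quoteattr


class Element:
    tag = ""
    allowed_attrs = frozenset()
    allowed_children = ()  # classes permitted as children (str means text)

    def __init__(self, *children, **attrs):
        unknown = set(attrs) - self.allowed_attrs
        if unknown:
            raise ValueError(f"<{self.tag}> does not allow {sorted(unknown)}")
        for child in children:
            if not isinstance(child, self.allowed_children):
                raise TypeError(
                    f"<{self.tag}> cannot contain {type(child).__name__}"
                )
        self.children = children
        self.attrs = attrs

    def render(self):
        attrs = "".join(f" {k}={quoteattr(v)}" for k, v in self.attrs.items())
        body = "".join(
            c.render() if isinstance(c, Element) else escape(c)
            for c in self.children
        )
        return f"<{self.tag}{attrs}>{body}</{self.tag}>"


class Prompt(Element):
    tag = "prompt"
    allowed_children = (str,)


class Block(Element):
    tag = "block"
    allowed_children = (Prompt,)


class Form(Element):
    tag = "form"
    allowed_attrs = frozenset({"id"})
    allowed_children = (Block,)


class Vxml(Element):
    tag = "vxml"
    allowed_attrs = frozenset({"version"})
    allowed_children = (Form,)


doc = Vxml(Form(Block(Prompt("Hello world!")), id="main"), version="2.1")
print(doc.render())
```

Unlike XHP, which the post describes as checking markup at compile time, this sketch only checks when the objects are constructed at run time.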

I’m still toying around with XHP, but this little experiment clearly shows that it has use beyond just simply rendering HTML. I’d be interested in hearing from other developers – is this worth a full-blown project to flesh out a complete VoiceXML class library for XHP? What other markup languages would make good candidates for this same type of approach?

Post a comment here, a tweet to @mheadd or shoot a quick e-mail to mheadd [at] voiceingov.org with your thoughts or comments.

This is the continuation of a series that will describe how to build voice applications with the Tropo cloud telephony platform and CouchDB.
In the last post, I detailed how to get a CouchDB instance up and running on Ubuntu, and how to get an account started on Tropo so that you can start building cloud telephony applications. In this post, we’ll create our first CouchDB database and create a simple Tropo application that connects to our CouchDB instance. First, however, we need to tweak the default settings for CouchDB so that we can access our CouchDB instance from an external environment.

Configuring CouchDB

Recall from the last post that the configuration files for CouchDB are located in /usr/local/etc/couchdb/. Open the local configuration file and take a look at the default settings:

$ sudo vim /usr/local/etc/couchdb/local.ini

In the [httpd] section, you’ll notice the setting for the default port that is used to connect to CouchDB – 5984. You’ll also note the bind_address setting. By default, CouchDB listens only on localhost – you can change this by altering the value of bind_address to a publicly resolvable IP address (you may need to uncomment this setting as well).

However, before proceeding please note that CouchDB does not yet have a built in security model, so anyone that can access the IP address in the configuration file can potentially access your CouchDB instance. We’ll need to take some steps to restrict access to our CouchDB instance – there are several ways of doing this.

Once you have your CouchDB instance up and running, you can create a database in one of two ways. The first, and easiest, is simply to use the curl command. You create a database in CouchDB by using the HTTP PUT method:
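As a rough illustration of the PUT request described above, here is a Python sketch that builds (but does not send) such a request using only the standard library. The host, port and function name are assumptions for the example; the database name call_logs matches the one used later in this post.

```python
from urllib.request import Request


def create_database_request(base_url, db_name):
    """Build an HTTP PUT request that asks CouchDB to create a database."""
    return Request(f"{base_url.rstrip('/')}/{db_name}", method="PUT")


# Assumes CouchDB is listening on its default port on the local machine.
req = create_database_request("http://localhost:5984", "call_logs")
print(req.get_method(), req.full_url)

# Passing req to urllib.request.urlopen() would actually send it.
```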

Now that we have an initial database in our CouchDB instance, let’s build a simple Tropo application that will populate it with records (or documents in CouchDB parlance):

This simple application is a basic auto attendant. It asks the caller for a 4-digit extension and then transfers them to a 10-digit PSTN number. At the end of the call, we write a very simple call log document to our new call_logs database using the HTTP POST method.

(One small side note – you can use either the POST or PUT methods to insert a document into a CouchDB database. However, using PUT assumes you want to assign a specific document ID to your document. When you use HTTP POST, CouchDB will automatically assign a document ID. For now, we’ll keep things simple and use POST.)
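The difference between the two methods can be sketched as follows. This is an illustrative Python snippet, not the Tropo application from this post: the function name, the document fields and the example document ID are made up, and the requests are only constructed, not sent.

```python
import json
from urllib.request import Request


def insert_document_request(db_url, doc, doc_id=None):
    """Build a request that stores `doc` in the database at `db_url`.

    Without a doc_id the request is a POST to the database URL and CouchDB
    picks the document ID; with a doc_id it is a PUT to /<db>/<doc_id>.
    """
    body = json.dumps(doc).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if doc_id is None:
        return Request(db_url, data=body, headers=headers, method="POST")
    return Request(f"{db_url}/{doc_id}", data=body, headers=headers, method="PUT")


# Hypothetical call log document.
log = {"caller_id": "2125550100", "extension": "1234"}
db = "http://localhost:5984/call_logs"

auto_id = insert_document_request(db, log)
own_id = insert_document_request(db, log, doc_id="call-0001")
print(auto_id.get_method(), auto_id.full_url)
print(own_id.get_method(), own_id.full_url)
```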

Much of the functionality in this simple app is just stubbed out for now – i.e., the getPhoneNumberByExtension() method – we’ll build more of this out in later posts.

Modify this file by adding your instance-specific details to the constant declarations at the top. Do also note that the last two constants can remain blank for now.

When you load this file up on Tropo and make a test call, you will see your call log document is inserted into the call_logs database. The structure of the document is pure JSON, which is supported quite nicely in PHP (and most every other language that can run on Tropo as well).

In the next post, we’ll examine CouchDB design documents in more detail and modify our simple demo application to get a list of extensions from another CouchDB database and parse the JSON data structure in the getPhoneNumberByExtension() method.