What I Did

My project idea was to develop a voice menu interface to the Archive.org live music archive using Twilio. You would call a particular phone number and be presented with a voice menu, with options to listen to the Archive.org top music pick or to perform a search.

Core Technology

Archive.org

Archive.org exposes a very nice, hacker-friendly API. It is fairly well-documented here. I only encountered two gotchas: the API to the main page does not return valid JSON, so it must be parsed using JavaScript’s eval; and the query API is based on Lucene query syntax, which I did not find documented anywhere.
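A minimal sketch of how these two gotchas might be handled in Node.js. The helper names (parseLenient, buildSearchUrl) are hypothetical, and the advancedsearch.php endpoint, the etree collection name, and the requested fields are assumptions about Archive.org's public search interface, not details taken from this project. The sketch also swaps a bare eval for Node's vm module, which evaluates the text in a separate, empty context.

```javascript
const vm = require('vm');

// Parse a response that is a JavaScript object literal but not strict JSON
// (for example, unquoted keys or trailing commas). Strict JSON is tried
// first; otherwise the text is evaluated in a fresh, empty context.
// Note: vm is not a security boundary, so only use this on trusted input.
function parseLenient(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return vm.runInNewContext('(' + text + ')', {}, { timeout: 1000 });
  }
}

// Build a search URL from Lucene-style "field:value" clauses joined by AND.
// Assumption: "etree" is the collection holding the live music archive.
function buildSearchUrl(term, rows) {
  const escaped = term.replace(/"/g, '\\"');
  const query = 'collection:etree AND title:"' + escaped + '"';
  const params = new URLSearchParams();
  params.set('q', query);
  params.append('fl[]', 'identifier'); // fields to return (assumed names)
  params.append('fl[]', 'title');
  params.set('rows', String(rows));
  params.set('output', 'json');
  return 'https://archive.org/advancedsearch.php?' + params.toString();
}
```

Calling buildSearchUrl('grateful dead', 5) yields a URL whose q parameter decodes to collection:etree AND title:"grateful dead".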

Twilio

Developing a Twilio telephony application is just like developing a regular web application. When you register with Twilio, they assign you a phone number, which you can then point to a web server URL. When someone calls the number, Twilio performs an HTTP request (either GET or POST, depending on how you have it configured) to the server which you specified.

Instead of returning HTML, you return TwiML. Each tag in a TwiML document is a verb which tells Twilio what to do. TwiML documents can be modelled as state machines, in that there’s a particular flow between elements. For certain tags, Twilio will simply flow to the next tag after performing the action associated with that tag; however, for other tags, Twilio will perform a request (again, either GET or POST) to a URL specified by the tag’s “action” attribute, and will execute the TwiML document returned by that request. This is analogous to submitting a form in HTML.
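To make the two kinds of flow concrete, here is a small sketch of a function that builds a TwiML menu. The function name rootMenuTwiml and the spoken wording are hypothetical. The Gather verb submits to its action URL once a digit is pressed, much like a form; if the caller presses nothing, Twilio falls through to the next verb, which here is a Redirect back to the menu.

```javascript
// Build the TwiML for a one-digit voice menu.
// <Gather> collects key presses and then requests the URL in its "action"
// attribute, passing the keys pressed in a "Digits" parameter.
// <Redirect> is only reached if <Gather> finishes without any input.
function rootMenuTwiml() {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<Response>',
    '  <Gather numDigits="1" action="/number_received" method="GET">',
    '    <Say>Press 1 to hear the top live music pick. Press 2 to search.</Say>',
    '  </Gather>',
    '  <Redirect method="GET">/</Redirect>',
    '</Response>'
  ].join('\n');
}
```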

Each HTTP request performed by Twilio will submit some data, like the caller’s phone number and location, as well as a variable which allows the server to track the session.

There were a few instances of undocumented behaviour that I encountered, but overall developing a TwiML application was as easy as it sounds. After I had my node.js hosting set up, I had an initial demo working in less than an hour, in which the user could call in, and would be able to hear the archive.org live music pick. This was simply a matter of using Archive.org’s API to retrieve the URL to the file of the top live music pick, and passing this URL to Twilio in a <Play> element. Twilio was then able to stream the MP3 file directly from Archive.org.
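A rough sketch of what that initial demo's response might look like. The helper names (escapeXml, playTwiml, sendTwiml) are hypothetical, and the lookup of the MP3 URL through the Archive.org API is left out; the URL is simply passed in.

```javascript
// Escape characters that are not allowed inside XML text or attributes.
// URLs with query strings usually contain "&", which must become "&amp;".
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Build a TwiML document that tells Twilio to stream one audio file.
function playTwiml(mp3Url) {
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<Response><Play>' + escapeXml(mp3Url) + '</Play></Response>';
}

// Write a TwiML document to a Node.js HTTP response object.
function sendTwiml(res, twiml) {
  res.writeHead(200, { 'Content-Type': 'text/xml' });
  res.end(twiml);
}
```

In a plain Node.js server, sendTwiml(res, playTwiml(url)) would be called from the request handler once the URL of the pick is known.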

Main Technical Contribution: Using SCXML and SCION to Model Navigation in a Node.js Web Application

I developed the application using Node.js and SCION, an SCXML/Statecharts interpreter library I’ve been working on. In addition to providing a very small module for querying the archive.org API using Node.js, I feel the main technical contribution of this project was using SCXML to model web navigation, and I will elaborate on that contribution in this section.

Using Statecharts to model web navigation is not a new idea (see StateWebCharts, for example); however, I believe this is the first time this technique has been used in conjunction with Node.js.

From a high level, SCXML can be used to describe the possible flows between pages in a Web application. SCXML allows one to model these flows explicitly, so that every possible session state and the transitions between session states are well-defined. Another way to describe this is that SCXML can be used to implement routing which changes depending on session state.

A web server accepts an HTTP request as input and asynchronously returns an HTTP response as output. Each HTTP request can contain parameters, encoded as query parameters on the URL in the case of a GET request, or as POST data for a POST request. These parameters can contain data that allows the server to map the HTTP request to a particular session, as well as other data submitted by the user.

These inputs to the web server were mapped to SCXML in the following way. First, an SCXML session was created for each HTTP session, such that subsequent HTTP requests would be dispatched to this one SCXML session, and this SCXML session would maintain all of the session state.

Each HTTP request was turned into an SCXML event and dispatched as input to the SCXML session corresponding to the session of that HTTP request. An SCXML event has “name” and “data” properties. The URL of the request was used as the event name, and the parsed query parameters were used as the event data. Furthermore, the Node.js HTTP request and response objects were also included as event data.
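The mapping described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the function names, the request and response property names on the event data, and the use of Twilio's CallSid parameter as the session key are all assumptions. Only GET query parameters are handled here.

```javascript
// One state machine session per call, keyed by a session identifier.
const sessions = new Map();

// Return the existing session for this identifier, or create one.
// createSession is a caller-supplied factory (hypothetical).
function getSession(sessionId, createSession) {
  if (!sessions.has(sessionId)) {
    sessions.set(sessionId, createSession());
  }
  return sessions.get(sessionId);
}

// Turn an incoming HTTP request into an event with "name" and "data".
// The path becomes the event name and the query parameters become the data;
// the request and response objects ride along so actions can reply.
function requestToEvent(req, res) {
  // req.url is relative, so a placeholder base is needed to parse it.
  const url = new URL(req.url, 'http://localhost');
  const data = Object.fromEntries(url.searchParams);
  data.request = req;
  data.response = res;
  return { name: url.pathname, data: data };
}
```

A request handler would then look up the session with getSession(event.data.CallSid, factory) and dispatch the event to it.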

In this implementation, SCXML states were mapped to individual web pages, which were returned to the user on the HTTP response.

The SCXML document modelling navigation can be found here. Here is a graphical rendering of it (automatically generated using scxmlgui):

Note that the transition conditions do not appear in the above diagram, so I would recommend reading the SCXML document as well as the diagram.

In this model, the statechart starts in an initial_default state in which it waits for an init event. The init event is used to pass platform-specific APIs into the state machine. After receiving the init event, the statechart will transition to state waiting_for_initial_request, where it will wait for an initial request to URL “/”. After receiving this request, it will transition to state root_menu. Of particular interest here are the actions in the <onentry> tag. The TwiML document to be returned to the user is inlined directly as a custom action within <onentry>, and is executed by the interpreter by writing that document to the node.js response object’s output stream. This document will tell Twilio to wait for the user to press a single digit, and to submit a GET request to URL “/number_received” once the digit has been entered.

There are three transitions originating from root_menu. The first targets state playing_picks, the second targets state searching, and the third loops back to state root_menu. The first two transitions have a cond attribute, which is used to inspect the data sent with the request. So, for example, if the user presses “1”, Twilio would submit a GET request to URL “/number_received?Digits=1” (along with other URL parameters, which I have omitted for simplicity). This would be transformed into the SCXML event {name : '/number_received', data : { Digits : '1' }}, which would then activate the transition to playing_picks. On entering playing_picks, the system would call a JavaScript function that would query the Archive.org API to retrieve the URL to Archive.org’s top song pick, and would output a TwiML document on the HTTP response object which would contain the URL to that song.

If the user pressed a “2” instead of a “1”, then the cond attribute would cause the statechart to activate the transition to state searching instead of playing_picks. If the user pressed anything else, or attempted to navigate to any other URL, then the wildcard “*” event on the third transition would simply cause the statechart to loop back to root_menu.
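The selection logic for the transitions out of root_menu can be approximated in plain JavaScript. This is a hand-rolled sketch for illustration only; it does not use SCION or SCXML, and the table layout and the nextState function are hypothetical. Transitions are tried in order and the first one whose event name and condition both match wins, which mirrors the document-order priority of SCXML.

```javascript
// Transition table for root_menu, in priority order.
const transitions = {
  root_menu: [
    {
      event: '/number_received',
      cond: function (e) { return e.data.Digits === '1'; },
      target: 'playing_picks'
    },
    {
      event: '/number_received',
      cond: function (e) { return e.data.Digits === '2'; },
      target: 'searching'
    },
    // Wildcard: any other event loops back to the menu.
    { event: '*', target: 'root_menu' }
  ]
};

// Return the state reached from `state` when `event` arrives.
// If no transition is enabled, the machine stays where it is.
function nextState(state, event) {
  const candidates = transitions[state] || [];
  for (const t of candidates) {
    const nameMatches = t.event === '*' || t.event === event.name;
    const condHolds = !t.cond || t.cond(event);
    if (nameMatches && condHolds) {
      return t.target;
    }
  }
  return state;
}
```

For example, nextState('root_menu', { name: '/number_received', data: { Digits: '2' } }) selects the second transition and returns 'searching'.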

The rest of the application is implemented in a similar fashion.

Comments and Critiques

While I feel this effort was successful overall, and demonstrates a technique that could be used to develop larger and more complex applications, there are ways I would like to improve it.

First, while I feel that being able to inline the response as custom action code in the entry action of a state is a rather elegant approach, it would be useful to make the inline XML templated so that it can use data from the system’s datamodel.

Second, there’s a disconnect between the action specified in the returned document (the URL to which the document will be submitted), and the transitions originating from the state corresponding to that document. For example, it would be possible to return a document with a form with action attribute “foo”, and have a transition originating from that state with event /bar. This may not be a desirable behaviour, as there’s no legal way for the returned web page to submit to URL “/bar”. The action attribute on the returned form can be understood as specifying the SCXML events that that page will be able to generate, or the possible flow between pages within the web application, and so it might be better to somehow model the connection between returned form actions and transition events more explicitly.

Third, there are several features of SCXML that this demo did not make use of, including state hierarchy, parallel and history states. Uses for these features may emerge in the development of a more complex web application.

Fourth, there is currently quite a lot of logic in the action code called by the SCXML document. This includes multiple levels of asynchronous callbacks. This is not an ideal approach, as it means that even after an SCXML macrostep has ended, a callback within the action code executed within that macrostep may be asynchronously called by the environment. I feel this breaks SCION’s interpretation of Statecharts semantics, and may lead to unexpected behaviour. A better approach would be to feed the result of each asynchronous callback back into the state machine as an event, and use a sequence of intermediate states to model the flow between callbacks.
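The proposed approach might look something like the following sketch. It is a hand-rolled machine, not SCION, and the state name fetching_pick, the event names pick.loaded and pick.error, and the createPickMachine function are all hypothetical. The point is that the callback does no work of its own: it only sends an event, and an intermediate state represents the time spent waiting.

```javascript
// fetchTopPick is an injected function taking a Node-style callback
// (err, url). It stands in for the Archive.org lookup.
function createPickMachine(fetchTopPick) {
  let state = 'root_menu';
  const played = [];

  function send(event) {
    if (state === 'root_menu' && event.name === 'play') {
      // Enter the intermediate state before starting the async work.
      state = 'fetching_pick';
      fetchTopPick(function (err, url) {
        // The callback only reports the outcome back as an event.
        if (err) {
          send({ name: 'pick.error', data: err });
        } else {
          send({ name: 'pick.loaded', data: url });
        }
      });
    } else if (state === 'fetching_pick' && event.name === 'pick.loaded') {
      state = 'playing_picks';
      played.push(event.data);
    } else if (state === 'fetching_pick' && event.name === 'pick.error') {
      state = 'root_menu';
    }
    // Events that do not match the current state are ignored.
  }

  return {
    send: send,
    getState: function () { return state; },
    played: played
  };
}
```

Because every reaction to the callback goes through send, the machine's state after each step is explicit, and a late callback arriving in the wrong state is simply ignored rather than acting on stale assumptions.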

Fifth, and finally, I had a few technical difficulties with SCION, in that node.js’s require function was not working correctly in embedded action code. I worked around this by passing the required APIs around as a single object between functions in action code. I fixed this issue in SCION today.

Conclusion

The finished application can be demoed by calling (315) 254-2188. I’m going to leave it up until the account runs out of money, so feel free to try it.

I had a great time at the Hackathon, and I feel my participation was productive on multiple levels. I’m looking forward to further researching how SCXML and SCION can be applied to web application development.