SPIN Project Blog

Diary of a diploma thesis at the Institute for Informatics at the University of Munich, Germany.

Wednesday, October 04, 2006

First successful test run of the prototype implementation

Today I went on a walk through Petershausen, for the first time accompanied by the prototype implementation of a location-aware "tourist guide" with a speech interface. The whole thing runs on my HTC Magician aka T-Mobile MDA compact and takes approx. 15 MB for the whole package, i.e. including positioning and speech interaction.

I walked up to the station and the guide told me "Listen User, the station is now visible." OK, that might have been an obvious thing to say. But when I asked "Guide, tell me something about the station.", the guide responded: "The Petershausen train station connects Petershausen to Munich and Pfaffenhofen an der Ilm. It is well communicated by suburban trains, regional and national trains including the Intercity Express." Quite impressive for a first run.

Sunday, August 27, 2006

Web2.0 Workshop at the CDTM

Are you interested in topics surrounding the latest web buzzword "web2.0"? Then the upcoming workshop on social online services and entrepreneurship might be of interest to you, too. The Center for Digital Technology and Management (founded by Technische Universität München and Ludwig-Maximilians-Universität München) has invited Germany's leading entrepreneurs in the web2.0 community to discuss opportunities and threats in this field. Speakers include

Thursday, August 24, 2006

SPIN's now pro-active

Pro-activity is the word of the day. SPIN is now capable of contacting the user in a server-push-like way. We use AJAX for the asynchronous communication and XHTML+Voice as the host language. In one of my last posts I already explained how to use AJAX for VoiceXML. Our latest demo shows how a speech-based service can "get back to the user" in a pro-active sense, i.e. without being directly triggered by a user action. In this demo the user starts the dialogue with the system, but keep in mind that this is not a must. The AJAX-based pro-activity feature allows the system to contact the user at any given point in time.

Wednesday, August 23, 2006

SPIN now runs on a PDA

We're really speeding up now. I just finished work on a spike that was meant to show the feasibility of running a simple voice dialogue on my (already old) HTC Magician aka HTC PDA compact aka T-Mobile MDA compact. The successful demo can be found here. This spike uses the following infrastructure:

HTC Magician

ACCESS Systems’ NetFront Multimodal Browser for PocketPC 2003 (can be found at IBM and seems to be the only free voice browser for PocketPCs as of August 2006. Actually it took me some time to find this great piece of software...)

XHTML+Voice dialogues

Apache Tomcat web server

JSP files for handling requests

LMU's TraX client for positioning and position updates

LMU's TraX server for position services

So by now, I have reached most of SPIN's technological requirements, namely

Spoken dialogue on a mobile terminal

Combination with location-based services

Dynamic generation of dialogue steps by the server (on request by the client)

Monday, August 21, 2006

Using AJAX for simulating proactive spoken dialogue

As I have written in my last post I am currently experimenting with AJAX for VoiceXML. Today, I finished my first spike on that topic and I think it is worth sharing.

My idea was to use AJAX to asynchronously 'push' updated data from the server to the client in order to prompt the user with it, i.e. doing pretty much the same thing a web developer would do for the visual web when 'pushing' data from the server to the web browser. AJAX has all the nice features in place to realize the client-server communication, and since XHTML+Voice lets us call VoiceXML prompts from within JavaScript code, we can integrate AJAX with VoiceXML in a straightforward way:

Step One: Remembering the Last Server Response

In the head of our XHTML document we will define the following JavaScript code. Our global variable ajax_response will be used to contain the most current response from the remote server. old_ajax_response will save a previous copy of ajax_response that will be used to see if the content delivered by the server has changed. Some words on this later.

var ajax_response = '';
var old_ajax_response = '';

Next, we will provide a simple getter for ajax_response:

function getPrompt() {
  return ajax_response;
}

Step Two: The VoiceXML Output Form

Since we are interested in presenting the updated server data to the user, we will use a simple VoiceXML form containing a simple prompt:

<form xmlns="http://www.w3.org/2001/vxml" id="myPrompt">
  <block>
    I have just received an update for you
    <value expr="getPrompt();"></value>
  </block>
</form>

The form simply asks the getPrompt() method for the latest data that has been received from the server and uses it to prompt the user.

Step Three: Simulating Server Push

Next, we need a method for making a request to our remote JSP script that is generating the server-side content.
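A minimal sketch of such a request function, assuming the browser exposes the standard XMLHttpRequest object. The URL update.jsp is a hypothetical placeholder for the JSP script, and handing the response text to myHandler as an argument is an assumption about how the callback is wired:

```javascript
// Hypothetical URL of the JSP script that generates the server-side content.
var UPDATE_URL = 'update.jsp';

// Ask the server for its current data and hand the reply to myHandler.
function makeRequest() {
  var xhr = new XMLHttpRequest();
  xhr.onreadystatechange = function () {
    // readyState 4: response complete; status 200: HTTP OK.
    if (xhr.readyState === 4 && xhr.status === 200) {
      myHandler(xhr.responseText); // assumed callback signature
    }
  };
  // The timestamp query parameter keeps the browser from answering
  // the poll out of its cache.
  xhr.open('GET', UPDATE_URL + '?t=' + new Date().getTime(), true); // true = asynchronous
  xhr.send(null);
}
```

Because the request is asynchronous, makeRequest returns immediately and the voice dialogue is not blocked while the browser waits for the server.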

The last function we need is the myHandler callback function that has been used in the above AJAX call. This simple method has the real magic in it. When the data that arrives from the server is new to the client, it uses the DOMActivate event to activate the myPrompt form.
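A minimal sketch of such a callback. It assumes that the handler receives the response text as its argument, that the browser lets script dispatch a DOMActivate event through the DOM Level 2 Events API, and that an element with the hypothetical id promptTrigger is wired to the myPrompt form via XML Events. The two globals from Step One are repeated so the sketch stands alone:

```javascript
var ajax_response = '';
var old_ajax_response = '';

// Dispatch a DOMActivate event at the element assumed to trigger the form.
function activatePrompt() {
  var target = document.getElementById('promptTrigger'); // hypothetical id
  var evt = document.createEvent('UIEvents');
  // Arguments: type, bubbles, cancelable, view, detail.
  evt.initUIEvent('DOMActivate', true, true, window, 1);
  target.dispatchEvent(evt);
}

// Store the server's reply and speak it only if it differs from the last one.
function myHandler(responseText) {
  ajax_response = responseText;
  if (ajax_response !== old_ajax_response) {
    old_ajax_response = ajax_response;
    activatePrompt();
  }
}
```

Comparing against old_ajax_response is what keeps the guide from repeating the same announcement on every poll.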

Now we have almost everything we need: We have the JavaScript in place that asks the server for an update, we know how to include this data in a VoiceXML form and now all that's left is telling the browser to poll the server for an update using an interval of our choice. We need to do this, because we are just simulating server push. Our client still needs to poll the server in order to receive updated data. This is done by adding the following line of JavaScript code to the above JS code, which is going to call the makeRequest method every ten seconds:

var theTimer = setInterval("makeRequest()", 10000);

Conclusion

The presented example shows how AJAX can be used to simulate a server push voice dialogue to the user. It is a proof-of-concept, nothing more. If you are going to use this for your applications you should make sure to only send as much data to the client as necessary. The example always loads a complete data fragment and compares it to the last one received. This is not network efficient, though. Incremental updates might be a better idea for your application.
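One possible way to make the updates incremental, sketched under assumptions that are not part of the example above: the client tells the server which version it has already seen, and the server answers with an empty body when nothing has changed. The since parameter, the version counter and the reply format "version, newline, text" are all hypothetical:

```javascript
// Build the poll URL, telling the server which version the client already has.
function buildPollUrl(baseUrl, version) {
  return baseUrl + '?since=' + encodeURIComponent(String(version));
}

// Parse a reply in the hypothetical format "<version>\n<text>".
// Returns null for an empty body (nothing new) or a malformed reply.
function parseUpdate(body) {
  if (body === '') {
    return null;
  }
  var newline = body.indexOf('\n');
  if (newline === -1) {
    return null;
  }
  var version = parseInt(body.substring(0, newline), 10);
  if (isNaN(version)) {
    return null;
  }
  return { version: version, text: body.substring(newline + 1) };
}
```

With this scheme an unchanged state costs only an empty response, and the client no longer has to compare complete data fragments.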

Saturday, August 19, 2006

AJAX for VoiceXML

I was thinking about how to enable AJAX for VoiceXML. This could help make voice interaction across mobile networks more efficient, as only the really necessary part of a dialogue would have to be transmitted. First, I tried to make AJAX work with my favourite VoiceXML Voice Browser OptimTalk. Unfortunately, this was not possible, because OptimTalk does not offer an XMLHttpRequest object and I didn't find a way to simulate this object with pure ECMAScript, i.e. without an external JRE.

My second try led me to Opera. Opera offers support for XHTML+Voice, which includes most parts of VoiceXML. Unfortunately, it removes some of the functionality that VoiceXML offers (e.g. the goto and exit elements). I perfectly understand that these elements are redundant, as one can use their native XHTML counterparts, but removing support for some VoiceXML elements keeps people like me from integrating existing VoiceXML dialogues directly into their multimodal applications.

During the next days I will try to experiment a little with the AJAX capabilities of Opera in combination with XHTML+Voice.