Archive for the ‘General’ Category

After fifteen years in speech, language and Internet technology, I’m about to make a big career shift. Three weeks from now, on February 22, I will join the ranks of the TeleAtlas/TomTom engineering force in the beautiful city of Ghent, on a permanent, full-time basis. My task: contribute to various process automation and improvement initiatives.

The reasons for this change are manifold, as they always are.

Firstly, I was craving to work again in an environment that blends an innovation culture with a clear international dimension. Over the last five and a half years, I have had the chance to work on a number of international opportunities, ranging from the Beavis and Butthead Hotline to a project for a speaker verification company in Ireland. But in between, and all too often, I had the feeling that I was missing out on much of the professional fun. This situation could not last forever.

Secondly, every now and then it’s good to enter a domain (geographical data management and applications) that is both new enough to be intellectually stimulating, and familiar enough to be digestible in a reasonably short time-frame. I’m very much looking forward to applying software development automation, process improvement and/or machine learning techniques in this new setting.

Thirdly, it will be great to have real colleagues again. However hospitable a customer’s working environment may be, a freelancer fundamentally stays an outsider; however amicable relations with partner companies may be, there is generally no Big Plan or Strategy guiding your actions in a direction that goes further than your next assignment.

For years I have been practising various aspects of agile software development, which is commonly used in speech technology projects. Short iterations, test-driven development, self-organising teams: been there, done that. But explaining the how and why to other people wasn’t always easy. So after a brief search, I decided to further consolidate my agile knowledge by taking the Certified ScrumMaster (CSM) course and exam. Which I did last week and today, respectively.

The 61 multiple-choice questions took about 35 minutes to answer. Thanks to an acceptable score, I was able to add another acronym to my CV. So if you’re looking for a CSM with MBA, MA and MSc degrees who is certified in VXML and knows how to program in PHP, among other languages, do send me an SMS.

The obvious answer is: it depends. The most important payback factors are timing, personal drive & involvement, and immediate on-the-job applicability. My activities today are largely based on the business plan I wrote as my final MBA project at Vlerick Leuven Ghent Management School. So the decision back in 2002 to pursue an MBA has certainly steered my life in its current direction. But that’s true for any decision, of course. Time is unidirectional, without backtracking, so what-if questions are largely irrelevant.

This being said, the philosophical answer is: if you feel you need an MBA, find a program that suits you, get qualified, and then go for it. Don’t let the future happen, make it happen.

Or, as William of Orange said: One need not hope in order to undertake; nor succeed in order to persevere.

According to a survey conducted by Unisys in seven European countries, 89% of the Belgian interviewees find “[it is] acceptable for a trusted organization such as [a] bank, credit card provider, health care provider or governmental organization to use biometrics such as [their] voice or fingerprints to verify [their] identity”. The Belgian respondents’ primary reasons for this overwhelmingly positive attitude towards biometrics are greater convenience (73%), better security (42%), higher speed of the identity verification process (45%) and privacy protection (15%).

Ranked by acceptance, Belgium is followed by France (86%), Sweden (81%), The Netherlands (73%), Germany (68%) and Italy (61%).

Two years ago, Dexia Bank of Belgium launched a speaker verification system for its private banking customers. My role in this project included general speech technology assistance to the project leader, expectation management, VUI development, technical integration of the speaker verification engine with the application, and usability testing.

Barely waiting for Microsoft and Tellme to return from their honeymoon, Google Labs recently launched Google Voice Local Search, an experimental 411 (directory assistance) service. For the moment, 1-800-GOOG-411 just offers US local business listings, directly accessible from any US phone. But with a Grandstream SIP phone, an Asterisk PBX and a gateway like FreeWorldDialup, this minor nuisance is quickly bypassed.

So instead of speculating whether, when and how Google will integrate the new service in its pay-per-click or pay-per-call advertising model, I just called 1-800-GOOG-411 for a quick try-out. Jingle Networks’ 1-800-FREE-411 service was chosen as Google’s sparring partner.

To make the test a bit more fun and real for myself, I decided to only search for US businesses that I have actually visited at some point in time. This way I not-so-randomly picked David’s World Famous catering service in Burlington, MA; the MIT COOP bookstore in Cambridge, MA; and the Starbucks on El Camino Real in Palo Alto, CA.

First some food from David’s World Famous. My call to 1-800-GOOG-411 was answered by a neutral-sounding male voice saying “calls recorded for quality”. Notice the absence of any verb? After two seconds I got a pre-recorded prompt “GOOG-411 experimental. What city and state?” My answer “Burlington, Massachusetts” was well recognized and explicitly confirmed by the system. To the next question “what business name or category?” I said “David’s World Famous”. There was a short database lookup and, 21 seconds into the call, I was presented with the top two results. I chose the first one and could have been connected directly to the catering service after 41 seconds, had I wanted to. Instead I asked for more address details, which another male TTS voice read aloud twice, presumably to give me a chance to jot them down. The phone number was read correctly, in a conversational, natural way. After this self-chosen digression, I was connected to the David’s World Famous answering machine – not a surprise, really, as the local time in Massachusetts at that moment was well after midnight.

I then tried the same procedure through 1-800-FREE-411, or at least that’s what I had in mind. “Welcome to 1-800-FREE-411! Press 9 now to get the last number you requested”, said a female pre-recorded voice. I wasn’t interested in that, so I kept silent. After 12 seconds, a first commercial invited me to take part in Stonebridge Life’s $25,000 give-away. Er, maybe some other time. Thirty-one seconds into the call, I got a “What city and state, please?” prompt, and said “Burlington, Massachusetts”. There was no explicit confirmation; instead the system immediately continued with “Are you looking for a business, government or residential listing?” “A business listing”, I said. Again no confirmation, but another prompt: “Would you like to search by name or by category?” “By name”, I answered. “OK, what listing?” “David’s World Famous”, I said. Now things became funny. The call was sponsored by “Girls Gone Wild”, who offered me two videos for free, meaning I just had to pay shipping and handling costs. Yeah, right. Not that I dislike oriental food, but hot ‘n’ spicy DVDs were not exactly what I had asked for. Anyway, back to the call. A flat female voice brought me down to earth with the message “the number you requested is seven eight one – two two nine – eight seven eight six”. You would think any decent VUI designer knows by now that US phone numbers aren’t read out this way, but apparently not at 1-800-FREE-411. What’s worse, after I’d heard the requested phone number, I was presented with two options: hear it again, or get connected to … Girls Gone Wild. While I was waiting for the obvious third option that would connect me to David’s World Famous, the system again threw the flat-spoken number at me, and prompted me for yet another repeat. Just when I thought I was finally going to be connected, the system thanked me for calling, did some more publicity for its own website “to learn about other special offers” and then hung up.
Two minutes and five seconds had gone by, and I was still left with an empty stomach.

After the stomach, time for the brain. I called 1-800-GOOG-411 again, now searching for the MIT Coop bookstore. The speech recognition of “Cambridge, Massachusetts” went smoothly, as expected. Alas, the business name turned out to be more problematic, with its two abbreviations. “MIT” stands for Massachusetts Institute of Technology, and is customarily pronounced one letter at a time: M-I-T. The word “Coop”, although an abbreviation of “cooperative”, is pronounced as an acronym over there, rhyming with “loop” or “soup”. Being a foreigner, I pretended not to know this and said “M-I-T Co-op” at first. Successive attempts to recognize this same pronunciation generated a “no match” leading to a “try again” prompt, and then a low-confidence false match with an explicit confirmation prompt. The system then presented me with some indirect matches from its database, all of which were irrelevant. After the fourth list item, the Google voice suggested starting all over again, so that’s what I did and said. I now pronounced MIT as an acronym, sounding like the German preposition “mit”, and stuck to “Co-op” for the second part. Apparently I guessed right, because the system literally confirmed my incorrect pronunciations and offered me a short list of three MIT Coop locations. I chose the second one, and after one minute and fifty-five seconds, I was connected to the answering machine of the MIT Coop on Kendall Square in Cambridge, Massachusetts.

My first search for the MIT Coop at 1-800-FREE-411 failed immediately with the message “We’re sorry but no live operators are available at this time. Please try again later”. For an automated system, that’s an illogical answer, especially since 1-800-FREE-411 explains in its own FAQ that they are “no longer supporting live operator services from certain localities”. Subsequent calls [1,2,3,4] did go through, but they all suffered from no matches and false matches, irrespective of my pronunciation of “MIT Coop”. I couldn’t verify whether “MIT Coop” was in-grammar or out-of-grammar, but the corresponding web search did return one entry. On the positive side, 1-800-FREE-411 transfers callers to an operator after two failed recognition attempts.
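The error recovery behaviour in these calls (a “try again” re-prompt after a no match, an explicit confirmation after a low-confidence result, and, at 1-800-FREE-411, a transfer to an operator after two failed attempts) follows a common VUI pattern. The minimal Python sketch below illustrates that pattern. The threshold values, class names and function name are hypothetical and do not reflect either service’s actual implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    ACCEPT = "accept"      # use the result without asking
    CONFIRM = "confirm"    # explicit confirmation prompt ("Did you say ...?")
    REPROMPT = "reprompt"  # "try again" prompt
    OPERATOR = "operator"  # transfer to a live operator


@dataclass
class RecognitionResult:
    text: Optional[str]  # None models a "no match"
    confidence: float    # recognizer confidence, 0.0 .. 1.0


# Hypothetical values; real systems tune these on field data.
ACCEPT_THRESHOLD = 0.85
CONFIRM_THRESHOLD = 0.50
MAX_FAILURES = 2  # escalate after two failed attempts


def next_action(result: RecognitionResult, failures: int) -> Action:
    """Pick the next dialog step for one recognition attempt.

    `failures` counts the failed attempts *before* this one. A "no" to a
    confirmation prompt would usually count as a failure too; that is
    not modelled here.
    """
    if result.text is not None and result.confidence >= ACCEPT_THRESHOLD:
        return Action.ACCEPT
    if result.text is not None and result.confidence >= CONFIRM_THRESHOLD:
        return Action.CONFIRM
    # A no match, or a match too weak even to confirm, is a failed attempt.
    if failures + 1 >= MAX_FAILURES:
        return Action.OPERATOR
    return Action.REPROMPT


if __name__ == "__main__":
    print(next_action(RecognitionResult("MIT Coop", 0.62), failures=0))
    print(next_action(RecognitionResult(None, 0.0), failures=1))
```

The design choice worth noting is the middle band: instead of a single accept/reject cut-off, a confirmation zone lets the system recover from uncertain results without immediately re-prompting or giving up.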

My last search for Starbucks Coffee on El Camino Real in Palo Alto, California went without a glitch at both 1-800-GOOG-411 and 1-800-FREE-411. With Google, I was transferred after 45 seconds; with the other system I got to hear the complete number after one minute and fifty seconds. This time the irrelevant ads were from InCharge Debt Solutions and American Express, respectively.

Before we draw some conclusions, first a warning: no speech recognition system should ever be evaluated on the basis of a few calls and utterances made by a single speaker over a single channel. To do so would not only be unfair, but also unscientific and possibly completely wrong. This being said, my first impression is that Google’s potential entry in the automated DA space should be a major concern for all other players on the US 411 market. As could be expected, the 1-800-GOOG-411 voice user interface is clean and snappy, with various error recovery mechanisms already in place; speech recognition looks good; and the direct transfer to the requested number is an obvious functionality that’s blatantly missing with 1-800-FREE-411. So looking from the technology side, Google seems to know what they’re doing – hardly a surprise.

A bigger challenge for Google or any competitor will be to balance the economic aspects of sponsored local audio ads (remember the DMarc acquisition) with the human interaction limitations of a spoken phone interface. A caller’s tolerance for inserted ads is inversely proportional to the degree of certainty with which the business or category name is entered. If I ask for Starbucks, I want Starbucks’ phone number; but if I just want coffee, multiple relevant results are expected, including sponsored transfers and special offers. With its army of natural language processing specialists, the richness and vastness of its data, and its very deep pockets, Google is well placed to shake up the US Directory Assistance industry, if it wants to. Unless it has other priorities, with even bigger returns.

Is it that spring is in the air? Barely two weeks after Nuance Communications announced the acquisition of hosted speech application provider BeVocal for approximately $140 million, Microsoft yesterday announced it is to acquire Tellme Networks for an undisclosed sum, with estimates ranging as high as $800 million to $1 billion. According to the acquiring companies’ respective press releases, the ability to offer speech-enabled solutions to mobile carriers and their billions of customers emerges as one of the prime drivers behind both deals.

One of the killer applications for people on the go could well be mobile search. Which immediately brings us to Google. Last year the quintessential search company acquired DMarc Broadcasting for $102 million in cash, plus possible earn-outs totalling up to $1.13 billion. Too expensive a toy to simply put aside, as Donna Bogatin already pointed out. One explanation of why the integration is taking so long could be this: what if Google wasn’t only interested in playing spoken ads to the good old radio audience? Indeed, they could just as well offer those same tunes – if they’re short and catchy enough – to click-to-call customers over VoIP, or better … to mobile phone users. But how can Google physically interact with these users?

Well, maybe Voxeo can lend them a hand. For European customers, a Google-Voxeo alliance could bring voice services to the masses via Voxeo’s partner MAP Telecom. After Tellme’s short-lived European adventure in 2001-2002, that would be big news.

The article opens rather spectacularly with the statement by Andrew Moloney, head of international marketing at RSA Security, that innovative, entrepreneurial fraudsters are moving their criminal activities from online banking to phone banking. To counter this new form of fraud, financial institutions increasingly base their security not only on what their customers know or have, but also on what they are. Enter voice biometrics. The author mentions the two publicised cases from the Low Countries that have been covered in previous posts on this weblog: ABN AMRO in The Netherlands and Dexia in Belgium. As frequent readers of this blog know, I have had the pleasure of contributing to the latter project.

Whereas the RSA Security executive unsurprisingly stresses the security aspects, my personal contribution to the IET article focused on the convenience benefits for customers. In a previous weblog article I explained that from a narrow technology-only view on voice biometrics, a heart-rending trade-off between security and convenience seems inevitable. The real value of the IET article is that Andrew Moloney shows a way out of this dilemma.

Mr. Moloney is quoted as saying that there is probably a 10% level of [false] reject rate (my emphasis). Note that this false reject rate (FRR) figure means nothing if we don’t know the corresponding false accept rate (FAR). But for the sake of argument, let’s assume the FAR is at an acceptable level (as defined by the financial institution, based on a policy decision, a given accept/reject threshold and test results). Now, to lower the 10% FRR – indeed unacceptable in large-scale roll-outs – while keeping the FAR at a fixed low level, the RSA Security executive’s strategy of framing the voice biometrics application in a broader security and convenience perspective is absolutely right. Mr. Moloney explains that by looking at a genuine caller’s past usage patterns, it becomes possible to factor more security-related attributes into the final accept/reject decision. How can this work?

My interpretation is that at first, the pure voice biometrics threshold is lowered. As a result, FRR goes down, while FAR goes up – that’s the name of the trade-off game. But to compensate for this temporary loss of security, the call’s actual (non-voice-related) attributes are then compared with the expected attributes as learnt from the (assumed) genuine customer’s past usage patterns: filtering out abnormal behaviour brings the FAR down again. In the end, FRR goes down, while FAR stays stable at an acceptable level. So everyone wins.
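To make this interpretation concrete, here is a minimal Python sketch. The voice scores and usage-pattern flags are made up, the two thresholds (0.60 and 0.50) are arbitrary, and the simple AND rule (accept only if both the voice score and the usage check pass) is just one possible way of combining the two signals. It is not RSA Security’s or any vendor’s actual method.

```python
from typing import List, Tuple

# A trial is (voice_score, usage_matches_profile). All values below are
# hypothetical and only serve to illustrate the FRR/FAR trade-off.
Trial = Tuple[float, bool]

GENUINE: List[Trial] = [
    (0.92, True), (0.88, True), (0.81, True), (0.77, True), (0.73, True),
    (0.69, True), (0.66, True), (0.61, True), (0.56, True), (0.41, True),
]
IMPOSTOR: List[Trial] = [
    (0.55, False), (0.53, False), (0.51, False), (0.49, True), (0.44, False),
    (0.38, False), (0.33, True), (0.29, False), (0.21, False), (0.15, False),
]


def accept(trial: Trial, threshold: float, use_behaviour: bool) -> bool:
    """Accept if the voice score clears the threshold and, optionally,
    the call's non-voice attributes match the caller's usage profile."""
    score, usage_ok = trial
    return score >= threshold and (usage_ok or not use_behaviour)


def rates(threshold: float, use_behaviour: bool) -> Tuple[float, float]:
    """Return (FRR, FAR) over the hypothetical genuine and impostor trials."""
    false_rejects = sum(not accept(t, threshold, use_behaviour) for t in GENUINE)
    false_accepts = sum(accept(t, threshold, use_behaviour) for t in IMPOSTOR)
    return false_rejects / len(GENUINE), false_accepts / len(IMPOSTOR)


if __name__ == "__main__":
    print("strict, voice only    :", rates(0.60, use_behaviour=False))
    print("relaxed, voice only   :", rates(0.50, use_behaviour=False))
    print("relaxed, plus behaviour:", rates(0.50, use_behaviour=True))
```

With these made-up numbers, relaxing the voice threshold from 0.60 to 0.50 halves the FRR (from 0.2 to 0.1) but raises the voice-only FAR from 0.0 to 0.3; adding the usage-pattern check brings the FAR back to 0.0. In this toy data every genuine caller behaves normally; in practice some will not, so the extra check would claw back part of the FRR gain.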

First some lifestyle news: true alpha-geeks nowadays run Unix-based Mac OS X on their laptop. If you’re an über-alpha-geek like Damian Conway, you then use VIM – a text-based editor – on top of Mac OS X for your presentation. On the vestimentary side, form still follows function, so printed t-shirts, jeans & trainers rule, suits suck. Unless you’re the CEO of MySQL or an anti-software-patent lobbyist. Ponytails are finally on the way out. Unless you’re male.

On Tuesday, Tim O’Reilly opened the Open Source Convention with the expectedly provocative statement that open source licenses have become obsolete. The service model present in many Web 2.0 initiatives has turned software into something that is performed rather than distributed. Google, Yahoo, Salesforce.com and the like indeed don’t sell us software, they provide services. On the level playing field created by open source – and therefore by definition commoditized – software, the competitive advantage must come from network effects and operational excellence. The arrival of open APIs over the last few years has spurred development of numerous mashups that mix and repurpose various sources of open data.

Brian Suda gave a quick overview of Microformats, a set of “simple conventions for embedding semantics in HTML to enable decentralized development”. I had a “so what?” reaction at first in the sense that explicitly tagging addresses, calendar events, and other formats has been done since ancient SGML and XML times. What’s new, as I understand it, is that explicit Microformat tagging within (X)HTML or XML allows for a richer reader experience directly implemented in the web browser. The Ajax trend will only reinforce this: for example, it will become customary to highlight a tagged name and email address, and drag them directly into an address book.

Greg Stein, chairman of the Apache Software Foundation and Engineering Manager at the Google Open Source Programme Office, presented Google’s involvement in open source software. Apart from organizing the Summer of Code, Google has recently launched free Project Hosting. Greg described the service as similar to tigris.org or SourceForge.net, but simpler and easier to use. Google’s Project Hosting service is powered by Subversion, which has overtaken CVS as the version control system of choice for hackers worldwide.

In Tuesday’s keynote sessions, Steve Coast from the OpenStreetMap project complemented Tim O’Reilly’s opening talk by pointing out that whereas open systems are a done deal, open data are not. He made a public appeal to the audience to go out on the streets and gather geographical data by themselves. The need for public geodata is particularly high in Europe, where taxpaying Internet entrepreneurs pay their government twice: first to have geodata collected with taxpayers’ money, and then again to buy the right of access to the data. Jo Walsh, another advocate for open geodata and present at EuroOSCON, wrote an interesting article about this issue half a year ago.

The second keynote session on Tuesday was by Adrian Holovaty from WashingtonPost.com, in his spare time lead developer of Django. He is the man behind Faces of the Fallen, ChicagoCrime.org and the US Congress Votes database, websites that make a lasting impression on anyone who values transparency of public policy and information. Adrian Holovaty made a call to action for all hackers to practise journalism via computer programming, as he called it. More details on Adrian Holovaty’s quest for transparency can be found in this interview recorded by Robert Niles three months ago.

Actual authors (like Kathy Sierra & Bert Bates) and would-be writers (like myself) then attended Mike Hendrickson’s session on Content 2.0. Mike is a publisher at O’Reilly, which means he has a say in what gets published and what doesn’t. The bottom line of his talk is that modern publishing is like agile software development: an incremental process with multiple feedback loops. Timing is essential: don’t wait until you’ve reached perfect quality, or someone else will have served your readers. Also, (book) size does matter, but contrary to popular belief, less is more: one-chapter PDF books are okay, if that’s all you have to say! When sales of an established book flatten out, why not give the book back to the community for a live, participatory update? Who gets paid for what in such a social publishing effort is one of the interesting questions that still need to be addressed, Mike Hendrickson admitted. Yet another application for micropayments?

My first day at EuroOSCON ended in La Kasbah, an enchanting Moroccan restaurant in the trendy Dansaertstraat.

Wednesday morning started off with keynote sessions by Tom Steinberg from MySociety.org and Dale Dougherty from O’Reilly Media, the maker of Make magazine. Tom Steinberg succeeded in waking me up, so I also went to “Democracy: a Hacker’s Guide”, his follow-on session. MySociety.org has launched a number of technology-driven projects in the UK that help bridge the gap between citizens and their elected representatives. Whereas TheyWorkForYou.com focuses on aggregating, reorganizing and republishing public information, sites like WriteToThem.com and HearFromYourMP.com promote an intelligently filtered information flow between the electorate and their representatives, from the local up to the national level. PledgeBank.com makes it easy for altruistic volunteers to find one another, so that they can do some Good Deed to the benefit of society as a whole. In his talk, Tom Steinberg gave the hackers in the audience a number of (mostly non-technical) hints on how to launch similar services in their respective home countries. In Belgium, GovCamp may be a good start, although it does not look much like a grassroots initiative, as it is “initiated by the Belgian Federal Government” (!).

Denise Kalos, VP Corporate Solutions at O’Reilly, and Andrew Kelly, Practice Manager at CollabNet, then presented “The Secret Sauce of Robust Developer Communities”. As consultants to the corporate world (e.g. BEA Systems), the speakers stressed the need to strike the right balance between community building on the one hand and achieving business goals on the other. The very reason why corporations like BEA ask third parties to build their communities is credibility. If the community gives too little to the member developers in terms of recognition, resources, or exposure, it won’t get anything useful in return either. There’s a fine line between helping out developers with new products, and selling. Building a developer community must not be a marketing exercise, and needs to be done with care. In the end, it’s all about people & passion.

Wednesday’s keynote session speakers Jim Purbrick from Linden Lab (the makers of Second Life) and Mårten Mickos, CEO of MySQL AB, could not really captivate me. I don’t blame them; maybe it was a matter of overstimulation. I did learn a new term, though: meat space as opposed to cyber space. And Mårten Mickos taught us that when a Finnish guy says “that’s not too bad”, it means exactly the same as “that’s fantastic!” uttered by an American.

After a day and a half, it was high time for some Ajax stuff. Scott Dietzen, CEO of Zimbra, showed off his 135,000-line JavaScript application of the same name. I forgot everything he said, except that large-scale Ajax programming is best done using one of the following libraries that shield browser differences: Kabuki, Dojo, Yahoo! UI Library, Google Web Toolkit (GWT), or Scriptaculous. Might come in handy some day. I’m afraid I missed Simon Willison’s talk on the Yahoo! UI Library as well.

The prize for the worst talk went to the CEO of Wengo, I’m afraid. For Google’s sake, I won’t mention his name in this place. His Famous First Words were: “I hope it’s not going to be too boring”. Now that’s a captatio benevolentiae! In the rest of the talk, which was indeed rather dull, the speaker succeeded in apologizing twice more: first for not being a technician – as if that would have made a difference – and second for his insufficient mastery of the English language – although his English was in fact more than okay. Dear Wengo CEO, we don’t know each other, but if you read this, no hard feelings please.

To make up for this false note, I decided to visit the Make Fest that evening with my 9-year-old daughter. She was interested in a stringless guitar and tried on some presence-detecting sensors. Most importantly, she went home with a free OpenBSD poster featuring deformed versions of Asterix and Obelix.

On Thursday, Robert “r0ml” Lefkowitz tried to shock the non-English-speaking part of his audience with his plea for the abolition of English as the (only) lingua franca for developers … and compilers. If his intention was to make his fellow American listeners more aware of internationalization and localization issues, he may well have had a point. Some not-so-humorous Europeans went into a discussion with the speaker, only to realize that he was just joking, after all. At least that’s how I interpreted it.

Chris Heathcote, an English designer who works for Nokia in Finland, referred to numerous sources that warn us against unwanted invasion of our privacy. His main conclusion was that people who want to play along in the information society are forced to trade in a part of their privacy. Privacy becomes a luxury, because most consumers can’t afford not to take a loyalty card in their supermarket. It’s amazing what people will give up for some convenience.

The last session I attended was by Colin Brumelle on Music 2.0. In his short historical overview of new recording technologies since the 19th century, the speaker argued that each innovation had been accused of “killing music” in one way or another by the industry people then in power. Our time is no different in that respect. Although everyone in the audience agreed that the current power structures in the music business are doomed to collapse, no one, including the speaker, was able to lay out how and when. The speaker raised many questions but did not really try to answer them. In that sense the talk was a bit disappointing.

More than any specific technological innovation, the potential impact of technology on society was what really struck me in this Open Source Convention. For that reason alone, Adrian Holovaty and Tom Steinberg were my favourite speakers.