With over 15 years of experience in the use of speech recognition technology, it seems that I’ve seen it all when it comes to its application, the good and the not so good. I created this space as a forum to share my own experiences and those of my clients, and for others to join in and share their own. Feel free to add your own thoughts, questions, rants and raves! Hopefully it will be cathartic and useful for us all.

Wednesday, August 17, 2011

Observations from SpeechTek 2011 - New York

I'm just back from SpeechTek, the major industry conference in the Speech Recognition / Text-To-Speech / Voice Biometrics industry. I spent three great days attending sessions, catching up with friends in the industry and seeing the latest offerings from vendors in the space. I've been attending this conference for better than a dozen years now and it's interesting to see how the industry has evolved and matured during that time. As I flew home to Seattle, I jotted down a few of my thoughts and observations. Three themes seemed to run through the conference: cloud computing, analytics, and smartphone (multi-modal) applications.

Each of these themes converged to produce a trend I'd call Adaptive Personalization.

Cloud Computing
I've said it before and it's worth repeating: the clouds are gathering! By that, I mean that the speech recognition industry (and its related applications) is running full speed toward cloud computing. In fact, I think it may be in the vanguard of that advance. So many major customer self-service applications today run in the cloud on platforms like Microsoft's TellMe, Nuance's BeVocal, Voxeo, Angel.com or others that it would be impossible to argue it's not a full-fledged trend. Millions of automated self-service calls (both inbound and outbound) pass through each of these today. Supporting this growth in cloud-based speech recognition is the parallel movement of applications and data to the cloud, driven by the advent of Apple's iPad (and other tablet computers) along with the ever-growing use of smartphones. Both kinds of device share a common trait: much of their application smarts or functionality comes from cloud-based services and data, using a model in which the device is primarily a presentation layer and the functional work and data storage are largely handled in a cloud-based platform or platforms. Many of these applications are even mashups that aggregate data and services from multiple cloud-based applications. A whole new generation of speech applications is cloud based, using the cloud for application functionality, speech recognition, voice biometrics and data aggregation from multiple sources. This approach allows for incredibly rich applications with access to large data sets, far beyond the limited processing power and storage capabilities of the typical individual smartphone.

Analytics, Analytics & Analytics
If there was a single buzzword that prevailed at SpeechTek it was Analytics. The use of the term was so prevalent and so overloaded that it almost lost all meaning (the true sign of a buzzword). Every presentation, every piece of product literature, every vendor booth in the exhibit hall had some reference to analytics. Despite the overuse of the term, it was clear that it represents a major trend in the industry, and I believe one that offers the potential of significant benefit to the end users of these systems. Perhaps we can look to the web and the evolution of e-commerce for some clues to what lies ahead in the speech industry. Analytics has found wide use on the Internet as a tool to understand user behavior and customer needs, and to help companies provide more carefully filtered and tailored information to users.

In reality, I think we saw three distinct applications of analytics. (Analytics is defined as the science of analysis; a simple and practical definition, however, would be the application of computer technology, operational research, and statistics to solve problems.) They were: (1) using analytics as a discovery tool in customer service operations to help identify hot spots or problems (such as issues in self-service speech applications or e-commerce web sites), (2) using analytics (and computerized semantic processing) to process data from a variety of channels (Twitter, Facebook, email, blogs, speech-based self-service applications, etc.) to identify trends and customer issues and (3) using analytics and data from all customer interface modalities (web, smartphone, IVR, call center agents, SMS messages, Twitter, etc.) to model and infer meaning and intent for individual customers. I believe that this third use is the most significant and potentially the most game changing of the three.

Smartphone (and multi-modal applications)
With the rapid growth of smartphones, iPads and similar data/voice-enabled portable devices, we're seeing a new generation of applications emerge. The availability of voice, Internet and background access to large amounts of data (especially real-time data) is enabling a new generation of mobile applications that are truly multi-modal; that is, they are capable of accepting typed and spoken inputs and delivering visual and audible outputs. This gives users a choice in their preferred communications channel and opens up these devices to more effective and efficient means of delivering complex data, such as lists, which don't lend themselves to audio output. A good example of this type of mixed-mode application is Nuance's "Dragon Go!", which is available on the iPhone or iPad. With this application, you can speak a simple query phrase. The application captures your utterance, ships it off to be processed in "the cloud" using natural language understanding and returns search results from multiple data sources in visual form. You can get more information about the application from Nuance's web site or Apple's App Store.

Adaptive Personalization
The convergence of these three themes (Cloud Computing, Analytics and Multi-modal applications) offers us the most compelling theme of all. By having access to large amounts of data and computing power in the cloud, combined with the "intelligence" that can be gleaned from analytics (which can process information about the user from a variety of sources and channels) and with the powerful presentation and input possibilities of multi-modal applications, we can make a leap forward to a "brave new world" where applications understand the context of our actions across multiple channels and products and present us with information, help or services tailored to exactly what we need, exactly when we need it. I'm calling this trend Adaptive Personalization. This kind of personalization goes far beyond the kind of customizing we see in things like a search query using your location data to constrain the choices presented.

An example of this kind of adaptive personalization might involve a customer of a financial services company or bank who is applying for a loan on the institution's web site and encounters a question or issue not addressed in the online application process. Imagine that they grab their cell phone and call the institution's customer service number for assistance.

When they reach the customer service number, the application identifies the caller from their cell phone ANI information. Then, rather than presenting them with a natural language question or a deep menu of choices, the application can use analytics to see that their most recent activity was the loan application process on the web and offer them the option of being transferred directly to a loan specialist for help. One early example of a product that supports this is Genesys's Conversation Manager.
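To make the call flow above a little more concrete, here is a minimal sketch in Python of how such a routing decision might look. Everything in it is an assumption for illustration only: the `Activity` record, the in-memory lookup tables, the 30-minute recency window and the queue names are all hypothetical, and none of it reflects how Genesys or any other vendor actually implements this.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Activity:
    """Hypothetical record of one customer interaction on any channel."""
    customer_id: str
    channel: str   # e.g. "web", "ivr", "sms"
    topic: str     # e.g. "loan_application"
    timestamp: datetime


# Hypothetical in-memory stand-ins for a customer directory and a
# cross-channel activity store; a real system would query services.
ANI_TO_CUSTOMER = {"+12065550100": "cust-42"}
RECENT_ACTIVITY = [
    Activity("cust-42", "web", "loan_application", datetime(2011, 8, 17, 9, 55)),
]

# Hypothetical mapping from an activity topic to a specialist queue.
TOPIC_TO_QUEUE = {"loan_application": "loan_specialist"}


def latest_activity(customer_id: str, now: datetime,
                    window: timedelta = timedelta(minutes=30)) -> Optional[Activity]:
    """Return the customer's most recent activity inside the window, if any."""
    recent = [
        a for a in RECENT_ACTIVITY
        if a.customer_id == customer_id
        and timedelta(0) <= now - a.timestamp <= window
    ]
    return max(recent, key=lambda a: a.timestamp, default=None)


def route_call(ani: str, now: datetime) -> str:
    """Pick the first prompt for an inbound call based on recent activity."""
    customer_id = ANI_TO_CUSTOMER.get(ani)
    if customer_id is None:
        return "main_menu"  # unknown caller: fall back to the usual menu
    activity = latest_activity(customer_id, now)
    if activity is None:
        return "main_menu"  # nothing recent enough to act on
    queue = TOPIC_TO_QUEUE.get(activity.topic)
    return f"offer_transfer:{queue}" if queue else "main_menu"
```

With the sample data above, `route_call("+12065550100", datetime(2011, 8, 17, 10, 0))` returns `"offer_transfer:loan_specialist"`, while an unrecognized number, or a call placed hours after the web session, falls back to `"main_menu"`. The point of the sketch is only that the decision depends on context gathered from another channel rather than on anything the caller says.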

I don't think it will be too many years before this will be commonplace in advanced customer service environments. When melded with information about customer channel preferences and proactive notification, it will completely turn the customer experience inside out, and in a good way.

That's my two cents' worth. Let me know what you think, or feel free to add your own ideas and observations in the comments. If you'd like to see my tweets from the conference (and those of the other attendees), search using the tag #SpeechTek.

4 comments:

I hate technology that does not feel quite safe but is almost too good to pass up. The Cloud is that kind of technology for me. It does not feel quite safe from a security standpoint, but when it comes to your computer crashing and losing information, it is too good to pass up.

Hello - could you tell me if any of the vendors at the conference were offering a voice biometrics solution that could be utilized with a Cloud based Telephony Platform similar to Twilio? There seems to be a huge technical hurdle with cloud based telephony services that need to interact with other services (e.g. Voice Authentication/Biometrics). We are looking to replace our traditional IVR platform and move it to Twilio, but our only hurdle now is integrating voice authentication during the inbound Twilio call.