The past months have witnessed breakthrough announcements from Microsoft, IBM, and Google, all hitting new marks in speech recognition accuracy. The claim is that machines have now matched the 5.1% word error rate of humans. I don’t know about you, but the last time I spoke to a machine, I didn’t get the feeling the recognition was that good. So, let’s review the next generation of speech technologies, how they are enabling new analytics, and the growing role these insights play for businesses.
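For readers who have not met the metric: word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the machine transcript into the reference transcript, divided by the number of words in the reference. At 5.1%, that is roughly one word in twenty. Below is a minimal sketch, assuming simple whitespace tokenization and exact word matching (real evaluations usually normalize case and punctuation first). The function name and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch: whitespace tokenization, exact matching.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical IVR utterance with one misrecognized word out of seven.
    spoken = "please transfer me to the billing department"
    heard = "please transfer me to the building department"
    print(f"WER: {word_error_rate(spoken, heard):.1%}")
```

One wrong word in a seven-word request already puts this utterance far above 5.1%, which may be part of why a headline average does not always match the experience of a single call.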

A Bit of History

In the early 2000s, speech recognition reached 80% accuracy. In the enterprise space, it triggered adoption for Interactive Voice Response (IVR). The promise then was to remove the intricacy of long menus: callers could describe a customer service problem in their own words, or simply say the name of the person or department they were trying to reach. Speech applications were very dependent on vocabularies and languages, though. They required a sophisticated set-up by highly specialized system integrators, and each major language had its own speech recognition startup. In mid-2005, Nuance started its buying spree, snapping up 15 companies and consolidating the space.

Alongside speech for IVR, a second use case emerged in Quality Management (QM). Customer service organizations use quality management applications to listen to call center calls and rate them. The process was tedious, limited to a small sample of calls, and often like looking for a needle in a haystack. With speech recognition, it became possible to automate parts of the process. Workforce Optimization leaders NICE and Verint developed or bought their way into speech analytics, followed by Contact Center Infrastructure players such as Avaya and Genesys.

These developments remained limited. IVR speech enablement failed to transform the customer experience, and voice self-service experiences continue to be rated poorly. Speech for Quality Management is often confined to compliance or script adherence verification. At the beginning of this decade, it seemed speech technology had stalled.

The Machine Learning Transformation

While speech for customer service was developing, Amazon, Apple, Google, IBM, and Microsoft continued investing in research and development of speech technologies, driven by the vision that speech would eventually become critical for user interaction with machines. Speech broke into the consumer market with the introduction of Siri. Progress then accelerated as Machine Learning transformed speech recognition. Artificial Intelligence (AI) removed many intricacies and the need to re-engineer the stack for new languages or new vocabulary sets. Today, most digital disrupters, most notably China’s “Big Three” of Alibaba, Baidu, and Tencent, are building their own speech stacks. Because generic machine learning engines can be used, barriers to entry have been lowered dramatically. Open source options are also available, such as CMUSphinx, HTK, Julius, Kaldi, and Simon.

New Entrants are Looking to Disrupt the Customer Service Space

The AI breakthrough has paved the way for new entrants. Companies such as iFLYTEK or Speechmatics are aggressively addressing the issues of usability, accuracy, and deployability, in particular beyond the dominant languages.

For customer service, the battle is now shifting to the other half of the equation, Natural Language Processing (NLP) and Natural Language Understanding (NLU). Yactraq is applying its patented technology to democratize audio mining. It is finding that making the technology accessible lets businesses innovate beyond compliance and adherence, helping them discover best practices in customer interaction.

Verbio started as a spinoff from the Polytechnic University of Catalonia, and today half of its engineers still have PhDs. Deep Learning has been powering its speech stack since 2013. It now sees NLU as the next frontier, in particular being able to understand more than one command, from different simultaneous speakers, and prioritize them. After targeting automation and assistance for call centers, it is expanding into other industries and use cases. This industry focus is critical to its solutions.

Omilia is another intriguing story. It was formed as an IVR system integrator. In 2007, it started developing its own speech technology with the vision to leapfrog IVR-directed dialogs and offer natural conversations. It was able to leverage Deep Learning to assemble its stack. Its technology delivered an impressive 59% reduction in IVR abandonment and a double-digit increase in self-service completion at Royal Bank of Canada.

Sales Communication Leading the Way?

With the rise of inside selling, sales communication has become another very active and innovative space. A growing number of sales interactions are taking place over the phone and sales executives have become concerned about becoming blind to these conversations.

Chorus.ai is a pioneer of the sales intelligence space. Its premise is that conversations need to be treated as a business asset. Cultivating a solution approach, it has assembled a vertically integrated stack providing a broad range of indicators for assessing the effectiveness of conversations and correlating sales process elements with actual outcomes. It uses homegrown speech recognition, tuned and modeled for sales conversations. The company is baking its know-how into a three-step onboarding process, starting with recording all conversations to uncover insights in a matter of days. These insights are used to create dashboards tracking performance drivers. The drivers can eventually be monitored in real time and used to drive changes on the front lines.

Gong.io was born from one of its founders’ experience building fast-growing sales teams. He got frustrated with superficial tools for measuring performance, which felt like trying to diagnose a disease by taking the pulse when an x-ray is needed. He reviewed existing tools and found them complex and ill-suited to B2B sales, which involve long, unscripted conversations, often with more than two participants. Founded in August 2015, the company assembled its first solution in record time. Speech was actually the easy part. It is focusing on conversation intelligence by uncovering topics and recognizing performance patterns.

The sales technology space is fascinating. It is recent but incredibly dynamic: in only a few years, it evolved beyond coaching and performance management to provide broader prospect and customer insights. It is poised to become a key element of Voice of the Customer (VoC) programs.

Artificial Intelligence is transforming the speech industry. Within a given domain or use case, the accuracy of voice recognition is now very good. Usability still has to be perfected, and the next frontier is natural language processing and understanding. But the technology has already become mature enough to stimulate the creation of new applications and markets. I expect more startups and innovations to come!