Why voice is the next big internet wave

At first glance, few technologies feel as unsexy as voice. From a user’s perspective, little has changed since the days of Alexander Graham Bell. Most see voice as a mature technology that simply connects people in real time across a distance. But voice is experiencing a wave of innovation that will fundamentally alter this definition.

During Mobile World Congress, Jae-woan Byun, the CTO of SK Telecom, condemned current voice offerings as “boring for users” but promised a “second tsunami” that could change everything.


The first tsunami was about messaging. It swept away SMS volumes and revenues and resulted in the kind of valuation that Facebook placed on WhatsApp. Thanks to the elimination of the historical limitations that telephony placed on voice, we are already sensing the shockwaves of the next tectonic shift.

Voice will be:

Available every “wear.” Voice is fast becoming a primary interface for wearable technology. Voice will soon become ambient, with audio sensors embedded into our environment: cars, living and work spaces, and fashion accessories. Conversations will follow us from home to car to office, jumping automatically from device to device.

Private and secure. Encryption of voice will become the default, not the exception. Layered security models will include voice biometrics as a standard component. And for our most private conversations and transactions, speech will continuously authenticate us – not simply at the outset of a conversation.

Smartphone-native. Today, the dialer application on a smartphone replicates 1970s touchtone telephony. The ability to tap, swipe, wave, drag, point, rotate, shake and talk means that powerful new features will be simple and easy to use, in the same way that the iPod made mobile music easy.

Imagine rotating your phone to landscape orientation to turn a 1:1 call into a conference call. Apps will allow easy customization of the voice experience. Your CRM app will handle calls from clients; another will intercept calls when you are roaming and it’s 3 AM; and another will manage calls from the “burner” number you put in an ad to sell your car. Powerful new services will be so easy and intuitive that we won’t even notice a learning curve.

Application-embedded features. Beyond caller ID, inbound voice calls carry little context today. Increasingly, voice calls are originated within apps and web pages and are thus full of useful metadata. Moving forward, voice calls will come complete with context, such as where the user is stuck in a business process, allowing organizations to build and continuously refine a fit-for-purpose voice experience.

Beyond the “call.” Sadly, today’s idea of a call still replicates the patterns and limitations of 1876 telephony. We either schedule calls with fixed timing, length and attendees or blindly interrupt people. Future voice communication will mirror the more fluid activity streams on Facebook, Yammer or Google Hangouts. We will invite others into a call as needed, allowing them to jump in and out of conversations seamlessly. Outside calls or cold calls will come with a “conversation request,” where the caller pitches the receiver on why he or she should answer and invest their time.

Augmented memory & total recall. Voice is about to become recordable by default, and in many contexts and corporations, it already has been for decades. We are moving beyond simple record keeping to active knowledge management via voice. Similar to how we search our email for past conversations and threads, we will be able to do that with our voice conversations too. Essentially, we will be able to jump to the 15 seconds that mattered in that last call and have perfect recall of all our conversations.

Your intelligent voice assistant. Basic AI technology has offered voice command control for over a decade, and Siri and Google Hotwording have taken that experience to a new level. As intelligent assistants continue to improve and adapt, we can see a future where they join us during the call. They will interpret questions and offer answers, content and ideas in both spoken and visual form. This will help us perform various administrative tasks, like scheduling a meeting, querying past correspondence or adding a task to our to-do list.

Accessible to all. The next generation of voice services will not only have high-definition audio, but also acoustic profiles customized to us individually and to our environment. We don’t all speak the same languages or dialects, so automated real-time subtitles and translation will become commonplace. One in five people have significant hearing loss, and end-to-end digital cloud-centric hearing aids will remove the “analog gap” for hearing-impaired users.

Voice intersects with a long list of hot topics: the internet of things, search, location services, wearables, security, connected car, big data, quantified self and beyond. As analyst Benedict Evans of Andreessen Horowitz recently tweeted: “It’s kind of ironic that voice is one of the next big things in mobile.”

I would say Evans is partially correct. It’s not just mobile. Voice promises to be the next big thing in communications, period.

I think what we call voice here is really machine intelligence, which is needed to parse meaning and intention from natural speech. The key to being able to do this is context. Context has always been the Achilles’ heel of A.I. researchers, but with sensor-packed smartphones in every pocket connected to an internet containing all the world’s knowledge, context is finally here. This will allow the kind of smart voice-driven applications discussed in this article.

Parts of large emerging markets have partial or no literacy. While governments may be working to improve literacy in some of these markets, reaching full literacy is undoubtedly a long haul – decades in some cases. These are also the large growth markets for smartphones, and they will undoubtedly be the next big users of the Internet. This is another big reason why voice will become the predominant interface: it is the most natural communications interface.

Of course, gesture and swipe can connect the user to the device or screen, but voice is a more complete communications interface. The unlimited potential of voice riding its next wave is now clear. Nice piece!

Absolutely, and it’s so easy to think of voice from a Westernised & wealthy perspective. Not everyone is literate, able-bodied, etc. The potential to apply voice interfaces to democratise computing and communication technology is under-explored. Hence the prediction we will see a profusion of new uses and ideas.

Pervasive personal recording and assistance sound potentially attractive and useful. One problem is privacy: would you want Google to provide such a service, if it came with the privacy compromises embedded in other Google services? And what about governmental access? Should we establish something like 5th Amendment protection for your personal data? Obviously in extremis, such recording becomes indistinguishable from “self”…

This is a real and legitimate concern, especially as spoken communication has been assumed to be ephemeral, whereas written is (by its nature) persistent. My experience in the context of work & conference calls is this: once you have tried Hypervoice technology, and had your words and gestures all recorded, everything that went before feels “broken”. It’s like dealing with an amnesiac.

That said, I think the industry has a lot of work to do on the transparency and responsibility around call recording and data retention. As with all technologies, this one has its uses and abuses.

Two things that will make voice even more useful as an interface:
1. Real-time translation from any language to any other, as you have already alluded to
2. Silent speech interface (http://en.wikipedia.org/wiki/Silent_speech_interface), which you can consider post-voice. This is very important, as I don’t want anyone around me eavesdropping or even knowing what commands I am giving to my gadgets. One of the challenges would be making sure the right gadget knows that a particular command is meant for it rather than for another one.

Hi Zahid:
I am very interested in what you wrote and have envisioned. I am ex-ROLM, the voice CBX processing company from the 1980s. Please keep me posted on what venture we can generate out of this Silent Speech Interface. I would be interested in exploring a new venture investment project.
Sincerely,
Henry H. Wong
GARAGE TECHNOLOGY VENTURES LLC
2040 Martin Ave, Santa Clara, CA 95050 (The New Innovation Center)
SKYPE, KaKaoTalk, WhatsApp, WeChat & ViBER Call: HenryWong94301
henry@garage.com

You take the voice boom as a given when there is nothing pointing to it besides wishful thinking. Too many are pushing voice just because they have no better ideas on how to do something. Voice for TVs, voice for glasses, when it’s not what is needed.
Voice is intrusive, it is less private, and sometimes it takes more effort than the alternative.
IM is popular because voice has its problems.
Ever owned a dog? We are so lazy that we often whistle or use gestures instead of voice; we really don’t want to talk all the damn time. For devices to talk to the user, that’s OK in many cases.
Of course voice is useful in some cases, but a boom is not supported by anything.

Good point. Good article, and I’m not trying to begrudge the writer or anyone else, but “voice” is nothing new; it seems to have always been the case anyway. One case in point is the “new” feature for FB from WhatsApp: the ability to make calls. Not sure I really want or need to call friends on FB, but I suppose it’s a nice option to have. Hope that all made sense.