3 Reasons Voice Will Finally Come To The Web

Siri is teaching us to talk to and not just type on our devices. But will we be comfortable recording all our conversations to make voice a searchable app?

Voice is dead. Or at least the digerati think so. It takes some real digging in Silicon Valley to find the voiceheads, the true believers that voice will have its second coming as a Web application.

Today, most people think of Apple's Siri when you say voice app, but what if you could control all your apps with voice, and also search through spoken conversations and find content as easily as you do in email? At the very fringes of consumer and enterprise social interaction, this vision is already here. This emergent paradigm, known as hypervoice, promises to be a major boon for productivity. The real question is whether it will tip and become the next big shift in the Web.

It's kind of crazy that telephony and the Web are still so separate. Voice on the Web is only about the transport of voice, not voice as rich media content. Voice today is like Web 1.0 when Web content simply mimicked brochures. It's so boring, it hurts.

But what if voice was interactive like hypertext? What if we could search, share and find highlights from our conversations -- just like we do with text? Voice could go from a fringe player to a radical new social object with the potential to alter the way we communicate online.

These ideas may seem wild, but they are certainly not new. The voiceheads are quick to pull up their shirts to compare scars. With so many false starts, why is now the time for voice to become a member in good standing of the Web community? Here are three reasons.

1. Productivity #SOS

Today, voice solves only a space problem -- connecting two people across long distances in real time. But that model doesn't line up with how we work today. We work asynchronously, out of our email inboxes and social media activity streams. Live calls are increasingly disruptive to our workflow. Throwing in the pain of connecting across multiple time zones makes the need for a better way to work more pressing.

Text alone can't save us from this time-stretched, overloaded information stream. We need new tools, badly. Emerging hypervoice apps, where we can go back over our voice conversations and quickly find bits of information we need, will be like giving us perfect recall. Imagine augmented memory without an implant.

2. Viva La WebRTC!

The World Wide Web Consortium (W3C) drafted WebRTC (Web Real-Time Communication) as an API definition to enable browser-to-browser applications for voice calling, video chat and peer-to-peer file sharing without plug-ins. Today it's not trivial to put voice on the Web and make the pieces play nicely together, so it's hard to overestimate the impact that WebRTC will have on the development of future voice applications. Although the standard is still gathering form and adherents (e.g., Microsoft and Apple have not joined the party yet), WebRTC promises to make it dead simple for developers to integrate voice and video into their applications. With technology barriers lowered, new applications are likely to emerge quickly and seemingly out of left field. WebRTC will unleash the developers!

People are starting to get comfortable talking to, not just through, their devices. We saw this nascent behavior shift start with Siri, and now it is likely to expand with Google's hotwording. These behavioral shifts are a critical step forward, as we have to get comfortable with voice as an interface. We need to move away from using our mobile devices as a typewriter.

It's critical, and yet ... behavioral shifts are the hardest friction point to overcome. Social convention and etiquette change far more slowly than our technology does.

And while we are talking about barriers, one of the most pressing for hypervoice to overcome will be the acceptance of recording our voice conversations. As an early adopter, I have about two years of recorded conversations. I had assumed that people would be more put off by the prospect of being "on the record." What has really surprised me is how little anyone seems to care. Because I regularly recorded my conversations in a format that was easily searchable and shareable, not only by me but by them as well, my colleagues saw it as a boon for their own productivity, too.

So the real question is: Are we ready to trade our privacy for productivity? We have done it before, countless times. But in some ways, voice feels special. It feels like part of our personhood. And on this last point, only time will tell.

Siri has a crush on my husband but refuses to give me the time of day. (Oddly, I find this true of lots of women I've polled.) Hypervoice, thankfully, is not dependent upon speech recognition. Although speech recognition can certainly help (see VoiceBase http://www.voicebase.com), it is not a requirement. If you are able to mark up a conversation with tags and/or text notes that sync with the compiled audio, that qualifies. It also means that hypervoice works with all languages today. Here's a quick video to help explain the concept: http://vimeo.com/53700340
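
To make the idea of notes that sync with the audio a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how any actual hypervoice product works: the names Marker, Conversation, mark and search, the file path, and the sample notes are all hypothetical. It shows that typed tags pinned to time offsets are enough to search a recording, with no speech recognition involved.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Marker:
    """A tag or text note pinned to a time offset in the recording (hypothetical)."""
    offset_seconds: float
    text: str


@dataclass
class Conversation:
    """A recorded call plus the markers typed during or after it (hypothetical)."""
    audio_path: str
    markers: List[Marker] = field(default_factory=list)

    def mark(self, offset_seconds: float, text: str) -> None:
        # Attach a note to a moment in the audio; no transcription is needed.
        self.markers.append(Marker(offset_seconds, text))

    def search(self, query: str) -> List[Marker]:
        # Case-insensitive substring match on the notes, earliest hit first.
        needle = query.lower()
        hits = [m for m in self.markers if needle in m.text.lower()]
        return sorted(hits, key=lambda m: m.offset_seconds)


# Illustrative data only: the path and the notes are made up.
call = Conversation("calls/weekly-sync.wav")
call.mark(312.0, "#budget Q3 numbers approved")
call.mark(95.5, "#action Send contract draft")
call.mark(640.0, "#budget revisit in October")

for hit in call.search("#budget"):
    # Each hit gives the offset a player could jump to in the audio.
    print(f"{hit.offset_seconds:.1f}s  {hit.text}")
```

Because the search runs over the typed notes rather than the audio itself, the same approach would work for a conversation in any language.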

Absolutely, Chris. I may have biased the sample significantly. That said, this response also came through during our customer discovery process, where we outsourced the interviews to a neutral third party. We found that pushback to the idea of being recorded was strikingly low across a wide array of demographics. However, one key group that bristled significantly was lawyers. Our target sample did not like the idea of their voice being recorded at all.

How do the voice-as-app folks address the difficulties in voice recognition? I drive a Ford, and SYNC doesn't always understand me. Nearly every iPhone user I talk to can regale me with stories of Siri misunderstanding what she heard. Nearly every week I struggle to understand someone on a voice phone call; it happens whether that person is on a mobile or running a softphone app. Text does not have this issue; facial recognition and object recognition software are starting to address this for video and images. Being able to search a pile of semi-intelligible sound files is not my idea of productivity.

As a journalist I record lots of interviews, and no one ever objects, but that's a more formal setting. I'm surprised by your experience that recording more general conversations doesn't make people uneasy. Might that be because you're dealing with people in this context as a voice-as-app evangelist? And might people outside that voice community be less comfortable with it?