Yap Isn’t Much Like Siri. So Why Does Amazon Want It?

CLT Blog’s Justin Ruckman decoded SEC filings to turn up an intriguing recent Amazon acquisition: Yap, a Charlotte-based speech-recognition startup best known for its recently shuttered voicemail transcription app and for backend services behind some of Microsoft’s voice-to-text applications.

So far, Amazon hasn’t publicly commented on or even confirmed Yap’s acquisition, and it didn’t immediately respond to our attempts to find out what it plans to do with the company. It’s an uncharacteristic buy for Amazon, which traditionally hasn’t bothered much with voice technology. Amazon’s Kindle Fire tablet doesn’t even have a microphone. So what’s going on here?

Yap isn’t actually very much like Siri. Yap’s specialty is transcription; Siri’s is artificial intelligence. Apple packages Siri’s core software with third-party search and transcription services to extend its functionality, which leads to some overlap (like voice-based text messaging). The heart of Siri, however, is the AI that mines human language for meaningful phrases and transforms them into actionable commands. Unless Yap is hiding something deep within its labs that it has never shown to anyone, the company doesn’t have anything quite like that.

What Yap does do, though, and does very well, is cloud-based voice transcription — i.e., literal, word-for-word rendering of speech into text, at very high volume with very high accuracy but at very low cost. It can do this with direct dictation or recorded speech, with something as short as a text message or voicemail or as long as an entire keynote address. Transcribed speech can then be used for search, commands, or output directly into a document.

The closer analog to Yap, then, isn’t Siri but Nuance, the company behind Dragon’s collection of voice applications for desktop and mobile, whose engine powers the speech-to-text component of, you guessed it, Siri.

What, then, does Amazon want with Yap? In the absence of a public announcement, I can think of a handful of possibilities that are much more likely than any head-to-head competition with Siri. For the sake of convenience, I’ll arrange them from most to least likely.

It’s a straight-up play for licensable patents and other IP. Yap co-founder Igor Jablokov reportedly told Ruckman that the company had “IP in every iPhone and Android device.” Microsoft has used Yap tech, too. As Amazon builds its device portfolio, it would rather cross-license IP than pay a fee to anyone.

Forget about Amazon-branded hardware for a second. To help drive retail sales, Amazon’s been experimenting with all kinds of user interfaces to aid search in its mobile applications: text, barcode scanning, photography, etc. Voice is a natural next step.

Yap is a cloud company; Amazon is a cloud company. As Amazon offers increasingly robust services to its cloud customers, high-quality automated voice-to-text transcription is an extremely appealing feature and, in certain sectors, could be decisive.

Google has messed around with voice-to-text transcription for limited applications like voicemail transcription, but hasn’t ever really focused on or commoditized it. That’s just not Google’s style. Amazon could use its cloud computing strength to supercharge Yap and offer genuine commodity transcription services at a competitive price. Think about it: every time you wished you had a written copy of an audio file, you could upload it to Amazon, pay a small fee, and have it quickly spit back a pretty accurate transcription. This is the Holy Grail — what my friend Matt Thompson calls “the speakularity.”

Amazon’s secretly making a smartphone. This is just one piece of that. NB: Hey, I told you these would get increasingly unlikely as they went on. But Amazon using its existing tech to make a competitive Android-based smartphone is, to me, more likely than it plunging deep into artificial intelligence research to turn Yap into a Siri competitor. In fact, a smartphone might even be a precondition for anything like that to happen.

We’ve reached out to Amazon to see if they’ll shed any light on what their plans are.

Siri is exciting because voice interaction is exciting. But just as multitouch interfaces turned out to be much bigger and more versatile than their implementation on the first iPhone, voice interfaces are already turning out to be much bigger and more versatile than their implementation in Siri.

Virtual assistants are just the beginning. In the near future, we’re going to see a lot of new investment in voice interfaces begin, and prior investments in voice interfaces pay off.