Microsoft brings video voice recognition for everyone

Mary Branscombe | Sept. 12, 2014

Azure Media Services is something Apple might want to consider for streaming its next keynote, rather than rolling its own system on Amazon Web Services and Akamai. It's what big-name broadcasters used to stream the 2014 Winter Olympics and the 2014 World Cup, and it's what powers the Blinkbox streaming video service. If you watched the Xbox One announcement you've already used it, so it has certainly proved its reliability.

Avanade's Paul Veitch suggests that the first customers beyond broadcasters will be banks, especially traders — and regulators.

"Lots of banks are interested not only in storing data in the cloud but in how you recall it. You could say 'tell me when I was talking to this customer about the price of gold' and it will know where that part of the conversation was. Now we can analyze that data and make it searchable. The Financial Conduct Authority are quite interested in that for compliance; are the Chinese walls inside the bank working? And internal compliance departments are interested too; they're looking at data mining audio calls and conversations."

He suggests it will be even more useful if you connect it to other data sources and machine learning systems. "There are already automated trading systems that monitor Twitter," he points out. "Now you could do monitoring inside the bank for sentiment too."

Mining voice recordings is the kind of thing that would fit in perfectly with Delve, the social network for documents that's just launching in Office 365. Delve looks for documents that your colleagues are working on and are willing to share publicly, and that are relevant to what you're working on or the meeting you're about to go to, and shows them to you.

That would be extremely useful if it included links to the recording of a Lync meeting where the customer you're going to meet tomorrow is phoning up to make a complaint, or right to the minute in your online training video where the presenter covers how to fix the problem you're writing an email about. If you can get the right two minutes of it, a three-hour video becomes much more useful.

That's the kind of thing Satya Nadella means when he talks about "productivity [including] group collaboration and business processes" or about "digital work and life experiences" that include "intelligent and social work experiences". The Indexer is aimed at broadcasters and content companies today, because they already know what they will use it for (but services like YouTube and Twitch are turning almost everyone into broadcasters now). Now that voice search is available on Azure as a service, we can see what else you can use it for.