29 May 2014

At the inaugural Code Conference in California, CEO Satya Nadella has revealed that Microsoft’s real-time speech translation technology will finally make the jump from the mystical, bottomless pit of its R&D department to a consumer product: Skype. On stage at the conference, Nadella demoed a beta version of Skype Translator, which performed real-time translation of English to German speech, and vice versa. Skype Translator isn’t perfect, but it’s tantalizingly close to the creation of a Star Trek-like universal translator (or Babel fish, if you prefer) that allows everyone in the world to communicate, even if they don’t share a common language.

We first saw Microsoft’s speech translation tech way back in 2012, when Microsoft Research’s Rick Rashid translated his own English speech into Spanish, Italian, and Mandarin. We then saw the tech again in November 2012, but since then Microsoft has been fairly quiet. Now we know why: Microsoft has been trying to squeeze the technology into Skype.

In the demo, Microsoft’s Skype and Lync vice president Gurdeep Pall has a conversation with a German friend. He speaks in English, and Skype translates it into German; then she speaks in German, and Skype translates it into English. It isn’t quite real-time, but it’s pretty good (and language translation will never be truly real-time anyway, because a translator has to wait for whole phrases before it can sort out syntax, semantics, and other linguistic caveats). Microsoft says a beta version of Skype for Windows 8 with speech translation will be available “before the end of 2014.”

Personally, I was a little disappointed in the demo. Let’s not forget that it’s basically just a piece of software that does speech-to-text conversion (à la Dragon speech recognition software) and then text-to-speech (à la Microsoft Sam). Machine translation between the two languages occurs in the middle, but that’s not especially exciting either (Google Translate has been free to use for years).
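
To make that three-stage design concrete, here is a minimal Python sketch of the pipeline described above: speech-to-text, then machine translation, then text-to-speech. None of this is Microsoft’s code. The function names, the toy dictionary, and the trick of treating audio as UTF-8 bytes are all hypothetical stand-ins, chosen only so the example runs without any speech engine installed.

```python
from typing import Callable

# Hypothetical stage signatures; real engines would replace the toy versions below.
Recognizer = Callable[[bytes], str]    # audio in, source-language text out
Translator = Callable[[str], str]      # source-language text in, target text out
Synthesizer = Callable[[str], bytes]   # target-language text in, audio out


def make_speech_translator(
    recognize: Recognizer, translate: Translator, synthesize: Synthesizer
) -> Callable[[bytes], bytes]:
    """Chain the three stages into a single audio-to-audio function."""

    def run(audio: bytes) -> bytes:
        text = recognize(audio)          # 1. speech-to-text
        translated = translate(text)     # 2. machine translation in the middle
        return synthesize(translated)    # 3. text-to-speech
    return run


# Toy stand-ins (assumptions for illustration only).
def toy_recognize(audio: bytes) -> str:
    # Pretend the "audio" is already UTF-8 text.
    return audio.decode("utf-8")


TOY_EN_DE = {"hello": "hallo", "friend": "freund"}


def toy_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_EN_DE.get(word, word) for word in text.lower().split())


def toy_synthesize(text: str) -> bytes:
    # Pretend encoding the text is the same as speaking it.
    return text.encode("utf-8")


english_to_german = make_speech_translator(toy_recognize, toy_translate, toy_synthesize)
print(english_to_german(b"Hello friend"))  # b'hallo freund'
```

Because each stage is just a function, any one of them could be swapped out without touching the other two.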

Back when the real-time speech translation was first demoed in 2012, it actually used the speaker’s voice in the translations: as in, it would convert my English into German, but keep my accent, timbre, and intonation. This was some seriously impressive tech that essentially reverse engineered your voice into a series of phonemes (individual sounds), and then used that information to reconstruct your voice in a new language in near-real-time (the demo starts at around the six-minute mark in the video above). Presumably this technique required too much processing power, and so now we just get generic computer speech in the style of Microsoft Sam and Microsoft Anna. (I wonder what Skype will do for gender edge cases.)

While the Skype Translator demo wasn’t quite as awesome as I’d hoped, in reality the lack of accent/timbre is only a minor quibble. The potential for real-time speech translation in education, business, diplomacy, and multilingual families is huge. Just by downloading a new version of Skype, Western companies could start doing business with companies in China and other huge growth markets. And yes, there’s no reason Microsoft should reserve this tech just for Skype: a real-time speech translation app for Windows Phone would be pretty useful for travel.