Microsoft has released a new version of its Translator API. This provides developers with the same speech-to-speech facilities as those used in the Skype Translator and in the iOS and Android Microsoft Translator apps.

In the blog post announcing the availability of the new Microsoft Translator API, Microsoft describes it as:

the first end-to-end speech translation solution optimized for real-life conversations (vs. simple human to machine commands) available on the market.

It also explains how the service works using AI technologies, such as deep neural networks for speech recognition and text translation, and outlines the following four stages for performing speech translation:

Automatic Speech Recognition (ASR) — A deep neural network trained on thousands of hours of audio analyzes incoming speech. This model is trained on human-to-human interactions rather than human-to-machine commands, producing speech recognition that is optimized for normal conversations.

TrueText — A Microsoft Research innovation, TrueText takes the literal text and transforms it to more closely reflect user intent. It achieves this by removing speech disfluencies, such as “um”s and “ah”s, as well as stutters and repetitions. The text is also made more readable and translatable by adding sentence breaks, proper punctuation and capitalization. (see picture below)

Translation — The text is translated into any of the 50+ languages supported by Microsoft Translator. The eight speech languages have been further optimized for conversations by training on millions of words of conversational data using deep neural network-powered language models.

Text to Speech — If the target language is one of the eighteen speech languages supported, the text is converted into speech output using speech synthesis. This stage is omitted in speech-to-text translation scenarios such as video subtitling.
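The four stages above form a pipeline. The following toy Python sketch is purely illustrative — the disfluency list, the stutter-collapsing regex, and the stubbed-out neural stages are invented for demonstration and bear no relation to Microsoft's actual models:

```python
import re

# Illustrative stand-ins for the four stages. Only a TrueText-style
# clean-up is implemented; the neural stages are represented by stubs.

# Hypothetical disfluency pattern: "um", "uh", "ah", "er" plus trailing comma.
DISFLUENCIES = re.compile(r"\b(um+|uh+|ah+|er+)\b,?\s*", re.IGNORECASE)

def true_text(raw: str) -> str:
    """Remove disfluencies and stutter repetitions, then tidy punctuation."""
    text = DISFLUENCIES.sub("", raw)
    # Collapse immediate word repetitions ("I I think" -> "I think"),
    # a simple stand-in for stutter removal.
    text = re.sub(r"\b(\w+)( \1\b)+", r"\1", text, flags=re.IGNORECASE)
    text = text.strip()
    # Capitalize and ensure a sentence-ending full stop.
    return text[0].upper() + text[1:] + ("" if text.endswith(".") else ".")

def translate_speech(audio_transcript: str) -> str:
    # Stage 1 (ASR) is assumed to have already produced `audio_transcript`.
    cleaned = true_text(audio_transcript)   # Stage 2: TrueText-style clean-up
    translated = cleaned                    # Stage 3: translation (stubbed)
    return translated                       # Stage 4: TTS would follow here

print(translate_speech("um, I I think it it works, uh, fine"))
# → I think it works, fine.
```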

Speech-to-text translation, for scenarios such as webcasts or BI analysis, allows developers to translate any of the eight supported conversation languages into any of the 50+ supported text languages.
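A client addresses the service over a WebSocket connection. The sketch below only builds the session URL; the endpoint, API version, and parameter names shown are assumptions that should be checked against Microsoft's current API reference before use:

```python
from urllib.parse import urlencode

# Assumed endpoint for the speech translation service; verify against
# the official API documentation before relying on it.
BASE = "wss://dev.microsofttranslator.com/speech/translate"

def build_translate_url(source: str, target: str, text_only: bool = False) -> str:
    """Build the WebSocket URL for a speech translation session.

    source/target are language codes, e.g. "en-US" -> "fr". Setting
    text_only skips the synthesized-audio feature, matching the
    speech-to-text scenario (webcasts, subtitling, BI analysis).
    """
    params = {"api-version": "1.0", "from": source, "to": target}
    if not text_only:
        # Request synthesized audio in addition to translated text.
        params["features"] = "texttospeech"
    return BASE + "?" + urlencode(params)

print(build_translate_url("en-US", "fr", text_only=True))
```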

A two-hour free trial is available. This provides 7,200 transactions, where a transaction is equivalent to one second of audio input, and matches the free monthly tier. Beyond this, paid subscriptions are available:
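The trial arithmetic checks out: at one transaction per second of audio, 7,200 transactions covers exactly two hours.

```python
# One transaction = 1 second of audio input, per the trial terms above.
TRIAL_TRANSACTIONS = 7_200
SECONDS_PER_HOUR = 3_600

hours = TRIAL_TRANSACTIONS / SECONDS_PER_HOUR
print(hours)  # → 2.0, matching the advertised two-hour trial
```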

The prospect of communicating without language barriers is becoming ever more of a reality, and the more we use it the better the facility will become. Ironically, there's an error in the sample Microsoft uses in its artwork above: Gurdeep is the object of the final sentence in the English and becomes the subject in the French. This sort of error will quickly be corrected by machine learning as more data becomes available.
