Microsoft's AI APIs add content moderation, speech recognition

If you want your apps to understand what someone’s saying or know if your user-content rules are being broken, Microsoft has you covered.

Microsoft is expanding its Cognitive Services portfolio—in-the-cloud APIs that provide out-of-the-box versions of useful algorithms—with two services that go into general availability next month: the Content Moderator and Bing Speech APIs.

Talk to me, and I shall hear

Bing Speech converts audio into text and vice versa. It’s also able to apply contextual understanding to that speech or text. The Speech API lets you try a limited sample of both text-to-speech and speech-to-text for yourself.

Both processes show their limits pretty quickly, though. Text-to-speech still sounds somewhat robotic; there’s always the sense that the speaker is emphasizing the wrong syllables. And speech-to-text still seems best suited for processing short command phrases rather than for transcribing longer passages. Google’s speech recognition API appears to be more accurate, although Microsoft offers competitive features like real-time streaming of results (similar to Google’s Voice Typing function).
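A speech-to-text call typically returns several candidate transcriptions with confidence scores, and the client picks the best one. The sketch below shows that selection step; the JSON shape is loosely modeled on the Speech API's detailed output format, but the exact field names and values here are assumptions for illustration.

```python
import json

# Illustrative recognition result; field names (RecognitionStatus,
# NBest, Confidence, Display) are assumptions modeled on the API's
# detailed output mode, not a captured response.
SAMPLE_RESPONSE = json.dumps({
    "RecognitionStatus": "Success",
    "NBest": [
        {"Confidence": 0.93, "Display": "Turn off all the lights."},
        {"Confidence": 0.41, "Display": "Turn off all the lice."},
    ],
})

def best_transcription(raw: str) -> str:
    """Pick the highest-confidence hypothesis from a recognition result."""
    result = json.loads(raw)
    if result.get("RecognitionStatus") != "Success":
        return ""
    hypotheses = result.get("NBest", [])
    top = max(hypotheses, key=lambda h: h["Confidence"], default=None)
    return top["Display"] if top else ""

print(best_transcription(SAMPLE_RESPONSE))  # Turn off all the lights.
```

For short command phrases, the top hypothesis is usually good enough; for longer dictation, an app might surface the alternatives to the user instead.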

A demo of LUIS, Microsoft’s Language Understanding Intelligent Service, includes the ability to parse command examples like “turn off all the lights” or “switch all lights to green” (for those of you with fancy multicolored LED bulbs).
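For a command like "switch all lights to green," a LUIS-style service returns a top-scoring intent plus the entities it extracted, and the app acts on those. The sketch below parses such a prediction; the JSON shape mirrors LUIS's documented response format (topScoringIntent, entities), but the intent and entity names are invented for this example.

```python
import json

# Illustrative LUIS-style prediction; the structure follows the
# documented response shape, but "HomeAutomation.SetColor" and the
# entity values are made up for this sketch.
SAMPLE_PREDICTION = json.dumps({
    "query": "switch all lights to green",
    "topScoringIntent": {"intent": "HomeAutomation.SetColor", "score": 0.97},
    "entities": [
        {"entity": "green", "type": "Color", "startIndex": 21, "endIndex": 25},
    ],
})

def parse_command(raw: str, threshold: float = 0.5):
    """Return (intent, entities) if the top intent clears a confidence bar."""
    result = json.loads(raw)
    top = result["topScoringIntent"]
    if top["score"] < threshold:
        return None  # too uncertain -- the app could ask the user to rephrase
    entities = {e["type"]: e["entity"] for e in result["entities"]}
    return top["intent"], entities

print(parse_command(SAMPLE_PREDICTION))
# ('HomeAutomation.SetColor', {'Color': 'green'})
```

The confidence threshold is the key design choice: commands below it should trigger a clarifying question rather than an action on the wrong device.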

Red light, green light, yellow light

With the Content Moderator API, Microsoft provides tools to help automate one of the more tedious and time-consuming jobs in creating services that accept user-submitted content. Content Moderator can check images, text, and video for “offensive and unwanted content that creates risks for businesses.”

Image moderation can check for “adult or racy content,” and can extract text from images by way of OCR—for example, to determine if meme-type images have offensive content. Both image and video moderation return simple “is/is not” checks for adult material, as well as confidence scores for more precise evaluation. Text moderation can check for profanity in more than 100 languages, as well as malware/phishing URLs. It can also return details about the original and corrected texts if needed.
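A text-moderation result bundles the profanity hits and any detected personal information together, and the caller decides what to do with each. The sketch below summarizes such a result; the Terms/PII structure is modeled on Content Moderator's documented text-screening response, but the specific fields and values shown are assumptions.

```python
import json

# Illustrative text-screening result; modeled on the documented
# response shape (Terms, PII), with invented content.
SAMPLE_SCREEN = json.dumps({
    "OriginalText": "Contact me at someone@example.com you ****",
    "Terms": [{"Index": 38, "Term": "****"}],
    "PII": {"Email": [{"Text": "someone@example.com", "Index": 14}]},
})

def flag_text(raw: str) -> dict:
    """Summarize what a moderation pass found in a piece of text."""
    result = json.loads(raw)
    pii = result.get("PII") or {}
    return {
        "profanity": [t["Term"] for t in result.get("Terms") or []],
        "emails": [e["Text"] for e in pii.get("Email", [])],
        "needs_review": bool(result.get("Terms")) or bool(pii),
    }

print(flag_text(SAMPLE_SCREEN))
```

An app might strip the flagged terms automatically but hold anything containing personal details for a human moderator.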

The process isn’t intended to be entirely automatic; Microsoft provides both a tool and an API to allow individuals and teams to moderate submitted content and apply custom tags and workflows to data. But the underlying moderation APIs are meant to zero in on the content that needs at least some human oversight and to filter out the things that don’t.
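The triage Microsoft describes—auto-handling the clear cases and routing only the uncertain ones to people—can be sketched as a simple rule over the confidence scores the moderation APIs return. The thresholds below are invented for illustration and would need tuning per service and content type.

```python
def route(score: float, reject_above: float = 0.9,
          approve_below: float = 0.2) -> str:
    """Route content by classifier confidence.

    Thresholds here are invented for illustration: content the
    classifier is sure about is handled automatically, and only the
    uncertain middle band is queued for a human moderator.
    """
    if score >= reject_above:
        return "auto-reject"
    if score <= approve_below:
        return "auto-approve"
    return "human-review"

for score in (0.05, 0.55, 0.97):
    print(score, route(score))
```

Widening the middle band trades moderator workload for safety; a service with strict content rules would set approve_below close to zero.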

The text moderation checks for personally identifiable information can also help guard against users inadvertently exposing their own details, or having personal information broadcast maliciously by third parties.

Microsoft says Cognitive Services provides ready-to-use APIs and data models so that companies don’t have to build their own data sets or trained models. Just as important over time will be the convenient ways it gives organizations to extend these services without writing applications from scratch that connect to Microsoft’s APIs on the back end. The workflow/tagging mechanism in Content Moderator hints at this: such systems could be customized for a specific environment through feedback from nontechnical users, not developers alone.