AWS’ new text-to-speech engine sounds like a newscaster

Thanks to modern machine learning techniques, text-to-speech engines have made massive strides over the last few years. It used to be incredibly easy to tell that a computer, not a human being, was reading a text. But that's changing quickly. Amazon's AWS cloud computing arm today launched a number of new neural text-to-speech models, as well as a new newscaster style that is meant to mimic the way… you guessed it… newscasters sound.

“Speech quality is certainly important, but more can be done to make a synthetic voice sound even more realistic and engaging,” the company notes in today’s announcement. “What about style? For sure, human ears can tell the difference between a newscast, a sportscast, a university class and so on; indeed, most humans adopt the right style of speech for the right context, and this certainly helps in getting their message across.”

The new newscaster style is now available in two U.S. voices (Joanna and Matthew) and Amazon is already working with USA Today and Canada’s The Globe and Mail, among a number of other companies, to help them voice their texts.

Amazon Polly Newscaster, as the new service is officially called, is the result of years of research on text-to-speech, which AWS is also now making available through its Neural Text-to-Speech engine. This new engine, which is similar to neural engines like Google's WaveNet, currently features 11 voices: three for U.K. English and eight for U.S. English.
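For developers, the newscaster style is applied through Polly's API rather than being a separate voice. The sketch below, based on Polly's documented SSML support and the boto3 client, shows roughly what a request might look like; the exact tag and parameter names should be checked against the current AWS documentation before use:

```python
# Sketch: building a newscaster-style request for Amazon Polly.
# The "news" SSML domain tag and the Engine="neural" parameter follow
# Polly's documented API, but verify names against current AWS docs.

def build_newscaster_request(text, voice="Joanna"):
    """Wrap plain text in the newscaster SSML domain and assemble the
    keyword arguments for Polly's synthesize_speech call."""
    ssml = f'<speak><amazon:domain name="news">{text}</amazon:domain></speak>'
    return {
        "Engine": "neural",      # the newscaster style requires the neural engine
        "VoiceId": voice,        # Joanna or Matthew support the newscaster style
        "TextType": "ssml",
        "OutputFormat": "mp3",
        "Text": ssml,
    }

# Actually sending the request requires AWS credentials, e.g.:
# import boto3
# polly = boto3.client("polly")
# audio = polly.synthesize_speech(**build_newscaster_request("Top story tonight."))
```

Keeping the request construction separate from the network call, as above, makes the SSML wrapping easy to inspect before spending API calls on synthesis.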

In this age of fake news, having lifelike robot voices that sound like real newscasters feels a bit problematic at first. For the most part, though, whether a robot or a human reads the text doesn't make all that much of a difference. There are plenty of good use cases for the voices, and given the examples that AWS provided, you'll be able to listen to these voices for significantly longer than the old ones before you want to cut your ears off.