
Alphabet’s DeepMind owns the WaveNet neural network, which now powers the US English and Japanese voices of the Google Assistant and gives your Google Assistant-enabled device more realistic speech. Because WaveNet generates raw audio, it took a long time, even with help from Google’s research teams, to make it efficient enough to use in devices.

Originally created in 2016, the technology was deemed too immature for everyday use on consumer devices because of how long it took to generate even short snippets of audio. Now that it has been released, it generates sound in an entirely different way from previous solutions. To put the speed-up into perspective, it used to take about 1 second to generate 0.02 seconds of audio; now it takes 1 second to generate 20 seconds of audio, roughly a 1,000-fold improvement. The quality has also improved greatly since WaveNet’s initial creation, with a considerably higher sample rate and double the resolution per sample.
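The speed figures above can be checked with a little arithmetic. The sketch below expresses each one as a real-time factor (seconds of audio produced per second of compute); the name `realtime_factor` is our own illustration, not anything from WaveNet or Google.

```python
def realtime_factor(audio_seconds, compute_seconds):
    """Seconds of audio produced per second of compute (>1 means faster than real time)."""
    return audio_seconds / compute_seconds

original = realtime_factor(0.02, 1.0)  # 2016 model: 0.02 s of audio per second of compute
current = realtime_factor(20.0, 1.0)   # released model: 20 s of audio per second of compute

speedup = current / original
ms_per_audio_second = 1000.0 / current  # compute time needed for one second of speech

print(f"speed-up: {speedup:.0f}x")  # speed-up: 1000x
print(f"compute per second of audio: {ms_per_audio_second:.0f} ms")  # compute per second of audio: 50 ms
```

A factor of 20 means one second of speech takes about 50 ms to produce, comfortably faster than real time.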

Previously, concatenative speech algorithms built on a database of high-quality recordings were used. These recordings usually came from a single voice actor over many hours, and sounds were then cut and spliced together as needed.
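As a rough illustration of that older approach, here is a toy concatenative synthesizer. It assumes a hypothetical `database` dict mapping unit labels to recorded audio arrays and hides each join with a short linear crossfade; real systems also search for the best-matching units, which this sketch leaves out.

```python
import numpy as np

def crossfade_concat(units, fade_len):
    """Splice recorded units end to end, overlapping each join by fade_len samples."""
    if fade_len < 1:
        raise ValueError("fade_len must be at least 1")
    out = np.asarray(units[0], dtype=float)
    ramp = np.linspace(0.0, 1.0, fade_len)  # weight of the incoming unit rises from 0 to 1
    for unit in units[1:]:
        unit = np.asarray(unit, dtype=float)
        if len(unit) < fade_len or len(out) < fade_len:
            raise ValueError("every unit must be at least fade_len samples long")
        # Fade out the end of what we have while fading in the start of the next unit
        blended = out[-fade_len:] * (1.0 - ramp) + unit[:fade_len] * ramp
        out = np.concatenate([out[:-fade_len], blended, unit[fade_len:]])
    return out

def synthesize(labels, database, fade_len=80):
    """Look up each unit by label in the (hypothetical) database and splice them."""
    return crossfade_concat([database[label] for label in labels], fade_len)

# Hypothetical database; random noise stands in for snippets cut from real recordings
rng = np.random.default_rng(0)
database = {
    "h": rng.standard_normal(800),
    "eh": rng.standard_normal(1200),
    "l": rng.standard_normal(900),
    "ow": rng.standard_normal(1500),
}
audio = synthesize(["h", "eh", "l", "ow"], database)
print(len(audio))  # 4160, i.e. 800 + 1200 + 900 + 1500 - 3 * 80
```

Each join shortens the output by `fade_len` samples because neighbouring units overlap. The audible seams at those joins are one reason concatenative voices can sound choppy.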

The team behind WaveNet took an entirely new approach. WaveNet is a convolutional neural network that builds entire waveforms from scratch, one sample at a time, with each new sample based on the ones that came before it. The original model worked at 16,000 samples per second and transitions between them seamlessly. The network was trained on a large dataset of speech samples, from which it learned how tones follow one another and which sequences of samples sound realistic. Because its voice and accent are learned from these recordings, in theory any number of unique voices with unique accents are possible.
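To make "one sample at a time, based on what came before" concrete, the sketch below shows two ingredients described in the original WaveNet paper: dilated causal convolutions, where each output depends only on the current and earlier input samples, and an autoregressive loop that feeds every generated sample back in as context. The kernel size of 2, the dilations 1, 2, 4, ..., 512 and the `predict_next` stand-in for a trained network are illustrative assumptions; this is not DeepMind's implementation.

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """y[t] = sum_i w[i] * x[t - i*dilation]; samples before t = 0 count as zero."""
    pad = (len(w) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    y = np.zeros(len(x))
    for i, wi in enumerate(w):
        start = pad - i * dilation  # tap i looks i*dilation samples into the past
        y += wi * xp[start:start + len(x)]
    return y

def receptive_field(kernel_size, dilations):
    """How many past-and-present input samples one output can see through the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def generate(predict_next, seed, n_samples, context_len):
    """Autoregressive loop: each new sample is predicted from the most recent context."""
    out = list(seed)
    for _ in range(n_samples):
        context = np.array(out[-context_len:])
        out.append(float(predict_next(context)))
    return np.array(out)

if __name__ == "__main__":
    dilations = [2 ** i for i in range(10)]  # 1, 2, 4, ..., 512 (assumed)
    print(receptive_field(2, dilations))  # 1024

    # Hypothetical stand-in for a trained network: halve the previous sample
    toy = lambda ctx: 0.5 * ctx[-1]
    print(generate(toy, seed=[1.0], n_samples=3, context_len=1024).tolist())  # [1.0, 0.5, 0.25, 0.125]
```

With those settings one stack sees 1,024 past samples, about 64 ms of audio at 16,000 samples per second. The loop also shows why the original model was slow: every second of audio needs 16,000 sequential calls to the network.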

WaveNet will be the first product to be powered by Google’s Tensor Processing Units (TPUs). TPUs are designed to accelerate machine learning and artificial intelligence workloads, making them an ideal platform for WaveNet to keep growing and learning. What’s more, WaveNet scores much higher than its predecessors on mean opinion score, a listener rating on a scale from 1 to 5.
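A mean opinion score is simply the average of listeners' ratings on that 1 to 5 scale. The helper below is a minimal sketch of the calculation; the function name is our own and the ratings in the example are invented for illustration, not WaveNet results.

```python
def mean_opinion_score(ratings):
    """Average of listener ratings, each from 1 (bad) to 5 (excellent)."""
    if not ratings:
        raise ValueError("need at least one rating")
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError(f"rating {r} is outside the 1-5 scale")
    return sum(ratings) / len(ratings)

# Invented ratings from five hypothetical listeners for one voice
print(mean_opinion_score([4, 5, 3, 4, 4]))  # 4.0
```

Rejecting out-of-range ratings up front keeps a single data-entry mistake from silently skewing the average.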

About Author

A 20-year-old Irish technology fanatic in his second year of a Computer Science degree at University College Dublin. Lover of smartphones, cybersecurity and Counter-Strike. My Twitter is @AdamConwayIE and my Instagram is adamc.99.

XDA Developers was founded by developers, for developers. It is now a valuable resource for people who want to make the most of their mobile devices, from customizing the look and feel to adding new functionality.