The Threat of Voice Cloning

In our November 2017 article, we mentioned Lyrebird as an example of a speech synthesis company whose technology could, under the right circumstances, be used to defeat voice biometric systems. This month, Tom Allen of Computing Magazine (UK) wrote an article about Baidu, a China-based technology company that is also developing speech synthesis, and its ability to clone a voice from "only a few seconds of audio".

Baidu has been working on its "Deep Voice" software for over a year, and the company claims that its latest version needs only 3.7 seconds of speech to clone a voice. The article does not say what that speech must contain, but cloning a voice from even five times that amount (about 18.5 seconds) would be an impressive feat!

This technology clearly makes sense for voice bots, IVR systems, and other applications that use text-to-speech (TTS). We may no longer have to put up with mechanical, robotic-sounding synthetic speech in computer applications. We could even create personal "voice avatars" that produce good approximations of our own voices from nothing more than typed text. Think about it: there are many applications where this technology could benefit us.

However, as a voice biometric software vendor, we are naturally concerned that someone with this technology, good recordings of a legitimate speaker, and bad intentions could eventually defeat a voice biometric system, negating the benefits of using voice biometrics in the first place. It is still too early to tell how good this technology is and how widely available it will become to the average person. But it is a very real threat, and we are taking it seriously.

Check back soon; we will continue to follow this technology closely.