Adobe’s VoCo Lets You Put Words In Your Audio Recording’s Mouth

Several months ago, I posted about Face2Face, a program that manipulates video in real time to change and animate facial expressions. It's an interesting technology, but also a concerning one: all you have to do is watch this video to see how it might be used in the wrong hands.

Well, today we have something new and exciting to fuel the fires of conspiracy.

According to Engadget, this month Adobe unveiled its experimental Project VoCo tool, which lets users insert dialogue into preexisting voiceover recordings simply by typing the words or phrases they want to hear. The new dialogue sounds much like the original voice.

The software does this by analyzing the sound of the original voice (about 20 minutes' worth of audio will do), then synthesizing the new words to match it.

In the above video, you can watch Adobe's new tech in action, in this case using an audio sample of Keegan-Michael Key. It's not perfect, but the real magic begins at around 4:20, when the presenter shows off the ability to insert short phrases.

As usual, the first question such a technology raises is: can it be used for evil? The answer, of course, is yes. But Adobe is apparently also researching ways to detect audio forgeries, for example through watermarking.

That said, as with any software, you can imagine other versions popping up in the future without those watermarks. The technology is here, one way or the other.

So much potential. People could use this for fan works, such as creating MUGEN fighters that use a character's original voice. For example, one could theoretically use the voice clips of old cartoon characters to create new audio for their fighter versions. Man, I'd love to see an open source version so I don't have to deal with Adobe and its pricing.

Text-to-speech already uses this. They take a live human and record them saying various things, and the text-to-speech program puts it all together so any word can be pronounced in that voice. I'm interested to see whether this sounds as robotic and bland as TTS voices or whether the tech has improved that much. The demo only shows a couple of words, so it's hard to tell. But just google "text to speech demo", look for "Acapela text to speech", and experiment with all the voices.