Project VoCo demo offers intriguing look at tech for word changes

November 6, 2016
by Nancy Owano

What are the first things that come to mind when you hear the word Photoshop? On the one hand, you know this has been quite a valuable tool for all creatives. On the other hand, you know how some people have railed against designers and other media creatives for using its features to make people look thinner, younger or generally better than they look in real life.

Well, does Adobe have a surprise for you. Think of being able to do for audio what Photoshop does for photography.

Project VoCo allows you to change words in a voiceover simply by typing new words. Huh? The project allows you to edit speech.

Tyler Lee in Ubergizmo said, "Photoshop is a tool used by many professionals in the photography and graphics editing industry. It lets users touch up photos as well as manipulate them by erasing certain objects, adding objects, merging photos, and so on. Now it looks like Adobe wants to be able to do the same for speeches."

Project VoCo can generate authentic-sounding speech in anyone's voice based on a sample. You can add words that did not originally appear in the audio file, and the voice sounds the same; listeners would not detect the change.

"Add text to a recording in exactly the same voice by simply selecting a clip of speech, opening up an edit box, and typing in new text," said No Film School's Jon Fusco, Producer/Editor.

It will work if you feed Project VoCo about 20 minutes of audio featuring that person's voice. Then it's good to go. The software proceeds to let you edit that speech. No Film School said, "all you need is around 20 minutes of recorded speech for the algorithm to kick into gear for replication purposes. It analyzes the speech, breaks it down into phonemes, transcribes it, and creates the voice model."

As you may guess from the word Project in its name, this is not something available now. It grabbed audience attention at the Adobe MAX 2016 event, where people get to take a sneak peek at some projects. Adobe's Zeyu Jin described what it does.

Craig Stewart, editor of Creative Bloq, wrote about the demo.

"Jin took a clip of speech and by simply typing new text into an edit box was able to add that text into the speech, in exactly the same voice. In other words, he 'redubbed' what the speaker has actually said."

Elsewhere, reactions have ranged from praise for a great tool that would be nice for video games to concerns over what this could mean for doctoring what people actually say.

Some commenters raised the possibility of abuse in adding words not originally found in the audio file. What if it were used to trick people or to abuse known personalities or opponents? Simply put, as one classic comment had it, "What could possibly go wrong?"

The point was not lost on Tyler Lee. He said that "from an ethical standpoint we have to wonder if such a software should ever be released to the masses."

At the same time, there is no need to fly off the handle. There are no signs as of yet that Adobe is preparing to ship Project VoCo as a commercial product.

Nick Statt of The Verge also said, "it's not clear at this time when it will materialize as a commercial product." So far, this is only a project.

According to Statt, "An Adobe representative confirmed the project's existence to The Verge, clarifying that it was shown ... as part of a sneak-peek program at the MAX conference."

Meanwhile, tech watchers gave reasons why this would be a useful tool for professionals. Stewart wrote that the project raises "ethical alarm bells about the ability to change facts after the event (an ethical minefield already well-trodden by some using Photoshop's photo editing features)," but at the same time, "it could also be an incredibly useful tool for podcasters and audiobook-creators, making them able to post-produce audio edits without having to pay a voice actor for re-records."

Project VoCo would be useful for those who work with visual and audio media, and No Film School placed it in that perspective. Fusco said that "as avid podcasters and filmmakers ourselves, it's safe to say we're excited by what we see."

He shared an example of how Project VoCo can help out. "If one of your actors gives a reading that proves to be just a little off, you may now be able to tweak it, adding or replacing a word that doesn't originally appear in the audio file."

Statt in The Verge said the project is in development as part of a collaboration between members of Adobe Research and Princeton University.