The University of Washington’s Synthesizing Obama project took audio from one of Obama’s speeches and used it to animate his face in an entirely different video.

In an age of Photoshop, filters and social media, many of us are used to seeing manipulated pictures – subjects become slimmer and smoother or, in the case of Snapchat, transformed into puppies.

However, there’s a new breed of video and audio manipulation tools, made possible by advances in artificial intelligence and computer graphics, that will allow for the creation of realistic-looking footage of public figures appearing to say, well, anything. Trump declaring his proclivity for water sports. Hillary Clinton describing the stolen children she keeps locked in her wine cellar. Tom Cruise finally admitting what we suspected all along … that he’s a Brony.

This is the future of fake news. We’ve long been told not to believe everything we read, but soon we’ll have to question everything we see and hear as well.

For now, there are several research teams working on capturing and synthesizing different visual and audio elements of human behavior.

Software developed at Stanford University is able to manipulate video footage of public figures to allow a second person to put words in their mouth – in real time. Face2Face captures the second person’s facial expressions as they talk into a webcam and then morphs those movements directly onto the face of the person in the original video. The research team demonstrated their technology by puppeteering videos of George W Bush, Vladimir Putin and Donald Trump.

Face2Face lets you puppeteer celebrities and politicians, literally putting words in their mouths.

On its own, Face2Face is a fun plaything for creating memes and entertaining late night talk show hosts. However, with the addition of a synthesized voice, it becomes more convincing – not only does the digital puppet look like the politician, but it can also sound like the politician.

A research team at the University of Alabama at Birmingham has been working on voice impersonation. With three to five minutes of audio of a victim’s voice – taken live or from YouTube videos or radio shows – an attacker can create a synthesized voice that can fool both humans and the voice biometric security systems used by some banks and smartphones. The attacker can then talk into a microphone and the software will convert their speech so that the words sound as though they are being spoken by the victim – whether that’s over the phone or on a radio show.

Canadian startup Lyrebird has developed similar capabilities, which it says can be used to turn text into on-the-spot audiobooks “read” by famous voices or for characters in video games.

Although the developers’ intentions may be good, voice-morphing technology could be combined with face-morphing technology to create convincing fake statements by public figures.

You only have to look at the University of Washington’s Synthesizing Obama project to get a sense of how insidious these adulterations can be. The researchers took the audio from one of Obama’s speeches and used it to animate his face in an entirely different video with incredible accuracy, thanks to a recurrent neural network trained on hours of footage.

Beyond fake news there are many other implications, said Nitesh Saxena, associate professor and research director of the University of Alabama at Birmingham’s department of computer science. “You could leave fake voice messages posing as someone’s mum. Or defame someone and post the audio samples online.”

These morphing technologies aren’t yet perfect. The facial expressions in the videos can seem a little distorted or unnatural and the voices can sound a little robotic.

But given time, they will be able to faithfully recreate the sound or appearance of a person – to the point where it might be very difficult for humans to detect the fraud.

Given the erosion of trust in the media and the rampant spread of hoaxes via social media, it will become even more important for news organizations to scrutinize content that looks and sounds like the real deal.

Things to check will include where the video or audio was created, who else was at the event and whether the weather conditions match the records for that day.

People should also be looking at the lighting and shadows in the video, whether all of the elements featured in the frame are the right size, and whether the audio is synced perfectly, said Mandy Jenkins, from social news company Storyful, which specializes in verifying news content.

Doctored content might not pass the scrutiny of a rigorous newsroom, but if posted as a grainy video to social media it could spread virally and trigger a public relations, political or diplomatic disaster. Imagine Trump declaring war on North Korea, for example.

“If someone looks like Trump and speaks like Trump they will think it’s Trump,” said Saxena.

“We already see it doesn’t even take doctored audio or video to make people believe something that isn’t true,” added Jenkins. “This has the potential to make it worse.”