In a new paper from the University of Washington, researchers demonstrate how neural networks can generate realistic-looking video footage of Barack Obama. Is this the fake news of the future?

The pervasiveness of fake news during the 2016 election—and its possible impact on the results—has prompted social media companies to crack down on the rapid spread of misinformation on their platforms. But these policies target fake news in the forms it took during the election: articles, posts, and Twitter bots. What if fake news began to take on another form, with the help of artificial intelligence? What if it could be spread by hyperrealistic video, straight from the mouth of a public figure you trust?

A team of researchers from the University of Washington has figured out a way to use neural networks to generate fake video of Barack Obama. Using audio clips and algorithms trained to manipulate mouth movements, the researchers have generated video of Obama so realistic and seamless that it's hard to distinguish the manipulated video from the original. And while the researchers envision much more benign uses for the technology, it's easy to imagine it being co-opted to spread fake information.

The video below illustrates some examples of this tech at work. The researchers chose to demonstrate their system on a selection of Obama's weekly addresses to the nation. They trained a neural network on the footage of these addresses—all 17 hours, or nearly 2 million frames, spanning all eight years—which is easily accessible online. In one example, a generated Obama splits the screen with the real Obama; both address the Pulse nightclub shootings in the same words, though the generated Obama speaks from a different room of the White House. In another, a generated Obama mimes the words of his much younger self from 1990.

These videos are extraordinary for a couple of reasons—the first of which is the technology the researchers have come up with. Manipulating a person's lip movements to line up with a new audio track is nothing new: one recent example is Face2Face, developed by researchers at Stanford, which animates a face in video footage according to the movements of another person captured on a webcam. The difference here is that while this technique does rely on hours and hours of Obama footage to train the neural network, it does not merely re-animate Obama's mouth to match the new audio. Instead, the model learned which mouth shapes correspond to which sounds by being fed the videos that already exist online. The researchers then dubbed new audio over existing video, generated the mouth shapes that lined up with the new audio, and grafted them into the existing footage. The result is an eerily realistic-looking Obama, making all of the right pauses, head nods, and expressions you recognize him for.
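The core idea—learn a mapping from audio features to mouth-shape parameters on existing footage, then apply that mapping to new audio—can be sketched in a few lines. This is a toy illustration, not the researchers' method: it stands in a simple least-squares linear model where the paper uses a recurrent neural network, and all the array names and dimensions here are hypothetical.

```python
import numpy as np

# Toy stand-in for the audio-to-mouth-shape pipeline.
# Real systems extract audio features (e.g., MFCCs) per video frame
# and predict compressed lip-landmark coordinates; here we fake both
# with random data and fit the mapping by ordinary least squares.

rng = np.random.default_rng(0)

n_frames, n_audio_feats, n_mouth_params = 500, 13, 18
audio = rng.normal(size=(n_frames, n_audio_feats))          # per-frame audio features
true_map = rng.normal(size=(n_audio_feats, n_mouth_params)) # unknown "ground truth"
mouth = audio @ true_map + 0.01 * rng.normal(size=(n_frames, n_mouth_params))

# "Training": learn the audio -> mouth-shape mapping from existing footage.
W, *_ = np.linalg.lstsq(audio, mouth, rcond=None)

# "Synthesis": given new audio, predict mouth shapes to graft into video.
new_audio = rng.normal(size=(10, n_audio_feats))
predicted_mouth = new_audio @ W
print(predicted_mouth.shape)  # (10, 18)
```

The grafting step—warping and blending the predicted mouth region into target video frames—is a separate compositing problem the sketch leaves out entirely.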

The other extraordinary thing about this technology is its potential uses. The researchers say the algorithm could be applied to video calls, generating video from audio using significantly less bandwidth than actual video does. It could also be used to create summary videos of long speeches, cutting out portions of the talk and stitching the rest seamlessly back together (an example of this is in the video above).

But as their examples suggest, the technology can also be used to manipulate messaging. Someone could graft together a new statement for the generated Obama, using words he said in an audio clip, though not necessarily in that order; it would be simple to imply a different meaning even using his own words. It's also possible to make it look like Obama is saying something his much younger self said, or vice versa. It's even possible to match video of Obama with an audio clip of someone impersonating the former president, so that it looks like he said something he never actually said.

The researchers used Obama because there was an abundance of footage of him online. In other words, this method only works if there is enough video on which to train the algorithm. That limits the scope somewhat, but then again, it puts highly public figures at the highest risk for manipulation—many of whom are figures of authority, people we trust. Soon all content providers may need to define policies for curbing fake news, if it makes its way from social media posts to video.

About the author

Meg Miller is an associate editor at Co.Design covering art, technology, and design.