Kristen Stewart Uses Artificial Intelligence for Come Swim

No, she is not just a pretty face. Kristen Stewart, who skyrocketed to fame as “Bella” from Twilight, has come a long way since her teenage portrayals of hot-headed vampires and werewolves.
And when I say long, I mean she directed a riveting absurdist short film and co-authored a research paper on neural networks (an artificial intelligence technology) used in that film. Her co-authors were Bhautik J Joshi, a Research Engineer at Adobe, and David Shapiro, the Producer at Starlight Studios. (You can find this paper on arXiv, the preprint server hosted by Cornell University.)
Kristen’s team explored the application of Neural Style Transfer in producing her short film, Come Swim, parts of which were made to look like a moving impressionistic painting. Their paper, titled Bringing Impressionism to Life with Neural Style Transfer in Come Swim, talks about “using neural networks to artistically redraw an image in the style of a source style image.”
This super-artsy-tech collab was the result of a dream Kristen had of a man lying on the floor of the sea, an image she says she “couldn’t let pass without capturing it somehow.” She first followed the visual with an impressionist painting of a man waking up from a dream, and then turned to AI-based machine learning to bring it to life.
Kristen wanted to realize her directorial intent of visualizing a state of “fading-in between dreams with reality” that each one of us invariably experiences in the very first moments of waking up. This idea was rendered artistically using Neural Style Transfer, which eventually helped bring out the poetry in the concept.
Let’s explain in plain English how a painting ended up becoming a part of a moving picture feature in Come Swim. (This kind of rendering has not been used in formal VFX production yet.)
But before we nerd out, first a little bit about the evolution of technology in art.

Technology in Art: Photographs

The first ever photograph was taken in 1826 by Joseph Nicéphore Niépce, who started by experimenting with light-sensitive material to capture projections from his camera obscura. At the time, the only known way to capture an accurate likeness of anybody was painting.
Since the Renaissance, most of the progress in painting had been focused on accuracy, which basically meant realistic impressions of lights, shadows, contours, perspectives, etc.

By 1838, Louis Daguerre had improved on the technique of capturing images by using polished silver-plated sheets sensitized with iodine. He succeeded in mechanizing the way reality is captured.
A year later, Paul Delaroche is said to have declared, “Painting is Dead.” In a way he was right: realistic paintings gave way to impressions (read Van Gogh, Pierre Rousseau, Pablo Picasso). And by the middle of the 20th century, art had been essentially liberated from the role of accurate representation and had moved on to incredibly interesting new ways of depicting reality.
Meanwhile, in 1889, Kodak began commodifying photography, and people got hooked on this new form of “art.” To this day, fine art photography is alive and well, generating hundreds of millions of dollars in art sales around the world.
But the art form of painting has, in a way, come full circle (case in point: Come Swim): machine learning, a branch of artificial intelligence, has allowed photography to become part of the painting process. Come Swim takes it a step further: it uses a painting to redraw images shot by a camera.
To understand this, let’s shift gears to machine learning through neural networks.

Artificial Neural Networks (ANNs)

Neural networks are a form of computing that has its beginnings in the 1940s, around the time people started thinking about the brain as a computer. Alan Turing proposed a computing architecture based on this very idea. He envisioned neuronal units that would make very simple calculations based on connections from other neurons, with those connections being essentially tunable (read: tuning a learning algorithm).
Neural networks are made of units that do very simple calculations, such as additions, subtractions, or applications of simple functions. Each neuron is connected to other neurons and is responsible for combining the data that comes its way and sending the result across to the neurons connected to it, just like inside a brain. These units are also called nodes, the links between them are called connections, and the output obtained at each node is called its activation or node value.
In a feed-forward neural network (the most basic kind of neural network), the pixel data of an image flows in one direction only: it goes first into the input layer of neurons, and each layer then processes what it receives and passes the result on to the next layer in the series. There are no feedback loops.
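To make the one-way flow concrete, here is a minimal sketch of a feed-forward pass in Python with NumPy. Everything in it is made up for illustration: the 4-pixel "image", the layer sizes, the random weights, and the choice of a sigmoid as the "simple function" each neuron applies. It is not the network used for Come Swim.

```python
import numpy as np

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def feed_forward(pixels, layers):
    """Push input values through each layer in order, with no feedback loops.

    `layers` is a list of (weights, biases) pairs, one pair per layer.
    """
    activation = pixels
    for weights, biases in layers:
        # Each neuron adds up its weighted inputs, then applies a simple function.
        activation = sigmoid(weights @ activation + biases)
    return activation

rng = np.random.default_rng(seed=0)
pixels = rng.random(4)                       # a made-up 4-pixel "image"
layers = [
    (rng.normal(size=(3, 4)), np.zeros(3)),  # input (4 values) -> hidden layer (3 neurons)
    (rng.normal(size=(2, 3)), np.zeros(2)),  # hidden layer -> output layer (2 neurons)
]
output = feed_forward(pixels, layers)
print(output)  # two activations, each between 0 and 1
```

Each row of a weight matrix holds the connection strengths feeding one neuron, so stacking more (weights, biases) pairs in the list is all it takes to make the network deeper.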
The connections between the neurons have weights (numbers that control how strongly a signal passes along each connection) associated with them, and these weights are random at first. As Alan Turing also proposed, the system is disorganized at first and needs to be “trained,” which is the essence of machine learning.
And by “training” we mean adjusting the weights of neuronal links for accurate pattern generation/recognition. Such a network is, thus, capable of extracting salient features from its input data. In the case of Come Swim, it’s the features of the painting that Kristen created.
The weights need to be adjusted in a way that allows the network to “learn” a desired behavior, in this case, redrawing an image in the style of the painting. Repeating this adjustment over and over (basically tuning it) gives you a learning algorithm that can help you get the desired outcome (256 iterations were used to get the desired look for the shot).
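The "adjust the weights over and over" idea can be sketched in a few lines. The example below is a toy, not the Come Swim setup: it assumes a single neuron with three inputs, invented training data, and plain gradient descent as the adjustment rule. The names `true_weights`, `learning_rate`, and the 200 steps are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Toy data: each input has 3 values, and the target is a fixed blend of them.
true_weights = np.array([0.2, -0.5, 0.9])   # what we hope the neuron discovers
inputs = rng.normal(size=(100, 3))
targets = inputs @ true_weights

weights = rng.normal(size=3)                # random at first, i.e. "disorganized"
learning_rate = 0.1

def mean_squared_error(w):
    """Average squared gap between the neuron's outputs and the targets."""
    return float(np.mean((inputs @ w - targets) ** 2))

error_before = mean_squared_error(weights)
for step in range(200):
    predictions = inputs @ weights
    # Gradient of the mean squared error with respect to each weight.
    gradient = 2.0 * inputs.T @ (predictions - targets) / len(inputs)
    weights -= learning_rate * gradient     # nudge the weights to shrink the error
error_after = mean_squared_error(weights)

print(error_before, error_after)            # the error shrinks as the weights are tuned
```

After the loop, `weights` should sit very close to `true_weights`: the neuron has "learned" the blend purely from examples, which is the same principle that larger networks apply at scale.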
And what do we really mean by learning?
The first layer of neurons to receive the input data becomes reactive to simple features like edges, corners, or combinations of them, and as you move through the layers, each one becomes responsive to more complex features than the layers preceding it. So by the time the input reaches the last layer, its neurons start recognizing more complex features, like, let’s say, the head of the man in the case of Kristen’s painting.

The Original Style Image: A Painting Created by Kristen Stewart Based on Her Dream

This behavior is similar to that of a human brain: researchers have found that successive layers of neurons in the visual cortex also extract progressively higher-order information from a visual stimulus.

Decoding Neural Style Transfer in Come Swim

Transferring the style of one image to another can be done using a convolutional neural network (CNN), which is a feed-forward network in essence. The arrangement of neurons in a CNN emulates an animal’s visual cortex. Individual cortical neurons tile the visual cortex by responding to stimuli in a restricted region of space known as the receptive field.
CNNs are made of multiple layers of such receptive fields, which are essentially collections of neurons that each process a portion of an input image (Kristen’s painting in the case of Come Swim). The outputs of these collections are tiled so that their input regions overlap to obtain a better representation of the original image; this is repeated for every such layer.
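Overlapping receptive fields are easier to see in code. The sketch below slides a small 3x3 filter across a made-up 5x5 image, one pixel at a time, so neighbouring patches overlap. The image and the hand-written edge filter are assumptions for illustration; in a real CNN the filter values are learned during training rather than typed in, and the operation shown is the sliding-window form that CNN libraries commonly call convolution.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` one pixel at a time (no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            # This patch is the receptive field of one output neuron.
            patch = image[row:row + kh, col:col + kw]
            output[row, col] = np.sum(patch * kernel)
    return output

# A made-up 5x5 image: dark on the left (0), bright on the right (1).
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# A hand-written filter that responds to dark-to-bright vertical edges.
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)

response = convolve2d(image, vertical_edge)
print(response)  # every row is [3, 3, 0]: strong where the edge is, zero on the flat area
```

The filter fires wherever its patch straddles the dark-to-bright boundary and stays silent over the uniformly bright region, which is the kind of simple edge feature the early layers of a network respond to.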

The Camera Shot Used for Tuning of the Style Image for Come Swim

The Come Swim team used CNNs to transfer the painting’s style onto a test frame and then tuned their learning algorithm by adding “blocks of color and texture” until the desired painting-like texture was achieved for the shots. Once the transfer process was tuned correctly, they applied it to different parts of the film, in the process creating frames like the ones shown below.

Transfer of Texture and Contrast from the Style Image to a Camera Shot to Give It Impressionist Painting-Like Qualities
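How does a network measure "style" at all? One widely used formulation of neural style transfer compares Gram matrices, which record how strongly pairs of feature channels fire together regardless of where in the image they fire. The sketch below illustrates that idea only; it is an assumption that this resembles the measure behind the Come Swim shots, and the random arrays are stand-ins for features that a real CNN would extract from the painting and from a film frame.

```python
import numpy as np

def gram_matrix(features):
    """Correlations between feature channels, ignoring where they occur.

    `features` has shape (channels, height, width).
    """
    channels, height, width = features.shape
    flat = features.reshape(channels, height * width)
    return flat @ flat.T / (height * width)

def style_loss(features_a, features_b):
    """Mean squared difference between two Gram matrices."""
    return float(np.mean((gram_matrix(features_a) - gram_matrix(features_b)) ** 2))

rng = np.random.default_rng(seed=2)
style_features = rng.random((4, 8, 8))      # stand-in for CNN features of the painting
frame_features = rng.random((4, 8, 8))      # stand-in for CNN features of a film frame

# Shuffling pixel positions changes the layout but not the texture statistics.
shuffled = style_features.reshape(4, -1)[:, rng.permutation(64)].reshape(4, 8, 8)

print(style_loss(style_features, shuffled))        # effectively zero
print(style_loss(style_features, frame_features))  # larger than zero
```

Because the Gram matrix throws away position, a scrambled copy of the painting's features still counts as the same style. A style-transfer process repeatedly adjusts the output image to shrink this kind of loss while keeping the content of the camera shot recognizable.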

Much of today’s cutting-edge technology uses neural networks, and fortunately we have plenty of computational power at our disposal to experiment with it. GPU accelerator cards, which can easily be installed in our PCs, provide enormous computational power for very little capital investment on our part.
Kristen Stewart’s Come Swim has been described as “a diptych of one man’s day,” composed of portraits halved into the impressionist and realist genres. Come Swim premiered last week at the Sundance Film Festival, 2017, Utah, and has been generating a lot of buzz. And rumor has it, Kristen is possibly going to Cannes with it this summer too.
So I suppose it’s about time we stopped taking art for granted and took notice of the confluence of technology and art swirling around us. Besides, AI and machine learning are no longer esoteric manifestations of technology; with a sprinkling of imagination, they can be used for almost anything.
All one needs to do is build an algorithm, train the network, show it whatever you saw, and start playing around with it!