Hacking Film: Automating Your Assembly Cut with Computational Editing

Since 1974, computer graphics professionals have been meeting at an annual conference called Siggraph to show off the latest in technology and research. For decades, the conference appealed only to those deep in academia and the R&D departments of tech companies. But Siggraph is one of those places where, in 2017, everything that will be commonplace by 2037 is just a seed of an idea, living as part of someone’s Ph.D. thesis presentation, a product in development, or a proof of concept looking for investment capital.

Siggraph 2017 at the LA Convention Center

An example: at Siggraph ’95 I stood in front of a monitor where I saw my image augmented with a cartoon duck’s head. I turned my head, and the duck’s head matched my movement with zero lag. Twenty-two years ago this was a mind-blowing feat of graphics virtuosity running on a liquid-cooled SGI machine. Today, it’s a Snapchat/Instagram feature.

With that anecdote in mind, I decided to pore over the presentations and exhibitors from Siggraph ‘17 to see what part of the future was presented this summer in LA. But then I got stuck on one particular project that ended up taking over this entire post. So instead, I’m going to tell you about that.

COMPUTATIONAL VIDEO EDITING FOR DIALOGUE-DRIVEN SCENES

Watch this video first. Now, if you want to know more about how it works, download the PDF. I think this will have a profound impact on post-production in the next couple of years. Automated video editing isn’t going to be what you think it’s going to be—it’s not going to put editors out of work. But it is going to change the nature of editing.

Watch the video. You probably won’t be using this exact piece of software, but you will be using something like it very soon. You’re already using automation to organize your files, edit photos, offer you suggestions for things you might like to watch, buy and follow. So you’ve already been prepped for this kind of digital assistance.

Automating the duties of the assistant editor is not a new concept, but this is by far the best-realized example. It’s more than just your NLE’s browser building “smart bins” or one of those Movie Shaker style apps that semi-randomly organizes your footage and photos into a music video complete with “Ken Burns” wipes. This system takes source footage—along with a script—and assembles entire edited scenes in a few minutes.

The system demonstrated in the video above uses a bunch of really cool machine learning techniques to develop “emotional sentiment” values, meaning it arranges clips based not just on which line is said, but also on how each line of dialogue is delivered by the performer—positive, neutral or negative. It identifies the actor (or a number of actors) in the frame, and also the shot type: wide, medium, close-up, zoom, etc.

So at the end of the analysis, you get something like this:

Every shot, automatically labeled by speaker, shot type and emotional sentiment. With all the work of sorting shots done automatically, the fun can begin. The system doesn’t use time-based editing to assemble scenes, meaning you’re not setting in- and out-points and dropping shots together on a timeline yourself.
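To make the analysis output concrete, here is a minimal sketch of what a labeled shot might look like as data. The field names, actor name and values are my own illustration of the kinds of labels described above, not the project’s actual data format:

```python
from dataclasses import dataclass

@dataclass
class LabeledShot:
    """One analyzed take. All field names here are illustrative,
    invented for this sketch, not the research project's schema."""
    clip_id: str
    line_index: int    # which script line this take covers
    speaker: str       # actor delivering the line
    shot_type: str     # "wide", "medium", "close-up", ...
    sentiment: str     # "positive", "neutral", "negative"
    start: float       # in-point in the source take, seconds
    end: float         # out-point, seconds

shots = [
    LabeledShot("takeA_03", 0, "Kate", "medium",   "neutral",  12.4, 15.9),
    LabeledShot("takeB_01", 0, "Kate", "close-up", "negative",  3.1,  6.8),
]

# With labels like these, finding "every negative close-up of Kate"
# is a one-line filter instead of hours of manual logging:
kates_negatives = [s for s in shots if s.speaker == "Kate"
                   and s.shot_type == "close-up"
                   and s.sentiment == "negative"]
```

The point is that once every take carries this metadata, the sorting work that eats an assistant editor’s day becomes a query.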

Instead, you select some “Idiom” widgets: rules to drive the editing of the scene. For example, in a two-person dialogue-driven scene, you might cluster idiom widgets like “speaker visible” + “no jump cuts” + “emphasize ‘X’ character.” The system then generates an edit, selecting shots and setting in- and out-points that it determines best satisfy those rules. An edit is done in minutes rather than hours.
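The rules-as-costs idea can be sketched in a few lines. The real system solves a global optimization over whole shot sequences; the greedy per-line version below, along with its idiom names, fields and penalty weights, is entirely my own toy illustration of the concept:

```python
# Toy sketch: idioms as penalty functions over (previous shot, candidate shot).

def speaker_visible(prev, cur):
    # Idiom: the actor delivering the line should be on screen.
    return 0.0 if cur["line_speaker"] in cur["visible"] else 10.0

def no_jump_cuts(prev, cur):
    # Idiom: avoid cutting between two shots from the same camera setup.
    return 5.0 if prev and prev["setup"] == cur["setup"] else 0.0

def assemble(candidates_per_line, idioms):
    """For each script line, greedily pick the candidate shot with the
    lowest total idiom penalty, given the previously chosen shot."""
    edit, prev = [], None
    for candidates in candidates_per_line:
        best = min(candidates,
                   key=lambda s: sum(rule(prev, s) for rule in idioms))
        edit.append(best)
        prev = best
    return edit

# Two script lines, each covered by several takes (all fields invented):
candidates_per_line = [
    [  # line 0, spoken by actor A
        {"clip": "wide_t1", "setup": "wide", "line_speaker": "A", "visible": ["A", "B"]},
        {"clip": "cuA_t1",  "setup": "cuA",  "line_speaker": "A", "visible": ["A"]},
    ],
    [  # line 1, spoken by actor B
        {"clip": "wide_t1b", "setup": "wide", "line_speaker": "B", "visible": ["A", "B"]},
        {"clip": "cuB_t1",   "setup": "cuB",  "line_speaker": "B", "visible": ["B"]},
    ],
]

edit = assemble(candidates_per_line, [speaker_visible, no_jump_cuts])
# Line 1 cuts to B's close-up rather than repeating the wide setup,
# because the "no jump cuts" idiom penalizes back-to-back same setups.
```

Swap in a different cluster of idioms and the same footage yields a different cut, which is exactly what makes this a tool for generating rough-cut variations.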

The editor is not handed a fine-tuned edit—that’s still something that requires the nuance of a human editor. Rather, the system can quickly assemble a rough cut in a variety of styles for an editor to use as a starting point.

Editing becomes less about the 90% tech work of sorting and assembling shots, and more about creative flights and experimentation. This project isn’t the end of editing; it’s the front end of a much more creative and far less technical era of editing.

SO, WHAT’S NEXT?

Typical Adobe Premiere timeline

Here goes my guessing about where this all lands in a few years.

Post-production houses already routinely deliver multiple edits of a single show: cut-downs for different broadcast lengths, edits for different ratings standards, the online edit, bumpers, etc. A system like this, one that tags footage based on content and not just metadata, will likely automate all of that. Need a 47-minute show trimmed down to 43 minutes without losing anything important? It’ll do that and show you exactly what got cut and why.
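A cut-down like that could be as simple as a greedy trim over importance-scored segments. The segment names, durations and importance scores below are invented for illustration, and the scores stand in for whatever the content-tagging system would actually compute:

```python
# Hypothetical sketch: trim a show to a target runtime by dropping the
# least "important" segments first, and report exactly what was cut.

def trim(segments, target_minutes):
    by_importance = sorted(segments, key=lambda s: s["importance"])
    total = sum(s["dur"] for s in segments)
    cuts = []
    for seg in by_importance:
        if total <= target_minutes:
            break
        cuts.append(seg)
        total -= seg["dur"]
    kept = [s for s in segments if s not in cuts]  # original order preserved
    return kept, cuts

segments = [  # a made-up 47-minute episode
    {"id": "cold_open", "dur": 5,  "importance": 9},
    {"id": "recap",     "dur": 2,  "importance": 1},
    {"id": "act1",      "dur": 15, "importance": 8},
    {"id": "b_plot",    "dur": 3,  "importance": 2},
    {"id": "act2",      "dur": 14, "importance": 7},
    {"id": "tag",       "dur": 8,  "importance": 6},
]

kept, cuts = trim(segments, 43)
# The recap and the B-plot beat go; everything scored as essential stays.
```

The interesting part isn’t the arithmetic, it’s that the `cuts` list doubles as the “here’s what got removed and why” report the editor reviews.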

At the end of each shooting day, a director will go back to her home or hotel and watch assemblies of that day’s footage. The movie will build itself with every new scene filmed. There may still be months spent in editing, but the process will begin as soon as the camera starts rolling and footage hits the hard drives.

Something to note (and I don’t know what to make of it exactly) is that one of the organizations involved in this research project is Adobe, the software/data management company behind Creative Cloud. Are we seeing the foundation of new editorial tools for Premiere CC 2018?

AND AFTER THAT?

So many video options… all customizable

Eventually, this kind of system could live client-side, meaning every show you watch will be tailored to how you like to watch. A customized system will deliver edits on the fly, tuned to the specific quirks of your attention span and multitasking habits. No two people will watch the exact same version of a show.

This is scary, and it’s really cool. It means that edited video media will become a collection of clips that assemble themselves whenever someone wants to watch, matching whatever time and attention the viewer has. This doesn’t replace skilled editors, a director’s cut or an audience watching a film in a theater. It just adds one more place in our screen-dominated lives where video content can fill whatever attention and available time we may have.

Learn more about the evolution of filmmaking tech, including where we might be headed next, by checking out more of Eric Escobar’s Hacking Film series.