Smartphones have taken over our lives in the past few years, and most of us carry one everywhere we go. The camera is one of the most-used features on a modern phone. At many events, more than one person is shooting photos or video, and each person captures a different angle and perhaps a different part of the action. Until now there has been no easy way to edit all that footage together, but Disney Research has announced an automatic editing tool that takes the different videos of a scene and intelligently combines them.

The tool first analyzes what each camera is seeing and from what angle, and from that information infers the main subject of the video. For example, if four out of five cameras follow a kid running in the snow, it treats the kid as the main subject and works from there. The tool then analyzes the angles of the different videos and chooses cuts between them that make sense to the viewer.
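
To make the "four out of five cameras" idea concrete, here is a minimal sketch of how a shared focus point could be estimated. It is not Disney Research's implementation. It assumes a simplified 2D world in which each camera is given as a position and a viewing direction, and it treats each view as an infinite line rather than a ray. The function names and the `max_distance` threshold are hypothetical. The sketch finds the point closest to all viewing lines, discards cameras whose line passes too far from it, and solves again with the remaining majority.

```python
import math


def closest_point_to_lines(cameras):
    """Least-squares point nearest to every camera's viewing line (2D).

    cameras: list of ((px, py), (dx, dy)) position/direction pairs.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), (dx, dy) in cameras:
        norm = math.hypot(dx, dy)
        dx, dy = dx / norm, dy / norm
        # I - d d^T keeps only the component perpendicular to the view line.
        m11, m12, m22 = 1 - dx * dx, -dx * dy, 1 - dy * dy
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * px + m12 * py
        b2 += m12 * px + m22 * py
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("viewing lines are parallel; no unique focus point")
    # Solve the symmetric 2x2 system with Cramer's rule.
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


def distance_to_line(point, camera):
    """Perpendicular distance from a point to a camera's viewing line."""
    (px, py), (dx, dy) = camera
    norm = math.hypot(dx, dy)
    return abs(dx * (point[1] - py) - dy * (point[0] - px)) / norm


def estimate_focus(cameras, max_distance=2.0):
    """Estimate the shared focus, ignoring cameras that look elsewhere.

    max_distance is an assumed threshold in scene units.
    """
    first_guess = closest_point_to_lines(cameras)
    inliers = [c for c in cameras
               if distance_to_line(first_guess, c) <= max_distance]
    if len(inliers) < 2:
        return first_guess
    return closest_point_to_lines(inliers)


# Four cameras look at the origin; the fifth looks somewhere else.
cameras = [
    ((5.0, 0.0), (-1.0, 0.0)),
    ((-5.0, 0.0), (1.0, 0.0)),
    ((0.0, 5.0), (0.0, -1.0)),
    ((0.0, -5.0), (0.0, 1.0)),
    ((5.0, 5.0), (1.0, 0.0)),
]
focus = estimate_focus(cameras)
```

In this toy setup the stray camera pulls the first estimate away from the origin, and the second pass recovers the point the other four agree on. A real system would have to estimate the camera poses from the footage itself, which this sketch takes as given.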

We present an approach that takes multiple videos captured by social cameras, which are carried or worn by members of the group involved in an activity, and produces a coherent “cut” video of the activity. Footage from social cameras contains an intimate, personalized view that reflects the part of an event that was of importance to the camera operator (or wearer). We leverage the insight that social cameras share the focus of attention of the people carrying them. We use this insight to determine where the important “content” in a scene is taking place, and use it in conjunction with cinematographic guidelines to select which cameras to cut to and to determine the timing of those cuts. A trellis graph formulation is used to optimize an objective function that maximizes coverage of the important content in the scene, while respecting cinematographic guidelines such as the 180-degree rule and avoiding jump cuts. We demonstrate cuts of the videos in various styles and lengths for a number of scenarios, including sports games, street performance, family activities, and social get-togethers. We evaluate our results through an in-depth analysis of the cuts in the resulting videos and through comparison with videos produced by a professional editor and existing commercial solutions.
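
The trellis graph idea in the abstract can be illustrated with a small dynamic program. This is a sketch under stated assumptions, not the paper's actual objective: each time step has one node per camera, `coverage[t][c]` is an assumed score for how well camera `c` shows the important content at step `t`, every cut costs a fixed `cut_penalty`, and `forbidden` is a hypothetical set of camera pairs that may not be cut between (a stand-in for rules such as the 180-degree rule). The best path through the trellis is found Viterbi-style.

```python
def select_cameras(coverage, cut_penalty=0.5, forbidden=frozenset()):
    """Pick one camera per time step, maximizing coverage minus cut costs.

    coverage:  list of per-step lists, coverage[t][c] = score of camera c.
    forbidden: set of (from_camera, to_camera) cuts that are not allowed.
    """
    if not coverage:
        return []
    n_cams = len(coverage[0])
    best = list(coverage[0])   # best total score of a path ending in each camera
    back = []                  # back[t-1][c] = previous camera on that best path
    for t in range(1, len(coverage)):
        new_best, pointers = [], []
        for cur in range(n_cams):
            best_prev, best_val = cur, best[cur]   # staying on a camera is free
            for prev in range(n_cams):
                if prev == cur or (prev, cur) in forbidden:
                    continue
                val = best[prev] - cut_penalty     # cutting costs a penalty
                if val > best_val:
                    best_prev, best_val = prev, val
            new_best.append(best_val + coverage[t][cur])
            pointers.append(best_prev)
        best = new_best
        back.append(pointers)
    # Walk the back-pointers from the best final camera to recover the path.
    cam = max(range(n_cams), key=lambda c: best[c])
    path = [cam]
    for pointers in reversed(back):
        cam = pointers[cam]
        path.append(cam)
    path.reverse()
    return path


# Two cameras, four time steps: the action moves from camera 0 to camera 1.
coverage = [[1.0, 0.2], [0.9, 0.3], [0.1, 1.0], [0.2, 0.9]]
plan = select_cameras(coverage)
```

With the default penalty the plan cuts once, when the second camera starts covering the action better. Raising `cut_penalty` makes the editor hold a single camera, and listing a pair in `forbidden` rules that cut out entirely, which is one simple way to encode an editing style.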

Noam Galai is a Senior Fstoppers Staff Writer and NYC Celebrity / Entertainment photographer. Noam's work appears daily in publications such as Time Magazine, New York Times, People Magazine, Vogue and Us Weekly.

Footage is already synchronized from the audio track by programs like FCP, Premiere, and PluralEyes. I'm not sure what advantage this would have over the existing sync technologies, other than removing the need for sound, but all modern phones and cameras record sound anyway. Automatic editing? So you're letting a program make judgement calls?

I think this would better serve government/law enforcement and/or network ENG production, which deal regularly with security camera footage that typically doesn't have audio, as well as social network footage (which may or may not have audio, but provides alternate footage of the event). It wouldn't surprise me if such technology already exists, or if this is existing technology being rolled out for public use.