Abstract

We propose to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos. Specifically, given a large collection of raw YouTube content, along with potentially noisy tags, our goal is to automatically generate spatiotemporal masks for each object, such as “dog”, without employing any pre-trained object detectors. We formulate this problem as learning weakly supervised classifiers for a set of independent spatiotemporal segments. The object seeds obtained using segment-level classifiers are further refined using graph cuts to generate high-precision object masks. Our results, obtained by training on a dataset of 20,000 YouTube videos weakly tagged into 15 classes, demonstrate automatic extraction of pixel-level object masks. Evaluation against a ground-truthed subset of 50,000 frames with pixel-level annotations confirms that our proposed methods can learn good object masks just by watching YouTube.
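The graph-cut refinement step can be illustrated with a small binary segmentation. The sketch below makes several assumptions that the abstract does not state. Per-pixel foreground probabilities (`fg_prob`, a hypothetical stand-in for the segment-classifier seeds) define the unary costs. A constant Potts penalty `smooth` discourages neighbouring pixels from taking different labels. The s-t minimum cut is found with a plain Edmonds-Karp max-flow rather than an optimized solver, and the paper's actual energy may differ.

```python
import math
from collections import deque


def min_cut_segment(fg_prob, smooth=1.0, eps=1e-6):
    """Binary object mask from per-pixel foreground probabilities via an s-t min cut.

    fg_prob: 2-D list of probabilities in [0, 1] (hypothetical classifier seeds).
    smooth:  Potts penalty paid for each pair of 4-neighbours given different labels.
    Returns a 2-D list with 1 = object, 0 = background.
    """
    h, w = len(fg_prob), len(fg_prob[0])
    src, snk = h * w, h * w + 1
    cap = [dict() for _ in range(h * w + 2)]  # residual capacities cap[u][v]

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # make sure the reverse residual edge exists

    for y in range(h):
        for x in range(w):
            i = y * w + x
            p = min(max(fg_prob[y][x], eps), 1.0 - eps)
            add_edge(src, i, -math.log(1.0 - p))  # cut if pixel i ends up background
            add_edge(i, snk, -math.log(p))        # cut if pixel i ends up object
            if x + 1 < w:                          # right neighbour, both directions
                add_edge(i, i + 1, smooth)
                add_edge(i + 1, i, smooth)
            if y + 1 < h:                          # lower neighbour, both directions
                add_edge(i, i + w, smooth)
                add_edge(i + w, i, smooth)

    while True:  # Edmonds-Karp: augment along shortest residual paths
        parent = {src: None}
        queue = deque([src])
        while queue and snk not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if snk not in parent:
            break
        flow, v = math.inf, snk
        while parent[v] is not None:  # bottleneck capacity along the path
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = snk
        while parent[v] is not None:  # push the flow and update residuals
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

    # Pixels still reachable from the source after max flow lie on the object side.
    return [[1 if y * w + x in parent else 0 for x in range(w)] for y in range(h)]
```

For example, suppose a single uncertain pixel (probability 0.4) is surrounded by confident object pixels. With `smooth=0.0` it is labeled background. With `smooth=1.0` the smoothness term pulls it into the object mask. That is the kind of cleanup that turns noisy seeds into coherent masks.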

One thing we have been working on within Research at Google is developing methods for making casual videos look more professional, thereby providing users with a better viewing experience. Professional videos have several characteristics that differentiate them from casually shot videos. For example, in order to tell a story, cinematographers carefully control lighting and exposure and use specialized equipment to plan camera movement.

We have developed a technique that mimics professional camera moves and applies them to videos recorded by handheld devices. Cinematographers use specialized equipment such as tripods and dollies to plan their camera paths and hold them steady. In contrast, think of a video you shot using a mobile phone camera. How steady was your hand? Were you able to anticipate an interesting moment and pan smoothly to capture it? To bridge these differences, we propose an algorithm that automatically determines the best camera path and recasts the video as if it had been filmed using stabilization equipment.
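The idea of replacing a shaky camera path with a smooth one can be sketched in one dimension. The published method uses a robust L1 optimization (the paper is named in the Stabilizer section). The sketch below swaps in a simpler L2 (least-squares) smoother instead, so it is not the authors' algorithm. It assumes the shaky path is given as one value per frame, for example horizontal translation from feature tracking. The helper names `smooth_path` and `crop_offsets` and the weight `lam` are hypothetical. The smoother minimizes the sum over frames of (p_t - c_t)^2 + lam * (p_{t+1} - p_t)^2 by solving the tridiagonal normal equations with the Thomas algorithm.

```python
def smooth_path(cam, lam=10.0):
    """L2-smoothed version of a 1-D camera path `cam` (one value per frame).

    Solves (I + lam * L) p = cam, where L is the path-graph Laplacian,
    using the Thomas algorithm for tridiagonal systems.
    """
    n = len(cam)
    if n < 2:
        return [float(v) for v in cam]
    a = [-lam] * n              # sub-diagonal (a[0] unused)
    b = [1.0 + 2.0 * lam] * n   # main diagonal
    b[0] = b[-1] = 1.0 + lam    # first and last frames have a single neighbour
    c = [-lam] * n              # super-diagonal (c[-1] unused)
    d = [float(v) for v in cam]
    for i in range(1, n):       # forward elimination
        m = a[i] / b[i - 1]
        b[i] -= m * c[i - 1]
        d[i] -= m * d[i - 1]
    p = [0.0] * n
    p[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        p[i] = (d[i] - c[i] * p[i + 1]) / b[i]
    return p


def crop_offsets(cam, lam=10.0):
    """Per-frame shift that moves each frame from the shaky path onto the smooth one."""
    return [p - c for p, c in zip(smooth_path(cam, lam), cam)]
```

Larger `lam` values give a steadier virtual camera but require larger crop offsets, which means a tighter crop. A quadratic penalty like this one tends to leave small residual motion everywhere. L1 penalties, by contrast, favor sparse changes and produce exactly static or constant-velocity stretches, which is closer to how a camera on a tripod or dolly moves.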

YouTube effects: Shake it like an Instagram picture

YouTube users can now apply a number of Instagram-like effects to their videos, giving them a cartoonish or Lomo-like look with the click of a button. The effects are part of a new editing feature that also includes cropping and advanced image stabilization.

Taking the shaking out of video uploads should go a long way toward making some of the amateur footage captured on mobile phones more watchable, but it can also be resource-intensive, which is why Google’s engineers invented an entirely new approach to image stabilization.

The new editing functionality will be part of YouTube’s video page, where a new “Edit video” button will offer access to filters and other editing functionality. This type of post-processing is separate from YouTube’s video editor, which lets users produce new videos from existing clips.

Lights, Camera… EDIT! New Features for the YouTube Video Editor

Nine months ago we launched our cloud-based video editor. It was a simple product built to give our users basic editing tools. Although it didn’t have all the features available in paid desktop editing software, the idea was that most people’s video editing needs are pretty basic and straightforward, and that we could meet them with a free editor available on the Web. Since launch, hundreds of thousands of videos have been published using the YouTube Video Editor, and we’ve regularly pushed out new feature enhancements to the product, including:

Video transitions (crossfade, wipe, slide)

The ability to save projects across sessions

An increase in the number of clips allowed in the editor, from 6 to 17

Video rotation (from portrait to landscape and vice versa – great for videos shot on mobile)
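A crossfade, the first transition listed above, blends the end of one clip into the start of the next, using a weight that ramps up over the transition. The minimal sketch below makes some simplifying assumptions. Frames are flat lists of grayscale pixel values, and the ramp is linear. The `crossfade` helper is hypothetical. Real editors work on decoded color frames and may use other ramps.

```python
def crossfade(clip_a, clip_b, overlap):
    """Join two clips (lists of frames), blending the last `overlap` frames of
    clip_a with the first `overlap` frames of clip_b."""
    if not 0 < overlap <= min(len(clip_a), len(clip_b)):
        raise ValueError("overlap must be between 1 and the shorter clip length")
    out = [list(f) for f in clip_a[:-overlap]]
    for k in range(overlap):
        alpha = (k + 1) / (overlap + 1)  # weight of the incoming clip, rising toward 1
        fa, fb = clip_a[len(clip_a) - overlap + k], clip_b[k]
        out.append([round((1 - alpha) * pa + alpha * pb) for pa, pb in zip(fa, fb)])
    out.extend(list(f) for f in clip_b[overlap:])
    return out
```

The joined clip is `overlap` frames shorter than the two inputs combined, because those frames are shared by both clips during the transition.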

While many of these are familiar features also available in desktop software, today we’re excited to unveil two new features that the team has been working on over the last couple of months and that take unique advantage of the cloud:

Stabilizer

Ever shoot a shaky video that’s so jittery it’s actually hard to watch? Professional cinematographers use stabilization equipment such as tripods or camera dollies to keep their shots smooth and steady. Our team mimicked these cinematographic principles by automatically determining the best camera path for you through a unified optimization technique. In plain English, you can smooth out some of those unsteady videos with the click of a button. We also wanted you to be able to preview the results in real time, before publishing the finished product to the Web. We do this by harnessing the cloud: the computation required to stabilize the video is split into chunks that are distributed across different servers. This lets us use many machines in parallel, computing the stabilized results and streaming them quickly into the preview. You can read more in the paper we’re publishing, “Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths.” Want to see the stabilizer in action? You can test it out for yourself, or check out these two videos. The first is without the stabilizer.
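The chunking idea can be sketched with Python's standard `concurrent.futures`. The approach is to split a per-frame motion track into overlapping chunks, process each chunk independently, and keep only each chunk's core frames, so that the seams fall inside the overlapping context. Several assumptions apply. A thread pool on one machine stands in for separate servers; a process pool or separate machines would be needed for real speedups on CPU-bound work. A centered moving average (hypothetical `smooth_chunk`) stands in for the actual stabilization computation. The `chunk_len` and `overlap` values are illustrative, not YouTube's.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth_chunk(values, radius=2):
    """Stand-in for per-chunk work: centered moving average of a 1-D motion track."""
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - radius), min(len(values), i + radius + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def process_in_chunks(track, chunk_len=8, overlap=2, workers=4):
    """Process overlapping chunks of `track` in parallel and stitch the cores back together."""
    n = len(track)
    # (core start, padded start, padded end): each chunk carries `overlap` frames of context
    jobs = [(s, max(0, s - overlap), min(n, s + chunk_len + overlap))
            for s in range(0, n, chunk_len)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda job: smooth_chunk(track[job[1]:job[2]]), jobs))
    stitched = []
    for (s, lo, _), res in zip(jobs, results):
        core_end = min(n, s + chunk_len)
        stitched.extend(res[s - lo:core_end - lo])  # drop the context frames
    return stitched
```

Here the overlap (2 frames) is at least the filter radius, so the stitched result matches the unchunked computation exactly. A solve that couples every frame, as a global camera-path optimization does, would not be reproduced exactly by independent chunks. In that case the overlap is what gives each chunk enough context to keep the seams unobtrusive.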