The hard tech behind Google’s simple Clips camera

Maybe the biggest surprise of Google’s hardware event today was the launch of Clips, a small stand-alone AI-driven camera that can capture up to three hours of video and images and then automatically select the best moments for you. I’m not sure how well Clips will do in the marketplace, but technically, it’s a fascinating product.

During my conversation with Clips product lead Juston Payne, he repeatedly stressed that Clips is not an accessory to the Pixel — or anything else, really. “It’s an accessory to anything, I’d say. It’s a stand-alone camera. A new type of camera and insofar as that any digital camera has become an accessory to a computer or a phone, so too with this,” he said. “The reason for that comes back to the fact that the intelligence is built into the device to decide when to take these shots, which is really important because it gives users total control over it.”

So unlike a product like Google Home, which fully relies on being connected to the cloud, Clips is pretty much a self-contained unit. It takes your images (probably after you set it down in your living room to play with your kids), runs its pre-trained machine learning algorithms to find the best ones and then automatically generates your clips and picks your best images for you.

This means it just works, no matter whether you are an iOS or Android user (though it comes with an app that lets you see the clips on the device and share them). And the device reflects this simplicity, with its one button (for manually starting recording) and straightforward design.

“We care very deeply about privacy and control and it was one of the hardest parts of the whole project,” Payne told me. “The thing is that until really quite recently, you needed at least a desktop or you needed literally a server farm to take imagery in, run convolutional neural networks against them, do semantic analysis and then spit something out.”

Only recently has silicon evolved to the point where a company like Google can put all of this into a small device like Clips. Indeed, when you hold Clips, it’s surprisingly small (and disappointingly, it doesn’t feature a built-in clip, though you can put it into a little plastic housing that features a clip). Most of the weight is probably the battery, which should last about three hours, and the camera unit itself, which features a pretty wide-angle view.

To run its models on the camera, Google went to Intel’s Movidius and its extremely low-power vision processing unit (VPU).

“In our collaboration with the Clips team, it has been remarkable to see how much intelligence Google has been able to put right into a small device like Clips,” said Remi El-Ouazzane, vice president and general manager of Movidius, Intel New Technology Group, in his company’s own announcement today. “This intelligent camera truly represents the level of onboard intelligence we dreamed of when developing our Myriad VPU technology.”

Every AI model needs to be trained, though, and for Clips, Google worked with video editors and an army of image raters. “There’s not a great ML [machine learning] model that can say: there’s a baby crawling on the floor, that probably looks good,” explained Payne. So Google collected a lot of its own video. It then had editors on staff look at the content and say what they liked, and then the labelers looked at pairs of clips and decided which ones they liked better. Those judgments became the training material for the model.
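To make the rating step a little more concrete, here is a minimal Python sketch of how "which clip did the rater like better" judgments could be turned into a ranking. This is only an illustration: Google has not described its actual pipeline, and the function name, clip IDs and win-rate scoring are all assumptions made for this example. A real system would feed such preferences into a trained model rather than simply count wins.

```python
from collections import defaultdict


def rank_clips(preferences):
    """Rank clips by the share of pairwise comparisons they won.

    `preferences` is a list of (winner, loser) clip IDs, one per
    rater judgment. This scoring rule is a stand-in, not Google's method.
    """
    wins = defaultdict(int)
    seen = defaultdict(int)
    for winner, loser in preferences:
        wins[winner] += 1
        seen[winner] += 1
        seen[loser] += 1
    # Sort by win rate, highest first; break ties by clip ID for determinism.
    return sorted(seen, key=lambda clip: (-wins[clip] / seen[clip], clip))


# Hypothetical rater judgments: each tuple is (preferred clip, other clip).
judgments = [
    ("baby_crawling", "empty_room"),
    ("baby_crawling", "blurry_dog"),
    ("blurry_dog", "empty_room"),
]
ranking = rank_clips(judgments)
print(ranking)
```

The point of the sketch is only that relative judgments ("this one over that one") are easier for human raters to give than absolute scores, and they can still be aggregated into an ordering.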

Over time, the unit learns who the people you care about are and which images interest you.

But there’s a drawback here, too. For now, Clips is great for finding images of people and pets (or really, cats and dogs, not pet pigs). It’s not a device you can take on a vacation and expect it to find the best images for you. Google plans to expand the machine learning model on the device to support more situations, but right now, it’s probably best as a device for young families. “We’re starting with a focus and then we’ll build out from there,” explained Payne. “Right now, it doesn’t understand the world in general.”

Over time, Clips will understand more of the world. At $249, it’s definitely an expensive device, though I wouldn’t be surprised if Clips caught on and made regular appearances on baby shower registries.