Google’s AI watches YouTube clips to learn about human behavior

Google has curated a set of YouTube clips to help machines learn how humans behave in the world. The dataset, called AVA for “atomic visual actions,” consists of three-second clips of people doing everyday things like drinking water, taking a photo, playing an instrument, hugging, standing or cooking.

Each clip labels the person the AI should focus on, along with a description of their pose and whether they’re interacting with an object or another human.

“Despite exciting breakthroughs made over the past years in classifying and finding objects in images, recognizing human actions still remains a big challenge,” Google wrote in a recent blog post describing the new dataset. “This is due to the fact that actions are, by nature, less well-defined than objects in videos.”

The catalog of 57,600 clips highlights only 80 actions but labels more than 96,000 humans. Google pulled clips from popular movies, emphasizing that it drew from a “variety of genres and countries of origin.”

If there are two people in a clip, each person is labeled separately, so a machine can learn that it takes two people to shake hands or that people sometimes kiss while they hug.

The technology will help Google analyze years of video and could also help advertisers better target consumers, based on the kinds of actions they are most likely to watch.

Ultimately, the goal is to teach computers social visual intelligence, according to an accompanying research paper, which describes it as “understanding what humans are doing, what might they do next and what they are trying to achieve.”