Google's Go-Playing Machine Opens the Door to Robots that Learn

Two robotic arms face two closed doors. Both reach forward and miss the door handles entirely. So they reach again, and this time, they hit the handles head-on, rattling the door frames. So they try again. And again. Finally, they grab the handles cleanly and pull the doors open, and after a few more hours of trial and error, they can repeat the trick every time.

Businesses are already replacing human workers with robots. Now researchers are building self-learning machines that can do so much more.

Beyond a few videos and two matter-of-fact blog posts, Google declines to discuss this research—an effort overseen by University of California, Berkeley roboticist Sergey Levine—and certainly, the project is still in the early stages. But it represents a much broader movement toward machines that can learn to do things on their own, rather than obeying the pre-ordained programming of human engineers.

"What we're interested in are robots that interact with humans," says Ronnie Vuine, who founded the robotics startup Micropsi alongside Harvard cognitive scientist Joscha Bach. "Imagine a robot taking a piece of work and handing it to a human hand—or taking a piece of work from a human hand. Today, you can't do that."

Trial and Error

Reinforcement learning is an old technology that had its coming-out party about two years ago when DeepMind, the London artificial intelligence lab owned by Google, used the technique to build systems that could play old Atari games with super-human skill. In Breakout—the game where you knock down a wall of bricks using a paddle and a bouncing ball—DeepMind's AI learned to hit the ball behind the wall, knocking down brick after brick with a speedy economy that didn't seem possible.

Then the lab applied much the same technique to Go, cracking the ancient game ten years ahead of schedule. DeepMind founder Demis Hassabis and his team fed about 30 million Go moves into a deep neural network—a pattern recognition system that can learn tasks by analyzing vast amounts of data. Once the system learned the game, it reached even higher levels of skill by playing against itself, over and over and over again.

Reinforcement learning is particularly well suited to games. The technique is driven by a "reward function," a system that tracks which actions bring reward and which don't. In games, the reward is obvious: more points. But the same technique works with other types of software as well as in the physical world, places where the reward function is sometimes less obvious—and sometimes more. For Google's robots, the reward is opening the door.
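To make the idea of a reward function concrete, here is a minimal sketch in Python. It is not Google's or DeepMind's code. The action names, the `door_reward` function, and the simple trial-and-error learner (an epsilon-greedy value estimate) are all assumptions chosen for illustration. The agent tries actions, records which ones pay off, and gradually favors the one that does.

```python
import random

# Hypothetical actions for a toy door-opening task (illustrative names only).
ACTIONS = ["miss_handle", "hit_handle", "grab_handle"]


def door_reward(action):
    """Assumed reward function: 1 if the door opens, 0 otherwise."""
    return 1.0 if action == "grab_handle" else 0.0


def train(reward_fn, actions, episodes=500, epsilon=0.1, seed=0):
    """Learn by trial and error which action earns the most reward."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}  # running average reward per action
    count = {a: 0 for a in actions}    # how often each action was tried
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.choice(actions)
        else:
            # Exploit: otherwise pick the best action seen so far.
            action = max(actions, key=lambda a: value[a])
        reward = reward_fn(action)
        count[action] += 1
        # Incremental update of the average reward for this action.
        value[action] += (reward - value[action]) / count[action]
    return value


values = train(door_reward, ACTIONS)
best = max(ACTIONS, key=lambda a: values[a])
print(best, values)
```

Real systems replace the lookup table with a deep neural network and the three toy actions with continuous motor commands, but the loop of acting, observing reward, and updating is the same in spirit.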

A New Universe

Of course, opening a door is just one small part of navigating the world. The larger goal gets very complicated, very quickly—not to mention very expensive. That's why many other researchers are using digital simulations to explore reinforcement learning before moving into the physical world, hoping to bridge the gap between games and robotics.

Take OpenAI, the billion-dollar artificial intelligence lab bootstrapped by Elon Musk. It's building a sweeping software platform called Universe where AI "agents" can use reinforcement learning to master computer applications of all kinds, from games to web browsers. In theory, this could help build agents that operate in the real world as well. If you can teach an AI to play a driving game, the thinking goes, you can teach it to drive.

Prowler.io is a Cambridge, England startup moving down the same path. Today, this small team of researchers is building agents that can learn to navigate massively multiplayer games—virtual worlds. But as time goes on, they plan on extending this virtual work to robots and autonomous vehicles in the real world. Today, this is not how driverless cars operate. They make decisions based on an enormous set of rules programmed by engineers, which is a long way from real autonomy. Prowler founder and CEO Vishal Chatrath, who sold his previous AI company to Apple, argues that reinforcement learning and related techniques are essential to building truly autonomous vehicles—cars that can do everything a human driver can do.

In Berlin, Micropsi is already pushing these techniques into physical systems, much like Google. Founded in 2014 with an eye on building robots for manufacturing and other industrial purposes, the company began by building robotic simulations it could train via reinforcement learning. A video on the company's website shows off a system where a virtual robot arm learns to balance a virtual pole on the end of its virtual finger. The system simulates gravity and the movement of the robot, and a reward function tracks whether the pole falls or stays up. "We give the robot a cookie for every second it keeps the pole up," Vuine says. "And if it falls, you punish it." Now, the company is applying these same techniques to a physical machine called the Universal Robot.
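Vuine's cookie-and-punishment description maps onto a very small reward function. The sketch below is an illustration only, not Micropsi's code. The function name `episode_return`, the one-point cookie, and the size of the penalty are assumptions. It scores one attempt, given a second-by-second record of whether the pole was still up.

```python
def episode_return(pole_upright, cookie=1.0, penalty=-10.0):
    """Total reward for one balancing attempt.

    pole_upright: one boolean per second, True while the pole stays up.
    cookie and penalty are assumed values chosen for illustration.
    """
    total = 0.0
    for upright in pole_upright:
        if upright:
            total += cookie   # a "cookie" for every second the pole is up
        else:
            total += penalty  # the punishment when the pole falls
            break             # the attempt ends once the pole is down
    return total


# The pole stays up for three seconds, then falls.
print(episode_return([True, True, True, False, True]))
```

A learning system would run many such attempts and adjust the arm's movements toward whatever earns a higher total.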

The Problem with Reality

The trouble is that the physical world requires new techniques too. Vuine claims his company can solve any robotic problem inside a computer simulation, but simulations aren't the real thing. "If you do it in simulation, you haven't done the half of it," he admits. "It's hard to simulate contact physics." In other words, you can use simulations to build a robot that can balance a pole, but teaching it to push a plug into an outlet requires real plugs and real outlets.

And pushing a plug into an outlet is one of the easier problems, if only because there's an obvious and simple reward. Most behavior is harder to rate. As you string tasks together, those systems of reward get enormously complex. Carnegie Mellon researcher Abhinav Gupta, who is exploring similar technology with funding from Google, questions how useful reinforcement learning can be in the short term. He and his team are exploring a different set of techniques based on convolutional neural networks, a machine-learning technique widely used in image recognition, and these methods collect much larger amounts of data.

Chatrath believes that at least for right now, the best way to explore AI grounded in the physical world is through toys—small and simple machines. The idea is that as systems learn to use simple machines, they can apply what they learn to more complex machines. What's clear is that robots don't just have one way to learn. They have many. And inside so many organizations, they're getting started.
