Researchers at Musk-backed OpenAI propose a new way to tell if an AI is making the right decisions, and they released an online game to demonstrate the theory.

Modern AI systems can recognize faces, drive cars and understand human speech, but experts say it can be difficult to understand how these systems actually make their decisions.

And if their virtual thought processes have hidden logic errors or blind spots, that can lead to serious safety or other concerns, even if their decisions seem to mostly come out right. For instance, recent research from Stanford and MIT found that mainstream face recognition and analysis algorithms often performed worse on pictures of people who aren’t white and male, likely because they were trained on skewed datasets.

And researcher Janelle Shane, who runs the website AIWeirdness, recently pointed out the propensity that Microsoft Azure’s computer vision API has for spotting sheep in photographs of grassy fields, even where there are no animals at all, presumably because so many of the images of fields it has seen before were dotted with sheep. When it comes to mistakes like these, the stakes can be deadly. According to a report by the Information, an Uber vehicle’s self-driving software failed to register a pedestrian as a human before it struck and killed her in March.

Ultimately, AI systems are only useful and safe as long as the goals they’ve learned actually mesh with what humans want them to do, and it can often be hard to know if they’ve subtly learned to solve the wrong problems or make bad decisions in certain conditions.

To make AI easier for humans to understand and trust, researchers at the nonprofit research organization OpenAI have proposed training algorithms to not only classify data or make decisions, but to justify their decisions in debates with other AI programs in front of a human or AI judge.

“Given a question or proposed action, two agents take turns making short statements up to a limit, then a human judges which of the agents gave the most true, useful information,” write OpenAI researchers Geoffrey Irving, Paul Christiano and Dario Amodei in a new research paper. The San Francisco-based AI lab is funded by Silicon Valley luminaries including Y Combinator President Sam Altman and Tesla CEO Elon Musk, with a goal of building safe, useful AI to benefit humanity.

Since human time is valuable and usually limited, the researchers say the AI systems can effectively train themselves in part by debating in front of an AI judge designed to mimic human decision making, similar to how software that plays games like Go or chess often trains in part by playing against itself.

In an experiment described in their paper, the researchers set up a debate where two software agents work with a standard set of handwritten numerals, attempting to convince an automated judge that a particular image is one digit rather than another digit, by taking turns revealing one pixel of the digit at a time. One bot is programmed to tell the truth, while another is programmed to lie about what number is in the image, and they reveal pixels to support their contentions that the digit is, say, a five rather than a six.

Microsoft’s computer vision API incorrectly determined this image contains sheep [Image: courtesy Janelle Shane / aiweirdness.com]

The truth-telling bots tend to reveal pixels from distinctive parts of the digit, like the horizontal line at the top of the numeral “5,” while the lying bots, in an attempt to deceive the judge, point out what amount to the most ambiguous areas, like the curve at the bottom of both a “5” and a “6.” The judge ultimately “guesses” which bot is telling the truth based on the pixels that have been revealed.

The image classification task, where most of the image is invisible to the judge, is a sort of stand-in for complex problems where it wouldn’t be possible for a human judge to analyze the entire dataset to judge bot performance. The judge would have to rely on the facets of the data highlighted by debating robots, the researchers say.
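The shape of this protocol can be sketched in a few lines of code. The following is a hypothetical, much-simplified toy: hand-made 5×3 digit templates and greedy one-step debaters stand in for the paper’s trained agents, and a simple template-matching scorer stands in for the learned judge. It illustrates only the structure of the debate, not the actual experiment:

```python
# Toy sketch of the pixel-reveal debate (hypothetical simplification):
# two debaters who can see the whole image take turns revealing pixels;
# the judge sees only the revealed pixels and picks a claimed digit.

ROWS, COLS = 5, 3

# Which pixels are "on" in each stylized digit (hand-made stand-ins).
TEMPLATES = {
    "5": {(0, 0), (0, 1), (0, 2), (1, 0), (2, 0), (2, 1), (2, 2),
          (3, 2), (4, 0), (4, 1), (4, 2)},
    "6": {(0, 1), (0, 2), (1, 0), (2, 0), (2, 1), (2, 2),
          (3, 0), (3, 2), (4, 0), (4, 1), (4, 2)},
}

ALL_PIXELS = [(r, c) for r in range(ROWS) for c in range(COLS)]

def debater_move(image_on, claim, opponent, revealed):
    """Greedily pick the unrevealed pixel whose true value best supports
    `claim` and undermines `opponent` -- debaters see the whole image."""
    def gain(px):
        on = px in image_on
        supports = (px in TEMPLATES[claim]) == on
        undermines = (px in TEMPLATES[opponent]) != on
        return supports + undermines
    return max((px for px in ALL_PIXELS if px not in revealed), key=gain)

def judge(revealed, claim_a, claim_b):
    """The judge sees only the revealed pixels and picks the claim whose
    template agrees with more of them (ties go to claim_a)."""
    def score(claim):
        return sum(1 if (px in TEMPLATES[claim]) == on else -1
                   for px, on in revealed.items())
    return claim_a if score(claim_a) >= score(claim_b) else claim_b

def debate(image_on, honest_claim, lie_claim, turns=3):
    """Alternate reveals for `turns` rounds, then ask the judge."""
    revealed = {}
    for _ in range(turns):
        for claim, opp in ((honest_claim, lie_claim), (lie_claim, honest_claim)):
            px = debater_move(image_on, claim, opp, revealed)
            revealed[px] = px in image_on  # revealed values are always truthful
    return judge(revealed, honest_claim, lie_claim)

# The hidden image really is a "5"; the liar argues it is a "6".
print(debate(TEMPLATES["5"], "5", "6"))  # the honest claim, "5", wins here
```

Even in this toy, the dynamic the researchers describe shows up: the liar can only point at pixels the two digits share, while the honest debater surfaces the distinguishing ones, so the judge gets the right answer despite seeing only a fraction of the image.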

“The goal here is to model situations where we have something that’s beyond human scale,” says Irving, a member of the AI safety team at OpenAI. “The best we can do there is replace something a human couldn’t possibly do with something a human can’t do because they’re not seeing an image.”

The researchers use image sampling, rather than natural language debates, in their early experiments because AI systems still struggle to carry on a spirited, coherent verbal argument. “Text-based debates would be possible today,” says Amodei. “But they won’t be very good.”

In the image test, a judge trained to identify numbers after seeing arbitrary sets of six pixels can figure out which bot is telling the truth 88.9% of the time after each bot has the chance to reveal three pixels of its choice. By comparison, it can guess accurately 59.4% of the time after seeing six random pixels from the images.

That is, even with the debaters able to see the whole image while the judge can only see disjointed fractions of it, the debate still results in correct answers, since it’s easier for the truth-telling bot to find evidence to support its position.

“Demagogue Algorithms”

To test their hypothesis—that two debaters can lead to honest behavior even if the debaters know much more than the judge—the researchers have also devised an interactive demonstration of their approach, played entirely by humans and now available online. In the game, two human players are shown an image of either a dog or a cat and argue before a judge as to which species is represented. The contestants are allowed to highlight rectangular sections of the image to make their arguments—pointing out, for instance, a dog’s ears or cat’s paws—but the judge can “see” only the shapes and positions of the rectangles, not the actual image. While the honest player is required to tell the truth about what animal is shown, he or she is allowed to tell other lies in the course of the debate. “It is an interesting question whether lies by the honest player are useful,” the researchers write.

OpenAI found that an adversarial game between two human debaters can produce honest behavior, even if they know much more than the judge [Image: OpenAI]

The judge is also able to pose questions, and each player is allowed to reveal one pixel of the image over the course of the entire debate.

“We’ve played this game informally at OpenAI, and the honest agent indeed tends to win, though to make it fair to the liar we usually limit the rate at which the judge can solicit information (it’s cognitively difficult to construct a detailed lie),” the researchers write.

One open question is whether training and deploying AI systems that can formulate debates as well as make decisions will take significantly more computing power than ones that just take action, something that will have to be determined through future experiments, they say. “We hope the performance penalty is small,” says Amodei.

The researchers emphasize that it’s still early days, and the debate-based method still requires plenty of testing before AI developers will know exactly when it’s an effective strategy or how best to implement it. For instance, they may find that it’s better to use a single judge or a panel of voting judges, or that some people are better equipped to judge certain debates.

It also remains to be seen whether humans will be accurate judges of sophisticated robots working on ever more complex problems. People might be biased to rule a certain way based on their own beliefs, and there could be problems that are hard to reduce enough to have a simple debate about, like the soundness of a mathematical proof, the researchers write.

Other less subtle errors may be easier to spot, like the sheep that Shane noticed had been erroneously labeled by Microsoft’s algorithms. “The agent would claim there’s sheep and point to the nonexistent sheep, and the human would say no,” Irving writes in an email to Fast Company.

But deceitful bots might also learn to appeal to human judges in sophisticated ways that don’t involve offering rigorous arguments, Shane suggested. “I wonder if we’d get kind of demagogue algorithms that would learn to exploit human emotions to argue their point,” she says.

Even before AI systems are advanced enough to hold sophisticated verbal debates, some of those questions can still be tested by having humans argue before a less knowledgeable judge. “We are optimistic that we can learn a great deal about these issues by conducting debates between humans, in domains where experts have much more time than the judge, have access to a large amount of external information, or have expertise that the judge lacks,” they write.

“I think it’s going to be a really tough thing to do,” Shane said about OpenAI’s debate method, “but I want to see them try it because I think it would be really interesting and probably really entertaining.”