Making Smart Machines Fair

“IS THIS AN UMBRELLA OR A STRAWBERRY?” Posed to the undergraduates filling the lecture hall in the computer science building, the question topping assistant professor Olga Russakovsky’s lecture slide is laughably simple — but the question isn’t for the humans in the room. The class is COS 429, “Computer Vision,” and the question is being asked of a computer.

For computers — and the scientists and engineers working to make them see the world — the task of telling fruit from rain gear is beyond merely difficult. It has taken decades for researchers to bring tasks that previously were the domain only of living brains — things like speech recognition, visual perception, and complex decision-making — within reach of computers. These and other applications of artificial intelligence (AI) promise to remake our world, from medicine and transportation to security and criminal justice.

Such rapid advances come thanks to machine learning, a wide variety of methods that computers use to “learn” and apply knowledge that isn’t directly programmed into them. Rather than giving the computer a system of explicit rules — umbrellas are this shape when open and that shape when closed; strawberries are these shades of red — machine-learning systems learn from “experiences,” like observing a collection of example images, discovering new rules, and making associations on the fly.

By figuring out the rules for themselves, learning machines can, on some tasks, make inferences and decisions more quickly and accurately than humans. When Sebastian Thrun, a researcher at Google X, wanted to teach computers to recognize melanoma (skin cancer), for example, he didn’t try to program in the rules about size, shape, and color that medical students learn. Instead, he gave the machine-learning system a “training set” of 130,000 images of skin lesions that had already been classified by dermatologists, and let the machine learn its own rules. When put to the test, Thrun’s program outperformed the doctors.

Machine learning even offers insights when nobody knows the rules: Princeton professor Olga Troyanskaya, at the Lewis-Sigler Institute for Integrative Genomics, recently used a machine-learning system to create a list of genes likely to be associated with autism. The system was given no rules for how autism genes should look or act — just a map of genetic interactions in the brain, and a list of genes that were already implicated in autism and other disorders. The list Troyanskaya’s machine generated gives autism researchers a new set of genes to study.

Machine learning already plays a large role in human society, from social media to fraud detection, and researchers expect its role to continue expanding in the next few years. But just as people make decisions based on misperceptions, prejudice, and faulty information, the vast quantities of data that “teach” machine-learning systems can be messy, full of gaps and human biases. Machines are no better than the data they learn from.

THIS IS WHERE RUSSAKOVSKY COMES IN. Russakovsky, who joined Princeton’s faculty last summer, is among the country’s leading young scholars of computer vision — the computer science field that deals with enabling computers to “see” and process images as human vision does. She was named to MIT Technology Review’s 2017 list of “35 Innovators Under 35” and is relentlessly optimistic about the ways AI will improve the human condition.

Russakovsky seeks to advance what she calls “humanistic AI, or AI for social good,” by designing these systems explicitly to avoid human mistakes and solve human problems. It’s a goal shaped in part by her experience as a woman in a profession dominated by men, which focused her awareness on how human blind spots and biases affect machine-learning systems.

She made her first splash in the field as a Ph.D. student at Stanford in the lab of Fei-Fei Li ’99, who leads the AI and machine-learning development projects at Google Cloud. There, Russakovsky was researching the problems of object detection and image classification: sorting through photographs for images of cars, cows, and, yes, strawberries. She pioneered an algorithm for computer-vision systems to separate the object of interest in an image from the background, much as a human might, making it easier to classify the relevant object in the foreground.

But machine learning isn’t just about the machines, says Russakovsky. The systems are inextricable from the humans who design, train, and use them. Rather than thinking just about the computerized eye, she sees her role as building “collaborative human-machine object-detection systems.”

Take the training sets that many AI systems depend on — full of example objects, faces, or human actions that the machine is trying to learn about. For computer vision, these sets are usually manually labeled, which means tedious work sorting through tens of thousands of images and checking each one for hundreds of different objects. For large-scale data sets, whose labeling is often done by overworked students, the process becomes time-consuming and expensive. Worse, since bored humans labeling images make mistakes, the labels are not always accurate.

[Photo: Stanford professor and Google Cloud chief scientist Fei-Fei Li ’99 gives a TED talk in March 2015. Li was a mentor to Olga Russakovsky and shares Russakovsky’s interest in encouraging young women to study computer science. Credit: Bret Hartman/TED]

Working with Li and then-Stanford postdoc Jia Deng *12, Russakovsky pioneered a way to ease the burden on human annotators by asking fewer and more general questions about the images. Instead of asking individually whether there were any chairs, desks, tables, ottomans, or hat racks, the system would ask if there was any furniture — and move on to other categories of objects if the answer was “no.” This made the process of labeling images much faster and possible to outsource, meaning computer systems could learn from more data. Using this strategy and other enhanced techniques, Russakovsky and her colleagues built a collection of millions of images of thousands of objects, which they called ImageNet. The ImageNet pictures are now a widely used standard in computer vision.
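The idea of asking a general question first and skipping a whole category on a “no” can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method Russakovsky and her colleagues actually used: the category tree, the object names, and the functions `label_flat` and `label_hierarchical` are all hypothetical, and the human annotator is simulated by a set of objects known to be in the image.

```python
# Hypothetical category hierarchy; the names are illustrative only.
HIERARCHY = {
    "furniture": ["chair", "desk", "table", "ottoman", "hat rack"],
    "animal": ["cow", "dog", "cat"],
    "vehicle": ["car", "bus"],
}


def label_flat(objects_in_image):
    """Ask one yes/no question for every object type."""
    labels, questions = set(), 0
    for children in HIERARCHY.values():
        for obj in children:
            questions += 1  # "Is there a <obj> in the image?"
            if obj in objects_in_image:
                labels.add(obj)
    return labels, questions


def label_hierarchical(objects_in_image):
    """Ask about the general category first; skip its members on a 'no'."""
    labels, questions = set(), 0
    for category, children in HIERARCHY.items():
        questions += 1  # "Is there any <category> in the image?"
        if not any(obj in objects_in_image for obj in children):
            continue  # answer was "no": move on to the next category
        for obj in children:
            questions += 1
            if obj in objects_in_image:
                labels.add(obj)
    return labels, questions


# Simulated annotator: the image contains a single cow.
image = {"cow"}
flat_labels, flat_questions = label_flat(image)
hier_labels, hier_questions = label_hierarchical(image)
print(flat_labels, flat_questions)
print(hier_labels, hier_questions)
```

In this toy case both approaches find the same label, but the hierarchical version asks fewer questions because two of the three categories are dismissed with a single “no” each. The saving grows as the hierarchy gets larger and as most categories are absent from a typical image.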

Other researchers, meanwhile, were uncovering how AI was affected by another human flaw: prejudice. In 2015, Google found itself at the center of a cultural maelstrom after its new photo application labeled a selfie taken by a black couple as “gorillas.” (The problem was corrected within hours.) At the time, just 2 percent of Google employees were African American, and some experts said having a more inclusive team likely would have ensured that the application was trained to recognize the full diversity of human faces. “Our own human bias informs what questions we ask about AI,” says Russakovsky.

In the spring of 2017, a team of Princeton researchers showed one way that bias in human data can directly create bias in machine-learning systems. In a paper published in Science, assistant professor Arvind Narayanan, visiting professor Joanna Bryson, and postdoc Aylin Caliskan showed that common machine-learning programs trained with ordinary human language can pick up the cultural biases in that language — including blatantly discriminatory views on race and gender. Ask the programs to fill in the blank “Man is to doctor as woman is to ... ?” and they respond, “nurse.”
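Fill-in-the-blank analogies like this one are commonly answered with simple arithmetic on word vectors, lists of numbers that a program learns for each word from large amounts of text. The sketch below shows the mechanics only. It assumes a word-vector model, and the two-dimensional vectors in `VECTORS` are hand-made for illustration, deliberately built to contain a gender skew; they are not learned from real text and are not taken from the Princeton study. The function names are hypothetical as well.

```python
import math

# Hand-made toy vectors, NOT learned from data.
# Dimension 0 loosely stands for "gender", dimension 1 for "medical".
VECTORS = {
    "man": (1.0, 0.0),
    "woman": (-1.0, 0.0),
    "doctor": (0.8, 1.0),
    "nurse": (-0.8, 1.0),
    "surgeon": (0.7, 1.0),
    "engineer": (0.9, -0.5),
}


def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def analogy(a, b, c):
    """Return the word d that best completes 'a is to b as c is to d'."""
    # Vector arithmetic: start at b, remove a, add c.
    target = tuple(
        vb - va + vc
        for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c])
    )
    # The three query words themselves are excluded from the answers.
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))


answer = analogy("man", "doctor", "woman")
print(answer)
```

With these toy vectors the answer is “nurse,” because “doctor” was placed closer to “man” and “nurse” closer to “woman.” A model trained on real text is not given such positions by hand; it arrives at them from the way the words are used, which is how a bias in the language can end up in the program.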

Other scholars have shown that a widely used image collection supported by Microsoft and Facebook linked pictures of activities such as shopping and washing to women and images of coaching and shooting to men. Machine-learning programs that were trained on those images, researchers found, learned those associations.
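A first check for this kind of skew is simply to count how often each activity label appears alongside each gender label. The sketch below is a minimal illustration, not the procedure used in that research: the `ANNOTATIONS` list is invented toy data, and the function name `share_by_activity` is hypothetical.

```python
from collections import Counter

# Invented toy annotations (activity, gender of the person pictured).
# These are NOT real figures from any image collection.
ANNOTATIONS = [
    ("shopping", "woman"), ("shopping", "woman"),
    ("shopping", "woman"), ("shopping", "man"),
    ("coaching", "man"), ("coaching", "man"),
    ("coaching", "man"), ("coaching", "woman"),
]


def share_by_activity(annotations, group):
    """For each activity, return the fraction of images showing `group`."""
    totals, hits = Counter(), Counter()
    for activity, gender in annotations:
        totals[activity] += 1
        if gender == group:
            hits[activity] += 1
    return {activity: hits[activity] / totals[activity] for activity in totals}


women_share = share_by_activity(ANNOTATIONS, "woman")
print(women_share)
```

A share far from one half for an activity suggests that a system trained on the images could learn to tie that activity to one gender. A count like this only describes the data; it does not by itself say how strongly a trained model will reproduce the pattern.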

Narayanan compares the way computer systems perpetuate stereotypes to the way the media do. Think about any piece of art or entertainment with a character who’s identifiably a woman, or black, or old. “The media tries to mirror the world to get a feeling of authenticity, but in mirroring the world it’s also perpetuating those stereotypes,” says Narayanan. It’s the same with computers — and, as researchers have found, stereotypes can lead to insidious errors.

As AI’s impact on people’s lives grows, such errors will become more relevant and concerning. In May, Facebook announced that it was developing software called “Fairness Flow” to look for bias in the AI systems that prioritize news stories and filter offensive content, recognizing that these algorithms affect how nearly 2 billion people see the world around them. And AI is having increasing influence on our offline world, as well. “There’s a bunch of interesting questions around uses of AI and machine learning in the criminal-justice system,” says Professor Ed Felten, director of Princeton’s Center for Information Technology Policy, who cites controversial risk-assessment algorithms used by some states in making sentencing and probation decisions.

While workforce diversity helps address the problem, Russakovsky and Narayanan recognize that it’s not sufficient. In February, they received a grant from the School of Engineering and Applied Science to develop best practices for finding and correcting bias in machine-vision systems, combining Russakovsky’s technical specialty with Narayanan’s experience in studying digital privacy and prejudice. A first step for the professors is to measure the cultural bias in the standard data sets that many researchers rely on to train their systems. From there, they will move to the question of how to build data sets and algorithms without that bias. “We can ask how to mitigate bias; we can ask how to have human oversight over these systems,” says Narayanan. “Does a visual corpus even represent the world? Can you create a more representative corpus?”

The project is in its infancy, but Russakovsky is confident that such bias can be measured and mitigated in machines, ultimately allowing the machines to be better, and fairer, than humans. “It’s very difficult to reveal bias in humans, [and] very difficult to convince humans that they are biased. Human bias is the result of years of cultural exposure and is extremely difficult to undo for all sorts of reasons,” she says.

RUSSAKOVSKY APPROACHES THE BIAS problem not only from a technical perspective, but also from a personal one. As a woman in a male-dominated field, Russakovsky has had firsthand experience with bias. She describes her first years as a Stanford graduate student — before joining Li’s lab — as alienating. The only female Ph.D. student in that first lab, she says she was doubted by colleagues and plagued by “imposter syndrome” — the false sense that others were qualified but she was not.

Four years into her Ph.D., on the brink of quitting computer science, she moved to Li’s AI lab and found a mentor and role model. Her confidence grew, and she began looking for ways to help herself and other women, helping to start a workshop for women working in computer vision. Near the end of her time at Stanford, looking for a way to pay forward the mentorship she had received, Russakovsky floated to Li the idea of creating a camp to teach girls — particularly lower-income and minority students — about designing AI systems for the social good.

Together, in 2015 the two women launched a two-week summer camp for rising high-school sophomores, coaching the girls to develop AI programs to solve human problems. They called the camp SAILORS: Stanford Artificial Intelligence Laboratory OutReach Summer, and in a survey afterward, about 90 percent of campers said they felt like they had a mentor or role model in computer science.

In 2017, the camp — now known as AI4ALL — expanded to the University of California, Berkeley. This summer, programs will take place at four more schools, including Princeton. The Stanford program is all girls, the Berkeley program is aimed at low-income students, and the Princeton program has encouraged applications from underrepresented communities: black, Latino, and Native American students. Princeton’s program will focus on the intersection between AI and policy, touching on issues directly relevant to the marginalized communities its students come from.

Felten, the co-director of the Princeton camp, hopes that solving problems like these will inspire the students to design AI systems that don’t perpetuate bias and that address social problems. “You’re looking at things that connect to their lives, and what’s happening in their communities,” he says. He hopes the camp will get students to think about the effects of AI on realms like education and economic equality: “That kind of synergy between the technical stuff ... and the policy stuff will be, I think, one of the signs that we’re succeeding,” he says.

Zoe Ashwood, a Princeton Ph.D. student who will be one of the program’s instructors, says she was inspired to sign on after visiting last year’s camp at Stanford and seeing the students’ work: One team had developed a tool using machine learning to turn tweets after a natural disaster into useful information for first responders; another designed a machine-vision system that could tell if doctors were sanitizing their hands before entering hospital rooms.

For Russakovsky, every success by AI4ALL’s alumni is proof that diverse teams can build more reliable and “more humane” AI based on less biased data, applied to a wider range of problems, and implemented in more useful ways. “Forget political correctness,” she says, “diversity brings better outcomes. When people believe this, we’ve won the battle.”