In the researchers' experiments, the three programs' error rates in determining the gender of light-skinned men were never worse than 0.8 percent.

The findings raise questions about how today's neural networks, which learn to perform computational tasks by looking for patterns in huge data sets, are trained and evaluated.

Chance discoveries

The three programs that Buolamwini and Gebru investigated were general-purpose facial-analysis systems, which could be used to match faces in different photos as well as to assess characteristics such as gender, age, and mood.

Several years ago, as a graduate student at the Media Lab, Buolamwini was working on a system she called Upbeat Walls, an interactive, multimedia art installation that allowed users to control colorful patterns projected on a reflective surface by moving their heads.

The team that Buolamwini assembled to work on the project was ethnically diverse, but the researchers found that, when it came time to present the device in public, they had to rely on one of the lighter-skinned team members to demonstrate it.

Quantitative standards

To begin investigating the programs' biases systematically, Buolamwini first assembled a set of images in which women and people with dark skin are much better-represented than they are in the data sets typically used to evaluate face-analysis systems.

Next, she worked with a dermatologic surgeon to code the images according to the Fitzpatrick scale of skin tones, a six-point scale, from light to dark, originally developed by dermatologists as a means of assessing risk of sunburn.
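Coding each face on the Fitzpatrick scale amounts to assigning it one of six types; work in this area then often collapses types I–III into a "lighter" group and IV–VI into a "darker" group for reporting. A minimal sketch of that binning, assuming that grouping (the function name and layout are illustrative, not the researchers' code):

```python
# Illustrative sketch: binning Fitzpatrick skin types (1 = lightest,
# 6 = darkest) into the coarse lighter/darker groups often used when
# reporting disaggregated results. Not the study's actual code.

def skin_group(fitzpatrick_type: int) -> str:
    """Map a Fitzpatrick type (1-6) to a coarse lighter/darker group."""
    if not 1 <= fitzpatrick_type <= 6:
        raise ValueError("Fitzpatrick types run from 1 (lightest) to 6 (darkest)")
    return "lighter" if fitzpatrick_type <= 3 else "darker"

print(skin_group(2))  # lighter
print(skin_group(5))  # darker
```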

Three commercially released facial-analysis programs from major technology companies demonstrate both skin-type and gender biases, according to a new paper that researchers from MIT and Stanford University will present later this month at the Conference on Fairness, Accountability, and Transparency.

All three systems treated gender classification as a binary decision — male or female — which made their performance on that task particularly easy to assess statistically.
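Because the classification is binary, a subgroup's error rate is just the fraction of that subgroup's faces labeled with the wrong gender. A minimal sketch of that computation (the records here are invented examples, not the study's data):

```python
# Sketch: with a binary gender label, a system's error rate for any
# subgroup is the fraction of misclassified faces in that subgroup.
# The records below are made-up examples, not the study's data.

def error_rate(records, group):
    """Fraction of faces in `group` whose predicted gender is wrong."""
    subset = [r for r in records if r["group"] == group]
    errors = sum(r["predicted"] != r["actual"] for r in subset)
    return errors / len(subset)

records = [
    {"group": "lighter-male",  "predicted": "male",   "actual": "male"},
    {"group": "lighter-male",  "predicted": "male",   "actual": "male"},
    {"group": "darker-female", "predicted": "male",   "actual": "female"},
    {"group": "darker-female", "predicted": "female", "actual": "female"},
]

print(error_rate(records, "lighter-male"))   # 0.0
print(error_rate(records, "darker-female"))  # 0.5
```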


One lesson of the findings "is that our benchmarks, the standards by which we measure success, themselves can give us a false sense of progress."

"This is an area where the data sets have a large influence on what happens to the model," says Ruchir Puri, chief architect of IBM's Watson artificial-intelligence system.

Facial Recognition Is Accurate, if You’re a White Guy

So she turned her attention to fighting the bias built into digital technology.

Now 28 and a doctoral student, after studying as a Rhodes scholar and a Fulbright fellow, she is an advocate in the new field of “algorithmic accountability,” which seeks to make automated decisions more transparent, explainable and fair.

Buolamwini studied the performance of three leading face-recognition systems — by Microsoft, IBM, and Megvii of China — by measuring how well they could classify the gender of people with different skin tones.

These companies were selected because they offered gender classification features in their facial analysis software — and their code was publicly available for testing.

AI facial analysis demonstrates both racial and gender bias

To test these systems, MIT researcher Joy Buolamwini collected over 1,200 images containing a greater proportion of women and people of color, and coded skin color based on the Fitzpatrick scale of skin tones, in consultation with a dermatologic surgeon.

Even with that knowledge, these figures are staggering, and it's important that companies that work on this kind of software account for the breadth of diversity in their user base, rather than limiting themselves to the white men who often dominate their workforces.

New research out of MIT’s Media Lab is underscoring what other experts have reported or at least suspected before: facial recognition technology is subject to biases based on the data sets provided and the conditions in which algorithms are created.

Puri wrote that IBM conducted its own facial-recognition study using the faces of parliamentarians, and, while acknowledging that IBM's data set and methodology were slightly different, said that "the error rates of IBM's upcoming visual recognition service are significantly lower than those of the three systems presented in the paper." Still, it's hardly the first time that facial recognition technology has been shown to be inaccurate.

Two years ago, The Atlantic reported on how facial recognition technology used for law-enforcement purposes may "disproportionately implicate African Americans." That is one of the larger concerns around this still-emerging technology: innocent people could become suspects in crimes because of inaccurate tech. Buolamwini and Gebru cover it in their paper as well, citing a year-long investigation across 100 police departments that revealed "African-American individuals are more likely to be stopped by law enforcement and be subjected to face recognition searches than individuals of other ethnicities." And, as The Atlantic story points out, other groups have found in the past that facial recognition algorithms developed in Asia were more likely to accurately identify Asian people than white people.

Photo Algorithms ID White Men Fine — Black Women, Not So Much

Facial recognition is becoming more pervasive in consumer products and law enforcement, backed by increasingly powerful machine-learning technology.

The companies’ algorithms proved near perfect at identifying the gender of men with lighter skin, but frequently erred when analyzing images of women with dark skin.

Google’s photo-organizing service still censors the search terms “gorilla” and “monkey” after an incident nearly three years ago in which algorithms tagged black people as gorillas, for example.

The question of how to ensure that machine-learning systems deployed in consumer products, commercial systems, and government programs are fair has become a major topic of discussion in the field of AI.

A 2016 report from Georgetown described wide, largely unregulated deployment of facial recognition by the FBI, as well as by local and state police forces, and evidence that the systems in use were less accurate for African-Americans.

An IBM white paper says tests using that new dataset found the improved gender-detection service has an error rate of 3.5 percent on darker female faces.

Microsoft, IBM, Google, and Amazon pitch cloud services for tasks like parsing the meaning of images or text as a way for industries such as sports, healthcare, and manufacturing to tap artificial intelligence capabilities previously limited to tech companies.

At one point the glasses say “I think it’s a man jumping in the air doing a trick on a skateboard” when a young white man zips past.

Technical documentation for Microsoft's service says that gender detection, along with the other attributes it reports for faces, such as emotion and age, is "still experimental and may not be very accurate." DJ Patil, chief data scientist for the United States under President Obama, says the study's findings highlight the need for tech companies to ensure their machine-learning systems work equally well for all types of people.

"We need that transparency of this is where it works, this is where it doesn't." Buolamwini and Gebru's paper argues that only by disclosing a suite of accuracy numbers for different groups of people can companies truly give users a sense of the capabilities of image-processing software used to scrutinize people.
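The kind of disaggregated disclosure the paper calls for can be sketched as reporting accuracy per intersectional subgroup rather than one aggregate number. A minimal illustration, with invented data (not the study's results):

```python
# Sketch of a disaggregated accuracy report: accuracy broken out per
# intersectional subgroup (gender x skin group) instead of a single
# overall score. The records are invented for illustration.

from collections import defaultdict

def disaggregated_accuracy(records):
    """Return {(gender, skin): accuracy} instead of one overall score."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        key = (r["gender"], r["skin"])
        totals[key] += 1
        hits[key] += r["predicted"] == r["gender"]
    return {key: hits[key] / totals[key] for key in totals}

records = [
    {"gender": "male",   "skin": "lighter", "predicted": "male"},
    {"gender": "female", "skin": "darker",  "predicted": "male"},
    {"gender": "female", "skin": "darker",  "predicted": "female"},
]
for group, acc in disaggregated_accuracy(records).items():
    print(group, round(acc, 2))
```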

Gender Shades

The Gender Shades Project pilots an intersectional approach to inclusive product testing for AI. Gender Shades is a preliminary excavation of inadvertent ...

Fairness in Machine Learning

Machine learning is increasingly being adopted by various domains: governments, credit, recruiting, advertising, and many others. Fairness and equality are ...