Diversity Crisis in AI, 2017 edition

Written: 16 Aug 2017 by Rachel Thomas

Deep learning has great potential, but currently the people using this technology are overwhelmingly white and male. We’re already seeing society’s racial and gender biases being encoded into software that uses AI when built by such a homogeneous group. Additionally, people can’t address problems that they’re not aware of, and with more diverse practitioners, a wider variety of important societal problems will be tackled.

We want to get deep learning into the hands of as many people as possible, from as many diverse backgrounds as possible. People with different backgrounds have different problems they’re interested in solving. The traditional approach is to start with an AI expert and then give them a problem to work on; at fast.ai we want people who are knowledgeable and passionate about the problems they are working on, and we’ll teach them the deep learning needed to address them.

Deep Learning can be misused

Deep learning isn’t “more biased” than simpler models such as regression; however, the amazing effectiveness of deep learning suggests that it will be used in far more applications. As a society, we risk encoding our existing gender and racial biases into algorithms that determine medical care, employment decisions, criminal justice decisions, and more. This is already happening with simple models, but the widespread adoption of deep learning will rapidly accelerate this trend. The next 5 to 10 years are a particularly crucial time. We must get more women and people of color building this technology in order to recognize, prevent, or address these biases.

Earlier this year, Taser (now rebranded Axon), the maker of electronic stun guns, acquired two AI companies. Taser/Axon owns 80% of the police body camera market in the US, keeps the footage from police body cams in private databases, and is now advertising that it is developing technology for “predictive policing”. As a private company, it is not subject to the same public records laws or oversight that police departments are. Given that racial bias in policing has been well-documented and shown to create negative feedback loops, this is terrifying. What kind of biases may be in their datasets or algorithms?

Google’s popular Word2Vec language library (covered in Lesson 5 of our course and in a workshop I gave this summer) has learned meaningful analogies, such as man is to king as woman is to queen. However, it also creates sexist analogies such as man is to computer programmer as woman is to homemaker. This is concerning as Word2Vec has become a commonly used building block in a wide variety of applications. This is not the first (or even second) time Google’s use of deep learning has shown troubling biases. In 2015, Google Photos labeled Black people as “gorillas” while automatically labeling photos. Google Translate continues to provide sexist translations such as translating “O bir doktor. O bir hemşire” to “He is a doctor. She is a nurse” even though the original Turkish did not specify gender.
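To make the analogy mechanism concrete, here is a minimal sketch of how such analogies are typically computed: take the vector for one word, subtract another, add a third, and pick the nearest remaining word by cosine similarity. Everything in it is an assumption for illustration. The 3-dimensional vectors are invented by hand (real Word2Vec embeddings have hundreds of dimensions learned from text), and the `analogy` helper is hypothetical, not part of any library. The toy vectors deliberately give the occupation words a gender component, to show how a bias present in the embedding comes straight back out of the arithmetic.

```python
import math

# Hand-made toy vectors, NOT real Word2Vec output.
# Axes, loosely: [royalty, gender, occupation].
# The occupation words are given a gender component on purpose,
# mimicking a bias that an embedding could absorb from its training text.
TOY_VECTORS = {
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, -1.0, 0.0],
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, -1.0, 0.0],
    "computer_programmer": [0.0, 0.6, 1.0],
    "homemaker": [0.0, -0.6, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors=TOY_VECTORS):
    """Answer 'a is to b as c is to ?' via the vector b - a + c.

    Hypothetical helper: returns the word (other than a, b, c) whose
    vector is closest to the target by cosine similarity.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("man", "king", "woman"))                 # queen
print(analogy("man", "computer_programmer", "woman"))  # homemaker
```

The same arithmetic that produces the useful answer produces the sexist one: the method has no notion of which associations are meaningful and which are stereotypes, so whatever is in the vectors is what any downstream application inherits.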

The state of diversity in AI

A year after prominent Google AI leader Jeff Dean said he is deeply worried about the lack of diversity in AI, guess what the diversity stats of the Google Brain team are? The team is ~94% male (44 men and just 3 women) and over 70% White. OpenAI’s openness does not extend to sharing diversity stats or who works there, and from photos, the OpenAI team looks extremely homogeneous. I’d guess that it’s even less diverse than Google Brain. Earlier this year Vanity Fair ran an article about AI that featured 60 men, without quoting a single woman who works in AI.

Google Brain, OpenAI, and the media can’t solely blame the pipeline for this lack of diversity, given that there are over 1,000 women active in machine learning. Furthermore, Google has a training program to bring engineers in other areas up to speed on AI, which could be a great way to increase diversity. However, this program is only available to Google engineers, and just 3% of Google’s technical employees are Black or Latino (despite the fact that 90,000 Black and Latino students have graduated with computer science majors in the US in the last decade); thus, this training program is not going to have much impact on diversity.