Race and Ethnicity in Data Science - The Data Science Diversity Gap

How diverse will a lucrative, growing field like data science be in the future?

Will it end up like computer science today (not very diverse) or computer science a few decades ago (much more so)? One way to prognosticate the future demographic composition of data science is to look at who is studying data science and its prerequisite skills today. For data science, the results are not encouraging.

A recent article in Forbes notes, "Women hold only about 26% of data jobs in the United States. There are a few proposed reasons for the gender gap: a lack of STEM education for women early on in life, lack of mentorship for women in data science, and human resources rules and regulations not catching up to gender balance policies, to name a few."

Moreover, federal civil rights data further demonstrate that "black and Latino high school students are being shortchanged in their access to high-level math and science courses that could prepare them for college" and for careers in fields like data science.

Just how diverse is data science? More specifically, if we look at the study of data science as a predictor of future participation in the field, what is the gender and demographic breakdown of its students compared to other fields?

We analyzed data from Priceonomics customer General Assembly, an education company that trains students in data science and other technical fields. We focused on their part-time programs, which typically reach students who already have jobs and are looking to expand their skill set as they pursue a promotion or a career shift. Here's what we found:

While great gender parity strides have been made in fields like web development and user experience (UX) design, data science - a relatively newer concentration - still has a ways to go.

Of all the technical education fields we studied, data science had the lowest representation of female students, at just 35.3%.

Additionally, among these same technical fields, data science had the lowest percentage of African American and Latino/Hispanic students enrolled.

Gender and Data Science

For our analysis, we went through five months' worth (September 2016 through January 2017) of anonymized enrollment data for part-time General Assembly students (those enrolled in 10- to 12-week evening courses). We chose to focus on part-time data (rather than the full-time program) because the sample size was larger, though the full-time results would be similar.

First, let's take a look specifically at the gender breakdown of students in these courses.

Some courses, like Product Management and Data Analytics, come close to gender parity. Front-End Web Development falls right around the average across all courses, and in Digital Marketing and User Experience Design, both more consumer-facing fields, two-thirds or more of the students are women.

But the Data Science course shows the largest composition of male students - and the lowest of female students, at just 35.3%.

Race and Ethnicity in Data Science

Turning to the same anonymized data set, let's now look at race and ethnicity.

Across all courses, 85.4% of these part-time students have a bachelor's degree or higher; in Data Science, that figure is 93.8%. This seems to be driven largely by the fact that there are far more master's and Ph.D. graduates in Data Science (37.7%) than the overall average (24.%). A surprisingly high 3.7% of Data Science students hold a Ph.D., more than triple the average of 1.2%.
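For readers who want to check the comparisons above, here is a minimal Python sketch that recomputes them from the percentages quoted in the paragraph. The variable and function names are illustrative assumptions, not part of the original analysis, and the underlying enrollment data is not reproduced here.

```python
# Percentages quoted in the paragraph above (share of part-time students).
# The dictionary and function names are hypothetical, chosen for illustration.
bachelors_or_higher = {"all_courses": 85.4, "data_science": 93.8}
phd = {"all_courses": 1.2, "data_science": 3.7}


def point_gap(shares):
    """Percentage-point gap between Data Science and the all-course average."""
    return round(shares["data_science"] - shares["all_courses"], 1)


def ratio(shares):
    """How many times larger the Data Science share is than the average."""
    return round(shares["data_science"] / shares["all_courses"], 2)


degree_gap = point_gap(bachelors_or_higher)
phd_ratio = ratio(phd)

print(f"Bachelor's or higher: {degree_gap} percentage points above average")
print(f"Ph.D. share: {phd_ratio}x the average")
```

The Ph.D. ratio comes out just above 3, which is consistent with the "more than triple" description.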

Data Science seems to draw from a smaller, more specialized pool, which could, in part, perpetuate diversity issues.

Data Science Is Still New

Female and minority students have made positive strides in coding and tech education in this data set.

When coding and web development started getting increasingly popular two decades ago, the fields were almost entirely dominated by men - most of whom were white.

Looking at the data here, though, it's clear things have changed dramatically: Front-End Web Development courses are now 57% female and boast the highest percentage of students of color of any course. Since data science is still a relatively new field, things may simply take some time to equalize, but it's entirely possible they won't unless the issue is addressed directly.