Women are underrepresented in tech. This realization is nothing new. Just look at numbers released by Facebook, Google, Intel, Slack, and many, many more. But the numbers might be even worse than these reports imply.

At a recent tech event, I overheard a side conversation about the lack of gender diversity in tech. The small group was discussing the fact that even though women make up about 30% of the workforce in tech, higher level engineering teams rarely have more than a few women.

One of the participants in this conversation commented that this was because male developers are just generally more talented than female developers. No one in the group objected.

Hmm…

From personal experience at Toptal and my university experience in engineering at Princeton, which was nearly 50/50 male vs. female, I know this is false. I’ve worked with a number of incredible, profoundly smart female engineers in all kinds of roles. Yet the numbers don’t seem to match my own experience, especially when you start looking at more senior engineering roles.

And addressing this disparity is important. It’s not just diversity for the sake of diversity. If men and women are equally intelligent, statistically speaking, then out of the smartest ten people in the world, five should be male and five should be female. Thus, if your team is anything less than an equal balance of men and women, then your team is probably not the best it can be.

In a perfect system, diversity is a probabilistic result. But these aren’t the results we’re seeing.

After overhearing this conversation, I wanted to take a look at numbers to better understand whether, and where, software team building tendencies were going wrong. I searched Google for trends in the gender breakdown across skill levels in software engineering, but I wasn’t able to find much, so I decided to look at the publicly available data on GitHub. I scraped 5,000 profiles to get names, number of followers, number of contributions, and number of repositories. I then used the open source package genderize.io to figure out the gender of each profile.

There were so few women in this first batch that I had to add more data to make even simple graphs significant, so I scraped 15,000 more.
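As a rough illustration of the classification step, here is a minimal Python sketch. It assumes the public genderize.io HTTP endpoint and its JSON fields (`gender`, `probability`), and it assumes a confidence cutoff of 0.9. The article does not state the cutoff or the exact tooling that was used, so the function names and the threshold are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: the article does not state its confidence cutoff; 0.9 is illustrative.
MIN_PROBABILITY = 0.9


def first_name(display_name: str) -> Optional[str]:
    """Return the lowercased first token of a profile name, or None if it is empty."""
    parts = display_name.strip().split()
    return parts[0].lower() if parts else None


def fetch_prediction(name: str) -> dict:
    """Query the public genderize.io endpoint for one first name (network call)."""
    url = "https://api.genderize.io/?" + urllib.parse.urlencode({"name": name})
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def classify(prediction: dict, min_probability: float = MIN_PROBABILITY) -> Optional[str]:
    """Return "male" or "female" only when the prediction clears the cutoff."""
    gender = prediction.get("gender")
    probability = prediction.get("probability") or 0.0
    if gender in ("male", "female") and probability >= min_probability:
        return gender
    return None  # treated as "could not be confidently determined"
```

In this sketch, a profile whose name yields `None` from `classify` would land in the unclassified group rather than being counted as male or female.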

Open Source Is Dominated by Men

Even before getting into any further analysis, it was obvious that the percentage of women was extremely low. Of the 20,000 profiles, genderize.io was able to confidently determine the gender of 15,374. Of those, just 6.0% (926) were women. The disparity gets more severe once you start taking a look at user activity.
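The percentages above follow directly from the counts in this section. A small Python check, using only the numbers quoted here (the variable and function names are mine):

```python
# Counts quoted in the article.
TOTAL_PROFILES = 20_000
CLASSIFIED = 15_374
WOMEN = 926


def percent(part: int, whole: int) -> float:
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


female_share = percent(WOMEN, CLASSIFIED)
unclassified_share = percent(TOTAL_PROFILES - CLASSIFIED, TOTAL_PROFILES)

print(f"Women among classified profiles: {female_share:.1f}%")   # 6.0%
print(f"Profiles left unclassified: {unclassified_share:.1f}%")  # 23.1%
```

The second figure is the share of names that could not be classified with confidence, just under a quarter of the sample.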

Let’s take 10 contributions as the cutoff for the difference between a user who has just created a profile and maybe experimented a bit and one who has at least delved into an open source project or started their own. The result: 5.4% women.

Just 5.4% of GitHub users with over 10 contributions from our random sample are female.

In fact, if we divide users into buckets according to their number of contributions (with a minimum of 1,000 users in each bucket), the percentage of female users tends to decrease as contributions go up.

Not only are there far fewer females on GitHub than tech industry gender diversity numbers might suggest, but it looks like the percentage of females decreases as user activity increases.
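The article does not describe exactly how the buckets were formed, so the following Python sketch is one plausible approach rather than the method actually used: sort users by contribution count, close a bucket as soon as it reaches the minimum size, and fold any leftover users into the last bucket. The `User` tuple layout and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed record layout: (contributions, gender), with gender "male" or "female".
User = Tuple[int, str]


def bucket_by_contributions(users: List[User], min_size: int) -> List[List[User]]:
    """Split users, ordered by contributions, into buckets of at least min_size."""
    ordered = sorted(users, key=lambda user: user[0])
    buckets: List[List[User]] = []
    current: List[User] = []
    for user in ordered:
        current.append(user)
        if len(current) >= min_size:
            buckets.append(current)
            current = []
    if current:
        # Leftover users are too few for their own bucket: merge into the last one.
        if buckets:
            buckets[-1].extend(current)
        else:
            buckets.append(current)
    return buckets


def female_share(bucket: List[User]) -> float:
    """Percentage of users in the bucket classified as female."""
    if not bucket:
        return 0.0
    women = sum(1 for _, gender in bucket if gender == "female")
    return 100.0 * women / len(bucket)


def summarize(users: List[User], min_size: int) -> List[Tuple[int, int, int, float]]:
    """Return (min contributions, max contributions, size, % female) per bucket."""
    return [
        (bucket[0][0], bucket[-1][0], len(bucket), female_share(bucket))
        for bucket in bucket_by_contributions(users, min_size)
    ]
```

With the article's data, `min_size` would be 1,000. The same function could be reused for followers or repositories by swapping the first tuple field.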

I kept digging, looking at gender across number of followers and number of repositories, and observed the same trend. This was especially clear when looking at the number of repositories:

Again, we see that the percentage of females decreases as we move to buckets with more repositories.

So what’s going on here? Is GitHub activity a reasonable indicator of programming expertise in the first place? (I think it is.) Are talented female engineers less likely to actively contribute to open source than their male counterparts? Are these results another indicator of the tech industry’s entry/retention problems when it comes to female engineers?

Why Are the Numbers in the Open Source Community So Low?

Numbers for women in the tech industry are already pretty bleak, but they’re even worse in open source projects.

A lot of previous research has focused on the reasons why women are not willing to embark on STEM-related subjects and careers. Some studies point to a general lack of interest in STEM subjects. Others believe women decide against pursuing STEM careers after being stereotyped by family and teachers. Still others cite a lack of role models or a combination of multiple causes.

According to a study on gender in StackOverflow, “The issue of gender and STEM-related subjects has been studied for several years, and mostly from the point of view of ‘why’ women do not engage with scientific studies or careers. Lesser attention has so far been given to quantify the phenomenon and representation of women in online communities (as technology-‘users’), what are their levels of participation, and whether differences can be detected at the gender level. Only anecdotal evidence has been gathered on how specific communities actively discourage women from participating.”

But when we spend so much time focusing on why there are fewer women pursuing STEM-related subjects, we lose focus on another important disparity: if 28% of CS master's degrees go to women, why are the numbers in the open source community so much lower?

There are a few possibilities to consider when thinking about an answer to this question:

1. Maybe there isn’t a strong correlation between programming talent and GitHub activity.

In the tech industry, many developers go to GitHub early in their careers as it’s a prerequisite to be taken seriously. However, it seems that fewer aspiring female developers view open source this way. Is it possible that this data is all coincidental and does not mean much in relation to the number of talented female software engineers in the tech industry?

Bozhidar and Anna-Chiara, with whom I discussed these results, both agreed that while being active on GitHub is typically a good indicator of engineering expertise, the reverse isn’t true, mentioning that they know plenty of great engineers who aren’t involved in open source at all. The tech industry agrees too, with many companies assessing GitHub profiles during hiring processes (although this practice seems to be quite biased, which isn’t really a surprise given the results of my study).

GitHub activity is generally a good indicator of engineering expertise, but the reverse isn’t true… Plenty of great engineers aren’t on GitHub.

Bozhidar suggested that open source contributors are often more likely to be the type of people who push for big internal changes in a company setting. Anna-Chiara commented that it takes a great deal of confidence to contribute to open source, something that she thought may be more difficult for female developers to overcome, given the tech industry’s poor history with welcoming women.

There are certainly several biases that could potentially be at play with this GitHub data (including the fact that almost 25% of the names couldn’t be classified as male/female with confidence).

However, Bozhidar, Anna-Chiara, and I agreed that GitHub activity level is generally a good indicator of programming expertise. Yet this data suggests a trend of talented female programmers choosing to discontinue (or never start) their open source pursuits in favor of other options.

2. Numbers cited in tech company reports include non-tech roles.

Many companies in the tech industry cite that they employ between 25 and 30 percent women. This number, however, can be misleading. Most of these larger numbers - yes, they are the larger ones - include both technical and non-technical roles.

As you begin to examine the percentage of female employees in technical roles, the numbers drop even lower.

At Facebook, 32 percent of employees are female, but only 16 percent of technical roles belong to women. At Google, there’s a similar drop, from 30 percent female employees in the company as a whole to 18 percent in technical roles. Slack drops from 39 percent female overall to 18 percent in engineering roles. Of the companies I’ve examined, Intel has the smallest drop, going from 24.1 percent female overall to 19.4 percent in technical roles.

So even though many companies boast a percentage of female employees that is about a quarter or even a third of the company, the number of women in technical roles is actually much lower. It seems that claims of 15 to 20 percent would be more accurate.
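To make the comparison concrete, this short Python sketch computes the drop in percentage points for each company, using only the figures quoted above (the dictionary and function names are mine):

```python
# (overall % female, technical % female) as quoted in the article.
REPORTED = {
    "Facebook": (32.0, 16.0),
    "Google": (30.0, 18.0),
    "Slack": (39.0, 18.0),
    "Intel": (24.1, 19.4),
}


def drop_in_points(overall: float, technical: float) -> float:
    """Difference between overall and technical share, in percentage points."""
    return overall - technical


drops = {company: drop_in_points(*shares) for company, shares in REPORTED.items()}
smallest = min(drops, key=drops.get)
largest = max(drops, key=drops.get)

for company, drop in sorted(drops.items(), key=lambda item: item[1]):
    print(f"{company}: {drop:.1f} points")
print(f"Smallest drop: {smallest}; largest drop: {largest}")
```

All four technical shares fall between 16 and 19.4 percent, which is where the 15 to 20 percent estimate comes from.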

But that still leaves a huge disparity between the percentage of women involved in technical or engineering roles at tech companies and the percentage of women who contribute to open source projects on GitHub.

3. Female programmers are leaving the tech industry.

If activity on GitHub correlates with seniority and expertise, then the extremely low number of active female contributors (low even compared to female contributors overall) could be explained by the alarmingly high departure rate of female engineers from the tech industry.

Among women who join the tech industry, 56 percent leave by mid-career, which is double the attrition rate for men.

If the tech industry can’t retain as many women past their mid-career mark, then it’s likely that they won’t be contributing to many open source projects either.

But this line of reasoning also raises the question: Is the correlation between seniority and contribution actually true? Many frequent OSS contributors are relatively new programmers who are trying to establish a name for themselves - so where are the women from that group?

4. GitHub can be an unwelcoming community for female programmers.

Commenting on an article about women in tech, one female developer says, “In regards to the open source projects - I’ve been thinking about this recently. I actually haven’t committed to any and it definitely puts a kink in my career… I feel like it’s a circle I can’t get into. But mostly I fear the excessive spotlight of being a sole female programmer on a publicly available project. In light of how women are treated on the internet, this fear does not seem unreasonable.”

Anna-Chiara believes this kind of apprehension is a common theme amongst female engineers, especially when it comes to OSS. When I asked her if she thought women were less likely to contribute to open source projects, she responded, without hesitation, yes.

Anna-Chiara also brought up the possibility that female GitHub users might try to adopt a gender-neutral or male name to ensure they would be taken seriously (remember that genderize.io was not able to confidently determine the gender of about a quarter of the profiles scraped).

That does not mean, however, that female contributors are not out there. Bozhidar brings up Exercism.io, a popular project started by Katrina Owen that has several female contributors. He also mentions Bodil Stokke, a female developer from Norway with an extremely extensive history of popular open source contributions.

Anna-Chiara also suggests that if a project had women among the top contributors or leaders, female developers might be more likely to contribute to it. Unfortunately, compared to the number of male-dominated projects out there, female-led OSS projects are hard to find.

But the issue is larger than just OSS. “If I think of the women I know in development, it’s nowhere close to the 20% that you hear about at these big companies. I don’t think it’s even anywhere close to 10%,” Anna-Chiara tells me. “The result of this analysis of GitHub doesn’t surprise me.”

5. Implicit biases that shape the tech industry might be trickling into GitHub.

Eric Ries points out problems of implicit biases in the tech industry. Even if individual people within systems are not biased, it is still extremely easy for those systems to become biased. People also have unconscious biases, which complicates the issue even further.

In his article, Eric uses the example of orchestras, which were primarily all-male until the 1970s. People believed that male performers had a greater aptitude for music than female performers. However, once orchestras started separating musicians from judges with a physical screen during auditions, the numbers shifted significantly, and people began to accept that men and women played equally well on average.

If similar biases come into play with hiring systems in the tech industry, it could help explain the smaller percentage of female software engineers that I discussed earlier. And if fewer female software engineers are being hired, those effects could trickle into open source communities like GitHub. If someone is rejected for full-time programming roles, they might come to believe that they are not as talented, and would therefore be less likely to have the confidence to contribute to open source projects.

Where does this leave us?

Here are some follow-up questions that come to mind for me (and there are plenty more):

1. How are these numbers changing over time?

Getting more women involved in the tech industry is a highly-discussed topic right now, and the rise of coding bootcamps that require contributions should have a positive impact, including when it comes to open source. How effective are those discussions and the various new initiatives? What did these numbers look like 3 years ago? 5 years ago? What will they look like in a year?

2. How else can we analyze GitHub data?

Anna-Chiara suggested examining the gender breakdown of users based on the number of forks they have to get an idea of how frequently female GitHub users are experimenting with a project in some way. Additionally, there are other factors at play, such as age group, that might affect our findings. Open source has been a staple of the tech industry for a long time, but GitHub was only founded in 2008.

3. Is there a good way to look at which GitHub users are employing a fake name?

If the percentage of women that use a fake name is much higher than the percentage of women on GitHub overall, that would make a very strong statement about how welcoming GitHub (and tech in general, to a certain extent) is as a community.

4. How do these numbers change when you start looking at location?

This is imperfect, as interaction on GitHub is theoretically location-agnostic. But can we learn anything from the tech communities in countries that have a proportion of female GitHub users that is higher than average?

And here are some ideas for improving these numbers (again, there are of course plenty more):

1. Can the pages of popular GitHub repositories be improved?

When I discussed this topic with Bozhidar, he mentioned that most projects/communities on GitHub have leaders who are extremely patient, welcoming, and happy to guide new open source contributors through the early stages of the project. This does not seem to be common knowledge at all (remember the aforementioned comment from a female developer who felt that open source communities were “a circle [she couldn’t] get into”).

Are new GitHub users aware that this type of mentorship and support exists (assuming that it’s as prevalent as he says), and would a new user know how to easily find such guidance? Could improvements be made to the interfaces of popular GitHub repositories to make this more obvious and make them more welcoming? For example, if popular repository pages included something like an official “Repository Mentor” role, maybe it would be much clearer that a welcoming, experienced user was available to answer any questions.

There are plenty of posts out there that teach you how to use GitHub by walking you through pulls/pushes, commits, branching, and more, but I find next to nothing in terms of guidelines for interacting within the GitHub community (if you know of any, please post relevant links in the comments).

A how-to guide for navigating GitHub community etiquette and best practices according to your skill level might help to break down the intimidation and spotlight elements of contributing to open source. This is definitely something that could encourage more aspiring new developers to get involved. Stay tuned for a guide like this from Toptal.

3. More mentorship could make an enormous difference.

Bozhidar commented on the importance of developers involved in the project who were willing to help newcomers get started with basic tasks, while Anna-Chiara discussed how it could be quite intimidating to jump into a project and open your work up to criticism. It seems that there is a great deal that could be done to make open source communities more welcoming for everyone, including women. Stay tuned for an initiative from Toptal here as well!

Are you surprised by the results from GitHub? What do you think they mean?

Comments

WTF, really WTF
Anyone care if a line of code was written for women or men, if we have less women in our area is because they prefer to work in other specialties instead of coding, just that.

Stefany Dyulgerova

Well, I believe in the end, it all boils down to skills. And 99% of the people doesn't care whether the programmer is male or female. The 1% are sexist bad people.

Stefany Dyulgerova

Yes, noooooooooooo one cares, as long as the line of code works!

wilder

Fully agree with you! Everyone has the opportunity to decide for themselves. We dont want quotas :)

OlegM

Why would you care if your code was written by a male/female/other as long as it is good code? If I had a company, I would hire the most motivated and best talent I could get, regardless of gender. Discrimination either way is a terrible idea when your goal should be a good product.

mara99.com

I consider myself lucky to be a woman in front-end (web) development because from my experience discrimination based on the gender is very low or non-existent. Coders and designers are more focused on the quality of the work and we actually never discuss gender difference because they are irrelevant.
That said I think web development is the most progressive branch and I'm happy to see many female designers and coders at the meet-ups.

Irina Gudkova

I'm not happy about GitHub profile stats being a criteria of competence. During my over-10-years software engineer career 100% of projects I was hired to work on were private commercial (this is true for Toptal jobs as well, right?), and 100% of projects I've been working on my own are private commercial too. And I wouldn't say they are less exciting or less technically demanding than open source, they are just business, i.e. targeting a customer but not a fellow developer.

mars

After 50 years of coding, who cares if a person or a machine wrote the code. Does it work and efficiently so? On rare occasion with a remote online project, a handle like TuringMachine does great work and then surprises me at a conference. TuringMachine is a lady, and nothing said in months of emails and chats indicated so. (Skype?) Other times, GavinS or David D turns out to be a seventeen year old kid. Write good code.

Maëlys

Of course open-source isn't open to women. Have you seen the shitstorm that erupted on GitHub after they proposed a code of conduct that would address the extra obstacles women face to make it into this community and make it clear they're welcome? They had to pull it back.
https://github.com/todogroup/opencodeofconduct/issues/84
Or look to general communities like r/linux or r/programming whenever the topic of the inclusion of women comes up. It does anything but feel like women are welcome: they deny or defend well-documented obstacles that women face (which targets them as a group and not men.) The resulting numbers are pretty black and white, and yet it's just denials and the meritocracy myth.
How many people would willingly want to participate in this kind of environment? A lot do - some of the best coders I know are women. But there'd be many more if this wasn't so openly a boy's club.

Margaret Henderson

I think there is another factor for women in IT and in GitHub. I think as women get mid-career, they are more likely to have families. Yes- men are too! But, generally speaking, women are still the primary care givers and men are still the primary income providers. IT demands 50+ hour work weeks. The better you are, the more the company wants of your time. Women have to choose between career goals and family. If we want more women in IT, we need more life/work balance in IT.

Trisha Kunst Martinez

If most of the code is written by men, it likely targets men as customers, based on experience. Problem solvers often find the best value when trying to solve a problem for themselves and then scale it. Add this to the possible questions: are the primary customers for an area in tech primarily men or women? (E.g. front end web, as commented by mara99). If yes, do the coders in this area of tech look like the customers? Which is causal and which is simply correlated?
Better question - what are women's main unsolved problems? The answer might point to an area ripe for disruption. IMHO only - these are follow on research questions for study, not answers!

Filip Dupanović

I'd love to see more women on GitHub, but in the end its platform is curtailed to assist only a subset of the skills present in a community and it encompasses only a fraction of all the work invested into open-source projects. When you immerse yourself into the actual community behind a project, you will see that the rate of women involved begins to skyrocket.
It is an important indicator definitely worth the attention, but a perspective that's solely based on relating genders and LOC/NOC is as broken as relating project completion with LOC/NOC.

Ricardo

Every time I hear this discussion It's like tech is nirvana. Like everyone's career should be in tech and it's very amazing and there's something that is leaving women outside that amazing.
Well, it's not. There are higher paying jobs: doctors, lawyers; There are easier ways to project yourself: banker, politician; There are more fun jobs: designer, photographer; There are more soul seeking jobs: geologist, biologist. Think about it. In tech you have to work long hours, usually alone, locked in an office, the work never gets easy as you get older and you will always be framed as a "technical person", which means, can't get a grip in real life.
Probably women are smarter and don't get carried away for nothing, like men.

Hunter Stevens

I have been on github for nearly a year. First, notice that my name is generally regarded as male. If the theory of women being mocked on github is true, I could pick a random avatar and be done with that. Second, my open source contributions are low. I do file issues here and there, but most contributions are to readmes or style guides.
Also keep in mind that Github allows private repos to thrive. Where I work, we use Github for code review and pull requests, but the repo is private. Currently, my profile displays 1400+ contributions since I signed up. Anyone not from my company would see <100 probably. I think using Github as a model for this study is OK, but not the best.
Finally, I have to agree with others. I do not think the coding community is unaccepting of women. Rather, I think women are nervous to be one of the few on a team or of contributors. (As noted in this article) I also think those who do not hire more women IN TECH are just sexist. The sexism is found on the internet in general, and in people in general. I do not think it is localized to the coding community.
I definitely think programs, internships, etc are key to growing the female coder base. There is still a stigma of pushing young girls away from even appreciating technology. (Ahem, computer science Barbie) But that is just another discussion for another time.

Hunter Stevens

I agree. I think that for new developers, having open source projects during or right after college is good. However, seasoned developers spend most of their time on private repos. Of course, having open source contributions are great, but it does not determine the ability or even experience of the programmer.

Kawlinz

If good githubbers are generaly good engineers, but good engineers aren't necessarily into open source, then I have your answer.
Women don't need to have a large example of open source online to be taken seriously. Since companies are desperate to hire women in order to avoid the sexism accusations, why would women need a catalogue of work that they've done for free?
You're welcome.
(Or men in general just enjoy programming more than ladies... Could be that)

Breanden Beneschott

Does that mean you disagree with the statements in the post?
For example:
"GitHub activity is generally a good indicator of engineering expertise, but the reverse isn’t true… Plenty of great engineers aren’t on GitHub."

Breanden Beneschott

Could you share some examples where the rates have skyrocketed?

Breanden Beneschott

Do you think the data would look significantly different if private repos were included?

ToyotaBedZRock

You are ignoring a career advancement opportunity. Even for a new male programmer contribution to open source can cause anxiety. But it allows you to build a network of people who can help advance your future employment opportunities and it shows business you can handle criticism well. If you start your own project it shows you're creative and can bring that to a private company. Programmers interact with each other more than customers.

ToyotaBedZRock

Because it was a misguided idea. It would have caused indiscriminate damage to many projects. And do you go there to code or search for offensive things. Allow your refusal to lend your expertise speak for itself and help a competing project.

Alexey

Such a misinterpretation of statistical data! The author looks into the numbers and completely ignores the base cause of the research: why woman don't become coders.
Instead of crunching raw numbers why not to take your female CS friend out for a coffee and ask why she has a CS degree and does not write code? The answer simply would be: it's boring and not interesting!
No prospects
Why so many woman are CS educated? - because there're good prospects for IT roles, but not for coders. Writing code may be fun, but in modern IT industries software developers are treated as an easily replaceable labour asset (Is not TopTal business about that?). The prospects within the "technical" domain in a large company are rather limited.
Better options available elsewhere
Being a software engineer myself I can object: there're challenges in complex architectural solutions, certain recognition among peers, self-esteem, etc. Developers here will add thousands more reasons of why "writing code is cool". But if you are a smart and educated woman you can find much better ways to express yourself then writing code.
Hard
Writing code is fun, but it's hard too: long learning curve, complex concepts to adopt, solving software problems is not easy as it may look from outside. There are much quicker ways to progress with your career than writing code.
Does not appeal to woman's nature
Being a good developer means certain level of autism and focus on the narrow problems solving. Woman's nature is being social and communicable (remember these 14K words a day a woman needs to say) that is quite opposite to a coder. The communicative nature of a woman can explain a high number of CS female students that go to BA related areas after graduation rather than into coding.
Guys give up these talks on bringing woman into coding, let them stay woman and humans! ;)
* * *

Kathryn Hoster

You are mansplaining to a professional. You have no idea how tiresome this flippant dismissal of female input can be. For every woman, there are 8 - 10 men whose greatest joy in life appears to be telling her she is wrong, regardless of how knowledgeable she is and how profoundly ignorant they are. What drives this bizarre male behavior?

Eduardo Pereira

I think the low number of women in tech goes beyond quotas or if they prefer to work in other specialities, while we still thinking like that we are still blind for the real reason, in the past women was in great number when the subject was programming or computer (main frame) maintenance, fact: people are not taking in consideration when try to argue: why don't have a larger number of women coding.

Eduardo Pereira

Disagree, tech jobs have high gain even for junior developers, which a doctor, lawyer or another job have to study 10 years to have the same income (not a rule, but very common here in Brazil). Also the society need more people available to support and help with technology to transform our society. And I'm not talking about only on code, I'm talking about science too