Analyzing a public data set using pandas

Background

On Women’s Day 2018, HackerRank published a Women in Tech Report that presented what we need to (continue to) do to #PressForProgress, the theme for this year. The findings came from a developer survey HackerRank conducted in February 2018. HackerRank also published the data set on Kaggle. I was curious about the data and the questions that were asked, and wanted to see whether I could come up with my own set of observations.

Analysis

There were 25K responses in total, and the ratio of men to women respondents was approximately 5:1. The nearly 4K women who responded came from 111 countries.
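Figures like these take only a few pandas calls to compute. The sketch below runs on a tiny made-up DataFrame rather than the real survey file. The column names `gender` and `country` and the label values are assumptions, not necessarily what the Kaggle data set uses.

```python
import pandas as pd

# Toy stand-in for the survey responses (NOT the real data).
# Column names "gender" and "country" are hypothetical.
responses = pd.DataFrame({
    "gender": ["Male", "Male", "Female", "Male", "Female", "Male"],
    "country": ["India", "USA", "India", "Brazil", "Kenya", "India"],
})

total = len(responses)                         # total number of responses
counts = responses["gender"].value_counts()    # responses per gender
men_to_women = counts["Male"] / counts["Female"]

# Keep only women respondents, then count distinct countries among them
women = responses[responses["gender"] == "Female"]
n_countries = women["country"].nunique()

print(total, round(men_to_women, 2), n_countries)  # prints: 6 2.0 2
```

On the real file, the same three quantities would correspond to the 25K total, the roughly 5:1 ratio and the 111 countries quoted above.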

Most of the women respondents were from India and the USA. Participation from each of the remaining 109 countries was tiny by comparison. So I decided to look more closely at the women respondents from these two countries.
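Finding the top two countries and narrowing the data to them might look like the following sketch. It again uses invented rows and a hypothetical `country` column. `value_counts()` sorts in descending order, so `head(2)` returns the two largest groups.

```python
import pandas as pd

# Hypothetical women-only responses (invented rows, assumed column name).
women = pd.DataFrame({
    "country": ["India"] * 5 + ["USA"] * 4 + ["Kenya", "Brazil", "Germany"],
})

by_country = women["country"].value_counts()  # counts sorted descending
top2 = by_country.head(2)                     # the two largest countries
rest = by_country.iloc[2:]                    # every other country

print(top2.index.tolist())      # prints: ['India', 'USA']
print(rest.max() / top2.sum())  # largest remaining country relative to the top two combined

# Narrow the analysis to women from the top two countries
top_women = women[women["country"].isin(top2.index)]
```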

Once students were excluded, the numbers of professional women who took the survey in the two countries were remarkably close (561 in India versus 546 in the USA). That called for a closer study!

Country | Total women participants | Job title: Student | Job title: Non-student
India | 1453 | 892 | 561
USA | 1000 | 454 | 546

Table 1: Breakdown of women participants from top 2 countries
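A breakdown in the shape of Table 1 can be built with `pd.crosstab`, after collapsing every job title into "Student" or "Non-student". The sketch below is a minimal version of that idea. The rows are invented, and the `job_title` column name and the exact "Student" label are assumptions about the survey file.

```python
import pandas as pd

# Invented rows for illustration; "country" and "job_title" are assumed column names.
women = pd.DataFrame({
    "country": ["India", "India", "India", "USA", "USA", "USA"],
    "job_title": ["Student", "Student", "Developer", "Student", "Developer", "Manager"],
})

# Collapse every job title into Student vs Non-student
status = (
    women["job_title"].eq("Student")
    .map({True: "Student", False: "Non-student"})
    .rename("status")
)

# Count per country, with row and column totals labelled "Total"
table = pd.crosstab(women["country"], status, margins=True, margins_name="Total")
table = table[["Total", "Student", "Non-student"]]  # column order as in Table 1
print(table)

# Excluding students altogether is a simple boolean filter
professionals = women[women["job_title"] != "Student"]
```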

Inferences

Indian women may continue to dominate the software industry

It has been close to 30 years since India introduced computer science, and subsequently IT (information technology), into its engineering curriculum. It remains a strong career option for women, which is likely why India has a very strong lead in the 18-24 age bucket.

The drop in the 25-34 age bucket, however, is surprising. This age segment deserves further study to answer questions such as:

Is there a dearth of software jobs in India for more experienced women?

Do marriage, children or other family responsibilities pull women out of the workforce?

Do women lack support or a clear growth path in their software careers?
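One way to compare age buckets across the two countries is to cross-tabulate raw counts and also within-country shares, since the two countries had different totals. The sketch below assumes a pre-bucketed `age_group` column with labels such as "18-24". Both the column name and the rows are invented for illustration.

```python
import pandas as pd

# Invented respondents; "age_group" and its labels are assumptions.
women = pd.DataFrame({
    "country": ["India"] * 6 + ["USA"] * 4,
    "age_group": ["18-24"] * 4 + ["25-34"] * 2 + ["18-24", "25-34", "25-34", "35-44"],
})

# Raw counts: one row per age bucket, one column per country
counts = pd.crosstab(women["age_group"], women["country"])

# Share of each country's respondents falling in each bucket (columns sum to 1)
shares = pd.crosstab(women["age_group"], women["country"], normalize="columns")

# Change in head count from the 18-24 bucket to the 25-34 bucket
drop = counts.loc["18-24"] - counts.loc["25-34"]

print(counts)
print(shares.round(2))
print(drop)
```

Normalizing by column keeps a country with more respondents from looking stronger in every bucket simply because of its size.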

Role parity is heartening to see

Across the spectrum of positions, both countries had women respondents in every job role. It is good to note that the more senior roles, though sparsely populated, did have non-zero counts. Over time, with sustained effort from individuals and companies, one can hope for equilibrium across roles.
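The "non-zero in every role" check reduces to a cross-tabulation followed by a boolean test. The sketch below assumes a hypothetical `job_level` column with made-up labels. The real survey's job-level field and categories may differ.

```python
import pandas as pd

# Invented non-student respondents; "job_level" and its labels are hypothetical.
pros = pd.DataFrame({
    "country": ["India", "India", "India", "USA", "USA", "USA", "USA"],
    "job_level": ["Junior", "Senior", "Executive", "Junior", "Junior", "Senior", "Executive"],
})

# Rows: job levels; columns: countries; cells: number of respondents
roles = pd.crosstab(pros["job_level"], pros["country"])

# True for a role only if every country has at least one respondent in it
represented_everywhere = (roles > 0).all(axis=1)

print(roles)
print(represented_everywhere.all())  # prints: True
```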

Learning

Handling a publicly available data set that is small enough to be easily understood
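A first look at a public CSV like this one usually starts with its shape, column types and missing answers. In the sketch below, an in-memory CSV with hypothetical columns stands in for the Kaggle download, so the snippet runs on its own. With the real file, you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Small in-memory CSV standing in for the real file (hypothetical columns).
raw = io.StringIO(
    "gender,country,job_title\n"
    "Female,India,Student\n"
    "Male,USA,Developer\n"
    "Female,USA,\n"  # empty field: an unanswered question
)
df = pd.read_csv(raw)

print(df.shape)         # prints: (3, 3)
print(df.dtypes)        # column types as pandas inferred them
print(df.isna().sum())  # missing answers per column
```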