Microsoft Research Blog

Calling all aspiring women in Data Science

Datathon participants at the Microsoft New England Research and Development center. Photo credit: Dana J. Quigley; @DJQPhotography

Women in Data Science (WiDS), which started in 2015 as a one-day conference organized by Stanford University, has blossomed into a movement that brings together women data scientists and aspiring data scientists through a series of over 150 virtual and in-person events worldwide, culminating in the main event at Stanford on March 4, 2019. Microsoft is a proud partner of WiDS; in addition to supporting the Datathon with a webinar, Microsoft also provided Xboxes as prizes.

One of the main drivers of engagement is the WiDS Datathon, now in its second year, which kicks off in the weeks preceding the conference, with the winners announced at Stanford during the conference. This year’s Datathon had participants working on a classic image classification problem using computer vision techniques. The challenge is an environmental one. Rampant deforestation caused by oil palm production (palm oil is a common ingredient in everyday products) has devastated the habitats of many animal and plant species. One way to get ahead of the problem is to identify where the deforestation is taking place. These are remote regions, and satellite imagery is an effective means of detection and intervention. Planet provided a set of high-resolution satellite images, and Figure8 helped annotate them and created training, testing and holdout datasets for the Datathon. The Datathon has led to workshops in several countries, with participants coming together to form teams to solve the challenge.

Datathon rules allow teams of up to four people, with the requirement that at least half of each team be female or identify as female. Within weeks, the Datathon attracted over 200 teams. I took a shot at solving the problem using Microsoft Custom Vision, one of the Cognitive Services available on Azure. Using the Custom Vision UI, I was able to build a classifier from a handful of training images within minutes. Extending the classifier to include hundreds of images was easy using the Python SDK for Custom Vision. Such is the power of Cognitive Services in Azure: you can build a powerful image classifier based on transfer learning with fewer than 100 lines of code. The model improved simply by adding more images from the geo-images training dataset to the existing Custom Vision model, a simple and effective demonstration of how more training data leads to higher model accuracy.

| Training images | Precision | Recall |
|-----------------|-----------|--------|
| 60              | 79.60%    | 79.60% |
| 1,800           | 97.50%    | 97.10% |
| 5,000           | 99.60%    | 99.10% |
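For readers new to the two metrics reported above, here is a minimal sketch of how precision and recall are conventionally computed for a binary classifier. The function name and the counts in the example are hypothetical and are not taken from the Datathon results.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from binary classification counts.

    tp: positives the model labeled correctly
    fp: negatives the model wrongly labeled as positive
    fn: positives the model missed
    """
    # Precision: of everything predicted positive, how much was right.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts for illustration only.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.1%} recall={r:.1%}")  # precision=90.0% recall=75.0%
```

High precision means few false alarms; high recall means few missed plantations. Both rising together, as in the table, suggests the extra training images improved the model overall rather than trading one kind of error for the other.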

We hosted a WiDS webinar that covered basic machine learning concepts and included a tutorial on the Custom Vision solution. The webinar recording and slides are available for those who missed it.
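The step of adding hundreds of images through the Python SDK, described earlier, might look roughly like the sketch below. This is not the code from the webinar. The helper names (`chunked`, `upload_folder`), the folder layout, and the pre-existing `trainer`, `project_id` and `tag_id` values are all assumptions for illustration, as is the limit of 64 images per upload call.

```python
from pathlib import Path

# Assumption: the service accepts at most 64 images per upload call.
BATCH_LIMIT = 64


def chunked(items, size=BATCH_LIMIT):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upload_folder(trainer, project_id, tag_id, folder):
    """Upload every .jpg in `folder` under one tag (hypothetical helper).

    `trainer` is assumed to be a CustomVisionTrainingClient that was
    already created with a valid endpoint and training key.
    """
    # Imported here so that chunked() can be used without the SDK installed.
    from azure.cognitiveservices.vision.customvision.training.models import (
        ImageFileCreateBatch,
        ImageFileCreateEntry,
    )

    paths = sorted(Path(folder).glob("*.jpg"))
    for batch in chunked(paths):
        entries = [
            ImageFileCreateEntry(
                name=p.name, contents=p.read_bytes(), tag_ids=[tag_id]
            )
            for p in batch
        ]
        result = trainer.create_images_from_files(
            project_id, ImageFileCreateBatch(images=entries)
        )
        if not result.is_batch_successful:
            print(f"Some images in this batch of {len(batch)} were not accepted")
```

Calling `upload_folder` once per class folder and then retraining the project is one way to grow the training set in stages, as was done to go from 60 to 5,000 images.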

This democratization of machine learning tools is an important factor in opening up the field of data science to a wide audience of data science students and practitioners. The other factor, especially relevant to attracting women to data science, is the focus on socially relevant datasets and problems, such as this year’s oil palm classification problem.

Beyond data science, AI has a growing number of socially relevant subfields that appeal to a growing demographic of women technologists and students. These include eliminating bias in AI systems through fairness, accountability and transparency; secure machine learning; privacy; ethics; policy impact; and domain-specific machine learning.

Congratulations to all participants! Visit the WiDS Datathon page for the full list of winners. We look forward to continuing our engagement with the growing community of data scientists as they tackle challenges that will have a lasting positive impact on research and technology.
