THE BCN AIR QUALITY DATATHON CHRONICLES

On Sunday, January 21st, at about 14:00, the jury announced the winners of the BCN Air Quality Datathon. This concluded an intense weekend in which 12 teams of data scientists, with all kinds of backgrounds and from different countries, worked hard towards a clear goal: use data to improve the air quality predictions that the Barcelona Supercomputing Center (BSC) produces with the CALIOPE system.

It all began on Saturday 20th at 9:00, when the first participants arrived and collected the wonderful green t-shirt with the motto “Keep modelling and mind the air quality”. Then, after kind words from our host Vicenç Villatoro (the director of CCCB), Janet Sanz (Deputy Mayor for Ecology, Urbanism and Mobility of Barcelona), and representatives of the companies that made the event possible (the sponsors Gauss&Neumann, Social Point and Holaluz), the datathon was presented and the challenge was made public to the participants.

Given the NO2 concentration observed hourly at 7 measurement stations, and the hourly NO2 concentration predictions produced every day by the CALIOPE system, the challenge was to find the model that best predicted the probability, for a set of days in 2015, that the concentration would exceed a threshold of 100 µg/m³ during at least one hour of the day.
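For readers who want to try the problem, the target described above can be derived from the hourly observations roughly as follows. This is only a minimal sketch: the record layout (station, timestamp, value), the function name `daily_exceedance` and the sample values are assumptions made for illustration, not the format of the datathon files.

```python
from collections import defaultdict
from datetime import datetime

THRESHOLD = 100.0  # µg/m³, the threshold stated in the challenge


def daily_exceedance(hourly_obs, threshold=THRESHOLD):
    """Return {(station, date): bool}, True if any hour of that day exceeds the threshold.

    hourly_obs is an iterable of (station, timestamp, no2) tuples
    (a hypothetical layout, not the official file format).
    """
    daily_max = defaultdict(lambda: float("-inf"))
    for station, timestamp, no2 in hourly_obs:
        key = (station, timestamp.date())
        daily_max[key] = max(daily_max[key], no2)
    # "Exceed" is read here as strictly greater than the threshold.
    return {key: peak > threshold for key, peak in daily_max.items()}


# Made-up values, for illustration only.
sample = [
    ("station_A", datetime(2015, 1, 1, 8), 85.0),
    ("station_A", datetime(2015, 1, 1, 9), 112.5),
    ("station_A", datetime(2015, 1, 2, 8), 60.0),
    ("station_B", datetime(2015, 1, 1, 8), 100.0),
]
labels = daily_exceedance(sample)
```

A model for the challenge would then output, for each station and day, a probability that this label is true.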

After that, the teams had about 24 hours to design and implement their models and submit their predictions. At that point, the strategies of the different teams started to emerge. Some discussed how to build the model before implementing it, while others started coding straight away to make the most of the available time. While experienced teams used a rigorous methodology to work in parallel at a fast pace, some newcomers struggled to combine different languages or pass data from one computer to another. All of this took place in an atmosphere that was focused but also relaxed.

After a night in which some participants (and some organizers) did not sleep much, the predictions were finally submitted on Sunday morning. It was then the teams' turn to describe their work in 4-minute presentations in front of a jury formed by Carlos Pérez García-Pando, Kim Serradell and Maria Teresa Pay from BSC, Marc Torrent from the Big Data Center of Excellence, Salvador Lladó from Leitat, and Manuel Bruscas and Didac Fortuny from BcnAnalytics.

Two awards were given. The accuracy award, given to the team with the most accurate predictions, consisted of 2000 € and a pass to the Mobile World Congress 2018 for each member of the team. The winning team was “Worthless Without Coffee”, who performed a time series prediction using concentration values from the previous days, predictions of the CALIOPE system, concentration increases, some calendar variables and the characteristics of the measurement stations. They have kindly agreed to share their code, which can be found following this link.
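To give a flavour of the kinds of inputs mentioned above, here is a small sketch of how such features might be assembled for one station and one day. It is not the winning team's code: the function `build_features`, the dictionary layouts, the `station_info` field `"type"` and all values are hypothetical and chosen only for illustration.

```python
from datetime import date, timedelta


def build_features(day, station, observed_daily_max, caliope_daily_max, station_info):
    """Assemble one feature row for (station, day).

    observed_daily_max and caliope_daily_max map (station, date) to a
    daily maximum NO2 value; station_info maps station to a dict of
    static attributes. All of these layouts are assumptions.
    """
    prev_day = day - timedelta(days=1)
    two_days_ago = day - timedelta(days=2)
    obs_prev = observed_daily_max[(station, prev_day)]
    obs_prev2 = observed_daily_max[(station, two_days_ago)]
    return {
        "obs_max_prev_day": obs_prev,                      # previous day's concentration
        "obs_increase": obs_prev - obs_prev2,              # day-over-day increase
        "caliope_max": caliope_daily_max[(station, day)],  # CALIOPE forecast for the target day
        "weekday": day.weekday(),                          # calendar variable (0 = Monday)
        "is_weekend": day.weekday() >= 5,
        "station_type": station_info[station]["type"],     # station characteristic
    }


# Made-up values, for illustration only.
observed = {
    ("station_A", date(2015, 1, 3)): 55.0,
    ("station_A", date(2015, 1, 4)): 70.0,
}
caliope = {("station_A", date(2015, 1, 5)): 90.0}
stations = {"station_A": {"type": "traffic"}}

row = build_features(date(2015, 1, 5), "station_A", observed, caliope, stations)
```

Rows like this, paired with the daily exceedance label, could be fed to any classifier that outputs probabilities.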

The creativity award took into account the originality shown in tackling the challenge and the insights found within the data. The winners of this award were the team “Dreamers”, who proposed some appealing policies to improve air quality, and the team “Alpha”, who made useful suggestions to the members of the BSC on how to improve their predictions, based on what they observed in the data. Each team won 600 € and passes for the 4 Years From Now 2018 event.

The datathon is over, but there is still room to improve air quality predictions. For this reason, the data set will remain public, and any restless data scientist will be able to access it and keep working on the problem. Following this link, anyone can download the data and the documentation provided at the datathon. So, data scientists, keep modelling and mind the air quality!