
In recent months, ethics has emerged as an extremely sensitive topic for the Data and Analytics community. Most likely, one of the main drivers of this wave of concern was the Facebook scandal: Mark Zuckerberg (founder and CEO of Facebook) had to testify before the US Congress about how his company handles its users’ data and how this could have influenced the results of recent elections in several countries. But Facebook is not the only company whose practices are under scrutiny. Tons of questions have also been raised about how much personal data Google collects and how it is being used: according to Guillaume Chaslot (an ex-Google engineer), the YouTube algorithm “does not appear to be optimising for what is truthful, or balanced, or healthy for democracy”.

In other words, we are talking not only about privacy but also about how data could even threaten our political system. As Cathy O’Neil writes in her must-read book Weapons of Math Destruction, “the math-powered applications powering the data economy were based on choices made by fallible human beings. Some of these choices were no doubt made with the best intentions. Nevertheless, many of the models encoded human prejudice, misunderstanding and bias into the software systems that increasingly managed our lives. Like gods, these mathematical models are opaque (…) Their verdicts, even wrong or harmful, were beyond dispute or appeal. And they tended to punish the poor and the oppressed in our society, while making the rich richer”.

As data-driven professionals we cannot ignore this inconvenient truth and must address it. This is one of the reasons we at BcnAnalytics organised a session to discuss Data & Ethics. As speakers we had Carlos Castillo (Distinguished Research Professor at Universitat Pompeu Fabra) and Gemma Galdon (Founder at Eticas Research & Consulting and Researcher at Universitat de Barcelona).

Carlos focused his talk on algorithmic discrimination. He first reviewed the concept of discrimination from a philosophical perspective and then explained the concept of group discrimination, which means “disadvantageous treatment to an individual because he or she belongs to a specific socially salient group”. According to Carlos, a further step is statistical discrimination, which can be observed “when group discrimination happens because of some statistical belief, which means that someone has certain data, has looked at this data and based on statistics extracted from this data has decided to treat someone worse than another person”. After reviewing these concepts, Carlos raised the key issue: machine learning algorithms can discriminate.

Why is that? Machine learning systems take data and extract statistical beliefs from it, so they can discriminate against some individuals regardless of intention or animosity. What matters is the consequence of the algorithm: whether a person is treated worse because he or she belongs to a group. Carlos emphasized that, to avoid this discrimination, models need to optimize not only accuracy but also “the risk of two different populations of not getting the same outcome”. Carlos also highlighted how important it is that systems are transparent: “if you get a negative outcome, you have to have a way to challenge this decision in a way that is effective… If I am denied a loan or parole, I need to have a way of effectively challenge the decision to say the systems was wrong in my case”.
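
One common way to put a number on “two different populations not getting the same outcome” is the gap in positive-decision rates between groups, often called the demographic parity difference. The talk did not name a specific metric, so this choice is an assumption, and the decisions and group labels below are made up purely for illustration.

```python
from collections import defaultdict


def positive_rate_by_group(decisions, groups):
    """Return the share of positive decisions (1) for each group label."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        positives[group] += decision
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_gap(decisions, groups):
    """Largest difference in positive-decision rate between any two groups."""
    rates = positive_rate_by_group(decisions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical loan decisions (1 = approved) for two made-up groups.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = positive_rate_by_group(decisions, groups)
gap = demographic_parity_gap(decisions, groups)
print(rates, gap)
```

A gap of zero would mean both groups receive positive decisions at the same rate. In practice this is only one of several fairness criteria, and it can conflict with accuracy or with other criteria.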

Gemma started her talk by quoting “The Fall of Public Man” by Richard Sennett: “In a city full of sensors and cameras and surveillance everywhere, where would Romeo and Juliet fall in love?”. From Gemma’s perspective, technology is changing our lives and we really need to ask ourselves: Why are we investing in technology? What kind of societies are these technologies creating or promoting? Are we building the cities that we want to build? Do we want to live in a world where everything is remembered? Do we want to live in a world where we can never forget? As she mentioned: “for the first time in history, forgetting is more expensive than remembering. Everything we do is recorded by a camera or a sensor”. Gemma then reviewed real cases of unexpected outcomes of certain technologies, for instance smart borders based on biometrics. They were not part of the legislative debate because they were seen “as technical amendments”, but biometrics have now become our IDs, and certain individuals self-mutilate when they want to hide their identities. In other words, their bodies have become their enemies.

Gemma asked herself: “How can we hide behind a technical amendment? And what about false positives? There is no redress mechanism”. According to her, the most burning issue is that we, as a society, did not think technology could fail. But it fails. And this triggers the key issue: the way we do technology is very irresponsible, and no one is facing the consequences of their actions, the consequences of their false positives, which might affect human rights. Gemma ended her speech by highlighting that we need to start thinking about how technology is impacting our civilization: “we have the responsibility to decide how we build a social-technical infrastructure that is responsible and desirable for our generation and the next generations”.

The BcnAnalytics family keeps growing. In recent weeks, two new people have joined our core team.

The first to join BcnAnalytics was Didac Fortuny. He is a data scientist at Holaluz, a company that connects people to green power. In his own words: “I have a PhD in Physics in which I used data analytics to study the impact of climate change to Mediterranean precipitation. I also teach in a MSc in renewable energy and energy sustainability”.

The latest addition is Alejandra Manrique. She has more than 20 years of expertise in data analytics, helping companies get the most value out of data. In her own words: “I have worked in multiple sectors, water and environment, telecommunications, retail, automotive and media. I have international experience in different countries in Europe, America and Australia”.

The Barcelona GSE Data Science Center coordinates and promotes interdisciplinary and methodological research, training, and knowledge transfer in Data Science. They are now organising some academic seminars and conferences. See below their upcoming events for the month of March.

In the field of causality, we want to understand how a system reacts under interventions. These questions go beyond statistical dependences and can therefore not be answered by standard regression or classification techniques. In this tutorial, you will be introduced to the interesting problem of causal inference as well as recent developments in the field. We will introduce structural causal models, formalize interventional distributions, and define causal effects as well as show how to compute them. We will present three ideas that can be used to infer causal structure from data: (1) finding (conditional) independences in the data, (2) restricting structural equation models and (3) exploiting the fact that causal models remain invariant in different environments. If time allows, we will also show how causal concepts could be used in more classical machine learning problems. No prior knowledge about causality is required. The material is also covered in a recently published book (open access).
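
To illustrate why interventional questions go beyond standard regression, here is a minimal sketch of a toy structural causal model. The equations, coefficients and variable names are invented for illustration and are not taken from the tutorial: a hidden variable Z influences both X and Y, so the regression slope of Y on X in observational data differs from the effect of setting X by intervention.

```python
import random


def simulate(n, rng, do_x=None):
    """Sample (x, y) pairs from a toy SCM; do_x, if given, overrides X's equation."""
    samples = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # hidden common cause of X and Y
        if do_x is None:
            x = z + rng.gauss(0.0, 1.0)  # observational regime: X depends on Z
        else:
            x = do_x  # intervention: X is set from outside
        y = 2.0 * x + 3.0 * z + rng.gauss(0.0, 1.0)  # true effect of X on Y is 2
        samples.append((x, y))
    return samples


def mean(values):
    return sum(values) / len(values)


def regression_slope(samples):
    """Least-squares slope of y on x."""
    mean_x = mean([x for x, _ in samples])
    mean_y = mean([y for _, y in samples])
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    variance = sum((x - mean_x) ** 2 for x, _ in samples)
    return covariance / variance


rng = random.Random(0)
n = 20000

# Slope seen in observational data (confounded by Z).
observational_slope = regression_slope(simulate(n, rng))

# Average change in Y when X is set to 1 instead of 0 by intervention.
causal_effect = (
    mean([y for _, y in simulate(n, rng, do_x=1.0)])
    - mean([y for _, y in simulate(n, rng, do_x=0.0)])
)
print(observational_slope, causal_effect)
```

In this toy model the observational slope converges to 3.5 while the interventional effect is 2, because the regression also picks up the influence of the hidden common cause.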

The course will offer an introduction to deep learning along with an extensive practical hands-on session in Python. We will cover deep feedforward models, convolutional networks used mainly in image processing, recurrent neural networks used commonly in text processing, autoencoders, word2vec, as well as introduce optimization for deep learning. During the hands-on workshop, we will use deep learning techniques on images and natural-language text.

Bayes Comp is a biennial conference sponsored by the ISBA section of the same name. The conference and the section both aim to promote original research into Bayesian computational methods for inference and decision making and to encourage the use of frontier computational tools among practitioners, the development of adapted software, languages, platforms, and dedicated machines, and to translate and disseminate methods developed in other disciplines among statisticians.

At BcnAnalytics we are really passionate about data. At the same time, we also have some concerns about the ethical aspects of a data-driven world. So we are pleased to announce that our next event will focus on “Data and Ethics”.

The event will take place on April 11th at 19:00 at MWC, and as usual doors will open at 18:45.

We will have two great speakers in our panel: Carlos Castillo (Distinguished Research Professor at Universitat Pompeu Fabra) and Gemma Galdon (Founder at Eticas Research & Consulting and Researcher at Universitat de Barcelona). Both will share their views on the ethical aspects of using data and building algorithms. They will raise concerns around bias, discrimination and opacity in a data-driven world and how these might negatively affect certain people’s lives.

As usual, after the talks we will have time for networking and free cold beers.

On Sunday, January 21st, at about 14:00, the jury announced the winners of the BCN Air Quality Datathon. This concluded an intense weekend in which 12 teams of data scientists, with all kinds of backgrounds and from different countries, worked hard to achieve a clear goal: use data to improve the air quality predictions that the Barcelona Supercomputing Center (BSC) produces with the CALIOPE system.

It all began on Saturday 20th at 9:00, when the first participants arrived and collected the wonderful green t-shirt with the motto “Keep modelling and mind the air quality”. Then, after kind words from our host Vicenç Villatoro (the director of CCCB), Janet Sanz (Deputy Mayor for Ecology, Urbanism and Mobility of Barcelona), and people from the companies that made the event possible (the sponsors Gauss & Neumann, Social Point and Holaluz), the datathon was presented and the challenge was made public to the participants.

Given the concentration of NO2 observed hourly at 7 measurement stations, and hourly predictions of the concentration of NO2 produced every day with the CALIOPE system, the challenge was to find the model that best predicted the probability that, on each of a set of days in 2015, the concentration would exceed a threshold of 100 µg/m3 in at least one hour of the day.
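
As a rough sketch of the target described above, the snippet below turns hourly NO2 readings into a daily “threshold exceeded” label and scores predicted probabilities against it. The readings and predictions are made up, and the Brier score is only an assumed example of a scoring rule; the metric actually used by the jury is not stated here.

```python
THRESHOLD = 100.0  # µg/m3, as stated in the challenge


def exceeds_threshold(hourly_no2, threshold=THRESHOLD):
    """True if at least one hourly value of the day is above the threshold."""
    return any(value > threshold for value in hourly_no2)


def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    squared_errors = [(p - float(o)) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Made-up hourly readings for two days at one station (shortened to 4 hours each).
days = {
    "day_1": [40.0, 85.0, 120.5, 60.0],
    "day_2": [30.0, 45.0, 70.0, 55.0],
}

# Daily label: did any hour exceed the threshold?
labels = {day: exceeds_threshold(values) for day, values in days.items()}

# Hypothetical model output: probability of exceedance for each day.
predicted = {"day_1": 0.5, "day_2": 0.0}

score = brier_score(
    [predicted[day] for day in days],
    [labels[day] for day in days],
)
print(labels, score)
```

Lower scores are better under this rule: a model that assigns high probability to days that do exceed the threshold, and low probability to days that do not, approaches zero.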

After that, the teams had about 24 hours to design and implement their models and submit their predictions. At that point, the strategies of the different teams started to emerge. Some discussed how to build the model before implementing it, while others started coding straight away to make the most of the available time. While experienced teams used a rigorous methodology to work in parallel at a fast pace, some newbies struggled to find a way to combine different languages or pass data from one computer to another. All of this took place in an atmosphere of concentration but also of relaxation.

After a night in which some participants (and some organizers) did not sleep much, the predictions were finally submitted on Sunday morning. It was then the teams’ turn to describe their work in 4-minute presentations in front of a jury formed by Carlos Pérez García-Pando, Kim Serradell and Maria Teresa Pay from BSC, Marc Torrent from the Big Data Center of Excellence, Salvador Lladó from Leitat, and Manuel Bruscas and Didac Fortuny from BcnAnalytics.

Two awards were given. The accuracy award, which went to the team with the most accurate predictions, consisted of 2000 € and a pass for the Mobile World Congress 2018 for each member of the team. The winning team was “Worthless Without Coffee”, who performed a time series prediction using concentration values of the previous days, predictions of the CALIOPE system, concentration increases, some calendar variables and the characteristics of the measurement stations. They have kindly agreed to share their code, which can be found following this link.

The creativity award took into account the originality in facing the challenge and the insights found within the data. The winners of this award were the team “Dreamers”, who proposed some appealing policies to improve the air quality, and the team “Alpha”, who made useful suggestions to the members of the BSC to improve their predictions based on what they observed within the data. Each team won 600€ and passes for the 4 Years From Now 2018 event.

The datathon is over but there is still room to improve air quality predictions. For this reason, the data set will be kept public and any restless data scientist will be able to access it and keep working on the problem. Following this link anyone can download the data and the documentation given in the datathon. So, data scientists, keep modelling and mind the air quality!

A few years ago, when we created Bcn Analytics, our vision was that Barcelona could become a European analytics hub. Our ambition was to help different members of the community (business, academia, data professionals) meet and share experiences and knowledge. Now, three years later, we feel proud of what we have accomplished. We have organised 10 meet-ups where fantastic speakers from great organisations have shared their expertise: we have had guests from Google, New York University, King.com, La Caixa, Telefonica, Schibsted, Social Point, BBVA, IPSOS and Vistaprint, among others. We have also had the chance to organise two Datathons with Social Point so that data scientists could compete to win some prizes while having fun with data.

The Datathon is going to be part of the exhibition “After the End of the World”, which is being organised by CCCB. Participants in the datathon will have to build a prediction model for Barcelona pollution levels. We have more than 3,000€ in prizes thanks to our sponsors Social Point, Holaluz and Gauss & Neumann. We also have the support of Mobile World Congress.

We are pleased to announce our next event “How start-ups are using Machine Learning to disrupt industries”.

We will have two great speakers in our panel: LongLong Yu (Co-Founder & Head of Research at Wide Eyes) and Aleix Ruiz de Vila (Chief Data Officer at Onna). Both will share the learnings and insights on how their companies are disrupting some industries applying various machine learning techniques.

Bio LongLong Yu: he received his MSc degree in computer vision and artificial intelligence from the Autonomous University of Barcelona (UAB) in 2013. In the same year he co-founded the artificial intelligence and image recognition company Wide Eyes Technologies (Wide Eyes). His career as a computer vision and machine learning geek started with human detection for surveillance and face recognition for biometric analysis. Currently, he is Member of the Board and Head of Research and Innovation at Wide Eyes and focuses mainly on image classification, retrieval and object detection for the fashion industry.

Bio Aleix Ruiz de Vila: PhD in mathematics, he has applied machine learning in areas such as transportation, journalism and retail. Currently, as Chief Data Officer at Onna, he is responsible for developing and putting into production machine learning models for document management. He is a cofounder of the Barcelona R Users Group and Barcelona Machine Learning Study Group meetups, and has also collaborated with BcnAnalytics.

Are you coming to this year’s PyConEs? Don’t miss the chance to participate in the PythonHack that Kernel Analytics is organising!

We offer three different challenges:
Accuracy Contest: if you want to prove that your models can beat the rest, this is your contest.

Web App solution: for back-end and front-end developers who can provide a working and attractive solution to an open question.

Happy Hour challenge: in-person machine learning challenge. Participants will have free drinks during the challenge!
So that you can attend all the conferences, Kernel is releasing the datasets of the Accuracy Contest and Web App solution one week before PyConEs begins.

Happy Hour will also take place after Saturday’s conferences end.

Choose which contest you want to participate in and register as soon as you can because there are limited spots.

Check out the full video of our last event with Joan Bruna, Assistant Professor at the Courant Institute (NYU). Joan shows us some important applications of Deep Learning and the next challenges in this hot field.

We are pleased to announce our next event and the first in our series: Machine Learning Series II. It is a great honour to have in our panel Joan Bruna, Assistant Professor at Courant Institute, NYU, in the Department of Computer Science.

Joan Bruna will share the learnings and insights of applying various machine learning techniques to a number of different use cases throughout his career, such as image or real-time video recognition, among others. The conference will combine an initial master class and a debate with the audience.

WHERE: Aula Capella, ground floor, Historical Building of Universitat de Barcelona, Plaça Universitat, Gran Via de les Corts Catalanes, 585

Bio. Joan Bruna is an Assistant Professor at Courant Institute, NYU, in the Department of Computer Science, Department of Mathematics (affiliated) and the Center for Data Science, since Fall 2016. He is currently on leave from UC Berkeley (Statistics Department).

His research interests touch several areas of Machine Learning, Signal Processing and High-Dimensional Statistics. In particular, in the past few years he has been working on Deep Convolutional Networks, studying some of their theoretical properties and applications to several Computer Vision tasks.

Before that, he worked at FAIR (Facebook AI Research) in New York on Unsupervised Learning. Prior to that, he was a postdoctoral researcher at Courant Institute, NYU, under the supervision of Prof. Yann LeCun.

Joan completed his PhD in 2013 at Ecole Polytechnique, France, under the supervision of Prof. Stephane Mallat. Before his PhD he was a Research Engineer at a semiconductor company, developing real-time video processing algorithms. Even before that, he did an MSc at Ecole Normale Superieure de Cachan in Applied Mathematics (MVA) and his undergraduate studies at UPC (Universitat Politècnica de Catalunya, Barcelona) in both Mathematics and Telecommunication Engineering.