AI for Good: Fighting COVID-19 with Data Science

This is the first of two blog posts about our recent participation in the Pandemic Response Hackathon. Our project, CoronaRank, was one of only 5 projects out of 230 submissions chosen to present at the closing ceremony. For the technical details of our CoronaRank solution (Markov Chains, R, Shiny, and how to quickly manipulate a dataset of >100GB), see the follow-up article here.

The COVID-19 pandemic is putting an unprecedented strain on communities, healthcare systems, and the economy. Containing the spread of the virus still depends largely on individuals taking responsibility for the benefit of the wider community. Governmental agencies and international organizations are introducing policies aimed at containing the pandemic and maximizing the efficiency of healthcare service delivery.

What can a data science company do to assist these efforts?

Our AI for Good initiative aims to bridge the gap between technical expertise and the organizations fighting for a sustainable future for our planet that need such support. Committed to this vision, we set out to contribute our data science skills to a project that could reduce the impact of the COVID-19 pandemic.

We recently got this chance during a hackathon centered on finding solutions for the global pandemic. During the hackathon, we developed CoronaRank: an algorithm that provides users with a personal coronavirus risk score and generates heat maps of risky areas.

Pandemic Response Hackathon

Devpost is a platform that gives the tech community an opportunity to help overcome global challenges. Its recent Pandemic Response Hackathon asked participants to develop technologies to address what appears to be the greatest public health challenge in decades.

The hackathon launched on the 27th of March. Over the course of the next three days more than 2,000 participants got involved and submitted upwards of 230 projects across four tracks:

Public Health and Information Sharing

Epidemiology & Science of the Disease

Keeping our Health Workers Safe

Second-Order Societal Impacts

Thirty organizations committed resources, including cloud computing from Amazon AWS, visualisation tools from Mapbox, datasets from Veraset, and many others.

We entered the hackathon in collaboration with Ewa Knitter, an infectious disease epidemiologist who kindly offered to support our efforts.

Problems we set out to tackle

After initial discussions, we identified several problems that are particularly pressing in the current outbreak and realised that geolocation data could help address them. Specifically:

COVID-19 tests are a limited resource, and there’s not an obvious way to decide who should be tested.

Because few tests are being done, and because many infected people are asymptomatic, it’s difficult to know which people and areas to avoid.

Supply chain management in the healthcare sector will be extremely difficult in the coming months, and policymakers need information on potential hotspots where an outbreak might be imminent.

Many young healthy people are ignoring social distancing guidance on the basis that they have a low personal risk. We need a way to illustrate how breaking isolation can affect communities.

Our solution

To address these problems, we decided to create heat maps of pandemic hotspots with high levels of human interaction. Such heat maps would give public officials an idea of where the next outbreak might occur and show users the risks of not complying with public health measures.

To achieve this, we took inspiration from Google’s PageRank algorithm, which ranks web pages based partly on the links they receive from other highly ranked pages. We adapted this idea to epidemiology using Markov Chain modeling. The resulting CoronaRank algorithm combines geolocation data, epidemiology data, self-reporting, and Markov Chain modeling to assess the likelihood of coronavirus exposure.
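PageRank can be read as a Markov Chain: a "random surfer" follows one of a page’s outgoing links with probability equal to a damping factor (conventionally 0.85) and otherwise jumps to a random page, and a page’s rank is the long-run share of time the surfer spends there. The minimal Python sketch below computes that stationary distribution by power iteration on a toy link graph. It illustrates textbook PageRank only, not the CoronaRank code itself (which the team wrote in R); the function name `pagerank` and the example graph are hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Textbook PageRank by power iteration.

    links: dict mapping each node to a list of nodes it links to.
    Returns a dict of node -> rank; ranks sum to 1.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Random-jump term: every node gets an equal share of (1 - damping).
        new = {v: (1.0 - damping) / n for v in nodes}
        # Nodes without outgoing links spread their rank uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        for v in nodes:
            new[v] += damping * dangling / n
        # Link-following term: each node splits its rank among its out-links.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

For example, `pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]})` ranks `"c"` highest because three pages point to it, while `"d"`, which nothing links to, keeps only the random-jump share of 0.15 / 4.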

To create and implement CoronaRank, we used the Veraset dataset for New York. Veraset provides anonymized phone geolocation data in which each individual is given a unique identifier.

The challenge was to analyze this large dataset (over 100GB of data per day) in a limited timeframe. Building on our previous experience with Big Data, however, we were able to develop the algorithm quickly. We then embedded it in Community Shield, a web application designed for smartphones. The app displays pandemic hotspots (areas with high activity in the recent period) and gives each user a risk score based on how many interactions they had in those hotspots.
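A hotspot in the sense above, an area with high activity in a recent period, can be approximated by snapping each location ping to a grid cell and counting the distinct devices seen in each cell within a time window. The sketch below assumes a hypothetical schema of `(device_id, timestamp, lat, lon)` tuples; the function name, the 0.001-degree grid (roughly 100 m of latitude), the 14-day window, and the `min_devices` threshold are illustrative choices, not values from the project. It works in memory on plain Python objects, whereas the real pipeline had to cope with over 100GB of data per day.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def hotspots(pings, now, window=timedelta(days=14), cell_deg=0.001, min_devices=3):
    """Count distinct devices per grid cell over a recent window (hypothetical sketch).

    pings: iterable of (device_id, timestamp, lat, lon) tuples (assumed schema).
    now: datetime marking the end of the window.
    Returns {(lat_cell, lon_cell): distinct_device_count} for busy cells only.
    """
    cutoff = now - window
    devices_per_cell = defaultdict(set)
    for device_id, ts, lat, lon in pings:
        if ts < cutoff or ts > now:
            continue  # outside the recent period
        # Snap the coordinates to an integer grid index.
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        devices_per_cell[cell].add(device_id)  # a set ignores repeat pings
    return {
        cell: len(devices)
        for cell, devices in devices_per_cell.items()
        if len(devices) >= min_devices
    }
```

Counting distinct devices rather than raw pings keeps one phone that reports its location every minute from turning a quiet street into a hotspot.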

An individual’s CoronaRank is the estimated likelihood that they are infected with COVID-19. Confirmed cases are assigned a CoronaRank of 1. Everyone else is assigned a CoronaRank x with 0 < x < 1, based on their actual or possible interactions with others as inferred from the past two weeks of phone geolocation data.

Demo of user input.

The more you travel to risky places, the higher your CoronaRank. The more high-rank people visit a place, the more risky it becomes.
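These two sentences describe a mutual recursion between people and places, which can be sketched as an alternating fixed-point iteration. In this hypothetical formulation (not the team’s published model), a place’s risk is the probability that at least one of its visitors is infected, and a person’s rank is the probability of becoming infected on at least one visit, given an assumed per-visit transmission probability `beta`. Both steps use a "noisy-OR" combination that treats visits as independent, and confirmed cases stay pinned at 1. The function name `corona_rank` and all parameter values are illustrative.

```python
from collections import defaultdict
from math import prod


def corona_rank(visits, confirmed, beta=0.1, iterations=50):
    """Alternate between place risk and person rank (hypothetical sketch).

    visits: iterable of (person, place) pairs from the past two weeks.
    confirmed: set of people with a confirmed diagnosis (rank fixed at 1).
    beta: assumed chance of infection from one visit to a maximally risky place.
    Returns (rank, risk): dicts keyed by person and by place.
    """
    visitors = defaultdict(set)  # place -> people seen there
    visited = defaultdict(set)   # person -> places they went (repeats collapse)
    for person, place in visits:
        visitors[place].add(person)
        visited[person].add(place)
    people = set(visited) | set(confirmed)
    rank = {p: 1.0 if p in confirmed else 0.0 for p in people}
    risk = {}
    for _ in range(iterations):
        # The more high-rank people visit a place, the riskier it becomes.
        risk = {q: 1.0 - prod(1.0 - rank[p] for p in ps) for q, ps in visitors.items()}
        # The more risky places a person visits, the higher their rank.
        rank = {
            p: 1.0 if p in confirmed
            else 1.0 - prod(1.0 - beta * risk[q] for q in visited[p])
            for p in people
        }
    return rank, risk
```

With `visits = [("ann", "cafe"), ("bob", "cafe"), ("bob", "gym"), ("cat", "gym")]` and `confirmed = {"ann"}`, the cafe becomes fully risky, Bob picks up a rank just above `beta`, and Cat, who only shared the gym with Bob, ends up lower still. Because each factor `1 - beta * risk` stays above `1 - beta`, unconfirmed ranks remain strictly below 1. One simplification worth flagging: a person’s own rank feeds back into the risk of the places they visited; a more careful version would exclude that self-contribution.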

A high-risk individual (CoronaRank of 0.9) visited a number of high-risk areas in Manhattan recently.

You can test out the demo of the app here. For now it includes three predefined risk profiles to showcase the app’s capabilities.

Our plans for the future

We plan to develop the CoronaRank algorithm further by adding a self-reporting feature, which would let users anonymously report any COVID-related symptoms. A report would update their CoronaRank and, by extension, the CoronaRank of everyone they met in recent weeks. This would be especially valuable to public health organizations that lack the capacity to screen and test every citizen.

We also aim to integrate Google Takeout so that users can import their own location data, making the app fully user-specific, and to improve the UI.

We hope to partner with governmental and international institutions to secure endorsement for the app and deliver it to the public. A long-term collaboration would help turn the app into a comprehensive tool that educates individuals and informs healthcare delivery policy at public institutions. To make this a reality, we need cloud resources to run the app at scale. Please don’t hesitate to reach out if you would like to provide resources or collaborate with us on the project.

If you want to find out what we can do for your industry please leave us a message and we will reach out to you.
