Covid-19: The Power of Big Data and AI

Article By : M. Di Paolo Emilio

COVID-19 is caused by a coronavirus, a member of the family of viruses associated with severe acute respiratory syndrome (SARS) and the common cold. Big data and predictive analysis, in combination with artificial intelligence and a variety of thermal sensors, are powerful tools for containing the spread of this epidemic and minimizing the resulting deaths.

Given that testing for the virus is sporadic at best, the numbers of cases of infection are often very uncertain, and the real danger of the virus is questioned. A decisive contribution to support epidemiological experts could come from data analysis techniques.

Data analysis plays a fundamental role, as does mathematics, which, together with physics, gives us an in-depth understanding of the details of nature and how things work. The pioneers of data science have long had a remarkable impact, using data and analysis to change the course of a disease’s spread. One of the first historical applications of data analysis came in 1854, during a cholera outbreak in London. John Snow, one of the first data-driven epidemiologists, geospatially analyzed the deaths that occurred in London and thus isolated the source of the disease. Relying on his analysis, authorities were able to target their interventions and rapidly check the spread of the epidemic.

Let’s evaluate the data

Running models through data analysis systems has proven able to approximate how trends might progress. One example is the SIR model, an epidemiological model that computes the theoretical number of people infected with a contagious illness in a closed population over time. The model uses coupled equations that track the number of susceptible people S(t), the number of infected people I(t), and the number of people who have recovered R(t). One of the simplest SIR models is the Kermack-McKendrick model, which is considered the foundation on which many other compartmental models are based. In this regard, I found Ettore Mariotti’s analysis very interesting.
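The coupled equations mentioned above are commonly written in the following normalized form. The symbols β (transmission rate), γ (recovery rate) and N (total population) do not appear in the text and are introduced here as assumptions of notation:

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \, \frac{S(t)\, I(t)}{N} \\
\frac{dI}{dt} &= \beta \, \frac{S(t)\, I(t)}{N} - \gamma \, I(t) \\
\frac{dR}{dt} &= \gamma \, I(t)
\end{aligned}
```

In this form S(t) + I(t) + R(t) = N stays constant, which matches the closed-population assumption, and the ratio β/γ corresponds to the R0 parameter discussed later in the article.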

The idea is, first of all, to consider an island, our system, where people are not allowed in or out. Every individual can be in one of the following states at a given time: “Susceptible,” “Infected” and “Recovered,” hence the acronym SIR, because with a certain probability people who have never had the disease (S) can become ill and infected (I) for a certain period before they recover (R). In the case of Covid-19, it is appropriate to extend the model with an additional state, “Exposed,” to include people who have the virus but are not yet infectious (SEIR model).

SEIR Model [Source: triplebyte.com]
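As a minimal sketch of how such a model can be run numerically, the following Python code integrates a standard SEIR formulation with a simple forward Euler step. The function name simulate_seir and all parameter values (beta, sigma, gamma, population size) are illustrative assumptions. They are not taken from Mariotti’s analysis and are not fitted to Covid-19 data.

```python
def simulate_seir(population, initial_infected, beta, sigma, gamma,
                  days, steps_per_day=10):
    """Integrate the SEIR equations with a fixed-step forward Euler scheme.

    beta  : transmission rate per day (assumed value)
    sigma : rate at which exposed people become infectious (1 / incubation)
    gamma : recovery rate per day (1 / infectious period)
    Returns a list of (day, S, E, I, R) tuples, one entry per simulated day.
    """
    dt = 1.0 / steps_per_day
    s = float(population - initial_infected)
    e = 0.0
    i = float(initial_infected)
    r = 0.0
    history = [(0, s, e, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            # Flows between compartments during one small time step.
            new_exposed = beta * s * i / population * dt
            new_infectious = sigma * e * dt
            new_recovered = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered
            r += new_recovered
        history.append((day, s, e, i, r))
    return history


if __name__ == "__main__":
    # Illustrative parameters only: a closed "island" of 10,000 people.
    history = simulate_seir(population=10_000, initial_infected=1,
                            beta=0.5, sigma=0.2, gamma=0.1, days=200)
    peak_day, _, _, peak_infected, _ = max(history, key=lambda row: row[3])
    print(f"Peak of about {peak_infected:.0f} infectious people on day {peak_day}")
```

Forward Euler is used here only because it keeps the sketch dependency-free. A production model would more likely rely on a dedicated ODE solver and on parameters estimated from data.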

This model considers two factors: the dynamics of the virus, and the interaction of individuals. The latter is very complex and would require technology like the one described in the previous paragraph. With all this, it is possible to define the R0 parameter, the basic reproduction number, which represents the average number of people that one infected person can be expected to infect in a fully susceptible population.

Let’s suppose, for example, that person A is sick and that our system has an R0=2. This would mean that A will infect two people. Those two people will, in turn, infect four people, who will infect another two people each (so 4 * 2 = 8) and so on. This highlights the fact that the spread of the disease is multiplicative rather than additive. R0 can capture three basic scenarios, as shown in Figure 2.

R0 basic scenarios [Source: triplebyte.com]
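The multiplicative arithmetic in the example above can be sketched in a few lines of Python. This is a deliberately simplified illustration: it assumes that every infected person infects exactly R0 others in each generation and that susceptible people never run out. The function name cases_per_generation is hypothetical.

```python
def cases_per_generation(r0, generations, initial_cases=1):
    """Return the number of new cases in each generation of transmission.

    Generation 0 is the initial case (person A in the example); each later
    generation is the previous one multiplied by r0.
    """
    return [initial_cases * r0 ** g for g in range(generations + 1)]


if __name__ == "__main__":
    # The three basic scenarios: shrinking, stable and growing outbreaks.
    for r0 in (0.5, 1, 2):
        print(f"R0 = {r0}: {cases_per_generation(r0, 5)}")
```

With R0 = 2 the sequence starts 1, 2, 4, 8, matching the example, while with R0 = 1 it stays flat and with R0 below 1 it shrinks toward zero.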

The closure of schools, gyms, etc. decreases social interaction, thus lowering R0. Because the capacity of the health system is limited, it is very important to reduce this parameter below unity. If R0 > 1, the disease spreads; if R0 < 1, the disease dies out. It is reasonable to expect governments to impose stricter constraints on people’s mobility in an attempt to reduce R0.

It is important to note that R0 measures the potential transmission of a disease, not the rate at which the disease spreads. Consider the ubiquitous nature of influenza viruses, which have an R0 of only about 1.3. A high R0 is a cause for concern, but not a cause for panic.

R0 is an average, so it can be influenced by factors such as super-spreader events. A super-spreader is an infected individual who infects an unexpectedly large number of people. Super-spreader events occurred during the SARS and MERS epidemics, and have occurred during the current Covid-19 epidemic. Such events are not necessarily a bad sign because they may indicate that fewer people are perpetuating an epidemic. Super-spreaders may also be easier to identify and contain, as their symptoms are likely to be more severe.

In short, R0 is a moving target. Tracking each case and the transmission of the disease is extremely difficult, so estimating R0 is complex and challenging. Estimates often change with the availability of new data.

Which technological solutions could slow down or end the spread of Covid-19 and get R0 under control? The use of AI, together with GPS movement data from mobile phones, makes it possible to create analytical models that predict which neighborhoods are more likely to have future cases or where urgent intervention is needed.

Big data, AI and sensors

In case of an epidemic, clinical data can be highly variable in terms of quality and consistency. Complications of this sort include cases of false-positive patients. Big data and AI can be employed to check compliance with quarantine and machine learning can be used for drug research. These are just some of the solutions offered by new digital technologies to face the coronavirus emergency. From Asia, there are many examples of interventions implemented through the use of digital technologies.

Drones equipped with smart scanners and cameras make it possible to detect those who do not comply with quarantine measures and to check people’s body temperature. The use of intelligent cameras in China and Taiwan has made it possible not only to spot people who are not wearing a mask but also to carry out real-time thermal scans to detect possible cases of fever.

For example, the Chinese company SenseTime has developed a platform that scans people’s faces even if they wear a mask, while Alibaba has developed a new AI-based coronavirus diagnosis system. SenseTime is a global company focused on developing AI technologies that advance the world’s economies, society and humanity for a better tomorrow. It is also the world’s most-funded AI pure-play with the highest valuation.

SenseTime has announced that its contactless temperature detection software has been implemented in subway stations, schools, and public centers in Beijing, Shanghai, and Shenzhen. Alibaba, meanwhile, has developed a new AI-based diagnosis system for Covid-19 that detects new coronavirus cases with an accuracy rate of up to 96 percent by means of computed tomography (CT) scans.

Evolution of a virus [Source: graphen.ai]

Graphen, together with Columbia University, is trying to define the canonical form of each gene localization of the virus and identify the exact variant(s). It uses its Ardi AI platform, which mimics the functions of the human brain, to store these mutation data and visualize them. In the visualization, each red node represents a virus. Each green node represents a set of viruses possessing exactly the same genome sequence. Information associated with a virus, including location, gender, age, etc., can be seen by clicking a red node.
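The grouping behind the green nodes can be illustrated with a short Python sketch. This is not Graphen’s implementation: the function name, the sample identifiers and the toy sequences are all hypothetical, and real genomes are far longer than four letters.

```python
from collections import defaultdict


def group_identical_genomes(samples):
    """Map each distinct genome sequence to the sample IDs that share it.

    `samples` is an iterable of (sample_id, sequence) pairs. Each pair plays
    the role of a red node; each key of the result plays the role of a green
    node collecting the samples with exactly the same sequence.
    """
    groups = defaultdict(list)
    for sample_id, sequence in samples:
        groups[sequence].append(sample_id)
    return dict(groups)


if __name__ == "__main__":
    # Made-up toy data for illustration only.
    toy_samples = [("sample-1", "ACGT"), ("sample-2", "ACGA"), ("sample-3", "ACGT")]
    for sequence, ids in group_identical_genomes(toy_samples).items():
        print(sequence, ids)
```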

Another useful tool for pandemic control is big data. In this period of emergency, it has been widely used to improve surveillance systems in order to map the spread of the virus.

The acquisition and processing of big data required the design of new methodologies and technologies for collection and analysis. In particular, we can distinguish four types or methodologies of big data analysis:

Descriptive Analysis, i.e., the methodologies and technologies used to describe the current and past situation of business processes or business projects, representing in a synthetic and graphical way the performance indicators of the activity;

Predictive Analysis, i.e., the data analysis tools that help to understand what could happen in the future using mathematical techniques such as regression and predictive models;

Prescriptive Analysis, used to identify effective strategic and operational solutions;

Automated Analysis, which includes the tools that allow the desired action to be implemented autonomously and in an automated manner and according to the result of the analyses that have been conducted.
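To make the predictive category more concrete, here is a minimal sketch of regression applied to case counts: a least-squares line fitted to the logarithm of the daily counts, which amounts to assuming exponential growth. The function names and the synthetic input series are assumptions made for illustration. The numbers are generated by a formula, not taken from any real outbreak.

```python
import math


def fit_exponential_trend(counts):
    """Least-squares fit of log(count) = intercept + slope * day.

    `counts` holds one positive value per day, starting at day 0.
    Returns (intercept, slope) of the fitted line in log space.
    """
    n = len(counts)
    days = list(range(n))
    logs = [math.log(c) for c in counts]
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, day):
    """Extrapolate the fitted exponential trend to a given day."""
    return math.exp(intercept + slope * day)


if __name__ == "__main__":
    # Synthetic series: 10 cases growing by 20 percent per day (made up).
    synthetic_counts = [10 * 1.2 ** d for d in range(10)]
    intercept, slope = fit_exponential_trend(synthetic_counts)
    print(f"Estimated doubling time: {math.log(2) / slope:.1f} days")
    print(f"Projected count on day 12: {predict(intercept, slope, 12):.0f}")
```

A fit of this kind is only reasonable while growth is close to exponential. Once interventions take effect or a large share of the population has been infected, a compartmental model is a better choice.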

Alibaba has also developed an app (Alipay Health Code), which, using the big data made available by the Chinese healthcare system, indicates who can or cannot access public spaces.

BlueDot, a Toronto-based startup that uses a platform built around artificial intelligence, has developed intelligent systems for automatically monitoring and predicting the spread of infectious diseases. During the spread of SARS, the BlueDot platform had already had positive results. In December 2019, BlueDot also raised the alarm about the severity of the coronavirus outbreak, a warning that proved to be correct. Among the tools used by BlueDot are natural language processing (NLP) techniques, which process human language and the ways people express themselves.

Insilico Medicine is another company focused on disease prevention through artificial intelligence. It is developing and applying next-generation artificial intelligence and deep learning approaches to every step of the drug discovery and drug development process. It is currently developing a technology that will inform doctors about molecules that can fight the coronavirus. After analyzing candidate molecules, Insilico Medicine’s system is able to provide feedback on those that are suitable to fight the coronavirus. The start-up is also developing a database of information on vaccine development projects.

WeBank researchers have used satellite image analysis to identify hot spots in steel mills, which has provided important information on the industry’s recovery.

In the early days of the epidemic, this analysis showed that steel production had dropped to a minimum of 29 percent of capacity. But by February 9, it had recovered to 76 percent. The researchers then used AI to look at other types of production and commercial activity. One of the techniques was simply counting cars in large company car parks. This analysis showed that, as of February 10, Tesla car production in Shanghai had fully recovered, while tourism activities, such as Shanghai Disneyland, were still closed.

Side-by-side satellite images from December 30, 2019 (left) and January 29, 2020 (right) show that steel industry activity is still down in China [Source: spectrum.ieee.org].

By analyzing GPS satellite data, it was possible to identify which people were commuting. The software then counted the number of commuters in each city and compared the number of commuters on a given day in 2019 with the number on the corresponding date in 2020, starting on Chinese New Year. In both years, Chinese New Year saw a huge drop in commuting, but unlike in 2019, the number of people going to work did not recover after the holidays in 2020. As things slowly recovered, WeBank researchers calculated that by March 10, 2020, about 75 percent of the workforce had returned to work. Projecting from these curves, the researchers concluded that most Chinese workers, with the exception of those in Wuhan, will return to work by the end of March. Economic growth in the first quarter, according to their study, will be 36 percent.
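The year-over-year comparison described above can be sketched as follows. Because Chinese New Year falls on a different calendar date each year (February 5 in 2019, January 25 in 2020), days are aligned by their offset from the holiday. The function names and the commuter counts are hypothetical and chosen only to illustrate the calculation. This is not WeBank’s code or data.

```python
from datetime import date, timedelta

# Chinese New Year dates used to align the two years.
CNY_2019 = date(2019, 2, 5)
CNY_2020 = date(2020, 1, 25)


def corresponding_dates(days_after_new_year):
    """Return the 2019 and 2020 dates that fall the same number of days
    after Chinese New Year."""
    offset = timedelta(days=days_after_new_year)
    return CNY_2019 + offset, CNY_2020 + offset


def recovery_ratio(commuters_2019, commuters_2020, days_after_new_year):
    """Share of the 2019 commuter count observed on the matching 2020 date.

    Both arguments are dicts mapping a date to a commuter count.
    """
    day_2019, day_2020 = corresponding_dates(days_after_new_year)
    return commuters_2020[day_2020] / commuters_2019[day_2019]


if __name__ == "__main__":
    # March 10, 2020 is 45 days after Chinese New Year 2020.
    offset = (date(2020, 3, 10) - CNY_2020).days
    day_2019, day_2020 = corresponding_dates(offset)
    # Made-up counts for a single city, for illustration only.
    counts_2019 = {day_2019: 1000}
    counts_2020 = {day_2020: 750}
    ratio = recovery_ratio(counts_2019, counts_2020, offset)
    print(f"{day_2020} vs {day_2019}: {ratio:.0%} of last year's commuters")
```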

Those attempting to respond to the coronavirus challenge have an important ally in technology. Solutions tested during emergency phases could become standard in the future.