Monday, April 17, 2017

Online Risk Mapping Tool Prototype Now Available

Infectious disease risk is not spread evenly across the world. Within a country, most transmission can be concentrated in certain states, and sometimes in certain districts, so disease control efforts benefit from focusing on the areas where transmission is most intense. A disease risk model can help identify where the disease currently is and where it is likely to go next by combining data on past disease incidence and past control efforts with ancillary data sources such as sanitation, population density, or air temperature.

The RiskMapper tool was built to demonstrate that it is possible to build disease risk models quickly and easily. Using past incidence data, the current prototype can build a risk model in approximately 15 minutes from start to finish. The results can then be exported in multiple formats, so that they can be used in presentations or other analyses. RiskMapper also includes an interactive visualization tool that maps disease incidence and predicted risk, so that users can explore both their data and the results of the risk models they build.

On the technical side, this is made possible by a platform hosted on multiple Azure virtual machines. This shortens wait times by allowing the different steps of the workflow (data cleaning, aggregation, model running) to be broken up and run in parallel, and it lets multiple users run the tool simultaneously. In addition, a separate web server is dedicated to serving the website's pages, which keeps the experience responsive even when the system is busy.
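As a rough illustration of running independent pieces of work side by side, the sketch below uses Python's standard concurrent.futures module. It is not the RiskMapper code: the job names, the run_job placeholder, and the use of local threads in place of separate virtual machines are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(job_name, records):
    """Hypothetical placeholder for one unit of work (e.g. one model run).

    It only counts the records so that the example stays self-contained.
    """
    return job_name, len(records)


def run_jobs_in_parallel(job_names, records, max_workers=3):
    """Submit every job at once and collect the results by job name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_job, name, records) for name in job_names]
        # result() blocks until that particular job has finished.
        return dict(future.result() for future in futures)


if __name__ == "__main__":
    incidence = [12, 7, 31, 4]  # made-up incidence counts
    print(run_jobs_in_parallel(["model_a", "model_b", "model_c"], incidence))
```

In a multi-machine setup, the thread pool would be replaced by whatever mechanism hands jobs to the remote workers; the submit-then-collect pattern stays the same.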

At the core of the project is Python, whose versatility, robustness, and flexibility make it possible to stitch together a large array of technologies. The data clean-up software is written in C#, the aggregation script is in Python, the three models are run using R, and the visualizations are put together with JavaScript and HTML. Additionally, a Python daemon was created to intelligently distribute tasks to the different servers and minimize the wait time for users.
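One simple way a dispatcher like this could decide where to send work is to always pick the server with the least work already queued. The sketch below shows that greedy rule; it is an assumption for illustration rather than the daemon's actual logic, and the task names, server names, and time estimates are hypothetical.

```python
import heapq


def assign_tasks(tasks, server_names):
    """Assign each task to the server with the least estimated queued work.

    tasks: list of (task_name, estimated_seconds) pairs, in arrival order.
    server_names: list of server identifiers.
    Returns a dict mapping task_name -> server name.
    """
    # Min-heap of (queued_seconds, server_name); ties are broken by name.
    heap = [(0.0, name) for name in server_names]
    heapq.heapify(heap)

    assignment = {}
    for task_name, estimated_seconds in tasks:
        queued, server = heapq.heappop(heap)  # least-loaded server
        assignment[task_name] = server
        heapq.heappush(heap, (queued + estimated_seconds, server))
    return assignment


if __name__ == "__main__":
    tasks = [("clean", 60), ("aggregate", 30), ("model_a", 120), ("model_b", 120)]
    print(assign_tasks(tasks, ["vm1", "vm2"]))
```

A real daemon would also need to track task completion and server health; this only captures the placement decision.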

Our goal is to use this demonstration to test the idea of providing risk modeling as an automated, on-demand service to users who could be anywhere in the world, in multiple time zones, without noticeable latency. Although the risk models currently included in the tool are simple and data input is limited to disease incidence, several improvements are possible in the future: for example, importing multiple data sources, adding a larger library of risk models, or offering more data visualization options. The prioritization of these improvements will depend on what would be most useful to users; feedback from participants of the RiskMapper workshop will help determine how the tool evolves next.