UROP Openings

Using Machine Learning and Electronic Health Records to Simulate Clinical Trials to Repurpose Drugs for Unmet Medical Needs

Term:

Summer

Department:

15: Management

Faculty Supervisor:

Roy Welsch

Faculty email:

rwelsch@mit.edu

Apply by:

May 1, 2020

Contact:

rwelsch@mit.edu

Project Description

The aim of the project is to develop and validate
methods to repurpose FDA-approved drugs, drawing on concepts from statistics,
data science, and machine learning, applied to a large electronic health
records (EHR) dataset. The specific context of our work will be an effort to
repurpose medicines that are currently FDA-approved and marketed for certain
conditions and that can be shown to offer therapeutic value in treating
significant unmet medical needs, including Alzheimer's disease and cancer. We
aim to use observational data to construct and compare cohorts of patients in
a fashion that emulates clinical trials, effectively conducting
“in silico” or “synthetic” trials. To conduct this work, we have
access to the UK’s Clinical Practice Research Datalink (CPRD), which
chronicles some 20 million persons who received primary medical care over a
period of up to thirty years; the Explorys EHR database, containing records for over
50 million patients; Medicaid and MarketScan insurance claims records for over 100
million patients; and linked claims and EHR data for a subset of approximately 5 million
patients. EHRs and claims data offer great potential for comparative
effectiveness research (CER); however, they pose many issues that make analyses
challenging. The data sets are large and getting larger; there is a significant amount of
missing data; the data are high dimensional and the dimensionality is growing
rapidly; there are errors and outliers; some patients enter the database and
then leave, or leave and come back; and patients enter the database at varying
times in their lives, with new patients always arriving and others leaving as
they die or move away. Our aims are to develop analytical methods that
address these issues and facilitate rigorous comparison of the clinical
effectiveness of candidate drugs with a reference therapy for selected
medical disorders. We expect to contribute
analytical strategies that can be applied to large numbers of clinical
studies at once, without relying extensively on clinical judgment for each analysis.

Pre-requisites

Interest in medicine or health care and facility with
large datasets; experience with SQL and Python or R