Infectious disease surveillance programs provide public health workers with important information for predicting and understanding the emergence of epidemics, allowing for timely allocation of the resources needed to contain them. Collecting molecular sequence data from disease agents is becoming widespread, especially in surveillance of infectious diseases caused by RNA viruses, such as influenza. Phylodynamics is an emerging statistical framework that allows epidemiologists to harness the information present in disease agent sequences to shed light on the spatio-temporal population dynamics of these agents. Although sophisticated Bayesian inferential tools for phylodynamics have emerged in the last decade, these tools concentrate on sequence data alone and fail to integrate other sources of information (e.g., incidence time series data) into the phylodynamic framework. We claim that integrating multiple sources of information will make phylodynamic inference more precise, allowing for sharper predictions of disease dynamics and for statistical testing of scientific hypotheses. To test this assertion, we propose a series of new statistical methods for integrating multiple sources of information into Bayesian infectious disease phylodynamics. We will start by developing a new Bayesian method for estimating population dynamics directly from genomic data that combines the coalescent process, a powerful tool from population genetics, with modern Gaussian process-based Bayesian nonparametric inference (Aim 1). Our preliminary results show that the new method is more accurate than state-of-the-art Bayesian phylodynamic methods. Moreover, the proposed Gaussian process framework will free us from the drawbacks of the current methodology and will allow us to extend this approach to estimate correlations between population size fluctuations and other time-varying variables of interest (Aim 2).
This extension is significant because estimating such correlations is of paramount importance to infectious disease epidemiologists and because no current phylodynamic method can perform such estimation. We will also develop a new model that accounts for the currently ignored dependence of the times at which disease agent sequences are sampled on the disease dynamics (Aim 2). Explicit modeling of these sampling times should improve both the accuracy and the precision of phylodynamic inference. In all our modeling efforts, we will pay close attention to the computational feasibility of the proposed methods by designing efficient Markov chain Monte Carlo algorithms for Bayesian inference. To test our new methodology, we will analyze benchmark infectious disease data sets for which available external information about disease dynamics will help us validate our methods. In addition, we will mine publicly available databases to perform novel data analyses using our newly developed methodology (Aim 3). One of the main deliverables of this research will be open source software implementing the proposed Bayesian phylodynamic methods for integrating infectious disease sequence data with other sources of information.
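To make the modeling ideas of Aims 1 and 2 more concrete, the Python sketch below writes down an unnormalized log posterior for the log effective population size, log N_e(t). It is an illustration under stated assumptions, not the proposed method. We assume that log N_e is piecewise constant on a fixed time grid, that the genealogy (sampling and coalescent times, measured backward from the present) is known, and that the Gaussian process prior uses a squared-exponential kernel with fixed hyperparameters. The sampling-time component assumes an inhomogeneous Poisson process with intensity beta * N_e(t) on [0, grid[-1]], which is one simple way to let sampling times depend on population dynamics. All function names and parameters are hypothetical.

```python
import numpy as np


def cell_index(times, grid, n_cells):
    # Index j of the grid cell [grid[j], grid[j+1]) containing each time;
    # a time equal to grid[-1] is assigned to the last cell.
    idx = np.searchsorted(grid, times, side="right") - 1
    return np.minimum(idx, n_cells - 1)


def coalescent_loglik(sample_times, coal_times, grid, log_ne):
    """Coalescent log-likelihood of a known genealogy (times run backward
    from the present) with N_e(t) = exp(log_ne[j]) on grid cell j."""
    events = sorted([(s, +1) for s in sample_times] +
                    [(c, -1) for c in coal_times])
    inv_ne = np.exp(-log_ne)
    loglik, k, prev = 0.0, 0, events[0][0]
    for t, change in events:
        rate = k * (k - 1) / 2.0  # (k choose 2) pairs of lineages
        # Integral of 1 / N_e(t) over (prev, t], split across grid cells.
        lo = np.clip(grid[:-1], prev, t)
        hi = np.clip(grid[1:], prev, t)
        loglik -= rate * np.sum((hi - lo) * inv_ne)
        if change == -1:  # coalescence: density term at time t
            loglik += np.log(rate) - log_ne[cell_index(t, grid, len(log_ne))]
        k += change  # sampling adds a lineage, coalescence removes one
        prev = t
    return loglik


def sampling_time_loglik(sample_times, grid, log_ne, log_beta):
    """Assumed preferential-sampling model: sampling times form a Poisson
    process on [0, grid[-1]] with intensity beta * N_e(t)."""
    j = cell_index(np.asarray(sample_times), grid, len(log_ne))
    integral = np.exp(log_beta) * np.sum(np.diff(grid) * np.exp(log_ne))
    return np.sum(log_beta + log_ne[j]) - integral


def gp_log_prior(log_ne, points, variance=1.0, length_scale=1.0, jitter=1e-6):
    """Zero-mean GP log density of log N_e at the given points, using a
    squared-exponential kernel (hypothetical kernel and hyperparameters)."""
    d = points[:, None] - points[None, :]
    cov = variance * np.exp(-0.5 * (d / length_scale) ** 2)
    cov += jitter * np.eye(len(points))  # numerical stabilization
    _, logdet = np.linalg.slogdet(cov)
    quad = log_ne @ np.linalg.solve(cov, log_ne)
    return -0.5 * (quad + logdet + len(points) * np.log(2.0 * np.pi))


def log_posterior(log_ne, log_beta, sample_times, coal_times, grid):
    # Unnormalized: a prior on log_beta is omitted (implicitly flat).
    midpoints = 0.5 * (grid[:-1] + grid[1:])
    return (coalescent_loglik(sample_times, coal_times, grid, log_ne)
            + sampling_time_loglik(sample_times, grid, log_ne, log_beta)
            + gp_log_prior(log_ne, midpoints))


if __name__ == "__main__":
    grid = np.array([0.0, 1.0, 2.0])  # hypothetical two-cell grid
    log_ne = np.array([0.0, np.log(2.0)])
    samples, coalescences = [0.0, 0.0, 1.0], [0.5, 1.5]
    print(log_posterior(log_ne, 0.0, samples, coalescences, grid))
```

In an MCMC sampler, `log_posterior` would serve as the target density evaluated at each proposed `log_ne` (and `log_beta`). Samplers designed for Gaussian process priors, such as elliptical slice sampling, are one possible choice. A finer grid approximates a smoothly varying N_e(t) more closely at a higher computational cost.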

Public Health Relevance

Monitoring infectious disease dynamics is important for the timely detection of epidemics and for organizing a timely public health response to them. Disease agent sequence data are becoming an important source of information in infectious disease surveillance programs. We propose a series of new statistical methods for analyzing such sequence data. This new statistical methodology will enable epidemiologists to elucidate the population dynamics of infectious disease agents and to integrate sequence data with other data collected during infectious disease surveillance programs.