Department of Physics and Department of Systems Biology, Columbia University, New York, New York, USA.

2 Department of Systems Biology and Department of Biomedical Informatics, Columbia University, New York, New York, USA.

Abstract

BACKGROUND:

Viral outbreaks, such as the 2014 ebolavirus epidemic, can spread rapidly and have complex evolutionary dynamics, including coinfection and bulk transmission of multiple viral populations. Genomic surveillance is hindered when an outbreak spreads faster than the virus accumulates mutations; in that case, consensus-based approaches have limited resolution. Deep sequencing of samples from infected patients can identify genomic variants present in intrahost populations at subclonal frequencies (i.e., <50%). Shared subclonal variants (SSVs), which are found in more than one patient, can provide additional phylogenetic resolution and help reveal disease transmission patterns.
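To make the SSV definition concrete, the following minimal sketch flags variants that are subclonal in two samples at once. It assumes each sample is summarized as a dictionary from (position, alternate allele) to intrahost allele frequency. The detection floor `MIN_FREQ`, the helper names, and the toy data are hypothetical; this is not the study's variant-calling pipeline.

```python
# Hypothetical cutoffs: "subclonal" means an intrahost frequency below 50%,
# and MIN_FREQ is an assumed detection floor, not a value from the study.
MIN_FREQ = 0.005
MAX_FREQ = 0.5

def subclonal_variants(sample):
    """Keys (position, alt_allele) whose frequency lies in [MIN_FREQ, MAX_FREQ)."""
    return {key for key, freq in sample.items() if MIN_FREQ <= freq < MAX_FREQ}

def shared_subclonal_variants(sample_a, sample_b):
    """Candidate SSVs: variants that are subclonal in both samples."""
    return subclonal_variants(sample_a) & subclonal_variants(sample_b)

# Toy data mapping (position, alternate allele) -> intrahost frequency.
patient_1 = {(1000, "T"): 0.12, (2500, "G"): 0.97, (4100, "A"): 0.03}
patient_2 = {(1000, "T"): 0.08, (2500, "G"): 0.99, (5200, "C"): 0.20}

print(sorted(shared_subclonal_variants(patient_1, patient_2)))  # [(1000, 'T')]
```

The variant at position 2500 is near-fixed in both toy samples, so it is a consensus-level difference rather than an SSV. In practice the lower cutoff would depend on sequencing depth and error rates; 0.5% here is only a placeholder.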

METHODS:

We use metrics from population genetics to analyze data from the 2014 ebolavirus outbreak in Sierra Leone and to identify phylogenetic signal arising from SSVs. We use methods derived from information theory to estimate a lower bound on the transmission bottleneck size.
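The exact information-theoretic estimator is not given in this section. As a hedged illustration of how information theory can yield a bottleneck lower bound, the sketch below uses one simple argument: if a recipient's viral population descends from N founders and no new mutations are counted, at most N distinct donor haplotypes can be present, so the Shannon entropy H (in bits) of the transmitted haplotype frequencies satisfies H ≤ log2 N, i.e., N ≥ 2^H. The function names, and the assumption that haplotype frequencies are known (rather than only per-site allele frequencies from short reads), are ours and not the authors'.

```python
import math

def shannon_entropy_bits(freqs):
    """Shannon entropy in bits of a frequency vector (normalised to sum to 1)."""
    total = sum(freqs)
    return -sum((f / total) * math.log2(f / total) for f in freqs if f > 0)

def bottleneck_lower_bound(haplotype_freqs):
    """Lower bound on founder number N from N >= 2**H.

    Assumes every listed haplotype was inherited from the donor (no de novo
    mutations), so at most N distinct haplotypes exist and H <= log2(N).
    """
    h = shannon_entropy_bits(haplotype_freqs)
    # N is a whole number of founders; the small epsilon absorbs floating-point
    # error so a bound that is mathematically an integer is not pushed up by one.
    return math.ceil(2 ** h - 1e-9)

# Hypothetical recipient: one dominant and two minor donor haplotypes.
print(bottleneck_lower_bound([0.90, 0.05, 0.05]))  # 2
```

Because 2^H never exceeds the number of distinct haplotypes, this bound is more conservative than simply counting haplotypes, and it is less sensitive to rare haplotypes that may be sequencing errors.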

RESULTS AND CONCLUSIONS:

We identify several SSVs that shed light on phylogenetic relationships not captured by consensus-based analyses. We find that the transmission bottleneck size is larger than one founder population, yet significantly smaller than the intrahost effective population size. Our results demonstrate the important role of shared subclonal variants in genomic surveillance.

A) Consensus-based distances have an inherent resolution of a single nucleotide, whereas intrahost subclonal variants provide subnucleotide resolution. Red dots represent pairwise consensus distances, and blue dots represent pairwise Nei's genetic distances that incorporate subclonal variants. (Only data collected from the chiefdom of Jawie are shown.) B) Intrahost genomic diversity rose minimally during the course of the disease. The dip at the third temporal sample of G4769 corresponded to that sample's lower sequencing depth, and hence lower sensitivity for identifying variants, compared with the patient's other samples. The relative rise in diversity in G3676 corresponded to a 1–2% change in the frequencies of six variants, still within their allele-frequency confidence intervals. C) Samples that shared subclonal variants also had similar consensus genomes. Across the 26 pairs with SSVs (Table 2), the mean pairwise consensus distance was significantly smaller than for pairs with no SSVs (<1 nucleotide for SSV pairs versus >2 for non-SSV pairs; rank-sum test, p = 5.1e-08).
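Panel A contrasts consensus distances with Nei's genetic distances. For reference, Nei's (1972) standard distance between populations X and Y is D = -ln(J_XY / sqrt(J_X J_Y)), where J_X and J_Y are the mean (over loci) sums of squared allele frequencies within each population and J_XY is the mean sum of products of matching allele frequencies. The sketch below assumes each sample is a list of per-locus allele-frequency dictionaries aligned by genome position. Which loci are averaged over (all positions or only variable ones) changes the scale of D and is not specified here; the function name and toy samples are hypothetical.

```python
import math

def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance D = -ln(J_XY / sqrt(J_X * J_Y)).

    pop_x, pop_y: lists of per-locus allele-frequency dicts aligned by locus,
    e.g. [{"A": 0.9, "G": 0.1}, ...]. Each J term is averaged over loci.
    """
    if len(pop_x) != len(pop_y) or not pop_x:
        raise ValueError("populations must cover the same, non-empty set of loci")
    n = len(pop_x)
    # Mean within-population homozygosity terms.
    j_x = sum(sum(f * f for f in lx.values()) for lx in pop_x) / n
    j_y = sum(sum(f * f for f in ly.values()) for ly in pop_y) / n
    # Mean between-population term; alleles absent from Y contribute 0.
    j_xy = sum(
        sum(f * ly.get(allele, 0.0) for allele, f in lx.items())
        for lx, ly in zip(pop_x, pop_y)
    ) / n
    return -math.log(j_xy / math.sqrt(j_x * j_y))

# Toy samples with the same consensus base at both loci (A, then C) that differ
# only in the frequency of a subclonal T at the second locus.
sample_x = [{"A": 1.0}, {"C": 0.80, "T": 0.20}]
sample_y = [{"A": 1.0}, {"C": 0.95, "T": 0.05}]
print(round(nei_standard_distance(sample_x, sample_y), 3))  # 0.011
```

The two toy samples have a consensus distance of 0, yet D is small but positive because the subclonal T frequency differs; this is the subnucleotide resolution that panel A describes.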

The left panel shows a neighbor-joining tree built from the consensus sequence of each sample. The right panel shows a neighbor-joining tree built from Nei's standard genetic distances computed with SSVs incorporated. Differences between the two trees are highlighted, and each affected group is assigned a cluster name within its clade.
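The caption does not say which neighbor-joining implementation was used. A minimal sketch of the standard Saitou-Nei algorithm follows; it accepts any symmetric distance matrix, so it could be fed either consensus distances or Nei distances that incorporate SSVs. The function name and the five-taxon toy matrix are illustrative only.

```python
def neighbor_joining(names, dist):
    """Saitou-Nei neighbor joining on a symmetric distance matrix.

    names: taxon labels; dist: list of lists with dist[i][j] == dist[j][i].
    Returns an unrooted tree as a Newick string.
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # total distance from each node
        # Pick the pair minimising Q(i, j) = (n - 2) * d_ij - r_i - r_j;
        # ties go to the first pair encountered.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and to j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to every node that was not joined.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Three nodes remain: attach them to a single central node.
    a, b, c = nodes
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = (d[0][1] + d[1][2] - d[0][2]) / 2
    lc = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"

names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(names, dist))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Because ties in the Q-criterion are broken by taking the first pair encountered, other tie-breaking rules may produce a different Newick string for the same input.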