Abstract

Transmission of Zika virus (ZIKV) in the Americas was first confirmed in May 2015 in northeast Brazil. Brazil has had the highest number of reported ZIKV cases worldwide (more than 200,000 by 24 December 2016) and the most cases associated with microcephaly and other birth defects (2,366 confirmed by 31 December 2016). Since the initial detection of ZIKV in Brazil, more than 45 countries in the Americas have reported local ZIKV transmission, with 24 of these reporting severe ZIKV-associated disease. However, the origin and epidemic history of ZIKV in Brazil and the Americas remain poorly understood, despite the value of this information for interpreting observed trends in reported microcephaly. Here we address this issue by generating 54 complete or partial ZIKV genomes, mostly from Brazil, and reporting data generated by a mobile genomics laboratory that travelled across northeast Brazil in 2016. One sequence represents the earliest confirmed ZIKV infection in Brazil. Analyses of viral genomes with ecological and epidemiological data yield an estimate that ZIKV was present in northeast Brazil by February 2014 and is likely to have disseminated from there, nationally and internationally, before the first detection of ZIKV in the Americas. Estimated dates for the international spread of ZIKV from Brazil indicate the duration of pre-detection cryptic transmission in recipient regions. The role of northeast Brazil in the establishment of ZIKV in the Americas is further supported by geographic analysis of ZIKV transmission potential and by estimates of the basic reproduction number of the virus.

a. Distribution of Ct values for the RT-qPCR+ samples tested during the ZiBRA journey in Brazil (n=181 samples; median Ct = 35.96). b. Distribution of the temporal lag between the date of onset of clinical symptoms and the date of sample collection for RT-qPCR+ samples (median lag = 2 days). In a and b, red dashed lines represent the medians of the distributions. c. Validation of sequencing approaches. A phylogeny of the ZIKV Asian genotype estimated using PhyML is shown. The expanded clade highlighted in blue contains the WHO reference ZIKV sequence (accession number KX369547), which was generated using Illumina MiSeq. Sequences generated using MinION chemistries R9.4 2D, R9.4 1D, R9 1D, R9 2D and R7.3 2D contain no nucleotide differences from the reference sequence and hence were also placed in this clade. Scale bars represent expected nucleotide substitutions per site (s/s). Am-ZIKV = American Zika virus lineage.
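The validation in panel c rests on the MinION consensus sequences having no nucleotide differences from the MiSeq reference. A minimal sketch of such a check is shown below, assuming (this is not stated in the source) that the sequences are already aligned to the same coordinates and that ambiguous calls, Ns and gaps are ignored rather than counted as differences. The function name `count_differences` and the toy sequences are hypothetical.

```python
# Minimal sketch: count nucleotide differences between two sequences that are
# ASSUMED to be already aligned to the same coordinates (equal length).
# Positions where either sequence has an ambiguous or missing call (anything
# other than A, C, G, T), or an alignment gap, are skipped rather than counted.

CALLED = set("ACGT")


def count_differences(seq_a: str, seq_b: str) -> int:
    """Return the number of positions where both bases are called and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in CALLED and b in CALLED and a != b:
            diffs += 1
    return diffs


if __name__ == "__main__":
    reference = "ATGGCTNNACGT"  # toy stand-in for a reference sequence
    consensus = "ATGGCTAAACGA"  # toy stand-in for a MinION consensus
    print(count_differences(reference, consensus))  # 1
```

Skipping uncalled positions means that low-coverage regions in one run do not inflate the difference count; a stricter check could instead report them separately.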

Temporal signal of the ZIKV Asian genotype. The correlation between sampling dates and genetic distances from the tips to the root of a maximum likelihood (ML) tree, estimated using PhyML, was explored using TempEst. a. Estimates for the dataset used in the phylogenetic analysis presented in . b. Estimates for the same dataset with the addition of the P6-740 strain sampled in 1966 (accession number HQ234499).

A non-clock maximum likelihood phylogeny of our ZIKV data set. Bootstrap branch support values are shown at each node. The phylogeny was estimated using PhyML. Sequences generated in this study are highlighted in red. Scale bar represents expected nucleotide substitutions per site.

Epidemic growth rates estimated from weekly notified ZIKV cases in Brazil. Time series show the number of notified ZIKV cases in each region of Brazil. Periods over which exponential growth rates were estimated are highlighted in grey.

Seasonal suitability for ZIKV transmission in the Americas. These maps were estimated by collating data on Aedes mosquitoes, temperature, relative humidity and precipitation, and they underlie the regional suitability trends shown in the main text and . For method details, see ,.

Partial autocorrelation functions of the residuals of the linear model associating climatic suitability with notified ZIKV cases in each geographic region of Brazil. The residuals for the North, Northeast, Centre-West and Southeast regions show no evidence of autocorrelation, whereas a small amount of autocorrelation cannot be excluded for the South region.

a. Sampling locations of genome sequences from Brazil and the Americas. Federal states in Brazil are coloured according to the five geographic regions (lower inset). A red line surrounds the states surveyed by the ZiBRA mobile lab in 2016. State codes are PA=Pará, MA=Maranhão, CE=Ceará, TO=Tocantins, RN=Rio Grande do Norte, PB=Paraíba, PE=Pernambuco, AL=Alagoas, BA=Bahia, RJ=Rio de Janeiro, SP=São Paulo. Underlined states are those from which sequences were generated in this study (upper inset). Publicly available sequences were also collated from non-underlined states. b. Confirmed and notified ZIKV cases in NE Brazil. The upper panel shows the temporal distribution of RT-qPCR+ cases detected during ZiBRA fieldwork; only samples with known collection dates are included (n=138 of 181 confirmed cases). The lower panel shows notified ZIKV cases in NE Brazil between 01 Jan 2015 and 19 Nov 2016 (n=122,779). The dashed line represents the average climatic vector suitability score for NE Brazil (Methods). The vertical arrow indicates the date of ZIKV confirmation in NE Brazil and the Americas. c. Notified ZIKV cases in the Centre-West, Southeast, North and South regions of Brazil (clockwise from top left). The dashed lines represent the average climatic vector suitability score for each region.

a. Percentage of the ZIKV genome sequenced plotted against RT-qPCR Ct value for each sample. Each circle represents a sequence recovered from an infected individual in Brazil and is coloured by sampling location. b. Sequencing coverage across the ZIKV genome for the ZiBRA sequences, including data generated by both the mobile and static laboratories. c. Regression of root-to-tip genetic distances against sequence sampling dates in a maximum likelihood phylogeny of the Asian-ZIKV lineage. A comparable analysis that also includes P6-740 (the oldest Asian-ZIKV strain, collected in 1966) is shown in . d. Average pairwise genetic diversity of the PreAm-ZIKV strains (grey line) and of the Am-ZIKV lineage (black line), calculated using a sliding window of 300 nucleotides with a step size of 50 nucleotides.
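The sliding-window diversity in panel d can be sketched as follows, using the stated window of 300 nucleotides and step of 50 nucleotides as defaults. The diversity measure itself is an assumption here: mean uncorrected p-distance over all sequence pairs, counting only sites where both bases are called. The alignment is assumed to be pre-computed, and the function names are hypothetical.

```python
# Minimal sketch: mean pairwise genetic diversity in sliding windows.
# ASSUMPTIONS (not from the source): sequences are pre-aligned and of equal
# length; diversity in a window is the mean uncorrected p-distance over all
# sequence pairs, comparing only sites where both bases are A, C, G or T.
from itertools import combinations

CALLED = set("ACGT")


def p_distance(a, b):
    """Proportion of differing sites among sites called in both sequences.

    Returns None when the two sequences share no called sites.
    """
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in CALLED and y in CALLED:
            compared += 1
            diffs += x != y
    return diffs / compared if compared else None


def mean_pairwise_diversity(seqs):
    """Mean p-distance over all pairs that share at least one called site."""
    dists = [d for a, b in combinations(seqs, 2)
             if (d := p_distance(a, b)) is not None]
    return sum(dists) / len(dists) if dists else float("nan")


def sliding_window_diversity(alignment, window=300, step=50):
    """Return (window_start, diversity) for each full window (0-based starts)."""
    alignment = [s.upper() for s in alignment]
    length = len(alignment[0])
    return [
        (start, mean_pairwise_diversity([s[start:start + window] for s in alignment]))
        for start in range(0, length - window + 1, step)
    ]
```

Running the function separately on the PreAm-ZIKV and Am-ZIKV subsets of the alignment gives two curves that can be plotted against window position.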

Maximum clade credibility phylogeny, estimated from complete and partial Am-ZIKV genomes using a molecular clock phylogeographic approach (Methods). Terminal branches with yellow circles indicate sequences reported in this study. Terminal branches with no circles and reduced opacity are those reported in a companion paper. Thin vertical grey boxes indicate the statistical uncertainty of the estimated dates of nodes A, B and C (). Branch colours indicate the most probable ancestral lineage locations. Diamonds at internal nodes are sized in proportion to clade posterior probabilities. For selected nodes, coloured numbers show the posterior probabilities of ancestral locations and numbers in grey are clade posterior probabilities. Asterisks indicate the three available genomes from microcephaly cases. A black arrow indicates the oldest Brazilian ZIKV sequence. The grey arrow and dotted line denote when ZIKV was first confirmed in the Americas. Nodes A and B are equivalent to the nodes named identically in. Text labels along the bottom of the figure denote clades of sequences from regions outside NE Brazil: RJ1 to RJ4 are clades from Rio de Janeiro state, TO from Tocantins, and SP1 from São Paulo state. Clades from outside Brazil are denoted CB1 and CB2 (Caribbean), SA1 and SA2 (South America excluding Brazil), and CA1 (Central America). Thin grey horizontal lines along the bottom of the figure denote sequences from Brazil.
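Clade posterior probabilities and ancestral-location probabilities are both summaries over a posterior sample of trees: the first is the fraction of sampled trees that contain a clade, and the second is, among those trees, the fraction assigning each location to the clade's ancestral node. The sketch below shows that bookkeeping under a hypothetical, simplified representation (each tree reduced to a dict from clade to inferred location); it does not parse BEAST output, and the names are illustrative only.

```python
# Minimal sketch: clade and ancestral-location posterior probabilities
# summarised from a posterior sample of trees.
# ASSUMPTION (hypothetical representation, not BEAST's file format): each
# posterior tree is reduced to a dict mapping every clade, written as a
# frozenset of tip names, to the location inferred for its ancestral node.
from collections import Counter


def clade_support(posterior_trees, clade):
    """Return (clade_probability, {location: probability}).

    clade_probability is the fraction of posterior trees containing the clade;
    location probabilities are computed only over those trees.
    """
    locations = [tree[clade] for tree in posterior_trees if clade in tree]
    clade_prob = len(locations) / len(posterior_trees)
    counts = Counter(locations)
    loc_probs = {loc: n / len(locations) for loc, n in counts.most_common()}
    return clade_prob, loc_probs


if __name__ == "__main__":
    ab = frozenset({"A", "B"})
    # Invented toy sample of four posterior trees.
    trees = [
        {ab: "NE Brazil"},
        {ab: "NE Brazil"},
        {ab: "Rio de Janeiro"},
        {frozenset({"A", "C"}): "NE Brazil"},
    ]
    print(clade_support(trees, ab))
```

Conditioning location probabilities on the trees that contain the clade keeps the two numbers shown at each node (coloured and grey) separate: a clade can be weakly supported while its ancestral location, when present, is consistently inferred.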

The earliest inferred dates of lineage export to non-Brazilian regions, represented by box-and-whisker plots. Each plot corresponds to the earliest movement between a pair of locations with well-supported virus lineage migration. The first exports to South America outside Brazil (SA1 in ), to Central America (CA1) and to the Caribbean (CB1) are shown in panels a–c, respectively. Box-and-whisker plots were generated in ggplot2, with boxes representing the median and interquartile range of the estimated date of earliest movement. In each of a–c, dashed lines show the estimated climatic vector suitability score for each recipient region, averaged across the countries for which sequence data are available (see Methods). In each of a–c, the bar plots show available notified ZIKV case data (plots adapted from PAHO) for the countries with the earliest confirmed cases (Colombia in panel a, Mexico in b, and Puerto Rico in c). Coloured arrows indicate the earliest confirmation of autochthonous ZIKV cases in each non-Brazilian region. The vertical dashed line represents the date of ZIKV confirmation in the Americas.
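The box summaries can be sketched as follows: for each posterior tree, take the earliest time at which a Brazil-to-recipient transition occurs, then summarise those earliest times by their median and quartiles. The per-tree input format (a list of decimal-year transition times) and the function name are hypothetical assumptions. Quartiles use `method="inclusive"`, which is intended to match the default quantile definition used by R, and hence by ggplot2 box plots; treat that correspondence as an assumption worth checking.

```python
# Minimal sketch: median and interquartile range of the earliest export date.
# ASSUMPTION (hypothetical input format): one list per posterior tree holding
# the decimal-year times of Brazil -> recipient-region transitions in that
# tree; trees with no such transition contribute an empty list.
from statistics import quantiles


def earliest_export_summary(samples):
    """Summarise the per-tree earliest transition time across posterior trees."""
    earliest = [min(times) for times in samples if times]
    q1, q2, q3 = quantiles(earliest, n=4, method="inclusive")
    return {"median": q2, "q1": q1, "q3": q3, "n": len(earliest)}


if __name__ == "__main__":
    # Invented toy posterior sample, not estimates from this study.
    samples = [[2014.9, 2015.1], [2014.5], [2014.7, 2014.8], [2014.6],
               [2014.8, 2014.9], []]
    print(earliest_export_summary(samples))
```

Taking the minimum within each tree before summarising keeps the per-tree uncertainty about the first export intact; averaging all transition times instead would describe typical, not earliest, movements.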