ABSTRACT

Since the India and Indian Ocean outbreaks of 2005 and 2006, the global distribution of chikungunya virus (CHIKV) and the locations of epidemics have dramatically shifted. First, the Indian Ocean lineage (IOL) caused sustained epidemics in India and has radiated to many other countries. Second, the Asian lineage has caused frequent outbreaks in the Pacific islands and in 2013 was introduced into the Caribbean, followed by rapid spread to nearly all of the neotropics. Further, CHIKV epidemics, as well as exported cases, have been reported in central Africa after a long period of perceived silence. To understand these changes and to anticipate the future of the virus, the exact distribution, genetic diversity, transmission routes, and future epidemic potential of CHIKV require further assessment. To do so, we conducted the most comprehensive phylogenetic analysis to date, examined CHIKV evolution and transmission, and explored distinct genetic factors associated with the emergence of the East/Central/South African (ECSA) lineage, the IOL, and the Asian lineage. Our results reveal contrasting evolutionary patterns among the lineages, with growing genetic diversities observed in each, and suggest that CHIKV will continue to be a major public health threat with the potential for further emergence and spread.

IMPORTANCE Chikungunya fever is a reemerging infectious disease that is transmitted by Aedes mosquitoes and causes severe health and economic burdens in affected populations. Since the unprecedented Indian Ocean and Indian subcontinent outbreaks of 2005 and 2006, CHIKV has further expanded its geographic range, including to the Americas in 2013. Its evolution and transmission during and following these epidemics, as well as the recent evolution and spread of other lineages, require optimal assessment. Using newly obtained genome sequences, we provide a comprehensive update of the global distribution of CHIKV genetic diversity and analyze factors associated with recent outbreaks. These results provide a solid foundation for future evolutionary studies of CHIKV that can elucidate emergence mechanisms and also may help to predict future epidemics.

INTRODUCTION

Chikungunya virus (CHIKV) (family Togaviridae, genus Alphavirus) is a mosquito-borne virus that causes a fever-rash-arthralgia syndrome in humans, with debilitating joint pain lasting up to several years; the disease is referred to as chikungunya fever (CHIKF) (1). Along with dengue, Zika, and yellow fever viruses, CHIKV is transmitted in human populations by two broadly distributed mosquito vectors, Aedes aegypti and Aedes albopictus (2–5). Recently, CHIKV established endemic transmission and outbreaks in Africa, Asia, Europe, the Americas, and the Pacific archipelago (6). The rapid expansion of CHIKF, together with its severe morbidity and economic burden, make CHIKV one of the most important arthropod-borne viruses (arboviruses) and a major global health threat.

After suspected outbreaks in Asia and the Americas during the 18th and 19th centuries (7), CHIKV was first isolated during a 1952-1953 epidemic in present-day Tanzania (8, 9). Subsequently, numerous sporadic CHIKF outbreaks have been noted in Africa, with larger epidemics reported in Thailand in the late 1950s and early 1960s and in India from the early 1960s to the early 1970s (10). Whereas the Indian strains disappeared for unknown reasons, the Thailand strains have continued to circulate in Southeast Asia. Phylogenetic studies have revealed that these early CHIKV strains form three major, geographically distinct lineages: the West African enzootic, East/Central/South African (ECSA) enzootic, and Asian endemic/epidemic lineages (11). The enzootic lineages involve sylvatic transmission by arboreal Aedes mosquitoes among nonhuman primates with occasional spillover to humans, while the endemic/epidemic lineages are transmitted human to human by A. aegypti and, recently, A. albopictus mosquitoes in peridomestic settings (10, 12). Bayesian coalescent analysis has estimated that the Asian and ECSA lineages diverged about 100 years ago, although it is unclear exactly when the Asian lineage was introduced from Africa (13).

The recent spread of CHIKV, particularly two major reemergence events with explosive and extensive spread through long-range international transmission, has brought CHIKV to the forefront of global health attention (14). The first recent emergence involving the ECSA lineage was first recognized with an outbreak in coastal Kenya in 2004, followed by spread to Indian Ocean islands, such as Comoros and La Reunion, causing tens to hundreds of thousands of infections on each island (15). In 2006, this newly emerged epidemic Indian Ocean lineage (IOL) strain spread from East Africa to India, leading to over 1 million cases in the first year alone (16). From India, the IOL spread rapidly to initiate transmission in Southeast Asia, Italy, and France, with exported cases reaching most parts of the world (17, 18).

The second recent CHIKV reemergence involved the endemic Asian lineage. After several decades with few reports of outbreaks or cases in Asia, strains of this lineage caused a series of outbreaks in Malaysia (2006) and Pacific islands, such as New Caledonia (2011); Papua New Guinea (2012); Yap in the Federated States of Micronesia (2013); Tonga, Samoa, American Samoa, Tokelau, and French Polynesia (2014); and Kiribati and the Cook Islands (2015) (5). Then, an Asian lineage strain was detected in the Caribbean island of St. Martin in late 2013 (19), followed by a large-scale epidemic with spread to 26 islands and 14 mainland American countries within 1 year, resulting in more than 1 million suspected cases. As of August 2016, approximately 2 million cases were suspected in 45 countries and territories, according to the Pan American Health Organization. In addition, circulation of a newly imported ECSA strain was also reported for the first time in the Americas, in Brazil in 2014 (20). The virus was imported by a traveler from Angola, but its subsequent distribution in Brazil or elsewhere in the Americas has not been reported.

With these dramatic changes in the global distribution of CHIKV and the emergence of major epidemics, patterns of regional CHIKV circulation, including those of different lineages, paths of spread, evolution, and adaptation, as well as future epidemic risks, require comprehensive assessment. Critical questions include the following. (i) What is the genetic diversity of CHIKV in Africa? Where are the enzootic lineages presently distributed, and how likely is another exportation and global emergence of an enzootic lineage from Africa? (ii) Since the Indian Ocean and Asian outbreaks, where and by what paths has this IOL continued to spread? Furthermore, previous studies have identified convergent selection of a substitution in the E1 envelope glycoprotein (E1-A226V) in multiple clades of the IOL, followed by several E2 envelope glycoprotein substitutions that further adapted some IOL strains to a novel mosquito vector, A. albopictus, contributing to the scale and range of Asian epidemics (21). However, it is unclear what role IOL strains without these adaptive mutations played in the Indian subcontinent and subsequent Southeast Asian outbreaks and to what extent they continue to circulate. (iii) Despite numerous reports of CHIKF outbreaks in the Western Hemisphere associated with the Asian lineage (19, 20, 22–25), their exact transmission route, as well as their phylogenetic relationships with strains from the Pacific islands, have not been fully resolved. It is also unclear whether the emergence of the Asian lineage in the Western Hemisphere was purely a stochastic event or was facilitated by adaptive CHIKV evolution for more efficient urban transmission. Finally, examination of the interplay between Asian and IOL epidemic lineages in Southeast Asia could help us to predict the future distribution of CHIKV genetic diversity and the outcome when multiple lineages cocirculate and potentially compete sympatrically.

To address these questions and provide a timely update on global CHIKV activity, we conducted a comprehensive phylogenetic analysis examining the distinct genetic factors associated with the evolution and emergence of the ECSA and Asian lineages and the IOL. Our results elucidated dramatic movements and previously unsampled genetic diversity, indicating that CHIKV will continue to be a major reemerging global public health threat.

MATERIALS AND METHODS

Virus samples and Illumina sequencing.Twelve CHIKV strains sequenced in this study, listed in Table 1, were obtained from the World Reference Center for Emerging Viruses and Arboviruses (WRCEVA) at the University of Texas Medical Branch (UTMB), Galveston, TX. RNA was extracted using TRIzol (Invitrogen, Carlsbad, CA) according to the manufacturers' protocols. Illumina sequencing was performed at the J. Craig Venter Institute (JCVI) or the Genomic Core Facility at UTMB.

At JCVI, sequence-independent single-primer amplification (SISPA) was performed on purified RNA as previously described (26, 27). Barcoded SISPA products were pooled, and adaptors were ligated following the manufacturers' instructions (Illumina, San Diego, CA). Libraries were sequenced on the Illumina MiSeq (2- by 300-bp paired end [PE]). A combination of JCVI- and CLCbio-developed tools were used to process read data from all next-generation sequencing (NGS) runs (26, 28).

At UTMB, libraries were prepared using the TruSeq RNA sample preparation kit, and Illumina sequencing was performed under the recommended conditions (Illumina). Cluster formation of the library DNA templates was performed using the TruSeq PE Cluster kit v3 and the Illumina cBot workstation under the recommended conditions. Paired-end 50-base sequencing by synthesis was performed using the TruSeq SBS kit v3 on an Illumina HiSeq 1500, following protocols defined by the manufacturer. Sequence analysis and assembly were completed at CBio Inc. (Fremont, CA). Briefly, sequences were first subjected to a BLAST analysis to determine the numbers of viral reads and contaminating reads from hosts or other sources. Next, because virus stocks were prepared on Vero cells, the contaminating reads were filtered using the African green monkey (Chlorocebus sabaeus) genome as a template. The remaining Fastq files were further processed using Trimmomatic (72) to remove low-quality sequences. Finally, assembly was performed using iMetAMOS (http://www.cbcb.umd.edu/software/imetamos) following the standard parameters.

Sequencing of the 3′ UTR.To confirm the 3′ untranslated genome region (UTR) structure in the Caribbean/Latin American (referred to below as American) outbreak lineage strains, we sequenced the 3′ UTRs of 10 additional samples (Table 2) by the Sanger method. CHIKV RNAs used in this study were directly purified from human serum or from the supernatants of cell cultures after up to two passages in Vero cells. The 3′ UTR was amplified by the primer pair Ch21 + 10000 (5′-CACGTAACAGTGATCCCG-3′) and Cons-20mer (5′-TTTTTTTTGAAATATTAAAAACAAAATA-3′) using the Titan One Tube RT-PCR system (Roche, Mannheim, Germany). The amplicons were subsequently gel purified and sequenced using the BigDye Terminator v3.1 cycle-sequencing kit (Applied Biosystems, Foster City, CA) and an ABI Prism model 3100 genetic analyzer (Applied Biosystems). Then, the sequences were edited and assembled in Sequencher v4.9 (Gene Codes, Ann Arbor, MI) and deposited in the GenBank database (Table 2).

Phylogenetic analysis. (i) Data set construction.First, all available CHIKV complete genomic sequences (excluding vaccine and cloning vector strains) available as of February 2016 were downloaded from the GenBank library, aligned using MUSCLE (29), and manually adjusted in Se-Al (available at http://tree.bio.ed.ac.uk/software/seal/) according to amino acid sequence alignments to preserve codon homology. Due to the ambiguous alignments of the 3′ UTR and the rapid evolution of this genome region, only the open reading frames (ORFs) comprising an alignment of 11,097 nucleotides (nt) were used for phylogenetic tree reconstruction. Second, sequences potentially resulting from artifacts were examined and removed based on a maximum-likelihood (ML) guide tree reconstructed using an initial neighbor-joining tree and a nearest-neighbor interchange (NNI) branch-swapping method implemented in PAUP* (30), based on a GTR+I+Γ nucleotide substitution model selected by ModelTest (31, 32). These sequences included (i) India/MH4/2000 (nearly identical to Uganda/UgAg4155/1982, suggesting laboratory contamination), (ii) India/STMWG01/2011 (similar to Senegal/A301/1963, also suggesting contamination), and (iii) India/STMWG02/2011 (potentially an assembly error that probably caused in silico recombination between an IOL strain and Senegal/A301/1963). Finally, a few sequences with missing sample years were omitted from the analysis. This resulted in a final data set of 309 sequences. For BEAST analysis, which involves substitution rate estimations, the highly passaged Angola/M2022/1962 strain, which includes an abnormally long terminal branch, probably representing passage-adaptive or random mutations, was removed.

(ii) Phylogenetic reconstruction.A maximum-likelihood tree was first constructed for the complete data set using the methods described above. Due to the preponderant number of sequences in ECSA/IOL (180) and Asian (116) lineages over West African (13) lineages and the lack of new sequences from the last group since our previous 2010 analysis (13), we performed independent analyses of ECSA/IOL and Asian lineages to better understand and compare their evolutionary dynamics using the Bayesian Markov chain Monte Carlo (MCMC) method available in the BEAST v1.8.2 package (18). To exclude the effect of rate variation along branches, which could cause inaccurate estimation of divergence times, we further divided the ECSA lineage into ECSA/African (21 sequences; strains other than IOL, except the two Kenya 2004 strains) and IOL (161 sequences, with two Kenya 2004 strains) data sets and the Asian lineage into Asian/Old (17 sequences; strains before 1996) and Asian/Reemergence (99 sequences; clades I and II, after 2006) data sets. First, to determine the extent of the temporal structure of the sequences and consequently the reliability of our estimates of substitution rates and times to the most recent common ancestors (tMRCAs), we performed a regression analysis of tree root-to-tip genetic distances against sampling dates using the program Path-O-Gen (http://tree.bio.ed.ac.uk/software/pathogen/ [17]). This analysis was performed for each lineage based on the corresponding ML tree generated using the methods described above. A strong temporal structural (R2 > 0.73) was observed in each data set. We did not estimate the evolutionary rate for the American lineage due to its lack of temporal structure (R2 = 0.15), reflecting the short period of circulation in the Western Hemisphere.

All the CHIKV strains were dated according to the year and month (if known) of their collection; strains with only a collection year were assigned the median-year value of July. For each data set, BEAST analysis was performed based on either a relaxed molecular clock (uncorrelated lognormal) or a random local clock (33), with the substitution model recommended by jModelTest (34) (GTR+I+Γ for ECSA/IOL and GTR+Γ for the Asian lineage). Two demographic models, Bayesian Skyline and Bayesian Skyride, were employed under each clock model, and the results were compared. In each case, MCMC chains were run for a sufficient time to achieve convergence (accessed using the Tracer program [http://tree.bio.ed.ac.uk/software/tracer]). Statistical uncertainty in parameter estimates was reflected as 95% highest probability density (HPD) values. The maximum clade credibility (MCC) tree across all of the plausible trees generated by BEAST was then computed using the TreeAnnotator program available in the BEAST package, with the first 10% of trees removed as burn in. To illustrate the geographical locations of sample strains and potential transmission relationships, we colored the external branches according to the country/region of origin for each CHIKV strain, with the colors of the internal branches estimated according to the parsimony principle.

RESULTS

Evolutionary rates of different CHIKV lineages.To compare the evolutionary patterns of ECSA and Asian lineages, including their different evolutionary periods, and to accurately estimate the tMRCA and divergence time along their evolution history, we divided ECSA into African and IOL data sets, as illustrated in Fig. 1A. The Asian lineage was divided into old (before 1996) and reemerging (after 2006) data sets (Fig. 1A). Phylogenetic estimations, including the MCC tree, substitution rates, and tMRCAs, were performed based on the complete and subdivided data sets for each lineage, with the results summarized in Table 3. Overall, uncorrelated relaxed clock (lognormal) and random local clock models estimated similar evolutionary rates in the range of 1.80 × 10−4 to 8.46 × 10−4 substitutions/site/year. Significant rate variation was observed between African (1.80 × 10−4 to 2.77 × 10−4 substitutions/site/year) and IOL (5.81 × 10−4 to 8.46 × 10−4 substitutions/site/year) subsets in the ECSA lineage, with the latter ∼3 times higher than the former. In contrast, the differences in evolutionary rates between the old (1958 to 1995) and reemerged (2006 to 2015) subsets in the Asian lineage were smaller: 2.95 × 10−4 to 5.46 × 10−4 substitutions/site/year for old and 3.87 × 10−4 to 7.38 × 10−4 substitutions/site/year for reemerged, although the latter had a larger 95% HPD value than the former.

Evolution of the CHIKV Asian lineage. (A) Maximum-likelihood tree based on concatenated ORFs. The branch lengths reflect genetic distances. Bootstrap values are labeled for major lineages and clades. (B) MCC Bayesian tree of the Asian CHIKV lineage based on random local clocks and the Skyline population model using all Asian lineage strains. Branch lengths are scaled to the sampling and divergence times, and the branches are color coded for sample location. Thick gray horizontal node bars representing 95% HPD values of the node height are shown only for those with a posterior probability of ≥90.

Comparison of nucleotide substitution rates and tMRCAs among different CHIKV lineages based on different data sets and demographic models

Correspondingly, the large rate variation between the African and IOL subsets in the ECSA lineage made estimation of the tMRCA and divergence time less reliable using the total data set under the uncorrelated relaxed clock (lognormal) model, where evolutionary rates along each branch are assumed to be clustered around a mean (35). The most recent common ancestor of the ECSA lineage was dated to 1932 to 1951 or 1948 to 1953 based on the total data set according to the Bayesian Skyline or Skyride demographic model, respectively. However, it was dated to 1901 to 1932 or 1904 to 1932 based on the African subset only. This result not only was ∼30 years earlier than that estimated with the total data set, but also showed no significant difference between the two demographic models. The result indicates that the sylvatic ECSA lineage in Africa and the epidemic IOL have significantly different evolutionary patterns, as discussed in our previous study (13). In contrast, the tMRCAs of the American lineage estimated by the reemerged (2012.81 to 2013.74, where numbers after the decimal point represent a fraction of the year; e.g., 81 is in October) and total (2011.80 to 2013.72) Asian data sets were similar. However, the discrepancy in time scale estimation in the ECSA lineage between different data sets was resolved by the random local clock model, which considers rate variation among lineages. Specifically, it provided not only nearly identical estimates of the tMRCA of the IOL based on the total (2003.55 to 2004.38) and IOL (2003.31 to 2004.47) data sets, but also similar estimates of that of the ECSA lineage based on the total (1900 to 1924) and African (1901 to 1932) data sets. However, due to the complexity of the model, convergence was not achieved when the Skyride demographic model was implemented for the total ECSA data set.

Surprisingly, a lower substitution rate was observed in the reemerging Asian lineage (3.87 × 10−4 to 7.38 × 10−4 substitutions/site/year) than in the IOL (5.81 × 10−4 to 8.46 × 10−4 substitutions/site/year) despite the fact that both caused urban epidemics. The different evolution rates of the IOL versus the American lineage may reflect their different evolutionary mechanisms (such as adaptive evolution versus population bottlenecks) and/or transmission patterns, as well as amounts of replication; these hypotheses deserve further study.

Evolution of the Asian CHIKV lineage.Overall, the Asian CHIKV lineage in the Eastern Hemisphere revealed a temporal/spatial pattern of evolution with infrequent international transmission. Based on our phylogenetic tree (Fig. 1B), after the initial introduction during or before the 1950s, the lineage circulated in South and Southeast Asia as geographically segregated sublineages. The Indian sublineage has not been sampled since 1973, presumably reflecting its extinction. Although the Southeast Asian sublineage has not been sampled in Thailand since 1995, it was introduced to Indonesia in the early 1980s and was presumably circulating endemically in the region with occasionally reported cases, such as those in Malaysia in 1998 (36). Since 2006, two deeply divergent clades of the Asian lineage have circulated in Southeast Asia and the Pacific islands, both originating from a 1980s Indonesia strain or a close relative. Clade I was further subdivided into two subclades: one contained strains from 2006-2007 Malaysian outbreaks, and another was sampled from New Caledonia in 2011 and Indonesia in 2013. Clade II contains 2006 Indonesian, 2012 China, 2013 Philippines, 2013 Micronesia, and 2014 American Samoa CHIKV strains. These data suggest that transmission of the Asian lineage in the Eastern Hemisphere has shifted to Indonesia, the South Pacific islands, and probably the Philippines. However, other regions of endemicity may also remain without sampling.

The American CHIKV lineage apparently originated from Asian clade II, with the MRCA estimated between November 2011 and October 2013 (Table 3). Regardless, the ancestral strain of the American outbreaks diverged from those South Pacific strains a few years before the 2013 Caribbean introduction without detection.

Notably, all Asian lineage CHIKV strains harbor the E1-A98T substitution, which constrains the penetrance of the E1-A226V substitution that mediates adaptation to the novel mosquito vector A. albopictus in some IOL strains (37). This finding, coupled with a lack in any American CHIKV strain of E1 and E2 A. albopictus-adaptive mutations seen in IOL strains (21), as well as a lack of field data implicating A. albopictus in transmission in the Americas, corroborates the prediction (38) that this epistatic constraint on the American lineage will restrict it mainly to A. aegypti transmission.

CHIKV transmission in the Americas.Due to the rapid spread and explosive transmission of the American CHIKF outbreaks, accompanied by insufficient informative (synapomorphic) mutations, the evolutionary history within the American lineage was not fully resolved. This was reflected by the low posterior support (<90) of most clades and clusters, indicated by the lack of node bars in Fig. 1B. Nevertheless, multiple American clades were observed, with strains from Martinique and Guadeloupe appearing in the basal position, suggesting an initial Caribbean source of the outbreak. It is unclear, however, whether this basal position relative to St. Martin is correct due to the paucity of strains from the latter location, where CHIKV was first reported in the Americas. From there, American lineage strains spread to other Caribbean islands and then to Central, North, and South America. This multiclade structure indicated that initial transmission was multidirectional. Each of these American clades included strains from multiple countries or territories, and several countries had cocirculating strains from different clades, such as Mexico, Trinidad, Brazil, and Colombia.

In summary, CHIKV strains in the Western Hemisphere did not form geographically isolated clades but rather represented an intertwined transmission network among different countries and regions. Surprisingly, a strain from French Polynesia collected in January 2015 also fell within the American lineage, with a Virgin Islands October 2014 strain as its closest neighbor. This indicates that CHIKV transmission between the South Pacific and Caribbean islands may occur frequently and bidirectionally, suggesting that the initial Caribbean outbreak strain could have originated from a South Pacific island.

Unique genetic structures of the American CHIKV lineage.Two distinct genetic characters were observed in the American lineage. As previously reported, a deletion in nsP3 codons 379 to 382 was observed in all American lineage strains, along with other Asian clade II strains (Fig. 1B). Surprisingly, a larger, overlapping deletion in nsP3 codons 376 to 382 was observed in another clade currently circulating in Indonesia and New Caledonia, as well as in some 2006-2007 Malaysian strains. It is not clear if these two deletions share an origin (with the shorter 379-to-382 deletion presumably preceding a subsequent 376-to-378 deletion) and why only some of the Malaysian 2006-2007 strains have the deletion. Based on the parsimony principle, recombination between a Malaysian CHIKV strain and the nsP3 376-to-382 deletion and another cocirculating strain without the deletion could have generated the Malaysian 2006-2007 strains lacking the deletion.

In addition to the nsP3 deletion, an insertion of 177 nt (39), presumably resulting from duplication of the immediately adjacent region, was observed in the 3′ UTRs of all American CHIKV strains (Fig. 1B). Due to the assembly of Illumina reads using a non-Caribbean reference strain, this insertion was overlooked in the early Caribbean sequences deposited in GenBank. Independent reports of an elongated 3′ UTR in a few Brazilian and Caribbean strains prompted us to confirm by Sanger sequencing that all American 3′ UTRs have the same insertion. As indicated in Fig. 1B, these strains were distributed in all American lineage clades, as well as at the basal position. This finding, together with the absence of the insertion in Asian lineage strains from the Eastern Hemisphere, indicates that the insertion most likely occurred as a founder effect during the introduction from the Eastern Hemisphere, presumably by an infected traveler or mosquito.

ECSA lineage in Africa and new exportations.The ECSA lineage was named based on its initial detection mainly in East, Central, and South Africa. Despite the recent 2004 epidemic in East Africa (coastal Kenya), giving rise to the Indian Ocean and Indian subcontinent epidemics, little information is known about (i) where ECSA strains are currently circulating enzootically and/or endemically; (ii) whether the Kenya 2004 strains spread to other African regions; and (iii) if the IOL has spread in Africa and potentially replaced the historical ECSA strains there, given its dramatic spread in Asia, Oceania, and Europe. The availability of newly derived sequences of ECSA origin allowed us to address these questions.

Based on our phylogenetic tree (Fig. 2A), ECSA strains have not been collected from South Africa and analyzed since 1976. Our results also provide no evidence for further spread of the Kenya 2004 strain in Africa or of the reintroduction of IOL strains into Africa. Rather, newly identified ECSA strains/sublineages were observed in Central Africa (Cameroon, Gabon, and Congo), China, and Brazil. All of these strains originated from a Central African clade, and the lack of close relatives of the strains indicates cryptic endemic/enzootic transmission in the region. Strikingly, instead of forming a single clade, these newly sampled strains were all deeply rooted in the 1978-1982 Central African clade as three independent branches. This finding, together with the deep divergence between the Central African clade and the IOL-ancestral Kenya 2004 strain, suggests that East and Central Africa harbor rich, largely unsampled CHIKV genetic diversity, with deeply divergent strains cocirculating in these regions. The repeated exportation of ECSA strains to Asia (13), Indian Ocean islands (15), Europe (40–42), and South America (20), as well as ever-increasing intercontinental transportation, suggests that the ECSA lineage will seed additional epidemics in the near future. Importantly, the Cameroon (2006), Gabon (2007), and Congo (2011) strains derived from the Central African clade of the ECSA lineage all contain the E1-A226V A. albopictus-adaptive substitution, suggesting a high risk of urban transmission in locations where this vector is more abundant than A. aegypti, such as southern Europe (43).

Evolution of the CHIKV ECSA lineage. (A) MCC tree of the CHIKV ECSA lineage in Africa, excluding the IOL and including all African and a few exported strains, other than the IOL, based on random local clocks and the Skyline population model. (B) MCC tree of the CHIKV IOL based on random local clocks and the Skyline population model. The branch lengths are scaled to sampling and divergence times, and the branches are color coded for sample location according to the legend. The gray horizontal node bars representing 95% HPD values of the node height are shown only for those with posterior possibilities of ≥90.

Evolution and expansion of the IOL.Since the unprecedented 2005-2006 epidemics in India and the Indian Ocean islands, IOL strains have rapidly spread to Southeast Asia and have been exported to many other countries in Asia, Europe, the Middle East, Oceania, and the Americas. The current circulation patterns in these regions require further investigation. Consistent with our previous study, the IOL consists of two independent sublineages: the Indian Ocean islands sublineage and the India sublineage that emerged independently from coastal Kenya and later spread to Southeast Asia, Italy, and France (Fig. 2B). In the Indian Ocean islands, after the initial major outbreak in 2005 and 2006, there were no further reports of CHIKF outbreaks until the CHIKV reemergence in La Reunion Island in 2010 (no genome sequence is available) (44). However, the case exported to Germany in 2007 indicates that a low level of endemic transmission persisted. On the other hand, the IOL Indian sublineage continued endemic/epidemic circulation in India and radiated to cause additional outbreaks in surrounding countries, such as Sri Lanka, Bangladesh, Yemen, and China. Importantly, IOL strains have established persistent transmission in Southeast Asia. During 2008 and 2009, despite at least three exportations from India to Southeast Asian countries, such as Singapore, Thailand, and Malaysia (Fig. 2B), the major IOL sublineage that caused simultaneous outbreaks in Malaysia, Thailand, and Singapore appears to have established persistent transmission in Thailand, as suggested by inclusion of the 2013 Thailand strains in the same clade. In addition, this sublineage also spread to Myanmar, China, and Cambodia. Clearly, Southeast Asia remains an endemic reservoir for potential global IOL introductions.

Previous studies demonstrated convergent selection of E1-A226V in both the Indian Ocean islands and four independent clades in India that were identified during subsequent outbreaks. We examined the progress and distribution of this substitution. As shown in Fig. 2B, India contained the most CHIKV genetic diversity, with both E1-226A and E1-226V variants actively circulating, and both have been repeatedly exported to surrounding countries. Similarly, the convergent E1-A226V substitution occurred in two independent clades after introduction of the IOL to Sri Lanka. However, due to the lack of recent sequences, the genetic diversity of CHIKV in South Asia is unclear, i.e., whether either or both clades and/or the E1-226 variants coexist. In contrast, in Southeast Asia, the IOL strains mainly comprise a single monophyletic clade that acquired E1-226V before its introduction from South Asia.

DISCUSSION

The complex phylogenetic structure of the ECSA lineage indicates deep genetic diversity and further emergence risk.The evolutionary patterns and emergence potential of arboviruses are directly related to their transmission cycles. In Africa, CHIKV circulates in a sylvatic cycle among nonhuman primates, mediated by forest-dwelling mosquitoes, such as Aedes furcifer, Aedes africanus, and Aedes luteocephalus (45). Human infections are considered mainly incidental spillovers, although endemic circulation may occur for extended periods. Outside Africa, CHIKV is transmitted among human populations via two anthropophilic mosquitoes, A. aegypti and A. albopictus, in urban transmission cycles. Not surprisingly, distinct phylogenetic patterns were observed between the sylvatic/enzootic ECSA lineage and the epidemic IOL and American lineage. On one hand, the shallow bush-like trees of the IOL and the American lineage result from founder effects, followed by radiation due to rapid and explosive expansion into a large geographic area, with intertwined branches reflecting frequent genetic exchange among different regions, presumably by travelers. In contrast, the phylogenetic structure of the ECSA strains in Africa suggests deep, longstanding genetic diversity with long regionally defined branches that diverged about 40 to 50 years ago, reflecting limited genetic exchange among geographic regions of Africa.

The paucity of reported endemic/enzootic CHIKV transmission in Africa despite cases exported from distinct origins suggests additional, undetected genetic diversity. For example, despite the close genetic relationship of CHIKV strains that caused autochthonous 2014 transmission in Brazil to an Angola 1962 strain, no outbreaks have been reported from Angola or surrounding regions since 1973. Similarly, the origin of the exported 2010 China ECSA strain remains unknown because closely related strains have not been identified in Africa. Furthermore, the lack of CHIKV strains closely related to those that caused the Kenya 2004 outbreak, which seeded the Indian Ocean and Indian outbreaks, underscores unsampled CHIKV circulation in East Africa.

Nevertheless, the ability of ECSA CHIKV strains to seed outbreaks overseas, as well as occasionally recognized local outbreaks in African cities, such as in Cameroon, Gabon, and Congo, where A. aegypti and A. albopictus were identified as vectors, indicates that ECSA strains can readily initiate urban transmission. Indeed, the emergence of the E1-A226V mutation in the Cameroon/Gabon/Congo monophyletic clade, as well as the identification of A. albopictus as a vector in some African outbreaks (46–48), suggests that this mutation is probably selected by the same mechanism that occurred in the IOL. The frequent detection of outbreaks in Central African cities over a relatively short term (2006 to 2011) also suggests that this ECSA clade has established continuous urban transmission in Africa and may now be evolving independently of enzootic transmission. Not surprisingly, this ECSA clade seeded autochthonous A. albopictus-borne transmission in Montpellier, France, from September to October 2014, introduced by a traveler returning from Cameroon (40). The establishment of an urban CHIKV lineage in Africa, particularly one that can be transmitted by both A. aegypti and A. albopictus, poses a great risk for further global exportation and emergence.

Determinants of the emergence of the Asian CHIKV lineage in the Caribbean.Perhaps the most intriguing question regarding the CHIKV outbreak in the Americas is why the Asian lineage eventually invaded the Americas rather than the IOL, which was introduced years earlier into the Americas by thousands of infected travelers. Did genetic, ecological/geographical, or stochastic factors play more important roles in this establishment?

Genetically, the American lineage CHIKV strains evolved from an ancestor with two unique characteristics. The 3′ UTR insertion, which represents an approximate duplication of the previously defined DR(1 + 2) region, has been shown to improve CHIKV fitness for replication in mosquito cells without a significant effect in Vero cells (39). Previous in vivo studies with A. aegypti mosquitoes and CD-1 infant mice suggested a similar fitness effect of DR(1 + 2) duplication in mosquitoes but a slightly deleterious phenotype in mammals (49). Therefore, to some extent, increased fitness in A. aegypti mosquitoes, the only vector directly implicated in CHIKV transmission in the Americas (50), mediated by this duplication may have facilitated transmission and spread from the Caribbean. The selection of longer CHIKV 3′ UTRs has been shown to occur mainly during replication in the mosquito host (49), suggesting that this duplication probably arose in the vector. However, the putative fitness advantage was apparently not strong enough for establishment in South Pacific islands, as suggested by the predominance of Asian lineage strains with the shorter 3′ UTR there. Therefore, the duplication may have arisen in Oceania but been fixed in the Americas due to a founder effect after introduction either by a traveler or a mosquito. On the other hand, it is unclear whether the Asian lineage nsP3 deletion confers any fitness advantage in any host or vector. The alphaviral nsP3 protein contains two distinct domains: the highly conserved N-terminal macrodomain and the hypervariable domain (HVD), which is devoid of any predicted secondary structure (51). The nsP3 deletions in the Asian lineage fall into the hypervariable domain, which is highly plastic during alphavirus evolution. Due to its rapid evolution, the HVD is hypothesized to be involved in the optimization of replication in diverse host cell types (51). Chimeric viruses containing CHIKV and o'nyong-nyong virus (ONNV) genes showed that, in the CHIKV backbone, the ONNV nsP3 gene confers the ability to infect Anopheles mosquitoes, natural vectors of this close relative of CHIKV (52). Another study suggested that deletion of the HVD of Sindbis virus causes it to be defective for mosquito cell infection (53). Attenuated virulence and reduced replication rates of Semliki Forest virus mutants lacking some portion of the nsP3 HVD were also observed in vertebrate cells (54). However, a 34-amino-acid deletion in the nsP3 region in Venezuelan equine encephalitis virus had no detectable effect on replication in vertebrate cells. In summary, the effect of nsP3 appears to be host and virus specific, so the potential fitness effect of the nsP3 deletion in American CHIKV strains requires further study.

An important yet understudied observation is the close phylogenetic relationship between CHIKV strains from the Pacific and those in the Caribbean islands. The Asian lineage strains most closely related to those from the Caribbean are from South Pacific islands and the Philippines. In addition, a 2015 strain from French Polynesia in the South Pacific was apparently introduced back from the Caribbean, indicating frequent transmission between the two regions. Interestingly, the spread of Zika virus seems have taken the same route: from Pacific islands to the Americas coincident with the introduction of CHIKV to the Caribbean (55).

Despite the frequent international transmission of infectious agents, adjacent regions that share similar ecological systems and vector populations still provide the best chance for the spread of arboviruses. It is the frequent genetic exchange, combined with suitable ecological conditions for efficient transmission, that promotes the possibility of an outbreak after CHIKV introduction. Given that the IOL is currently circulating in Southeast Asia, including Thailand, Cambodia, and Malaysia, with few reports in Indonesia and the Philippines (see below), better surveillance to monitor the progress of IOL in Southeast Asia may help to predict its potential for future expansion, especially for temperate regions inhabited by A. albopictus but not A. aegypti.

Cocirculation of different lineages and potential outcomes.Due to the recent CHIKV geographic expansions, lineages or clades with different virulences or vector preferences have come to cocirculate. First, although the E1-A226V mutation has been selected in different IOL strains multiple times and has been rapidly distributed to many countries in South and Southeast Asia, in certain areas of India, such as Andhra Pradesh State, only strains with E1-226A were observed (56, 57). The predominance of E1-226A versus E1-226V in different states parallels the abundance of A. aegypti and A. albopictus (58), indicating that vector preference plays an important role in IOL distribution.

The cocirculation of the IOL and the Asian lineage in Southeast Asia appears to be more complicated. CHIKF outbreaks in Southeast Asia have been markedly episodic, with long interepidemic intervals. This can be at least partially attributed to durable immunity, as suggested by serosurveys in Thailand (59), the Philippines (60, 61), and Cambodia (62). However, the recent introduction into the region of the IOL has resulted in large-scale epidemics in both Malaysia and Thailand in 2008 and 2009. In Thailand, the Asian CHIKV lineage caused significant outbreaks in the past but has not been reported since 1995. The 2008-2009 Thailand outbreak, caused by IOL strains, led to 46,000 reported cases in 43 out of 75 provinces. In Malaysia, the Asian lineage caused small, regional CHIKF outbreaks in 1998 and in 2006 and 2007 (63). In November 2006, a small outbreak was reportedly caused by an IOL strain imported by a traveler from India (64) (no CHIKV genome sequence is available). However, two independent IOL clades (Fig. 2B) caused national-scale outbreaks in 2008 and 2009 (65, 66). Experimental studies have demonstrated that Asian and IOL strains generate cross-protective immunity (66); therefore, the rapid spread of the IOL in Thailand and Malaysia could be attributed to its higher replication rate than the Asian lineage, as suggested by in vitro data (49). However, in Indonesia and the Philippines, despite the detection of IOL strains in 2011 (67) and 2013 (68), large-scale outbreaks during this period were still dominated by the Asian lineage. The factors that caused the different epidemiological patterns of the 2 epidemic CHIKV lineages among Southeast Asian countries require further analysis.

Similar cocirculation of ECSA and Asian lineage CHIKV strains has occurred in Brazil since 2014, when both American and ECSA lineage strains arrived to initiate autochthonous transmission in Oiapoque, state of Amapá (northeast), and Feira de Santana, state of Bahia (southeast), respectively. Modeling studies (20) suggest the potential for these lineages to overlap in the same geographical region. Recent sequences show that the ECSA lineage continued to circulate in the state of Bahia as of July 2015 and spread south to the state of Pernambuco in March 2016. On the other hand, an American strain caused the first autochthonous transmission in the state of Rio de Janeiro in December 2015 (69). Our phylogenetic trees (Fig. 1B) suggest that the Rio strain is closely related to a Brazilian strain from the state of Pernambuco, although it is unclear if it is a direct descendant. The continued spread and interactions of these strains demand additional monitoring. Of potential importance is the presence in the ECSA Brazilian strain of the E1-211I residue compared to E1-211T found in IOL strains. This mutation has been shown to constrain the ability of CHIKV to infect A. albopictus in the presence of E1-226V (70).

Finally, CHIKV has recently been exported to many other countries and caused local transmission in some. For example, the IOL caused outbreaks in Italy in 2007 (41) and France in 2010 (42). Furthermore, ECSA, Asian, and IOL strains have been introduced into China, with the IOL introduced multiple times from different origins (Fig. 1 and 2). Although CHIKV is not known to have established endemic transmission in any of these countries, suitable vector populations and climates suggest the potential for repeated introductions and epidemics.

In summary, the dramatic global expansion of CHIKV since 2004, together with the continuing spread of the mosquito vectors A. aegypti and A. albopictus, such as in Russia (71), indicates that CHIKV will continue to expand its global distribution and become an increasing threat to public health.

ACKNOWLEDGMENTS

We thank Philippe Lemey and Edward Holmes for their advice and Mathilde Guerbois-Galla for technical help.

This project was funded in part by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under awards AI120942, AI121452, and U19AI110819. Some of the genomes were sequenced with funding provided by DHS S&T through contract no. HSHQDC-13-C-B0009 (Capturing Global Biodiversity of Pathogens by Whole Genome Sequencing).

The content is solely our responsibility and does not necessarily represent the official views of the National Institutes of Health.

. 2015. Molecular characterisation of chikungunya virus infections in Trinidad and comparison of clinical and laboratory features with dengue and other acute febrile cases. PLoS Negl Trop Dis9:e0004199.doi:10.1371/journal.pntd.0004199.

. 1994. Deletion and duplication mutations in the C-terminal nonconserved region of Sindbis virus nsP3: effects on phosphorylation and on virus replication in vertebrate and invertebrate cells. Virology202:224–232.doi:10.1006/viro.1994.1338.

. 2015. High rate of subclinical chikungunya virus infection and association of neutralizing antibody with protection in a prospective cohort in the Philippines. PLoS Negl Trop Dis9:e0003764.doi:10.1371/journal.pntd.0003764.