21Department of Veterinary Medicine, University of Cambridge, Cambridge, UK

Correspondence and requests for materials should be addressed to C.A.R. (Email: ku.ca.mac@44rac)

Author Contributions C.A.R. and T.B. conceived the research. C.A.R. and T.B. drafted the manuscript with substantial support from P.L. and S.R.. I.G.B., S.B., M.C., N.J.C., R.S.D., C.P.G., A.C.H., A.K., A.K., X.L., J.W.M., T.O., V.P., Y.S., M.T., D.W., and X.X. coordinated and produced the influenza surveillance data. T.B. performed the modeling and data analyses along with C.A.R., S.R., P.L., M.A.S. and A.R.. T.B. created the figures. All authors discussed the results and contributed to the revision of the final manuscript.

Abstract

Understanding the spatio-temporal patterns of emergence and circulation of new human seasonal influenza virus variants is a key scientific and public health challenge. The global circulation patterns of influenza A/H3N2 viruses are well-characterized1-7 but the patterns of A/H1N1 and B viruses have remained largely unexplored. Here, based on analyses of 9,604 hemagglutinin sequences of human seasonal influenza viruses from 2000–2012, we show that the global circulation patterns of A/H1N1 (up to 2009), B/Victoria, and B/Yamagata viruses differ substantially from those of A/H3N2 viruses. While genetic variants of A/H3N2 viruses did not persist locally between epidemics and were reseeded from East and Southeast (E-SE) Asia, genetic variants of A/H1N1 and B viruses persisted across multiple seasons and exhibited complex global dynamics with E-SE Asia playing a limited role in disseminating new variants. The less frequent global movement of influenza A/H1N1 and B viruses coincided with slower rates of antigenic evolution, lower ages of infection, and smaller less frequent epidemics compared to A/H3N2 viruses. Detailed epidemic models support differences in age of infection, combined with the less frequent travel of children, as likely drivers of the differences in the patterns of global circulation, suggesting a complex interaction between virus evolution, epidemiology and human behavior.

Owing to the frequency and severity of human seasonal influenza A H3N2 virus epidemics, recent work has focused on the global circulation dynamics of H3N2 viruses1-7. Studies have shown that, each year, H3N2 epidemics worldwide result from the introduction of new genetic variants from East and Southeast (E-SE) Asia, where viruses circulate via a network of temporally overlapping epidemics1,2,4,5, rather than local persistence1,3,6,7. In addition to H3N2, H1N1 viruses and two antigenically diverged lineages of influenza B viruses, B/Victoria/2/1987-like (Vic) and B/Yamagata/16/1988-like (Yam), circulate among humans with lower but substantial disease burdens8,9. Despite their importance, the global circulation dynamics of former seasonal H1N1 viruses (preceeding the 2009 pandemic) and B viruses have been largely neglected.

Given that influenza A and B viruses cause similar symptoms and evolve by similar mechanisms of immune escape, we hypothesized that each would follow similar patterns of global circulation, with new genetic variants originating in E-SE Asia that rapidly replace existing genetic variants. To test this hypothesis we compared the global circulation patterns of the hemagglutinin (HA) genes of H3N2, former seasonal H1N1, Vic, and Yam viruses. We assembled datasets of HA sequences with complete HA1 domains for each subtype from the World Health Organization Global Influenza Surveillance and Response System and the Influenza Research Database10 covering 2000–2012. To reduce the impact of surveillance biases, we subsampled these data to more equitable spatiotemporal distributions, resulting in datasets comprising 4006 H3N2, 2144 H1N1, 1999 Vic, and 1455 Yam HA sequences (Extended Data Fig. 1). Though deficient in viruses from Africa and Eastern Europe, these are the most geographically and temporally comprehensive seasonal influenza virus datasets assembled to date.

By estimating temporally-resolved phylogenetic trees for each subtype, we revealed faster rates of nucleotide mutation and amino acid substitution in H3N2 and H1N1 than in the B viruses (consistent with previous work11,12), but more genealogical diversity in B viruses than A viruses (Extended Data Table 1). This inverse relationship between evolutionary rate and genealogical diversity is expected if increased mutation rate correlates with antigenic drift13 and drives increased adaptive evolution, thus purging HA genetic diversity14. By inferring geographic ancestry using Bayesian phylogeographic methods15, we found a consistent pattern for H3N2 viruses (Fig. 1a) in which viruses worldwide rapidly coalesce to the trunk of the tree (average time to trunk = 1.42 years), with trunk viruses mostly originating from E-SE Asia (Extended Data Fig. 2a). This finding is consistent with previously reported patterns1,2,4,5, with E-SE Asia acting as the source population for epidemics worldwide.

In addition to China and Southeast Asia, India frequently contributed viruses to the trunk of the tree suggesting that the global circulation of H3N2 viruses is maintained by an E-SE Asian network that includes India. India’s role in the global dissemination of H3N2 viruses may have been similar historically, but India-wide influenza surveillance only began in 2004. There were brief periods, notably the 2007–2008 Northern Hemisphere winter, when regions outside E-SE Asia contributed to the trunk of the H3N2 tree. However, these instances were rare and trunk viruses from outside E-SE Asia descended directly from viruses within E-SE Asia (Fig. 1a). Quantifying the average ancestry of strains from each geographic region in the 3 years prior to sampling showed prominent roles for China, India, and Southeast Asia in seeding epidemics in all regions (Extended Data Fig. 3).

Surprisingly, the global circulation patterns of former seasonal H1N1 viruses differed substantially from those observed for H3N2 viruses (Fig. 1). Like H3N2, most lineages of H1N1 viruses eventually coalesced with viruses from E-SE Asia and India. However, this coalescence was slower than for H3N2 viruses with prolonged co-circulation of geographically segregated H1N1 lineages (Fig. 1b, Extended Data Figs. 3 and ​and4).4). Geographic segregation of H1N1 viruses was particularly pronounced beginning in 2004/2005, with the emergence of three co-circulating genetic lineages (Fig. 1b, nodes 1-3) that each independently acquired HA mutations leading to antigenic evolution from the A/New Caledonia/20/1999-like phenotype to the A/Solomon Islands/3/2006-like phenotype. These lineages circulated in Southeast Asia (node 1), China (node 2) and India (node 3), with the Indian lineage eventually spreading worldwide prior to the emergence of H1N1pdm09 viruses.

Phylogeographic analyses of B Vic and Yam viruses revealed further differences from H3N2 viruses with lineages frequently circulating outside of E-SE Asia for several years without evidence of seeding from E-SE Asia (Fig. 1c,d). Prominent examples include the seeding of the North American 2006/2007 Vic season directly from 2005/2006 North American viruses and the seeding of the North American 2001/2002 Yam season directly from 2000/2001 North American viruses (Extended Data Fig. 4). Similarly, lineages of viruses within E-SE Asia commonly circulated exclusively in E-SE Asia for more than 1 year. These long circulating E-SE Asian lineages were most apparent for Vic viruses where two lineages (Fig. 1c, nodes 1 and 2) persisted independently in China and SE Asia for over 5 years without spreading to other regions and led to the co-circulation of three distinct Vic antigenic variants in different parts of the world during 2007/2008 (Extended Data Fig. 5a).

Patterns of persistence of genetic variants differed by subtype and region, with H3N2 viruses persisting regionally for an average of ~6 months, H1N1 for ~9 months, Vic for ~13 months and Yam for ~12 months. H3N2 viruses showed comparably short durations of persistence across the world (Fig. 1), with the exceptions of India and China. Patterns within China were characterized by North and South lineages contributing jointly to persistence as combining North and South phylogeny nodes resulted in substantially greater persistence estimates than from North or South lineages alone (Fig. 1). For H3N2, evidence for joint contributions to persistence by region pairs that exclude China is comparatively weak (Extended Data Fig. 6a, Supplementary Information). For Vic and Yam, the mean duration of persistence was longer than for H3N2 or H1N1 in most regions, particularly in India and China where mean durations were >2 years (Fig. 1, Extended Data Fig. 4). Duration of regional persistence correlated with the proportion of virus originating from that region (Extended Data Fig. 6b) and observed phylogeographic patterns were robust to subsampling assumptions (Supplementary Information, Extended Data Table 2).

To investigate differences in the global migration patterns of H3N2, H1N1 and B viruses, we used the spatiotemporally-resolved phylogenies to estimate the amounts of virus movement between regions (Fig. 2). Rates of movement between pairs of regions were highly correlated between viruses with Spearman correlation coefficients ranging from 0.65 (H3N2 vs Yam) to 0.75 (H3N2 vs H1N1), suggesting similar global connectivity networks for all viruses. However, while the overall structure of the migration network was similar, H3N2 viruses moved between regions more frequently than H1N1 and B viruses (migration events per lineage per year H3N2 = 1.96, H1N1 = 1.27, Vic = 0.93, Yam = 0.97, Extended Data Table 1).

We hypothesized a relationship between rates of global movement and rates of antigenic drift: though rates of genetic evolution were similar for H3N2 and H1N1 viruses, both H1N1 and B viruses evolved antigenically more slowly than H3N2 viruses13 (Extended Data Table 1). We also hypothesized that lower rates of immune escape for B and H1N1 compared with H3N2 would lead to: younger average ages of infection as children increasingly comprise the largest pool of susceptible individuals; and smaller, less frequent epidemics owing to smaller populations of susceptible individuals13. These differences are consistent with results from several community-based cohort studies that found that children were more frequently infected with B viruses than were adults8,16,17. Age of infection data covering 2002–2011 from Australia show that H1N1 and B viruses infect younger individuals than H3N2 viruses (Extended Data Fig. 5b-d, median age of infection H3N2 = 30y, H1N1 = 20y, B = 16y) and epidemiological data from Australia and the United States show reduced size and frequency of H1N1 and B epidemics compared to H3N2 (Extended Data Fig. 5f-i).

Differences in age of infection may explain differences in global circulation as children travel long distances much less frequently than adults (Extended Data Fig. 5e). A previous study hypothesized that age-specific patterns of infection could lead to differences in contact rates and the spread of influenza types within the United States over the course of a single season18. Here, we hypothesized that differential global air travel provides a plausible mechanism by which H1N1 and B viruses show increased genetic differentiation and reduced rates of global migration across multiple seasons, compared to H3N2 viruses.

To test the impact of differences in age distribution of infection on global patterns of virus movement, we constructed a multi-patch transmission model. We modeled two scenarios for host movement: 1. age-independent mixing between patches; 2. age-stratified mixing with host movement derived from air travel passenger age data (Extended Data Fig. 5e). In the age-independent scenario, model parameters only differed in rate of antigenic mutation, leading to differences in observed rates of antigenic drift among viruses and hence epidemic size and frequency (Extended Data Fig. 7). Faster antigenic drift resulted in greater incidence and more adult infections (Fig. 3a,b), but only modest differences in virus lineage movement (Fig. 3c, B-like viruses differ from H3-like viruses by a factor of 1.2), consistent with slightly faster spread of antigenically novel strains. However, age-stratified mixing between patches intensified the effect of antigenic drift on migration rate and created differences in rates of movement between patches more consistent with those observed for H3N2 vs H1N1 and B (Fig. 3c, B-like viruses differ from H3-like by a factor of 1.6). In the scenario with faster antigenic drift, infections were more mobile due to greater frequency of adult infection, causing a knock-on effect on rates of viral movement. The model also suggests that the differences in patterns of regional persistence observed in the phylogenies might be shaped by a combination of differences in rates of antigenic evolution and variation in amplitude of epidemic seasonality, with slowly evolving viruses persisting longer than rapidly evolving viruses at low amplitudes of seasonal forcing (Extended Data Fig. 8a, Supplementary Information).

In the model, varying transmission rate rather than antigenic mutation rate also resulted in differences in the observed rate of antigenic drift, with higher transmission resulting in faster drift (Extended Data Fig. 8b). The relationship between antigenic drift rate and migration rate is similar regardless of whether drift is modulated by mutation rate or transmission rate (Extended Data Fig. 8b). This finding is in line with theoretical work showing that epidemiological processes can influence influenza virus evolution19,20. However, there are important virological differences between influenza viruses that are likely to impact the efficiency and tempo at which antigenic variation is generated and fixed, which could in turn affect epidemiology21-24 (Supplementary Information).

Regardless of the underlying drivers, there is a remarkable correspondence in model behavior, quantified as a stable relationship between observable rate of antigenic drift and global circulation patterns. The patterns of epidemic spread observed here suggest that differences in ages of infection could explain patterns of global circulation across a variety of human viruses.

Methods

Sequence data

Hemagglutinin (HA) coding sequences for influenza A H3N2 viruses, former seasonal H1N1 viruses (preceding the 2009 pandemic), and influenza B virus lineages Victoria (Vic) and Yamagata (Yam) collected by the World Health Organization (WHO) Global Influenza Surveillance and Response Network including the National Institute of Virology, Pune, India between 2000 and 2012 were combined with human seasonal influenza virus sequences (minimum length = 984 base pairs) covering 2000 to 2012 from the Influenza Research Database10. After removing duplicate strains and strains overly divergent based on root-to-tip distances, the data set contained 9139 H3N2 sequences, 3789 H1N1 sequences, 2577 Vic sequences and 1821 Yam sequences. Sampling locations for these sequences were parsed from strain names. Sequences were grouped into 9 geographic regions: USA/Canada, South America, Europe, India, North China, South China, Japan/Korea, Southeast Asia and Oceania. Specifics of this partitioning are shown in Extended Data Figure 1. Groups were chosen to maximize available sequences within each region while still providing enough geographic diversity to ensure nearly global coverage. Sequences from Africa, Central America, the Middle East and Russia were excluded because of a lack of sufficient numbers of sequences to provide comparable estimates to other regions.

In the raw sequence data, some regions, such as the USA, were over-represented. Additionally, more recent years were over-represented compared to years at the start of the study period. In order to control for these sampling biases, we subsampled the raw data randomly by location and time to create a more equitable spatiotemporal distribution. The USA had consistently more sequences available every year from 2000 to 2012, thus in order to maintain similar total numbers of sequences for each region across the entire study period it was necessary to sample fewer sequences per year from the USA. We selected 50 sequences per region per year (40 for USA/Canada) for H3N2 and 80 sequences per region per year (45 for USA/Canada) for H1N1, Vic and Yam. This subsampling resulted in largely similar sequence counts across years and across regions for each virus, but overall more H3N2 sequences than H1N1 or B sequences, with 4006 H3N2 sequences, 2144 H1N1 sequences, 1999 Vic sequences and 1455 Yam sequences (Extended Data Fig. 1). When selecting subsampled sequences we first selected sequences with full day-month-year collection dates and then longer sequences over sequences with less precise dates or shorter sequences. HA sequence data for 1630 H3N2 isolates, 1600 H1N1 isolates, 1394 Vic isolates and 881 Yam isolates have been deposited in the Influenza Research Database10 and accession numbers for all sequences used provided as Supplementary Information.

Phylogeographic inference

Time-resolved phylogenetic trees were estimated for H3N2, H1N1, Vic and Yam using BEAST v1.8.125 and incorporated the SRD06 nucleotide substitution model26, a coalescent demographic model with constant effective population size and a strict molecular clock across branches. A strict molecular clock was chosen based on finding strong correlations between date of sampling and evolutionary distance in all datasets, as estimated by Path-O-Gen v1.427. Using a strict clock also reduced the risk of model over-parameterization (for example, for the complete H3N2 data set with a relaxed clock, there would be 2 × 4006 – 2 = 8010 branch-specific rates). Samples with imprecise dates (known only to the month or to the year) had their dates of sampling estimated assuming a uniform prior within the known temporal bounds28. Markov Chain Monte Carlo (MCMC) was run for 600 million steps and trees were sampled every 5 million steps after allowing a burn-in of 100 million steps, yielding a total sample of 100 trees for H1N1, Vic and Yam. With significantly more samples, H3N2 required a longer chain to converge. Here, MCMC was run in parallel for 2 chains, each with 650 million steps sampled every 3 million steps with a burn-in of 500 million steps and samples across chains combined, yielding a total of 100 sampled trees. These trees were treated as independent draws from the posterior space of trees when subsequently used in the robust counting and phylogeographic analyses29. Evolutionary rates in Extended Data Table 1 were estimated using the ‘renaissance’ counting methods of Lemey et al.30.

Phylogeographic patterns were estimated using a discrete-state continuous time Markov chain (CTMC) model, in which transition rates were estimated between each pair of regions15. We assumed a non-reversible transition model31 consisting of 72 separate rate parameters, each with a Bayesian stochastic search variable selection (BSSVS) indicator variable, and a separate overall rate of geographic transition. We assumed an exponential prior with mean of 1 for each transition rate, a negative binomial prior with mean of 9 and standard deviation of 9 for the total number of non-zero rates and an exponential prior with mean of 1 migration event per lineage per year for the overall geographic transition rate. MCMC was run for 12 million steps with a burn-in of 2 million steps, and parameters sampled every 10,000 steps and trees sampled every 100,000 steps, yielding a total sample of 1000 parameter states and 100 trees on which estimates were based. Pairwise migration rate estimates had an effective sample size (ESS) of 350 at the minimum and most had ESS greater than 500.

This procedure yielded posterior trees with the geographic states of internal nodes resolved. We analyzed these posterior trees using the program PACT v0.9.532 to compute the following summary statistics: a) genealogical diversity14, measuring the average time it takes for two randomly chosen contemporaneous lineages to coalesce, b) time to the most recent common ancestor (TMRCA)14, measuring the average time it takes for all contemporaneous lineages to find a common ancestor, c) genealogical FST, measuring the degree of population structure in contemporaneous lineages calculated as FST = (πb – πw)/πb, where πw is genealogical diversity between randomly sampled lineages from the same geographic region and πb is genealogical diversity between randomly sampled lineages from different geographic regions, d) persistence, measuring the average number of years for a tip to leave its sampled location, walking backwards up the phylogeny, e) migration rate, measuring the average number of migration events over the phylogeny divided by total tree length to give migration events per lineage per year, f) trunk location through time4, measuring the posterior distribution across sampled phylogenies of the trunk geographic state, where the trunk is defined as all branches ancestral to viruses sampled within 1 year of the most recent sample, g) region-specific ancestral geographic history, measuring the distribution of geographic locations of tips belonging to a particular region traced backwards in time through the phylogeny averaged across sampled phylogenies. Statistics (a), (b), (c), (f), and (g) were calculated across 0.1 year genealogical windows. These procedures gave an estimate of credible intervals for inferred ancestral locations across posterior phylogeographic reconstructions.

Code and data availability

Sequence data has been deposited with the Influenza Research Database10 and accession numbers provided as Source Data. The entire bioinformatic pipeline, including data subsampling, preparing XML files for BEAST, setting up PACT analyses and rendering figures is available at https://github.com/blab/global-migration. Analysis and data files are archived on the Dryad Digital Repository under DOI 10.5061/dryad.pc641.

Surveillance, travel and age-structure data

We investigated epidemic size and frequency using virological isolation data between 2000 and 2012 collected by the WHO Collaborating Centre for Reference and Research on Influenza at the Victorian Infectious Diseases Reference Laboratory (VIDRL), Melbourne, Australia and the Centers for Disease Control and Prevention, Atlanta, USA (Extended Data Fig. 5f–i). These isolations were categorized by date of sampling and by virus type: H3N2, H1N1, Vic, or Yam. The data from VIDRL also contained information on patient age. The age structure of incidence was estimated by constructing a distribution of age of infection from individuals >5 y (owing to the overrepresentation of <5 year old patients for all subtypes) (Extended Data Fig. 5b–d). Median age of infection was 30 y (H3N2), 20 y (H1N1) and 16 y (B) and mean age of infection was 33.9 y (H3N2), 23.1 y (H1N1) and 23.2 y (B). Median age of infection was significantly different for H3N2 vs H1N1 (P = 4.6 × 10−29, Mann-Whitney U test), H3N2 vs B (P = 1.2 × 10−62) and H1N1 vs B (P = 0.041). The patient age data from VIDRL were potentially biased by testing strategy and the generally higher severity of H3N2 virus infections. Children and working age adults were more likely to be tested than the elderly but the greater severity of H3N2 virus infections might spread and flatten the patient age distribution. For this reason we additionally tested excluding individuals >65 y and recalculating summary statistics, finding median ages of infection of 27 y (H3N2), 19 y (H1N1) and 15 y (B) and mean age of infection as 28.0 y (H3N2), 22.2 y (H1N1) and 20.3 y (B). We classified children as 0-15 years and adults as 16 years and older, and estimated proportion of childhood infections as 30% (H3N2), 52% (H1N1) and 60% (B). There are potentially other biases specific to individual sentinel physicians and hospitals that could affect sample collection. However, the estimate derived from the VIDRL data that ~60% of influenza B virus infections occur in children is consistent with other estimates (reviewed in Glezen et al.8). Other studies similarly corroborate the estimates of lower age of infection for H1N1 viruses as compared to H3N233,34.

Additionally, we analyzed the distribution of ages of ~102.5 million air passengers traveling through London Heathrow and London Gatwick airports in 2011 (Extended Data Fig. 5E) reported by Civil Aviation Authority of the UK35. Assuming that children of ages 0 to 15 make up 17% of the UK population (Office of National Statistics), this distribution suggests that children engage in air travel at 19% the rate of adults.

For the modeling described below, we estimated age-structured contact rates following the empirical mixing data provided by Mossong et al.36. These contact matrices were previously validated in modeling pertussis epidemiology37. We simplified the Mossong et al. mixing matrices to record child-to-child contacts, child-to-adult contacts, adult-to-child contacts and adult-to-adult contacts, where children were defined to be 0 to 15 and adults to be 16 or over. This resulted in the following mixing matrix

α=(1.00.210.210.26),

where rates are relative to child-to-child contact rates.

Epidemiological modeling

An individual-based model of influenza evolution and epidemiology was constructed following methods presented in Bedford et al.38. The model used here is identical to Bedford et al. except where specified below. The present implementation used a linear-strain space39,40, in which virus phenotype is represented by a continuous variable and cross-immunity between viruses is a function of distance between viruses in this space. We parameterized the model to compare scenarios of age-structured mixing between regions and to compare viruses with different rates of antigenic drift.

The model was simulated for 120 years with daily time steps and the first 100 years discarded to allow equilibrium to be reached. We modeled a metapopulation with individuals equally divided into three regions (North, Tropics, South). Individual’s ages were tracked throughout the simulation and those less than 16 years old were classified as children and those 16 or older were classified as adults. Transmission occurred by mass action, with transmission rates modified by regional compartment and by age compartment. Thus, for example, the force of infection into children in the Tropics followed

λct=∑i∈(a,c)βtαicIitSctNt+∑i∈(a,c)∑j∈(n,s)βtαicmiIijSctNt,

where βj is the seasonally forced contact rate in region j, α ac represents adult-to-child transmission, mi represents between-region transmission in age class i, Iij represents the number of infecteds in age class i in region j, Sij represents the number of susceptibles in age class i in region j, and Nj represents the total number of hosts in region j. The northern and southern regions were seasonally forced in opposite phase with a sinusoidal function following ε, while the tropics had no seasonal forcing.

Each virus possessed a one-dimensional antigenic phenotype ϕv and after recovery a host ‘remembered’ its infecting phenotype. For each contact event, the Euclidean distance from infecting phenotype ϕv was calculated to each of the phenotypes in the host immune history ϕh1, …, ϕhn. Here, one unit of antigenic distance was designed to roughly correspond to a twofold dilution of antiserum in a hemagglutination inhibition (HI) assay41. The probability ρ that infection occurred after exposure was proportional to the distance d to the closest phenotype in the host immune history, following ρ = min{d s, 1}. Each day there was a chance μ that an infection mutates to a new phenotype. This mutation rate represents a phenotypic rate, rather than genetic mutation rate, and can be thought of as arising from multiple genetic sources. When a mutation occurred, the virus’s phenotype was moved either left or right randomly and mutation size sampled from an exponential distribution with mean step size δ avg. Epidemiological parameters for the baseline epidemiological scenario with notation following Bedford et al.38 were:

Base transmission rate β = 0.88 per day

Duration of infection 1/ν = 5 days

Birth/death rate = 1/50 years

Total population size N = 45 million

Seasonal forcing in north and south ε = 0.15

Antigenic scaling s = 0.07

Antigenic mutation rate μ = 0.5 to 6.5 × 10 −4 per day

Average mutation size δavg = 0.3 units

Child-to-child transmission α cc = 1.00

Child-to-adult transmission α ca = 0.21

Adult-to-child transmission α ac = 0.21

Adult-to-adult transmission α aa = 0.26

Child between-region transmission mc = 0.0020

Adult between-region transmission ma = 0.0020

In the model with age-stratified mixing with host movement derived from air travel passenger age data, child between-region transmission mc was 0.0011 and adult between-region transmission ma was 0.0060.

In the course of the simulation, the underlying infection history of who infects whom was recorded and output as a complete infection tree. Without ample within-host diversity owing to chronic infection, the complete infection tree also generated a fully observed phylogenetic tree. Examining geographic location across the phylogenetic tree allowed us to directly calculate migration rate as total migration events observed (transitions from one region to another) divided by total opportunity (tree length).

Extended Data

Extended Data Figure 1

Circle area is proportional to the number of sequenced viruses originating from a location. Color indicates assignment to one of 9 geographic regions.

Extended Data Figure 2

Inferred location of the trunk of H3N2 tree through time in the primary dataset (a) and in a smaller secondary dataset (b)

Colored width at each time point indicates the posterior support for viruses from a particular geographic location comprising the trunk of the phylogenetic tree. Colors correspond to colored circles in persistence insets in Figure 1. The secondary datasets consist of 1391 H3N2 viruses, 1372 H1N1 viruses, 1394 Vic viruses and 1240 Yam viruses.

Extended Data Figure 3

Average inferred geographic history of region-specific samples for H3N2, former seasonal H1N1, Vic and Yam viruses from 2000 to 2012

In each panel, phylogeny tips belonging to a particular region were collected and their phylogeographic histories traced backwards in time averaging across the phylogenetic tree to combine all viruses within each region. The x-axis shows number of years backward in time from phylogeny tips from a particular region and the y-axis shows the geographic make up as stacked histogram of the ancestors of these tips, where region color-coding corresponds to the legend in Figure 1. For example, the top left panel shows the ancestry of USA and Canadian H3N2 viruses. At x = 0, all of these viruses are still in the USA or Canada and so an unbroken yellow band takes up the entire y. However, at x = 1 year, a number of different geographic regions appear on the y. This indicates that, 1 year back, ancestors of USA and Canadian viruses are primarily found in Southeast Asia, India and South China. The pattern in the top right panel shows that the ancestors of USA and Canadian Yam viruses more often remain in the USA or Canada with approximately 50% of ancestors remaining 1 year back. Each panel is constructed by averaging across region-specific tips within a tree, but also across sampled posterior trees.

Extended Data Figure 4

Each tree only contains viruses from a particular geographic region and thus tips are all a single color within a tree. Branch and trunk coloring have been retained from Figure 1 to highlight the inferred geographic ancestry of each lineage.

Extended Data Figure 5

Antigenic map of Vic viruses primarily collected in 2008 (a), age distribution of infections for H3N2 (b), H1N1 (c) and B (d) in Australia 2000–2011, age distribution of ~102.5 million passengers at London Heathrow and London Gatwick airports during 2011 (e), timeseries of virological characterizations from 2000 to 2012 of viruses from the USA by US CDC and from Australia by VIDRL for H3N2 (f), H1N1 (g), Vic (h) and Yam (i)

In (a), the positions of strains (colored circles) and antisera (uncolored squares) are fit such that the distances between strains and antisera in the map represent the corresponding hemagglutination inhibition (HI) measurements with the least error following Smith et al.41 using data on Vic viruses from the WHO Collaborating Centre for Reference and Research on Influenza at the Centers for Disease Control and Prevention, Atlanta, Georgia, USA. Strains are colored by antigenic cluster. Genetic clades corresponding to each antigenic cluster are marked with colored vertical bars in Fig 1c. The spacing between grid lines is one unit of antigenic distance corresponding to a twofold dilution of antiserum in the HI assay. In (f) to (i), virological characterizations are a surrogate for epidemiological activity that allow for accurate discrimination among H3N2, H1N1, Vic, and Yam viruses. These data generally reflect the relative magnitudes and frequencies of epidemics but in some cases will inflate magnitudes of very small epidemics due to preferential characterization of subtypes circulating at low levels.

Extended Data Figure 6

Combined persistence estimates across pairs of regions for H3N2, H1N1, Vic and Yam (a) and Spearman correlation of a region’s persistence vs the region’s contribution to phylogenetic ancestry for H3N2, H1N1, Vic and Yam (b)

In (a) and (b), persistence is measured as the average waiting time in years for a sample to leave its origin backwards in time in the phylogeny, with waiting time averaged across tips within a tree and across sampled posterior trees. In each panel of (a), the diagonal shows persistence within each of the 9 study regions and within the combined region of ‘China’, for which nodes in North China and in South China were considered to belong to a single region. The estimates along the diagonal are equivalent to the means shown in Figure 1. Off-diagonal elements show persistence estimates for pairwise combinations of regions. For example, the off-diagonal for North and South China is exactly equivalent to the diagonal element for ‘China’ and the off diagonal for ‘China’ and India represents mean persistence when combining nodes from North China, South China and India. In (b), origin proportion is measured as the proportion of the time that a region is represented when tracing back 2 or more years from each tip in the phylogeny, averaged across tips within a tree and across sampled posterior trees. Spearman’s ρ is not significant for any individual virus. However, the probability of observing 4 instances where each virus has a ρ of at least 0.32 is significant (P = 0.0017, bootstrap resampling test).

Extended Data Figure 7

Colors represent geographic regions with tropics in blue, north in yellow and south in red. Region-specific incidence patterns are shown in terms of cases per 100,000 individuals per week, patterns of antigenic drift in terms of increasing antigenic distance (roughly proportional to log2 HI units) over time and in the geographically labeled phylogeny. The parameterized antigenic mutation rate is 0.00015 antigenic mutations per infection per day in (a), 0.00035 in (b) and 0.00055 in (c), while the realized antigenic drift rate is 0.29 antigenic units per year in (a), 0.58 in (b) and 1.19 in (c). Between-region mixing is 5.26× faster in adults. Each panel shows output from a single simulation selected from the 112 shown in Figure 3, and is intended to show model behaviors over a range of parameters, not necessarily the behavior of particular viruses.

Extended Data Figure 8

Simulation results showing relationship between antigenic drift and persistence as a function of seasonality (a) and simulation results showing the effects of modulating transmission rate β on model behavior (b)

In (a), the seasonal forcing parameter ε follows ε = 0.00 (no forcing), ε = 0.04, ε = 0.08 and ε = 0.12 (moderate seasonal forcing). Points represent outcomes from a model in which adults travel between regions at 5.26× the rate of children. Solid black lines represent linear fits to the data. With 4 seasonality scenarios, 7 mutation rates and 8 replicates, there are 224 individual simulations shown. Persistence is measured as the average time in years taken for a tip to leave its region of origin going backwards in time, up the tree. In (b), transmission rate β in contacts per day is varied and compared to its effect on observed antigenic drift (in antigenic units per year), attack rate per year, proportion of childhood infections and migration rate between regions (in events per viral lineage per year). One antigenic unit is roughly equivalent to one log2 HI unit. Black points represent outcomes from a model in which children and adults travel between regions at equal rates. Red points represent outcomes from a model in which adults travel between regions at 5.26× the rate of children. Solid black and red lines represent LOESS fits to the data. With 2 travel scenarios, 7 transmission rates and 8 replicates, there are 112 individual simulations shown.

Supplementary Material

1

2

Acknowledgments

T.B. was supported by a Newton International Fellowship from the Royal Society and through NIH U54 GM111274. S.R. was supported by MRC (UK, Project MR/J008761/1), Wellcome Trust (UK, Project 093488/Z/10/Z), Fogarty International Centre (USA, R01 TW008246-01), DHS (USA, RAPIDD program), NIGMS (USA, MIDAS U01 GM110721-01) and NIHR (UK, Health Protection Research Unit funding). The Melbourne WHO Collaborating Centre for Reference and Research on Influenza was supported by the Australian Government Department of Health and thanks N. Komadina and Y.-M. Deng. The Atlanta WHO Collaborating Center for Surveillance, Epidemiology and Control of Influenza was supported by the U.S. Department of Health and Human Services. NIV thanks A.C. Mishra, M. Chawla-Sarkar, A.M. Abraham, D. Biswas, S. Shrikhande, AnuKumar B, and A. Jain. Influenza surveillance in India was expanded, in part, through US Cooperative Agreements (5U50C1024407 and U51IP000333) and by the Indian Council of Medical Research. M.A.S. was supported through NSF DMS 1264153 and NIH R01 AI 107034. Work of the WHO Collaborating Centre for Reference and Research on Influenza at the MRC National Institute for Medical Research was supported by U117512723. P.L., A.R. & M.A.S were supported by EU Seventh Framework Programme [FP7/2007-2013] under Grant Agreement no. 278433-PREDEMICS and ERC Grant agreement no. 260864. C.A.R. was supported by a University Research Fellowship from the Royal Society.

Footnotes

The authors declare no competing financial interests.

Online Content Methods, along with Extended Data Tables and Figures, and Supplementary Information are available in the online version of the paper; References unique to these sections appear only in the online paper. Accession numbers of all sequences used in this study are available to download as Extended Data.