b Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, NH 03824; e Department of Microbiology and Molecular Genetics, University of Pittsburgh, Pittsburgh, PA 15219

Significance

The fitness effect of many mutations depends on the genotype of the individual in which they occur. Are these dependencies predictable? Do dependencies build on existing variation between individuals to promote divergence, or do they act to favor genetic cohesion? We examine these questions by measuring the fitness effect of mutations that conferred a benefit in a laboratory-evolved population when transferred into genetically and phenotypically diverse natural isolates of the same species. We found that fitness effects were predicted by the fitness of the strain to which they were added but not by the genetic or ecological relationship of the recipient strains. This pattern extends findings that the current fitness of a strain is a major predictor of its ability to adapt.

Abstract

The effect of a mutation depends on its interaction with the genetic background in which it is assessed. Studies in experimental systems have demonstrated that such interactions are common among beneficial mutations and often follow a pattern consistent with declining evolvability of more fit genotypes. However, these studies generally examine the consequences of interactions between a small number of focal mutations. It is not clear, therefore, that findings can be extrapolated to natural populations, where new mutations may be transferred between genetically divergent backgrounds. We build on work that examined interactions between four beneficial mutations selected in a laboratory-evolved population of Escherichia coli to test how they interact with the genomes of diverse natural isolates of the same species. We find that the fitness effect of transferred mutations depends weakly on the genetic and ecological similarity of recipient strains relative to the donor strain in which the mutations were selected. By contrast, mutation effects were strongly inversely correlated to the initial fitness of the recipient strain. That is, there was a pattern of diminishing returns whereby fit strains benefited proportionally less from an added mutation. Our results strengthen the view that the fitness of a strain can be a major determinant of its ability to adapt. They also support a role for barriers of transmission, rather than differential selection of transferred DNA, as an explanation of observed phylogenetically determined patterns of restricted recombination among E. coli strains.

Mutations can interact with one another and with their broader genetic background to affect fitness, a phenomenon known as epistasis (1). Epistasis plays a key role in many aspects of biology, including theories of speciation (2, 3), the evolution and maintenance of sex (4, 5), adaptation (6–8), and evolutionary contingency (9, 10). Whereas early experimental studies focused on interactions between deletion or other knockout mutations, advances in genomic technologies now allow direct tests of interactions between spontaneously occurring beneficial mutations (11–16). Studies that manipulate beneficial mutations have the potential to identify general patterns that may underlie some degree of predictability in adaptive evolutionary outcomes. For example, interactions between beneficial mutations often follow a pattern of diminishing returns epistasis, such that the marginal benefit of additional mutations declines with the fitness of the recipient genotype (11–16). Diminishing returns epistasis reflects a global interaction between mutations that interact at the level of fitness (16). This pattern is consistent with the frequent observation of decelerating fitness trajectories as populations adapt to constant environments and the observation of a negative relationship between the starting fitness of a population and its initial rate of fitness increase (17–19). In apparent contrast, some theoretical work predicts an excess of positive interactions along adaptive trajectories, highlighting the need for continued research in this area (20–22).

Most studies that have examined interactions that affect beneficial mutations have focused on interactions between mutations arising in a single population or in replicate populations evolved from a common ancestor (11, 12, 14, 23, 24). In these cases, there are relatively few mutational differences separating different genotypes, limiting our ability to extrapolate findings to an understanding of the influence of mutation interactions on adaptive evolution in genetically diverse natural populations. Indeed, introgression experiments indicate the importance of the broader genetic background in determining the effect of specific genetic regions (reviewed in ref. 25), and even the interaction between two focal mutations depends strongly and perhaps unpredictably on the genetic background in which they are measured (15).

A better understanding of the interaction between beneficial mutations and diverse genetic backgrounds will allow us to address whether mutations selected in one background will tend to have similar effects across different backgrounds. This question is particularly important in considering the evolution of bacterial populations in which the horizontal transfer and integration of short homologous DNA sequences—typically on the order of ∼50–500 bp in Escherichia coli (26, 27)—means that the fate of new beneficial mutations will depend strongly on their interaction with the background of recipient strains. If the influence of interactions tends to be small, the effect of a beneficial mutation will be mostly independent of its genetic background, and it can spread broadly. If interactions are common, they may prevent the spread of potentially beneficial mutations between lineages. This kind of barrier has been proposed as a component of a bacterial species concept (28).

To the extent that mutation effects differ across genetic backgrounds, it is of interest to identify attributes of those backgrounds that might explain those differences. At least three candidate mechanisms have been presented: (i) The potential for differences in the specific genetic interactions that influence the effect of a new mutation increases with genetic distance. For this reason, closely related strains are expected, on average, to respond more similarly to the same new mutation than are divergent strains (2). (ii) Genetically divergent lineages can convergently evolve similar underlying genetic architectures through, for example, selection in similar ecological niches (29). For example, a comparison of bacterial metabolic networks revealed a correlation between metabolic network architecture and ecological profile that is not explained by phylogenetic relatedness (30, 31). To the extent that genetic architecture influences the effect of mutations, organisms that have similar architectures may respond similarly to an introduced mutation even if they are not genetically closely related. (iii) The relative benefit of a mutation transferred into different lineages is determined by the initial fitness of that lineage. This possibility stems from the frequent finding of globally determined fitness-dependent negative interactions between beneficial mutations (11, 32, 33). Finally, interactions between mutations and the broader genetic background may be essentially unpredictable, perhaps depending on relatively small numbers of large effect interactions.

To examine these possibilities, we transferred four beneficial mutations that fixed in a long-term laboratory-evolved E. coli population to a diverse set of E. coli natural isolate strains (12). By measuring the fitness effect of many mutation–strain combinations, we determined the effect of interactions between the focal mutations and recipient genetic backgrounds. We also examined whether the fitness effects of the beneficial mutations were correlated with phylogenetic relatedness and/or similarity of diet niche profile—which we use as a proxy for ecological similarity—between the original and recipient genetic backgrounds. We found that the fitness of all mutations depended strongly on genetic background. This effect was not explained by differences in the genetic relatedness of the recipients and was only weakly correlated with their ecological similarity. In contrast, there was a significant relationship between mutational effects and the initial fitness of the recipient strains. This relationship followed a pattern of diminishing returns whereby more fit strains benefited less from addition of the mutations.

Results

Mutation Effects Depend Strongly on Genetic Background.

We added each of four laboratory-selected mutations—in the genes or gene regions pykF, rbs, spoT, and topA—into a series of recipient naturally isolated strains of E. coli and measured their effect on strain fitness in the environment in which they were originally selected. Recipient strains are presented in Fig. 1 and Fig. S1 in the context of a broader phylogeny of E. coli derived from genome sequences of 96 diverse E. coli strains (Tables S1 and S2, and see SI Materials and Methods for details). For all mutations, recipient strains represent a random sample with respect to the complete strain set (Fig. S2).

Phylogeny of 96 E. coli strains based on their core genome (see SI Materials and Methods for details). Recipients of beneficial mutations are indicated (red symbols). The arrow indicates strain REL606, the ancestral strain in which mutations were initially selected as part of a laboratory evolution experiment (53). Clade and phylogroup assignments are based on previous classifications (57–60).

Phylogeny of 96 E. coli strains based on their shared accessory genome (see Materials and Methods for details) indicating strains to which the four beneficial mutations were transferred (red dots). Arrow indicates strain REL606, the strain in which mutations were initially selected as part of a laboratory evolution experiment (53).

Recipient strains are randomly drawn from the larger phylogeny. The strains used as recipients for each gene were tested for being representative draws from core and accessory genome phylogenies. The solid black line indicates the distribution of shared branch lengths of 1,000 draws of the same number of strains as used as recipients for the relevant gene transfer from the comprehensive phylogenies presented in Fig. 1 and Fig. S1. The red line indicates the shared branch length of the actual recipient strains. P values are calculated as the fraction of bootstrapped samples with a lower branch length than the actual sample. Phylogeny (core or accessory)–mutation combinations are indicated in the title of each panel.
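The resampling test described in this legend can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `resampling_p_value`, the `statistic` argument, and the toy pool are hypothetical, draws are assumed to be made without replacement, and a simple sum stands in for the shared branch length, which in practice would be computed from the phylogeny.

```python
import random


def resampling_p_value(pool, observed_sample, statistic, n_draws=1000, seed=0):
    """Fraction of random draws whose statistic is lower than the observed one.

    pool            -- list of all candidate strains (here: toy numeric values)
    observed_sample -- the strains actually used as recipients
    statistic       -- function mapping a sample to a number
                       (stand-in for shared branch length; an assumption)
    """
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    observed = statistic(observed_sample)
    k = len(observed_sample)           # draw the same number of strains each time
    lower = 0
    for _ in range(n_draws):
        draw = rng.sample(pool, k)     # assumed: sampling without replacement
        if statistic(draw) < observed:
            lower += 1
    return lower / n_draws


# Toy example: ten "strains" scored 0..9, statistic = sum of scores.
toy_pool = list(range(10))
p_low = resampling_p_value(toy_pool, [0, 1, 2], sum)   # smallest possible sample
p_high = resampling_p_value(toy_pool, [7, 8, 9], sum)  # largest possible sample
```

A P value near 0 or near 1 would indicate that the recipients are unusually clustered or unusually dispersed relative to random draws; intermediate values are consistent with a representative sample.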

We initially asked if a given mutation conferred the same effect in different recipient strains. In other words, do mutation-by-genetic background interactions influence the fitness effect of a potentially beneficial mutation? In all cases, these interactions were statistically significant, revealing a biologically meaningful range of background-dependent fitness effects (Fig. 2, Table 1, and Table S1). For example, fitness effects conferred by the pykF mutation, which was introduced into 23 recipient strains, ranged from –3% to 29%, compared with an effect of 9% in the strain in which it originally evolved. Only spoT had significantly deleterious effects in any recipient strain. Although it conferred a fitness benefit of 9% in the strain in which it was originally selected, it was deleterious in seven, neutral in three, and beneficial in only one recipient strain (Table S3).

Histograms of fitness effects of transferred mutations in different strains. Fitness of each mutation in the strain in which it was originally selected is indicated by the red symbol. Dashed lines indicate the mean effect of each mutation.

Fitness effects of mutations and descriptive characteristics of recipient strains

Next we asked if different mutations tended to confer similar relative effects when they were added to the same strain. Few strains received all four mutations, so to maximize the number of recipient strains that could be considered, we tested each pair of mutations separately. We found that for four of six mutation pairs (all except topA–pykF and spoT–pykF), the relative benefit conferred by transferred mutations was mostly determined by the specific recipient strain rather than by the identity of the mutation or by the interaction between strain and mutation (Fig. 3 and Fig. S3). This result is consistent with the existence of a general strain-dependent mechanism that determines how much a strain can benefit from a transferred mutation.

Components determining the fitness effects of each mutation pair. ANOVA was used to determine variance in fitness effects explained by recipient strain, mutation, and their interaction considering those strains into which each of two mutations were added (number of strains is indicated in parentheses).

Comparison of mutational effects in strains that received multiple mutations. For each mutation–strain combination, normalized relative fitness is calculated as the log ratio of the fitness effect of a mutation to the relative fitness of the same mutation in the background in which it was initially selected. These ratios are then further normalized to the same mean for each mutation. Symbols represent these estimates, and error bars indicate 95% confidence intervals.

Mutational Effects Are Not Well Explained by Phylogeny.

We sought to test candidate factors that might explain some of the variation in mutation effect across recipient backgrounds. Specifically, we tested for a relationship between ecological, genetic, and growth attributes of recipient natural isolate strains and the effect of each introduced mutation. Where appropriate, we did this by taking into account the phylogeny of the recipient strains.

The hypothesis that closely related strains are more likely to share genotype-by-mutation interactions, and thus respond similarly to introduction of a new mutation, leads to two testable predictions. First, there will be some phylogenetic signal in the distribution of a mutation’s fitness effect across multiple recipient strains. Second, there will be a negative relationship between the fitness effects of transferred mutations and the genetic distance of recipient strains relative to the strain in which they were originally selected.

To test for a general influence of a phylogenetic signal on fitness effects, we estimated Pagel’s λ, a measure of phylogenetic signal in a response variable, considering the fitness effect of each mutation in the context of phylogenies created based on both core and accessory genomes (34). Values of λ greater than 0 indicate some amount of phylogenetic signal such that the fitness effect of a transferred mutation is more similar in related strains than expected by chance alone. Only for the fitness effects of the pykF mutation mapped onto the core genome phylogeny did the lower bound of the maximum likelihood estimate of λ allow us to reject our null hypothesis of there being no phylogenetic signal (core genome phylogeny: pykF, P < 0.01; rbs, P = 0.69; spoT, P = 0.15; topA, P = 0.99; accessory genome phylogeny: pykF, P = 0.99; rbs, P = 0.99; spoT, P = 0.99; topA, P = 0.37). Use of alternative measures of phylogenetic signal gave qualitatively consistent results except that the phylogenetic signal of the pykF mutation was lost and the effects of the spoT and topA mutations were associated with the phylogeny by some metrics (Table S4). We consider these results to indicate a weak effect of phylogeny in determining the fitness effect of mutations transferred to our recipient strains.
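To make the meaning of λ concrete, the sketch below shows the standard λ transformation of a phylogenetic variance-covariance matrix: off-diagonal entries (shared branch lengths between pairs of strains) are multiplied by λ, whereas diagonal entries are left unchanged. The function name and the toy three-strain matrix are hypothetical, and the maximum likelihood fitting of λ itself is not shown.

```python
def lambda_transform(vcv, lam):
    """Scale the off-diagonal entries of a phylogenetic covariance matrix by lam.

    lam = 1 leaves the tree-derived covariances unchanged;
    lam = 0 removes all covariance between strains (no phylogenetic signal).
    """
    n = len(vcv)
    return [
        [vcv[i][j] if i == j else lam * vcv[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical covariance matrix for three strains (not data from this study):
# strains 0 and 1 share most of their history, strain 2 is more distant.
toy_vcv = [
    [1.0, 0.6, 0.2],
    [0.6, 1.0, 0.2],
    [0.2, 0.2, 1.0],
]

no_signal = lambda_transform(toy_vcv, 0.0)    # star phylogeny
full_signal = lambda_transform(toy_vcv, 1.0)  # original tree
```

Under this transformation, the null hypothesis of no phylogenetic signal corresponds to λ = 0, where trait values in related strains are no more similar than in unrelated strains.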

Metrics of phylogenetic signal for fitness effect of each introduced mutation

Of course, the fact that we cannot reject the null hypothesis that phylogeny generally does not determine the effect of introduced mutations does not mean that some kind of phylogenetic signal does not exist. For example, it could be that mutational effects tend to be similar in strains closely related to the strain in which they were selected but are randomly distributed among more distantly related strains. To evaluate this possibility, we examined the relationship between the genetic distance of each recipient strain to the donor strain and the fitness effect of each mutation. In no case did we find any linear relationship between these variables, whether mutations were considered individually or in combination (Fig. 4 A and B and Fig. S4). Quadratic and exponential models, which can accommodate a baseline of neutral mutation effects in increasingly distantly related strains, did not provide any substantial improvement in fit (Fig. S4). We conclude that the fitness effects of transferred mutations are not well predicted by the genetic similarity of donor and recipient strains.

Relationship between recipient strain attributes and fitness effect of added mutations. Fitness effect of added mutations is compared against core and accessory genome distance of recipients relative to the donor strain in which mutations were originally selected (A and B, respectively); ecological similarity, measured using Biolog profiles, of recipient strains against the donor strain (C); and strain growth rate (D). Solid points indicate strains in which mutations were initially selected. Fitness effects are presented as the log ratio of fitness effect of a mutation in a given strain relative to its effect in the original strain normalized so that all mutations have a mean of zero. Dashed line indicates best fit linear correlation.
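The normalization described in this legend can be sketched as follows. The sketch assumes that inputs are relative fitness values (so that ratios are positive), that the natural log is used, and that centering is done by subtracting the per-mutation mean; the function name, strain labels, and numerical values are hypothetical and are not data from this study.

```python
import math
from collections import defaultdict


def normalized_effects(fitness, donor_fitness):
    """Log ratio of each effect to the donor-strain effect, centered per mutation.

    fitness       -- {(mutation, strain): relative fitness in that strain}
    donor_fitness -- {mutation: relative fitness in the original (donor) strain}
    """
    # Step 1: log ratio relative to the effect in the original strain.
    log_ratios = {
        key: math.log(w / donor_fitness[key[0]]) for key, w in fitness.items()
    }
    # Step 2: collect the log ratios belonging to each mutation.
    by_mutation = defaultdict(list)
    for (mutation, _strain), value in log_ratios.items():
        by_mutation[mutation].append(value)
    means = {m: sum(v) / len(v) for m, v in by_mutation.items()}
    # Step 3: subtract the per-mutation mean so each mutation has mean zero.
    return {key: value - means[key[0]] for key, value in log_ratios.items()}


# Hypothetical inputs for two mutations in two strains.
toy_fitness = {
    ("pykF", "strain1"): 1.29,
    ("pykF", "strain2"): 0.97,
    ("spoT", "strain1"): 0.95,
    ("spoT", "strain2"): 1.02,
}
toy_donor = {"pykF": 1.09, "spoT": 1.09}
toy_normalized = normalized_effects(toy_fitness, toy_donor)
```

Centering each mutation on zero allows effects of different mutations to be pooled in a single analysis without the mean effect of any one mutation dominating the relationship.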

Relationship between genetic distance and fitness effects of added mutations. The four Top panels show the relationship between the genetic distance of a recipient strain relative to the strain in which the mutation was selected (the donor strain), based on core (black symbols) and accessory (red symbols) phylogenies, and the fitness effect of the indicated mutation when added to the recipient strain. The Bottom two panels show the relationship between core (Bottom Left) and accessory (Bottom Right) genetic distance of recipient strains relative to the donor strain and the normalized fitness effects of all added mutations. Summary statistics of each relationship are shown in each panel.

Mutational Effects Are Not Well Explained by Ecological Similarity.

Genotypes adapted to a similar ecological niche may be more likely to have similar underlying genetic architectures and therefore respond similarly to a new mutation, even if they are not genetically closely related. We tested this possibility in two ways. First, we examined the relationship between a proxy for ecological similarity—based on Biolog profiles—and the fitness effect of each introduced mutation. We found a general trend for more ecologically distant strains to benefit less from transferred mutations, but the relationship was weak. When mutations were considered together, there was a marginally significant negative relationship between mutational effect and ecological distance of the recipient relative to the donor strain (omitting the donor strain: r = –0.27, P = 0.06; ρ = –0.33, P = 0.02; Fig. 4C), but the relationship was not significant for any individual mutation (Fig. S5). To account for the fact that recipient strains can have the same overall similarity relative to the progenitor but be different from one another, we also tested for a signal of mutation effect in the context of a phylogeny based on Biolog profiles. In no case was any significant phylogenetic signal observed (Table S4). Together, these results indicate that current ecological similarity, as assessed by Biolog respiration profiles, does not explain differences in the fitness effect of any of the transferred mutations.

Relationship between ecological distance and fitness effects of added mutations. The four Top panels show the relationship between the ecological distance of a recipient strain relative to the strain in which the mutation was selected (the donor strain, red symbol) and the fitness effect of the indicated mutation when added to the recipient strain. Ecological distance is based on Biolog profiles of recipient strains assessed over 94 different resources. The Bottom two panels show the relationship between the ecological distance of all recipient strains (Bottom Left), or all except the donor strains (Bottom Right), relative to the donor strain, and the normalized fitness effects of all added mutations. Summary statistics of each relationship are shown in each panel.

Fitter Recipient Strains Benefit Less from Introduced Mutations.

Recent studies have found that epistatic interactions between beneficial mutations and their genetic backgrounds tend to become increasingly negative as the fitness of the genetic background increases (11–16). If this relationship also holds across diverse recipient strains, we predict a negative relationship between the absolute fitness of a strain (measured as growth rate) and the benefit conferred by addition of a mutation. We found that the fitness effect of two transferred mutations, in spoT and pykF, tended to decrease as the growth rate of the recipient strain increased (spoT: r = –0.6, P = 0.04; ρ = –0.66, P = 0.02; pykF: r = –0.54, P = 0.01; ρ = –0.53, P = 0.01) (Fig. S6). Negative, but nonsignificant, relationships were also found for the rbs and topA mutations (Fig. S6). As judged by comparison of Akaike information criterion (AIC) scores, no tested nonlinear relationship gave a substantially improved fit (Fig. S6). To increase our power to detect a relationship between growth rate and mutational effect, we also considered all mutations together (see SI Materials and Methods for details on the normalization procedure). When we did this, we found a strong overall dependence of the fitness effect of a transferred mutation on the growth rate of the recipient strain (r = –0.48, P < 0.001; ρ = –0.48, P < 0.001) (Fig. 4D). This relationship remained significant when two outlying strains were omitted from the analysis (r = –0.38, P < 0.001; ρ = –0.41, P < 0.001) (Fig. S6).
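The two correlation statistics reported above, Pearson's r and Spearman's ρ, can be computed as in the following sketch. It is a plain-Python illustration with hypothetical function names and toy values, not the software used for the analyses; significance testing is not shown.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values given the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy values (hypothetical): growth rate vs. normalized fitness effect.
toy_growth = [0.4, 0.6, 0.8, 1.0]
toy_effect = [0.20, 0.05, -0.02, -0.10]
toy_r = pearson_r(toy_growth, toy_effect)
toy_rho = spearman_rho(toy_growth, toy_effect)
```

Reporting both statistics is useful here because r is sensitive to outlying strains, whereas ρ depends only on rank order.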

Relationship between growth rate and fitness effects of added mutations. The four Top panels show the relationship between the growth rate of a recipient strain relative to the strain in which the mutation was selected (the donor strain, red symbol) and the fitness effect of the indicated mutation when added to the recipient strain. The Bottom two panels show the relationship between the growth rate of all (Bottom Left), or all except two outlier strains with low and high growth rates (Bottom Right), recipient strains relative to the donor strain and the normalized fitness effects of all added mutations. Summary statistics of each relationship are shown in each panel.

To test if taking phylogeny into account might influence relationships between growth rate and mutation fitness effects, we performed phylogenetic generalized least squares (GLS) analyses comparing several different models of trait evolution (Table S5). In four of eight cases, the best-fitting models did not account for the phylogenetic relationships between recipient strains. Considering the core genome, phylogeny improved the model fit for the spoT mutation, but the relationship between growth rate and mutation fitness effect remained significant (P = 0.038). In fact, growth rates of recipient strains exhibited some phylogenetic signal with respect to the core genome phylogeny (λ = 0.43, P = 0.024), but not the accessory genome or Biolog phylogenies, which might explain the phylogenetic signal observed for some mutation–core genome phylogeny combinations.

Even when a factor does not individually explain a significant proportion of the variation in a response, it may still contribute in combination with other factors. To test for interactions between initial fitness, genetic and ecological similarity, and the effect of each introduced mutation, we performed partial least squares (PLS) regressions. Whereas principal component regression seeks to determine orthogonal combinations of variables to maximize the amount of variation explained in those variables, PLS determines combinations of variables that maximize the amount of variation in the response explained. This approach is robust to having few observations relative to the number of predictor variables, as was the case especially for the topA mutation datasets considered here. For the pykF, spoT, and topA mutations, growth rate explained the largest proportion of variance in fitness effects (Fig. 5).

PLS regression to determine contribution of growth rate, core and accessory genome distance, and Biolog distance to fitness effects of each transferred mutation. The original strains are excluded from this analysis because they have a disproportionate influence on variation of genetic and ecological distance measures. Color of bars indicates the contribution of each strain attribute to that component. Only the first four components are shown.

SI Materials and Methods

Beneficial Mutations and Strain Construction.

Five beneficial mutations—occurring in the genes or gene regions rbs, topA, spoT, glmUS, and pykF—were identified as the first to fix in a long-term evolving population of E. coli REL606 (12). We individually introduced four of these mutations into a series of natural isolate recipient strains (Table S1), omitting the mutation upstream of glmUS because of low transmission efficiency. Mutations were introduced into the chromosome of natural isolate strains using a suicide vector-based approach (12). As well as the four beneficial mutations, a mutation conferring an Ara– phenotype (AraA 92D; the same mutation that distinguishes REL606 from its Ara+ derivative, REL607) was added into each natural isolate strain. In competition experiments, this marker allowed us to distinguish progenitor from constructed strains on TA indicator media.

The evolved mutation in pykF involved the insertion of an IS150 element (61). Multiple copies of this element exist in many E. coli genomes, making this a difficult mutation to transfer through the recombination-based approach that we used. For this reason, we instead used an in-frame 603-bp deletion pykF allele. These insertion and deletion mutations have the same effect on fitness in the genetic background in which the insertion allele actually occurred (12).

To control for the possibility that confounding secondary mutations occurred during strain constructions, we isolated a control strain that did not contain the replacement allele from the same construction lineage as the one that produced the successful allele replacement strain. This strain will have any mutation that occurred during construction up until the last step of the process: excision of the introduced suicide plasmid from the recipient chromosome, leaving behind the introduced (in constructed strains) or original (control strains) target allele. A strain containing the introduced allele was only used in this study if the fitness of its paired control strain was unchanged from the original progenitor strain.

Bacterial Strains.

Natural isolate strains used as recipients for introduced beneficial mutations were chosen from a collection of 96 strains collected and sequenced as part of a Broad Institute project (www.broadinstitute.org/) and obtained from the Michigan State University Shiga Toxin-Producing Escherichia coli (STEC) Center, and from strains obtained and used in a previous study by one of the authors (52) (Table S2). Genome sequences of strains were downloaded from the Broad Institute (https://olive.broadinstitute.org/projects/Escherichia%2520coli%2520Antibiotic%2520Resistance) or obtained by de novo Illumina sequencing (see Genome Sequencing). The recipient strains used in this study are detailed in Table S1.

We sought to focus on intergene epistasis by choosing recipient natural isolate strains that had the same amino acid sequence at the focal transferred gene as the donor REL606 strain. For this reason, the identity of potential recipient strains differed for the mutations in spoT and topA. The rbs and pykF mutations are large deletions that result in loss of function of target gene(s). In these cases, intragene interactions are not relevant, so these mutations were transferred into strains without regard to the original allele. The strains that were successfully constructed are a subset of the potential recipients because we were unable to successfully add either the focal beneficial mutation or the araA marker mutation into some target strains.

Fitness Competition Experiments.

The fitness of constructed strains was measured using direct competition assays in the same medium used in the evolution experiment in which the mutations manipulated in this work were originally selected (12, 53). All competitions were carried out between strains with distinct Ara markers, which allowed the two types to be distinguished on TA indicator media (53). Except for control competitions, the two strains otherwise differed only by the presence or absence of an introduced mutation. Competitions were carried out in Davis Mingioli medium supplemented with 25 μg/mL glucose (DM25). Before each fitness assay, the two competitors were acclimated to the competition environment by growing them separately for two 1-d growth cycles in the same environmental conditions used in the competition. Each competitor was then diluted 200-fold into fresh DM25 medium and a sample immediately plated on TA agar to estimate the initial densities of the competing strains. At the end of 1 d of competition, a further sample was plated on TA agar to obtain the final density of each competitor. The fitness of the constructed strain relative to its wild-type progenitor was calculated as $\ln[N_{C}(f)/N_{C}(i)] / \ln[N_{WT}(f)/N_{WT}(i)]$, where $N_{C}(i)$ and $N_{C}(f)$ represent the initial and final density of the constructed clone, respectively, and $N_{WT}(i)$ and $N_{WT}(f)$ represent the initial and final density of the original wild-type clone, respectively. All assays were carried out with at least threefold replication. In some natural isolate strains, the introduced Ara– marker was not neutral. In these cases, we normalized relative fitness estimates to account for the marker effect. Control competitions to account for marker cost were carried out at the same time as competitions to measure the effect of the introduced mutation.
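The relative fitness calculation described above can be written as a short function. This is a minimal sketch: the function and argument names are hypothetical, and the densities in the example are illustrative values, not measurements from this study. The marker correction is not shown because its exact form is not specified here.

```python
import math


def relative_fitness(nc_initial, nc_final, nwt_initial, nwt_final):
    """Ratio of the log fold changes of the two competitors over one cycle.

    nc_*  -- initial and final density of the constructed clone
    nwt_* -- initial and final density of the original wild-type clone
    Densities must be positive and in the same units for both time points.
    """
    growth_constructed = math.log(nc_final / nc_initial)
    growth_wild_type = math.log(nwt_final / nwt_initial)
    return growth_constructed / growth_wild_type


# Illustrative densities (hypothetical): the wild type grows 100-fold and the
# constructed clone grows 110-fold during the same competition.
example_w = relative_fitness(100, 11000, 100, 10000)
```

A value of 1 indicates that the two competitors grew equally well; values above 1 indicate a benefit of the introduced mutation, and values below 1 indicate a cost.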

Genome Sequencing.

Genomic DNA was purified using Qiagen DNeasy Blood and Tissue and Promega Wizard Genomic DNA Purification kits. Genomic DNA was eluted in nuclease-free water. The concentration of dsDNA was determined using PicoGreen. Quantified DNA was prepared for multiplexed sequencing, using Illumina Nextera kits following a modified protocol (62). In most cases, raw reads were processed using Trimmomatic to remove Nextera PE adapter sequences, and quality control was performed on all sequences with fastQC (63, 64). Reads were assembled into contigs using VELVET and contigs ordered using MAUVE following the suggested approaches outlined in ref. 65.

Phylogeny Construction.

Core genes (“panorthologous” genes shared across all recipient strains) and accessory genes (genes shared among a subset of recipient strains) were identified using a previously described pipeline (66). Briefly, the pipeline first identifies putative core genes using National Center for Biotechnology Information BLASTP (release 2.2.16) to analyze the total pool of genes for sequence similarity. Homologs were identified as those gene pairs that had BLAST hits in both directions within a bit score threshold scaled by the bit score of the self-hit of the query gene. Pairs of homologous genes were grouped into families and panorthologs identified as genes from homolog families with exactly one gene from each genome. We scanned bit score thresholds of 0.1–0.9, choosing a value of 0.5, which maximized the number of panorthologs that were identified. This analysis identified a total of 1,648 panorthologs across the 27 genomes that were considered. A total of 7,017 accessory genes were identified as being present in at least one, but not all, strains. Adding subsequent stringency filters to the panortholog gene set to require a limit of <20 or <5 amino acid differences from the consensus sequence in the trimmed alignments of each gene across all genomes reduced the size of the core genome to 1,560 or 1,442 genes, respectively, but did not qualitatively affect the outcome of any subsequent analysis.

An independent approach to identify core and accessory genome regions on the basis of shared DNA sequence windows was implemented in PANSEQ (55). Core regions were defined as regions of 250 bp in an arbitrary reference strain that were present at >80% identity in all other strains. A phylogeny was built from the core genome by concatenating core regions for each strain and performing a multiple sequence alignment. Variable sites in this alignment were extracted as a SNP file. We used PANSEQ to identify core and accessory genomes for both the 28 strains used in our experiments (27 recipients and 1 donor) and for the broader collection of 99 strains described in Bacterial Strains. The PANSEQ analysis yielded relationships between strains that were largely unchanged from those determined by the whole gene analyses (55) (Fig. S7). Core and accessory genomes determined using PANSEQ were used for all analyses presented here.

Core and accessory genomes were used to build phylogenies that were used to test for a phylogenetic signal in determining the effect of mutations introduced to the different recipient strains. PhyML was used to build a maximum likelihood tree of the core genome. For the accessory genome, a binary input file indicating the presence/absence of each accessory gene in each strain was analyzed using default parameters of PARS in PHYLIP (56).

Biolog Respiration Profile.

The respiration profile of strains was measured using Biolog PM1 plates. These plates allow estimation of the respiratory activity of each strain on each of 95 distinct substrates (Biolog). Before growth in Biolog plates, each strain was grown overnight in lysogeny broth, then concentrated by centrifugation, and resuspended in PBS to an OD600 of 0.372. An aliquot of these cells was mixed with inoculating fluid containing a dye and transferred to each well of a PM1 plate. The plate was incubated at 37 °C, and its OD562 was measured at 2-h intervals until 12 h, 4-h intervals until 24 h, and then at 48 h. Respiration in each substrate was quantified as the area under the curve. A neighbor-joining tree was constructed using Biolog data using the program Neighbor in PHYLIP (56).
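As an illustration of the area-under-the-curve summary, the sketch below integrates optical density readings over the uneven sampling schedule described above. The trapezoidal rule and the reading at 0 h are assumptions; the text specifies only that respiration was quantified as the area under the curve. The names `area_under_curve` and `READ_TIMES_H` are hypothetical.

```python
def area_under_curve(times_h, od_values):
    """Integrate OD readings over time using the trapezoidal rule."""
    if len(times_h) != len(od_values):
        raise ValueError("times_h and od_values must have the same length")
    area = 0.0
    for i in range(1, len(times_h)):
        interval = times_h[i] - times_h[i - 1]
        # Average of the two readings times the interval width.
        area += interval * (od_values[i] + od_values[i - 1]) / 2.0
    return area


# Sampling schedule from the text: every 2 h until 12 h, every 4 h until 24 h,
# then once at 48 h. The reading at 0 h is an assumption of this sketch.
READ_TIMES_H = [0, 2, 4, 6, 8, 10, 12, 16, 20, 24, 48]
```

Because the final interval (24 to 48 h) is much wider than the others, late readings carry more weight in the integral than early ones under this scheme.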

Growth Rate Analysis.

The maximum growth rate of each progenitor strain was estimated by growing strains in 96-well plates containing DM medium supplemented with 500 μg/mL glucose and measuring changes in OD450 in a VersaMax plate reader (Molecular Devices). All strains were preconditioned in the assay media for two growth cycles before estimation of their growth rate. The higher concentration of glucose used in these assays compared with the competition assays was necessary for strains to achieve sufficient growth for reliable spectrophotometric detection. A custom R script was used to estimate the maximum growth rate of each strain.
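The custom R script is not reproduced here. One common way to estimate a maximum growth rate from plate reader data is to fit a line to log-transformed OD within a sliding window and take the steepest slope; the sketch below assumes that approach, assumes blank-corrected positive OD values, and uses a hypothetical function name and window size.

```python
import math


def max_growth_rate(times_h, od_values, window=4):
    """Steepest slope of ln(OD) versus time over all windows of `window` points.

    Assumes OD values are blank-corrected and strictly positive.
    """
    if len(times_h) != len(od_values):
        raise ValueError("times_h and od_values must have the same length")
    if window < 2 or len(times_h) < window:
        raise ValueError("need at least `window` (>= 2) readings")

    log_od = [math.log(value) for value in od_values]
    best_slope = float("-inf")
    for start in range(len(times_h) - window + 1):
        t = times_h[start:start + window]
        y = log_od[start:start + window]
        t_mean = sum(t) / window
        y_mean = sum(y) / window
        # Ordinary least-squares slope within the window.
        sxx = sum((ti - t_mean) ** 2 for ti in t)
        sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
        best_slope = max(best_slope, sxy / sxx)
    return best_slope
```

The returned value has units of per hour when `times_h` is in hours.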

Statistical Analysis.

All analyses were carried out in R (version 3.1.1) (67). The nlme package was used to perform nonlinear regressions. Recipient strains and introduced mutations were treated as fixed effects because our focus was on identifying and explaining mutation–genetic background interactions between the specific strains and mutations considered here. We use partial least squares (PLS) regression to account for correlation between combinations of variables tested to explain the effect of mutations introduced into recipient strains. Whereas principal component regression seeks to determine orthogonal combinations of variables to maximize the amount of variation explained in those variables, PLS determines combinations of variables that maximize the amount of variation in the response explained. PLS regression was implemented using the package Pls. Phylogenetic generalized least squares regression analyses were performed using the functions gls and nlme in the packages Ape and Nlme, respectively. Four phylogenetic correlation matrix models were considered: (i) Brownian, which assumes that correlation of trait values is determined by shared evolutionary history; (ii) Pagel, a derivative of the Brownian model where trait correlations are scaled by a constant between 0 and 1, indicating the influence of shared, relative to independent, evolutionary history; (iii) Ornstein–Uhlenbeck, a derivative of the Brownian expectation where traits are “elastic,” having a tendency to return to a mean value; and (iv) Blomberg, a derivative of the Brownian model where the rate of trait evolution can change over time. Tests for phylogenetic signal were performed using the following functions: multiPhylosignal in the package Picante (Blomberg’s K), phylosig in the package Phytools (Pagel’s λ and Blomberg’s K), and abouheif.moran in the package Adephylo (Moran’s I and Cmean).
The functions pd.calc and pd.bootstrap in the package Caper were used to compare the distance separating the strains used here to a distribution of distances between 1,000 randomly chosen sets of the same number of strains from the 96 sequenced strains contained in our overall phylogeny. For each mutation, we use the strain in which it occurred as the reference point for genetic and ecological distance regression analyses (12) (Table S1—rbs, REL606; topA, TC720; spoT, TC960; pykF, TC941). All attributes of recipient strains are presented in Table S3.
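To make the Pagel correlation model listed above concrete, the sketch below applies the scaling it describes: covariances between strains (the off-diagonal entries of a Brownian covariance matrix) are multiplied by a constant λ between 0 and 1, while each strain's own variance is left unchanged. This is an illustration only, not the code used in the analyses; the function name and the example matrix are hypothetical.

```python
def pagel_lambda_transform(brownian_cov, lam):
    """Scale off-diagonal entries of a Brownian covariance matrix by lambda.

    lam = 1 keeps the Brownian expectation; lam = 0 removes all covariance
    due to shared history, so strains are treated as independent.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie between 0 and 1")
    n = len(brownian_cov)
    return [
        [brownian_cov[i][j] if i == j else lam * brownian_cov[i][j]
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical two-strain example: 0.6 units of shared branch length.
example_cov = [[1.0, 0.6], [0.6, 1.0]]
half_signal = pagel_lambda_transform(example_cov, 0.5)
```

Intermediate values of λ therefore express how much of the trait correlation is attributed to shared, relative to independent, evolutionary history.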

Discussion

Studies that have examined the effect of mutation interactions on potentially beneficial mutations have typically examined interactions that occur between a small set of focal mutations in the context of the same strain background (11, 12, 14, but see refs. 15, 35). We have extended this approach by estimating the effect of interactions between the broader genetic background and four mutations selected because they conferred a benefit in a population that was part of a long-term evolution experiment. We find that epistasis has a major influence on mutational effects. These effects depended strongly on the fitness of a recipient strain, following a pattern of diminishing returns epistasis, where the benefit of introducing a mutation declined with the fitness of the strain to which it was added. By contrast, mutational effects depend only weakly on genetic and ecological relationships between recipient strains.

We did not find a consistent phylogenetic signal in the effect of the mutations we considered, indicating that the genetic basis of interactions between mutations and recipient genetic backgrounds was not well predicted by the genetic relationships used to construct our core or accessory genome phylogenies. One explanation for this finding is that there may be a small number of genes, or perhaps just sites in those genes, that interact with the transferred mutations and similarity of these genes/sites is not represented by genome-level phylogenies (36). However, our finding that mutation effects can be explained by the fitness of recipient strains, which itself does not have a strong phylogenetic signal, suggests an alternative explanation: that mutation effects are determined in large part by interactions occurring at the scale of global processes.

Mutational interactions clearly affect the specific genetic and fitness trajectory a population can follow, and recent findings of general patterns of interactions suggest that they may cause aspects of adaptive trajectories to be predictable (16, 37, 38). Several studies have found a trend toward diminishing returns epistasis between focal beneficial mutations (11, 12, 23). That is, the benefit conferred by a mutation tends to decline as it is added to more fit backgrounds. This kind of interaction can explain patterns of fitness increase and mutation accumulation in a well-studied laboratory-evolved population (16, 37, 38). Our finding of diminishing returns epistasis in the effect of selected mutations added to divergent natural isolate strains supports the idea that mutation effects often depend on some global attribute of recipient strains, rather than being determined idiosyncratically by their specific genetic background.

What could account for a strong relationship between genotype growth rate and the benefit conferred by a transferred mutation? Several studies have proposed a mechanistic basis of mutation interactions as they affect a specific enzyme or biochemical pathway (11, 32, 33, 39, 40). The bases of the beneficial effects of the four mutations we consider are not known, so we cannot offer any specific explanation that takes into account the particular mutations and backgrounds used in our study. Nevertheless, it is worthwhile to consider a general explanation that follows from the principles of metabolic control theory: that a key target of selection has a saturating form such that further improvements confer diminishing benefits. A candidate target process is translation, which constitutes a major energetic investment for bacterial cells. Cells must balance the need for sufficient translational capacity to allow expression of necessary genes, while minimizing wasteful investment in unused capacity (41). Translation provides a means to mediate interactions between seemingly disparate mutations. Consider, for example, three of the mutations examined in this work: deletion of rbs genes might provide an advantage by reducing energy expenditure and ribosome allocation to unnecessary gene expression; topA impacts gene expression by altering DNA supercoiling, thereby altering promoter activity and reallocating translational resources; and spoT changes gene expression patterns in a way consistent with an effect on translational activity (42).

Interactions between mutations and genetic backgrounds also impact the adaptation of populations by influencing their ability to benefit from horizontally transferred mutations. Transfer of DNA that replaces a corresponding piece of the recipient genome through homologous recombination is common in many bacteria (43), including E. coli (26, 27, 44), and can be a major determinant of adaptation (45, 46). Horizontal transfer disconnects a beneficial mutation from the particular genetic background in which it is initially selected, so that epistatic interactions can determine both its own fate and that of recipient lineages. This process is clearly relevant to the nature of bacterial species. One bacterial species concept proposes that ecotypes—groups of genotypes that have similar ecological distribution and capability—can be recognized by being subject to purifying selective sweeps (46). Clonal selective sweeps of entire haplotypes occurring within ecotypes can create a genetic cohesion that identifies a species grouping. Genetic interactions between horizontally transferred beneficial mutations and recipient genetic backgrounds can complicate this expectation. If beneficial mutations that arise in a focal strain tend to have similar effects in the strains that make up an ecotype, their transfer might slow clonal selective sweeps within ecotypes, instead allowing local genetic sweeps in the vicinity of the mutation and making it harder to recognize an ecotype as a genotypic cluster (47, 48). If a pattern of diminishing returns epistasis similar to the one found here applies in natural environments, this effect could be exaggerated, as less fit lineages benefit more than more fit lineages from transferred mutations, slowing clonal selective sweeps and at least transiently acting to increase ecotype genetic diversity. 
If a transferred mutation has beneficial effects across multiple ecotypes, its increase in frequency could produce a pattern of population-level genetic change that is hard to interpret with reference to ecotypes (49).

E. coli does appear to have some incipient ecotype structure, being composed of at least six clades (known as phylogroups) that are associated with a degree of environmental specialization (26, 50). The fact that these groups did not explain any of the variation in the fitness effects of our transferred mutations indicates the potential for gene transfer and subsequent selection to homogenize phylogroups. However, genetic diversity between sequenced strains is consistent with horizontal gene transfer being greater within than between phylogroups, and strongly selected lineages appear to have disproportionately low levels of gene transfer from other strains (27, 51). Explanations for the apparent discrepancy between potential and realized mutation transfer include a lower rate of successful gene transmission, perhaps due to lower rates of homologous recombination between more divergent strains, or ecologically mediated physical isolation of strains belonging to different phylogroups.

Our results provide direct evidence that the effects of beneficial mutations vary dramatically between divergent strains of the same bacterial species. These effects are more strongly predicted by the initial fitness of a recipient strain than by genetic or ecological similarity. This result supports an accumulating body of work consistent with an important role of global attributes, rather than specific genetic interactions, in mediating mutational effects, whereby more fit strains benefit less from a new beneficial mutation than do less fit strains. Our findings strengthen the hope of identifying general patterns of mutation interactions that lead to some predictability of adaptive evolution.

Materials and Methods

Beneficial Mutations and Strain Construction.

As well as the four beneficial mutations, a mutation conferring an Ara– phenotype was added into each natural isolate strain. In competition experiments, this marker allowed us to distinguish progenitor from constructed strains on tetrazolium arabinose (TA) indicator media. Details of strain constructions are presented in SI Materials and Methods.

Bacterial Strains and Fitness Estimates.

Natural isolate recipient strains were chosen from a collection of strains sequenced as part of a Broad Institute project and from strains used in a previous study (www.broadinstitute.org/) (52) (Table S1). We sought to focus on intergene epistasis by choosing recipients that had the same amino acid sequence at the focal transferred gene as the donor REL606 strain did. For this reason, the identity of potential recipient strains differed for the mutations in spoT and topA. The rbs and pykF mutations are large deletions that result in loss of function of target gene(s). In these cases, intragene interactions are not relevant. The strains that were successfully constructed are a subset of the potential recipients because we were unable to successfully add either the focal beneficial mutation or the araA marker mutation into some target strains, or secondary mutations were detected during control screens. Inability to introduce a focal mutation to a strain could indicate a lethal genetic interaction, but this was the case for only one mutation–recipient strain combination, so any potential bias is unlikely to be strong. Fitness effects were estimated as the ratio of Malthusian parameters (53). An alternative recommended fitness estimate, the selection rate constant calculated as the difference of Malthusian parameters, is strongly correlated (54). Raw competition counts are available from the associated Dryad Repository.
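The two fitness estimates mentioned above can be sketched as follows. The sketch assumes the usual definition of a Malthusian parameter as the natural log of the ratio of final to initial population size over the competition period; function names and the example counts are hypothetical, and no data from this study are used.

```python
import math


def malthusian_parameter(initial_count, final_count):
    """Realized growth over the competition: ln(final / initial)."""
    return math.log(final_count / initial_count)


def relative_fitness(test_initial, test_final, ref_initial, ref_final):
    """Ratio of the Malthusian parameters of test and reference competitors."""
    return (malthusian_parameter(test_initial, test_final)
            / malthusian_parameter(ref_initial, ref_final))


def selection_rate(test_initial, test_final, ref_initial, ref_final):
    """Difference of the Malthusian parameters of test and reference competitors."""
    return (malthusian_parameter(test_initial, test_final)
            - malthusian_parameter(ref_initial, ref_final))
```

The ratio is undefined when the reference competitor shows no net growth, whereas the difference remains defined in that case.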

Genetic and Diet Profile Phylogeny Construction.

Core and accessory genomes were estimated using whole genes and 250 base windows, identified using PANSEQ (55), as the unit of comparison. These approaches gave similar results, and PANSEQ results are used here (Fig. S7). Core and accessory genomes were used to build phylogenies that were then used to test for a phylogenetic signal in the effect of mutations introduced to the different recipient strains. PhyML was used to build a maximum likelihood tree of the core genome. For the accessory genome, a binary input file indicating the presence/absence of each accessory gene in each strain was analyzed using default parameters of PARS in PHYLIP (56). The diet profile of strains was measured using Biolog PM1 plates. These plates allow estimation of a combination of growth rate and respiratory activity of each strain on each of 95 distinct substrates (Biolog). A neighbor-joining tree was constructed using Biolog data using the program Neighbor in PHYLIP (56).

Comparison of phylogenies constructed from different approaches (“gene,” whole gene, and “panseq,” DNA windows of size 250 bp) and different datasets (“core” and “accessory” genomes). Approaches and datasets are described in SI Materials and Methods. Mantel tests were used to compare phylogenies. Significant correlations indicate that phylogenies have a higher degree of congruence than expected by chance.
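For readers unfamiliar with the Mantel test, a minimal permutation-based sketch is given below. It assumes that each phylogeny has been summarized as a symmetric matrix of pairwise distances between the same strains in the same order, and it uses a Pearson correlation with a one-sided permutation P value. The software and settings actually used are not stated in this section; the function names, permutation count, and seed are hypothetical.

```python
import random


def _upper_triangle(matrix, order):
    """Off-diagonal distances (i < j) after relabeling strains by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    syy = sum((b - y_mean) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """Return (correlation, one-sided permutation P value)."""
    n = len(dist_a)
    identity = list(range(n))
    x = _upper_triangle(dist_a, identity)
    observed = _pearson(x, _upper_triangle(dist_b, identity))

    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute strain labels of the second matrix
        if _pearson(x, _upper_triangle(dist_b, order)) >= observed:
            at_least_as_large += 1
    # The observed arrangement is counted as one of the permutations.
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return observed, p_value
```

Permuting strain labels, rather than individual distances, preserves the dependence among entries that share a strain, which is why the test is suited to comparing distance matrices.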

Acknowledgments

This work was funded by National Science Foundation Grants DEB-0844355 (to T.F.C., V.S.C., and F.B.-G.M.) and DEB-1253650 (to T.F.C.).
