Abstract

Background

Genetic interactions pervade every aspect of biology, from evolutionary theory, where they determine the accessibility of evolutionary paths, to medicine, where they can contribute to complex genetic diseases. Until very recently, studies on epistatic interactions have been based on a handful of mutations, providing at best anecdotal evidence about the frequency and the typical strength of genetic interactions. In this study, we analyze a publicly available dataset that contains the growth rates of over five million double knockout mutants of the yeast Saccharomyces cerevisiae.

Results

We discuss a geometric definition of epistasis that reveals a simple and surprisingly weak scaling law for the characteristic strength of genetic interactions as a function of the effects of the mutations being combined. We then utilized this scaling to quantify the roughness of naturally occurring fitness landscapes. Finally, we show how the observed roughness differs from what is predicted by Fisher's geometric model of epistasis, and discuss the consequences for evolutionary dynamics.

Conclusions

Although epistatic interactions between specific genes remain largely unpredictable, the statistical properties of an ensemble of interactions can display conspicuous regularities and be described by simple mathematical laws. By exploiting the amount of data produced by modern high-throughput techniques, it is now possible to thoroughly test the predictions of theoretical models of genetic interactions and to build informed computational models of evolution on realistic fitness landscapes.

Keywords

EpistasisEvolutionFitness landscapesGenetic interactionsYeast

Background

Genetic interactions [1] have shaped the evolutionary history of life on earth. They have been found to limit the accessibility of evolutionary paths [2], to confine populations to suboptimal evolutionary states and, on larger time scales, to control the rate of speciation [3]. Epistatic interactions can also be relevant to the development of complex human diseases such as diabetes [4]. Complex traits and diseases are determined by a multiplicity of genomic loci [5], whose independent effects and interactions [6] are often necessary to understand the phenotype of interest. Despite the broad implications of epistatic interactions, a quantitative characterization of their typical strength is still lacking. In this study, we consider growth rate in yeast as an example of a complex trait modulated by genetic interactions.

Previous studies [7–10] on the relation between the growth effects of a mutation and its epistatic interactions have often been based on a handful of mutations, and only in recent years has anecdotal evidence started being replaced by robust statements based on large data sets. Perhaps the most impressive of these datasets is the one made publicly available with the publication of the article entitled 'The genetic landscape of a cell' by Costanzo et al. [11]. The genome of the budding yeast Saccharomyce cerevisiae includes approximately 6,000 genes, about 1,000 of which are essential. Viable mutants can be constructed by knocking out any of the approximately 5,000 non-essential genes, by reducing the expression of the essential genes, or by partially compromising the functionality of the gene products. The dataset (see Additional file 1, Figure S1) has been compiled with the growth rates of about 5.4 million double knockout mutants, a sizable fraction of all possible double knockout mutants in yeast. Supported by the Costanzo et al. dataset, we consider the fundamental question of whether mutations with larger effects have stronger genetic interactions.

Results and discussion

An unbiased definition of genetic interactions

A basic approach to study genetic interactions is to consider two mutations with known effects on a quantitative trait, and to measure their combined effect in the double mutant [12]. Given [11, 13] the growth rates of a wild type S. cerevisiae strain (g00 = 1) and of two single knockout mutants (g01 and g10), the growth rate of the double knockout mutant (g11) is adequately predicted by a multiplicative null model:

g11/g00=g01/g00g10/g00.

Equivalently, defining 'log growth' as the logarithm of the relative growth rate,

G=log2g/g00,

the log growth of the double knockout mutant is predicted by an additive null model (Figure 1a):

Figure 1

The log growth rates of two mutations combine additively. (a) The average effect of a double knockout (G11) as a function of the effects of the single knockouts (G01 and G10) is G11 = G01 + G10. Experimental mean +/- standard deviation (blue line and blue shaded area) and prediction of the additive null model (red line). (b) Given two mutations, there are four possible mutants with their corresponding log growth rates (black dots). If three of the four log growth rates are known, the fourth one can be predicted by a linear extrapolation (red plane), and epistasis can be defined as the linear deviation from such prediction (red arrow). The magnitude of the deviation is the same regardless of which three of four mutants are chosen.

G11=G01+G10.

Epistatic interactions are identified as deviations from the null model, but several non-equivalent alternatives exist for quantifying these deviations [14]. The most common definition of epistasis considers the difference between the measured and the predicted growth rates for the double knockout mutant [11]:

e=g11g00-g01g00g10g00

Importantly, this definition of e subtly constrains the possible values of epistasis. In fact, when combining very deleterious mutations, e cannot be large and negative even when the double knockout mutant is a synthetic lethal mutant:

e=0-g01/g00g10/g00≈0,ifg01<<g00andg10<<g00.

In order to avoid a priori constraints on the intensity of epistasis, genetic interactions can be defined as the ratio between the measured and predicted relative growth rates, leading to:

E=log2g11g00-log2g01g00-log2g10g00.

As an example, E = +1 indicates a double mutant whose growth rate is twice as large as would be expected based upon the multiplicative null model, whereas E = -1 indicates a double mutant whose growth rate is half as large as predicted. This definition of epistasis as fold deviation in the multiplicative model for growth rates is equivalent to a natural definition of epistasis as linear deviation in the additive model for log growth rates (Figure 1b):

E=(G00+G11)-G01+G10=(G11-G01-G10).

A second bias of the common definition of epistasis is that e depends on the choice of which genotype is labeled as 'wild type' or '00', a choice which is always arbitrary, but more obviously so when studying engineered organisms or populations evolving in alternating environments [15]. By contrast,

E=G00+G11-G01+G10

depends only on which pair of genes is considered, being a geometric measure for the 'curvature' of the fitness landscape (Figure 1b).

The definition of E has found some favor in the theoretical literature [7, 16], but it is not routinely used to analyze experimental data apart from rare exceptions [8, 17]. Its main drawback is that synthetic lethals have a log growth rate of -∞, and require a separate although simpler analysis in which lethal interactions can simply be counted. The definition of E proves instead to be extremely valuable when quantifying the strength of non-lethal genetic interactions.

Epistatic interactions scale weakly with mutational effects

With the appropriate definition of epistasis, a simple relation between the growth rate effects of two mutations and the expected strength of their interaction emerges.

Let us consider two groups of mutations; in the first group, all mutations have log growth effect G01, and in the second group, all mutations have log growth effect G10. We can then build all possible double mutants obtained by combining one mutation from each group. In the absence of epistasis, all the double mutants have a log growth rate

G11=G01+G10,

and the distribution of genetic interactions is sharply peaked at E = 0. When epistasis is present, the distribution of genetic interactions has, in general, non-zero mean and standard deviation. Experimentally, however, the mean of genetic interactions is close to zero (this is why the null model remains approximately valid) (Figure 1a; Figure 2d). Even when the mean interaction is vanishing, the difference between the experimental dataset and the ideal case without interactions can be quantified by the finite value of the experimental standard deviation σ(G01, G10), which provides a numerical estimate for the characteristic strength of epistatic interactions.

Figure 2

The strength of epistatic interactions scales with the log growth effects of the interacting knockouts. (a) Each dot represents the variance of several thousand epistatic interactions binned according to the log growth effects of the two single knockouts, G01 and G10. The blue surface is the phenomenological fit:varG01,G10=0.079×2G01G10/G01+|G10|.(b) Slices of the plot in (a) for G01 = constant. The dots are the same as in (a), and the solid lines represent the corresponding slice of the one-parameter fitting surface. (c) Diagonal slice of the plot in (a) with finer bins (G01 = G10 within 20%, G = mean(G01, G10)). The blue shaded area is the 25 to 75% confidence interval computed by bootstrap; the red line (var(G, G) = 0.079 G) is computed from the phenomenological model, and the dashed gray line, for which var(G, G) is proportional to G2, represents the lower bound to the slope predicted by the Fisher's geometric model. (c, inset) The epistatic interactions between beneficial mutations are vanishingly small, independently of the effect of the combined mutations. (d) Probability density functions p(E') for the strength of genetic interactions between two deleterious knockouts with similar log growth effects. Different colors correspond to knockouts with different effects: the growth rates effects of the single knockouts being combined are close to -38% (red), -22% (yellow), -12% (green), -6% (blue), and -3% (purple). Each curve has been rescaled so that all distributions have a standard deviation = 1. The left tail of the distributions displays a fat tail, describing the occurrence of strong negative genetic interactions (for comparison, the dashed-dotted black line is a normal distribution).

In order to produce reliable numerical results, thousands of growth rates are necessary to characterize the probability distribution of epistasis. We analyzed the Costanzo et al. dataset by binning pairs of mutations according to the log growth effects of their single knockouts G01 and G10, using the method described above to outline the probability distribution of epistasis. We chose bin sizes that grow exponentially with G in order to ensure an approximately constant number of data points in each bin (see Materials and Methods; see Additional file 1, Figure S2). Most bins contain from thousands to tens of thousands of data points. For each bin, we computed

var(EG01,G10),

that is, the variance of the random variable E relative to the bin labeled by growth rates G01 and G10. In the rest of the paper we will refer to such variance as var(G01, G10), emphasizing that the variance in the strength of epistatic interactions is, eventually, a function of G01 and G10 (Figure 2a). The square root of the variance, σ(G01, G10), then represents the expected strength of epistasis as a function of the independently varying effects of the two single knockouts. A natural expectation for the dependence of epistasis on the effect of the combined mutations comes from rescaling Figure 1a; if all the log growth effects of single and double knockouts increase by a factor of two, then the strength of epistasis should also increase by a factor of two. Unexpectedly, however, when combining deleterious mutations, the strength of epistatic interactions does grow with the effects of the mutations that are combined, but the dependence is much weaker; when the effect of both single knockouts is doubled, the strength of epistasis increases only by a factor of √2 (Figure 2).

In more detail, we observed that if the effect of the first knockout (G01) is held constant, the dependence of the variance of epistasis on the effect of the second knockout (G10) is well approximated by a Michaelis-Menten law (Figure 2b):

var(G10)=v|G10|K+|G10|.

When the effects of both knockouts are free to vary, the requirement that the variance is a symmetric function of its two variables, G01 and G10, implies that K = |G01| and that v is proportional to G01. A one-parameter function which fits the seen variance over the whole range of deleterious fitness effects (Figure 2a) is then:

varG01,G10=2c|G01G10||G01|+|G10|with c=0.079.

This functional form can also be obtained from a simple model based on diffusion in fitness space (see Additional file 1, Supplementary text 1). An even simpler phenomenological fit, although slightly less accurate, is:

varG01,G10=c√(G01G10)

(see Additional file 1, Figure S3). Importantly, these functions capture two major features of the data; first, epistasis vanishes when G01 or G10 = 0; second, when the effects of the two knockouts are similar (G01 = G10 = G along the diagonal of the surface in Figure 2a), the variance of epistasis is approximately proportional to G (Figure 2c):

var(G01,G10)=cG.

The scaling described above is seen only for deleterious knockouts. When combining the beneficial knockouts in the dataset instead, the strength of epistasis is close to zero (Figure 2c, inset). This might be because the slightly beneficial knockouts are not adaptive mutations, but simply remove genes that are not needed in the conditions chosen for the experiment, so that their interactions are likely to be negligible. However, in apparent contrast to this observation, recent studies [8, 18] on adaptive mutations in Escherichia coli suggest that genetic interactions between adaptive mutations are mostly negative. In fact, during adaptation, the prevalence of negative interactions is likely to be caused by biased sampling, because the mutations that fix in the population are likely to be the ones that solve environmental or biological challenges for an organism. Diminishing returns arise because the appearance of multiple 'solutions' to the same challenge is not necessarily preferable over the presence of a single solution. Rather than focusing on mutations that fix during a bout of adaptation, the Costanzo et al. dataset includes a large fraction of all possible pairs of genes in the yeast genome. Because for most pairs the two genes are involved in unrelated biological processes, interactions are often vanishingly small. We did observe, however, that the distribution of epistatic interactions is asymmetric, with a heavy tail of deleterious interactions (Figure 2d).

Experimental uncertainty generates spurious epistatic interactions

When inferring genetic interactions from experimental data, it is important to take into account that each measured growth rate is affected by some uncertainty, and that measurement errors in the growth rates could erroneously be interpreted as genetic interactions. Importantly, for each single and double mutant, the Costanzo et al. dataset provides the mean growth rate together with its estimated experimental uncertainty (the growth rate of each mutant being measured at least four times).

In order to quantify the effect of the experimental uncertainty on the inferred epistatic interactions, we constructed a number of mock datasets, assuming that the null model without epistatic interactions described biology exactly. In these datasets, each single knockout had the same growth rate as in the original dataset, and each double knockout had a growth rate equal to the product of the relative growth rates of the corresponding single knockouts. We then randomized the mock datasets by shifting each growth rate by a random amount sampled from a Student's t-distribution, with width depending on the corresponding experimental uncertainty reported in the original dataset (see Additional file 1, Supplementary text 3). As expected, analysis of these 'noisy' datasets revealed some epistasis, clearly caused by our addition of experimental noise rather than by any biological mechanism. We found that for pairs involving beneficial or neutral mutations, the variance computed in the mock datasets was comparable to or even greater than the variance observed in the original dataset (Figure 3a, black curves; Figure 3b, blue regions). This fact provides an important internal control, suggesting that the experimental noise has not been underestimated. In spite of this, for pairs of knockouts with substantially deleterious effects, experimental noise accounted for less than half of the total observed variance, with the rest representing genuine biological interactions (Figure 3a, red curves; Figure 3b, red regions).

Figure 3

Experimental noise does not account for all of the observed variance of epistasis. (a) Comparison of experimentally measured variance (solid lines; shaded areas: 25 to 75% confidence intervals) and variance caused by experimental noise (dashed lines). If one of the two mutations is neutral, noise accounts for all of the observed variance (black). When deleterious mutations are combined, noise accounts for less than half of the observed variance (red, G01 ≈ -0.7). (b) Ratio between total observed variance and noise-generated variance as a function of the log growth of the knockouts being combined. For deleterious knockouts, the ratio can be significantly greater than 1.

We then decomposed the variance observed in the original dataset into a contribution produced by experimental uncertainty and a contribution of biological origin; the strength of epistatic interactions was finally computed as the square root of the biological part of the variance. For deleterious knockouts, the relative difference between epistasis computed from the raw data and from the data after subtracting the experimental noise was less than 30%, emphasizing the significant but not overwhelming contribution of experimental noise to the observed variability. Figure 2(a-c) represents the 'biological' part of the observed epistasis; before subtracting the contribution of the experimental uncertainty, the plots are qualitatively similar, but quantitatively slightly different (see Additional file 1, Figure S4). Importantly, because variances are additive, the estimated contribution of the experimental uncertainty to epistasis is largely independent of the choice of the statistical distribution used to model experimental uncertainty. In two instances, however, the unknown details of the full distribution of experimental noise are important; when outlining the distribution of epistatic interactions (Figure 2d) and when describing the probability to observe sign epistasis (Figure 4b). In those two figures, we plotted the raw data, and did not attempt to deconvolve the contribution of experimental uncertainty.

Figure 4

Sign epistasis is less likely to occur between mutations with large effects. (a) Examples of a smooth landscape with paths of monotonically increasing fitness (left) and a rugged landscape characterized by reciprocal sign epistasis (right). (b) Experimentally measured probability of observing sign epistasis as a function of the log growth of two single knockouts with similar effects (G01 = G10 within 20%, G = mean(G01, G10)). The blue shaded area is the standard error of the mean computed by bootstrap.

Comparison between theory and experiment

The scaling of epistasis observed in the Costanzo et al. dataset (Figure 2) is in sharp contrast to the predictions of Fisher's geometric model [19], a popular model of epistasis in which genetic interactions emerge from geometry. As we saw, when the effects of the two knockouts are similar (G01 = G10 = G), the variance of epistasis is approximately proportional to G. By contrast, in the Fisher's model, the variance var(G, G) would grow faster than G2 (Figure 2c; see Additional file 1, Supplementary text 2), a much stronger dependence than the linear dependence observed experimentally.

A concrete numerical example can highlight the importance of the weaker-than-expected scaling of epistasis described in this study. Let us consider two gene knockouts, each of which reduces the relative growth rate by 5%, from 1.0 to 0.95. According to the multiplicative null model, the growth rate of the double knockout will be approximately 0.952, or approximately 0.90. The questions now are: What kind of deviations could be expected around 0.90? Would a growth rate of 0.85 be surprising? What about a growth rate of 0.50?. Let us use the analytic fit discussed in the previous section

g01=g10=0.95,

Then

G01=G10=log20.95=-0.074,

G11=G01+G10=-0.148,

and

σG01,G10=0.076.

A +/- one standard deviation interval for the growth rate of the double knockout is then

2-0.148 -0.076,2-0.148 +0.076=0.86,0.95.

Notice that it is not unlikely that epistasis will cancel the effect of the second mutation, so that the growth rate of the double knockout mutant is greater than 0.95, that is, greater than the growth rate of either of the single knockout mutants.

Let us now consider two gene knockouts with stronger effects, each of which reduces the growth rate from 1.0 to 0.60. Then

G01=G10=log20.60=-0.737,

about 10 times as large as the log growth of the single mutants in the previous example. The Fisher's model would predict a σ(G, G) at least 10 times larger than in the previous example (σ(G, G)≥0.76), and an interval of likely growth rates for the double knockout mutants at least as large as

2- 1.47-0.76,2-1.47+0.76=0.213,0.610.

Notice how, once again, it is not unlikely that owing to genetic interactions, the growth rate of the double knockout mutant is greater than 0.60, the growth rate of either of the two single knockout mutants. The analytic model derived from the experimental data leads to a strikingly different conclusion:

σG01,G10=0.241,

and the +/- one standard deviation interval for the growth rate of the double knockout becomes

2-1.47-0.241,2-1.47+0.241=0.305,0.425.

In this case, a deviation from the null model that is greater than three standard deviations would be needed for the double knockout mutant to have a growth rate greater than that of the single knockout (0.60), making the event extremely unlikely.

Epistasis constrains the evolutionary dynamics

The previous section provided two examples of reciprocal sign epistasis, realized when two deleterious mutations produce a double mutant that is fitter than either of the two single mutants (Figure 4a). In those cases, a fitness valley limits the evolutionary accessibility of the fitter double mutant, and only on longer time scales may the simultaneous appearance of two mutations [20, 21] drive a population to the new local fitness maximum. In this context, the scaling behavior of epistasis is of great importance, because it determines the number and the topology of the evolutionarily accessible paths [2, 22, 23], ultimately affecting the possible outcomes of the evolutionary process.

In order to describe how epistasis shapes the naturally occurring fitness landscapes, let us consider S(G, G), the probability to observe sign epistasis when combining two mutations with similar growth rate effects, G. Here, S(G, G) depends on the typical interaction strength,

σG,G=√varG,G.

In particular, if σ(G, G) is proportional to G, then the probability of observing sign epistasis is independent of G. The Fisher's model implies a super-linear dependence of σ(G, G) on G, thus predicting a greater probability of observing sign epistasis among mutations with strong effects. Instead, if the scaling of σ(G, G) is proportional to √G (Figure 2), then sign epistasis is more likely to occur among mutations with small effects (Figure 3b). When the relative growth rate effects of the single knockouts are small (<2 to 3%), experimental uncertainty prevents us from pinpointing which pairs of genes are epistatic. This does not mean, however, that mutations with small effects do not interact. Assuming that the scaling of epistasis we measured directly for mutations with intermediate and large effects extends to mutations with small effects, a consequence of the observed scaling of epistasis is the roughening of the local fitness landscape in the proximity of an evolutionary optimum; when the fitness effects of available mutations become small [24], epistatic interactions become increasingly relevant [25, 26], reducing the accessibility of evolutionary paths and further slowing down the rate of adaptation [27, 28]. The evolutionary dynamics on correlated fitness landscapes [10, 29] with the realistic correlations described here certainly deserves further experimental and theoretical investigation.

The scaling of genetic interactions may be generic

To date, our analysis has been limited to interactions between entire gene knockouts. Although mutations with extreme effects on gene regulation and horizontal gene transfer are biologically relevant mechanisms for the removal or acquisition of whole genes at once, organisms explore possible genetic variants largely through the accumulation of single point mutations. The Costanzo et al. data et contains thousands of double mutants for which the first mutation is a gene knockout and the second mutation consists of one or more point mutations in a different gene, causing the gene product to misfold in a temperature-sensitive way. Although the distribution of growth rate effects for point mutations is different than for single gene knockouts (see Additional file 1, Figure S2), the statistics of genetic interactions are remarkably similar when combining two single knockouts and when combining a single knockout with a point mutation (Figure 5). A similar scaling is also seen for the epistatic interactions between single gene knockouts and decreased abundance by mRNA perturbation [30] (DamP) perturbations of a second gene (see Additional file 1 Figure S5). The analysis of these hybrid double mutants suggests that the statistics of the interactions between any two genetic perturbations are determined only by their growth rate effects [31], and not by their biological origin in terms of point mutations or gene knockouts.

Figure 5

Point mutations have similar epistatic interactions to those of entire gene knockouts. (a) Comparison between the variance observed in double gene knockout mutants (rainbow dots, same as in Figure 2a) and the variance observed in mixed double mutants generated by combining a gene knockout with point mutations in a different gene (black dots). (b) The red curve is the diagonal slice of the plot in (a) (G01 = G10 within 20%, G = mean(G01, G10)), and the red shaded area is the 25 to 75% confidence interval for the mixed double mutant variance. For comparison, the blue curves describe the variance for double gene knockouts as in Figure 2c. As in Figure 2c, the red line has equation var(G, G) = 0.079 G.

A comparison between different definitions of epistasis

Importantly, any quantitative result on epistasis is a consequence of how epistasis is defined. Of particular interest is how strong an epistatic interaction is deemed to be, based upon its ranking when compared with that of other pairs of mutations. Although the 'traditional' definition

e=g11/g00-g01/g00g10/g00

and the 'geometric' definition

E=G11-G01+G10

agree about the sets of positive and negative interactions, they assign different strengths and, more importantly, different rankings to the same pair of interacting mutations. As an example, if the Costanzo et al. dataset is analyzed using the 'traditional' definition of genetic interactions, then the linear dependence of var(G, G) on G in Figure 2c is replaced by an oddly non-monotonic dependence, displaying weaker interactions for pairs of genes with either very small or very large fitness effects (Figure 6a). As mentioned previously, this decrease in the inferred strength of epistatic interactions for very deleterious mutations is a mathematical consequence of the traditional definition of epistasis, rather than a property of genetic interactions. The same bias would lead us to conclude that genes with strong effects on growth are almost non-interacting (Figure 6b, red line). However, because previous studies have determined that essential genes partake in more interactions than do non-essential genes [32], it is also reasonable to expect that non-lethal genes with large growth effects are involved in more interactions than genes with small growth effects. Indeed, according to the 'geometric' definition of epistasis, the fraction of genes with which a gene interacts steadily increases with the growth rate effect of the gene (Figure 6b, blue line). By contrast, the traditional definition of epistasis, consistently assigns low rankings to interactions between genes with large growth rate defects, as confirmed by a further analysis comparing the two definitions of epistasis against interactions inferred from the Gene Ontology (GO) database [33] (see Additional file 1, Figure S6). According to the geometric definition of epistasis, genetic networks [34] are denser than expected not only among essential gene [32], but also among genes with large growth effects.

Figure 6

Comparison between the traditional and the geometric definitions of epistasis (eandE, respectively). (a) Figure equivalent to Figure 2c, using the traditional definition of epistasis. (b) The fraction of genes interacting with a specific gene is a function of the growth rate effect of such gene. Only the 10,000 most interacting pairs the geometric definition (blue) and the traditional definition (red) are considered to be interactions.

Finally, it is important to emphasize that the traditional definition of epistasis remains slightly more successful at discovering the functional relations between genes, as cataloged in the GO database (see Additional file 1, Figure S6). Part of the reason for this could be that some of those functional characterizations were suggested by the traditional definition of epistasis in the first place. It is certainly true, however, that many of the top-ranking interactions according to the geometric definition of epistasis involve single and double mutants with small growth rates; for those mutants, experimental noise is relatively large, and this may cause a few weakly interacting pairs to be incorrectly ranked as strongly interacting. It is likely that the experimental protocols could be easily adjusted to reduce the relative uncertainty on the growth rate of especially slow-growing mutants to avoid this issue (for example, by allowing for a much longer time for growth or by measuring the growth rates of additional replicates).

Conclusions

We analyzed the growth rates of about five million double mutants in the dataset associated with the work by Costanzo et al. We characterized how the strength of genetic interactions depends on the growth effects of the mutations being combined, and found a weaker dependence than that predicted by current theoretical models. Although the results were obtained mainly from entire gene knockouts, there is some evidence that the observed scaling might extend to the interactions between single point mutations. The scaling of epistasis might or might not be generic [35, 36]; important drivers could be the harshness of the environment [37], details about the evolutionary history [38–40], sexual versus asexual reproduction [41] and, perhaps most importantly, metabolic [42–45] and genetic complexity [46, 47]. In general, the experimentally observed scaling suggests a previously unexplored class of correlated fitness landscapes with tunable roughness, in which epistasis depends explicitly on the effects of the mutations being combined.

A clear limitation of our discussion is that only pair interactions were considered. Although high-throughput experiments will provide data on higher-order interactions, a solid understanding of pair interactions remains necessary before addressing n-mutation interactions. A genuine three-mutation interaction, for instance, should be defined as the unexplained deviation from what can be computed by combining the effects of all relevant mutations and their pair interactions [10, 48], perhaps using linear fits within the additive null model for log growth rates.

The results we present here were based on a geometric definition of epistasis. We compared this definition with a more standard definition, highlighting the desirable mathematical properties of the geometric definition and the simple phenomenological relations it produces.

In conclusion, although each epistatic interaction between specific genes depends on biological details and remains largely unpredictable from first principles, we have shown that the statistical properties of an ensemble of interactions can display conspicuous regularities, and can be described by simple mathematical laws.

Materials and methods

The Costanzo et al. dataset is publicly available [49]. The file sgadata_costanzo2009_rawdata_101120.txt.gz was downloaded on August 17, 2010 and analyzed with Mathematica (code available at the Gore laboratory website [50]). We restricted our analysis to double knockout mutants whose growth rates were positive numerical values and for which the growth rates of both single mutants were numerical values (see Additional file 1, Figure S1). Some genes appear in the dataset both as query and array genes; care was taken to avoid double counting.

The exponentially growing intervals used for the binning of the log growth rate effects were defined as [-2n, -2n-1] for an appropriate range of integer n's. Owing to the rarity of extremely deleterious mutations, bins for positive n's contained only a few data points, while bins with large negative n's were extremely small. In the figures we reported only bins for n = -7 to 0, containing log growth rate effects ranging from -20 = -1 to -2-8 = -0.0039 or, alternatively, relative growth rate effects ranging from 2-1 = 0.5 to 2-0.0039 = 0.997. Different choices for the binning sizes and positions did not significantly alter the results of the analysis.

In order to quantify the contribution of experimental uncertainty to epistasis, we generated nine randomized mock datasets. The mean level of noise-generated epistasis in these nine datasets is reported in Figure 4 (dashed lines), and we provide an extensive discussion of the choice of Student's t-distributions to generate the mock datasets from the original dataset (see Additional file 1, Supplementary text 3).

The GO database go_201207-assocdb-tables.tar.gz was downloaded from the GO site [51] on July 19, 2012. The MySQL database was queried with Python and analyzed Mathematica (code available upon request).

Conflict of interest

The authors declare that they have no conflict of interest.

Abbreviations

DamP:

Decreased abundance by mRNA perturbation

GO:

Gene Ontology.

Declarations

Acknowledgements

We are grateful to Mingjie Dai for collaboration during the early stages of the study. We thank Kirill Korolev, Pankaj Mehta, and the members of the Gore laboratory for providing comments and advice on the manuscript. This research was funded by an NIH Pathways to Independence Award, NSF CAREER Award, Pew Biomedical Scholars Program, and Alfred P. Sloan Foundation Fellowship.

Copyright

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.