Abstract

Comprehensive protein interaction mapping projects are underway for many model species and humans. A key step in these projects is estimating the time, cost, and personnel required to obtain an accurate and complete map. Here, we model the cost of interaction map completion across a spectrum of experimental designs. We show that current efforts may require up to 20 independent tests covering each protein pair to approach completion. We explore designs for reducing this cost substantially, including prioritization of protein pairs, probability thresholding, and interaction prediction. The best designs lower the cost four-fold overall and more than 100-fold in the early stages of mapping. We demonstrate the best strategy in an ongoing project in Drosophila, in which we map 450 high-confidence interactions using 47 microtiter plates, versus the thousands of plates expected using current designs. This study provides a framework for assessing the feasibility of interaction mapping projects and for future efforts to increase their efficiency.

Analysis of molecular networks has exploded in recent years. A wide variety of technologies have been introduced for mapping networks of gene and protein interactions, including yeast two-hybrid assays1–8, affinity purification coupled to mass spectrometry9–11, chromatin immunoprecipitation measurements of transcriptional binding12–14, synthetic-lethal and suppressor networks15,16, expression QTLs17–20, and many others. Using these technologies, network mapping projects are currently underway for many model species2–4,7–13,15, microbial21–23 and viral24,25 pathogens, and humans5,6. As an illustration of how pervasive networks have become, the U.S. National Institutes of Health currently funds 3076 active research grants covering the topic “protein-protein interactions” with 794 of these implementing the technique of “yeast two hybrid system”26.

Mapping a complete gene or protein network raises challenges and considerations similar to those of mapping a complete genome sequence. In the case of the human and model genome projects, large-scale sequencing efforts were accompanied by a series of feasibility studies27,28 which used mathematical formulations and pilot projects to explore strategies for genome assembly and the work required for each. In the case of interaction networks, similar feasibility studies are just beginning29–31. Some of the questions to be addressed are: What is the cost of completing an interactome map and what is the best strategy for minimizing that cost? Given practical cost constraints, how can the quality and coverage of interaction data be maximized? How many independent assay types are needed? How should direct pairwise tests for interaction be combined with pooled screening? What is the effect of the test sensitivity on the final cost? How should interaction predictions be incorporated, and what is their effect on the mapping cost? Which specific improvements to experimental and computational methods are likely to have the largest effect?

To approach these questions, we modeled several standard and alternative strategies for using pairwise protein interaction experiments to determine the interactome of the fruit fly Drosophila melanogaster. Our analysis shows that completing the interactome using sequential pairwise or pooled screening is probably too costly to be practical. However, this cost can be reduced substantially using a strategy that combines pooling with prioritized testing and interaction prediction. We carry out several iterations of this strategy to efficiently map 450 new high confidence interactions in Drosophila.

RESULTS

Interactome mapping—problem definition

In contrast to a genome, the interactome has been more difficult to define. Some authors have argued32 that the “True Interactome” should be defined as all possible interactions encoded by a genome, i.e., the set of all pairwise protein interactions that occur in at least one biological condition or cell type. The assumption is that every true interaction will be detectable by some assay, and that given enough independent measurements, most of the interactome could be detected. Many assay types have been described for detecting protein-protein interactions, a few of which have been adapted to large-scale screening1,32–34. On the other hand, some interactions may be unmeasurable by any available assay, or may not arise in the conditions surveyed. Therefore, we use the term “Mappable Interactome” for the subset of true pairwise interactions that are reproducibly detectable by a given assay method or combination of methods.

To define appropriate criteria for determining when an interactome map is “complete”, we distinguish between the terms saturation and coverage. Saturation measures the percentage of true interactions that have been experimentally observed at least once. Coverage we define more strictly to mean the percentage of true interactions that have been experimentally validated with high confidence such that the percentage of false interactions (i.e., the False Discovery Rate or FDR) is kept below a predetermined threshold. We treat “completion” as achieving 95% coverage of the Mappable Interactome at 5% FDR, which requires that the map include at least 95% of all true interactions with no more than 5% of the reported interactions being false.
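The three quantities defined above can be computed directly from sets of protein pairs. The sketch below is a minimal illustration, assuming that the true (mappable) interactions, the pairs observed at least once, and the pairs reported as high-confidence are all available as Python sets; the function names are hypothetical and not part of the authors' software.

```python
def map_metrics(true_pairs, observed_pairs, reported_pairs):
    """Return (saturation, coverage, FDR) as defined in the text.

    true_pairs:     interactions in the Mappable Interactome
    observed_pairs: pairs reported as interacting at least once
    reported_pairs: pairs validated with high confidence and included in the map
    """
    saturation = len(true_pairs & observed_pairs) / len(true_pairs)
    coverage = len(true_pairs & reported_pairs) / len(true_pairs)
    # FDR: fraction of reported interactions that are not true interactions.
    fdr = len(reported_pairs - true_pairs) / len(reported_pairs) if reported_pairs else 0.0
    return saturation, coverage, fdr


def is_complete(coverage, fdr, min_coverage=0.95, max_fdr=0.05):
    """'Completion' as used here: at least 95% coverage at no more than 5% FDR."""
    return coverage >= min_coverage and fdr <= max_fdr


# Toy example: 20 true interactions; the map reports 19 of them plus one false pair.
true_pairs = {(i, i + 100) for i in range(20)}
observed = true_pairs | {(1, 2), (3, 4)}
reported = {(i, i + 100) for i in range(19)} | {(1, 2)}
sat, cov, fdr = map_metrics(true_pairs, observed, reported)
print(sat, cov, fdr, is_complete(cov, fdr))
```

In the toy example every true interaction has been seen (full saturation), but the map only just meets the completion criterion because one of its 20 reported pairs is false.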

A model of interactome coverage

We simulated a series of mapping strategies implementing a variety of basic and sophisticated features (Fig. 1; flowcharts of each strategy are provided in Supplementary Fig. 1). All strategies were evaluated using a statistical model based on naïve Bayes to estimate saturation and coverage of the Drosophila interactome as a function of the number of interaction tests. We programmed this model with the assumption that the fly interactome contains approximately 10⁵ interactions overall, along with estimates for the false positive rate (FPR, the probability that a non-interacting protein pair is reported as interacting) and the false negative rate (FNR, the probability that an interacting pair is reported as non-interacting). Although the magnitudes of these errors are still under debate, previous studies2,5,29,35,36 have reported Y2H error rates of FPR < 1% with FNR in the range 50–80% (note that several of these studies erroneously refer to FDR as FPR). Here, we used 0.2% FPR and 66% FNR.

Because any single assay has a high FNR, multiple assay types will likely be needed to achieve complete coverage, and these assays should be independent or at least only partially dependent. Although the precise correlations between different assay types have not been well studied, complementarity between assays has been widely assumed and occasionally demonstrated. For instance, protein interactions have been shown to be of substantially higher confidence if they are detected in different orientations (bait-prey vs. prey-bait)2; in different Y2H screens3,8,35; by different types of Y2H system, such as LexA-based vs. Gal4-based36; or by both Y2H and co-affinity purification29.

Basic mapping strategies in current use

We first simulated a “Basic serial” strategy, in which all pairs of proteins are tested for interaction sequentially. Under this formulation, achieving a saturation of 95% required eight comprehensive screens, in which each protein pair was tested by eight independent assays, equivalent to ~7×10⁸ pairwise tests assuming a total of 13,600 protein-encoding genes in fly37 (Table 1 and Fig. 2a). Moreover, 93% of all observed interactions in this network were false positives, including 99% of interactions observed exactly once and 21% of interactions observed twice (Fig. 2b). False positives predominate because, although an FPR of 0.2% seems low, the number of non-interacting protein pairs far exceeds the number of true interactions.
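The saturation figures above can be checked analytically if each of n comprehensive screens is treated as an independent assay with the stated error rates. This is a back-of-the-envelope sketch, not the authors' simulation: it assumes exactly 10⁵ true interactions among all unordered pairs of 13,600 proteins, whereas the published simulations sampled the reference interactome from database-derived interaction probabilities (see Methods), so the numbers agree only approximately (about 94% false positives here versus 93% in the text).

```python
from math import comb

N_GENES = 13_600            # protein-encoding genes in fly (from the text)
N_PAIRS = comb(N_GENES, 2)  # unordered protein pairs, ~9.2e7
N_TRUE = 100_000            # assumed interactome size (~10^5)
FPR, FNR = 0.002, 0.66      # per-assay error rates used in the simulations


def saturation(n_screens):
    """Fraction of true interactions observed at least once in n independent screens."""
    return 1 - FNR ** n_screens


def false_fraction_observed(n_screens):
    """Fraction of pairs observed at least once that are false positives."""
    false_obs = (N_PAIRS - N_TRUE) * (1 - (1 - FPR) ** n_screens)
    true_obs = N_TRUE * saturation(n_screens)
    return false_obs / (false_obs + true_obs)


n = next(k for k in range(1, 50) if saturation(k) >= 0.95)
print(n, n * N_PAIRS, round(false_fraction_observed(n), 3))
```

Eight screens is the first point at which fewer than 5% of true interactions remain unseen, and at that point roughly 1.5 million false pairs have been observed alongside fewer than 10⁵ true ones.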

To ensure an overall FDR < 5%, we found that every interaction must be reported by at least three independent assays. After eight screens, 55% of the interactome was covered under these conditions. The coverage goal of 95% was achieved only after 21 comprehensive pairwise screens (Fig. 2c). This overall outcome, in which reaching full coverage requires two to three times as many experiments as reaching saturation, was observed over a range of error parameters (Supplementary Table 1). Clearly, completing the interactome map under these conditions is impractical, as it would require testing 92 million protein pairs 21 separate times.
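The jump from saturation (eight screens) to coverage (21 screens) follows from how a naïve Bayes model weighs positive against negative results. The sketch below is our illustration, assuming the same idealized uniform interactome as above and using a per-pair posterior probability of at least 95% as a stand-in for the 5% FDR target (if every reported pair meets it, the expected FDR of the map is at most 5%). Under these assumptions it reproduces both the 55% coverage after eight screens and the 21 screens needed for 95% coverage. It also suggests why: each negative result multiplies the odds of interaction by about 0.66, so once more than 16 screens have been run, three positives no longer suffice and a fourth is needed.

```python
import math

N_PAIRS = math.comb(13_600, 2)
PRIOR = 100_000 / N_PAIRS       # background probability that a random pair interacts
FPR, FNR = 0.002, 0.66
SENS = 1 - FNR                  # probability that a true pair tests positive


def posterior(k, n, prior=PRIOR):
    """Naive-Bayes probability of interaction after k positives among n independent assays."""
    log_odds = (math.log(prior / (1 - prior))
                + k * math.log(SENS / FPR)              # each positive: odds x 170
                + (n - k) * math.log(FNR / (1 - FPR)))  # each negative: odds x ~0.66
    return 1 / (1 + math.exp(-log_odds))


def min_positives(n, upper=0.95):
    """Smallest number of positives out of n that lifts a pair above the upper threshold."""
    return next((k for k in range(n + 1) if posterior(k, n) >= upper), None)


def coverage(n, upper=0.95):
    """Expected fraction of true interactions reported after n comprehensive screens."""
    k_min = min_positives(n, upper)
    if k_min is None:
        return 0.0
    return sum(math.comb(n, k) * SENS**k * FNR**(n - k) for k in range(k_min, n + 1))


def screens_for(target, upper=0.95):
    """Number of comprehensive screens needed to reach the target coverage."""
    return next(n for n in range(1, 100) if coverage(n, upper) >= target)


print(min_positives(8), round(coverage(8), 3), screens_for(0.95))
```

Note that coverage is not monotonic in n under this rule: it dips when the required number of positives steps up, which is one reason completion needs so many more screens than saturation.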

To reduce the number of tests, assays such as Y2H typically use pooled screens in which a single protein “bait” is tested for interaction against pools of protein “preys” (phase I)38. For pools that test positive, pairwise tests are immediately conducted between the bait and each prey in the pool (phase II; this second phase can also be conducted by sequencing3,5). The benefit of pooling is that large numbers of potential interactions can be sampled at relatively low cost. This comes at the price of a higher FNR, as the chance that a true interaction is missed in the pool is higher than the chance it would be missed by direct pairwise tests38. Through simulation, we found that basic two-phase pooling (Pooling strategy) does indeed achieve a four- to five-fold reduction in coverage cost over pairwise testing (~4 million plates for Pooling compared to ~20 million plates for Basic serial, Table 1). However, assuming the interaction screening rate of a typical laboratory (e.g., ~2400 plate-matings per person per year), pooled screens would still require approximately 1700 person-years to complete the Drosophila protein network.
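The plate and person-year figures follow from simple arithmetic. The sketch below assumes one pairwise test per well of a 96-well plate for the serial design and takes the ~4 million plates for Pooling directly from Table 1 rather than recomputing it; the helper names are hypothetical.

```python
from math import comb

WELLS_PER_PLATE = 96
PLATES_PER_PERSON_YEAR = 2_400   # typical screening throughput quoted in the text


def serial_plates(n_genes, n_screens):
    """Plates to test every unordered protein pair n_screens times, one test per well."""
    return comb(n_genes, 2) * n_screens / WELLS_PER_PLATE


def person_years(plates):
    return plates / PLATES_PER_PERSON_YEAR


basic_serial = serial_plates(13_600, 21)   # 21 comprehensive screens for 95% coverage
pooling = 4e6                              # ~4 million plates for Pooling (Table 1, as quoted)
for name, plates in [("Basic serial", basic_serial), ("Pooling", pooling)]:
    print(f"{name:12s} {plates:.1e} plates  {person_years(plates):,.0f} person-years")
```

Under these assumptions Basic serial needs roughly 2×10⁷ plates (about 8,400 person-years), and the quoted Pooling cost corresponds to about 1,700 person-years.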

Advanced mapping strategies

We next considered an interaction mapping strategy that, rather than treat all protein pairs equally, maintains a rank-ordered list of pairs according to their probabilities of interaction (Thresholding strategy, Table 1). All probabilities start at the background frequency of interaction for random protein pairs (as for Basic-serial and Pooling). Protein interactions are initially tested using pooled screening, and after each two-phase pooled experiment the probabilities increase for interactions that are observed and decrease for interactions that are tested but not observed. Unlike previous strategies, however, protein pairs with probability greater than an upper threshold (i.e., 95%) are declared to be definite “interactors” and are removed from subsequent testing (Fig. 1b). Likewise, interactions with probability beneath a lower threshold are declared to be “non-interactors” and are also removed from further consideration. The motivation for thresholding is to more quickly exclude the overwhelming number of non-interacting protein pairs. Finally, candidate interactions are defined as those with probabilities between the upper threshold and background. When candidates are available they are always tested immediately using pairwise assays, before resorting to pooling, until their probabilities are pushed above the upper threshold or below background. The motivation for prioritizing candidate interactions is to more quickly cover the interactions likely to be positive. Overall, Thresholding resulted in more than a two-fold cost reduction compared to Pooling (Table 1 and Fig. 3a).
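The bookkeeping behind Thresholding can be sketched as a per-pair probability update plus a rule that sorts pairs into the categories described above. This is a minimal sketch, not the authors' implementation: it assumes the simulation error rates (66% FNR, 0.2% FPR), a background prior of 0.1%, and a hypothetical lower threshold of 10⁻⁵ (the actual thresholds are computed as described in the Supplementary Methods); `update`, `status` and `next_plate` are illustrative names.

```python
import heapq

UPPER = 0.95        # upper threshold: declare "interactor"
BACKGROUND = 0.001  # prior probability for an untested random pair (~0.1%)
LOWER = 1e-5        # hypothetical lower threshold: declare "non-interactor"


def update(prob, positive, fnr=0.66, fpr=0.002):
    """Naive-Bayes update of a pair's interaction probability after one test."""
    odds = prob / (1 - prob)
    odds *= (1 - fnr) / fpr if positive else fnr / (1 - fpr)
    return odds / (1 + odds)


def status(prob):
    """Classify a pair according to the thresholds described in the text."""
    if prob >= UPPER:
        return "interactor"       # add to the map and stop testing
    if prob <= LOWER:
        return "non-interactor"   # remove from further consideration
    if prob > BACKGROUND:
        return "candidate"        # retest immediately with a pairwise assay
    return "pool"                 # eligible only for further pooled screening


def next_plate(probs, wells=96):
    """Pick up to one plate of candidates, highest probability first.

    An empty list means no candidates remain, so the next experiment is a pooled screen.
    """
    candidates = [(p, pair) for pair, p in probs.items() if status(p) == "candidate"]
    return [pair for _, pair in heapq.nlargest(wells, candidates)]


probs = {("a", "b"): BACKGROUND, ("c", "d"): BACKGROUND}
probs[("a", "b")] = update(probs[("a", "b")], positive=True)  # seen once in a pooled screen
print(next_plate(probs))  # only ("a", "b") is now a candidate
```

Selecting candidates with `heapq.nlargest` avoids sorting the full list of pairs, which matters when almost all of the ~10⁸ pairs sit at background.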

[Table: Fly and human interactome coverage costs for different experimental strategies]

Lastly, we considered whether computational prediction of interactions, based on prior knowledge and data, could hasten interactome completion. A variety of prediction methods have been proposed based on evolutionary conservation39–41 (i.e., transfer of interaction measurements from one species to another) or on integrating conservation with additional features such as co-expression and co-annotation42–47. Such predictions affect the experimental design by setting the prior probability of interaction for each protein pair in place of the background probability. In the Prediction strategy, we explored the effect of setting these prior probabilities using theoretical prediction methods simulated over a range of predetermined prediction accuracies (different values for the FPR, FNR, and corresponding FDR of the predictions). We found that even predictors with very high FDRs could have a major impact on the mapping cost (Table 1 and Fig. 3b). For example, a predictor with 92.2% FDR gave a four-fold reduction in cost over Pooling, with a >50-fold reduction in the cost of achieving 50% coverage and savings of hundreds of fold in the early stages of mapping. Because a 92.2% FDR corresponds to roughly 12 false predictions for every true one, even a very noisy predictor can substantially reduce the cost of interactome completion. The best simulated predictor required approximately 385 person-years to achieve 95% coverage of the Drosophila protein network and 12 person-years to achieve 50% coverage. Thus, while full coverage of an interactome map may still be some years away, a draft scaffold providing half coverage could feasibly be achieved by a team of ~12 technicians working for one year.
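Why a noisy predictor still helps can be seen from a single Bayesian update. The sketch below assumes, as a simplification, that a predictor's precision (1 − FDR) can be used directly as the prior for each predicted pair, and it reuses the simulation error rates (66% FNR, 0.2% FPR) and the ~0.1% background prior; the function name is illustrative.

```python
def update(prob, positive, fnr=0.66, fpr=0.002):
    """Naive-Bayes update of a pair's interaction probability after one test."""
    odds = prob / (1 - prob)
    odds *= (1 - fnr) / fpr if positive else fnr / (1 - fpr)
    return odds / (1 + odds)


FDR = 0.922
predicted_prior = 1 - FDR          # precision of the predictor, used as the prior
background_prior = 0.001

print(f"false predictions per true one: {FDR / (1 - FDR):.1f}")
for label, prior in [("background", background_prior), ("predicted", predicted_prior)]:
    after_one = update(prior, positive=True)
    print(f"{label:10s} prior={prior:.3f} after one positive={after_one:.3f}")
```

With these assumptions a single positive test lifts a predicted pair to roughly 0.93, close to the 95% threshold, whereas a background pair reaches only about 0.15 and needs further testing.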

From theory to practice: An experimental proof-of-concept

Given the high performance of the Prediction strategy in simulations, we explored an experimental implementation in which Drosophila protein interactions were predicted using the cross-species method of Sharan et al.39 (Fig. 4a). According to this method, existing protein interaction networks in yeast, worm, and fly are aligned based on sequence similarity to identify conserved interaction clusters, and these alignments are used to transfer interactions that have been observed in some species but not yet in others (Fig. 4b). A total of 1,294 interactions were predicted using this method, each of which was prioritized as a candidate with high prior probability (92.4%) based on the FDR reported by Sharan et al.39 (7.6%).

[Figure: Design and implementation of the Prediction strategy for mapping the Interactome]

Since this prior was much greater than the background probability of other protein pairs (0.1%), we began by using the pairwise LexA Y2H assay48 to test all 606 predictions for which sequence-verified clones were available. Of these, 136 tested positive and 470 negative. After each 96-well plate of tests (seven plates total), the interaction probabilities were updated, raising them to >99.9% for pairs testing positive and lowering them to 90.5% for pairs testing negative. Because the 136 positives now had probabilities above the upper threshold (95%), all of them could be added to the interactome map and removed from further testing.
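The probability updates quoted above can be reproduced with the same naïve Bayes rule, with one caveat: using the simulation error rates (66% FNR), a single negative would lower the 92.4% prior to about 89%, not 90.5%. Taking an FPR of 0.2% and a per-assay FNR of about 78% reproduces both the 90.5% value and the later 88.1% value, so that FNR is used below as an assumption; the error rates actually used for the experimental updates are those described in the Supplementary Methods.

```python
def update(prob, positive, fnr, fpr=0.002):
    """Naive-Bayes update of a pair's interaction probability after one Y2H test."""
    odds = prob / (1 - prob)
    odds *= (1 - fnr) / fpr if positive else fnr / (1 - fpr)
    return odds / (1 + odds)


PRIOR = 1 - 0.076   # Sharan et al. FDR of 7.6% -> prior of 92.4%
FNR = 0.78          # assumed per-assay FNR (see lead-in); the simulations use 0.66

forward_pos = update(PRIOR, True, FNR)          # positive in forward orientation
forward_neg = update(PRIOR, False, FNR)         # negative in forward orientation
reverse_neg = update(forward_neg, False, FNR)   # then negative in reverse orientation
print(f"{forward_pos:.4f} {forward_neg:.3f} {reverse_neg:.3f}")
```

A forward-orientation negative still leaves a conserved prediction far above background, which is why the Prediction strategy retests it immediately rather than discarding it.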

Although the remaining 470 predictions had tested negative once, their high probability (90.5%) still prioritized them as candidate interactions. Therefore, as dictated by the Prediction strategy these pairs were tested again immediately using a second assay type.

For this second assay, LexA Y2H was run in a “reverse” orientation, in which the proteins previously cloned as bait and prey were exchanged to serve as prey and bait, respectively. Of the 470 negative predictions, we tested the 251 for which sequence-verified clones were available in the opposite orientation. This yielded 35 positives, raising these interactions to probability >99.9% and adding them to the map. Pairs that tested negative in the reverse orientation were downgraded to 88.1% probability. Overall, after performing Y2H in both forward and reverse orientations, 171 new interactions were identified out of 606 protein pairs, for a success rate of 28%. Although we ended our experimentation at this point, the Prediction strategy could be continued by testing the “double negatives” (pairs testing negative in both orientations of LexA Y2H) with a third type of assay, such as Gal4-based Y2H.

Another means of predicting protein interactions is to probabilistically integrate many different lines of evidence into a single classifier42–47. Along these lines, we applied a machine-learning-based classifier for predicting interactions that combined many relevant features, including gene expression, domain-domain interactions, conserved protein-protein interactions, genetic interactions, and shared gene annotations (Supplementary Methods). We used this approach to generate 24,798 high-confidence predictions. We randomly selected 2,047 of these for testing using forward-orientation Y2H and, as above, retested the negative pairs using reverse-orientation Y2H (where clones were available). In total, this procedure added 279 new high-confidence interactions to the map, for a 13.6% success rate. Combined over both conservation-based and multiple-evidence-based predictions, 450 new protein-protein interactions were added to the Drosophila map using forty-seven 96-well plates (Fig. 3a,b). To establish the background rate of interaction, we also tested 2,354 randomly chosen pairs, 72 of which were positive, yielding a 3% background rate (Fig. 4b). These results show that both types of prediction are highly enriched for true interactions. Note that even if all predicted interactions were true, the expected confirmation rate would be limited by the false negative rate of the Y2H assay, to 1 − FNR ≈ 34% in our model.
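The success rates above, and their enrichment over the random-pair background, follow from the reported counts. The short tally below assumes the counts in the text are directly comparable; the random pairs may not have been tested in both orientations, so the enrichment factors it prints are indicative only.

```python
# (pairs tested, new interactions confirmed), counts taken from the text
results = {
    "conservation-based": (606, 136 + 35),
    "combined-evidence": (2_047, 279),
    "random pairs": (2_354, 72),
}

tested_bg, positive_bg = results["random pairs"]
background = positive_bg / tested_bg

for name, (tested, positive) in results.items():
    rate = positive / tested
    print(f"{name:20s} {rate:6.1%}  {rate / background:4.1f}x background")

new_interactions = results["conservation-based"][1] + results["combined-evidence"][1]
print("new interactions:", new_interactions)
```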

Testing the conditional independence between assay types

An underlying assumption of our simulations is that different assay types are conditionally independent, i.e., given that a tested protein pair is known to be positive or negative, the result of one assay is uncorrelated with that of another. To examine the extent to which this assumption holds, we compared Y2H data for protein pairs tested in both forward and reverse orientations, the two assay types used in our study. Overall, we obtained Y2H tests in both orientations for 309 conservation-based predictions (including data reported above combined with additional tests; Supplementary Data). Of these, we observed 58 positives in the forward orientation and 50 positives in the reverse orientation, for an average positive rate of 17% [(58 + 50)/(2 × 309)]. Fifteen positives were found in both orientations, representing 4.9% of the tests. Assuming all predictions are true interactions, this percentage is very close to that predicted by conditional independence, under which 3.1% of tests are expected to be positive in both orientations [(17%)²]. If, as expected, some predictions are not true, the percentages come into even better agreement; e.g., a prediction FDR of 20% predicts 4.8% positives in both orientations. A similar analysis of 1,572 combined-evidence predictions tested in both orientations showed similar agreement with the conditional independence assumption.
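The first comparison above can be written out explicitly. This sketch assumes, as the text does for this comparison, that all 309 predictions are true interactions, so the expected double-positive rate under conditional independence is simply the square of the average per-orientation positive rate.

```python
n_pairs = 309                   # conservation-based predictions tested in both orientations
forward, reverse, both = 58, 50, 15

rate = (forward + reverse) / (2 * n_pairs)   # average per-orientation positive rate
observed_both = both / n_pairs               # fraction positive in both orientations
expected_both = rate ** 2                    # expectation under conditional independence
print(f"rate={rate:.1%} observed both={observed_both:.1%} expected both={expected_both:.1%}")
```

A strong positive correlation between orientations would push the observed double-positive rate far above the expectation; the modest excess seen here is what the text interprets as approximate independence.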

DISCUSSION

The interactions predicted by cross-species conservation were at least as accurate as we had assumed in our simulations. On the other hand, their power to prioritize interactions is dependent on the network coverage in other species, and the long-term viability of this approach will depend on obtaining greater numbers of predictions than the 1,294 that are currently available. As interactome maps progress across an ever-widening array of species, these maps might be dynamically cross-compared to continually generate sufficient numbers of candidate interactions for testing. The second set of predictions, made by integrating various lines of evidence, had a lower success rate than the predictions based solely on conservation. Their potential utility is higher, however, since the number of available predictions is nearly 20 times that of the conservation-based predictions and could be increased further by including lower confidence predictions. Even with a lower success rate, the performance of the integrated classifier was superior to the best theoretical predictor we simulated.

Predictions lead to a lower interactome mapping cost for two reasons. First, predicted protein pairs are much more likely than arbitrary pairs to be true. Second, protein pairs with high prior probabilities do not require repeated positive measurements to confirm them as true interactions. Both effects underlie the finding that 450 new predicted interactions could be added to the interaction map using just 47 microtiter plates. In contrast, the Pooling strategy would require nearly 10⁵ plates to add this number of interactions to the map.
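The second effect can be made concrete by counting how many consecutive positive tests are needed to push a pair above the 95% threshold from different starting priors. This sketch assumes the simulation error rates, ignores intervening negative results, and uses the name `positives_needed` for illustration only.

```python
def positives_needed(prior, upper=0.95, fnr=0.66, fpr=0.002):
    """Consecutive positive tests needed to lift a pair from `prior` to at least `upper`."""
    odds, target = prior / (1 - prior), upper / (1 - upper)
    likelihood_ratio = (1 - fnr) / fpr      # each positive multiplies the odds by ~170
    k = 0
    while odds < target:
        odds *= likelihood_ratio
        k += 1
    return k


for label, prior in [("random pair", 0.001), ("conserved prediction", 0.924)]:
    print(f"{label:22s} prior={prior:.3f} positives needed={positives_needed(prior)}")
```

A random pair needs two independent positives even in this best case, whereas a conserved prediction is confirmed by one, so each positive well on a plate of predictions adds an interaction to the map.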

One might intuitively object that, rather than test predicted interactions, a better strategy would focus on the “novel” areas of the interactome that have never before been suggested by any species or data set. The problem with such an approach is that it would very quickly produce an interactome map with a very high error rate. Conversely, the rationale behind the Thresholding and Prediction strategies is that one should first clean up the map by validating predicted interactions using real experiments, and only then resort to testing random protein pairs in pools.

A second objection might be that prioritizing candidate interactions requires the corresponding Y2H baits and preys to be rearrayed in microtiter plates in different orders over the course of an interaction mapping project. While the cost of rearraying was not included in our analysis, in our lab (Finley) these costs are greatly reduced by robotic transfer systems. Moreover, failing to rearray leads to a ~4-fold increase in cost overall and a ~10-fold increase in the early stages of mapping (compare Pooling versus Prediction in Table 1).

Regardless, mapping the Interactome remains a daunting task. Our study makes it clear that achieving 95% coverage of an interactome requires many more screens than one pass through all pools or over all protein pairs. If complete coverage is to be obtained in the near future, it will be necessary to invoke better strategies for experimental design, technologies reporting fewer false negatives, or both. In terms of experimental design, we have shown that the cost of completion is reduced substantially by careful ordering of pooled screens. In terms of technology, our study underscores the importance of decreasing the FNR or of developing different assays that provide independent samples of a protein pair. Even if the error rates are lower than assumed here, advanced mapping strategies are still likely to be worthwhile (Supplementary Table 1). Here we have used two types of Y2H assay, forward and reverse orientations, to obtain multiple samples that appear largely independent. If the assays were partially dependent, multiple tests might still be worth the cost as long as they were not perfectly correlated (and the dependence could be handled quantitatively using a statistical model). In the present study, the conditional independence assumption yields a “best-case scenario”, or lower bound, on the number of interaction tests likely to be required to achieve full coverage of an interactome. Further work will be needed to better characterize the relative dependencies among the wide range of other interaction assays currently available; if these assays are highly dependent, the required number of tests will be greater than estimated here.

METHODS

Simulation procedure

“True” reference interactomes for fly and human were generated by random sampling of interactions from the set of all possible pairs of proteins using the interaction probabilities in the STRING database46. Protein pairs not included in the STRING database were sampled using a low background probability, such that the total number of interactions in the sampled interactomes agreed with current estimates of interactome sizes30 (~100,000 fly interactions and ~260,000 human interactions). The detectability of each protein pair was independently sampled for each new assay type (representing a new type of measurement technology or new bait/prey orientation) using a 66% FNR for true interactions and 0.2% FPR for false interactions (corresponding to 82% FDR). Once an interaction was defined as “Detectable/Undetectable”, direct pairwise experiments were assumed to be 100% reproducible for a given protein pair and assay. For pooled assays, each detectable interaction in the sample space of a pool was assumed to be observed in the pool with probability equal to the pooling sensitivity (Table 1). Pools with at least one observed interaction were declared positive. For each strategy, after every 1000 experiments the mapped interactomes were compared to the “true” interactomes and the coverage and FDR were recorded.
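The sampling scheme described above can be illustrated at toy scale. The sketch below is a simplified stand-in for the authors' simulator, not a reproduction of it: it uses a uniform background probability instead of database-derived interaction probabilities, treats bait-prey pairs as ordered, and its pool size (20) and pooling sensitivity (0.5) are hypothetical placeholders for the values in Table 1.

```python
import random


def simulate_pooled_screen(n_baits=50, n_preys=200, pool_size=20, p_true=0.005,
                           fnr=0.66, fpr=0.002, pool_sensitivity=0.5, seed=0):
    """One assay type applied as a two-phase pooled screen on a toy interactome."""
    rng = random.Random(seed)
    pairs = [(b, p) for b in range(n_baits) for p in range(n_preys)]

    # "True" reference interactome, sampled with a uniform background probability.
    true = {pair for pair in pairs if rng.random() < p_true}

    # Detectability is drawn once for this assay type; pairwise retests then reproduce it.
    detectable = {pair: rng.random() < ((1 - fnr) if pair in true else fpr)
                  for pair in pairs}

    observed = set()
    for b in range(n_baits):
        for start in range(0, n_preys, pool_size):
            pool = [(b, p) for p in range(start, min(start + pool_size, n_preys))]
            # Phase I: each detectable pair is seen in the pool with probability
            # pool_sensitivity; the pool is positive if at least one pair is seen.
            if any(detectable[pair] and rng.random() < pool_sensitivity for pair in pool):
                # Phase II: direct pairwise tests of the bait against every prey in the pool.
                observed.update(pair for pair in pool if detectable[pair])
    return true, observed


def saturation_and_fdr(true, observed):
    hits = len(true & observed)
    saturation = hits / len(true)
    fdr = (len(observed) - hits) / len(observed) if observed else 0.0
    return saturation, fdr


true, observed = simulate_pooled_screen()
print(saturation_and_fdr(true, observed))
```

Repeating the screen with fresh detectability draws (a new seed per assay type) and feeding the results into a naïve Bayes update is how a full strategy simulation could be assembled from this building block.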

Yeast two-hybrid test of predicted interactions

We used the LexA-based yeast two-hybrid mating assay48 using sequence-verified clones as previously described36 (Supplementary Methods). All new protein interactions have been submitted to the IMEx consortium (http://imex.sf.net) through IntAct49 and assigned the identifier IM-9552. The data are also available at DroID (www.droidb.org).

Additional Methods

Detailed descriptions of the interaction probability model, the combined-evidence method for interaction prediction, the computation of thresholds, and the yeast two-hybrid test protocol appear in the Supplementary Methods.

Acknowledgments

We thank S. Bandyopadhyay for critical reading of the manuscript and I. Bronner, K. Gulyas, B. Mangiola, and H. Zhang for expert technical assistance with the two-hybrid assays. We thank R. Karp and R. Sharan for discussions on earlier versions of this work. This work was supported by National Institutes of Health grants RR018627, GM070743, and HG001536.

Footnotes

AOP

Different experimental designs for protein interaction mapping were modeled to compare their efficiency in completing an interactome map. The strategy that minimized the final cost was tested in an ongoing Drosophila melanogaster interactome project where it found 450 high-confidence interactions using only 47 microtiter plates.
