Abstract

Efforts to map the Escherichia coli interactome have identified several hundred macromolecular complexes, but direct binary protein-protein interactions (PPIs) have not been surveyed on a large scale. Here we performed yeast two-hybrid screens of 3,305 baits against 3,606 preys (∼70% of the E. coli proteome) in duplicate to generate a map of 2,234 interactions, which approximately doubles the number of known binary PPIs in E. coli. Integration of binary PPI and genetic-interaction data revealed functional dependencies among components involved in cellular processes, including envelope integrity, flagellum assembly and protein quality control. Many of the binary interactions that we could map in multiprotein complexes were informative about the internal topology of complexes and indicated that interactions in complexes are substantially more conserved than those connecting different complexes. This resource will be useful for inferring bacterial gene function and provides a draft reference of the basic physical wiring network of this evolutionarily important model microbe.

Quality assessment and comparison of Y2H interactions with data from the literature

(a) Immunoblots of 6 of the 114 randomly selected PPIs identified by the Y2H assay that were tested by co-immunoprecipitation (Co-IP). The interacting pairs and negative controls (vector) were co-transformed into E. coli, and protein binding was detected by immunoblotting (Online Methods). (b) The same 114 randomly selected interacting protein pairs were tested by LUMIER assay. Interactions were scored as positive when they showed a luminescence intensity ratio (LIR) > 3 and a p-value < 0.05. The two values are plotted against each other on a logarithmic scale, and the LIR and p-value thresholds are indicated by dashed lines (log10(LIR) > 0.477; log10(p) < −1.30). (c) Number of PPIs validated by Co-IP, LUMIER or both assays. (d) Overlap between the interactions detected in this large-scale Y2H screen (pink) and those in the literature. The manually curated literature-binary PPIs (purple) were compiled from the microbial protein interaction database (MPIDB; http://jcvi.org/mpidb/about.php), which contains 1,941 manually curated binary PPIs. The combined-AP-MS PPIs (beige) were predicted from large-scale AP-MS studies and comprise 20,425 PPIs. (e) Pipeline used to detect direct interactions within co-complexes by Y2H matrix screening.
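The LUMIER scoring rule in panel (b) and the overlap counts in panels (c) and (d) reduce to a threshold test and simple set operations. The Python sketch below illustrates both under stated assumptions: the function names and protein names are hypothetical, each LUMIER pair is assumed to already carry an LIR and a p-value (how the p-value is obtained is not reproduced here), and interactions are treated as undirected so that A-B and B-A count once.

```python
import math

# Cutoffs stated in the caption: LIR > 3 and p < 0.05.
LIR_CUTOFF = 3.0
P_CUTOFF = 0.05


def lumier_positive(lir: float, p_value: float) -> bool:
    """Score one LUMIER measurement: positive only if LIR > 3 and p < 0.05."""
    return lir > LIR_CUTOFF and p_value < P_CUTOFF


def canonical(pair):
    """Order an undirected pair so that (A, B) and (B, A) compare equal."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def overlap(pairs_a, pairs_b):
    """Return (only in A, in both, only in B) counts for two PPI lists."""
    a = {canonical(p) for p in pairs_a}
    b = {canonical(p) for p in pairs_b}
    return len(a - b), len(a & b), len(b - a)


# The dashed lines in panel (b) are the same cutoffs on a log10 scale.
print(round(math.log10(LIR_CUTOFF), 3))  # 0.477
print(round(math.log10(P_CUTOFF), 2))    # -1.3

# Toy example with hypothetical protein names (not data from the study).
y2h = [("protA", "protB"), ("protC", "protD")]
literature = [("protB", "protA"), ("protE", "protF")]
print(overlap(y2h, literature))  # (1, 1, 1)
```

The same `overlap` helper could be applied to the sets of pairs validated by Co-IP and by LUMIER to obtain the three counts of panel (c).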

Nodes (representing proteins) are colored according to the availability of structural data. Edges (representing PPIs) are colored according to their source (literature, our Y2H experiment or both). The enlarged sub-networks are complexes from a previous AP-MS study for which Y2H and literature binary interactions provided the topology of a sub-complex of at least three components; protein structural data are overlaid. For example, protein complex 42 (bottom left) contains seven proteins; the Y2H binary interactions suggest a topology for five subunits in the complex, and all the subunits have either a complete structure or a model. Furthermore, structures of the binary sub-complexes are available for two interactions among three different components. For complexes 72, 100 and 103, the Y2H binary interactions identified in our screens suggest an almost complete topology.
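Deriving a sub-complex topology from binary data amounts to asking which subunits of a co-complex are linked to one another by direct binary interactions. A minimal sketch, assuming each complex is given as a list of member proteins and binary PPIs as undirected pairs; `subcomplex_topology` and the subunit names are hypothetical, and this is not the authors' pipeline.

```python
from collections import defaultdict


def subcomplex_topology(members, binary_ppis):
    """Return the largest group of complex members linked by binary PPIs.

    members: subunits of one co-complex (e.g. from an AP-MS study)
    binary_ppis: iterable of (protein, protein) binary interactions
    """
    members = set(members)
    adj = defaultdict(set)
    for a, b in binary_ppis:
        # Keep only edges whose two ends are both subunits of this complex.
        if a in members and b in members and a != b:
            adj[a].add(b)
            adj[b].add(a)
    seen, best = set(), set()
    for start in list(adj):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Toy seven-subunit complex (hypothetical names, not data from the study).
members = [f"sub{i}" for i in range(1, 8)]
binary = [("sub1", "sub2"), ("sub2", "sub3"), ("sub3", "sub4"),
          ("sub4", "sub5"), ("sub6", "other")]
core = subcomplex_topology(members, binary)
print(sorted(core), len(core) >= 3)  # ['sub1', 'sub2', 'sub3', 'sub4', 'sub5'] True
```

Under the caption's criterion, a complex would be highlighted when the returned set has at least three members; edges to proteins outside the complex (such as `sub6`-`other`) are ignored.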

(a) Degree distribution (the number of interaction partners per node) for interactions between essential E. coli proteins in the combined-binary (data from this study and literature binary interactions) and combined-AP-MS networks. (b) Frequency distribution of path lengths between protein pairs in the combined-binary and combined-AP-MS PPI networks. For panels (a) and (b), the data were sampled so that the two datasets contain the same number of interactions among the same number of nodes. (c) Distribution of the Pearson correlation coefficient (PCC) between the GI profiles of gene pairs encoding interacting proteins in the combined-binary or combined-AP-MS network versus random gene pairs. (d) Distribution of co-expression and (e) of condition-dependent phenotypic correlation for pairs of interacting proteins, shown as in (c). The p-values for panels (c), (d) and (e) (AP-MS vs. random, blue; combined-binary vs. random, brown) were computed using Student's t-test. (f) Average semantic similarity of the combined-binary and combined-AP-MS PPI networks for Gene Ontology (GO) categories.
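Panels (a) and (b) summarize each network by two standard statistics: how many partners each protein has, and how many steps separate each connected pair of proteins. The sketch below computes both with breadth-first search on an undirected edge list; the function names and the toy network are hypothetical, and the matched subsampling of the two datasets described above is not reproduced.

```python
from collections import Counter, defaultdict, deque


def build_adjacency(edges):
    """Undirected adjacency sets; self-interactions are ignored."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree_distribution(edges):
    """Map degree k -> number of nodes with exactly k partners."""
    adj = build_adjacency(edges)
    return Counter(len(nbrs) for nbrs in adj.values())


def path_length_distribution(edges):
    """Map shortest-path length -> number of connected unordered node pairs."""
    adj = build_adjacency(edges)
    counts = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        for d in dist.values():
            if d > 0:
                counts[d] += 1
    # Each unordered pair was reached once from each of its two ends.
    return Counter({d: c // 2 for d, c in counts.items()})


# Toy chain a-b-c-d (hypothetical, not data from the study).
chain = [("a", "b"), ("b", "c"), ("c", "d")]
print(dict(degree_distribution(chain)))       # {1: 2, 2: 2}
print(dict(path_length_distribution(chain)))  # {1: 3, 2: 2, 3: 1}
```

Pairs in different connected components have no path and are simply not counted, so comparisons between networks are only meaningful after matching their sizes, as the caption notes.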

Examples of sub-networks showing the physical and genetic connectivity among the components of various bioprocesses. Sub-networks of Y2H physical interactions (grey edges) among 1,269 proteins were derived using a Markov clustering approach. Genetic interactions (GIs; red edges for negative and green edges for positive interactions) from the published large-scale eSGA surveys were overlaid on the PPI network. Sub-networks with positive and/or negative interactions are highlighted with shaded ovals: (a) secretion components (cluster ID: 5); (b) flagellum or motility components (cluster ID: 9); (c) subunits of ATP-dependent protease complexes (cluster ID: 14); and (d) ycfM and pbpG (cluster ID: 8). For details on the GIs in each sub-network and the cluster IDs, see . Large nodes in each sub-network indicate that genetic associations with the indicated protein subunits are known; small nodes indicate an absence of GIs.
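Markov clustering (MCL) alternates expansion (taking powers of a column-stochastic flow matrix) and inflation (raising entries to a power and renormalizing) until the matrix stops changing; rows that keep mass act as attractors that define clusters. The pure-Python sketch below is a textbook MCL, not the authors' implementation: the inflation value of 2, the convergence tolerance and the attractor cut-off are assumed defaults. The helper `gi_counts` is hypothetical and assumes signed GI scores (negative score = negative interaction) keyed by the same names as the proteins. A network of ~1,300 proteins would call for a sparse-matrix implementation.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(edges, expansion=2, inflation=2.0, max_iter=100, tol=1e-9):
    """Textbook Markov clustering of an undirected graph; returns node sets."""
    nodes = sorted({x for edge in edges for x in edge})
    if not nodes:
        return []
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    # Adjacency matrix with self-loops, then column-normalized.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, b in edges:
        m[index[a]][index[b]] = m[index[b]][index[a]] = 1.0
    m = normalize_columns(m)
    for _ in range(max_iter):
        previous = m
        expanded = m
        for _ in range(expansion - 1):  # expansion: flow spreads along walks
            expanded = matmul(expanded, m)
        # Inflation strengthens strong flows and weakens weak ones.
        m = normalize_columns([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(m[i][j] - previous[i][j])
                    for i in range(n) for j in range(n))
        if delta < tol:
            break
    # Each attractor row with non-negligible mass defines one cluster.
    clusters = {frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-6)
                for i in range(n)}
    return [set(c) for c in clusters if c]


def gi_counts(cluster, genetic_interactions):
    """Count (negative, positive) GIs with both genes inside one cluster."""
    neg = pos = 0
    for a, b, score in genetic_interactions:
        if a in cluster and b in cluster:
            if score < 0:
                neg += 1
            elif score > 0:
                pos += 1
    return neg, pos


# Toy network: two separate triangles (hypothetical, not data from the study).
toy = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f")]
print(sorted(sorted(c) for c in mcl(toy)))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
print(gi_counts({"a", "b", "c"}, [("a", "b", -0.4), ("b", "c", 0.2), ("a", "d", -1.0)]))  # (1, 1)
```

Raising `inflation` generally yields smaller, tighter clusters; the value used for the published clusters is not stated here.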

Conservation of physical interactions between and within protein complexes

(a) Phylogenetic tree based on a comparison of the complete proteomes of 20 bacteria closely related to E. coli. Using bacterial proteins that had orthologs in E. coli, we determined the set of interologs (that is, conserved interacting homologs; N_cons.PPI) in each organism. After normalizing the number of interologs by the corresponding number of orthologs in each organism (N_ortho), we observed a declining trend with increasing distance from E. coli in the phylogenetic tree. (b) Bacteria were sorted according to their numbers of predicted interologs; the data suggest that PPIs in E. coli are more conserved in evolutionarily closer organisms. However, we found the opposite trend when we considered the fraction of conserved interactions between essential E. coli proteins (left panel, red). We also determined, for each species, the number of interactions between proteins whose E. coli orthologs belong to the same complex or to different complexes (right panel), and found that interactions are more conserved in species closely related to E. coli and that interactions within complexes are more often conserved than interactions between complexes.
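The normalized interolog count of panel (a), N_cons.PPI / N_ortho, and the within- versus between-complex split of panel (b) can be sketched as follows. Here an E. coli PPI is counted as a predicted interolog in another species when both partners have an ortholog there, which is an assumed simplification of interolog mapping. The ortholog table is taken as given (how it is built is not shown), each protein is assumed to belong to at most one complex, and all function and protein names are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class InterologSummary(NamedTuple):
    n_cons_ppi: int  # E. coli PPIs whose two partners both have orthologs
    n_ortho: int     # E. coli proteins with an ortholog in the species
    ratio: float     # n_cons_ppi / n_ortho, the normalization in panel (a)
    within: int      # conserved PPIs inside one E. coli complex
    between: int     # conserved PPIs linking two different complexes


def summarize_interologs(
    ecoli_ppis: Iterable[Tuple[str, str]],
    orthologs: Dict[str, str],
    complex_of: Optional[Dict[str, str]] = None,
) -> InterologSummary:
    """Count predicted interologs of E. coli PPIs in one other species."""
    complex_of = complex_of or {}
    unique = {tuple(sorted(pair)) for pair in ecoli_ppis}  # undirected, deduplicated
    n_cons = within = between = 0
    for a, b in unique:
        if a in orthologs and b in orthologs:  # both partners map to the species
            n_cons += 1
            if a in complex_of and b in complex_of:
                if complex_of[a] == complex_of[b]:
                    within += 1
                else:
                    between += 1
    n_ortho = len(orthologs)
    ratio = n_cons / n_ortho if n_ortho else 0.0
    return InterologSummary(n_cons, n_ortho, ratio, within, between)


# Toy data (hypothetical names, not data from the study).
ppis = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "a")]
orthologs = {"a": "a_x", "b": "b_x", "c": "c_x"}
complex_of = {"a": "cx1", "b": "cx1", "c": "cx2"}
print(summarize_interologs(ppis, orthologs, complex_of))
# InterologSummary(n_cons_ppi=2, n_ortho=3, ratio=0.6666666666666666, within=1, between=1)
```

To compare conservation within and between complexes across species, one would divide `within` and `between` by the number of E. coli PPIs of each kind whose partners have orthologs; that normalization is an assumption, not a detail given in the caption.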