Figure 2.

Validation of interactions within interaction datasets. (a) The fraction of interactions in each dataset supported by multiple validations (that is, different publications or types of experimental evidence). (b) The fraction of interactions in each indicated dataset supported by more than one publication or type of experimental evidence. (c) Better-studied proteins or genes, as defined by the number of supporting publications relative to node connectivity (designated bias; see Materials and methods), tend to be more highly connected within the physical or genetic networks. (d) The study bias towards essential genes in each dataset. (e) The distribution of conserved proteins in interaction datasets. Frequency refers to the fraction of the dataset in each bin. Orthologous eukaryotic clusters for seven standard species (Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Saccharomyces cerevisiae, Schizosaccharomyces pombe, and Encephalitozoon cuniculi) were obtained from the COG database [96]. Sc refers to all budding yeast proteins, used as a reference dataset; non-LC refers to all HTP interactions except those that overlap with the LC datasets; X refers to yeast genes that were not assigned to any of the COG clusters, a set that includes yeast-specific genes as well as genes that have orthologs in only one of the other six species.
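As an illustration of the two quantities plotted in panels (a)/(b) and (e), the sketch below computes (1) the fraction of distinct interactions supported by more than one publication or evidence type, and (2) the per-bin frequency of ortholog-cluster sizes for the genes in a dataset. It is a minimal sketch under stated assumptions, not the authors' pipeline: the record layout `(protein_a, protein_b, publication_id, evidence_type)`, the gene-to-cluster-species mapping, and the function names are all hypothetical. Interactions are treated as undirected pairs, and genes absent from the mapping are placed in the "X" bin. The bias measure in panel (c) is defined in Materials and methods and is not reproduced here.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional

# Hypothetical evidence record: (protein_a, protein_b, publication_id, evidence_type)
Record = tuple[str, str, str, str]


def multi_validation_fraction(records: Iterable[Record]) -> float:
    """Fraction of distinct interactions backed by >1 publication or >1 evidence type."""
    pubs: dict = {}
    evidence: dict = {}
    for a, b, pub, ev in records:
        key = frozenset((a, b))  # undirected: A-B and B-A are the same interaction
        pubs.setdefault(key, set()).add(pub)
        evidence.setdefault(key, set()).add(ev)
    if not pubs:
        return 0.0
    validated = sum(1 for k in pubs if len(pubs[k]) > 1 or len(evidence[k]) > 1)
    return validated / len(pubs)


def conservation_frequencies(
    genes: Iterable[str],
    cluster_species: Mapping[str, Optional[set]],
) -> dict:
    """Fraction of genes per bin: number of species in the gene's ortholog cluster,
    or "X" for genes not assigned to any cluster (hypothetical mapping)."""
    counts: Counter = Counter()
    for g in set(genes):  # count each gene once
        species = cluster_species.get(g)
        counts["X" if species is None else str(len(species))] += 1
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()} if total else {}
```

For example, three records in which one pair is reported by two publications and another by a single publication yield a multiple-validation fraction of 0.5. Binning by cluster size rather than by specific species keeps the output comparable across datasets of different sizes, which is what a frequency histogram like panel (e) requires.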