Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37232-8340, USA.

Abstract

Protein assembly and biological interpretation of the assembled protein lists are critical steps in shotgun proteomics data analysis. Although most biological functions arise from interactions among proteins, current protein assembly pipelines treat proteins as independent entities. Usually, only individual proteins with strong experimental evidence, that is, confident proteins, are reported, whereas many possible proteins of biological interest are eliminated. We have developed a clique-enrichment approach (CEA) that rescues eliminated proteins by incorporating the relationships among proteins embedded in a protein interaction network. In several data sets tested, CEA increased protein identification by 8-23% with an estimated accuracy of 85%. Rescued proteins were supported by existing literature or transcriptome profiling studies at levels similar to those of confident proteins and at a significantly higher level than abandoned ones. When applied to a breast cancer data set, CEA rescued proteins encoded by well-known breast cancer genes. In addition, CEA generated a network view of the proteins and helped reveal the modular organization of proteins that may underpin the molecular mechanisms of the disease.
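
The general idea of rescuing eliminated proteins through network context can be illustrated with a minimal sketch. It is not the CEA implementation: the abstract does not specify the enrichment statistic, so a simple fraction threshold stands in for it here, and the function names (`maximal_cliques`, `rescue`), the parameters (`min_size`, `min_confident_fraction`), and the protein labels are all hypothetical.

```python
from typing import Dict, Iterator, Set

Graph = Dict[str, Set[str]]  # undirected network as symmetric adjacency sets


def maximal_cliques(graph: Graph) -> Iterator[Set[str]]:
    """Yield every maximal clique using basic Bron-Kerbosch (no pivoting)."""

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> Iterator[Set[str]]:
        if not p and not x:
            yield set(r)  # r cannot be extended, so it is maximal
            return
        for v in list(p):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}  # exclude v from later cliques at this level

    yield from expand(set(), set(graph), set())


def rescue(
    graph: Graph,
    confident: Set[str],
    candidates: Set[str],
    min_size: int = 3,                    # assumption: ignore tiny cliques
    min_confident_fraction: float = 0.5,  # assumption: stand-in for an enrichment test
) -> Set[str]:
    """Return candidates that sit in a clique dominated by confident proteins."""
    rescued: Set[str] = set()
    for clique in maximal_cliques(graph):
        if len(clique) < min_size:
            continue
        fraction = len(clique & confident) / len(clique)
        if fraction >= min_confident_fraction:
            rescued |= clique & candidates
    return rescued


# Toy network with hypothetical protein labels.
graph: Graph = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C"},
    "E": {"A"},
    "F": {"G", "H"},
    "G": {"F", "H"},
    "H": {"F", "G"},
}
confident = {"A", "B", "C"}   # strong experimental evidence
candidates = {"D", "E", "F"}  # eliminated by the assembly pipeline

print(rescue(graph, confident, candidates))  # {'D'}
```

In this toy network, D is rescued because it completes a four-protein clique with three confident proteins. E is not, because its only clique has two members, and F is not, because its clique contains no confident proteins. A real analysis would replace the fraction threshold with a proper statistical test and estimate accuracy separately.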

Breast cancer-specific sub-networks. Different sub-networks are shown in different colors and identified by IDs from 'a' to 'n'. Proteins shared by multiple sub-networks are colored in red. The most enriched Gene Ontology (GO) biological process annotations for each sub-network are labeled. The IDs accompanying the GO annotations match those of the corresponding sub-networks. Triangle vertices represent the proteins rescued by the clique-enrichment approach (CEA). Vertex size represents the level of publication support: large vertices indicate support from breast cancer-related publications, medium vertices indicate support from cancer-related publications, and small vertices indicate no support from existing cancer-related publications.