Complexes of physically interacting proteins constitute fundamental functional units that drive almost all biological processes within cells. A faithful reconstruction of the entire set of protein complexes (the "complexosome") is therefore important for understanding not only the composition of complexes but also the higher-level functional organization within cells. Advances over the last several years, particularly through the use of high-throughput proteomics techniques, have made it possible to map substantial fractions of protein interactions (the "interactomes") from model organisms including Arabidopsis thaliana (a flowering plant), Caenorhabditis elegans (a nematode), Drosophila melanogaster (fruit fly), and Saccharomyces cerevisiae (budding yeast). These interaction datasets have enabled the systematic identification and study of protein complexes in these organisms. Computational methods have played a significant role in this context by contributing accurate, efficient, and exhaustive ways to analyze the enormous amounts of data. These methods have helped to compensate for some of the limitations of experimental datasets, including the presence of biological and technical noise and the relative paucity of credible interactions.

In this book, we systematically walk through computational methods devised to date (approximately between 2000 and 2016) for identifying protein complexes from the network of protein interactions (the protein-protein interaction (PPI) network). We present a detailed taxonomy of these methods, and comprehensively evaluate them for protein complex identification across a variety of scenarios including the absence of many true interactions and the presence of false-positive interactions (noise) in PPI networks. Based on this evaluation, we highlight challenges faced by the methods, for instance in identifying sparse, sub-, or small complexes and in discerning overlapping complexes, and reveal how a combination of strategies is necessary to accurately reconstruct the entire complexosome.

About the Authors

Sriganesh Srihari, University of Queensland Institute for Molecular Bioscience
Sriganesh Srihari is a Senior Research Fellow with the Institute for Molecular Bioscience at The University of Queensland, Australia. He has a background in computer science (having received a Ph.D. in 2012 from National University of Singapore) and has worked extensively on graph (network) and combinatorial algorithms and in applying these to large omics datasets in biomedicine. He has devised systems-biology models to integrate "multiomics" datasets spanning genomics, RNAseq, and proteomics (protein-protein interaction) with clinical profiles to decipher molecular-clinical associations and identify new therapeutic targets in cancers. He has published in leading journals in the field including Bioinformatics, BMC Systems Biology, Biology Direct, Molecular Biosystems, and Nucleic Acids Research. He has closely collaborated with experimental biologists and has contributed to joint publications in Oncogene (Nature Publishing), Trends in Pharmacological Sciences (Cell Press), and Molecular Oncology. His postdoctoral work on cancer network models was highlighted in International Innovation (Healthcare issue, 2014), a Research Media periodical. His recent computational approach MutExSL (Biology Direct, 2015), co-authored with Limsoon Wong, for predicting synthetic-lethal targets by mining mutually exclusive genetic alterations in cancers was presented at the San Antonio Breast Cancer Symposium 2015 (San Antonio, Texas, USA), for which he won an American Association for Cancer Research (AACR) - Susan G. Komen for the Cure Scholar-in-training Award. He serves on the Editorial Board for the cancer bioinformatics theme of Scientific Reports, and is a Guest Editor for Methods. Srihari has recently moved to the South Australian Health and Medical Research Institute, Australia, as a Senior Research Scientist. He is also an Adjunct Senior Lecturer with the School of Computer Science, Engineering, and Mathematics at Flinders University, Australia.

Chern Han Yong, Duke - National University of Singapore Medical School
Chern Han Yong is a Research Fellow in the Program in Cancer and Stem Cell Biology and the Centre for Computational Biology at the Duke-NUS Medical School, Singapore. He currently works on cancer genomics and epigenomics, and is particularly interested in the role of aberrant DNA methylation in carcinogenesis. He obtained his Ph.D. in computational biology from the National University of Singapore, where he researched the challenges of predicting protein complexes from high-throughput protein-protein interaction data. He obtained his M.Sc. in 2004 and B.Sc. in 2000 in computer science from the University of Texas at Austin, where he worked on neural networks, genetic algorithms, and the evolution of multi-agent cooperative behavior.

Limsoon Wong, National University of Singapore
Limsoon Wong is the Kwan-Im-Thong-Hood-Cho-Temple Chair Professor in the Department of Computer Science and a professor in the Department of Pathology at the National University of Singapore. Before that, he was the Deputy Executive Director for Research at A*STAR's Institute for Infocomm Research. He currently works mostly on knowledge discovery technologies and their application to biomedicine. He has also done, especially in the earlier part of his career, significant research in database query language theory and finite model theory, as well as significant development work in broad-scale data integration systems. He is a Fellow of the ACM, inducted for his contributions to database theory and computational biology. Some of his awards include the 2003 FEER Asian Innovation Gold Award, for his work on treatment optimization of childhood leukemias, and the ICDT 2014 Test of Time Award, for his work on naturally embedded query languages. He serves/served on the editorial boards of Journal of Bioinformatics and Computational Biology, Bioinformatics, Biology Direct, Drug Discovery Today, IEEE/ACM Transactions on Computational Biology and Bioinformatics, Genomics Proteomics & Bioinformatics, Journal of Biomedical Semantics, Methods, Scientific Reports, Information Systems, and IEEE Transactions on Big Data. He is also an ACM Books Area Editor. He received his B.Sc. (Eng.) in 1988 from Imperial College London and his Ph.D. in 1994 from the University of Pennsylvania.