Using protein-protein interaction data that have only recently become available, we assembled and analyzed interactome networks from 1,840 species across the tree of life, up from about 5 species in previous studies. This unique dataset allowed us to conduct the largest study of protein interactomes to date and to quantify the resilience of interactomes, a critical property because the breakdown of proteins may lead to cell death or disease.

Our study reveals that evolution leads to more resilient interactomes, providing evidence for a longstanding hypothesis that interactomes evolve to favor robustness against protein failures. We show that a highly resilient interactome markedly improves an organism's ability to survive in complex, variable, and competitive habitats, a finding that draws attention to a previously unknown critical role of evolution in mediating the effects of the interactome on the ability of a species to thrive in specific habitats.

The interactome network of protein-protein interactions captures the structure of molecular machinery and gives rise to the bewildering complexity of life. We assembled and analyzed interactome networks from 1,840 species across the tree of life, up from about 5 species in previous studies. This unique dataset allowed us to conduct the largest study of protein interactomes to date and to quantify the resilience of interactomes, a critical property because the breakdown of proteins may lead to cell death or disease.

By studying interactomes from 1,840 species across the tree of life, we find that evolution leads to more resilient interactomes, providing evidence for a longstanding hypothesis that interactomes evolve to favor robustness against protein failures. We find that a highly resilient interactome markedly improves an organism's ability to survive in complex, variable, and competitive habitats. Our findings reveal how interactomes change through evolution and how these changes affect their response to environmental unpredictability.

We propose a deep multi-task learning approach for biomedical named entity recognition, a fundamental task in mining biomedical text data. The new approach saves human effort and frees biomedical experts from the need to painstakingly generate entity features by hand. Furthermore, it achieves excellent performance using only a limited amount of training data.

The approach can help scientists better exploit the knowledge buried in the vast biomedical literature. I have enjoyed working on this project with researchers from Stanford, USC, and UIUC.

This paper is intended for computer scientists and biomedical researchers who are curious about recent developments and applications of machine learning to biology and medicine and its potential for advancing biomedicine given the vast amounts of heterogeneous data being generated today.

BioSNAP aims to bring biological and medical datasets closer to computer scientists who develop exciting new algorithms. Computer scientists typically do not have a background in bioinformatics or biostatistics, so it is often very difficult for them to obtain and construct high-quality biomedical datasets. As a result, biomedical datasets are rarely used in ML algorithm development and benchmarking, even though biomedicine is one of the most exciting domains for ML, with a unique set of challenges, hard and important problems, and huge potential impact. BioSNAP aims to close this gap by providing a number of ready-to-use network datasets.

BioSNAP contains many large biomedical networks that are ready to use for method development, algorithm evaluation, benchmarking, and network science analyses. In this first release, BioSNAP has tens of network datasets that describe a dozen different entity types (e.g., genes, proteins, cells, drugs, diseases, side effects, tissues). These datasets can be used for standard prediction tasks (node clustering, link prediction, node classification) as well as relatively new tasks (graph-level classification, multi-relational link prediction, higher-order association prediction). Many datasets contain weighted networks and can be used to define multi-layer/heterogeneous graphs with attributes.
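To give a flavor of one of the standard tasks mentioned above, here is a minimal sketch of link prediction on an edge list. It uses the Jaccard coefficient of neighbor sets as a simple baseline heuristic; the gene names and edges are made up for illustration and do not come from any BioSNAP dataset, and the helper names are my own.

```python
from itertools import combinations


def build_adjacency(edges):
    """Turn an undirected edge list into a dict of neighbor sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard_scores(adj):
    """Score every unconnected node pair by the Jaccard overlap of neighbors."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        union = adj[u] | adj[v]
        if union:
            scores[(u, v)] = len(adj[u] & adj[v]) / len(union)
    return scores


# Hypothetical toy network (not real interaction data).
edges = [
    ("geneA", "geneB"), ("geneA", "geneC"), ("geneB", "geneC"),
    ("geneC", "geneD"), ("geneB", "geneD"), ("geneD", "geneE"),
]
adj = build_adjacency(edges)
scores = jaccard_scores(adj)

# Highest-scoring candidate links first; ties broken alphabetically.
ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
for pair, value in ranked:
    print(pair, round(value, 3))
```

A real benchmark would hold out a subset of known edges and check how highly the heuristic (or a learned model) ranks them.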

I look forward to seeing more biomedical network data considered in machine learning and data science research.

Technical noise in experiments is unavoidable, but it introduces inaccuracies into the biological networks we infer from the data.

In this Nature Communications paper, we introduce a diffusion-based method for denoising undirected, weighted networks, and show that it improves the performance of downstream analyses, including prediction of gene functions, interpretation of noisy Hi-C contact maps, and fine-grained identification of species.
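To illustrate the general idea of smoothing edge weights by diffusion, here is a minimal sketch. It is a generic random-walk-style update and not the algorithm from the paper: the update rule, the `alpha` and `steps` parameters, and the toy matrix are all assumptions chosen for illustration.

```python
def row_normalize(weights):
    """Divide each row by its sum to get random-walk transition probabilities."""
    normalized = []
    for row in weights:
        total = sum(row)
        normalized.append([w / total if total else 0.0 for w in row])
    return normalized


def matmul(a, b):
    """Plain-Python matrix product for small dense matrices."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def diffuse(weights, alpha=0.7, steps=10):
    """Iterate W <- alpha * P W P^T + (1 - alpha) * W0 (assumed update rule).

    P is the row-normalized input network. The update keeps W symmetric
    when the input is symmetric, so the result is still an undirected,
    weighted network.
    """
    transition = row_normalize(weights)
    transition_t = [list(col) for col in zip(*transition)]
    current = [row[:] for row in weights]
    for _ in range(steps):
        spread = matmul(matmul(transition, current), transition_t)
        current = [
            [alpha * s + (1 - alpha) * w for s, w in zip(s_row, w_row)]
            for s_row, w_row in zip(spread, weights)
        ]
    return current


# Hypothetical noisy similarity network on four nodes.
noisy = [
    [0.0, 0.9, 0.8, 0.1],
    [0.9, 0.0, 0.7, 0.0],
    [0.8, 0.7, 0.0, 0.2],
    [0.1, 0.0, 0.2, 0.0],
]
smoothed = diffuse(noisy)
for row in smoothed:
    print([round(w, 3) for w in row])
```

The intuition is that an edge supported by many shared neighbors is reinforced, while an isolated edge receives little support from the rest of the network.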

We just presented a tutorial on Deep Learning for Network Biology at ISMB 2018 in Chicago, IL, USA. If you are interested in these topics and would like to learn more about graph neural networks and/or their biomedical applications but could not attend the tutorial because it was sold out, check out our tutorial website. All materials, including slides, network tools, examples, and code bases are available for download from the tutorial website.

In this tutorial, we cover the key conceptual foundations of representation learning, from approaches relying on network propagation to very recent advancements in deep representation learning for networks. In addition to a broad high-level overview, we spend a considerable amount of time describing the algorithmic and implementation aspects of recent advancements in deep representation learning and discussing many biomedical applications.

In this review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. We also discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.

Community detection allows one to decompose a network into its building blocks. While communities can be identified with a variety of methods, their relative importance cannot be easily derived.

In this Nature Communications paper, we introduce an algorithm to identify the modules that are most promising for further analysis. Our method allows for more efficient evaluation of hypotheses brought forward by the analysis of complex networks and thus speeds up the scientific discovery process in experimental network science.

Many patients take multiple drugs at the same time to treat complex diseases, such as heart failure, or co-occurring diseases, such as diabetes and epilepsy. The use of combinations of drugs is a common practice. In fact, 25 percent of people ages 65 to 69 take at least five prescription drugs to treat chronic conditions, a figure that jumps to nearly 46 percent for those between 70 and 79.

However, a major consequence of drug combinations for a patient is a much higher risk of side effects. These side effects emerge because of drug-drug interactions, in which the activity of one drug may change, favorably or unfavorably, if taken with another drug. These side effects are extremely difficult to identify manually because there are combinatorially many ways in which a given combination of drugs can manifest clinically, and each combination is valid in only a certain subset of patients. It is also practically impossible to test all possible pairs of drugs and observe side effects in relatively small clinical trials.
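A quick back-of-the-envelope calculation shows how fast the number of combinations grows. The drug count below is a hypothetical round number, not a figure from the paper.

```python
from math import comb


def count_combinations(num_drugs, size=2):
    """Number of distinct drug combinations of the given size."""
    return comb(num_drugs, size)


num_drugs = 1000  # hypothetical number of drugs, chosen for illustration
print(count_combinations(num_drugs, 2))  # pairs
print(count_combinations(num_drugs, 3))  # triples
```

With 1,000 drugs there are already 499,500 pairs and 166,167,000 triples, before even considering that each combination may cause many different side effects.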

In our latest research published in Bioinformatics, we develop an approach for computational screening of drug combinations. The approach predicts what side effects a patient might experience when taking multiple drugs simultaneously.

Technically, this work defines a novel approach that blends deep learning for graphs with network science to achieve benefits from each. See the paper and project website for details!

With this research topic, we aim to provide broad coverage of single-cell data analytic studies.

We encourage contributions in the form of original research articles, short communications, reviews, and perspectives that address the major needs and challenges in single-cell data analytics, including (but not limited to): statistical models, algorithms, and software packages to analyze single-cell data; visualization tools for interpreting single-cell data; methods to relate single-cell data to disease classification and prognosis; methods and tools to discover the spatial/temporal organization of tissues at the single-cell level; models of cell-cell communication; scalable mathematical and computer-science approaches for the analysis of mega-scale single-cell data; and methods for combining mixed-platform data, noise filtering, and robust normalization.

Networks are ubiquitous in biology, where they encode connectivity patterns at all scales of organization, from the molecular level to the biome. This tutorial investigates key advancements in representation learning for networks over the last few years, with an emphasis on fundamentally new opportunities in network biology enabled by these advancements.

We describe a general graph convolutional neural network approach for multirelational link prediction in heterogeneous graphs. In computational pharmacology, this approach creates, for the first time, an opportunity to use large molecular, pharmacological, and patient population data to flag and prioritize polypharmacy side effects for follow-up analysis via formal studies.
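For readers who want a concrete picture of multirelational link prediction, here is a heavily simplified sketch. It is not the model from the paper: it uses one untrained neighbor-averaging step in place of learned graph convolution layers and a DistMult-style diagonal scoring function as the decoder. All node names, relation names, features, and weights are hypothetical.

```python
import math


def relational_layer(features, edges_by_relation):
    """One simplified message-passing step over a heterogeneous graph.

    Each node adds, for every relation type, the mean feature vector of
    its neighbors under that relation to its own features, then applies
    a ReLU. A real model would also apply learned weight matrices.
    """
    hidden = {}
    for node, own in features.items():
        total = list(own)  # self-connection
        for edges in edges_by_relation.values():
            neighbors = [v for u, v in edges if u == node]
            neighbors += [u for u, v in edges if v == node]
            if not neighbors:
                continue
            for d in range(len(total)):
                total[d] += sum(features[n][d] for n in neighbors) / len(neighbors)
        hidden[node] = [max(0.0, x) for x in total]
    return hidden


def score(hidden, u, relation_weights, v):
    """DistMult-style score: sigmoid of sum_d z_u[d] * r[d] * z_v[d]."""
    logit = sum(a * r * b for a, r, b in zip(hidden[u], relation_weights, hidden[v]))
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical toy graph with two relation types.
features = {
    "drugA": [1.0, 0.0, 0.0],
    "drugB": [0.0, 1.0, 0.0],
    "proteinX": [0.0, 0.0, 1.0],
}
edges_by_relation = {
    "targets": [("drugA", "proteinX"), ("drugB", "proteinX")],
    "interacts": [("drugA", "drugB")],
}
hidden = relational_layer(features, edges_by_relation)

# Hand-set (untrained) diagonal weights, one vector per side-effect type.
side_effect_weights = {
    "nausea": [0.5, 0.5, 1.0],
    "headache": [-1.0, -1.0, 0.2],
}
for side_effect, weights in side_effect_weights.items():
    print(side_effect, round(score(hidden, "drugA", weights, "drugB"), 3))
```

The point of the sketch is the shape of the problem: the same pair of drugs receives a separate score for each relation (side-effect) type, and in a trained model both the embeddings and the relation weights are learned from data.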

The lecture introduces biological networks and their analysis to CS and engineering students. It describes statistical enrichment tests and several important prediction problems in biology, such as disease pathway detection and gene function prediction. It also explains some of the most successful methods for solving these problems.
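As an example of a statistical enrichment test, here is a minimal sketch of the one-sided hypergeometric test, a common choice for asking whether a gene set overlaps an annotation category more than expected by chance. The lecture may cover other tests as well; the function name and the numbers are my own, chosen for illustration.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, overlap):
    """P(X >= overlap) under the hypergeometric distribution.

    population: total number of genes considered
    annotated:  genes in the population that carry the annotation
    sample:     size of the gene set being tested
    overlap:    genes in the set that carry the annotation
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 genes in total, 5 annotated,
# and a 5-gene set in which all 5 genes are annotated.
print(enrichment_p_value(20, 5, 5, 5))
```

In practice the test is run once per annotation category, so the resulting p-values need a multiple-testing correction.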

I am especially happy to see how my machine learning and computational biology methods can help discover new biology! We used my recent methods for data fusion and gene network inference to generate predictions, which we then validated in the wet laboratory. Using these novel algorithms, we integrated all data and created a comprehensive NUDIX enzyme profile map. This map reveals novel insights into the substrate selectivity and biological functions of NUDIX hydrolases and provides a platform for expanding the use of NUDIX enzymes as biomarkers and potential novel cancer drug targets.

Discovering disease pathways, which can be defined as sets of proteins associated with a given disease, is an important problem that has the potential to provide clinically actionable insights for disease diagnosis, prognosis, and treatment. Computational methods aid the discovery by relying on protein-protein interaction (PPI) networks. They start with a few known disease-associated proteins and aim to find the rest of the pathway by exploring the PPI network around the known disease proteins.
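To make the seed-and-explore strategy concrete, here is a minimal sketch of one simple neighborhood-based scoring rule: rank each candidate protein by the fraction of its interaction partners that are already known disease proteins. This is only one basic heuristic, not a specific method evaluated in the paper, and the protein names and edges are hypothetical.

```python
def build_adjacency(edges):
    """Turn an undirected PPI edge list into a dict of neighbor sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def neighbor_scores(adj, seeds):
    """Score non-seed proteins by the fraction of neighbors that are seeds."""
    scores = {}
    for protein, neighbors in adj.items():
        if protein in seeds or not neighbors:
            continue
        scores[protein] = len(neighbors & seeds) / len(neighbors)
    return scores


# Hypothetical toy PPI network and known disease proteins (seeds).
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]
seeds = {"P1", "P2"}

adj = build_adjacency(edges)
scores = neighbor_scores(adj, seeds)
for protein in sorted(scores, key=lambda p: (-scores[p], p)):
    print(protein, round(scores[protein], 3))
```

A rule like this can only reach proteins close to the seeds, which is exactly why its success depends on how the pathway is laid out in the network.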

However, the success of such methods has been limited, and failure cases have not been well understood. In the paper we study the PPI network structure of disease pathways. We find that pathways do not correspond to single well-connected components in the PPI network. These results counter one of the most frequently used assumptions in network medicine, which posits that disease pathways are likely to correspond to highly interconnected groups of proteins. Instead, we show that proteins associated with a single disease tend to form many separate connected components/regions in the network.
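The kind of measurement behind this finding can be sketched as follows: restrict the PPI network to the proteins associated with one disease and count the connected components among them. This is an illustrative sketch on a made-up network, not the analysis code from the paper.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an undirected PPI edge list into a dict of neighbor sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def disease_components(adj, disease_proteins):
    """Connected components of the subgraph induced by the disease proteins."""
    remaining = set(disease_proteins)
    components = []
    while remaining:
        start = remaining.pop()
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj.get(node, ()):
                if neighbor in remaining:  # only walk through disease proteins
                    remaining.remove(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical chain-shaped PPI network: P1 - P2 - P3 - P4 - P5 - P6.
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5"), ("P5", "P6")]
disease_proteins = {"P1", "P2", "P5", "P6"}

adj = build_adjacency(edges)
components = disease_components(adj, disease_proteins)
print(len(components), sorted(sorted(c) for c in components))
```

In this toy example the full network is connected, yet the disease proteins split into two separate regions because the proteins linking them (P3 and P4) are not associated with the disease.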

Furthermore, we show that state-of-the-art disease pathway discovery methods perform especially poorly on diseases with disconnected pathways. These results suggest that integration of disconnected regions of disease proteins into a broader disease pathway will be crucial for a holistic understanding of disease mechanisms.

In addition to new insights into the PPI network connectivity of disease proteins, our analysis leads to important implications for future disease protein discovery that can be summarized as:

We move away from modeling disease pathways as highly interlinked regions in the PPI network to modeling them as loosely interlinked and multi-regional objects with two or more regions distributed throughout the PPI network.

Understanding functions of proteins in specific human tissues is essential for insights into disease diagnostics and therapeutics, yet surprisingly little is known about protein functions in different biological contexts, and prediction of tissue-specific function remains a critical challenge in biomedicine.

OhmNet predicts tissue-specific protein functions by representing tissue organization with a rich multiscale tissue hierarchy and by modeling proteins through neural embedding-based representation of a multi-layer network. For the first time, we can systematically pinpoint tissue-specific functions of proteins across more than 100 human tissues. OhmNet accurately predicts protein functions, and also generates actionable hypotheses about protein actions specific to a given biological context.