Abstract

Motivation:
Inferring how humans respond to external cues such as drugs, chemicals, viruses or hormones is an essential question in biomedicine. Very often, however, this question cannot be addressed because it is not possible to perform experiments in humans. A reasonable alternative consists of generating responses in animal models and 'translating' those results to humans. The limitations of such translation, however, are far from clear, and systematic assessments of its actual potential are urgently needed. sbv IMPROVER (systems biology verification for Industrial Methodology for PROcess VErification in Research) was designed as a series of challenges to address translatability between humans and rodents. This collaborative crowd-sourcing initiative invited scientists from around the world to apply their own computational methodologies to a multilayer systems biology dataset composed of phosphoproteomics, transcriptomics and cytokine data derived from normal human and rat bronchial epithelial cells exposed in parallel to 52 different stimuli under identical conditions. Our aim was to understand the limits of species-to-species translatability at different levels of biological organization: signaling, transcription and release of secreted factors (such as cytokines). Participating teams submitted 49 different solutions across the sub-challenges, two-thirds of which were statistically significantly better than random. Additionally, similar computational methods were found to range widely in their performance within the same challenge, and no single method emerged as a clear winner across all sub-challenges. Finally, computational methods were able to effectively translate some specific stimuli and biological processes in the lung epithelial system, such as DNA synthesis, cytoskeleton and extracellular matrix, translation, immune/inflammation and growth factor/proliferation pathways, better than the expected response similarity between species.

Contact:
pmeyerr@us.ibm.com or Julia.Hoeng@pmi.com

Supplementary information:
Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.

Overview of the STC: (A) Schematic of predictions to be made for each sub-challenge. Each sub-challenge required the prediction of the different sets of responses, indicated in red. (B) Schematic of SC4 to indicate utilization of a provided reference network with species-specific information from the training dataset to generate species-specific networks through the addition and removal of edges. Though cytokine measurements were made available to participants, they were not used in scoring, and for simplicity, were not included in this overview figure

Fig. 2.

Scores and computational methods used for solving the STC. The null hypothesis simulation was used to compute and plot team Z-scores of the area under the precision-recall curve (AUPR), balanced accuracy (BAC) and Pearson correlation coefficient (PCC) for SC1 (A), SC2 (B) and SC3 (C). Z-scores are used to compare the apparent difficulty of each of the sub-challenges. Panels (D–G) reflect actual performance differences, as measured by overall rank of three metrics, for different methodological approaches. Teams’ rank distributions are plotted separately by the type of approach for SC1 (D), SC2 (E) and SC3 (F). (G) In SC2, teams’ rank distribution is separated by usage of solely protein phosphorylation data or in combination with gene expression data. SVM: support vector machines, Trees: random forest and other tree-based methods, NN: neural networks, GA: genetic algorithm
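
As an illustration of how a Z-score against a null hypothesis simulation can be obtained for one of these metrics, the following is a minimal sketch for BAC. It assumes binary active/inactive calls and a null built by randomly permuting a team's predictions; the caption does not state how the null was simulated, so the permutation scheme and the function names `balanced_accuracy` and `null_z_score` are illustrative assumptions, not the challenge's scoring code.

```python
import random
import statistics


def balanced_accuracy(truth, pred):
    """Mean of sensitivity and specificity for binary (0/1) labels."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    pos = sum(truth)
    neg = len(truth) - pos
    return 0.5 * (tp / pos + tn / neg)


def null_z_score(truth, pred, metric, n_sim=1000, seed=0):
    """Z-score of metric(truth, pred) against a permutation null.

    Assumption: the null distribution is built by shuffling the
    predictions, which keeps the number of active calls fixed.
    """
    rng = random.Random(seed)
    observed = metric(truth, pred)
    shuffled = list(pred)
    null = []
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        null.append(metric(truth, shuffled))
    return (observed - statistics.mean(null)) / statistics.stdev(null)


# Hypothetical gold standard and one team's predictions.
truth = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
team_pred = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
z = null_z_score(truth, team_pred, balanced_accuracy)
```

Expressing each metric as a Z-score against its own null puts AUPR, BAC and PCC on a common scale, which is what allows the difficulty of the sub-challenges to be compared.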

Fig. 3.

Predictability versus species similarity for stimuli. (A) The y-axis indicates for each stimulus the mean predictability Prs of all team predictions when considering gene set activation in SC3. The x-axis is species similarity Ss of gene set activation. In red are stimuli where Prs > Ss > 0. (B) The y-axis indicates for each stimulus the mean predictability Prs of all team predictions when considering protein phosphorylation activation in SC2. The x-axis is species similarity Ss of phosphoprotein activation. In red are stimuli where Prs > Ss > 0. (C, D) Plots showing the percentage of teams where Prs > Ss for each stimulus when predicting gene set activation (C) or phosphoprotein activation (D). Stimuli are ordered by percentage of teams and the number of activated gene sets or phosphorylated proteins is indicated on top of each stimulus. The number of active calls per gene set is shown on the top of the graph. Nineteen stimuli are not shown in (B) and (D) because no proteins were measured as phosphorylated

Fig. 4.

Predictability versus species similarity for gene sets and phosphoproteins. (A) The y-axis indicates for each gene set the mean Prg of all team predictions when considering response to 26 stimuli in SC3. The x-axis is Sg of gene set activation. In red are gene sets where Prg > Sg > 0. (B) The y-axis indicates for each protein the mean Prp of all team predictions when considering response to 26 stimuli in SC2. The x-axis is Sp for phosphoprotein activation. (C and D) Plots showing the percentage of teams where Prg > Sg (C) and Prp > Sp (D). Gene sets and phosphoproteins are ordered by number of active calls, indicated on top of each black dot

Fig. 5.

Best translated gene sets representative of different pathways. (A) Histogram of the percentage of active gene set/stimulus pairs [560 pairs from 6396 (246 gene sets × 26 stimuli)] correctly predicted by N teams. Blue line represents the cumulative sum of the histogram values. (B) Distribution of teams’ Prg (blue) and Prs (red) values. (C and D) Best predicted gene sets as measured by Prg. (C) Barplot of 25 gene sets having a Prg Z-score ≥ 1.9. Blue star indicates a Sg Z-score ≥ 1.5. All gene sets are originally derived from Reactome unless otherwise indicated, according to MSigDB. (D) Hierarchical clustering of gene sets and genes that are present in at least 4 of the top 25 best predicted gene sets. Each cell is valued according to gene set membership and the frequency with which the gene is found as part of that gene set’s GSEA CORE enrichment set. Gene/gene set pairs are assigned a 0 if the gene is not a member, 1 if only a member or 1 + C, where C is the number of stimuli under which the gene is found to be part of the CORE enrichment. Cells have a theoretical maximum value of 27. Cells are represented by a blue scale ranging from dark blue for 0 to white for the maximum value reached, here 7. Significantly overrepresented genes among these gene sets are labeled red (P-value < 0.01) or yellow (P-value < 0.05)
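
The cell valuation rule in panel (D) can be sketched as follows. This is a minimal illustration of the 0 / 1 / 1 + C rule as stated in the caption; the function name `cell_value`, the data layout (a set of members plus one CORE enrichment set per stimulus) and the gene and stimulus labels are hypothetical and not taken from the original analysis.

```python
def cell_value(gene, members, core_by_stimulus):
    """Value of one gene/gene set cell in the clustering matrix.

    members:          set of genes belonging to the gene set
    core_by_stimulus: dict mapping stimulus -> set of genes in the
                      gene set's CORE enrichment under that stimulus
    Returns 0 if the gene is not a member, otherwise 1 + C, where C
    is the number of stimuli whose CORE enrichment contains the gene
    (so a member that is never in the CORE enrichment scores 1).
    """
    if gene not in members:
        return 0
    c = sum(1 for core in core_by_stimulus.values() if gene in core)
    return 1 + c


# Hypothetical gene set with three stimuli.
members = {"GENE_A", "GENE_B"}
core_by_stimulus = {
    "stim_1": {"GENE_A"},
    "stim_2": {"GENE_A"},
    "stim_3": set(),
}
values = {g: cell_value(g, members, core_by_stimulus)
          for g in ("GENE_A", "GENE_B", "GENE_C")}
```

With 26 stimuli, a gene that is in the CORE enrichment under every stimulus reaches 1 + 26 = 27, which matches the theoretical maximum given in the caption.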