Bottom Line:
We then focused on 13 apoptotic genes that showed significant differential expression across all drug-perturbed samples to reconstruct the apoptosis network. In our predicted subnetwork, 9 out of 15 high-confidence interactions were validated in the literature, and our inferred network captured two major cell death pathways by identifying BCL2L11 and PMAIP1 as key interacting players for the intrinsic apoptosis pathway and TAXBP1 and TNFAIP3 for the extrinsic apoptosis pathway. Our inferred apoptosis network also suggested the role of BCL2L11 and TNFAIP3 as "gateway" genes in the drug-induced intrinsic and extrinsic apoptosis pathways.

ABSTRACT:
The Connectivity Map (CMAP) project profiled human cancer cell lines exposed to a library of anticancer compounds with the goal of connecting cancer with underlying genes and potential treatments. Since the therapeutic goal of most anticancer drugs is to induce tumor-selective apoptosis, it is critical to understand the specific cell death pathways triggered by drugs. Doing so can clarify how cancer cells respond to chemical stimulation and can improve the treatment of human tumors. In this study, using CMAP microarray data from the breast cancer cell line MCF7, we applied a Gaussian Bayesian network modeling approach and identified apoptosis as a major drug-induced cellular pathway. We then focused on 13 apoptotic genes that showed significant differential expression across all drug-perturbed samples to reconstruct the apoptosis network. In our predicted subnetwork, 9 out of 15 high-confidence interactions were validated in the literature, and our inferred network captured two major cell death pathways by identifying BCL2L11 and PMAIP1 as key interacting players for the intrinsic apoptosis pathway and TAXBP1 and TNFAIP3 for the extrinsic apoptosis pathway. Our inferred apoptosis network also suggested the role of BCL2L11 and TNFAIP3 as "gateway" genes in the drug-induced intrinsic and extrinsic apoptosis pathways.

fig1: Heat map and PCA plots of drug-perturbed profiles in CMAP for 22,277 informative probe sets. Nonvariant probes across all samples (IQR < 0.5) are filtered out. The heat map of distances (a) and the PCA plot (b) compare CMAP profiles for 100 randomly selected control samples and 100 randomly selected drug-perturbed samples.
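
The IQR filter named in the caption can be illustrated with a minimal Python sketch. This is not the authors' code: the probe identifiers and expression values below are hypothetical, and the quantile interpolation rule (`method="inclusive"`) is an assumption, since the caption does not say how the quartiles were computed.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range (Q3 - Q1) of one probe's values across samples."""
    # Assumption: "inclusive" interpolation; the source does not specify one.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def filter_variable_probes(expr, min_iqr=0.5):
    """Keep probes whose IQR across all samples is at least min_iqr.

    expr maps a probe identifier to its expression values, one per sample.
    Probes with IQR < min_iqr are treated as nonvariant and dropped.
    """
    return {probe: vals for probe, vals in expr.items() if iqr(vals) >= min_iqr}


# Hypothetical toy data: two probes measured in five samples.
toy_expr = {
    "probe_flat": [7.0, 7.1, 7.0, 7.1, 7.0],      # barely changes across samples
    "probe_variable": [5.0, 6.0, 7.0, 8.0, 9.0],  # clearly varies across samples
}
kept = filter_variable_probes(toy_expr)
print(sorted(kept))
```

Only the probe whose values spread across samples survives the filter. In a real analysis the same rule would be applied to every probe set before computing sample distances or principal components.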

Mentions:
To identify drug-responsive signature genes at the transcriptional level in cancer cells, one approach is to perform differential gene expression analysis by comparing drug-perturbed samples with controls. However, since the dataset contains samples tested with over 1,000 chemical perturbations, it is important to take into account the diverse mechanisms of action of the different compounds. One solution would be to perform differential expression analysis for each compound separately and then combine the results using a P value-based Fisher's method or Stouffer's z-score approach to obtain the overall differential expression level for each gene across all compounds. However, a limitation of this type of analysis is that each compound has only a limited number of perturbed samples and an even smaller number of control samples. This would make the statistical power extremely low for individual compound analysis and would result in inaccurate parameter estimates and a high false positive rate. Another known issue with this type of “Separate-then-Combine” analysis is low precision, meaning a high occurrence of false positives among the most differentially expressed genes, or top hits. One way to overcome this drawback is to combine all compounds together at the beginning, also known as a “complete pooling” method. Although different drugs may have distinct mechanisms of action and different target proteins, it may still be reasonable to group them together. One reason is that there are a relatively limited number of pathways or mechanisms through which cells respond to chemical stimulation. Also, compounds tested for cancer treatment are known to share some common characteristics. For example, a large number of anticancer drugs are known to induce cell death or repress cell growth programs.
In addition, the combination or “complete pooling” strategy increases the sample size from fewer than 5 to thousands, dramatically increasing the statistical power for inferring true responsive genes across all compounds. This assumption is also supported by the fact that most perturbed profiles cluster together, as shown in Figure 1. These results indicate that the variability of the transcriptional profile for the same type of cell (MCF7 in this study) due to drug heterogeneity is much smaller than that caused by different chemical stimulations.
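
To make the combining step of the “Separate-then-Combine” analysis concrete, the following is a minimal Python sketch of Fisher's method and Stouffer's z-score approach for a single gene. It is an illustration only, not the authors' implementation: the per-compound P values are hypothetical, the Stouffer variant shown is the unweighted one-sided form, and all P values are assumed to lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k df under H0."""
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Chi-square survival function for even df = 2k has a closed form:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def stouffer_combine(pvalues):
    """Unweighted one-sided Stouffer: Z = sum(z_i) / sqrt(k), z_i = Phi^-1(1 - p_i)."""
    std_normal = NormalDist()
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in pvalues]
    z_combined = sum(z_scores) / math.sqrt(len(pvalues))
    return 1.0 - std_normal.cdf(z_combined)


# Hypothetical P values for one gene, one per compound-level analysis.
per_compound_p = [0.04, 0.20, 0.01, 0.30]
print("Fisher:  ", fisher_combine(per_compound_p))
print("Stouffer:", stouffer_combine(per_compound_p))
```

Both functions take P values that were estimated separately for each compound, so they inherit the low power and unstable estimates of those small per-compound analyses. That is the limitation the complete pooling strategy is meant to avoid.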
