Abstract

Interest in omics data is growing in toxicology because of the diverse knowledge such data generate, which can improve prediction and dosage profiling for more accurate safety assessment.

An integration methodology is presented in which high-throughput omics data are enriched with biological-pathway information to produce a novel set of biological (BIO) descriptors. The omics data are decomposed into clusters that are meaningful in terms of both their mechanistic interpretation and their correlation affinity.

A generalized simulated annealing algorithm is employed to estimate the optimal partition of the enriched data and, from that partition, to produce novel descriptors based on gene content similarity. BIO descriptors are characterized by the pathway information fused with the data; they therefore refer to groups of genes with similar biological implications rather than to specific genes, which could vary across studies.
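
The partitioning step can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: it uses classical simulated annealing with Metropolis acceptance rather than the generalized variant, it measures gene similarity by the Jaccard distance between pathway sets, and it forms each descriptor as the mean expression of the genes in a cluster. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random


def jaccard_distance(a, b):
    """Distance between two genes based on their pathway sets (assumed metric)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def partition_cost(labels, dist):
    """Sum of pairwise distances between genes assigned to the same cluster."""
    n = len(labels)
    return sum(dist[i][j]
               for i in range(n) for j in range(i + 1, n)
               if labels[i] == labels[j])


def anneal_partition(dist, k, steps=5000, t0=1.0, cooling=0.999, seed=0):
    """Classical simulated annealing over k-cluster assignments (k >= 2)."""
    rng = random.Random(seed)
    n = len(dist)
    labels = [rng.randrange(k) for _ in range(n)]
    cost = partition_cost(labels, dist)
    best, best_cost = labels[:], cost
    t = t0
    for _ in range(steps):
        i = rng.randrange(n)
        old = labels[i]
        # Propose moving gene i to a different cluster.
        labels[i] = (old + rng.randrange(1, k)) % k
        new_cost = partition_cost(labels, dist)
        delta = new_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = labels[:], cost
        else:
            labels[i] = old  # reject the move
        t *= cooling  # geometric cooling schedule
    return best, best_cost


def bio_descriptors(expression, labels, k):
    """One descriptor per cluster and sample: mean expression of member genes."""
    rows = []
    for sample in expression:
        row = []
        for c in range(k):
            members = [v for v, lab in zip(sample, labels) if lab == c]
            row.append(sum(members) / len(members) if members else 0.0)
        rows.append(row)
    return rows


# Toy data: six genes annotated with two hypothetical pathways.
pathways = [{"apoptosis"}] * 3 + [{"lipid_metabolism"}] * 3
dist = [[jaccard_distance(a, b) for b in pathways] for a in pathways]

labels, cost = anneal_partition(dist, k=2)

# Two samples, one expression value per gene.
expression = [[1, 2, 3, 10, 20, 30], [2, 4, 6, 1, 1, 1]]
descriptors = bio_descriptors(expression, labels, k=2)
```

Because each descriptor summarizes a pathway-coherent group rather than an individual gene, the columns of `descriptors` stay interpretable even when the measured genes differ between studies. A generalized annealing scheme would change only the proposal and acceptance rules inside `anneal_partition`.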

The methodology is applied to an extensive proteomics data set. The results demonstrate that BIO descriptors benefit predictive modeling: they yield higher prediction accuracy than the original omics data and offer a readily available biological interpretation of the findings.