BackgroundA challenge in precision medicine is the transformation of genomic data into knowledge that can be used to stratify patients into treatment groups based on predicted clinical response. Although clinical trials remain the only way to truly measure drug toxicities and effectiveness, as a scientific community we lack the resources to clinically assess all drugs presently under development. Therefore, an effective preclinical model system that enables prediction of anticancer drug response could significantly speed the broader adoption of personalized medicine.

ResultsThree large-scale pharmacogenomic studies have screened anticancer compounds in greater than 1000 distinct human cancer cell lines. We combined these datasets to generate and validate multi-omic predictors of drug response. We compared drug response signatures built using a penalized linear regression model and two non-linear machine learning techniques, random forest and support vector machine. The precision and robustness of each drug response signature was assessed using cross-validation across three independent datasets. Fifteen drugs were common among the datasets. We validated prediction signatures for eleven out of fifteen tested drugs 17-AAG, AZD0530, AZD6244, Erlotinib, Lapatinib, Nultin-3, Paclitaxel, PD0325901, PD0332991, PF02341066, and PLX4720.

ConclusionsMulti-omic predictors of drug response can be generated and validated for many drugs. Specifically, the random forest algorithm generated more precise and robust prediction signatures when compared to support vector machines and the more commonly used elastic net regression. The resulting drug response signatures can be used to stratify patients into treatment groups based on their individual tumor biology, with two major benefits: speeding the process of bringing preclinical drugs to market, and the repurposing and repositioning of existing anticancer therapies.

An erratum to this article is available at http:-dx.doi.org-10.1186-s12864-015-1630-1.