Implemented ADMET Predictions

The implemented Absorption, Distribution, Metabolism, Excretion and Toxicity (ADMET) prediction models are described, together with their performance measures,
in our online paper.1
The 15 models cover a diverse set of ADMET endpoints. Some of the models have already been
published, including those for Maximum Recommended Therapeutic Dose (MRTD),2
chemical mutagenicity,3 human liver microsomal (HLM) stability,4 and
Pgp inhibitors and substrates.5
We also present several new models, which we make available here for the first time.

Liver Toxicity

DILI: Drug-induced liver injury (DILI) has been one of the most commonly cited reasons for drug
withdrawals from the market. This application predicts whether a compound could cause DILI.
The dataset of 1,431 compounds was obtained from the four sources used by Xu et al.8
This dataset contains both pharmaceuticals and non-pharmaceuticals; we classified a compound as causing DILI
if it was associated with a high risk of DILI and as not causing DILI if there was no such risk.

Cytotoxicity (HepG2): Cytotoxicity is the degree to which a chemical causes damage to cells.
We developed a cytotoxicity prediction model using in vitro data on toxicity against HepG2 cells
for 6,000 structurally diverse compounds, which we collected from ChEMBL. In developing our
model, we classified compounds with an IC50 ≤ 10 μM in the in vitro assay as
cytotoxic.
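The cytotoxicity label reduces to a single threshold on IC50. The Python sketch below is illustrative only: the helper names are hypothetical, and it assumes IC50 values have already been extracted and may be reported in nM (the unit ChEMBL commonly uses for standardized IC50 values), hence the unit conversion before the 10 μM cutoff is applied.

```python
# Hypothetical helpers illustrating the HepG2 cytotoxicity labeling rule.
CYTOTOXIC_CUTOFF_UM = 10.0  # IC50 <= 10 uM -> cytotoxic


def nm_to_um(value_nm: float) -> float:
    """Convert a concentration from nM to uM."""
    return value_nm / 1000.0


def is_cytotoxic(ic50_um: float) -> bool:
    """Return True when the HepG2 IC50 (in uM) meets the cytotoxicity cutoff."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return ic50_um <= CYTOTOXIC_CUTOFF_UM


if __name__ == "__main__":
    # Illustrative values only, not taken from the dataset.
    for ic50_nm in (500.0, 10_000.0, 25_000.0):
        print(ic50_nm, "nM ->", is_cytotoxic(nm_to_um(ic50_nm)))
```

A compound measured at exactly 10 μM counts as cytotoxic, matching the "≤ 10 μM" rule.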

Metabolism

HLM: The human liver microsomal (HLM) stability assay is commonly used to identify and
exclude compounds that are metabolized too rapidly. For a drug to achieve effective therapeutic
concentrations in the body, it must not be metabolized too rapidly by the liver. Compounds with a
half-life of 30 minutes or longer in an HLM assay are considered stable; otherwise they are
considered unstable. We retrieved HLM data from the ChEMBL database, manually curated the data,
and classified compounds as stable or unstable based on the reported half-life (T1/2 ≥ 30 min
was considered stable, and T1/2 < 30 min unstable). The final dataset contained 3,654 compounds.
Of these, we classified 2,313 as stable and 1,341 as unstable.4
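The stability rule is a half-life threshold, so labeling and tallying a curated dataset takes only a few lines. This is a minimal Python sketch with hypothetical function names; it assumes half-lives are already expressed in minutes, and the example values are made up rather than drawn from the HLM dataset.

```python
from collections import Counter
from typing import Iterable

STABLE_HALF_LIFE_MIN = 30.0  # T1/2 >= 30 min -> stable


def hlm_label(half_life_min: float) -> str:
    """Classify an HLM half-life (in minutes) as 'stable' or 'unstable'."""
    return "stable" if half_life_min >= STABLE_HALF_LIFE_MIN else "unstable"


def label_counts(half_lives_min: Iterable[float]) -> Counter:
    """Count how many compounds fall into each stability class."""
    return Counter(hlm_label(t) for t in half_lives_min)


if __name__ == "__main__":
    # Made-up half-lives for illustration; 30.0 min sits exactly on the cutoff.
    print(label_counts([5.0, 29.9, 30.0, 120.0]))
```

A half-life of exactly 30 min is counted as stable, consistent with the "30 minutes or longer" definition.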

BBB: The blood-brain barrier (BBB) is a highly selective barrier that separates the
circulating blood from the central nervous system. We developed a BBB model based on the variable nearest neighbor (vNN) method,
using 352 compounds whose BBB permeability values (logBB) were obtained from the literature.6,7
We classified compounds with logBB values less than −0.3 as BBB non-permeable and those with
values greater than +0.3 as BBB permeable.
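The two cutoffs can be expressed directly in code. In this Python sketch the helper names are hypothetical, and logBB is computed as the log10 of the brain-to-blood concentration ratio. The text does not say how compounds with logBB between −0.3 and +0.3 were handled, so the sketch returns None for them; that choice is an assumption.

```python
import math
from typing import Optional

NON_PERMEABLE_BELOW = -0.3  # logBB < -0.3 -> BBB non-permeable
PERMEABLE_ABOVE = 0.3       # logBB > +0.3 -> BBB permeable


def log_bb(c_brain: float, c_blood: float) -> float:
    """logBB = log10(brain concentration / blood concentration)."""
    if c_brain <= 0 or c_blood <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(c_brain / c_blood)


def bbb_label(value: float) -> Optional[str]:
    """Map a logBB value to a permeability class."""
    if value < NON_PERMEABLE_BELOW:
        return "non-permeable"
    if value > PERMEABLE_ABOVE:
        return "permeable"
    # Assumption: -0.3 <= logBB <= +0.3 is left unlabeled (not specified in the text).
    return None


if __name__ == "__main__":
    for brain, blood in ((4.0, 1.0), (1.0, 4.0), (1.0, 1.0)):
        value = log_bb(brain, blood)
        print(round(value, 3), bbb_label(value))
```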

Pgp Substrates and Inhibitors: P-glycoprotein (Pgp) is an essential cell membrane protein
that pumps many foreign substances out of the cell. Cancer cells often overexpress Pgp, which
increases the efflux of chemotherapeutic agents from the cell and undermines treatment by
reducing the effective intracellular concentrations of such agents, a phenomenon known as multidrug
resistance. For this reason, identifying compounds that can either be transported out of the cell
by Pgp (substrates) or impair Pgp function (inhibitors) is of great interest. We have developed
models to predict both Pgp substrates and Pgp inhibitors.5
The Pgp substrate dataset was collected by Hou and co-workers.11
This dataset consists of measurements of 422 substrates and 400 non-substrates. To generate a large
Pgp inhibitor dataset, we combined two published datasets12,13 and
removed duplicates, yielding a training set of
1,319 inhibitors and 937 non-inhibitors.
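Combining two sources and removing duplicates can be sketched as a keyed union. The sketch below is hypothetical in several respects: it assumes each compound already carries a canonical structure key (for example an InChIKey), and, because the text does not say how disagreements between sources were resolved, it drops any compound whose labels conflict.

```python
from typing import Dict, Iterable, Set, Tuple

# (structure key, label); label 1 = inhibitor, 0 = non-inhibitor.
# The key is assumed to be a canonical identifier such as an InChIKey.
Record = Tuple[str, int]


def merge_without_duplicates(*datasets: Iterable[Record]) -> Dict[str, int]:
    """Union several labeled datasets, keeping one entry per structure key.

    Assumption: a compound whose labels disagree between sources is dropped.
    """
    merged: Dict[str, int] = {}
    conflicting: Set[str] = set()
    for dataset in datasets:
        for key, label in dataset:
            if key in conflicting:
                continue
            if key in merged and merged[key] != label:
                del merged[key]
                conflicting.add(key)
            else:
                merged[key] = label
    return merged


if __name__ == "__main__":
    source_a = [("KEY-A", 1), ("KEY-B", 0)]
    source_b = [("KEY-A", 1), ("KEY-B", 1), ("KEY-C", 0)]
    merged = merge_without_duplicates(source_a, source_b)
    n_inhibitors = sum(merged.values())
    print(merged, "inhibitors:", n_inhibitors, "non-inhibitors:", len(merged) - n_inhibitors)
```

Keeping conflicting compounds with a majority vote, or preferring one source, would be equally plausible alternatives.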

Others

hERG (Cardiotoxicity): The human ether-à-go-go-related gene (hERG) codes for a potassium
ion channel involved in normal cardiac repolarization. Drug-induced
blockade of hERG function can cause long QT syndrome, which may result in arrhythmia and death.
We retrieved 282 known hERG blockers from the literature, classifying compounds with an IC50
of 10 μM or less as blockers.9
We also collected a set of 404 compounds with IC50 values greater than
10 μM from ChEMBL and classified them as non-blockers.

MMP (Mitochondrial Toxicity): Given the fundamental role of mitochondria in cellular
energetics and oxidative stress, mitochondrial dysfunction has been implicated in cancer,
diabetes, neurodegenerative disorders, and cardiovascular diseases. We used the largest available dataset
of chemical-induced changes in mitochondrial membrane potential (MMP), on the assumption
that a compound that causes mitochondrial dysfunction is also likely to reduce the MMP. We developed
a vNN-based MMP prediction model using 6,261 compounds collected from a previous study that screened
a library of 10,000 compounds (~8,300 unique chemicals) at 15 concentrations, each in triplicate,
to measure changes in the MMP in HepG2 cells.10 The study
found that 913 compounds decreased the MMP, whereas 5,395 compounds had no effect.
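The MMP model is vNN-based. As a rough illustration only, the Python sketch below assumes the variable nearest neighbor scheme works as follows: only training compounds within a structural distance d0 of the query contribute, each weighted by exp(−(d/h)²), and a query with no such neighbor is treated as outside the applicability domain, so no prediction is returned. The fingerprint representation (sets of on-bit indices), the Tanimoto distance, and the defaults d0 = 0.6 and h = 0.3 are assumptions, not the settings of the deployed models.

```python
import math
from typing import FrozenSet, List, Optional, Tuple

Fingerprint = FrozenSet[int]  # indices of "on" bits (hypothetical representation)


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """1 - |a & b| / |a | b|; two empty fingerprints are treated as identical."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def vnn_predict(query: Fingerprint,
                training: List[Tuple[Fingerprint, float]],
                d0: float = 0.6,
                h: float = 0.3) -> Optional[float]:
    """Distance-weighted average of labels over neighbors within d0.

    Returns None when no training compound lies within d0, i.e. the
    query falls outside the applicability domain.
    """
    num = 0.0
    den = 0.0
    for fp, y in training:
        d = tanimoto_distance(query, fp)
        if d <= d0:
            w = math.exp(-(d / h) ** 2)  # closer neighbors get larger weights
            num += w * y
            den += w
    if den == 0.0:
        return None
    return num / den


if __name__ == "__main__":
    # Toy data: label 1 = decreases MMP, 0 = no effect.
    train = [(frozenset({1, 2, 3, 4}), 1.0),
             (frozenset({1, 2, 3, 5}), 1.0),
             (frozenset({7, 8, 9}), 0.0)]
    print(vnn_predict(frozenset({1, 2, 3}), train))
    print(vnn_predict(frozenset({20, 21}), train))
```

For a binary endpoint the returned value is a weighted fraction of positive neighbors; thresholding it at 0.5 to obtain a class is likewise an assumption.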

Mutagenicity (Ames Test): Mutagens are chemicals that cause genetic mutations, which can lead
to cancer. A common way to assess a chemical’s mutagenicity is
the Ames test. We developed the prediction model using a literature dataset of 6,512 compounds, of
which 3,503 were Ames-positive. We provide further details of the model and its performance in Reference 3.

MRTD: The Maximum Recommended Therapeutic Dose (MRTD) is an estimated upper limit of the safe daily
dose. We built a prediction model based on a dataset of MRTD values publicly disclosed by the
FDA, mostly of single-day oral doses for an average adult with a body weight of 60 kg, for 1,220
compounds (most of which are small organic drugs). We excluded organometallics, high-molecular-weight
polymers (>5,000 Da), inorganic chemicals, mixtures of chemicals, and very small molecules
(<100 Da). We used an external test set of 160 compounds that were collected by the FDA for
validation. The total dataset for our model contained 1,185 compounds.2 The predicted MRTD value
is reported in mg/day for an average adult weighing 60 kg.
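Because the predicted MRTD is reported per day for a 60 kg adult, converting it to a per-body-weight or molar basis is simple arithmetic. The helper names in this Python sketch are hypothetical. Passing a body weight other than 60 kg assumes the dose scales linearly with body weight, which the text does not claim.

```python
REFERENCE_BODY_WEIGHT_KG = 60.0  # average adult body weight used for MRTD values


def mg_day_to_mg_kg_day(dose_mg_day: float,
                        body_weight_kg: float = REFERENCE_BODY_WEIGHT_KG) -> float:
    """Convert a daily dose in mg/day to mg/kg/day."""
    return dose_mg_day / body_weight_kg


def mg_kg_day_to_mg_day(dose_mg_kg_day: float,
                        body_weight_kg: float = REFERENCE_BODY_WEIGHT_KG) -> float:
    """Convert a daily dose in mg/kg/day back to mg/day."""
    return dose_mg_kg_day * body_weight_kg


def mg_day_to_mmol_kg_day(dose_mg_day: float,
                          mol_weight_g_mol: float,
                          body_weight_kg: float = REFERENCE_BODY_WEIGHT_KG) -> float:
    """Convert mg/day to mmol/kg/day (mg divided by g/mol gives mmol)."""
    return dose_mg_day / mol_weight_g_mol / body_weight_kg


if __name__ == "__main__":
    # Illustrative numbers only: a 600 mg/day prediction for a 300 g/mol compound.
    print(mg_day_to_mg_kg_day(600.0))
    print(mg_day_to_mmol_kg_day(600.0, 300.0))
```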

Performance measures of vNN models in 10-fold cross-validation using a restricted or unrestricted applicability domain