<html>
<head>
</head>
<body>
<span>Artificial Intelligence Expert Predictive System (AIEPS) model for
acute fish (fathead minnow) toxicity</span>
</body>
</html><html>
<head>
</head>
<body>
<p>
18 December, 2015
</p>
</body>
</html><html>
<head>
</head>
<body>
<p>
9 November 2012
</p>
</body>
</html><html>
<head>
</head>
<body>
<p>
Neither the model nor training set is proprietary.
</p>
<p>
The setup involves installation of Accelrys Chemistry Control 6.0.1
Runtime and Accord SDK 6.1 Runtime. Consult with Accelrys/Biovia on any
legal obligations or limitations.
</p>
</body>
</html><html>
<head>
</head>
<body>
<p>
Fathead minnow (Pimephales promelas)
</p>
</body>
</html><html>
<head>
</head>
<body>
<p>
Fathead minnow 96h LC50 - concentration of test chemical that kills 50%
of the test subjects following a 96-h exposure
</p>
</body>
</html><html>
<head>
</head>
<body>
<p>
mmol/L or mg/L
</p>
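<p>
The two reporting units are related through the molecular weight, since mg/L divided by g/mol gives mmol/L. The sketch below illustrates the conversion to and from a logarithmic mmol/L scale. The function names are hypothetical, and the use of base-10 logarithms is an assumption, as the logarithm base is not stated in this report.
</p>

```python
import math


def mg_per_l_to_log_mmol_per_l(lc50_mg_per_l: float, mol_weight: float) -> float:
    """Convert an LC50 in mg/L to log10(mmol/L).

    mol_weight is in g/mol. Base-10 logarithm is an assumption.
    """
    if lc50_mg_per_l <= 0 or mol_weight <= 0:
        raise ValueError("LC50 and molecular weight must be positive")
    # mg/L divided by g/mol yields mmol/L
    return math.log10(lc50_mg_per_l / mol_weight)


def log_mmol_per_l_to_mg_per_l(log_lc50: float, mol_weight: float) -> float:
    """Convert log10(mmol/L) back to mg/L."""
    if mol_weight <= 0:
        raise ValueError("molecular weight must be positive")
    return (10.0 ** log_lc50) * mol_weight
```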
</body>
</html><html>
<head>
</head>
<body>
<p>
The relationship between fathead minnow 96h LC50 and selected molecular
fragment descriptors is implemented through a basic Probabilistic Neural
Network (PNN) with a Gaussian kernel (statistical corrections included).
Atom and fragment information is generated directly from the molecular
structure using fragment chemistry data mining. The model may handle
both inorganic and organic compounds. All data modeling is performed in
Log (mmol/L) units.
</p>
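<p>
To illustrate how a Gaussian-kernel network can map descriptors to a continuous endpoint, the sketch below uses the general regression form (a kernel-weighted average of the training endpoint values). This is an assumption made for illustration: the exact AIEPS formulation and its statistical corrections are not given in this report, and the function name and the single smoothing parameter <code>sigma</code> are hypothetical.
</p>

```python
import math
from typing import Sequence


def gaussian_kernel_predict(
    x: Sequence[float],
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[float],
    sigma: float = 1.0,
) -> float:
    """Kernel-weighted average of training endpoints (illustrative only)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if not train_x or len(train_x) != len(train_y):
        raise ValueError("training inputs and outputs must match and be non-empty")

    # Squared Euclidean distance from the query to every training case
    d2 = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in train_x]

    # Shift by the smallest distance so the largest weight is exactly 1.0;
    # the shift cancels in the ratio and avoids underflow to all zeros.
    d2_min = min(d2)
    weights = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]

    total = sum(weights)
    return sum(w * y for w, y in zip(weights, train_y)) / total
```

<p>
With a small <code>sigma</code> the prediction approaches the endpoint of the nearest training case; with a large <code>sigma</code> it approaches the mean of all training endpoints.
</p>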
</body>
</html><html>
<head>
</head>
<body>
<p>
Not specified
</p>
</body>
</html><html>
<head>
</head>
<body>
<p>
The data were obtained mainly from the AQUIRE database, which is part of
the ECOTOX knowledge database (US EPA, http://cfpub.epa.gov/ecotox/).
</p>
<p>
Data were accepted as presented in the database. The 835 training
compounds were randomly selected, through computational generation, from
a set of 921 compounds.
</p>
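<p>
A random partition of this kind can be sketched as follows. The function name and the fixed seed are hypothetical, and the actual random number generator used for AIEPS is not described in this report.
</p>

```python
import random
from typing import List, Sequence, Tuple


def split_train_test(
    compound_ids: Sequence[int], n_test: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Randomly hold out n_test compounds; the rest form the training set."""
    if not 0 <= n_test <= len(compound_ids):
        raise ValueError("n_test must be between 0 and the number of compounds")
    rng = random.Random(seed)  # seeded so the split is reproducible
    test_positions = set(rng.sample(range(len(compound_ids)), n_test))
    train = [c for i, c in enumerate(compound_ids) if i not in test_positions]
    test = [c for i, c in enumerate(compound_ids) if i in test_positions]
    return train, test


# Sizes taken from this report: 921 compounds, 86 held out, 835 for training
train_ids, test_ids = split_train_test(list(range(1, 922)), n_test=86)
```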
</body>
</html><html>
<head>
</head>
<body>
<p>
Probabilistic Neural Network with Gaussian kernel (statistical
corrections included)
</p>
</body>
</html><html>
<head>
</head>
<body>
<p>
See attachment.
</p>
<p>
Details on the PNN methodology may be found in:
</p>
<p>
Masters T (1993) Practical Neural Network Recipes in C++. Academic
Press, San Diego
</p>
</body>
</html><html>
<head>
</head>
<body>
<p>
The final model uses 78 descriptors. A random initial descriptor list
consisted of molecular weight as well as atoms, fragments and functional
groups. The initial data set for fathead minnow consisted of 921
compounds, from which 86 were randomly selected through computational
generation to form the external test set. To select the descriptors,
atoms or groups that were poorly represented or absent in the structures
of the 835-compound fathead minnow training set were first eliminated
from the list. Partial neural network learning experiments were then
conducted to identify the influence of the remaining descriptors. Neural
network training and statistical analysis showed which descriptors had
no impact on the resulting model's behavior, or resulted in a model with
weaker overall generalization capability; these descriptors were
removed.
</p>
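<p>
The first elimination step, dropping descriptors that are absent or poorly represented in the training structures, can be sketched as below. The function name and the <code>min_compounds</code> threshold are hypothetical; the report does not state what cut-off defined "poorly represented".
</p>

```python
from typing import List, Sequence


def prune_sparse_descriptors(
    counts: Sequence[Sequence[int]],
    names: Sequence[str],
    min_compounds: int = 5,
) -> List[str]:
    """Keep descriptors that occur in at least min_compounds training structures.

    counts[i][j] is the number of occurrences of descriptor j in compound i.
    """
    kept = []
    for j, name in enumerate(names):
        # Number of training compounds in which the descriptor appears at all
        n_present = sum(1 for row in counts if row[j] > 0)
        if n_present >= min_compounds:
            kept.append(name)
    return kept
```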
</body>
</html><html>
<head>
</head>
<body>
<p>
See attachment (AIEPS 3.0 - Fathead minnow 96hr LC50 PNN Model
Validation Study.doc), section 3, for a discussion of the derivation
and refinement of the PNN algorithm. As a starting point, the
multivariate Bayesian density estimator is used in combination with a
mapping tool similar to the Maximum Likelihood Estimation method. The
best probability density associated with the cumulative distribution
of the cases in the training set is determined using Meisels' algorithm.
Details can be found in Masters T (1993) Practical Neural Network
Recipes in C++. Academic Press, San Diego.
</p>
</body>
</html><html>
<head>
</head>
<body>
<p>
The ratio of training set chemicals to descriptors is 835/78 = 10.71.
</p>
</body>
</html><html>
<head>
</head>
<body>
<p>
Because the mathematical functions involved in the model&#8217;s computation
algorithm are continuous, predictions are expected to be reliable when
the model input values lie between the minimum and maximum values of the
corresponding descriptors in the model&#8217;s training data set, or only
slightly outside that range.
</p>
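<p>
A range check of this kind can be sketched as follows. The function name and the dictionary-based data layout are assumptions for illustration and do not describe the AIEPS implementation. The sketch flags only strict containment; how far outside the range is still acceptable is left to the user.
</p>

```python
from typing import Dict, Sequence, Tuple


def descriptor_range_report(
    query: Dict[str, float],
    training: Sequence[Dict[str, float]],
) -> Dict[str, Tuple[float, float, float, bool]]:
    """For each descriptor return (query value, train min, train max, in range)."""
    if not training:
        raise ValueError("training set must not be empty")
    report = {}
    for name, value in query.items():
        column = [row[name] for row in training]
        lo, hi = min(column), max(column)
        report[name] = (value, lo, hi, lo <= value <= hi)
    return report
```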
</body>
</html><html>
<head>
</head>
<body>
<p>
The substance of interest should have chemical descriptors that fall
between the minimum and maximum values of those used in the training
set. In addition, the model provides a means to compare the substance of
interest with those in the training set through Tanimoto indices. In
other words, a prediction may be deemed acceptable when the maximum
Tanimoto similarity to the compounds in the model&#8217;s training set is
higher than a professionally determined value. For each prediction,
AIEPS can generate a report of similarity with the model&#8217;s training
dataset, in which the 10 most similar compounds are identified and the
corresponding measured information is reported in table format. Another
table allows comparison between the values used as model input and the
ranges of the corresponding training set descriptors. All elements
necessary to judge the reliability of the predictions are thus made
available to the user. Based on this information, it is up to the user
to decide whether the predicted value is reliable.
</p>
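<p>
The similarity ranking can be sketched with the Tanimoto index on sets of fragments, defined as the size of the intersection divided by the size of the union. Representing each compound as a set of fragment labels is an assumption, as are the function names; the fingerprint actually used by AIEPS is not described in this report.
</p>

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto index of two fragment sets: |A and B| / |A or B|."""
    union = len(a | b)
    # Two empty sets are scored 0.0 here; this is a convention, not a rule
    return len(a & b) / union if union else 0.0


def most_similar(
    query: Set[str], training: Dict[str, Set[str]], k: int = 10
) -> List[Tuple[str, float]]:
    """Return the k training compounds most similar to the query."""
    scored = [(tanimoto(query, frags), name) for name, frags in training.items()]
    # Highest similarity first; ties broken alphabetically by compound name
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [(name, score) for score, name in scored[:k]]
```

<p>
The first entry of the returned list gives the maximum similarity, which the user can compare against a chosen acceptance threshold.
</p>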
</body>
</html><html>
<head>
</head>
<body>
<p>
The model targets only small molecules consisting of fewer than 200
atoms. It is not recommended for larger structures.
</p>
<p>
The model is limited to organics.
</p>
<p>
As the available data on organometallics was very limited, caution is
recommended when using the model for predicting the endpoint value for
organometallics.
</p>
<p>
With few exceptions the model cannot account for the differences between
structural isomers. The exceptions occur when the combination of the
model fragment descriptors is able to recognize them.
</p>
<p>
Predictions may not be accurate when the target structure involves
active fragments not accounted for by the existing model descriptors.
</p>
</body>
</html><html>
<head>
</head>
<body>
<p>
The training set of 835 organic substances was randomly selected,
through computer generation, from 921 substances having measured data
for acute 96 hr LC50s with fathead minnow, Pimephales promelas.
</p>
</body>
</html><html>
<head>
</head>
<body>
<p>
There has been no preprocessing of data before modelling.
</p>
</body>
</html><html>
<head>
</head>
<body>
<p>
Minimum Residuals -2.3179
</p>
<p>
Maximum Residuals 1.5963
</p>
<p>
Average Residuals 0.0000
</p>
<p>
Standard Deviation of Residuals 0.5018
</p>
<p>
Sum of Square Residuals 210.0455
</p>
<p>
Average Square Residuals 0.2516
</p>
<p>
Coefficient of Determination Between Measured and Predicted Values 0.8844
</p>
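<p>
Summary statistics of this kind can be reproduced from paired measured and predicted values with the sketch below. Several choices are assumptions: residuals are taken as measured minus predicted, the standard deviation uses the n-1 denominator, and the coefficient of determination is computed as the squared Pearson correlation (in the external validation figures of this report, the determination coefficient is approximately the square of the correlation coefficient). The function name is hypothetical.
</p>

```python
import math
import statistics
from typing import Dict, Sequence


def residual_summary(
    measured: Sequence[float], predicted: Sequence[float]
) -> Dict[str, float]:
    """Residual statistics for paired measured and predicted values."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    # Sign convention (measured minus predicted) is an assumption
    residuals = [m - p for m, p in zip(measured, predicted)]
    n = len(residuals)
    sum_sq = sum(r * r for r in residuals)

    # Pearson correlation between measured and predicted values;
    # raises ZeroDivisionError if either sequence is constant
    mean_m = statistics.fmean(measured)
    mean_p = statistics.fmean(predicted)
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_m * var_p)

    return {
        "min": min(residuals),
        "max": max(residuals),
        "mean": statistics.fmean(residuals),
        "stdev": statistics.stdev(residuals),  # n-1 denominator
        "sum_sq": sum_sq,
        "mean_sq": sum_sq / n,
        "r": r,
        "r_squared": r * r,
    }
```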
</body>
</html><html>
<head>
</head>
<body>
</body>
</html><html>
<head>
</head>
<body>
<p>
86 substances were used as the external validation set for the acute
fish toxicity PNN model.
</p>
<p>
These were randomly selected through a standard computational
algorithm from the whole dataset of 921 substances.
</p>
</body>
</html><html>
<head>
</head>
<body>
<p>
Experimental data were randomly set aside before modeling.
</p>
</body>
</html><html>
<head>
</head>
<body>
<p>
Minimum Residuals -3.1563
</p>
<p>
Maximum Residuals 1.3460
</p>
<p>
Average Residuals -0.0614
</p>
<p>
Standard Deviation of Residuals 0.7511
</p>
<p>
Sum of Square Residuals 48.2808
</p>
<p>
Average Square Residuals 0.5614
</p>
<p>
Determination Coefficient Between Measured and Predicted Values 0.7766
</p>
<p>
Correlation Coefficient Between Measured and Predicted Values 0.8812
</p>
<p>
Shapiro-Wilk W Test Statistic 0.9222
</p>
<p>
Prob &lt; W: &lt;0.0001
</p>
</body>
</html><html>
<head>
</head>
<body>
<p>
The Shapiro-Wilk W test rejects, at the &#945; = 0.05 significance level,
the null hypothesis that the residuals on the external test set of 86
compounds are normally distributed. Consequently, confidence intervals
cannot be computed. Compound 899 (CAS RN 1484-13-5, i.e.
N-Vinylcarbazole) is not properly represented in the training set and is
a structural outlier.
</p>
</body>
</html><html>
<head>
</head>
<body>
</body>
</html><html>
<head>
</head>
<body>
<p>
The mechanistic approach of the present model involves identifying the
presence or absence, in the substance of interest, of the chemical
descriptors (76) used to train the model. The algorithm from the trained
model is then applied, with weights assigned to each factor that reflect
its influence on the endpoint of interest. The result provides a
large-scope prediction of the 96hr LC50 for fathead minnow.
</p>
</body>
</html><html>
<head>
</head>
<body>
<p>
The mechanistic interpretation was determined a posteriori by
interpreting and modifying the final set of descriptors that
contributed to the best fit.
</p>
</body>
</html>
Q52-55-56-521
2016/11/11
Artificial Intelligence Expert Predictive System, AIEPS, daphnia magna, fish, fathead minnow, acute toxicity