The thermodynamic integration (TI) and expanded ensemble (EE) methods are used here to calculate the hydration free energy in water, the solvation free energy in 1-octanol, and the octanol-water partition coefficient for six compounds of varying functionality using the optimized potentials for liquid simulations (OPLS) all-atom (AA) force field parameters and atomic charges. Both methods use the molecular dynamics algorithm as a primary component of the simulation protocol, and both have found wide application in areas such as the calculation of activity coefficients, phase behavior, and partition coefficients. Both methods yield solvation free energies and 1-octanol/water partition coefficients with average absolute deviations (AAD) from experimental data within 4 kJ/mol and 0.5 log units, respectively. Here, we find that in simulations the OPLS-AA force field parameters (with fixed charges) can reproduce solvation free energies of solutes in 1-octanol with an AAD of about half that for the solute hydration free energies using the extended simple point charge (SPC/E) model of water. The computational efficiency of the two simulation methods is compared based on the time (in nanoseconds) required to obtain similar standard deviations in the solvation free energies and 1-octanol/water partition coefficients. By this analysis, the EE method is found to be a factor of nine more efficient than the TI algorithm. For both methods, solvation free energy calculations in 1-octanol consume roughly an order of magnitude more CPU hours than the hydration free energy calculations.
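
The link between the two solvation free energies and the partition coefficient can be sketched as follows. This is a minimal illustration, assuming the standard infinite-dilution relation log P = (dG_hyd - dG_oct) / (RT ln 10) and ignoring the water content of the octanol phase; the function names and default temperature are hypothetical and are not taken from the study.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)


def kj_per_log_unit(temperature_k=298.15):
    """Free energy difference (kJ/mol) that corresponds to one log10 unit of P."""
    return R_KJ_PER_MOL_K * temperature_k * math.log(10)


def log_p_from_solvation(dg_hyd_kj_mol, dg_oct_kj_mol, temperature_k=298.15):
    """Estimate log10 P(octanol/water) from two solvation free energies.

    Assumption: infinite dilution and pure solvents, so the transfer free
    energy from water to octanol is simply dg_oct - dg_hyd.
    """
    transfer = dg_oct_kj_mol - dg_hyd_kj_mol  # water -> octanol
    return -transfer / kj_per_log_unit(temperature_k)
```

Under these assumptions one log unit corresponds to about 5.71 kJ/mol at 298.15 K, so an error of 4 kJ/mol in a single free energy translates to roughly 0.7 log units in log P if it is not cancelled by the error in the other solvent.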

Recently the authors published a robust QSPR model of aqueous solubility which exploited the computationally derived molecular descriptor topographical polar surface area (TPSA) alongside experimentally determined melting point and logP. This model (the "TPSA model") is able to accurately predict to within $\pm$ one log unit the aqueous solubility of 87% of the compounds in a chemically diverse data set of 1265 molecules. This is comparable to results achieved for established models of aqueous solubility, e.g., ESOL (79%) and the General Solubility Equation (81%). Hierarchical clustering of this data set according to chemical similarity shows that a significant number of molecules with phenolic and/or phenol-like moieties are poorly predicted by these equations. Modification of the TPSA model to additionally incorporate a descriptor pertaining to a simple count of phenol and phenol-like moieties improves the predictive ability within $\pm$ one log unit to 89% for the full data set (1265 compounds, -8.48 < logS < 1.58) and 82% for a reduced data set (1160 compounds, -6.00 < logS < 0.00) which excludes compounds at the sparsely populated extremities of the data range. This improvement can be rationalized as the additional descriptor in the model acting as a correction factor which acknowledges the effect of phenolic substituents on the electronic characteristics of aromatic molecules, i.e., the generally positive contribution to aqueous solubility made by phenolic moieties.
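
The "within one log unit" figure of merit and the General Solubility Equation used as a baseline can be sketched as below. This is only an illustration: the GSE is written in its commonly cited form, logS = 0.5 - 0.01 (MP - 25) - logP with the melting point in degrees Celsius, which is an assumption here because the abstract does not state it. The coefficients of the TPSA model itself are not given in the abstract and are not reproduced. Function names are hypothetical.

```python
def gse_log_s(melting_point_c, log_p):
    """General Solubility Equation in its commonly cited form (assumed here).

    logS = 0.5 - 0.01 * (MP - 25) - logP, with MP in degrees Celsius.
    Compounds that are liquid at 25 C get no melting-point penalty.
    """
    mp_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - log_p


def fraction_within(predicted, observed, tolerance=1.0):
    """Fraction of predictions whose absolute error is at most `tolerance` log units."""
    hits = sum(1 for p, o in zip(predicted, observed) if abs(p - o) <= tolerance)
    return hits / len(predicted)
```

Reporting `fraction_within` alongside an error measure such as RMSE is useful because a few large outliers can dominate the RMSE without changing the fraction of acceptable predictions.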

Aqueous solubility is one of the most important ADMET properties to assess and to optimize during the drug discovery process. At present, accurate prediction of solubility remains very challenging and there is a pressing need for independent benchmarking of existing in silico models in order to suggest ways to improve them. In this study, we developed a new protocol for improved solubility prediction by combining several existing models available in commercial or free software packages. We first evaluated ten in silico models for aqueous solubility prediction on several data sets in order to assess the reliability of the methods, and we proposed a new diverse data set of 150 molecules, SolDiv150, as a relevant test set. We developed a random forest protocol to evaluate the performance of different fingerprints for aqueous solubility prediction based on molecular structure similarity. Our protocol, called a "multimodel protocol", selects the most accurate model for a compound of interest among the employed models or software packages, achieving r(2) of 0.84 when applied to SolDiv150. We also found that all models assessed here performed better on druglike molecules than on real drugs; thus additional improvement is needed in this direction. Overall, our approach enlarges the applicability domain, as demonstrated by the more accurate solubility predictions obtained using our protocol in comparison to using individual models.

Using molecular docking between organic chemicals and lipid membrane to revise the well known octanol-water partition coefficient of the mixture.
Wang, Ting and Zhou, Xianghong and Wang, Dali and Yin, Daqiang and Lin, Zhifen
Environmental toxicology and pharmacology, 2012, 34(1), 59-66
PMID: 22445871
doi: 10.1016/j.etap.2012.02.008

The octanol-water partition coefficient of a mixture (log Kowmix) has been widely used to predict the baseline toxicity of non-polar narcotic chemical mixtures, since toxic effects are usually generated by mixtures of multiple chemicals. However, it remains unclear whether the validity of log Kowmix can be demonstrated, because experimental methods cannot be used to determine this parameter. The invalidity and the further revision of log Kowmix were therefore studied by using molecular docking between non-polar narcotic chemicals and lipid membrane (E(binding)). The results show that E(binding) is a feasible substitute parameter for log Kow because their relationship is linear. Based on a molecular docking and QSAR model, a new calculation method for log Kowmix was proposed as follows: log(Kowmix)

Polychlorinated azobenzenes (PCABs) can be found as contaminant by-products in 3,4-dichloroaniline and its derivatives and in the herbicides Diuron, Linuron, Methazole, Neburon, Propanil and SWEP. Trans congeners of PCABs are physically and chemically more stable, and so are more environmentally relevant, than the unstable cis congeners. In this study, to fill gaps in the environmentally relevant partitioning properties of PCABs, the values of n-octanol/water partition coefficients (log K(OW)) have been determined for 209 congeners of chloro-trans-azobenzene (Ct-AB) by means of a quantitative structure-property relationship (QSPR) approach and the predictive ability of artificial neural networks (ANN). The QSPR methods used were based on geometry optimization and quantum-chemical structural descriptors, which were computed at the level of density functional theory (DFT) using the B3LYP functional and 6-311++G basis set in Gaussian 03, and with the semi-empirical quantum chemistry method (PM6) of the molecular orbital package (MOPAC). Polychlorinated dibenzo-p-dioxins (PCDDs), -furans (PCDFs) and -biphenyls (PCBs), to which PCABs are related, were reference compounds in this study. Experimentally obtained data on physical and chemical properties of PCDD/Fs and PCBs were the reference data for ANN predictions of log K(OW) values of Ct-ABs in this study. Both calculation methods gave similar results in terms of absolute log K(OW) values, while the models generated by PM6 are considered highly efficient in time spent when compared to those by DFT. The estimated log K(OW) values of 209 Ct-ABs varied between 5.22-5.57 and 5.45-5.60 for Mono-, 5.56-6.00 and 5.59-6.07 for Di-, 5.89-6.56 and 5.91-6.46 for Tri-, 6.10-7.05 and 6.13-6.80 for Tetra-, 6.43-7.39 and 6.48-7.14 for Penta-, 6.61-7.78 and 6.98-7.42 for Hexa-, 7.41-7.94 and 7.34-7.86 for Hepta-, 7.99-8.17 and 7.72-8.20 for Octa-, 8.35-8.42 and 8.10-8.62 for NonaCt-ABs, and 8.52-8.60 and 8.81-8.83 for DecaCt-AB. These log K(OW) values show that Ct-ABs are compounds of relatively low environmental mobility (log K(OW) > 4.5) and of significant bioaccumulation potential.

The possible molecular geometries of 134 halogenated methyl-phenyl ethers were optimized at the B3LYP/6-31G(*) level with the Gaussian 98 program. The calculated structural parameters were taken as theoretical descriptors to establish two novel QSPR models for predicting aqueous solubility (-lgS(w,l)) and n-octanol/water partition coefficient (lgK(ow)) of halogenated methyl-phenyl ethers. The two models achieved in this work both contain three variables: energy of the lowest unoccupied molecular orbital (E(LUMO)), most positive atomic partial charge in the molecule (q(+)), and quadrupole moment (Q(yy) or Q(zz)). Their R values are 0.992 and 0.970, and their standard errors of estimate in modeling (SD) are 0.132 and 0.178, respectively. The results of leave-one-out (LOO) cross-validation for the training set and validation with external test sets both show that the models obtained exhibit optimum stability and good predictive power. We suggest that the two QSPR models derived here can be used to predict S(w,l) and K(ow) accurately for non-tested halogenated methyl-phenyl ether congeners.

A new possibility for estimating the octanol/water partition coefficient (log P) was investigated using only one descriptor, the semi-empirical electrotopological index (I(SET)). The predictability of four octanol/water partition coefficient (log P) calculation models was compared using a set of 131 aliphatic organic compounds from five different classes. Log P values were calculated employing atomic-contribution methods, as in the Ghose/Crippen approach and its later refinement, AlogP; using fragmental methods through the ClogP method; and employing an approach considering the whole molecule using topological indices with the MlogP method. The efficiency and the applicability of I(SET) for calculating log P were demonstrated through good statistical quality (r > 0.99; s < 0.18), high internal stability and good predictive ability for an external group of compounds, of the same order as the widely used models based on the fragmental method, ClogP, and the atomic contribution method, AlogP, which are among the most used methods of predicting log P.
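
The statistics quoted for a one-descriptor model (correlation coefficient r, standard error s, and internal stability) can be sketched as below. This is a generic least-squares illustration, not the authors' procedure: it assumes a single descriptor x (standing in for I(SET)), takes s as the residual standard error with n - 2 degrees of freedom, and uses a leave-one-out q2 as the measure of internal stability. All function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regression_stats(x, y):
    """Return (|r|, s): correlation coefficient and residual standard error."""
    slope, intercept = fit_line(x, y)
    n = len(x)
    mean_y = sum(y) / n
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r = math.sqrt(max(0.0, 1.0 - ss_res / ss_tot))
    s = math.sqrt(ss_res / (n - 2))
    return r, s


def loo_q2(x, y):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS_tot."""
    n = len(x)
    mean_y = sum(y) / n
    press = 0.0
    for i in range(n):
        # Refit without compound i, then predict it.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot
```

For an ordinary least-squares fit, q2 can never exceed r squared, so a large gap between the two is a simple warning sign of a model that depends heavily on individual compounds.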

Polychlorinated azoxybenzenes (PCAOBs) theoretically consist of 798 congeners, with 399 in cis and 399 in trans configuration. PCAOBs in trans configuration are largely planar compounds, and some are highly toxic and environmentally relevant compared to cis congeners. Trans-PCAOBs can be found as by-products in 3,4-dichloroaniline and some herbicides. To fill gaps in the physical and chemical properties of PCAOBs, the values of log K(OW) were determined for 399 congeners of t-CAOB using a computational approach. We used the semi-empirical RM1 method in MOPAC and the DFT B3LYP method in Gaussian 03, artificial neural network (ANN) predictions, and the standardized variables with and without the normal varimax rotation. The models created predicted the values of log K(OW) of all 399 chlorinated derivatives of trans-azoxybenzenes (C-t-AOBs). The values of log K(OW) of C-t-AOBs varied between 5.08 and 5.42 for Mono-, 5.16 and 5.96 for Di-, 5.79 and 6.73 for Tri-, 6.26 and 7.18 for Tetra-, 6.65 and 7.54 for Penta-, 7.13 and 7.94 for Hexa-, 7.20 and 8.20 for Hepta-, 7.96 and 8.32 for Octa-, 8.32 and 8.43 for Nonachloro-t-AOBs and 8.55 and 8.97 for Decachloro-t-AOB. These log K(OW) values were similar per chloro-t-AOB congener and independent of the calculation method. C-t-AOBs have log K(OW) values above 4.5, which corresponds to contaminants of low or very low environmental mobility but with a high affinity for soil and sediment particles and a potential for bioaccumulation. The models that used the standardized variables had smaller errors and higher correlation coefficients compared to the models based on the normal varimax rotation of standardized structural descriptors. In light of these data, the semi-empirical RM1 calculations in MOPAC followed by ANN were much less time consuming and less expensive than the DFT B3LYP method.

Quantitative structure-property relationships for predicting the water-octanol partition coefficient, logP(OW), are reported. The models are based on local properties calculated at the standard isodensity surface using semiempirical molecular orbital theory and use descriptors obtained as the areas of the surface found in each bin in a predefined binning scheme. The effect of conformation is taken into account but was found to have little effect on the predictive power of the models. A detailed error analysis suggests that the accuracy of the models is limited by that of the experimental data and that the best possible performance is approximately $\pm$0.5 log units. The models yield a local hydrophobicity function at the surface of the molecules.

A key challenge in many drug discovery programs is to accurately assess the potential value of screening hits. This is particularly true in fragment-based drug design (FBDD), where the hits often bind relatively weakly, but are correspondingly small. Ligand efficiency (LE) considers both the potency and the size of the molecule, and enables us to estimate whether or not an initial hit is likely to be optimisable to a potent, druglike lead. While size is a key property that needs to be controlled in a small molecule drug, there are a number of additional properties that should also be considered. Lipophilicity is amongst the most important of these additional properties, and here we present a new efficiency index (LLE(AT)) that combines lipophilicity, size and potency. The index is intuitively defined, and has been designed to have the same target value and dynamic range as LE, making it easily interpretable by medicinal chemists. Monitoring both LE and LLE(AT) should help both in the selection of more promising fragment hits, and controlling molecular weight and lipophilicity during optimisation.
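
The efficiency indices discussed above can be sketched as follows. The abstract does not give the formulas, so this block relies on the forms commonly quoted in the medicinal chemistry literature and should be checked against the paper: LE is approximated as 1.37 x pIC50 / HA (kcal/mol per heavy atom, with 1.37 approximating 2.303RT near 300 K), LLE as pIC50 - logP, and LLE(AT) as 0.111 + 1.37 x LLE / HA. Function and parameter names are hypothetical.

```python
def ligand_efficiency(p_ic50, heavy_atoms):
    """LE in kcal/mol per heavy atom, using the common 1.37 * pIC50 / HA approximation."""
    return 1.37 * p_ic50 / heavy_atoms


def lipophilic_ligand_efficiency(p_ic50, log_p):
    """LLE: potency corrected for lipophilicity (pIC50 - logP)."""
    return p_ic50 - log_p


def lle_at(p_ic50, log_p, heavy_atoms):
    """LLE(AT) in the commonly quoted form 0.111 + 1.37 * LLE / HA (assumed here)."""
    lle = lipophilic_ligand_efficiency(p_ic50, log_p)
    return 0.111 + 1.37 * lle / heavy_atoms
```

Because both indices are scaled per heavy atom and share the same constant, a fragment and an optimised lead can be compared on the same axis, which is the interpretability argument made in the abstract.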

2010

Solubility plays a very important role in the selection of compounds for drug screening. In this context, a QSAR model was developed for predicting water solubility of drug-like compounds. First, a set of relevant parameters for establishing a drug-like chemical space was defined. The comparison of chemical structures from the FDAMDD and PHYSPROP databases allowed the selection of properties that were more efficient in discriminating drug-like compounds from other chemicals. These filters were later on applied to the PHYSPROP database and 1174 chemicals fulfilling these criteria and with experimental solubility information available at 25$\,^{\circ}$C were retained. Several QSAR solubility models were developed from this set of compounds, and the best one was selected based on the accuracy of correct classifications obtained for randomly chosen training and validation subsets. Further validation of the model was performed with a set of 102 drugs for which experimental solubility data have been recently reported. A good agreement between the predictions and the experimental values confirmed the reliability of the QSAR model.

Log P(OW), the logarithm of the octanol-water partition coefficient, is omnipresent in computational drug design. Here, we present a surface-integral model for calculating log P(OW). The model is based on local properties calculated using AM1 semiempirical molecular orbital theory. These are the molecular electrostatic potential (MEP), local ionization energy (IE(L)), local electron affinity (EA(L)), local hardness (HARD), local polarizability (POL), and the local field normal to the surface (FN). We have developed a new scheme to calculate a local hydrophobicity based on binning the range of local surface properties instead of using polynomial expansions of the base terms. The model has been trained using approximately 9500 compounds available from the literature. It was validated on approximately 1350 compounds from the literature and an in-house validation set of 768 compounds from Boehringer-Ingelheim. The model performs similarly to or slightly better than the best commercially available models. We also introduce a model based purely on conformationally rigid compounds that performs well for flexible compounds if the Boltzmann-weighted predictions for the different conformers are used. This is the first 3D QSPR model based on such a large database that is able to benefit from using conformational ensembles.
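
The Boltzmann weighting of per-conformer predictions mentioned above can be sketched as below. This is an illustration under stated assumptions: conformer energies are taken to be relative energies in kcal/mol, the weights are exp(-E/RT), and the predicted log P values are averaged directly (the abstract does not say whether log P or P is averaged). The function name and default temperature are hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol K)


def boltzmann_weighted(values, energies_kcal_mol, temperature_k=298.15):
    """Boltzmann-weighted average of per-conformer predictions.

    Energies are shifted by the minimum before exponentiation so that the
    lowest-energy conformer has weight 1 and no exponential overflows.
    """
    rt = R_KCAL_PER_MOL_K * temperature_k
    e_min = min(energies_kcal_mol)
    weights = [math.exp(-(e - e_min) / rt) for e in energies_kcal_mol]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total
```

With degenerate conformers the result reduces to a plain mean, and a conformer lying many kcal/mol above the minimum contributes essentially nothing.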

Polychlorinated diphenyl ethers (PCDEs) are a group of important persistent organic pollutants. In the present study, geometrical optimization and electrostatic potential calculations have been performed for all 209 PCDE congeners at the HF/6-31G(*) level of theory. A number of statistically-based parameters have been obtained. Linear relationships between gas-chromatographic relative retention time (RRT), n-octanol/water partition coefficient (log K(OW)), 298 K supercooled liquid vapour pressures (log p(L)), aqueous solubilities (logS(w,L)) and the immunotoxicity values (log ED(50)) of PCDEs and the structural descriptors have been established by multiple linear regression method. The result shows that the quantities derived from electrostatic potential, V(s,min), SigmaV(s)(+), V(s,av)(-), Pi, sigma(tot)(2), sigma(+)(2), nu, and N(v)(+), together with the number of the chlorine atoms on the two phenyl rings (N(Cl)), can be well used to express the quantitative structure-property (activity) relationships of PCDEs. Good predictive capabilities have also been demonstrated by leave-group(1/5)-out cross-validation and external test set. Based on these equations, the predicted values have been presented for those PCDE congeners whose experimentally determined physico-chemical properties are unavailable.

A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structure of 141 organic compounds to their octanol-water partition coefficients (log P(o/w)). A genetic algorithm was applied as a variable selection tool. Modeling of log P(o/w) of these compounds as a function of theoretically derived descriptors was established by multiple linear regression (MLR), partial least squares (PLS), and artificial neural network (ANN). The best selected descriptors that appear in the models are: atomic charge weighted partial positively charged surface area (PPSA-3), fractional atomic charge weighted partial positive surface area (FPSA-3), minimum atomic partial charge (Qmin), molecular volume (MV), total dipole moment of the molecule (mu), maximum antibonding contribution of a molecular orbital in the molecule (MAC), and maximum free valency of a C atom in the molecule (MFV). The results obtained showed the ability of the developed artificial neural network to predict partition coefficients of organic compounds. The results also revealed the superiority of ANN over the MLR and PLS models.

Parameterization of an empirical model for the prediction of n-octanol, alkane and cyclohexane/water as well as brain/blood partition coefficients.
Zerara, Mohamed and Brickmann, Jurgen and Kretschmer, Robert and Exner, Thomas E.
Journal of computer-aided molecular design, 2009, 23(2), 105-111
PMID: 18818882
doi: 10.1007/s10822-008-9243-2

Quantitative information on solvation and transfer free energies is often needed for the understanding of many physicochemical processes, e.g., molecular recognition phenomena, transport and diffusion processes through biological membranes, and the tertiary structure of proteins. Recently, a concept for the localization and quantification of hydrophobicity has been introduced (Jäger et al. J Chem Inf Comput Sci 43:237-247, 2003). This model is based on the assumption that the overall hydrophobicity can be obtained as a superposition of fragment contributions. To date, all predictive models for the logP have been parameterized for n-octanol/water (logP(oct)) solvent while very few models with poor predictive abilities are available for other solvents. In this work, we propose a parameterization of an empirical model for n-octanol/water, alkane/water (logP(alk)) and cyclohexane/water (logP(cyc)) systems. Comparison of both logP(alk) and logP(cyc) with the logarithms of brain/blood ratios (logBB) for a set of structurally diverse compounds revealed a high correlation showing their superiority over the logP(oct) measure in this context.

We first review the state-of-the-art in development of log P prediction approaches falling in two major categories: substructure-based and property-based methods. Then, we compare the predictive power of representative methods for one public (N

A large variety of log P calculation methods failed to produce sufficient accuracy in log P prediction for two in-house datasets of more than 96000 compounds contrary to their significantly better performances on public datasets. The minimum Root Mean Squared Error (RMSE) of 1.02 and 0.65 were calculated for the Pfizer and Nycomed datasets, respectively, in the 'out-of-box' implementation. Importantly, the use of local corrections (LC) implemented in the ALOGPS program based on experimental in-house log P data significantly reduced the RMSE to 0.59 and 0.48 for the Pfizer and Nycomed datasets, respectively, instantly without retraining the model. Moreover, more than 60% of molecules predicted with the highest confidence in each set had a mean absolute error (MAE) less than 0.33 log units that is only ca. 10% higher than the estimated variation in experimental log P measurements for the Pfizer dataset. Therefore, following this retrospective analysis, we suggest that the use of the predicted log P values with high confidence may eliminate the need of experimentally testing every other compound. This strategy could reduce the cost of measurements for pharmaceutical companies by a factor of 2, increase the confidence in prediction at the analog design stage of drug discovery programs, and could be extended to other ADMET properties.
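
The two error measures quoted above can be sketched as below, as a plain illustration of the definitions of RMSE and MAE. The local-correction step of ALOGPS is not described in the abstract and is not reproduced here; the function names are hypothetical.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between predicted and measured log P values."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def mae(predicted, observed):
    """Mean absolute error between predicted and measured log P values."""
    n = len(predicted)
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / n
```

RMSE is never smaller than MAE on the same data, and the gap widens when a few compounds are badly mispredicted, which is one reason to report both.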

2008

Why are some properties more difficult to predict than others? A study of QSPR models of solubility, melting point, and Log P.
Hughes, Laura D and Palmer, David S and Nigsch, Florian and Mitchell, John B O
Journal of chemical information and modeling, 2008, 48(1), 220-232
PMID: 18186622
doi: 10.1021/ci700307p

This paper attempts to elucidate differences in QSPR models of aqueous solubility (Log S), melting point (Tm), and octanol-water partition coefficient (Log P), three properties of pharmaceutical interest. For all three properties, Support Vector Machine models using 2D and 3D descriptors calculated in the Molecular Operating Environment were the best models. Octanol-water partition coefficient was the easiest property to predict, as indicated by the RMSE of the external test set and the coefficient of determination (RMSE

The molecular geometries of 209 polybrominated diphenyl ethers (PBDEs) were optimized at the B3LYP/6-31G level with the Gaussian 98 program. The calculated structural parameters were taken as theoretical descriptors to establish two novel QSPR models for predicting supercooled liquid vapor pressures (P(L)) and octanol/air partition coefficients (K(OA)) of PBDEs, respectively, based on the theoretical linear solvation energy relationship (TLSER) model. The two models achieved in this work both contain three variables: most negative atomic partial charge in the molecule (q(-)), dipole moment of the molecule (mu) and mean molecular polarizability (alpha). Their R(2) values are both as high as 0.997, and their root-mean-square errors in modeling (RMSEE) are 0.069 and 0.062, respectively. In addition, the F-values of the two models are both evidently larger than the critical values F(0.05), and the variance inflation factors (VIF) of the variables are all less than 5.0, suggesting clear statistical significance of the P(L) and K(OA) predicting models. The results of leave-one-out (LOO) cross-validation for the training set and validation with the external test set both show that the two models obtained exhibit optimum stability and good predictive power. We suggest that the QSPRs derived here can be used to predict P(L) and K(OA) accurately for non-tested PBDE congeners from Mono-BDEs to Hepta-BDEs and from Mono-BDEs to Hexa-BDEs, respectively.

Halogenated methyl-phenyl ethers (anisoles) are ubiquitous organic compounds in the environment. In the present study, geometrical optimization and electrostatic potential calculations have been performed for 134 halogenated anisoles at the HF/6-31G* level of theory. A number of statistically based parameters have been obtained. Linear relationships between sub-cooled liquid vapor pressures (lgp(L)), n-octanol/water partition coefficient (lgK(ow)) and aqueous solubilities (-lgS(w,L)) of halogenated anisoles and the structural descriptors have been established by multiple regression method. The result shows that the quantities derived from electrostatic potential, V(min), V(s,max), SigmaV(s)(+), SigmaV(s)(-), V(s,av)(-) and nu, together with the molecular volume (V(mc)) and E(HOMO), can be well used to express the quantitative structure-property relationships of halogenated anisoles, which proves the general applicability of this parameter set to a great extent. Good predictive capabilities have also been demonstrated. Based on these excellent equations, the predicted values have been presented for those halogenated anisoles whose experimentally determined physicochemical properties are unavailable.

QSPR modeling of octanol/water partition coefficient for vitamins by optimal descriptors calculated with SMILES.
Toropov, A A and Toropova, A P and Raska, I
European journal of medicinal chemistry, 2008, 43(4), 714-740
PMID: 17629592
doi: 10.1016/j.ejmech.2007.05.007

Simplified molecular input line entry system (SMILES) has been utilized in constructing quantitative structure-property relationships (QSPR) for octanol/water partition coefficient of vitamins and organic compounds of different classes by optimal descriptors. Statistical characteristics of the best model (vitamins) are the following: n

A shake-flask method was employed to determine the water solubility (-lgS (w)) and n-octanol/water partition coefficient (lgK (ow)) of 20 substituted phenols at 298.15 K. Optimization calculations were then carried out at the B3LYP/6-311G** level using the DFT method. The resulting parameters were taken as theoretical descriptors to establish QSPR models for predicting -lgS (w) and lgK (ow), for which the conventional correlation coefficients (R (2)) are 0.9800 and 0.9941, respectively. The two models were further validated by variance inflation factors (VIF) and t-tests. By comparison, their stability and predictive power are better than those of models based on the AM1 molecular orbital method and the molecular connectivity method.

Theoretical molecular descriptors were tested against logK(OW) values for polybrominated diphenyl ethers (PBDEs) using the Partial Least-Squares Regression method which can be used to analyze data with many variables and few observations. A quantitative structure-property relationship (QSPR) model was successfully developed with a high cross-validated value (Q(cum)(2)) of 0.961, indicating a good predictive ability and stability of the model. The predictive power of the QSPR model was further cross-validated. The values of logK(OW) for PBDEs are mainly governed by molecular surface area, energy of the lowest unoccupied molecular orbital and the net atomic charges on the oxygen atom. All these descriptors have been discussed to interpret the partitioning mechanism of PBDE chemicals. The bulk property of the molecules represented by molecular surface area is the leading factor, and K(OW) values increase with the increase of molecular surface area. Higher energy of the lowest unoccupied molecular orbital and higher net atomic charge on the oxygen atom of PBDEs result in smaller K(OW). The energy of the lowest unoccupied molecular orbital and the net atomic charge on PBDEs oxygen also play important roles in affecting the partition of PBDEs between octanol and water by influencing the interactions between PBDEs and solvent molecules.

Using SciTegic's extended connectivity fingerprint as raw descriptors, a robust partial least-squares model for logP prediction was developed. The PLS model is based on 39 latent variables. An additional 8 correction factors are employed to account for effects such as intramolecular hydrogen bonding. The model performs similarly to ClogP for compounds with molecular weight in the 250-400 range but significantly better than ClogP for molecules with molecular weight over 400. Considering modern drug discovery tends to generate larger candidate compounds, the PLS model is better suited for drug discovery applications. The good performance of the simple PLS model indicates that the molecular fingerprints encode detailed structure information. When used properly they outperform conventional descriptors in QSPR model development.

The aqueous solubility (logW) and n-octanol/water partition coefficient (logP(OW)) are important properties for pharmacology, toxicology and medicinal chemistry. Based on an understanding of the dissolution process, a frontier orbital interaction model is suggested in the present paper to describe the solvent-solute interactions of organohalogen compounds, and a general three-parameter model is proposed to predict the aqueous solubility and n-octanol/water partition coefficient for organohalogen compounds with non-hydrogen-bonding interactions. The model has satisfactory prediction accuracy. Furthermore, every term in the model has a very explicit meaning, which should help in understanding the structure-solubility relationship and may provide a new perspective on the estimation of solubility.

2007

QSPR modeling of n-octanol/water partition coefficients and water solubility of PCDEs by the method of Cl substitution position.
Chen, Shu-Da and Zeng, Xiao-Lan and Wang, Zun-Yao and Liu, Hong-Xia
The Science of the total environment, 2007, 382(1), 59-69
PMID: 17531292
doi: 10.1016/j.scitotenv.2007.04.014

The number of Cl substitution positions (N(PCS)) of all 209 possible molecular structure patterns of polychlorinated diphenyl ethers (PCDEs) were correlated with their partition properties n-octanol/water partition coefficient (lgK(ow)) and sub-cooled liquid water solubilities (-lgS(w,l)). The correlation coefficients (R) and the leave-one-out (LOO) cross-validation correlation coefficients (R(cv)) of all the 6-descriptor models for lgK(ow) and -lgS(w,l) are more than 0.98. By using stepwise multiple regression (SMR), the best two models of lgK(ow) with three descriptors (R

The logarithmic n-octanol/water partition coefficient (logK(ow)) is a very important property that relates to the water solubility, bioconcentration factor, toxicity and soil absorption coefficient of organic compounds. A quantitative structure-property relationship (QSPR) model for logK(ow) of 133 polychlorinated biphenyls (PCBs) is analyzed using the heuristic method (HM) implemented in CODESSA. In order to show the influence of different molecular descriptors on logK(ow) values and to understand the important structural factors affecting the experimental values, three multivariable linear models derived from three groups of different molecular descriptors were built. Moreover, each molecular descriptor in these models is discussed to clarify the relationship between molecular structures and their logK(ow) values. The proposed models gave the following results: the square of the correlation coefficient, R(2), for the models with one, two and three molecular descriptors was 0.8854, 0.9239 and 0.9285, respectively.

A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structures of 150 organic drug compounds to their n-octanol-water partition coefficients (logP(o/w)). Molecular descriptors were derived solely from the 3D structures of the drug molecules. A genetic algorithm was also applied as a variable selection tool in the QSPR analysis. The models were constructed using 110 molecules as the training set, and predictive ability was tested using 40 compounds. Modeling of logP(o/w) of these compounds as a function of the theoretically derived descriptors was established by multiple linear regression (MLR). Four descriptors for these compounds, molecular volume (MV) (geometrical), hydrophilic-lipophilic balance (HLB) (constitutional), hydrogen bond forming ability (HB) (electronic) and polar surface area (PSA) (electrostatic), are taken as inputs for the model. The use of descriptors calculated only from molecular structure eliminates the need for experimental determination of properties for use in the correlation and allows for the estimation of logP(o/w) for molecules not yet synthesized. Application of the developed model to a testing set of 40 organic drug compounds demonstrates that the model is reliable, with good predictive accuracy and a simple formulation. The prediction results are in good agreement with the experimental values. The root mean square error of prediction (RMSEP) and squared correlation coefficient (R2) for the MLR model were 0.22 and 0.99, respectively, for the logP(o/w) prediction set.

We have developed a new method, i.e., XLOGP3, for logP computation. XLOGP3 predicts the logP value of a query compound by using the known logP value of a reference compound as a starting point. The difference in the logP values of the query compound and the reference compound is then estimated by an additive model. The additive model implemented in XLOGP3 uses a total of 87 atom/group types and two correction factors as descriptors. It is calibrated on a training set of 8199 organic compounds with reliable logP data through a multivariate linear regression analysis. For a given query compound, the compound showing the highest structural similarity in the training set will be selected as the reference compound. Structural similarity is quantified based on topological torsion descriptors. XLOGP3 has been tested along with its predecessor, i.e., XLOGP2, as well as several popular logP methods on two independent test sets: one contains 406 small-molecule drugs approved by the FDA and the other contains 219 oligopeptides. On both test sets, XLOGP3 produces more accurate predictions than most of the other methods with average unsigned errors of 0.24-0.51 units. Compared to conventional additive methods, XLOGP3 does not rely on an extensive classification of fragments and correction factors in order to improve accuracy. It is also able to utilize the ever-increasing experimentally measured logP data more effectively.
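The reference-compound idea described above can be illustrated with a small sketch: the query logP is the known logP of the reference plus an additive estimate of the difference between the two structures. This is not the XLOGP3 implementation. The atom-type names, contribution values and reference logP below are hypothetical, and the similarity search over topological torsion descriptors that selects the reference is not shown.

```python
def predict_from_reference(ref_logp, ref_counts, query_counts, contributions):
    """Estimate logP of a query compound from a reference compound.

    ref_logp      -- known (experimental) logP of the reference compound
    ref_counts    -- dict: atom/group type -> count in the reference
    query_counts  -- dict: atom/group type -> count in the query
    contributions -- dict: atom/group type -> additive logP contribution
    """
    types = set(ref_counts) | set(query_counts)
    # Only the types whose counts differ contribute to the correction term.
    delta = sum(
        contributions[t] * (query_counts.get(t, 0) - ref_counts.get(t, 0))
        for t in types
    )
    return ref_logp + delta

# Hypothetical contributions and compounds, for illustration only
contributions = {"C_aromatic": 0.3, "Cl": 0.7, "OH": -0.5}
reference = {"C_aromatic": 6}
query = {"C_aromatic": 6, "Cl": 1}
print(predict_from_reference(2.1, reference, query, contributions))
```

Because shared substructures cancel in the difference, errors in their contributions do not propagate to the prediction, which is one way to read the abstract's claim that the method depends less on an extensive fragment classification.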

The logarithmic n-octanol/water partition coefficient (logK(ow)) is an important property for pharmacology, toxicology and medicinal chemistry. A quantitative structure-property relationship (QSPR) model for the lipophilic behaviour (logK(ow)) of a data set containing 133 polychlorinated biphenyl (PCB) congeners is analyzed using a global reactivity parameter based on conceptual density functional theory, the electrophilicity index (omega), along with the energy of the lowest unoccupied molecular orbital (E(LUMO)) and the number of chlorine substituents (N(Cl)) as descriptors. A reasonably good coefficient of determination (r(2)

Comparison of QSPR models of octanol/water partition coefficient for vitamins and non vitamins.
Raska, I and Toropov, A
European journal of medicinal chemistry, 2006, 41(11), 1271-1278
PMID: 16920228
doi: 10.1016/j.ejmech.2006.06.006

Comparison of QSPR models for vitamins and for various substances that are not vitamins indicates that vitamins have fewer molecular features (topological features and kinds of atomic orbitals) but, most probably, more complex mechanisms of interaction with molecules of water and octanol.

One hundred ninety-three drugs of different pharmacological activity were studied. The lipophilicity of a drug is one of the parameters that influence its biological activity. The n-octanol-water partition coefficients of these compounds were calculated using different theoretical procedures (AlogPs, IAlogP, miLogP, ClogP, logP(Kowwin), and xlogP). Each theoretical partition coefficient was compared with the experimental n-octanol-water partition coefficient (logP(exp)) for all studied drugs. The experimental partition coefficients correlate best with the theoretical partition coefficients calculated by the logP(Kowwin) and AlogPs methods. It was also shown that experimental n-octanol-water partition coefficients can be predicted on the basis of logP(Kowwin), AlogPs, and ClogP for fifteen drugs (adrenalin, clobazam, 5,5-dimethylbarbituric acid, ethyl nicotinate, fluphenazine, ibuprofen, methyllorazepam, pimozide, prednisolone, promethazine, spironolactone, surital, theophylline, triamterene, and trimethoprim).

... offers a quantitative three-dimensional description of lipophilicity. At a given point in space, the MLP value represents the results of the intermolecular interactions between all fragments and the solvent system at that point. Two components are necessary to calculate the MLP ...

A new strategy for the calculation of n-octanol/water partition coefficients is presented. Log P calculations of unknown chemicals are based on their closest structural analogues from a database of molecules with known experimental log P values. The contribution of the differing molecular parts is then estimated from a compilation of fragment contributions. Such a strategy is found to be superior to conventional group contribution methods and promises an overall enhancement of the prediction's accuracy.

The Solvation Free Energy Density (SFED) model, a solvation model proposed by No et al., was modified to give better solvation free energies for molecules with highly polarizable groups. The SFED at a point around the molecule was represented by a linear combination of four basis functions, the contribution from the cavitation free energy of the solvent, and a constant. As an application of the SFED model, the linear expansion coefficients of the Hydration Free Energy Density (HFED) and the 1-Octanol Free Energy Density (1-OFED) were determined. The calculated hydration free energies (HFE) and 1-octanol solvation free energies (1-OFE) of 95 selected organic molecules agreed well with experimental values. The standard errors were 0.47 and 0.39 kcal/mol, respectively. 1-Octanol/water partition coefficients (P) of the molecules were calculated from the difference between the HFE and the 1-OFE of each molecule. In addition, the logP density (LPD) of a molecule was represented by the same basis functional form as the SFED model. The logP of a molecule can then be obtained by integrating its LPD. The coefficients of the basis functions were determined through an optimization procedure using experimental logP values as constraints. The logP values calculated from the free energy difference and from the LPD both agreed well with the experimental data. The mean absolute errors were 0.34 and 0.32, respectively.
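The step from the two solvation free energies to the partition coefficient follows the standard thermodynamic relation log P = (ΔG_hyd − ΔG_oct) / (RT ln 10). The sketch below assumes that both free energies are in kcal/mol, refer to the same standard-state convention, and are evaluated at 298.15 K; the abstract does not specify these details, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def logp_from_free_energies(dg_hyd, dg_oct, temperature=298.15):
    """log10 of the 1-octanol/water partition coefficient.

    dg_hyd -- hydration free energy (gas -> water), kcal/mol
    dg_oct -- 1-octanol solvation free energy (gas -> octanol), kcal/mol
    A more negative dg_oct than dg_hyd gives log P > 0 (lipophilic solute).
    """
    return (dg_hyd - dg_oct) / (math.log(10) * R_KCAL * temperature)

# Hypothetical free energies, for illustration only
print(logp_from_free_energies(-5.0, -7.0))
```

At room temperature RT ln 10 is about 1.36 kcal/mol, so each log unit of P corresponds to roughly 1.36 kcal/mol of free energy difference. Standard errors of 0.47 and 0.39 kcal/mol in the individual free energies are therefore of the same order as the 0.34 log unit error reported for log P.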

Recent methodologies for the estimation of n-octanol/water partition coefficients and their use in the prediction of membrane transport properties of drugs
Klopman, G and Zhu, H
Mini Reviews in Medicinal Chemistry, 2005, 5(2), 127-133
PMID: 15720283

The lipophilicity of drug molecules (represented as the logarithm of the n-octanol/water partition coefficient) often correlates strongly with their pharmacological and toxic activities. It is therefore not surprising that there is considerable interest in developing mathematical models capable of accurately predicting its value for new drug candidates. In this review, current major approaches for estimating partition coefficients are described and some of their advantages and disadvantages are discussed. Recent uses of these partition coefficient algorithms in the development of membrane transport models are also discussed.

A New Group Contribution Approach to the Calculation of LogP
Zhu, Hao and Sedykh, Aleksander and Chakravarti, Suman K and Klopman, Gilles
Current Protein & Peptide Science, 2005, 1(1), 3-9
doi: 10.2174/1573409052952323

A new, improved group contribution model that predicts the n-octanol/water partition coefficient (logP) is described. A combined parameter set that contains 153 basic parameters, 41 extended parameters and 14 molecular surface/property descriptors was generated from a training database of 8320 chemicals. The model achieved significant improvement after the traditional group contribution equation was modified with a three-dimensional steric hindrance modulator. The predictive ability of this model was assessed by calculating the logP values of a test set of 1667 ordinary organic chemicals and a set of 137 drug-like chemicals that were not included in the training database.

Solvation models, based on fundamental chemical structure theory, were developed in the SPARC mechanistic tool box to predict a large array of physical properties of organic compounds in water and in non-aqueous solvents strictly from molecular structure. The SPARC self-interaction solvation models that describe the intermolecular interaction between like molecules (solute-solute or solvent-solvent) were extended to quantify solute-solvent interaction energy in order to estimate the activity coefficient in almost any solvent. Solvation models that include dispersion, induction, dipole-dipole and hydrogen bonding interactions are used to describe the intermolecular interaction upon placing an organic solute molecule in any single or mixed solvent system. In addition to estimation of the activity coefficient for 2674 organic compounds, these solvation models were validated on solubility and liquid/liquid distribution coefficient in more than 163 solvents including water. The RMS deviations of the calculated versus observed activity coefficients, solubilities and liquid/liquid distribution coefficients were 0.272 log mole fraction, 0.487 log mole fraction and 0.44 log units, respectively.

Evaluation of the ALOGPS, ACD Labs LogD, and PALLAS PrologD suites for calculating the log D distribution coefficient resulted in a high root-mean-squared error (RMSE) of 1.0-1.5 log units for two in-house Pfizer log D data sets of 17,861 and 640 compounds. Inaccuracy in log P prediction was the limiting factor for the overall log D estimation by these algorithms. The self-learning feature of ALOGPS (LIBRARY mode) remarkably improved the accuracy of log D prediction, and an RMSE of 0.64-0.65 was calculated for both data sets.

The ALOGPS 2.1 was developed to predict 1-octanol/water partition coefficients, logP, and aqueous solubility of neutral compounds. An exclusive feature of this program is its ability to incorporate new user-provided data by means of self-learning properties of Associative Neural Networks. Using this feature, it calculated a similar performance, RMSE

A novel method for fast and accurate evaluation of the generalized Born radii in macromolecular solvation electrostatics calculations is proposed, based on the solvent accessibility of the first two solvation layers around an atom. The reverse generalized Born radii calculated by the method have a correlation coefficient of 98.7% and an RMSD of 0.031 Å⁻¹ with the values obtained using a precise but significantly slower numerical boundary element solution. The method is applied to derive an estimate of the free solvation energy difference between octanol and water and to predict LogP octanol-water. A nine-parameter model is optimized on an 81 compound training set and applied to predict LogPow for an external evaluation set of 19 drug molecules with an RMSD of 0.9. The new GB approximation is also tested in Monte Carlo docking simulations of the fully flexible p53 peptide fragment to MDM2. The best energy solution found in the simulations has an RMSD of 2.8 Å to the X-ray structure.

An artificial neural network based approach using Atomic5 fragmental descriptors has been developed to predict the octanol-water partition coefficient (logP). We used a pre-selected set of organic molecules from PHYSPROP database as training and test sets for a feedforward neural network. Results demonstrate the superiority of our non-linear model over the traditional linear method.

Predictive models for octanol/water partition coefficient (logP), aqueous solubility (logS), blood-brain barrier (logBB), and human intestinal absorption (HIA) were built from a universal, generic molecular descriptor system, designed on the basis of atom type classification. The atom type classification tree was trained to optimize the logP predictions. With nine components, the final partial least-squares (PLS) model predicted logP of 10850 compounds in Starlist with a regression coefficient (r2) of 0.912, cross-validated r2 (q2) of 0.892, and root-mean-square error of estimation (RMSEE) of 0.50 log units. The PLS models for solubility (logS), blood-brain barrier (logBB), and a PLS-DA (discrimination analysis) model for HIA were established from the same atom type descriptors. The seven-component PLS model derived from a diverse set of 1478 organic compounds predicted a 21-compound test set designed by Yalkowsky with r2

The first conformer-specific experimental partition coefficients are presented for octanol/water, the most widespread solvent system to predict lipophilicity of drugs. Rotamer populations in octanol and water were elucidated from 1H NMR vicinal coupling constants and were combined with classical partition coefficients to obtain the conformer-specific ones. Feasibility of the determination of conformer-specific partition coefficients is exemplified on amphetamine and clenbuterol, two flexible drug molecules. Partition capacities of the amphetamine rotamers have been proven to be essentially equal. The conformers of clenbuterol, however, have been found to be greatly different in partition properties, which could be interpreted in terms of intramolecular interactions between the vicinal polar sites and the solvent-accessibility of the groups. The conformers could be put into order of their membrane-influx and -outflow propensities. Deviations between experimental and calculated log P values could also be interpreted in view of the species-specific partition coefficients.

We compared experimental and calculated logP values using a data set of 235 pesticides and experimental values from four different sources: The Pesticide Manual, Hansch Manual, ANPA and KowWin databases. LogP values were calculated with four software packages: HyperChem, Pallas, KowWin and TOPKAT. Cross-comparison of the experimental and calculated values proved useful, especially for pesticides, which are harder to study than simpler organic compounds. Structurally they are complex, heterogeneous and similar to drugs from a chemical point of view, so they offer an interesting way to verify the quality of the different methods. Other studies compared several logP predictors using a single set of experimental values taken as a reference. Here we discuss the utility of the different logP predictors with reference to experimental data found in different databases. This offers three advantages: (1) it avoids bias due to the assumption that one single data set is correct; (2) a given predictor can be developed on the same data set used for evaluation; (3) it takes account of experimental variability and can compare it with the predictor's variability. In our study Pallas and KowWin gave the best results for prediction, followed by TOPKAT.

QSPR models for physicochemical properties of polychlorinated diphenyl ethers.
Yang, Ping and Chen, Jingwen and Chen, Shuo and Yuan, Xing and Schramm, K-W and Kettrup, A
The Science of the total environment, 2003, 305(1-3), 65-76
PMID: 12670758
doi: 10.1016/S0048-9697(02)00467-9

Partial least squares regression together with 17 theoretical molecular structural descriptors was successfully used to develop QSPR models for sub-cooled liquid vapor pressures (P(L)), n-octanol/water partition coefficients (K(OW)) and sub-cooled liquid water solubilities (S(W,L)) of polychlorinated diphenyl ethers (PCDEs). Only a few theoretical molecular descriptors were included in the QSPR models, including average molecular polarizability, molecular weight, total energy and standard heat of formation, which implies that intermolecular dispersive forces play an important role in governing the magnitude of P(L) and K(OW). Testing on a validation set showed the models to be acceptable for prediction of P(L) and K(OW). The consistency between observed and predicted values is best for P(L), followed by K(OW) and S(W,L). The Q(2)(cum) values of the PLS models obtained are higher than 0.95, indicating high robustness of the models. Since P(L), K(OW) and S(W,L) values for many PCDE congeners are not available, the developed models can be used for estimation.

Several models have been published for calculating blood-air, tissue-air, or tissue-blood partition coefficients of volatile organic chemicals in human or rat tissues, from functions of their octanol-water partition coefficients or solubilities in vegetable oil and water. In this work, the relative accuracy, strengths, and limitations of the various models are examined. Comparison of predicted human tissue-air and tissue-blood partition coefficients with experimental values has been made for 12 chemicals, covering a wide range of lipophilicity (acetone, isopropanol, diethylether, methylene dichloride, benzene, toluene, trichloroethylene, trichloroethane, n-pentane, cyclohexane, n-hexane, and n-heptane). Seven published models for human tissue-air and 10 models for tissue-blood partition coefficients have been compared. Fewer models are available for predicting rat tissue-air and rat tissue-blood partition coefficients, but a similar comparison has been made. The ratio of predicted to experimental partition coefficients and their mean, R(mean), and the mean magnitude of the difference between predicted and experimental values of log(10) P, E, were used to assess the accuracy of each model. For the test set the most accurate for human blood-air partition coefficients were the empirical equations of Meulenberg and Vijverberg (R(mean)

Correct QSAR analysis requires reliable measured or calculated logP values, since logP is the most frequently used and most important physicochemical parameter in such studies. Since the publication of the theoretical fundamentals of logP prediction, many commercial software solutions have become available. These programs are all based on experimental data from very large databases, so the predicted logP values are mostly acceptable, especially for known structures and their derivatives. In this study we critically reviewed the published methods and compared the predictive power of commercial software packages (CLOGP, KOWWIN, SciLogP / ULTRA) to each other and to our recently developed automatic QS(P)AR program. We selected a very diverse set of 625 known drugs (98%) and drug-like molecules with experimentally validated logP values. We also collected 78 reported ``outliers'', which could not be predicted by the ``traditional'' methods. We used these data in model building and validation. Finally, we used an external validation set of compounds missing from public databases. We emphasize the importance of data quality, descriptor calculation and selection, and present a general, reliable descriptor selection and validation technique for studies of this kind. Our method is based on the strictest mathematical and statistical rules and is fully automatic; after the initial settings there is no option for user intervention. Three approaches were applied: multiple linear regression, partial least squares analysis and artificial neural network. LogP predictions with a multiple linear regression model showed acceptable accuracy for new compounds; the model can therefore be used for ``in-silico-screening'' and / or planning virtual / combinatorial libraries.

This article provides a systematic study of several important parameters of the Associative Neural Network (ASNN), such as the number of networks in the ensemble, distance measures, neighbor functions, selection of smoothing parameters, and strategies for the user-training feature of the algorithm. The performance of the different methods is assessed with several training/test sets used to predict the lipophilicity of chemical compounds. The Spearman rank-order correlation coefficient and Parzen-window regression methods provide the best performance of the algorithm. If additional user data are available, the prediction of lipophilicity can be improved 2- to 5-fold when appropriate smoothing parameters for the neural network are selected. The best combinations of parameters and strategies found are implemented in the ALOGPS 2.1 program, which is publicly available at http://www.vcclab.org/lab/alogps.

We here present the program VEGA, which was developed to create a bridge between the most popular molecular software packages. The tool implements features to analyze, display and manage the three-dimensional (3D) structure of molecules. The most important features are (1) file format conversion (with assignment of the atom types and atomic charges), (2) surface calculation and (3) trajectory analysis. The executable and the source code can be freely downloaded from http://users.unimi.it/∼ddl.

A new approach for predicting the lipophilicity (log P), solubility (log Sw), and oral absorption of drugs in humans (FA) is described. It is based on structural and physicochemical similarity and is realized in the software program SLIPPER-2001. Calculated and experimental values of log P, log Sw, and FA for 42 drugs were used to demonstrate the predictive power of the program. Reliable results were obtained for simple compounds, for complex chemicals, and for drugs. Thus, the principle of "similar compounds display similar properties", together with estimating incremental changes in properties by using differences in physicochemical parameters, results in "structure-property" predictive models even in the absence of a precise understanding of the mechanisms involved.

Novel methods for predicting logP, pKa, and logD values have been developed using data sets (592 molecules for logP and 1029 for pKa) containing a wide range of molecular structures. An equation with three molecular properties (polarizability and partial atomic charges on nitrogen and oxygen) correlates highly with logP (r2 = 0.89). The pKa values are estimated for both acids and bases using a novel tree-structured fingerprint describing the ionizing centers. The new models have been compared with existing models and also with experimental measurements on test sets of common organic compounds and pharmaceutical molecules.

The molecular weight and electrotopological E-state indices were used to estimate by Artificial Neural Networks aqueous solubility for a diverse set of 1291 organic compounds. The neural network with 33-4-1 neurons provided highly predictive results with r(2)

Lipophilicity is a major determinant of pharmacokinetic and pharmacodynamic properties of drug molecules. Correspondingly, there is great interest in medicinal chemistry in developing methods of deriving the quantitative descriptor of lipophilicity, the partition coefficient P, from molecular structure. Roughly, methods for calculating log P can be divided into two major classes: Substructure approaches have in common that molecules are cut into atoms (atom contribution methods) or groups (fragmental methods); summing the single-atom or fragmental contributions (supplemented by applying correction rules in the latter case) results in the final log P. Whole molecule approaches inspect the entire molecule; they use for instance molecular lipophilicity potentials (MLP), topological indices or molecular properties to quantify log P. In this review, representative members of substructure and whole molecule approaches for calculating log P are described; their advantages and shortcomings are discussed. Finally, the predictive power of some calculation methods is compared and a scheme for classifying calculation methods is proposed.

QSARs based on molecular polarizability (α) and H-bond acceptor factors as independent variables provided good predictability of octanol/water partition coefficients (P) for chemicals and drugs. However, for some molecules containing few functional groups, the calculated values deviated significantly from those observed. This approach gave good results when applied to a set of 138 chemicals and drugs previously studied by Mannhold and Dross, who compared other methods to calculate log P values.
At the same time, three variations on a molecular similarity approach were pursued. In this study, a large training set with experimentally determined octanol/water partition coefficients (P) was searched for structures closely related to the compound-of-interest. The most successful of these variations took the mean log P value of the few most closely related compounds after each was adjusted for differences between their and the compound-of-interest's polarizabilities (α) and H-bond acceptor capacities.

A new method, ALOGPS v 2.0 (http://www.lnh.unil.ch/∼itetko/logp/), for the assessment of the n-octanol/water partition coefficient, log P, was developed on the basis of neural network ensemble analysis of 12 908 organic compounds available from the PHYSPROP database of Syracuse Research Corporation. The atom and bond-type E-state indices as well as the number of hydrogen and non-hydrogen atoms were used to represent the molecular structures. A preliminary selection of indices was performed by multiple linear regression analysis, and 75 input parameters were chosen. Some of the parameters combined several atom-type or bond-type indices with similar physicochemical properties. The neural network ensemble training was performed by an efficient partition algorithm developed by the authors. The ensemble contained 50 neural networks, and each neural network had 10 neurons in one hidden layer. The prediction ability of the developed approach was estimated using both the leave-one-out (LOO) technique and a training/test protocol....

2000

This study describes the development of the ACD/Log P calculation method. Analysis of 14 calculation methods revealed that the most accurate calculations are obtained when correction factors are used. We evaluated the correction factors used by Hansch and Leo in CLOGP in order to simplify their method. Most of the CLOGP structural factors are included in our fragmental increments. Aliphatic and aromatic factors are replaced with additive interfragmental increments. Missing increments are estimated by two empirical equations with simple physical interpretation. The final method uses three simple equations with several types of parameters. The training set included 3601 compounds and the correlation between experimental and calculated Log P values gave R

Prediction of Properties from Simulations: Free Energies of Solvation in Hexadecane, Octanol, and Water
Duffy, Erin M and Jorgensen, William L
Journal of the American Chemical Society, 2000, 122(12), 2878-2888
doi: 10.1021/ja993663t

Monte Carlo (MC) statistical mechanics simulations have been carried out for more than 200 organic solutes, including 125 drugs and related heterocycles, in aqueous solution. The calculations were highly automated and used the OPLS-AA force field augmented with CM1P partial charges. Configurationally averaged results were obtained for a variety of physically significant quantities including the solute−water Coulomb and Lennard-Jones interaction energies, solvent-accessible surface area (SASA), and numbers of donor and acceptor hydrogen bonds. Correlations were then obtained between these descriptors and gas to liquid free energies of solvation in hexadecane, octanol, and water and octanol/water partition coefficients. Linear regressions with three or four descriptors yielded fits with correlation coefficients, r2, of 0.9 in all cases. The regression equation for log P(octanol/water) only needs four descriptors to provide an rms error of 0.55 for 200 diverse compounds, which is competitive with the best fr...

Hydrophobicity: is LogP(o/w) more than the sum of its parts?
Eugene Kellogg, G and Abraham, D J
European journal of medicinal chemistry, 2000, 35(7-8), 651-661
PMID: 10960181

The empirically calculated parameter LogP(o/w), the log(10) of the coefficient for solvent partitioning between 1-octanol and water, has been used to provide the key data for a unique non-covalent interaction force field called HINT (Hydropathic INTeractions). This experimentally-derived force field encodes entropic as well as enthalpic information and also includes some representation of solvation and desolvation energetics in biomolecular associations. The theoretical basis for the HINT model is discussed. This review includes: 1) discussion of calculational representation of the hydrophobic effect, 2) the rationale for describing the experimental LogP(o/w) based descriptors used in the HINT force field and model as free energy-like, 3) the relationship between hydrophobic fragment constants and partial group electrostatic charge, and 4) the implications of structurally-conserved water molecules on free energy of molecular association. Several recent applications of HINT in structure-based and ligand-based drug discovery are reviewed. Finally, future directions in the HINT model development are proposed.

A method for predicting log P values for a diverse set of 1870 organic molecules has been developed based on atom-type electrotopological-state (E-state) indices and neural network modeling. An extended set of E-state indices, which included specific indices with a more detailed description of amino, carbonyl, and hydroxy groups, was used in the current study. For the training set of 1754 molecules the squared correlation coefficient and root-mean-squared error were r2

QSPR models for logP and vapor pressures of organic compounds based on neural net interpretation of descriptors derived from quantum mechanical (semiempirical MO; AM1) calculations are presented. The models are cross-validated by dividing the compound set into several equal portions and training several individual multilayer feedforward neural nets (trained by the back-propagation of errors algorithm), each with a different portion as test set. The results of these nets are combined to give a mean predicted property value and a standard deviation. The performance of two models, for logP and the vapor pressure at room temperature, is analyzed, and the reliability of the predictions is tested.

We present an RP-HPLC method for the determination of logPoct values for neutral drugs, which combines ease of operation with high accuracy and which has been shown to work for a set of 36 molecules comprised largely of drugs. The general features of the method are as follows: (i) compound sparing (≤1 mL of a 30-50 $\mu$g/mL solution needed), (ii) rapid determinations (20 min on average), (iii) low sensitivity to impurities, (iv) wide lipophilicity range (6 logPoct units), (v) good accuracy, (vi) excellent reproducibility. A linear free energy relationship (LFER) analysis, based on solvation parameters, shows that the method encodes the same information obtained from a shake-flask logPoct determination. To the best of our knowledge a similar performance, on a set of noncongeneric drugs, has not been previously reported. We refer to the value generated via this method as ElogPoct.

A new atom-additive method is presented for calculating octanol/water partition coefficient (log P) of organic compounds. The method, XLOGP v2.0, gives log P values by summing the contributions of component atoms and correction factors. Altogether 90 atom types are used to classify carbon, nitrogen, oxygen, sulfur, phosphorus and halogen atoms, and 10 correction factors are used for some special substructures. The contributions of each atom type and correction factor are derived by multivariate regression analysis of 1853 organic compounds with known experimental log P values. The correlation coefficient (r) for fitting the whole set is 0.973 and the standard deviation (s) is 0.349 log units. Comparison of various log P calculation procedures demonstrates that our method gives much better results than other atom-additive approaches and is at least comparable to fragmental approaches. Because of the simple methodology, the `missing fragment' problem does not occur in our method.
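The atom-additive scheme described above amounts to a weighted sum over atom types plus a weighted sum over correction factors. The following sketch shows that arithmetic only. The type names and contribution values are hypothetical placeholders, not the fitted XLOGP v2.0 parameters, and the atom typing of a real structure is assumed to have been done elsewhere.

```python
def additive_logp(atom_counts, atom_contrib,
                  correction_counts=None, correction_contrib=None):
    """Sum atom-type contributions and optional correction factors.

    atom_counts        -- dict: atom type -> number of occurrences
    atom_contrib       -- dict: atom type -> contribution to log P
    correction_counts  -- dict: correction factor -> number of occurrences
    correction_contrib -- dict: correction factor -> contribution to log P
    """
    total = sum(atom_contrib[a] * n for a, n in atom_counts.items())
    if correction_counts:
        total += sum(correction_contrib[c] * n
                     for c, n in correction_counts.items())
    return total

# Hypothetical parameters, for illustration only
atom_contrib = {"C.ar": 0.3, "O.hydroxyl": -0.4}
correction_contrib = {"intramolecular_hbond": 0.6}
molecule = {"C.ar": 6, "O.hydroxyl": 1}
print(additive_logp(molecule, atom_contrib))
print(additive_logp(molecule, atom_contrib,
                    {"intramolecular_hbond": 1}, correction_contrib))
```

Since every atom in a molecule is assigned one of the atom types, the sum is always defined, which is the sense in which the `missing fragment' problem does not arise. In this sketch an unknown type would raise a `KeyError` rather than silently contributing zero.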

An atom/fragment contribution method that predicts log P is described. Coefficient values for 150 atom/fragments and 250 correction factors have been derived from a training set of 2473 compounds. When applied to an independent validation set of 10 589 compounds, the method estimates log P with excellent accuracy (correlation coefficient r2 of 0.943, standard deviation of 0.473, and absolute mean error of 0.354). A method that predicts water solubility from log P is also described.

Calculating log P (oct) with no missing fragments; The problem of estimating new interaction parameters
Leo, A J and Hoekman, D
Perspectives in drug discovery and design, 2000, 18(1), 19-38

The solvation forces which determine the equilibrium of a solute between water and a non-polar solvent, such as octanol, cannot be assigned on an atom-by-atom basis in the solute structure. The program CLOGP defines the hydrophobic hydrocarbon portions of any structure in such a way that the remaining polar fragments are unambiguously defined and of a manageable size. Early versions required that each polar fragment thus defined be present in a measured solute before it could be used in calculations of log P(oct), but in versions 4.0 or greater, these can be calculated ab initio; i.e. `from scratch'. An equally important step in calculating log P(oct) for solutes with unmeasured fragments is estimating their propensity for electronic, steric, and/or hydrophobic interactions with other polar fragments which may also be present. The combined error of estimation of a new fragment value and its interaction with others appears to be less than $\pm$ 0.5.

The aim of this study was to determine the efficacy of atom-type electrotopological state indices for estimation of the octanol-water partition coefficient (log P) values in a set of 345 drug compounds or related complex chemical structures. Multilinear regression analysis and artificial neural networks were used to construct models based on molecular weights and atom-type electrotopological state indices. Both multilinear regression and artificial neural networks provide reliable log P estimations. For the same set of parameters, application of neural networks provided better prediction ability for training and test sets. The present study indicates that atom-type electrotopological state indices offer valuable parameters for fast evaluation of octanol-water partition coefficients that can be applied to screen large databases of chemical compounds, such as combinatorial libraries.

In an earlier study (Quant. Struct.-Act. Relat. 1996, 15, 403-409) comparing the performance of 14 logP predictors, it was concluded that predictions of logP values were significantly better for simple organic molecules than for drugs. Since the publication of this benchmark study, a logP predictor, VLOGP, has been developed in our group. In the work presented here, VLOGP is used to assess the logP values of the same 48 drugs as included in the benchmark comparison. VLOGP returned 79.2% "acceptable", 18.6% "disputable", and only 2.2% "unacceptable" logP values. The "acceptable", "disputable", and "unacceptable" logP values from the 14 other predictors, respectively, ranged between 27.1% and 72.9%, 16.7% and 41.7%, and 2.2% and 37.5%. Further, VLOGP resulted in a much tighter fit (mean squared deviation, m.s.d.,

A new addition method is described in this study for calculating the partition coefficients of peptides. LogP and logD values of peptides are calculated by summing the contributions of the component amino acids. The final models are derived from a multivariate linear regression analysis of 219 peptides with known experimental data. The standard errors in a leave-one-out cross-validation are 0.23 and 0.24 log units for the logP and logD values, respectively. The predictive ability of the model is tested by an extra set of ten peptides, and the self-consistency of the model is further demonstrated by a new validation procedure called the evolution test. The parameters obtained in regression could be used as hydrophobicity scales for amino acids. The application of such hydrophobicity scales has also been discussed.
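
Leave-one-out cross-validation of an additive regression model, as mentioned in this abstract, can be sketched as follows. The details are assumptions made for illustration: each peptide is represented by a vector of residue counts, the model has no intercept, the fit uses the normal equations, and the "standard error" is taken as the root-mean-square of the left-out prediction errors. The abstract does not state which definitions the authors used, and the function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: descriptors are not independent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(rows, y):
    """Least-squares coefficients from the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def loo_standard_error(rows, y):
    """Refit with each sample left out in turn; return the RMS prediction error."""
    errors = []
    for i in range(len(rows)):
        w = fit(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        pred = sum(w_j * x_j for w_j, x_j in zip(w, rows[i]))
        errors.append(pred - y[i])
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Here `rows` is a list of residue-count lists and `y` the list of measured values; both are expected to be plain Python lists so that slicing and concatenation work as written.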

Octanol-water partition coefficients are frequently used in quantitative structure-activity relationships. A correlation based on computed theoretical descriptors is presented for the prediction of octanol-water partition coefficients (Pow). An ab initio SCF approach was used to compute the molecular descriptors at the HF/6-31G* level. It was shown that only three theoretical parameters representing a cavity term, a dipolarity/polarisability term and a hydrogen bonding term were needed for the correlation. The corresponding parameters were deduced from the molecular surface area, the surface electrostatic potential and the spatial minima of the electrostatic potential, respectively. The predictive power of log Pow was demonstrated on a number of molecules which have biological activity.

Prediction of the n-Octanol/Water Partition Coefficient, logP, Using a Combination of Semiempirical MO-Calculations and a Neural Network
Breindl, Andreas and Beck, Bernd and Clark, Timothy and Glen, Robert C
Journal of Molecular Modeling, 1997, 3(3), 142-155
doi: 10.1007/s008940050027

A back-propagation artificial neural net has been trained to estimate logP values of a large range of organic molecules from the results of AM1 and PM3 semiempirical MO calculations. The input descriptors include molecular properties such as electrostatic potentials, total dipole moments, mean polarizabilities, surfaces, volumes and charges derived from semiempirical calculated gas phase geometries. These properties can be related to the molecule's solubility in hydrophilic or lipophilic media. The input descriptors were selected with the help of a multiple linear regression analysis. The resulting net estimates the logP values of 105 organic compounds with a standard deviation of 0.53 units from the experimental logP values for AM1 and 0.67 units in the case of PM3.

A new method is presented for the calculation of partition coefficients of solutes in octanol/water. Our algorithm, XLOGP, is based on the summation of atomic contributions and includes correction factors for some intramolecular interactions. Using this method, we calculate the log P of 1831 organic compounds and analyze the derived parameters by multivariate regression to generate the final model. The correlation coefficient for fitting this training database is 0.968, and the standard deviation is 0.37. The result shows that our method for log P estimation is applicable to quantitative structure-activity relationship studies and gives better results than other more complicated atom-additive methods.

Octanol-water partition coefficients are predicted by building local models in a database. The structure space is spanned by uniform-length molecular descriptors derived from connectivity or three-dimensional structure, and from the occurrence of selected substructures. Individual models are derived for each structure cluster. In this contribution, various structure representations, clustering algorithms and mathematical models are investigated and the most appropriate procedure is selected.

1995

Atom/fragment contribution values, used to estimate the log octanol-water partition coefficient (log P) of organic compounds, have been determined for 130 simple chemical substructures by a multiple linear regression of 1120 compounds with measured log P values. An additional 1231 compounds were used to determine 235 "correction factors" for various substructure orientations. The log P of a compound is estimated by simply summing all atom/fragment contribution values and correction factors occurring in a chemical structure. For the 2351 compound training set, the correlation coefficient (r2) for the estimated vs measured log P values is 0.98 with a standard deviation (SD) of 0.22 and an absolute mean error (ME) of 0.16 log units. This atom/fragment contribution (AFC) method was then tested on a separate validation set of 6055 measured log P values that were not used to derive the methodology and yielded an r2 of 0.943, an SD of 0.408, and an ME of 0.31. The method is able to predict log P within +/- 0.8 log units for over 96% of the experimental dataset of 8406 compounds. Because of the simple atom/fragment methodology, "missing fragments" (a problem encountered in other methods) do not occur in the AFC method. Statistically, it is superior to other comprehensive estimation methods.
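
The fit statistics quoted in this abstract (r2, SD, ME, and the share of compounds predicted within a tolerance) can be computed along the following lines. The definitions are assumptions, since the abstract does not spell them out: r2 is taken as the squared Pearson correlation, SD as the root of the summed squared residuals divided by n - 1, and ME as the mean absolute residual. The function names are hypothetical.

```python
import math


def fit_statistics(measured, estimated):
    """Return (r2, sd, me) for estimated vs measured log P values."""
    if len(measured) != len(estimated) or len(measured) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    cov = sum((m - mean_m) * (e - mean_e) for m, e in zip(measured, estimated))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    # Squared Pearson correlation; undefined (division by zero) if either
    # sequence is constant.
    r2 = cov ** 2 / (var_m * var_e)
    residuals = [e - m for m, e in zip(measured, estimated)]
    # Assumed definition: n - 1 degrees of freedom.
    sd = math.sqrt(sum(r ** 2 for r in residuals) / (n - 1))
    me = sum(abs(r) for r in residuals) / n  # absolute mean error
    return r2, sd, me


def fraction_within(measured, estimated, tol=0.8):
    """Fraction of compounds whose estimate lies within +/- tol log units."""
    hits = sum(1 for m, e in zip(measured, estimated) if abs(e - m) <= tol)
    return hits / len(measured)
```

The default `tol=0.8` mirrors the +/- 0.8 log unit window quoted in the abstract; other authors may use a different divisor for SD, so values computed this way are only comparable when the same convention is applied throughout.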

Group contribution methods to estimate water solubility have been studied on the basis of a test set of 694 organic nonelectrolytes, consisting of 351 liquids and 343 solids with experimental data taken from literature after critical evaluation. Derivation of a new fragmentation scheme leads to a squared correlation coefficient of 0.95 and an average absolute calculation error of 0.38 log units, which is superior to other group contribution methods currently available. Differences in performance for individual classes of compounds are discussed in detail. Solubility prediction for liquids is generally better than for solids, and the results support the inclusion of a melting point term to account for the entropy of fusion of solids.

1994

SmilogP: A Program for a Fast Evaluation of Theoretical Log P from the Smiles Code of a Molecule
Convard, Thierry and Dubost, Jean-Pierre and Le Solleu, H and Kummer, E
Qsar & Combinatorial Science, 1994, 13(1), 34-37
doi: 10.1002/qsar.19940130107

We present software that generates an extended connectivity matrix from the SMILES code of a molecule. This extended connectivity matrix allows the determination of the atomic code for an atomic fragment and then the attribution of its lipophilicity contribution fi. Log P can then be computed easily by summing the fi values. This program, which runs on IBM PC or compatible systems, can be used by chemists or pharmacochemists interested in the fast evaluation of the lipophilicity of a series of molecules.

1990

A recent concept connecting the lipophilicity of organic chemicals with their genotoxicity on a chromosomal level implies that the lipophilic character of organic chemicals determines a certain background of chromosomal genotoxicity that can be addressed as "non-specific". This is opposed to compounds with more "specific" modes of action. Such mechanisms influence the processes of karyokinesis and cytokinesis. A critical partial process for the chromosomal segregation is the dynamics of assembly and disassembly of microtubules. To broaden the present database for such interactions, chemicals were selected based on their lipophilicity (log P between -1.5 and +1.0) and on hints from the literature pointing to possibilities of interaction with the tubulin-microtubule system. Thus, acetamide, acrylamide, methylmethane sulfonate, acetonitrile, acrylonitrile and cyclohexanone were assessed as to their potencies to influence the dynamic processes of microtubule assembly and disassembly in a cell-free system in vitro. These compounds covered a range of log P between -1.5 and 1.0, complementary to compounds investigated earlier. The entire body of data supports the general concept that hydrophobic interactions are connected with non-specific processes, which contribute to a background genotoxicity on a chromosomal level. It also points to the dynamics of microtubule assembly and disassembly as a decisive partial process involved.

In an earlier article (ref 8) the need was demonstrated for atomic physicochemical properties for three-dimensional structure-directed quantitative structure-activity relationships, and it was shown how atomic parameters can be developed for successfully evaluating the molecular octanol-water partition coefficient, which is a measure of hydrophobicity. In this work we report more refined atomic values of octanol-water partition coefficients derived from nearly twice the number of compounds. Carbon, hydrogen, oxygen, nitrogen, sulfur and halogens are divided into 110 atom types, of which 94 atomic values are evaluated from 830 molecules by least squares. These values gave a standard deviation of 0.470 and a correlation coefficient of 0.931. These parameters predicted the octanol-water partition coefficient of 125 compounds with a standard deviation of 0.520 and a correlation coefficient of 0.870. There is only a correlation coefficient of 0.432 between the atomic octanol-water partition coefficients and the atomic contributions to molar refractivity over the 93 atom types used for both the properties. This suggests that both parameters can be used simultaneously to model intermolecular interactions. We evaluated the CNDO/2 gross atomic charge distribution over several molecules to check the validity of our classification. We found that the charge density on the heteroatoms in conjugated systems is strongly affected by the presence of similar atoms in the conjugation, which suggests it should be incorporated as a separate parameter in evaluating the partition coefficient.