COMP 273

Training pKa and logP prediction

pKa and logP prediction methods are parameterized on training sets that cover only a limited number of molecule types, so their accuracy is not always satisfactory. In practice, in most cases only structure types that were present in the training set are predicted correctly. We decided to develop a training method for the pKa and logP calculations that allows users to build models relevant to their own structures.

Acidic and basic ionization centers are identified as defined in our default pKa prediction module. The logP prediction model implements 120 predefined atom types. The learning algorithm is a linear regression method based on Singular Value Decomposition (SVD). The training set, a collection of experimental pKa or logP values, should be provided by the user. The collected data should be imported as an SDF or MRV file, which can be compiled, for example, using Instant JChem.

The training algorithm of pKa prediction creates a correction library containing correction values for interacting functional groups. In the case of logP prediction, a full set of atomic contributions is calculated.
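
The logP case can be illustrated with a minimal sketch: if each molecule is described by how many atoms of each type it contains, a linear least-squares fit solved through SVD yields one contribution per atom type. This is not the actual implementation. The function names, the three hypothetical atom types, and the synthetic logP values below are all assumptions made for illustration, and NumPy is used as a stand-in for whatever numerical library the real software relies on.

```python
import numpy as np


def fit_contributions(counts, logp, rcond=1e-10):
    """Fit per-atom-type contributions by least squares, solved via SVD.

    counts: (n_molecules, n_atom_types) matrix of atom-type counts (hypothetical input).
    logp:   (n_molecules,) vector of logP values for the training set.
    """
    A = np.asarray(counts, dtype=float)
    y = np.asarray(logp, dtype=float)
    # Thin SVD: A = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Invert only singular values above a relative tolerance, so that
    # (nearly) redundant atom types do not blow up the solution.
    keep = s > rcond * s.max()
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    # Pseudo-inverse solution: x = V @ diag(1/s) @ U^T @ y
    return Vt.T @ (s_inv * (U.T @ y))


def predict(counts, contributions):
    """Predicted logP is the count-weighted sum of the atomic contributions."""
    return np.asarray(counts, dtype=float) @ contributions


# Synthetic example, NOT experimental data: 4 molecules, 3 hypothetical atom types.
counts = np.array([
    [2, 0, 1],
    [1, 1, 0],
    [0, 2, 1],
    [3, 1, 2],
])
true_contrib = np.array([0.5, -0.3, 0.1])   # made-up "ground truth"
logp_values = counts @ true_contrib          # noise-free synthetic targets

fitted = fit_contributions(counts, logp_values)
print(fitted)
```

Because the synthetic targets are noise-free and the count matrix has full column rank, the fit recovers the made-up contributions. With real experimental data the fit would only approximate the measured values, and the tolerance on small singular values guards against atom types that are rare or linearly dependent in the training set.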