Abstract:

A key problem in QSAR is the selection of appropriate descriptors to form accurate regression equations for the compounds under study. Inductive logic programming (ILP) algorithms are a class of machine-learning algorithms that have been successfully applied to a number of SAR problems. Unlike other QSAR methods, which use attributes to describe chemical structure, ILP uses relations. This gives ILP the advantages of not requiring explicit superimposition of individual compounds in a dataset, of dealing naturally with multiple conformations, and of using a language much closer to that normally used by chemists. We unify ILP and standard regression techniques to give a QSAR method that combines the strength of ILP at describing steric structure with the familiarity and power of regression methods. Complex pharmacophores that correlate with activity were identified and used as new indicator variables, along with the comparative molecular field analysis (CoMFA) prediction, to form predictive regression equations. We compared the formation of 3D-QSARs using standard CoMFA with the use of ILP on the well-studied thermolysin zinc protease inhibitor dataset and a glycogen phosphorylase (GP) inhibitor dataset. In each case the addition of ILP variables produced statistically better results (P < 0.01 for the thermolysin and P < 0.05 for the GP dataset) than CoMFA analysis alone. Moreover, the new ILP variables were not found to increase the complexity of the final QSAR equations and gave possible insight into the binding mechanism of the ligand-protein complex under study.
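
The idea of adding ILP-derived indicator variables to a regression on the CoMFA prediction can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the helper `fit_ols`, the variable names, and the toy numbers are hypothetical and are not taken from the thermolysin or GP datasets, and ordinary least squares stands in for whatever regression procedure the study actually used.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares solution
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot


# Hypothetical toy values (NOT real data): one row per compound.
comfa_pred = np.array([5.1, 6.0, 4.2, 7.3, 5.8, 6.6, 4.9, 7.0])  # CoMFA-predicted activity
has_pharmacophore = np.array([0, 1, 0, 1, 0, 1, 0, 1])           # 1 if an ILP-found pharmacophore matches
activity = np.array([5.0, 6.6, 4.1, 7.9, 5.9, 7.1, 4.8, 7.7])    # observed activity

# Baseline: regress activity on the CoMFA prediction alone.
_, r2_comfa = fit_ols(comfa_pred.reshape(-1, 1), activity)

# Combined: CoMFA prediction plus the ILP indicator variable.
X_combined = np.column_stack([comfa_pred, has_pharmacophore])
coef_ilp, r2_combined = fit_ols(X_combined, activity)

print(f"R^2 CoMFA only: {r2_comfa:.3f}")
print(f"R^2 CoMFA + ILP indicator: {r2_combined:.3f}")
```

Because the two models are nested, the in-sample R^2 of the combined model can never be lower than that of the baseline, so this comparison alone does not show a real improvement. A claim of statistical significance such as the one above would need a proper test, for example on cross-validated predictions.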