The PLSR statistical analysis module
constructs models and predicts activity/property values using the Partial
Least Squares (PLS) regression technique [1-3]. It is based on a linear transformation
from a large number of original descriptors to a small number of orthogonal
factors (latent variables) that provide the optimal linear model in terms of
predictivity (characterized by the Q2 value). A more detailed explanation of the method and algorithms is available.
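
As an illustration only, the sketch below shows one common way to build a single-response PLS model (the NIPALS algorithm) and to estimate Q2 by leave-one-out cross-validation, taking Q2 = 1 - PRESS/SS, where PRESS is the sum of squared cross-validated prediction errors and SS is the sum of squared deviations of the observed values from their mean. The function names, the toy data, and the choice of leave-one-out validation are assumptions made for this example; the module itself may use a different algorithm or validation scheme.

```python
from math import sqrt


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm (sketch)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Work on mean-centred copies; both are deflated after each factor.
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]
    factors = []
    for _ in range(n_components):
        # Weights: direction of maximal covariance with the residual response.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflation makes the next factor orthogonal to the previous ones.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, load, q))
    return x_mean, y_mean, factors


def pls1_predict(model, x):
    """Predict the response for one descriptor vector x."""
    x_mean, y_mean, factors = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, load, q in factors:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * load[j] for j in range(len(e))]
    return y_hat


def q2_leave_one_out(X, y, n_components):
    """Q2 = 1 - PRESS/SS, with each compound predicted by a model built without it."""
    y_mean = sum(y) / len(y)
    press = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        press += (y[i] - pls1_predict(model, X[i])) ** 2
    ss = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss


# Toy data (invented for this example): y = 2*x1 - x2 + 1 exactly.
X_toy = [[0, 1], [1, 0], [2, 3], [3, 1], [4, 4], [5, 2]]
y_toy = [0, 3, 2, 6, 5, 9]
print(q2_leave_one_out(X_toy, y_toy, n_components=2))
```

Because the toy response is an exact linear function of the two descriptors, two factors reproduce it and Q2 is very close to 1. With real data, the number of factors would typically be chosen to maximize Q2.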

It is well known that Partial Least
Squares (PLS) regression is quite sensitive to the noise introduced by an excess of
irrelevant descriptors. To achieve the best model quality, a two-step descriptor
selection procedure [4] is applied. The first step eliminates the
low-variance (almost constant) descriptors that differ from a constant value
for only a few (2-3) compounds in the training set. Such descriptors cannot
provide useful statistical information and merely help to fit these particular
compounds, thus decreasing the predictivity. In the second step, the descriptor
subset is optimized using Q2-guided descriptor selection by
means of a genetic algorithm. Despite the stochastic nature of this technique,
computational experiments demonstrate reasonable stability of the results.
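
The two selection steps described above can be sketched roughly as follows. This is a minimal illustration, not the procedure of [4]: the function names, the threshold max_deviating, and all genetic algorithm settings (population size, tournament selection, uniform crossover, mutation rate, elitism) are assumptions chosen for the example. The fitness argument stands in for the Q2 value of a PLS model built on the selected descriptors; a toy function is used here so that the example is self-contained.

```python
import random
from collections import Counter


def drop_near_constant(X, max_deviating=3):
    """Step 1: return indices of descriptors that differ from their most
    common value for more than `max_deviating` compounds (assumed threshold)."""
    keep = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        most_common_count = Counter(column).most_common(1)[0][1]
        if len(column) - most_common_count > max_deviating:
            keep.append(j)
    return keep


def ga_select(n_descriptors, fitness, pop_size=20, generations=30,
              mutation_rate=0.1, seed=0):
    """Step 2: search for a descriptor mask that maximizes `fitness`.

    Returns (best_mask, best_score) over all evaluated generations.
    """
    rng = random.Random(seed)  # fixed seed makes a run reproducible

    def ensure_nonempty(mask):
        if not any(mask):
            mask[rng.randrange(n_descriptors)] = True
        return mask

    def tournament(scored):
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [ensure_nonempty([rng.random() < 0.5
                                   for _ in range(n_descriptors)])
                  for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scored = [(fitness(m), m) for m in population]
        gen_best = max(scored, key=lambda sm: sm[0])
        if best is None or gen_best[0] > best[0]:
            best = gen_best
        children = [list(best[1])]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]  # bit-flip mutation
            children.append(ensure_nonempty(child))
        population = children
    return best[1], best[0]


def toy_fitness(mask):
    """Stand-in for Q2: rewards descriptors 0 and 2, penalizes subset size."""
    return sum(mask[j] for j in (0, 2)) - 0.1 * sum(mask)


# Toy descriptor table (invented): columns 0 and 1 are (almost) constant.
X_toy = [[1, 0, 5], [1, 0, 3], [1, 1, 8], [1, 0, 2], [1, 0, 7], [1, 0, 1]]
print(drop_near_constant(X_toy, max_deviating=1))

mask, score = ga_select(5, toy_fitness)
print(mask, score)
```

Different seeds may lead to different final subsets, which reflects the stochastic nature of the search. Repeating the run with several seeds and comparing the selected descriptors is one simple way to check the stability of the result.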

The same code base is also used successfully
in the software implementing the Molecular Field Topology Analysis (MFTA)
technique that we proposed [5] for QSAR studies of organic compounds.

This software was developed by E.V. Radchenko, V.A. Palyulin, and N.S. Zefirov, Department of Chemistry, Moscow State University,
Moscow 119992, Russia. The data input format is described here.