Abstract—In this contribution, a multi-objective genetic programming (MOGP) algorithm is used to perform symbolic regression. The genetic programming (GP) algorithm used is specifically designed to evolve mathematical models of predictor-response data that are “multigene” in nature, i.e., linear combinations of low-order non-linear transformations of the input variables. The MOGP algorithm simultaneously optimizes two competing objectives, maximization of ‘goodness-of-fit’ to the data and minimization of model complexity, in order to develop parsimonious data-based symbolic models. The functionality of the multigene MOGP algorithm is demonstrated by using it to generate an accurate, compact QSAR (quantitative structure-activity relationship) model of existing toxicity data in order to predict the toxicity of chemical compounds.
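
The trade-off between the two objectives can be illustrated with a minimal sketch of Pareto (non-dominated) selection. This is not the MOGP algorithm itself, only an illustration of how competing objectives yield a set of trade-off models rather than a single best one. It assumes that goodness-of-fit is expressed as an error to be minimized and that complexity is an integer count such as the number of tree nodes; the names `dominates` and `pareto_front` and all numerical values are hypothetical.

```python
from typing import List, Tuple

# Each candidate model is (name, fit_error, complexity).
# Assumption: both objectives are minimized, so maximizing goodness-of-fit
# is recast as minimizing a fit error.
Candidate = Tuple[str, float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives and
    strictly better on at least one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(models: List[Candidate]) -> List[Candidate]:
    """Keep only the models that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


# Hypothetical candidate models; the values are invented for illustration.
candidates = [
    ("A", 0.90, 3),   # very simple, poor fit
    ("B", 0.40, 8),   # moderate on both objectives
    ("C", 0.45, 12),  # dominated by B (worse fit and more complex)
    ("D", 0.20, 20),  # best fit, most complex
    ("E", 0.95, 5),   # dominated by A
]

front = pareto_front(candidates)
print([name for name, _, _ in front])
```

Under these assumptions, the surviving models range from simple but less accurate to accurate but complex, and a parsimonious model would then be chosen from this set.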

Charles Hii is with the School of Chemical Engineering and Advanced Materials at the University of Newcastle, Newcastle-upon-Tyne, UK.
Dominic P. Searson is with the School of Chemical Engineering and Advanced Materials at the University of Newcastle, Newcastle-upon-Tyne, UK (e-mail: d.p.searson@ncl.ac.uk).
Mark J. Willis is with the School of Chemical Engineering and Advanced Materials at the University of Newcastle, Newcastle-upon-Tyne, UK (phone: +44 191 222 7242; e-mail: mark.willis@ncl.ac.uk).