Generalisation and Domain Adaptation in GP with Gradient Descent for Symbolic Regression

@InProceedings{Chen:2015:CEC,
author = "Qi Chen and Bing Xue and Mengjie Zhang",
title = "Generalisation and Domain Adaptation in GP with
Gradient Descent for Symbolic Regression",
booktitle = "Proceedings of 2015 IEEE Congress on Evolutionary
Computation (CEC 2015)",
year = "2015",
editor = "Tadahiko Murata",
pages = "1137--1144",
address = "Sendai, Japan",
month = "25-28 " # may,
publisher = "IEEE Press",
keywords = "genetic algorithms, genetic programming",
DOI = "10.1109/CEC.2015.7257017",
abstract = "Genetic programming (GP) has been widely applied to
symbolic regression problems and achieved good success.
Gradient descent has also been used in GP as a
complementary search to the genetic beam search to
further improve symbolic regression performance.
However, most existing GP approaches with gradient
descent (GPGD) to symbolic regression have only been
tested on the conventional symbolic regression problems
such as benchmark function approximations and
practical engineering problems with a single (training)
data set only and the effectiveness on unseen data sets
in the same domain and in different domains has not
been fully investigated. This paper designs a series of
experiment objectives to investigate the effectiveness
and efficiency of GPGD with various settings for a set
of symbolic regression problems applied to unseen data
in the same domain and adapted to other domains. The
results suggest that the existing GPGD method applying
gradient descent to all evolved program trees three
times at every generation can perform very well on the
training set itself, but cannot generalise well on the
unseen data set in the same domain and cannot be
adapted to unseen data in an extended domain. Applying
gradient descent to the best program in the final
generation of GP can also improve the performance over
the standard GP method and can generalise well on
unseen data for some of the tasks in the same domain,
but performs poorly on the unseen data in an extended
domain. Applying gradient descent to the top 20 percent
of programs in the population can generalise reasonably
well on the unseen data in not only the same domain but
also in an extended domain.",
notes = "1105 hrs 15342 CEC2015",
}