Look into the future with genetic programming

With predictive modeling techniques, it is possible to predict almost anything, from clients’ shopping habits and illnesses to a golfer’s handicap. The only prerequisite is having enough examples. In a doctoral thesis from the University of Borås in Sweden, Rikard König has adapted the technique of genetic programming so that it can be used for such purposes.

The doctoral thesis, Enhancing Genetic Programming for Predictive Modeling, is about machine learning, a field of computer science, and more specifically about predictive modeling. Machine learning is about getting a computer to learn, in a sense to become intelligent. Predictive modeling is a broad area of machine learning in which a computer learns from positive and negative examples, finds connections in them and explains why things turn out the way they do.

Within predictive modeling, there is an array of techniques that are used to produce models that can predict practically anything, for instance, how people might be expected to respond to advertisements. Since these are general techniques, it is possible to predict just about anything as long as there are enough previous examples, i.e. sufficient information. The goal of predictive modeling is to find an accurate model and preferably one that explains something that was not previously known.

Genetic programming (GP) is a general optimization technique based on Darwin’s theory of evolution by natural selection. It was not originally designed for predictive modeling.

“In my thesis, I present several improvements that increase the accuracy and comprehensibility of models created with GP. There are many researchers who work with GP but my solutions are unique,” says Rikard König, PhD student at the School of Business and IT at the University of Borås.

To produce a model with GP, you start off with, say, a thousand randomly generated models and let them compete with each other. You work out how many errors each model makes on known examples and then apply a form of natural selection based on the results. The most accurate models have a greater chance of surviving and of having “children”, which are produced by pairing off two models. Each child combines parts of its two parents, and together the children form a new generation that is hopefully stronger. A small number of models can also be subjected to mutation, just like in nature.

“The new generation is assessed in the same way, using the known examples. They compete, pair off and give rise to an even stronger new generation. The process is repeated until a sufficiently accurate model has been found. The fascinating thing is that evolution is such a powerful way of searching through all possible solutions,” says Rikard König.
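The generational loop described above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the G-REX implementation or the thesis's method: models are hypothetical arithmetic expression trees over two input features, a model answers "yes" when its output is positive, fitness is the number of misclassified known examples, parents are chosen by tournament selection, children are produced by swapping subtrees between two parents, and the data set, population size, depth cap and elitism are all choices made for this sketch.

```python
import operator
import random

# Hypothetical setup for this sketch: binary operators only, two input features.
FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
N_FEATURES = 2
MAX_DEPTH = 6  # cap tree depth so crossover cannot grow models without bound

# Hypothetical known examples: the label is True when x0 + x1 > 1.
DATA = [((a / 4, b / 4), a / 4 + b / 4 > 1) for a in range(5) for b in range(5)]

def random_tree(rng, depth):
    """Grow a random expression tree no deeper than `depth`."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return ("x", rng.randrange(N_FEATURES))  # input feature
        return ("c", round(rng.uniform(-2, 2), 2))  # numeric constant
    op = rng.choice(list(FUNCS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, x):
    """Compute the tree's numeric output for one example `x`."""
    if tree[0] == "x":
        return x[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return FUNCS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

def errors(tree, data):
    """Fitness: how many known examples the model gets wrong (lower is better)."""
    return sum((evaluate(tree, x) > 0) != label for x, label in data)

def tree_depth(tree):
    """Depth of a tree; a single terminal has depth 0."""
    return 1 + max(tree_depth(tree[1]), tree_depth(tree[2])) if tree[0] in FUNCS else 0

def nodes(tree, path=()):
    """List (path, subtree) for every node; a path is a tuple of child indices."""
    found = [(path, tree)]
    if tree[0] in FUNCS:
        for i in (1, 2):
            found += nodes(tree[i], path + (i,))
    return found

def replace(tree, path, new):
    """Return a copy of `tree` with the subtree at `path` swapped for `new`."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, mum, dad):
    """The child is `mum` with one random subtree replaced by a subtree of `dad`."""
    path, _ = rng.choice(nodes(mum))
    _, donor = rng.choice(nodes(dad))
    child = replace(mum, path, donor)
    return child if tree_depth(child) <= MAX_DEPTH else mum

def mutate(rng, tree):
    """Replace one random subtree with a freshly grown random one."""
    path, _ = rng.choice(nodes(tree))
    child = replace(tree, path, random_tree(rng, 2))
    return child if tree_depth(child) <= MAX_DEPTH else tree

def tournament(rng, scored, k=3):
    """Selection: of k randomly drawn models, the one with fewest errors wins."""
    return min(rng.sample(scored, k), key=lambda s: s[0])[1]

def evolve(data, pop_size=100, generations=20, p_mutation=0.1, target=0, seed=0):
    """Run the compete / pair off / mutate loop; return (errors, best model)."""
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scored = [(errors(t, data), t) for t in population]
        gen_best = min(scored, key=lambda s: s[0])
        if best is None or gen_best[0] < best[0]:
            best = gen_best
        if best[0] <= target:
            break  # a sufficiently accurate model has been found
        # Elitism (a choice of this sketch): the best model so far survives unchanged.
        children = [best[1]]
        while len(children) < pop_size:
            child = crossover(rng, tournament(rng, scored), tournament(rng, scored))
            if rng.random() < p_mutation:
                child = mutate(rng, child)
            children.append(child)
        population = children
    return best

err, model = evolve(DATA)
print(f"{err} errors:", model)
```

A real system would use a richer model language (for example rule sets or decision trees) and a fitness measure that also rewards comprehensibility, but the select, pair off and mutate cycle has the same shape.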

GP has several properties that make it suitable for predictive modeling. One is that the search is independent of how the model is represented. This means that both the form of the model and the way errors are measured can be tailored to each individual problem. This is not normally the case with traditional predictive techniques. At the same time, the technique struggles when a highly complex model is needed, since the search space consists of all possible models and the number of possible models grows exponentially with model complexity.
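To give a feel for how quickly the search space grows, the short sketch below counts the distinct expression trees up to a given depth. It assumes, hypothetically, that every operator takes exactly two arguments, with four operators (such as +, -, *, /) and three terminal symbols (such as two inputs and one constant). A tree of depth at most d is either a terminal or an operator whose two subtrees each have depth at most d - 1.

```python
def count_trees(n_funcs, n_terms, max_depth):
    """Count distinct expression trees of depth <= max_depth.

    Assumes every function takes exactly two arguments: a tree is either a
    terminal, or a function applied to two trees that are one level shallower.
    """
    n = n_terms  # depth 0: a single terminal
    for _ in range(max_depth):
        n = n_terms + n_funcs * n * n
    return n

# Hypothetical sets: 4 binary operators and 3 terminal symbols.
for d in range(5):
    print(d, count_trees(4, 3, d))
```

Under these assumptions there are 3 trees of depth 0, 39 of depth at most 1, 6,087 of depth at most 2 and 148,206,279 of depth at most 3. Each extra level roughly squares the count, so the growth is at least exponential, and GP can only sample such a space rather than visit every solution.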

“One of my improvements is a hybrid technique for creating an accurate and comprehensible model when the search space is extremely large, i.e. when a model with high complexity is required. The solution is to send relatively strong models created by a traditional predictive technique into a generation to guide the search in a promising direction.”
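The seeding idea can be illustrated with a minimal sketch under stated assumptions. The article does not say which traditional technique the thesis uses, so here a decision stump (a decision tree with a single split) stands in for it. Models are hypothetical arithmetic expression trees that answer "yes" when their output is positive, the data set is invented for the example, and `fit_stump` and `seeded_population` are illustrative names, not part of G-REX. The strong seed is placed alongside random trees in the first generation, so selection can build on it from the start.

```python
import operator
import random

FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# Hypothetical known examples: the label is True when x0 + x1 > 1.
DATA = [((a / 4, b / 4), a / 4 + b / 4 > 1) for a in range(5) for b in range(5)]

def evaluate(tree, x):
    """Compute the tree's numeric output for one example `x`."""
    if tree[0] == "x":
        return x[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return FUNCS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

def errors(tree, data):
    """Number of known examples the model misclassifies (predicts True if output > 0)."""
    return sum((evaluate(tree, x) > 0) != label for x, label in data)

def random_tree(rng, depth, n_features):
    """Grow a random expression tree no deeper than `depth`."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.5:
            return ("x", rng.randrange(n_features))
        return ("c", round(rng.uniform(-2, 2), 2))
    op = rng.choice(list(FUNCS))
    kids = [random_tree(rng, depth - 1, n_features) for _ in range(2)]
    return (op, *kids)

def fit_stump(data):
    """Stand-in 'traditional' learner: the best one-feature threshold rule."""
    best_err, best_tree = None, None
    for i in range(len(data[0][0])):
        for t in sorted({x[i] for x, _ in data}):
            # Try both directions: x_i > t and x_i < t, written as expression trees.
            for tree in (("-", ("x", i), ("c", t)), ("-", ("c", t), ("x", i))):
                err = errors(tree, data)
                if best_err is None or err < best_err:
                    best_err, best_tree = err, tree
    return best_tree

def seeded_population(rng, data, size, seeds):
    """Initial GP population: the strong seed models plus random trees for diversity."""
    n_features = len(data[0][0])
    population = list(seeds)
    while len(population) < size:
        population.append(random_tree(rng, 3, n_features))
    return population

stump = fit_stump(DATA)
population = seeded_population(random.Random(0), DATA, 100, [stump])
print("seed:", stump, "errors:", errors(stump, DATA))
print("best random tree errors:", min(errors(t, DATA) for t in population[1:]))
```

In a full run this seeded population would simply replace the purely random first generation. The random trees keep the population diverse, while the seed gives crossover a reasonably accurate starting point to improve on.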

As part of his research, Rikard König has also developed an application that implements his research results. The program can be downloaded from www.grex.se.

Rikard König is working on several research projects where these solutions may be put to use. For instance, one project is in collaboration with Scania where data from tens of thousands of lorries have been saved and will be analysed in order to explain what effect the driver has on fuel consumption. Another example, which also shows how generic the technique is, is a new project where golf swings from 500 golfers will be analysed. Here, the aim is to find general explanations for what distinguishes good swings from bad swings. Another aim is to be able to automatically recommend exercises for individual golfers on the basis of each person’s particular needs.