On this blog I regularly publish articles with tips and tricks for the programming language C# .Net.
C# is a modern, object-oriented programming language which fully utilizes the possibilities of the .Net framework. I also write about app programming for Android via C#, as well as PHP and Matlab. The difficulty of the posts varies, so I hope there is always something of interest for beginners and experts alike.
If you have questions or suggestions, feel free to send me an email.

Tuesday, July 19, 2016

Genetic Algorithms with C#

In this post I want to present so-called genetic algorithms, which belong to the larger class of evolutionary algorithms. Of course we will arrive at a complete, ready-to-use C# implementation in the end, but this post is also a step toward broadening the topics of this blog, for example to include theoretical computer science.

What are genetic algorithms? Genetic algorithms are stochastic search heuristics which mimic the process of natural selection. To find a solution they use a group of individuals, each of which represents a single candidate solution to the problem. In every round of the algorithm a new population is created from the old group via certain operators.
These operators are:

Survival of the fittest (the best individuals from the previous round are carried over into the next one)

Crossover (two individuals of the previous round act as "parents" of an individual of the new population)

Mutation (random properties of an individual are mutated)

We hope that the individuals of the populations get "fitter" over time, so that after several rounds we end up with a good individual, and thus a good solution.

Genetic algorithms are used to solve complex problems, usually NP-hard ones, for which the search space is too big to traverse completely and for which one generally has few clues about how to solve the problem. They are a good general approach to problems of all kinds.

But in this generality also lies a weakness of this heuristic: for many problems, including hard ones, there are very good problem-specific solving approaches with which genetic algorithms cannot compete. Furthermore it is not clear how the idea of two parents translates into an algorithmic advantage. Procedures like Simulated Annealing inherently use one parent, to stay with the analogy, and thus might make more sense.

Readers interested in this wide topic are referred to the existing literature. Here I now want to continue with the implementation of the genetic algorithm and solve instances of the Travelling Salesman Problem (TSP) with it.
The Travelling Salesman Problem is a relatively well-known NP-hard problem. It is about finding a cheap round trip through a set of cities - that means, every city has to be visited exactly once, and the salesman has to return to his starting city in the end. With n cities there are (n - 1)! possible orders to visit them. This is an incredibly fast-growing function: for example, for 10 cities we already get 362,880 combinations, for 15 cities there are about 87 billion combinations, and for 70 cities more than there are atoms in the entire universe.
Thus in principle genetic algorithms seem like a good candidate to obtain a relatively good approximate solution. On the other hand there are very good approximation algorithms for this problem, so keep the introductory remarks in mind.

We manage a TSP instance in the class TSPInstance. For n cities the n x n integer matrix Connections is created, in which entry (i, j) describes the distance from city i to city j. If there is no connection between these cities, the entry is set to infinity (in C#: int.MaxValue).
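A minimal sketch of such an instance class could look as follows; the text only fixes the name Connections and the infinity convention, the constructor and property names are my assumptions:

```csharp
using System;

// Sketch of a TSP instance holder as described above.
public class TSPInstance
{
    public int NumberOfCities { get; }
    public int[,] Connections { get; }  // Connections[i, j] = distance from city i to city j

    public TSPInstance(int numberOfCities)
    {
        NumberOfCities = numberOfCities;
        Connections = new int[numberOfCities, numberOfCities];

        // Initially there is no connection between any two cities:
        // mark every entry as "infinity", which in C# is int.MaxValue.
        for (int i = 0; i < numberOfCities; i++)
            for (int j = 0; j < numberOfCities; j++)
                Connections[i, j] = int.MaxValue;
    }
}
```

The caller would then fill in the actual distances before handing the instance to the algorithm.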
Such an instance is then passed to the genetic algorithm. The entry point is the function Go() of the class GeneticAlgorithm.
In it we see the basic flow of the algorithm: first we initialize the starting population, then we run for a certain number of iterations, in each of which we call the function Breed(), which creates the new population out of the previous one.
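The basic flow might be sketched like this; only Go() and Breed() are named in the text, the iteration count and the other helper names are my assumptions:

```csharp
using System;

// Skeleton of the algorithm's flow; the helper bodies are only stubs
// here, the individual steps are described in the text.
public class GeneticAlgorithm
{
    private const int PopulationSize = 30;
    private const int Iterations = 1000;  // assumed iteration count
    private int[][] population = new int[PopulationSize][];

    public void Go()
    {
        InitializePopulation();            // fill with random tours
        for (int i = 0; i < Iterations; i++)
            Breed();                       // build the new population from the old one
        // afterwards: output the best individual found
    }

    private void InitializePopulation() { /* random permutations of the cities */ }

    private void Breed() { /* elitism, selection, crossover, mutation */ }
}
```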
We represent a solution of the TSP instance simply as a permutation of the cities, which describes the order in which they are visited. So ADBC, for example, stands for the route A-D-B-C-A. In the program the cities are numbered, so we use an integer array; the content in this example would then be 0, 3, 1, 2. The "fitness" of an individual is the cost of its tour (infinity if the tour is not possible).
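The cost of such a permutation tour can be sketched like this (the method name TourCost is my choice; the text only fixes the representation and the infinity convention):

```csharp
using System;

// Cost ("fitness") of a tour given as a permutation of city indices.
// connections[i, j] = distance from i to j; int.MaxValue marks a missing edge.
static long TourCost(int[] tour, int[,] connections)
{
    long cost = 0;
    for (int i = 0; i < tour.Length; i++)
    {
        int from = tour[i];
        int to = tour[(i + 1) % tour.Length];  // last city connects back to the start
        int distance = connections[from, to];
        if (distance == int.MaxValue)
            return long.MaxValue;              // tour not possible -> infinite cost
        cost += distance;
    }
    return cost;
}
```

For the example tour 0, 3, 1, 2 this sums the distances 0→3, 3→1, 1→2 and 2→0.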

At the beginning of the algorithm we create an array of individuals and fill it with random tours, i.e. random permutations of the cities. The size of the array corresponds to the size of the population, for which we take 30 - a value that is often used in practice. This parameter, like the following ones, has a huge impact on the quality of the solution, and different values should be tried for every problem.
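Creating a random starting individual amounts to shuffling the city indices; a sketch using the Fisher-Yates shuffle (the method name is mine):

```csharp
using System;

// One random starting individual: a uniformly random permutation
// of the city indices 0 .. n-1 (Fisher-Yates shuffle).
static int[] RandomTour(int n, Random rng)
{
    int[] tour = new int[n];
    for (int i = 0; i < n; i++)
        tour[i] = i;                       // identity tour 0, 1, ..., n-1
    for (int i = n - 1; i > 0; i--)
    {
        int j = rng.Next(i + 1);           // random index 0 <= j <= i
        (tour[i], tour[j]) = (tour[j], tour[i]);
    }
    return tour;
}
```

Calling this 30 times fills the starting population.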

In the function Breed() we first iterate over the current population and calculate the fitness of the individuals. Then we sort them by ascending cost, so that the best individuals come first.
This way we can simply carry the best individuals of the previous population over into the new one. Their share is given as a percentage; here we use 10 %, which means that the 3 best individuals of the previous population are taken into the new one.
We then have to fill the remaining 27 spots in the new population, which we do via crossover and mutation.
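The elitism step could be sketched as follows, assuming the old population is already sorted best-first (the method name and signature are my assumptions):

```csharp
using System;

// Copy the best individuals of the (sorted) old population into the new one.
// With a population of 30 and an elite share of 10 % this keeps 3 individuals.
static int[][] TakeElite(int[][] sortedOldPopulation, int elitePercentage)
{
    int populationSize = sortedOldPopulation.Length;
    int eliteCount = populationSize * elitePercentage / 100;  // 30 * 10 / 100 = 3
    var newPopulation = new int[populationSize][];
    for (int i = 0; i < eliteCount; i++)
        newPopulation[i] = (int[])sortedOldPopulation[i].Clone();  // copy each elite tour
    return newPopulation;  // remaining slots are filled by crossover and mutation
}
```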

When iterating over the old population we create the array SummedFitness, in which cumulative fitness values are saved: at position i the sum of the fitness values of the individuals 0 - i is stored. With this we can perform a so-called fitness-proportionate selection, in which individuals with a better fitness are preferably chosen as parents (note that since a lower tour cost is better here, the summed weights have to be derived from something like the inverse cost). We draw a random number between 0 and the last value of SummedFitness; the individual into whose interval the number falls is selected.
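A sketch of this roulette-wheel selection over the cumulative array; the method name is mine, and I assume the entries of SummedFitness were built from weights where larger means better (e.g. inverse tour cost):

```csharp
using System;

// Fitness-proportionate selection: draw a point on the "roulette wheel"
// and return the index of the individual whose interval it falls into.
// summedFitness[i] = weight of individuals 0..i added up.
static int SelectParent(double[] summedFitness, Random rng)
{
    double pick = rng.NextDouble() * summedFitness[summedFitness.Length - 1];
    for (int i = 0; i < summedFitness.Length; i++)
        if (pick <= summedFitness[i])
            return i;                        // pick lies in individual i's interval
    return summedFitness.Length - 1;         // numerical safety net
}
```

Individuals with a larger weight occupy a larger interval and are therefore selected more often.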
This way we choose two parents. Now we flip a biased coin to decide whether we want to perform a crossover operation; I chose a crossover probability of 80 % here. If no crossover is done, the child is simply a clone of parent 1. If a crossover is done, the child becomes a genetic combination of both parents. For mixing the genomes there are different possibilities. Here we use the following approach: we randomly choose a crossover point X between 0 and n - 1.
In the child we then take the first X cities of the tour directly from parent 1. The remaining cities follow in the order in which they occur in parent 2.
As an example let us consider the parents AGEDCBF and FDAGEBC. As the crossover point we choose 3. The child thus starts its tour with the cities AGE. Remaining are the cities D, C, B and F, which are ordered as in parent 2: FDBC. Thus the complete tour of the child is AGEFDBC. Again, this is just one possible way to do a crossover; for example, one could also cut the tour at two points (two-point crossover).
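This crossover can be sketched generically, so that the letter example above can be reproduced directly (the method name and generic signature are my choice):

```csharp
using System;
using System.Collections.Generic;

// Copy the first x cities from parent 1, then append the remaining
// cities in the order in which they occur in parent 2.
static T[] Crossover<T>(T[] parent1, T[] parent2, int x)
{
    var child = new T[parent1.Length];
    var taken = new HashSet<T>();
    for (int i = 0; i < x; i++)
    {
        child[i] = parent1[i];      // head of the tour comes from parent 1
        taken.Add(child[i]);
    }
    int pos = x;
    foreach (T city in parent2)     // rest follows the order of parent 2
        if (!taken.Contains(city))
            child[pos++] = city;
    return child;
}
```

For the parents AGEDCBF and FDAGEBC with crossover point 3 this yields AGEFDBC, as in the example.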

In the last step the child is randomly mutated. For this, too, there are different possibilities; we use the following one: we iterate over every element of the tour and mutate it with a probability of 10 %. If the mutation rate is too low, the algorithm cannot create enough new solution candidates and gets stuck in a local minimum; if it is too high, the algorithm degenerates into random search. When we mutate, we randomly select another city in the tour and swap the two.
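The swap mutation might look like this as a sketch; note that swapping two positions keeps the tour a valid permutation, which is exactly why this mutation fits the chosen representation:

```csharp
using System;

// Swap mutation: every position mutates with probability mutationRate
// (10 % in the text) by swapping its city with a random other position.
static void Mutate(int[] tour, Random rng, double mutationRate = 0.1)
{
    for (int i = 0; i < tour.Length; i++)
    {
        if (rng.NextDouble() < mutationRate)
        {
            int j = rng.Next(tour.Length);          // random partner position
            (tour[i], tour[j]) = (tour[j], tour[i]);
        }
    }
}
```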

With this we have created a new child based on the old population. We repeat this process until the new population again contains 30 individuals.
Since the best individuals remain in the population, the quality of the best individual never decreases and hopefully improves over time. In the end we output the best solution.

Here is the code as a console application; with the above explanation it should be relatively easy to read: