Project musings

Archive for the ‘Genetic Algorithms’ Category

In Part 1 we built a basic genetic solver that used mutation to solve problems. In this part we’re going to tackle a slightly more complex problem, the 8 Queens Puzzle, and then expand the solver as necessary.

The 8 Queens Puzzle involves putting 8 queens on a standard chess board such that none are under attack. According to Wikipedia there are only 92 solutions to this puzzle, and once we remove mirrorings and rotations there are only 12 unique solutions.

To start with we need to define a genome. The chess board conveniently has the same number of rows as columns (8) so we’ll use the digits 1-8 for our genes.

geneset = '12345678'

Array indexes are zero-based, however, so we’ll need to convert gene symbols to row and column values. To do that we’ll find the gene’s index in the set of gene symbols, then use that index as a row or column, and combine those to make a Point.

The row of digits under the board is the set of genes that created the board layout. The number to the right will be the fitness, a measure of how close this set of genes is to the desired result. To drive improvement we’ll want to increase the fitness value whenever the related board position lets more queens coexist on the board. Let’s think about how we can do that.

We’ll start by counting the number of columns that have a queen. Here’s a layout that gets an optimal score but is undesirable:
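As a minimal sketch of that first metric (the Point and get_fitness names are assumptions, not necessarily the post’s):

```python
from collections import namedtuple

Point = namedtuple('Point', 'row col')

def get_fitness(queens):
    # count the distinct columns that contain a queen; 8 is the maximum
    return len({q.col for q in queens})
```

Note that 8 queens placed in a single row already earn the maximum score of 8 while all attacking one another along that row, which is exactly the undesirable layout mentioned above.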

After several test runs we find that most of the time it gets close but can’t get all the way to the optimal value of 32. We need to enhance the solver’s capabilities for it to be able to handle this problem.

If at first you don’t succeed, try, try again!

We’re going to do that by introducing a second genetic line. We’ll mutate the 2nd line as long as it is improving. If it ends up with a better fitness than bestParent then it will become the bestParent. Otherwise, we’ll start a new genetic line again with a random gene sequence. We repeat this process over and over until we find an optimal result. Here’s the updated solver loop:
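A sketch of what that loop might look like; the helper names (generate_parent, mutate) and the exact stopping rule are assumptions based on the description above:

```python
import random

def generate_parent(length, geneset):
    return ''.join(random.choice(geneset) for _ in range(length))

def mutate(parent, geneset):
    index = random.randrange(len(parent))
    child = list(parent)
    child[index] = random.choice(geneset)
    return ''.join(child)

def get_best(get_fitness, length, optimal_fitness, geneset):
    best_parent = generate_parent(length, geneset)
    best_fitness = get_fitness(best_parent)
    while best_fitness < optimal_fitness:
        # start a new genetic line from a random gene sequence
        parent = generate_parent(length, geneset)
        fitness = get_fitness(parent)
        # mutate the second line as long as it keeps improving
        improving = True
        while improving and fitness < optimal_fitness:
            child = mutate(parent, geneset)
            child_fitness = get_fitness(child)
            improving = child_fitness > fitness
            if improving:
                parent, fitness = child, child_fitness
        # if the new line beat bestParent, it becomes bestParent
        if fitness > best_fitness:
            best_parent, best_fitness = parent, fitness
    return best_parent
```

The inner loop abandons a genetic line as soon as a mutation fails to improve it, which matches the “as long as it is improving” rule described above.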

However, when we run the string duplication test from Part 1 it now struggles to find a solution. This is because we ignore our best line completely once we find it and only try to improve by mutating a new genetic line.

We need a way to continue to take advantage of the genes in bestParent. One way nature handles this is through crossbreeding so we’ll introduce a crossover strategy where we take one random gene from a 2nd (the best) parent.
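A minimal sketch of that crossover strategy (the function name is an assumption):

```python
import random

def crossover(parent, best_parent):
    # copy the current parent, then overwrite one random position with
    # the gene the best parent has at that same position
    index = random.randrange(len(parent))
    child = list(parent)
    child[index] = best_parent[index]
    return ''.join(child)
```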

Now when we run the tests both are able to achieve optimal results every time.

Refactor

Now that everything works we’re going to do some code hygiene. In solving this problem we used genes “12345678” but it would have been more convenient, and faster, if we could have used raw integers 0-7. So let’s make that change. I’ll show selected changes below but you can get the full set from GitHub.
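A sketch of what that change might look like (names are assumptions; the full diff is on GitHub):

```python
import random

# genes are now raw integers 0-7, so a gene can be used directly as a
# row or column without an index lookup into a symbol string
geneset = list(range(8))

def generate_parent(length):
    return [random.choice(geneset) for _ in range(length)]
```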


The goal of this, my first program in Python, is to reproduce a target string (like Hello World!) without looking directly at it. I’ll do this with a simple genetic algorithm that randomly generates an initial sequence of characters and then mutates one random character in that sequence at a time until it matches the target. Think of this like playing a Hangman variant where you pass a letter sequence to the person who knows the target word, and the only feedback you get is how many of your letters are correct. It is also reminiscent of the game Hotter Colder, except we’re doing it with code.

We start off with a standard set of letters for genes and a target string:

geneset = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!."
target = "Not all those who wander are lost."

Next we need a way to generate a random gene sequence from the gene set.
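A minimal sketch of such a generator, assuming the geneset defined above:

```python
import random

geneset = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!."

def generate_parent(length):
    # build a random candidate the same length as the target
    return ''.join(random.choice(geneset) for _ in range(length))
```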

There are many ways to calculate a fitness value (how close the guess is to the target) for the generated string. For this particular problem we’ll simply count the number of characters that are the same between the candidate string and the target string.
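That character-counting fitness might be sketched as follows (the function name is an assumption):

```python
target = "Not all those who wander are lost."

def get_fitness(guess):
    # count the positions where the guess matches the target exactly
    return sum(1 for expected, actual in zip(target, guess) if expected == actual)
```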

We also need a way to produce a new child gene sequence by mutating the existing (parent) string. The point is to create a copy of the parent then replace 1 letter/gene in the copy with a randomly selected one from the set of all possible genes/letters.
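A sketch of that mutation step, assuming the geneset above:

```python
import random

geneset = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!."

def mutate(parent):
    # copy the parent, then replace one gene with a random symbol
    index = random.randrange(len(parent))
    child = list(parent)
    child[index] = random.choice(geneset)
    return ''.join(child)
```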

The heart of the genetic solver is a loop that uses the functions above to generate a candidate gene sequence, compare it to the previous best, and randomly mutate it until all the genes match those in the target.
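Putting the pieces together, the main loop might be sketched like this (the helpers are repeated so the sketch is self-contained):

```python
import random

geneset = " abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!."
target = "Not all those who wander are lost."

def get_fitness(guess):
    return sum(1 for expected, actual in zip(target, guess) if expected == actual)

def mutate(parent):
    index = random.randrange(len(parent))
    child = list(parent)
    child[index] = random.choice(geneset)
    return ''.join(child)

# generate a random starting sequence, then keep the child whenever a
# mutation improves the fitness, until every gene matches the target
best_parent = ''.join(random.choice(geneset) for _ in range(len(target)))
best_fitness = get_fitness(best_parent)
while best_fitness < len(target):
    child = mutate(best_parent)
    child_fitness = get_fitness(child)
    if child_fitness > best_fitness:
        best_parent, best_fitness = child, child_fitness
```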

Refactor

Good, it works. Now we need to separate the solver code from the code specific to the string duplication problem so we can use it to solve other problems. We’ll start by moving our main loop into a function called getBest. That function’s parameters will include the functions it should call to get a candidate’s fitness and to display a new best sequence.
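A sketch of getBest with the fitness and display functions passed in (the exact signature is an assumption):

```python
import random

def getBest(get_fitness, display, length, optimal_fitness, geneset):
    def mutate(parent):
        index = random.randrange(len(parent))
        child = list(parent)
        child[index] = random.choice(geneset)
        return ''.join(child)

    best_parent = ''.join(random.choice(geneset) for _ in range(length))
    best_fitness = get_fitness(best_parent)
    display(best_parent)
    while best_fitness < optimal_fitness:
        child = mutate(best_parent)
        child_fitness = get_fitness(child)
        if child_fitness > best_fitness:
            best_parent, best_fitness = child, child_fitness
            display(best_parent)  # report each new best sequence
    return best_parent
```

Callers supply their own get_fitness and display functions, so the same loop can drive other problems.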


In Part 2 of this series we converted our code to a library, made our initial puzzle into an integration test, and extracted test related parameters and methods from the library.

Now we’re ready to try a new puzzle. This time we’ll expand our solver to handle a slightly more difficult problem – the 8 Queens puzzle.

In the 8 Queens puzzle we want to place 8 queens on a chess board such that none of them are under attack.

According to Wikipedia there are only 92 solutions to this puzzle and once we remove mirrorings and rotations there are only 12 unique solutions.

We start off by figuring out how we’re going to map genes to the problem. One solution that I’ve used before is to assign each square on the 8×8 Chess board a symbol from the 64 symbol set ([a-z][A-Z][0-9]@#) as follows:

We need to be able to convert a symbol (gene) to a board position. To do that we’ll find its index in the set of genes then convert that index to a row and column, or Point.
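A minimal sketch of that conversion (the row-major mapping and the names are assumptions):

```python
import string
from collections import namedtuple

Point = namedtuple('Point', 'row col')

# the 64-symbol gene set described above: [a-z][A-Z][0-9]@#
geneset = string.ascii_lowercase + string.ascii_uppercase + string.digits + '@#'

def to_point(gene):
    # the gene's index in the set determines its square on the 8x8 board
    index = geneset.index(gene)
    return Point(index // 8, index % 8)
```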

To count the number of diagonals that have exactly one Queen we’ll introduce a generator that creates Points starting from an initial position and then moving by a given row and column offset. First the generator and some tests for it.
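The generator might be sketched as follows (names assumed):

```python
from collections import namedtuple

Point = namedtuple('Point', 'row col')

def points_in_direction(start, row_offset, col_offset):
    # yield successive Points from start, stepping by the given offsets,
    # until the position walks off the 8x8 board
    row, col = start.row + row_offset, start.col + col_offset
    while 0 <= row < 8 and 0 <= col < 8:
        yield Point(row, col)
        row += row_offset
        col += col_offset
```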

But the odds are against it… That’s because the Mutation genetic strategy can’t always solve this problem. For our solver to be able to find a solution every time we’re going to have to introduce a new strategy. That is the subject of Part 4.

The source code to this point is available on GitHub if you want to experiment.

It is nice that map/reduce type functions are available out of the box, like in C#.


Yes, I know we could just call the functions instead of passing them in but this demonstrates that capability in the language and it is a feature we will need in order to use different fitness functions and display methods in a more complex solver.


In this project we’ll be solving a variant of John Koza’s Lawnmower Problem. The previous projects in this series successively drove out the functionality of a genetic solver capable of handling this kind of problem. This project raises our understanding of genetic algorithms and their application to problem solving to a whole new level.

The Lawnmower Problem asks us to provide instructions to a lawnmower to make it mow a field of grass. The field wraps in all directions – if the mower goes off the top it reappears at the bottom, and likewise from side to side. Let’s say the mower begins in the middle of an 8×8 field facing south.

Next let’s say the available instructions are: mow and turn. mow moves the mower forward one grid square in whatever direction it is facing and cuts the grass in that square. turn causes the mower to turn left 90 degrees.

Simple enough right? Using our previous experience in this series we know we can define a gene for each instruction and use hill climbing to find an optimal solution. I leave this as an exercise to the reader – check out the previous post if you need a refresher.

Now let’s expand the available instruction set, because all we ever get is spiral-shaped mowing patterns. The new instruction is: jump. jump has 2 non-negative arguments, forward and right. The mower will jump forward and to the right by the specified number of squares and cut the grass where it lands.

To implement this, simply add jump to the set of genes that require special handling and treat the two genes that follow it as its arguments. If fewer than 2 genes follow it, because it is at or near the end of the gene sequence, fill the missing arguments with zeros. Again, I leave the implementation as an exercise. Note: also make your implementation of getFitness() prefer shorter sequences if and only if the gene sequence mows the entire field.

Now back to Koza’s purpose for this problem. As interesting as this solution is, the sequence of instructions generated by the solver is completely different from the solution a human would use. Think about how you’d tell another person to mow a toroidal field. You wouldn’t give them detailed instructions for every step, right? You’d break it down into a set of repeatable sequences. In a non-toroidal field you might say something like: start at the corner of the field and mow a strip along the edge of the field all the way to the other side.

Turn around again and repeat the process until you’ve mowed the whole field.

You automatically combine squares into strips and trips across-and-back into a repeatable pattern. How do we do that with the mower?

The best result we’ve generated so far requires 64 jump and mow instructions, one for each grid square, to tell the mower how to cut the grass in the field. How can we make it look more like the instructions you’d give a human? We have to introduce the ability to repeat a sequence of instructions by reference and make this an instruction too.

This is where things get interesting. We’re going from using the genes of the genetic sequence more-or-less one-to-one to solve a problem, to genetic programming.

Implementation-wise this means we need two more special genes: begin-numbered-reference and call-numbered-reference. begin-numbered-reference will increment an id and start a new instruction sequence, or block, if and only if the current block is non-empty. call-numbered-reference, or more simply call, will take a parameter for the id of the sequence to execute. Once that sequence has completed, execution will return to the sequence that made the call – exactly like calling a subroutine.

Here’s a Go implementation of a gene decoder that builds a program for the mower as described above.
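The actual decoder in the repository is written in Go; purely to illustrate the scheme, here is a Python sketch of the same idea (the instruction names and gene encoding are assumptions):

```python
MOW, TURN, JUMP, BEGIN, CALL = 'mow', 'turn', 'jump', 'begin', 'call'

def decode(genes):
    # blocks[0] is the main program; BEGIN starts a new numbered block
    blocks = [[]]
    i = 0
    while i < len(genes):
        gene = genes[i]
        if gene == BEGIN:
            # only start a new block if the current one is non-empty
            if blocks[-1]:
                blocks.append([])
            i += 1
        elif gene == JUMP:
            # the next two genes are the forward and right arguments;
            # missing arguments are filled with zeros
            forward = genes[i + 1] if i + 1 < len(genes) else 0
            right = genes[i + 2] if i + 2 < len(genes) else 0
            blocks[-1].append((JUMP, forward, right))
            i += 3
        elif gene == CALL:
            # call takes one argument: the id of the block to execute
            block_id = genes[i + 1] if i + 1 < len(genes) else 0
            blocks[-1].append((CALL, block_id))
            i += 2
        else:
            blocks[-1].append((gene,))
            i += 1
    return blocks
```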

Note: Koza’s implementation prevents recursion. This implementation allows recursion but it isn’t difficult to modify it to work Koza’s way. I just find the recursive results more interesting.

Now when we move from one-to-one gene evaluation to running a program, determining fitness becomes a problem in its own right. On the face of it, it is the same: perform the instructions, determine how many field positions were mowed, and switch evaluation strategies to program length if all field squares have been mowed. The problem is we need to handle the flow control involved in fetching subroutines, running them, and returning to the previous location upon completion. We also need the ability to track the number of instructions executed and optionally exit if we run beyond a pre-determined maximum – this prevents infinite loops from blocking evolution in the solver.


The goal of knapsack problems is to put as much stuff into a container as it will hold, optimizing for constraints such as item weight and size and value. The standard Knapsack Problem adds the limitation that there is only one of each particular item available. In the unbounded variant of the Knapsack Problem there is no limit. This project solves the Unbounded Knapsack Problem (UKP) by improving upon the genetic solver used in the previous post. The code is written in Go.

The knapsack contents cannot weigh more than 25 units and its maximum volume is 0.25 units. Our goal is to maximize the value of the contents of the knapsack without exceeding either of the weight or volume limits.

Let’s think about how we would solve this problem by hand. We want to maximize the value within the constraints. So we want a high ratio of value to weight and value to volume. And we want as many of those in the bag as we can get. When we can’t stuff any more of the top item into the bag, we fill in the remaining space with the next most valuable item, and so on. This process is known as hill climbing.

This project uses genes in a new way – as indexes and counts. Each chromosome will have two parts. The first will represent an index into the list of available resources. The second will be the quantity of that resource to take. The candidate gene sequence will be decoded as follows:
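A sketch of that decoding, assuming single-value genes and a simple modulus to keep the index in range (the post’s Go code may differ):

```python
from collections import namedtuple

ItemCount = namedtuple('ItemCount', 'resource quantity')

def decode(genes, resources):
    # genes are consumed in pairs: (resource index, quantity to take)
    selections = []
    for i in range(0, len(genes) - 1, 2):
        index = genes[i] % len(resources)  # wrap out-of-range indexes
        quantity = genes[i + 1]
        if quantity > 0:
            selections.append(ItemCount(resources[index], quantity))
    return selections
```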

Excellent. A success with a sub-second run time. Also, that result matches one of the four optimal combinations for this problem from the RosettaCode page.

Now, just as there are standard problem sets for the Travelling Salesperson Problem, there are also standard problem sets available for the Unbounded Knapsack Problem. One such set, named exnsd16, comes from the PYAsUKP site.

Let’s use the structure of the code above as a guide to implementing a solver for standard problem sets.

First, the resource data must be imported from the file. The file has the following format:

A chromosome will still have two parts, the index and count, but this time it will need more genes because the problem we’re solving has 2000 possible resources and the quantities may need to go as high as 160.

If we use hexadecimal values for the geneSet we’ll need 3 genes for the resource index and 2 for the quantity.
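That layout might be decoded like this (a sketch; the post’s actual Go implementation may differ):

```python
geneSet = '0123456789ABCDEF'

def decode_chromosome(genes):
    # first 3 hex genes: resource index (0-4095, enough for 2000 resources)
    index = int(''.join(genes[:3]), 16)
    # next 2 hex genes: quantity (0-255, enough for quantities up to 160)
    quantity = int(''.join(genes[3:5]), 16)
    return index, quantity
```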

It doesn’t achieve an optimal solution every time but it does find a solution that’s better than 99 percent optimal almost every time. Here’s the fitness results across 100 runs with 5 seconds to run and a maximum of 3 rounds without improvement:


This Go project uses the genetic solver from my previous post to evolve a regular expression that matches all items in a set of wanted strings without matching any items in the set of unwanted strings. For example:

wanted := []string{"00", "01", "10"}
unwanted := []string{"11"}

We desire a regular expression that finds strings from wanted without finding those from unwanted. We’ll start by determining the unique set of characters in the wanted strings:
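The project itself is in Go; as a quick Python sketch of that first step:

```python
wanted = ["00", "01", "10"]
unwanted = ["11"]

# collect the unique characters that appear in the wanted strings; these
# become the building blocks for candidate regular expressions
unique_chars = set()
for item in wanted:
    unique_chars.update(item)
```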