Coding a Tetris AI using a Genetic Algorithm

About two years ago, when I was in grade 9, I decided to make a tetris clone in Java. One or two months later, I had a fully working and playable tetris, complete with background and sound effects and scoring and graphical effects when a line is cleared.

A year after I created it, I decided to go ahead and write an AI to play my game. Although it played much better than I could (I’m not a particularly good tetris player), it would still die after a few dozen lines on a 10-column board. This means it was pretty outclassed by existing AIs that could go on for thousands of lines!

Now, another year later, I coupled my previous AI algorithm with a genetic algorithm, with some pretty neat results:

Rules of the Game

How does one make a computer program to play tetris? Or more generally, how does one play tetris in the first place? The rules of tetris seem to me to be better viewed than explained, but I’ll give a quick overview of tetris gameplay and tetris strategies.

As I’m sure most of you know, in tetris you have blocks of four (tetrominoes) falling from the top of the board. The player moves and rotates the blocks and stacks them up:

Here the black outline is one of the places you can put the funny shaped block. And when a row is filled entirely with blocks (the row with the red outline below), you get a clear; that entire row is removed and the rest of the board is shifted down (often with a series of beeping noises and a small increase to your score):

If the blocks don’t get cleared and they stack to the top of the board, you lose. So ideally you want to fill as many lines as possible and avoid stacking the blocks up. Very simple.

Simple Strategies

So what are some things that we can try to avoid losing the game? Some things immediately come to mind. We lose when the blocks get stacked up so high that we can’t put in new blocks, right? So it makes sense to avoid piling up large numbers of blocks in high positions in the first place, or to penalize height:

So for each position the computer can calculate a height penalty — for each block the computer adds a number depending on how high it is. Then when the AI tries to decide where to put the next block, it ‘knows’ that blocks piled up really high is a bad thing and tries to avoid it.
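As a rough sketch of the height penalty (not the code from my game), assume the board is a `boolean[][]` indexed as `grid[row][col]` with row 0 at the top, and assume the "number depending on how high it is" is simply the block's height. The class and method names are made up for illustration.

```java
final class HeightPenalty {
    /**
     * Sums the height of every filled cell on the board.
     * Assumes grid[row][col] with row 0 at the top, so the bottom row has height 1.
     */
    static int sumOfHeights(boolean[][] grid) {
        int rows = grid.length;
        int total = 0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                if (grid[r][c]) {
                    total += rows - r; // higher blocks contribute more
                }
            }
        }
        return total;
    }
}
```

The sum gets multiplied by a negative weight later, so a taller stack means a lower score.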

Another strategy that seems pretty obvious is to try to get clears! We assign a positive score for each line we clear, in other words we reward clears. Duh.

Anyone who has played a few games of tetris would probably subconsciously know a number of intuitive strategies, such as packing blocks together as tightly as possible. How do we translate that into code? Well, to start, blocks that are packed tightly have few or no holes. We’ll define a hole as any empty space with a block somewhere directly above it:

Why don’t we want holes? A row is only considered a clear if the entire row is filled — if there’s even a single hole in the row, it doesn’t get removed. Not good. So it makes sense to give a negative score to positions with holes in them — to penalize holes.
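Counting holes under that definition could look something like this. It is only a sketch, assuming the same hypothetical `boolean[][]` board (`grid[row][col]`, row 0 at the top) and a rectangular grid.

```java
final class HoleCounter {
    /**
     * Counts empty cells that have at least one filled cell above them
     * in the same column. Assumes grid[row][col] with row 0 at the top.
     */
    static int countHoles(boolean[][] grid) {
        int cols = grid.length == 0 ? 0 : grid[0].length;
        int holes = 0;
        for (int c = 0; c < cols; c++) {
            boolean seenBlock = false; // becomes true once we pass the top of the stack
            for (int r = 0; r < grid.length; r++) {
                if (grid[r][c]) {
                    seenBlock = true;
                } else if (seenBlock) {
                    holes++;
                }
            }
        }
        return holes;
    }
}
```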

It’s best to try not to have any holes at all, but sometimes having a hole or two is inevitable. What can we do after we have holes in our formation? Good question, but we should not pile more blocks on top of our holes. If we define a blockade as any block that’s directly above a hole, we should penalize blockades:

Why are blockades bad again? Well, a hole only stops being a hole if there are no more blocks above it, so stacking more blocks above holes would only make it harder to remove the hole.
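Blockades can be counted in a similar way. This sketch assumes that "directly above" means anywhere above a hole in the same column (not just the cell touching it), which is one possible reading of the definition. The board layout and names are the same illustrative assumptions as before.

```java
final class BlockadeCounter {
    /**
     * Counts filled cells that have at least one empty cell below them in
     * the same column. Any such empty cell is a hole by definition, so the
     * filled cell above it is a blockade. Row 0 is the top of the board.
     */
    static int countBlockades(boolean[][] grid) {
        int cols = grid.length == 0 ? 0 : grid[0].length;
        int blockades = 0;
        for (int c = 0; c < cols; c++) {
            boolean emptyBelow = false;
            for (int r = grid.length - 1; r >= 0; r--) { // scan from the floor upwards
                if (!grid[r][c]) {
                    emptyBelow = true;
                } else if (emptyBelow) {
                    blockades++;
                }
            }
        }
        return blockades;
    }
}
```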

These are all the obvious strategies. I also put in less obvious scores rewarding or penalizing for hugging the wall (edge of the current block touching edge of the wall), hugging the floor (edge of current block touching the floor) and flattening (rewarding if the edge of the current block touches an existing block). Again, these are harder to justify and mostly for fine-tuning — it’s not even clear whether they should be positive or negative.

A Hedonistic AI

These strategies are sufficient to make a passable tetris AI. This algorithm is very simple:

Look at the current block and the next block and simulate ALL possible combinations (positions and rotations) of the two blocks.

Calculate a score for each of the positions.

Move the block to the position with the highest score and repeat.
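The three steps above can be sketched generically. This is not my actual game code: the board type `B`, the two move generators and the scoring function are all placeholders you would supply yourself, and picking the first move whose best follow-up scores highest is one reasonable way to combine the two blocks.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.ToDoubleFunction;

final class TwoPieceSearch {
    /**
     * Returns the index (into placeCurrent.apply(board)) of the first move
     * whose best follow-up position scores highest, or -1 if the current
     * block cannot be placed anywhere.
     */
    static <B> int bestFirstMove(B board,
                                 Function<B, List<B>> placeCurrent,
                                 Function<B, List<B>> placeNext,
                                 ToDoubleFunction<B> score) {
        List<B> afterCurrent = placeCurrent.apply(board);
        int bestIndex = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < afterCurrent.size(); i++) {
            // Try every placement of the next block on top of this one.
            double bestFollowUp = Double.NEGATIVE_INFINITY;
            for (B afterNext : placeNext.apply(afterCurrent.get(i))) {
                bestFollowUp = Math.max(bestFollowUp, score.applyAsDouble(afterNext));
            }
            if (bestIndex == -1 || bestFollowUp > bestScore) {
                bestScore = bestFollowUp;
                bestIndex = i;
            }
        }
        return bestIndex;
    }
}
```

Each move generator is expected to return the boards that result from every legal position and rotation of the block.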

To give a score for a position, we would use an equation like this:

Score = A * Sum of Heights
+ B * Number of Clears
+ C * Number of Holes
+ D * Number of Blockades

Where A, B, C, and D are weights that we decide — how important is each of the factors. I initially came up with some pretty arbitrary values:

-0.03 for the height multiplier

-7.5 per hole

-3.5 per blockade

+8.0 per clear

+3.0 for each edge touching another block

+2.5 for each edge touching the wall

+5.0 for each edge touching the floor

The reason I gave such a low multiplier for the height is that the numbers stack up so quickly: every block on the field adds to the penalty. The numbers I chose seem pretty reasonable, and put blocks more or less where a human would put them.
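With all seven factors included, the score is just a weighted sum. Here is a small sketch using the starting weights listed above; the ordering of the array and the names are assumptions made for the example.

```java
final class PositionScore {
    // Index of each factor in the weight and feature arrays (this ordering is an assumption).
    static final int HEIGHT = 0, HOLES = 1, BLOCKADES = 2, CLEARS = 3,
                     BLOCK_EDGES = 4, WALL_EDGES = 5, FLOOR_EDGES = 6;

    /** The hand-picked starting weights quoted in the post, in the order above. */
    static final double[] INITIAL_WEIGHTS = {-0.03, -7.5, -3.5, 8.0, 3.0, 2.5, 5.0};

    /** Weighted sum of the measured features of one candidate position. */
    static double score(double[] weights, double[] features) {
        if (weights.length != features.length) {
            throw new IllegalArgumentException("weights and features must have the same length");
        }
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i] * features[i];
        }
        return total;
    }
}
```

For example, a position with a height sum of 100, one hole, two blockades, one clear, two edges touching other blocks and one edge touching the wall scores -3 - 7.5 - 7 + 8 + 6 + 2.5 = -1.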

Playing God: Bringing in the Genetic Algorithm

The biggest problem with this method is that we chose the weights pretty much arbitrarily. They might work well or they might not, but we don’t really know whether there are better values for them.

What can we do about it? We could brute force it — but with solutions that range across a continuum, there is a better way — a genetic algorithm.

A genetic algorithm is just a searching heuristic; it derives its ideas from evolution, where nature creates complex and sophisticated organisms by making random changes to the DNA.

Charles Darwin specifies four criteria for the process of natural selection to occur:

Variation: Organisms in a population must be slightly different from one another.

Inheritance: Traits of parent organisms must be passed onto their offspring.

Limited space: Only some of the offspring in any generation are able to survive and pass on their genes.

Competition: Individuals that are more fit are more likely to pass on their genes to the next generation.

In order to turn this into an algorithm, we’ll need — let’s quote this article:

A chromosome which expresses a possible solution to the problem as a string

A fitness function which takes a chromosome as input and returns a higher value for better solutions

A population which is just a set of many chromosomes

A selection method which determines how parents are selected for breeding from the population

A crossover operation which determines how parents combine to produce offspring

A mutation operation which determines how random deviations manifest themselves

We begin by constructing a chromosome — a solution to the problem of making an AI to play tetris. This is pretty easy, since we can already run an AI with a set of seven weights. So the chromosome is simply an array of seven doubles.

Next, our fitness function is very easy too, since the AI already has a scoring system. Basically the program would run the tetris AI at max speed on an 8-column board until it died, after which it would use the score it earned. Why only 8 columns and not the normal 10? In later generations, AIs are able to survive for hours in the 10-column version, but when we reduce the number of columns to 8, even the best AIs can survive for only a few seconds to a minute (we’re still talking about hundreds or thousands of lines here).

I used Nintendo’s original scoring system for tetris — 40 points for one clear, 120 points for two simultaneous clears, 300 for three simultaneous clears, and 1200 for four simultaneous clears. I also added 1 point for each block placed, to differentiate between AI’s that couldn’t score any lines.
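In code, the points earned by a single placement could be computed roughly like this. The point values are the ones quoted above; the class and method names are only illustrative.

```java
final class FitnessScore {
    // Points for 0, 1, 2, 3 and 4 simultaneous clears, as quoted in the post.
    private static final int[] CLEAR_POINTS = {0, 40, 120, 300, 1200};

    /** Points earned by placing one block that clears the given number of lines. */
    static int pointsForPlacement(int linesCleared) {
        if (linesCleared < 0 || linesCleared > 4) {
            throw new IllegalArgumentException("a single block clears between 0 and 4 lines");
        }
        return 1 + CLEAR_POINTS[linesCleared]; // 1 point for the block itself
    }
}
```

Summing this over every block placed until the game ends gives the fitness of a chromosome.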

Third, I chose a population of sixteen chromosomes. Initially the chromosomes are filled with randomly generated numbers (floating-point values drawn from a normal distribution). From each generation onwards, the population’s chromosomes are derived from the best candidates of the previous generation (more on this later), but the population size stays the same.
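Creating the first generation might look like the sketch below. It assumes a standard normal distribution (mean 0, standard deviation 1), since the exact parameters are not stated here, and the names are hypothetical.

```java
import java.util.Random;

final class InitialPopulation {
    static final int POPULATION_SIZE = 16;
    static final int GENES = 7; // one weight per scoring factor

    /** Fills every gene of every chromosome with a normally distributed value. */
    static double[][] random(Random rng) {
        double[][] population = new double[POPULATION_SIZE][GENES];
        for (double[] chromosome : population) {
            for (int g = 0; g < GENES; g++) {
                chromosome[g] = rng.nextGaussian(); // assumed mean 0, std dev 1
            }
        }
        return population;
    }
}
```

Passing in a seeded `Random` makes a run repeatable, which helps when hunting bugs.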

Next, for the selection method I chose a simple tournament method. After we run each candidate from a generation and collect all of their scores, we randomly pair up the candidates. For each pair, we take the high scorer — the winner — and discard the low scorer. Then, we pair up the winners randomly again to generate new offspring for the next generation.
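The pairing-off step can be sketched as follows. This covers only the selection of winners, not how many offspring each winning pair produces. It assumes an even population size and breaks ties in favour of the first candidate of a pair; the names are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class TournamentSelection {
    /**
     * Randomly pairs up candidates and returns the index of the higher
     * scorer from each pair. scores[i] is the fitness of candidate i.
     */
    static List<Integer> selectWinners(double[] scores, Random rng) {
        if (scores.length % 2 != 0) {
            throw new IllegalArgumentException("population size must be even");
        }
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            order.add(i);
        }
        Collections.shuffle(order, rng); // random pairing
        List<Integer> winners = new ArrayList<>();
        for (int i = 0; i < order.size(); i += 2) {
            int a = order.get(i);
            int b = order.get(i + 1);
            winners.add(scores[a] >= scores[b] ? a : b);
        }
        return winners;
    }
}
```

Note that the best candidate always survives a round and the worst never does, while everyone in between depends on the luck of the draw.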

Lastly, I implemented the crossover as follows: for each of the seven attributes in the offspring’s chromosome, we randomly select the respective attribute from the two parents with equal probability.

Occasionally, we have a mutation – a trait in an offspring that does not come from either parent. Each time we give an offspring an attribute, we have a 10% chance of assigning a completely random value to that attribute instead.
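Crossover and mutation together fit in a few lines. In this sketch the "completely random value" is assumed to come from the same normal distribution as the starting population, which is a guess made for the example; the names are hypothetical.

```java
import java.util.Random;

final class Breeding {
    /**
     * Builds one child: each gene is copied from either parent with equal
     * probability, except that with probability mutationRate it is replaced
     * by a fresh random value.
     */
    static double[] breed(double[] parentA, double[] parentB, double mutationRate, Random rng) {
        if (parentA.length != parentB.length) {
            throw new IllegalArgumentException("parents must have the same number of genes");
        }
        double[] child = new double[parentA.length];
        for (int g = 0; g < child.length; g++) {
            if (rng.nextDouble() < mutationRate) {
                child[g] = rng.nextGaussian(); // mutation (distribution is an assumption)
            } else {
                child[g] = rng.nextBoolean() ? parentA[g] : parentB[g]; // uniform crossover
            }
        }
        return child;
    }
}
```

Calling `breed(a, b, 0.10, rng)` matches the 10% mutation chance described above.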

Results of the Genetic Algorithm

In the first generation or two, most candidates performed horribly. Many candidates had completely wrong weights, rewarding height and rewarding holes! Needless to say, these programs did not survive very long. But the genetic algorithm quickly came up with some decent solutions, and pretty soon most candidates were scoring a few hundred lines (my original values gave about 20 lines on the 8-column version by comparison).

After running the genetic algorithm for about ten generations, I picked a candidate that was scoring decently:

-3.78 for the height multiplier

-2.31 per hole

-0.59 per blockade

+1.6 per clear

+3.97 for each edge touching another block

+6.52 for each edge touching the wall

+0.65 for each edge touching the floor

Whoa — that’s a huge height multiplier. Perhaps the multiplier is so big that it just overwhelms everything else in the list — remember that the height multiplier applies to every block on the field. Also, holes and blockades might not have been as bad as I thought — and what’s with the huge bonus for touching the wall?

I ran the whole thing again from scratch for ten generations — using different randomly generated starting values. What it came up with made me think at first that my program had a bug somewhere:

-3.71 for the height multiplier

-4.79 per hole

+1.4 per blockade

-1.87 per clear

+4.8 for each edge touching another block

+3.22 for each edge touching the wall

+3.68 for each edge touching the floor

Yup, this one rewarded blockades and penalized clears. And it would outperform both my naive values and the first set of AI values — I used this set in the video. It seems to put blocks in really bad positions, creating holes and blockades when it is unnecessary to — but it still does better than anything else I have.

Conclusion

Was the exercise a success? I would say partially so. There is the good and the bad. The good is that it came up with a much better AI than the one I originally had. But there may have been some things that could’ve been done better:

What I said about the height multiplier overwhelming everything else is a bit misleading. While it is true that the height multiplier itself applies to every block on the field, it doesn’t really work that way, and really only affects the current block. Reason being, the rest of the field — everything but the current block — stays constant no matter where the current block goes. It’s kind of like if you vote for everybody, you really vote for nobody as your votes have no effect on the outcome.

The lines cleared factor also turned out to be a bit misleading. While the second AI had a negative weight for clearing a line, it still cleared lines whenever it could, which again ties back to the height multiplier. Clearing a line does exactly what it says: it removes an entire row of blocks, and removing that many blocks takes a big chunk out of the height penalty.

The fitness function was really kind of screwed up. By the time your AIs can get a few thousand lines on a tiny 8-column board, the only thing that causes the AI to die is a bad sequence of hard-to-place S and Z blocks, and with any random number generator you’ll eventually get a series of unlucky blocks. So at later generations, simply simulating an 8-column tetris was fairly bad at separating the very good AIs from the excellent AIs.

Selection was also a bit screwed up. After ten generations, the entire population had more or less the same values, with only some minor variations — a bit ironic since this situation was exactly the situation the tournament selection algorithm was supposed to prevent.

Although a genetic algorithm can converge on a local optimum fairly quickly, producing a decent solution, it is very hard for it to achieve a global optimum — the best possible solution. You might have a situation where mutating any one value seriously harms the candidate, but mutating two or more values simultaneously in a certain way makes it better. This is a drawback for genetic algorithms in general.
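The earlier point about the height multiplier only affecting the current block can be written out. Assume no line is cleared, let H_field be the summed height of the blocks already on the field and let h_p be the summed height of the current block's cells at placement p (both symbols are introduced here just for illustration). Comparing two placements p and q under height weight A:

```latex
A\,(H_{\text{field}} + h_p) \;-\; A\,(H_{\text{field}} + h_q) \;=\; A\,(h_p - h_q)
```

The field term cancels, so only the current block's own height decides which placement wins. Once a line is cleared the field itself changes and the cancellation no longer holds, which is why clears interact so strongly with the height weight.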

So that’s all I have to say. I’m fairly new to genetic algorithms, so I may have botched one or more parts of the algorithm. I’d love to know what I did wrong and how I should’ve done better.


This entry was posted on Friday, May 27th, 2011 at 10:20 pm and is filed under Programming.

Have you tried when there was no chromosome for touching the floor? seems that touching the floor and height are redundant (and I would keep height penalty, since it’s got numeric values instead of just true/false)… might encourage the genes to get 4-row wins more often :)

It’s a pity that the story ends so quickly. It was really written in a way to learn a lot about it. Maybe the problem was too simple. Maybe the goal should be different that you can actually start the learning, meaning that you learn how to better choose your fitness and selection functions and how to find the problem in a model you set up yourself.

Not sure how this came up, but the AI does rotate the blocks. It would be silly not to.

It’d be nice to be able to run the genetic algorithm for hundreds of generations, but simulating an entire tetris game is pretty intensive, and takes about 5 or 10 minutes per generation when I put the speed at max. My program doesn’t seem to be stable enough to run for days at a time (all kinds of weird race conditions going on).

Run the AI against a half height, 10×10 grid. Using 10 columns is *very* important, as survival strategies change with fewer or more columns. The 8 column simulation is giving you bunk data for an actual game :\ But since you seem to be trying to tune the AI for survival, 10 rows instead of 20 uses the same strategies, but just ends orders of magnitude faster.

As to your tournament function, are you running each candidate several times and testing the -averages- against each other? If not, a bad mino run, early or late, would make a perfectly strong candidate appear weak, again making your runs meaningless. When writing a Tetris AI myself I found I needed about 60 runs of each candidate before the average lines cleared became somewhat stable.
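A minimal sketch of the averaging suggested in the comment above, assuming a hypothetical `playGameWithSeed` function that plays one full game with a given random seed and returns its score. Using the same seeds for every candidate is an extra assumption that keeps the comparison fair.

```java
import java.util.function.LongToDoubleFunction;

final class AveragedFitness {
    /** Averages a candidate's score over one game per seed. */
    static double averageScore(LongToDoubleFunction playGameWithSeed, long[] seeds) {
        if (seeds.length == 0) {
            throw new IllegalArgumentException("need at least one seed");
        }
        double total = 0.0;
        for (long seed : seeds) {
            total += playGameWithSeed.applyAsDouble(seed);
        }
        return total / seeds.length;
    }
}
```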

You should strongly consider changing the selection. Keep the best candidate to prevent losing any effective values. Consider moving away from tournament selection and just do the random crossover you have already. If you are converging at a local minimum, then not enough variation is being created. Your tournament is a major contributor to that. Also, implement a mutation to zero out parameters, as their actual presence may inhibit the most effective strategy. Rethink the fitness function. If bad runs will result, then either use a static seed or use a probabilistic score, like average score or maximum score over 10 runs. Finally, try more than just 10 generations. I have optimized simple functions that took 100,000 generations to reach a high level of fitness. Good luck. Machine learning is very exciting.

Hey, great work! I’ve been programming GAs and their variants for a number of years, off and on; only used them professionally back in ’89, but research such as you’re doing is all kinds of fun.

The best part of this is that you’ve *thought about what happened* and learned from what you did. This is one of the more amazing things about AI in general.

So, some opinions: First of all, the recommendation for Koza’s “Genetic Programming” book isn’t wonderful. If you can find a really cheap copy of his book, it’s great for stealing ideas from :-) but the GP idea is an evolutionary dead end (har!) and is vastly surpassed by GEP.

Check out Gene Expression Programming http://www.gene-expression-programming.com/ which has an excellent discussion under the “Tutorials” link. GEP in effect writes a program for you… but it’s not a magic bullet. Ordinary GAs (such as what you implemented) which tune numbers used by the fitness function are incredibly useful when appropriate.

Also (and some of the comments above bear this idea out) watch out for the advice you’ll get from people who’ve never worked with GAs. Something that comes up *constantly* is that people will say, “Gee, nice idea, but that crossover thing doesn’t seem to make any sense, I can’t see how it’s useful, just get rid of it and only use mutation!” But that’s just hill-climbing, and (you’ll find) much less successful than full operation GAs.

Yeah, every algorithm (except on trivial problems, in which case, why invoke AI at all?) can only find a local maximum. But GAs are better than most at finding a *good* solution.

And yeah, eventually your population will converge. The GENESIS package (an early “C” implementation of GAs) had some great statistics. First of all, it really implemented the numbers as bit-strings. The downside was a loss of speed, the upside was that it kept a count of “converged” and “lost” bits. A bit was considered “converged” if more than (IIRC) 70% of the population had the same value for that bit. “Lost” meant that 100% of the population had the same value for a bit.

So when your population converges, you can either turn the mutation way up (so that it diverges again) or just start a new population.

Another thing that’s useful to help stave off convergence is to not do “pan-mictic” (love them crazy biology terms :-) breeding. What that means is that instead of choosing parents from the entire population, they are chosen to be fairly close to each other (at some metric). This sometimes has the effect of having multiple solutions found in the population at the same time, but the boundaries between them are pretty bad.

Finally, I think you’ve discovered possibly the most amazing thing about GAs… you get a solution that (A) works and (B) is COMPLETELY non-intuitive, to the point of appearing wrong! IMHO, that’s incredibly cool.

First, great job.
The last output of your program is easily explained.
you see it will give penalties for clears and it rewards holes.

the ai is doing this, because it gets rewarded bigtime when it is clearing multiple lines at once. when building tightly stacked columns with a single hole on top of each other, and the effect of a tetris block with a high score on nr of direct neighbours. ( read vertical column of 4 blocks)

Awesome post. As someone who made a computer play Tetris, I can totally relate to what you are talking about here. I wrote my game in Java too. I even made it as an applet and put it up on the web a few years back. You can go to the website and in the menu choose “Options -> Computer Plays” and then from “Game -> New” and see the game being played by the computer.

I never thought along the lines of using GA to solve this problem. I did like what you had, some heuristics. I have in my to-do list to integrate my applet with a scripting engine so that people interested in focusing on writing a clever algorithm can submit their code and test it and I can keep track of the best algorithm. I haven’t done it as I didn’t find a good way to compare two separate runs of the game (as the blocks are random). But I recently got some ideas on how to work around it. If I manage to execute my grand plan, I will let you know.

Neat program. It looks like you did a good job with it. I once managed to produce a GA that made worse results every generation, and it had elitism.

I don’t think you incorporated enough weights to make the AI sufficiently complex. Ideally, the GA could represent ANY algorithm you could imagine if it was “perfect.” 7 weights doesn’t come very close to it.

Also, I have a suggestion for the “height” addition. You should calculate the increase it adds. Basically, find the maximum value of newheight – oldheight, in every column. That might make it more realistic.

Also, I had an explanation for the penalty for single-row clears. It could be like you said that it’s overwhelmed by the height decrease. It could also be that it’s used to encourage double-row clears, which is 3 times as many points.

Also, rewarding blockades could be a strategy to quickly forget about holes. Basically, it could choose to make another hole or another blockade. It makes sense to reward blockades so you don’t have a structure that looks like Swiss cheese. Quickly covering holes in each column would penalize every mistake (producing a hole) after your first one in each column A LOT.

Still, I thought your analysis of what it was doing was pretty good. I would be interested in seeing how well a GA could perform if it had many more weights to look at.

Hi, nice work! You’ve done a great job explaining the pieces of GA.
I’m also doing a tutorial on this, using Java.

An idea is to assume that a feature is either positive or negative, so you can take a random value between 0 and 1, narrowing the search space. Another idea is to fix a sequence of n random games, and let each agent play the whole sequence, taking the average of cleared rows as fitness!

Check out my blog if you can (it’s in Portuguese, but the translation is trivial).