Topic: Any info on how the AI works? (Read 4117 times)

I've been really enjoying Lost Cities, and the AI manages to pull off some victories when I least expect it, which is great.

I would love to hear from the AI creator (Nicolas Guibert?) about how the AI is designed. Do the different bots have 'personalities', or how are the different strength bots distinguished, more computing time? Are there simple heuristics programmed in or does it use anything fancy like a neural network? Does it know the concept of bluffing or concealing information about its hand?

It is difficult to explain how the AI is done without explaining too much and giving away how they each play. But I will try.

First, you have to understand that we can't afford any CPU-intensive algorithm. Our AIs run on our server or in your browser (depending on whether it is an online match or a training one), and in both cases, it is not possible to ask the machine to think too long. Suppose every move takes just one second to compute. You soon realize that the server will not be able to play more than a few games at a time before coughing hard. We want our server to serve thousands of players at a time, not a handful. The same applies to the client side: we can't ask an old mobile phone to do much. I have even seen mobile browsers stop completely when the page runs JavaScript (the language we use for the AI) for more than a few seconds without interruption.

That is a big issue, as most of the powerful techniques used in AI are CPU-intensive. The last (huge) AI achievement, done by Google on the game of Go (look up AlphaGo), used super-high-caliber machines to let the machine learn the game by playing itself many, many times. I am digressing a bit here, as this was for training the algorithm, not for the actual playing of the game (although I don't know what machines they used for the actual games once the machine was trained). So let's go back to the original point: CPU-intensive algorithms are very often the only way to get good results. Give a chess program only 1/1000 of a second to play a move and there is every chance that it will not be able to compete with the best players, as it would with 1/10 of a second.

So, as if AI was not difficult enough, we have this extra constraint that we must make the programs play as quickly as possible and preferably within 1/100 of a second. This is usually achieved and when I let the bots play against each other to tune them and seed them (find their relative rating), I can see that they play several games per second. I don't remember the exact figure and it depends a lot on the game, but I'd say it is probably about 5 to 10 games per second on average. A game of Lost Cities is about 90 moves.

This does not leave much room for fancy techniques. So for most games, we use a very streamlined version of the Alpha-Beta algorithm. Alpha-Beta is the basic algorithm for Chess, Othello, and most abstract games without hidden information. I have used it successfully in the past for developing my international draughts program (see Buggy for more information).

Alpha-Beta has 2 sides. The first one is the search, in other words, the analysis of the variants in the tree of possibilities. Once you have looked x moves ahead, you stop searching and reach what is called a leaf of the tree. That's where the second part of the algorithm plays its part: the evaluation function. Its goal is to assess the leaf and return a score, good or bad, more or less positive or negative. The search part then uses the leaf scores to navigate through the variants, eliminates the weak ones and finally decides which move is best at the root of the tree.
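As a rough illustration, here is a bare-bones Alpha-Beta sketched on a toy game tree. This is not the Happy Meeple code, just a minimal example of the two sides described above: the recursion is the search, and the numbers at the leaves stand in for scores returned by an evaluation function.

```javascript
// Minimal Alpha-Beta sketch on a toy game tree (illustrative only).
// Internal nodes are arrays of children; a number is a leaf, i.e. a
// score already returned by the evaluation function.
function alphaBeta(node, alpha, beta, maximizing) {
  if (typeof node === "number") return node; // leaf: evaluation score
  let best = maximizing ? -Infinity : Infinity;
  for (const child of node) {
    const score = alphaBeta(child, alpha, beta, !maximizing);
    if (maximizing) {
      best = Math.max(best, score);
      alpha = Math.max(alpha, best);
    } else {
      best = Math.min(best, score);
      beta = Math.min(beta, best);
    }
    if (alpha >= beta) break; // prune: this branch can no longer matter
  }
  return best;
}

// We move (maximizing), the opponent replies (minimizing).
const tree = [[3, 12], [2, 4], [14, 5]];
console.log(alphaBeta(tree, -Infinity, Infinity, true)); // 5
```

The pruning (`alpha >= beta`) is what makes Alpha-Beta cheaper than a plain minimax: whole branches are skipped once they cannot change the decision at the root.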

Both sides of the Alpha-Beta can be CPU-intensive. Each call to the evaluation function can take a lot of time if this function is refined. The more the evaluation knows (and the more accurate it is), the more time it will need. Also, the deeper you search, the more leaves you will have and the more calls to the evaluation you will need. So basically we have this formula:

time spent on a move = number of leaves × time needed to evaluate one leaf

This tells the whole story. We can't afford to explore many leaves and we can't afford to assess them accurately.
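To put rough numbers on that formula (the figures below are illustrative assumptions, not measurements from the actual engine):

```javascript
// Illustrative budget arithmetic with invented figures: a 10 ms
// budget per move and an evaluation function assumed to cost about
// 10 microseconds per call leaves room for roughly a thousand leaves.
const budgetMs = 10;     // target: 1/100 of a second per move
const evalCostMs = 0.01; // assumed cost of evaluating one leaf
const maxLeaves = Math.round(budgetMs / evalCostMs);
console.log(maxLeaves); // 1000
```

A thousand leaves sounds like a lot, but with 8 cards in hand and several draw options per turn, even two or three plies of lookahead can exceed it, which is why the search stays so shallow.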

So what we usually do is only search one move ahead and evaluate the leaves from there. Very basic stuff.
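Sketched in code, a one-move-ahead search is just a loop over the legal moves (the helper and parameter names here are hypothetical, not the real code):

```javascript
// One-move-ahead search: apply every legal move, score the resulting
// position with the evaluation function, play the highest-scoring move.
function bestMove(state, legalMoves, applyMove, evaluate) {
  let best = null;
  let bestScore = -Infinity;
  for (const move of legalMoves(state)) {
    const score = evaluate(applyMove(state, move)); // each result is a leaf
    if (score > bestScore) {
      bestScore = score;
      best = move;
    }
  }
  return best;
}

// Toy usage: states are numbers, moves add to the state, and the
// evaluation prefers states close to 10.
const pick = bestMove(
  5,
  () => [1, 4, 7],
  (s, m) => s + m,
  (s) => -Math.abs(10 - s)
);
console.log(pick); // 4
```

With a search this shallow, all the playing strength has to come from the evaluation function itself, which is exactly the point made below.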

Getting a decent end product, despite the low complexity of the algorithm, requires building an evaluation function that estimates positions as accurately as possible with as few lines of code as possible. And that is probably where the secret of our AIs lies (if there is a secret). A good understanding of the game is key (we can't hope that brute force will hide the developer's deficiencies, since we can't use brute force), followed by a good modelling of that knowledge.

I think your original question was only related to Lost Cities, but as the above applies to all games including Lost Cities, I think it was worth mentioning.

Now how did we build this evaluation function for Lost Cities? The most important feature of Lost Cities is time. You want to buy time and avoid making decisions most of the time. But how do you tell this to the computer? I did not specifically teach it this notion, but the way scoring is done naturally takes care of that idea. The whole evaluation is based on estimating the score likelihood for each series. Example: you have already started a series with $ and 3. Is it good to play a 6 now when the 4 and the 5 have not shown up yet? Obviously playing the 6 right now costs potential points in the future. How many? Mostly it's maths, with a few refinements. The program will recognize that if we have a 7 and 8 in hand, it is better to play the 6 right now than if we did not have them. We will have good moves next time; that's worth points. It will obviously also know the value of $, the expectation of future possible $, and the likelihood of missing cards coming into our hands in the future (less and less likely as the game progresses, and less likely if the opponent has started the color). Then we compare this move with alternative moves and take the best.
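The arithmetic behind "playing the 6 costs potential points" can be sketched like this. The helper name, the probabilities and the shape of the calculation are all my invention for illustration, not the real evaluation:

```javascript
// Playing the 6 now forfeits the unseen 4 and 5, so we charge their
// face value weighted by the probability of ever drawing them
// (invented numbers, not the real model).
function costOfSkipping(skippedRanks, drawProbability) {
  return skippedRanks.reduce((sum, rank) => sum + rank * drawProbability, 0);
}

// Early game: the unseen 4 and 5 are still quite likely to reach us.
console.log(costOfSkipping([4, 5], 0.6)); // ≈ 5.4 expected points lost
// Late game: those cards are probably gone or held by the opponent.
console.log(costOfSkipping([4, 5], 0.1)); // ≈ 0.9, so playing the 6 is cheap
```

The same cost estimate naturally shrinks when the opponent has started the color or the deck is running out, which is how "buying time" falls out of the scoring without being programmed explicitly.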

Obviously, the program will also recognize the endgame and try not to get stuck with all its good cards in hand at the end (as we humans always do).

This is enough to get to a very decent level. The program does not even think "play a card" / "draw a card" as one single move. It looks at the "play a card" move first and plays it, then looks at what it can draw. A clear possible improvement.

We usually let the program run against itself to optimize parameters (and find bugs, when a supposedly good new piece of code leads to a sharp decrease in rating). The best way to proceed is to alternate between playing the program oneself and letting it run against itself. There is no point optimizing a program that has evident flaws you can easily see with your own eyes. What's the point of knowing that version A is better than version B by 100 points if both play really badly?

Once we have the best bot we can build in a reasonable amount of time (a couple of weeks in general), we can start to program the 11 weaker bots. That's not completely straightforward, but in general much easier than building the best bot. This usually gives us new ideas, so we are often caught going back to the previous phase, trying to improve the best bot further.

We try to give each bot its personality by tweaking the main parameters of the game. For Lost Cities, that would be how much the program likes to start a new series, how much it likes to discard cards (both strongly related), how much it likes to draw from the deck rather than the discard piles, etc.

For Lost Cities, there is not much you can do really. So 3 to 4 personalities maybe. But that's a good start.

To make bots even weaker (and different in another way), we almost always add randomness to the evaluation so that, in essence, the bot does not always play the same way in the same situation. Not only does it weaken the bot, it also adds some unpredictability. We usually don't add randomness to the best bot, as it inevitably leads to a weaker rating when evaluating against other bots. This is not an accurate representation of the strength of the bots, but we have no better way to measure it (we can't ask 10 players to play 100 games against each bot before releasing the game). However, we did it for Hanamikoji, as this is really a game where you don't want to be too predictable (I am quite sure the bot is still too predictable). So we voluntarily weakened Verboten, thinking that it would then do better against real players, who are able to exploit predictability.

Finally, there are other ways to make the bots weaker and different. We can simply make them not understand/recognize certain features of the game, or at least not all the time. For example, some might struggle to recognize the endgame, or not know that the 6 is not a bad move if we have the 7 and 8 in hand.
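The evaluation-noise trick can be sketched in a few lines (the noise scale is an invented tuning parameter, not the real one):

```javascript
// Wrapping the evaluation with a random offset means the bot no longer
// always picks the same move in the same situation: the bigger the
// noise, the weaker and the less predictable the bot.
function noisyEvaluate(evaluate, noise) {
  return (state) => evaluate(state) + (Math.random() * 2 - 1) * noise;
}

// The top bot would use noise 0; a weaker personality might use 5,
// occasionally letting a slightly inferior move win the comparison.
const baseEval = (state) => state.score;
const weakerEval = noisyEvaluate(baseEval, 5);
```

Because the noise is added at the leaves, two moves whose true scores differ by less than the noise scale become effectively interchangeable, while clearly winning moves are still found.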

And of course, we must make sure that Lobotomo is not too strong for beginners (or, even worse, for some experienced players).

I guess this answers your questions. Feel free to ask more as required! In any case, I hope it is good reading and that I have not given away too much.

Have fun all!

Nicolas Guibert.

PS: I have moved this post to another sub-forum (originally posted in Lost Cities forum) as my reply has now gone much further than the original question. Thanks for asking!

Really interesting. One thing that has always interested me is whether the bots play with perfect memory. This is especially relevant for Glastonbury, where knowing exactly what you and your opponent have is a huge advantage in the endgame. Most humans will remember most but not all of the cards, and weigh up the benefits of looking at the cards collected or picking up a different card.

Though the game (Lost Cities) is simple enough, I'd say that there are certain understandings, characteristics and acknowledgements that single out 'better' players (whatever that means).

In the above AI explanation, I found this - "Obviously, the program will also recognize the end game and try not to get stuck with all his good cards in hand at the end (as we humans always do)" - to be most interesting...

IMO one of the more 'advanced' techniques to be on the watch for (and bots use!) is their ability to grasp the probability that their opponent has X number of playable cards in their hand. Given the number of remaining draw cards, the bot may choose to extinguish the discard piles (something most players don't keep track of), eliminating the player's ability to extend the match.

I hate when that happens.

Congrats on keeping it fun... And have you ever considered an even higher level bot (for Lost Cities) that wouldn't be in the regular rotation, but could be accessed if desiring a higher challenge?

Quote

Really interesting. One thing that has always interested me is whether the bots play with perfect memory. This is especially relevant for Glastonbury, where knowing exactly what you and your opponent have is a huge advantage in the endgame. Most humans will remember most but not all of the cards, and weigh up the benefits of looking at the cards collected or picking up a different card.

Well indeed, the bots use their perfect memory when it is needed (by that I mean the better bots). As I said, it is hard enough to program an AI. It would be cruel to forbid us the use of this.

I already hear people complaining. "This is not fair. I can't have perfect memory. The bots should not have perfect memory. Nor should they be allowed to play with precomputed tables/knowledge/opening databases/endgame databases." To this, I reply that a fight against an AI is not fair by definition. They use electricity, we don't. They have perfect maths, we don't (and they are quick at it too). They have infinite storage, etc. Suppose you were racing a car over 100 meters. What would be the point of saying that the race is unfair? Of course it is.

Of course, it is always better if the bots play similarly to humans, with the same strengths and the same flaws... And we always hope to be as close to that as possible. But in essence, it is not possible. For a start, computers will always be stronger at tactics (which require computation and short- to mid-term exhaustiveness) than at strategy. As much as we try to avoid the issue, our bots will always struggle with strategy, at least from the point of view of the top human players.

Quote

IMO one of the more 'advanced' techniques to be on the watch for (and bots use!) is their ability to grasp the probability that their opponent has X number of playable cards in their hand. Given the number of remaining draw cards, the bot may choose to extinguish the discard piles (something most players don't keep track of), eliminating the player's ability to extend the match. I hate when that happens.

They don't know anything about that specifically, and as a player, I have never really thought about it this way. There is little chance the AI could have grasped it by itself without me telling it about it. In any case, you are right: it is an important part of the game to figure out the best moment to start laying your last cards, and it is always an advantage to be the one who does not need to slow down the endgame by drawing useless discarded cards. I guess sometimes the AI finds the right timing.

Quote

Congrats on keeping it fun...

Thanks!

Quote

And have you ever considered an even higher level bot (for Lost Cities) that wouldn't be in the regular rotation, but could be accessed if desiring a higher challenge?

I am not sure what you have in mind here. Can you elaborate? I have in any case never thought about that possibility. That's why I am curious.

Wow, that was a very interesting read! Thanks for sharing the details; as I am a computer programmer myself, it makes the game more interesting.

It's very impressive they can play so well in only 10ms, that speaks to a very well designed evaluation function.

I understand how Alpha-Beta works in the perfect information case like with Buggy (very cool by the way), and I even understand how it could extend to a game like Backgammon that has dice where each outcome has a fixed probability. But I'm hoping you can explain to me how this can work in Lost Cities where the actions your opponent can take depend on the hidden information...?

So suppose the bot would search each of the 8 cards in hand to either play or discard that card, then it would search a draw action. Is it going to have a lot of branches and search every card that's still in the deck, and assume they're equal probability? Or does it already know what card is on top of the deck and only needs to search the true outcome (which a human player wouldn't know)?

Then the same question for when the bot is searching the opponent's turn - would the bot search every possible card the opponent could have, or does it already know?

Anyway, I understand that you might not want to say too much about how the AI works, but I would appreciate any info you're willing to give.

Quote

So suppose the bot would search each of the 8 cards in hand to either play or discard that card, then it would search a draw action. Is it going to have a lot of branches and search every card that's still in the deck, and assume they're equal probability? Or does it already know what card is on top of the deck and only needs to search the true outcome (which a human player wouldn't know)?

Then the same question for when the bot is searching the opponent's turn - would the bot search every possible card the opponent could have, or does it already know?

Well in both cases, that would be considered cheating. And I don't want to use this kind of technique to make my life easier. Players have already accused the bots of cheating on many occasions. I can only imagine what the reaction would be if the bots actually knew what cards were coming and based their decision upon it. brrr...

In case of hidden information, one can imagine three approaches.

The first one is to go through all the possibilities. If we are talking about the deck (and the remaining cards in the deck), it is doable. But it is not even good enough. Near the end of the game, the probability of drawing a 10 that has not been played yet is much lower than the probability of drawing a 2 that has not been seen. Why is that? Because the 10 is probably in your opponent's hand and they kept it. In the case of a 2, there is every chance that they would have discarded it earlier. The program knows this sort of thing and uses it when estimating the potential remaining in each suit.
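The 10-versus-2 asymmetry can be sketched with invented weights (the hoarding penalty below is my illustration, not the real model):

```javascript
// Chance that a specific unseen card is still in the deck: start from
// the uniform estimate, then discount high ranks, which the opponent
// is more likely to be hoarding in hand (hypothetical rank penalty).
function deckProbability(rank, unseenCount, cardsInDeck) {
  const uniform = cardsInDeck / unseenCount; // naive, memory-only estimate
  const hoardingDiscount = 1 - 0.07 * rank;  // invented weight
  return uniform * hoardingDiscount;
}

// Late game: 10 unseen cards, 6 of them still in the deck.
console.log(deckProbability(2, 10, 6));  // ≈ 0.52: the 2 may well come
console.log(deckProbability(10, 10, 6)); // ≈ 0.18: the 10 probably won't
```

A uniform counting model would give both cards the same 0.6 chance; the discount is what encodes "the opponent kept the 10".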

The first approach is in any case completely impractical if we think of the hand of the opponent. We could try to compute all possible hands and decide for each one what the best play would be, then compute the best moves from there. But that's going to be a huge number of combinations, especially at the start of the game. In that case, a second approach could be used: generating a set of possible opponent hands (weighted by their likelihood, to refine the process) and, for each of them, finding the best course of action. So we would decide on the best move based on a subset of possible opponent hands. This method would probably be called Monte-Carlo simulation.
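For illustration only (the next paragraph explains this is not the approach actually used), such a Monte-Carlo simulation could be sketched like this, with hypothetical helper names:

```javascript
// Deal the unseen cards into a random plausible opponent hand, score
// every candidate move against each sample, keep the best average.
function sampleHand(unseenCards, handSize) {
  const cards = [...unseenCards];
  for (let i = cards.length - 1; i > 0; i--) { // Fisher-Yates shuffle
    const j = Math.floor(Math.random() * (i + 1));
    [cards[i], cards[j]] = [cards[j], cards[i]];
  }
  return cards.slice(0, handSize);
}

function monteCarloMove(moves, unseenCards, handSize, scoreMove, samples) {
  const totals = moves.map(() => 0);
  for (let s = 0; s < samples; s++) {
    const opponentHand = sampleHand(unseenCards, handSize);
    moves.forEach((move, i) => { totals[i] += scoreMove(move, opponentHand); });
  }
  let best = 0; // index of the move with the highest total (= average) score
  totals.forEach((total, i) => { if (total > totals[best]) best = i; });
  return moves[best];
}
```

Weighting the sampled hands by likelihood (rather than sampling uniformly, as here) would be the refinement mentioned above; either way the cost grows with the number of samples, which is exactly the CPU budget problem again.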

We don't use any Monte-Carlo simulation because we went for the 3rd approach which is guaranteed to be less CPU-intensive and hopefully gives good enough estimations. Basically, we built a simplified model that does not need to know exactly what the opponent has. We are simply interested in the potential of each expedition (for us and for the opponent) and decide from there what cards are worth.

As I said, dealing with hidden information very easily leads to huge combinatorial complexity. So simplifying the process is absolutely needed for all but the simplest games. Even for Hanamikoji, a game that has only 21 cards, of which we know 6-7 at the start, an exhaustive approach is probably already too much.

So, interestingly, I think I do quite well against other players, but the computer can be hard to beat. The computer seems to play quite differently from normal players: very aggressive discards, I see discards up to 4, 5. Most humans just avoid aggressive discards, not sure when to use them or how to counter them.

Quote

Obviously, the program will also recognize the end game and try not to get stuck with all his good cards in hand at the end (as we humans always do).

I think this might be one of the weaknesses of the AI. I think it delays the endgame too often. It might be correct to delay the endgame at that point, but it might have been better to play more cards earlier (or not start a new colour so late).