milwac wrote:This is an excellent resource mattsc, thank you! While I look through this code, perhaps I can be a bit more specific about what I'm looking for right now, and ask: are there some simple functions that I can write in a Python or Lua script to access in-game properties, like the side to move, the villages each side holds, the types of units available, unit positions, the reachable hexes for a unit, etc.? I wish to visualize these properties while a game/replay is being played, and see which ones could be useful as input features, before I think about the AI code itself.

I don't know about Python, but a lot of this is built-in Wesnoth Lua functionality. Go to LuaWML and check out the links in Section 4. For example:
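To give a concrete idea of what those built-in functions look like, here is a rough sketch of the kinds of state queries mentioned above. This is not a definitive listing; the function names follow the LuaWML docs of this era (1.12/1.14), so check the docs for your version:

```lua
-- Sketch of game-state queries via the Wesnoth Lua API (names per LuaWML docs).
local side_to_move = wesnoth.current.side

-- Villages and their owners (nil/0 = unowned)
for _, loc in ipairs(wesnoth.get_villages()) do
    local owner = wesnoth.get_village_owner(loc[1], loc[2]) or 0
    wesnoth.message(string.format("village (%d,%d) owned by side %d", loc[1], loc[2], owner))
end

-- All units of the side to move, and how many hexes each can reach
for _, u in ipairs(wesnoth.get_units { side = side_to_move }) do
    local reach = wesnoth.find_reach(u)  -- list of { x, y, moves_left }
    wesnoth.message(string.format("%s at (%d,%d): %d reachable hexes",
        u.type, u.x, u.y, #reach))
end
```

These calls only run inside the game engine (e.g. from a Lua AI or a WML event), not as a standalone script.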

If you want more complex functions, there's a good chance that I or somebody else has already written something in our own AI experiments. Just ask, it might exist. These are very different types of AI though, so likely not much help to what you want to do otherwise.

The bigger problem I see is that you say you want to visualize this while a replay is being played. The way Wesnoth works, anything you want to see in a replay needs to be there while the game is being "recorded" (played). So while that is possible (some of my AIs have that type of debugging mechanism built in; see, for example, eval_exec_CA.lua), it has to be present when the game is started and cannot be applied after the fact to existing replays, at least not without editing the replay files themselves. Writing a script to do that automatically is, of course, a possibility.

Thanks mattsc for all your help (on this thread and offline) in understanding how a new AI can be written in Wesnoth and how the AI functions in general.

So far I have come to the conclusion that simulating a sequence of moves (within a single turn or otherwise) is not possible with the current Lua AI framework. It is only possible to evaluate the current state of the game and perform actions (moves) on it; there is no way to copy the current state, perform actions on the copy, evaluate the resulting modified state, and only then perform the actual actions on the original state. This capability is essential for any learning algorithm to work, and indeed for any algorithm that tries to 'look ahead', such as minimax. It is also needed for any optimization over the actions of a single turn (recruitment or unit placement).

This has been my observation. mattsc did suggest to me in a PM that it might be possible to write a wrapper around the evaluation and execution functions of the candidate actions and call them, but I wonder if there is a way to save a copy of the game state and run these wrapper functions on that copy, in such a way that the standard wesnoth.<xyz> queries can be made against the copy. So far, I think not.
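For concreteness, such a wrapper might look like the sketch below. It assumes `ca` is a standard Lua AI candidate-action table providing `evaluation()` and `execution()` functions (exact signatures vary by version), and all other names are illustrative. Note the limitation being discussed: it still acts on the one real game state, not on a copy.

```lua
-- Illustrative wrapper around a candidate action (CA) table.
-- Assumes 'ca' provides evaluation() and execution() functions, as the
-- standard Lua AI candidate actions do; signatures vary by Wesnoth version.
local function eval_and_exec(ca, threshold)
    local score = ca.evaluation()
    if score > (threshold or 0) then
        ca.execution()  -- note: this modifies the *real* game state
        return true, score
    end
    return false, score
end
```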

If someone thinks that there is a way to perform simulations like the way I want, please do let me know.

milwac wrote:[..]Another key feature of reinforcement learning is 'delayed reward'. When I said killing the opponent's leader is the aim of an N v N game, the delayed reward in this case is the leader-kill. The algorithm is trained in such a way that all situations that can lead to a leader-kill are given high values, all situations that lead to such situations are also given high values, and so on. Of course there can be good and bad moves in between, and a match can be won or lost on a single good or bad move. Such patterns are accounted for by probabilistically sampling many possible continuations from a given state of the game and evaluating those future states to determine whether the current state is a good one to be in or not. As we know, Wesnoth has a huge branching factor, so how we deal with this is still up for discussion at this point. But the core idea is: yes, it is possible to assign intermediate rewards to any given state of the game and determine how good or bad it is for a given player.
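The "situations that lead to good situations also get high values" idea in that quote is, in essence, a temporal-difference value backup. A minimal illustrative sketch in plain Lua (nothing here is Wesnoth API; all names are mine):

```lua
-- Minimal TD(0)-style value backup (illustrative only, not Wesnoth API).
-- V maps a state key to its estimated value; alpha is the learning rate,
-- gamma the discount factor for delayed rewards.
local function td_update(V, s, s_next, reward, alpha, gamma)
    V[s] = V[s] or 0
    local v_next = V[s_next] or 0
    -- States that lead to rewarding states inherit part of their value.
    V[s] = V[s] + alpha * (reward + gamma * v_next - V[s])
end
```

Applying this repeatedly along sampled game trajectories propagates a final leader-kill reward backwards, so earlier states that reliably lead to a kill accumulate high values over many episodes.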

Hi,
I'm not experienced, but as a player I can tell that any reward can be reduced to gold (aside from the leader kill, of course). Collecting and holding villages has a numerical value, just as killing a unit that cost the enemy some amount of gold does. On Isar's, for example, detecting an easy kill and reaching it on turn 4-5, say a naga, weighs exactly -14 gold against the enemy's ability to reach his future goal. That's considerably better than parking two units on two villages and increasing the total game income by... mumble... +4, I think. I'm not sure whether the opponent's upkeep should be factored in; probably not.
Given this, everything comes down to evaluating the risk of -future- attacks (even from the first turn!), to moves that reduce those risks (which doesn't exclude "don't move at all"), and to how many resources to put into each plan, which is a matter of enough AI training.
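Putting the gold argument above into code, a state evaluation along those lines might look like this sketch. The weights are illustrative, and the API names follow the 1.12/1.14-era LuaWML docs:

```lua
-- Sketch of a gold-based evaluation from one side's point of view:
-- gold in hand, material (unit costs), and village holdings.
-- Weights are illustrative; upkeep is deliberately ignored, per the post.
local function gold_evaluation(side)
    local score = wesnoth.sides[side].gold

    -- Material: own units count positive at their recruit cost, enemy negative
    for _, u in ipairs(wesnoth.get_units {}) do
        local cost = wesnoth.unit_types[u.type].cost
        score = score + (u.side == side and cost or -cost)
    end

    -- Villages: each is worth roughly 2 gold per turn of income
    for _, loc in ipairs(wesnoth.get_villages()) do
        local owner = wesnoth.get_village_owner(loc[1], loc[2])
        if owner == side then
            score = score + 2
        elseif owner then
            score = score - 2
        end
    end
    return score
end
```

Under this metric, the naga kill described above shifts the evaluation by 14 in one move, which matches the post's point that it beats a few turns of +2 village income.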

An interesting point to me is the Wesnoth RNG and how tactical choices are rewarded during the AI's learning, since the same decisions through a game can lead to different results; the AI's growth should therefore be less smooth than in AlphaGo or chess implementations. I would like to read more brainstorming.
Anyway, thanks for trying it.