Slide 5: Agent Basics
- Two main approaches:
  - Action selector
  - State evaluator
- Each of these has strengths and weaknesses
- For any given problem there are no hard and fast rules: experiment!
- Success or failure can hinge on small details

Slide 9: Simple Example: Mountain Car
- Often used to test TD (temporal difference) learning methods
- Accelerate a car to reach the goal at the top of an incline
- The engine force is weaker than gravity, so the car must build momentum by rocking back and forth (DEMO)

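The dynamics behind the demo can be sketched in a few lines. This is a minimal version of the classic mountain-car update; the constants and goal position follow the common textbook formulation and are assumptions, since the slide does not specify them.

```python
import math

def step(position, velocity, action):
    """Advance the car one time step. action is -1 (reverse), 0, or +1 (forward).
    Constants are the commonly used textbook values (an assumption here)."""
    velocity += 0.001 * action - 0.0025 * math.cos(3 * position)
    velocity = max(-0.07, min(0.07, velocity))  # weak engine: tight velocity cap
    position += velocity
    position = max(-1.2, min(0.6, position))
    if position == -1.2:
        velocity = 0.0                          # inelastic collision at the left wall
    done = position >= 0.5                      # reached the goal at the hilltop
    return position, velocity, done
```

From rest at the valley bottom, full throttle alone cannot climb the hill, which is exactly what makes the task a useful TD-learning benchmark.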
Slide 10: State Value Function
- Actions are applied to the current state to generate a set of possible future states
- The state value function is used to rate these successor states
- Choose the action that leads to the highest-valued state
- Requires a discrete set of actions

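The state-evaluator loop above can be sketched directly. The names `model` (forward dynamics) and `value` (the learned evaluator) are assumptions for illustration, not from the slides.

```python
def select_action(state, actions, model, value):
    """Apply each action to the current state, rate each successor state
    with the value function, and pick the action whose successor scores
    highest. Requires a discrete (enumerable) set of actions."""
    return max(actions, key=lambda a: value(model(state, a)))
```

For example, on a 1-D toy problem where `model` adds the action to the state and `value` prefers states near zero, the selector steps toward the origin.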
Slide 11: Action Selector
- A decision function selects an action directly, based on the current state of the system
- The action may be a discrete choice or a set of continuous outputs

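In contrast to the state evaluator, an action selector needs no forward model: the decision function maps the state straight to an action. A hypothetical linear-policy sketch (the weights and the sign threshold are illustrative assumptions):

```python
def action_selector(state, weights):
    """Direct policy: a weighted sum of state features gives a continuous
    output; thresholding its sign turns it into a discrete choice
    (e.g. accelerate left vs. right)."""
    output = sum(w * s for w, s in zip(weights, state))
    return 1 if output >= 0 else -1
```

The same function body could return `output` unchanged for a continuous control task, which is the flexibility the slide alludes to.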
Slide 19: Setup
- Use a weighted piece counter
  - Fast to compute (can play billions of games)
  - Easy to visualise
  - See if we can beat the 'standard' weights
- Limit search depth to 1-ply
  - Enables billions of games to be played, for a thorough comparison
  - Focus on machine learning rather than game-tree search
- Force random moves (with prob. 0.1)
  - Gives a more robust evaluation of playing ability

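The setup above can be sketched as follows. The function names, board encoding (+1/-1/0 per square), and `apply_move` hook are assumptions for illustration; the weighted piece counter, 1-ply search, and 0.1 forced-random-move probability come from the slide.

```python
import random

def wpc_value(board, weights, player):
    """Weighted piece counter: sum of weight * piece over all squares,
    scored from `player`'s perspective (board squares are +1, -1, or 0)."""
    return player * sum(w * p for w, p in zip(weights, board))

def choose_move(board, moves, weights, player, apply_move, epsilon=0.1):
    """1-ply search: evaluate the position after each legal move and take
    the best; with probability epsilon, force a random legal move instead."""
    if random.random() < epsilon:
        return random.choice(moves)
    return max(moves,
               key=lambda m: wpc_value(apply_move(board, m, player),
                                       weights, player))
```

Because the evaluation is a single dot product and the search is only 1-ply, self-play games are cheap enough to run in the billions, which is the point of the setup.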
Slide 39: Conclusions
- Games are great for CI (computational intelligence) research
  - Intellectually challenging
  - Fun to work with
- Agent learning for games is still a black art
- Small details can make big differences (e.g. which inputs to use)
- Big details matter too (NTuple versus MLP)
- Grand challenge: how can we design more efficient game learners?
- Evo/TDL hybrids are the way forward