Dynamic Programming

Dynamic Programming (DP) refers to a collection of algorithms that, given a perfect model of the environment as a Markov Decision Process (MDP), can compute optimal policies. Classical DP algorithms are of limited practical use because they assume such a perfect model and because they are computationally expensive.
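As a minimal sketch of one classical DP algorithm, the code below runs value iteration, assuming the "perfect model" is available as a table of transition probabilities and rewards. The three-state MDP, the names `value_iteration` and `P`, and the discount `gamma = 0.9` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical model format: P[state][action] is a list of
# (probability, next_state, reward) triples. A state with no actions is terminal.
Model = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def value_iteration(P: Model, gamma: float = 0.9, theta: float = 1e-8):
    """Return optimal state values V and a greedy policy pi for the model P."""

    def q(s: str, a: str, V: Dict[str, float]) -> float:
        # Expected one-step return of taking action a in state s, then following V.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            if not P[s]:  # terminal state: value stays 0
                continue
            best = max(q(s, a, V) for a in P[s])  # Bellman optimality backup
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update, as in the usual sweep-based variant
        if delta < theta:  # stop once a full sweep changes nothing significant
            break
    # Extract a policy that is greedy with respect to the converged values.
    pi = {s: max(P[s], key=lambda a: q(s, a, V)) for s in P if P[s]}
    return V, pi


# A tiny hypothetical chain: start <-> middle -> goal (reward 1 on reaching goal).
P: Model = {
    "start": {"left": [(1.0, "start", 0.0)], "right": [(1.0, "middle", 0.0)]},
    "middle": {"left": [(1.0, "start", 0.0)], "right": [(1.0, "goal", 1.0)]},
    "goal": {},
}
V, pi = value_iteration(P)
print(pi)  # {'start': 'right', 'middle': 'right'}
```

Every backup reads `P[s][a]` directly, which is exactly where the perfect-model assumption enters: without the full transition table, this sweep cannot be performed.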

Tic-tac-toe (also known as noughts and crosses, or Xs and Os) is a turn-based game for two players who alternately mark the spaces of a $3 \times 3$ grid with their respective marker: an X or an O. The object of the game is to place three of one's own markers in a row, either horizontally, vertically, or diagonally.

Given only the mechanics of Tic-tac-toe, the game can be expressed as a combinatorial group by defining a set $A$ of generators $\{a_i\}$ that describe the actions either player can take. The Cayley graph of this group can then be constructed; it expresses all the possible ways the game can be played. Using the Cayley graph as a model, it should be possible to learn the Tic-tac-toe game tree using dynamic programming techniques (hint: the game tree is a subgraph of the Cayley graph).
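The sketch below illustrates the DP idea on the game itself, under assumptions that go beyond the text: it does not build the Cayley graph explicitly. Instead, a position is a hypothetical 9-character board string, and each generator $a_i$ is read as "the player to move marks cell $i$". Applying generators from the empty board, and stopping at a win or a full board, traces out the game tree as a subgraph of all move sequences. Memoizing on positions (the names `solve`, `moves`, and `winner` are illustrative) lets different move orders that reach the same board share one computation, which is the overlapping-subproblem structure DP exploits. Here the memoized recursion computes the minimax value and counts complete games.

```python
from functools import lru_cache
from typing import List, Optional, Tuple

# The eight winning lines of a 3x3 board, as cell indices 0..8 (row-major).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board: str) -> Optional[str]:
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for i, j, k in LINES:
        if board[i] != "." and board[i] == board[j] == board[k]:
            return board[i]
    return None


def moves(board: str) -> List[str]:
    """Apply each legal generator a_i: the player to move marks empty cell i."""
    player = "X" if board.count("X") == board.count("O") else "O"
    return [board[:i] + player + board[i + 1:] for i in range(9) if board[i] == "."]


@lru_cache(maxsize=None)
def solve(board: str) -> Tuple[int, int]:
    """Return (minimax value for X, number of complete games from this position)."""
    w = winner(board)
    if w is not None:  # game over: a line was completed
        return (1 if w == "X" else -1), 1
    children = moves(board)
    if not children:  # full board with no winner: a draw
        return 0, 1
    results = [solve(c) for c in children]  # memoized subproblems
    values = [v for v, _ in results]
    x_to_move = board.count("X") == board.count("O")
    value = max(values) if x_to_move else min(values)
    return value, sum(n for _, n in results)


value, games = solve("." * 9)
print(value, games)  # 0 255168
```

The value 0 says that perfect play by both sides ends in a draw, and `games` counts the leaves of the full game tree, even though the memo table only stores each distinct position once.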