One of the motivations in creating the Didactronic Framework was to
learn new technology. Several ports of the framework have been started,
in Python, Java, Clojure, and most recently Rust. Rust
was an interesting option because of its promise of speed, safety, and
expressiveness. It seemed a good middle ground between imperative and
functional programming. Since this is a completely new language and
development paradigm for me (being primarily a C and Lisp hacker), the
Rust framework will need to be refined over time to make use of the
various constructs that are unique to that language. One such
construct is the match form.

The Didactronic Toolkit came about as a way to investigate ideas about formulating reinforcement learning problems using group theory. Its name, didactronic, is a contraction of the "didact" part of the word didactic, appended with the suffix "-tronic".

Greek: a suffix referring to a device, tool, or instrument; more generally, used in the names of any kind of chamber or apparatus used in experiments.

Therefore the term Didactronic signifies an instrument or apparatus intended to teach or convey instruction to an agent through experiments; this is the essence of reinforcement learning. The Didactronic Toolkit is meant to provide the basic tools to build such an instrument for an arbitrary task. However, since the toolkit is meant to be independent of domain, it must be useful enough to simplify the task yet generic enough not to constrain it. The goal of this article is to distill reinforcement learning into its most basic elements to provide insight into the design philosophy behind the toolkit. The secondary objective of this work is to provide a vehicle for learning the Rust language. To that end, the Didactronic Toolkit will be re-implemented as a crate in Rust.

In part 1 of this series, the Tic-tac-toe reinforcement learning task was expressed as a Combinatorial Group with the hypothesis that the expansion of the group into a Cayley Graph could be used to learn its associated game tree. In this instalment, the expansion of the group into a Cayley Graph will be examined in a bit more detail. Initially, the Tic-tac-toe group will be set aside in favour of a simpler domain which offers a more compact and pedagogical representation. The expansion of the Tic-tac-toe group should nevertheless follow the same process, so this article will circle back to the Tic-tac-toe domain to highlight the equivalences which should ensure that this is so.
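To give a feel for what expanding a group into a Cayley Graph involves, here is a minimal sketch in Rust. It is not the toolkit's code, and the domain is an assumption made for illustration: the cyclic group $\mathbb{Z}_n$ under addition modulo $n$ stands in for the simpler domain, and the function name `cayley_graph` is hypothetical. Starting from the identity, each known element is combined with every generator; each result becomes an edge, and any element not seen before is queued for expansion.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Expand the Cayley graph of Z_n (addition modulo n) from the identity 0.
/// Assumption for illustration: generators are integers smaller than n.
/// Returns a map from each reachable element to its outgoing edges,
/// stored as (generator, resulting element) pairs.
fn cayley_graph(n: u32, generators: &[u32]) -> BTreeMap<u32, Vec<(u32, u32)>> {
    let mut graph = BTreeMap::new();
    let mut seen = BTreeSet::from([0u32]);
    let mut queue = VecDeque::from([0u32]);

    // Breadth-first expansion: every element is visited exactly once.
    while let Some(g) = queue.pop_front() {
        let mut edges = Vec::new();
        for &a in generators {
            let h = (g + a) % n; // group operation: g combined with generator a
            edges.push((a, h));
            if seen.insert(h) {
                queue.push_back(h); // first time h is reached
            }
        }
        graph.insert(g, edges);
    }
    graph
}
```

Calling `cayley_graph(6, &[2, 3])` reaches all six elements of $\mathbb{Z}_6$, whereas `cayley_graph(6, &[2])` only reaches the subgroup $\{0, 2, 4\}$: the choice of generators determines which part of the group the graph covers.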

Tic-tac-toe (or noughts and crosses, or Xs and Os) is a turn-based game for two players who alternately tag the spaces of a $3 \times 3$ grid with their respective marker: an X or an O. The object of the game is to place three markers in a row, either horizontally, vertically, or diagonally. Given only the mechanics of Tic-tac-toe, the game can be expressed as a Combinatorial Group by defining a set $A$ of generators $\{a_i\}$ which describe the actions that can be taken by either player. The Cayley Graph of this group can then be constructed; it expresses all the possible ways the game can be played. Using the Cayley Graph as a model, it should be possible to learn the Tic-tac-toe game tree using dynamic programming techniques (hint: the game tree is a sub-graph of the Cayley Graph).
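The mechanics described above can be sketched in Rust as follows. This is one possible encoding, not the article's definition of the group: the names `Mark`, `Action`, `Board`, `apply`, and `winner` are hypothetical, a generator $a_i$ is assumed to mean "one player tags one cell", and turn order and inverse elements are deliberately left out.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Mark {
    X,
    O,
}

/// Hypothetical encoding of a generator a_i: a player tagging a cell (0..=8).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Action {
    mark: Mark,
    cell: usize,
}

/// The 3x3 grid in row-major order; None is an untagged space.
type Board = [Option<Mark>; 9];

/// Apply a generator to a board. Returns None if the cell is out of
/// range or already tagged, since that action is not available.
fn apply(board: &Board, action: Action) -> Option<Board> {
    match board.get(action.cell) {
        Some(None) => {
            let mut next = *board;
            next[action.cell] = Some(action.mark);
            Some(next)
        }
        _ => None,
    }
}

/// The eight rows, columns, and diagonals that win the game.
const LINES: [[usize; 3]; 8] = [
    [0, 1, 2], [3, 4, 5], [6, 7, 8], // rows
    [0, 3, 6], [1, 4, 7], [2, 5, 8], // columns
    [0, 4, 8], [2, 4, 6],            // diagonals
];

/// Return the mark that fills a complete line, if any.
fn winner(board: &Board) -> Option<Mark> {
    LINES.iter().find_map(|&[a, b, c]| match (board[a], board[b], board[c]) {
        (Some(x), Some(y), Some(z)) if x == y && y == z => Some(x),
        _ => None,
    })
}
```

Under this encoding, each successful call to `apply` corresponds to following one edge out of the current board, and `winner` identifies the boards at which a game ends.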