Topics

Alpha Go

Alpha Go Documentary

Timepoint: Jump straight to the part of the Alpha Go Documentary that explains the learning process Alpha Go uses. It also leads into the first moment where the program plays a creative move that humans did not expect. https://youtu.be/jGyCsVhtW0M?t=2834

Thompson Sampling

Markov Decision Processes

Domains

Gridworld

Eligibility Traces

In the tabular setting, eligibility traces can significantly reduce training time when added to the Temporal Difference method.
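
To make the training-time benefit concrete, here is a minimal sketch of one episode of tabular TD(lambda) with accumulating traces. The function name `td_lambda_episode`, the 3-state corridor, and the parameter values are assumptions chosen for illustration, not taken from any particular library or course.

```python
def td_lambda_episode(values, episode, alpha=0.1, gamma=1.0, lam=0.8):
    """Run one episode of tabular TD(lambda) with accumulating traces.

    `episode` is an ordered list of (state, reward, next_state, done) tuples.
    `values` is a list of state-value estimates and is updated in place.
    """
    traces = [0.0] * len(values)
    for state, reward, next_state, done in episode:
        next_value = 0.0 if done else values[next_state]
        td_error = reward + gamma * next_value - values[state]
        traces[state] += 1.0  # accumulating trace: mark the state just visited
        for s in range(len(values)):
            # Every recently visited state shares in the current TD error.
            values[s] += alpha * td_error * traces[s]
            traces[s] *= gamma * lam  # older visits fade geometrically
    return values


# Hypothetical 3-state corridor: 0 -> 1 -> 2 -> terminal, reward 1 on the last step.
episode = [(0, 0.0, 1, False), (1, 0.0, 2, False), (2, 1.0, None, True)]

td0 = td_lambda_episode([0.0, 0.0, 0.0], episode, lam=0.0)  # plain TD(0)
tdl = td_lambda_episode([0.0, 0.0, 0.0], episode, lam=0.8)  # with traces
print(td0)
print(tdl)
```

With `lam=0.0` only the last state before the reward is updated after this single episode, while with `lam=0.8` all three states receive a share of the reward. In this toy setup that is where the speed-up comes from: TD(0) would need further episodes to carry the reward back to state 0.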

In Deep RL it is very common to use experience replay to reduce overfitting and bias toward recent experiences. However, experience replay makes it very hard to leverage eligibility traces, which need an ordered sequence of transitions to distribute reward backwards, because replay samples stored transitions out of order.
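
The conflict can be seen in a small sketch of uniform sampling from a replay buffer. The buffer layout `(time_step, state, reward, next_state)` and the sizes used here are assumptions for illustration only.

```python
import random
from collections import deque

# Hypothetical replay buffer holding (time_step, state, reward, next_state) tuples.
replay_buffer = deque(maxlen=1000)
for t in range(10):
    replay_buffer.append((t, t, 0.0, t + 1))

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
batch = rng.sample(list(replay_buffer), 4)  # uniform sampling without replacement
time_steps = [transition[0] for transition in batch]
print(time_steps)
```

The sampled time steps are generally neither sorted nor consecutive, so a trace cannot be carried from one sampled transition to the next the way it is within a single ordered episode.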