The top of the learner hierarchy is more conceptual than functional:
its classes distinguish algorithms in such a way that we can automatically
determine when an algorithm is not applicable to a given problem.

A class for learners that learn from a dataset that has no target output,
only a reinforcement signal for each sample. It requires a
ReinforcementDataSet object (which provides state-action-reward tuples).
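To make the data layout concrete, here is a minimal sketch of such a dataset: unlike a supervised dataset there is no target output, only one reward per sample. `RLDataSet` and its methods are hypothetical stand-ins for illustration, not PyBrain's actual ReinforcementDataSet API.

```python
class RLDataSet:
    """Stores one (state, action, reward) tuple per timestep."""

    def __init__(self):
        self.samples = []

    def add_sample(self, state, action, reward):
        # No target output is stored -- only the reinforcement signal.
        self.samples.append((state, action, reward))

    def __iter__(self):
        return iter(self.samples)

    def __len__(self):
        return len(self.samples)


# A toy two-step history: states, actions, and rewards are invented.
history = RLDataSet()
history.add_sample(state=0, action=1, reward=0.0)
history.add_sample(state=1, action=0, reward=1.0)
```

A learner would then iterate over this history to perform its updates.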

Learn on the current dataset, either over many timesteps and
episodes (batchMode = True) or for a single timestep
(batchMode = False). Batch mode is possible because Q-learning
is an off-policy method.

In batchMode, the algorithm goes through all the samples in the
history and performs an update on each of them. If batchMode is
False, only the last data sample is considered. The user has to
ensure that the dataset stays consistent with the agent's history.
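The batch/single-step distinction can be sketched as follows. This is an illustrative tabular Q-learning loop over a toy two-action history, not PyBrain's actual implementation; the names `q_update` and `learn`, the learning parameters, and the sample values are all assumptions. Because Q-learning is off-policy, replaying old samples from the history is a valid update.

```python
ALPHA, GAMMA = 0.5, 0.9  # illustrative learning rate and discount

def q_update(q, state, action, reward, next_state):
    """One tabular Q-learning update; off-policy, so it maximizes
    over the (toy, two-element) action set in the next state."""
    best_next = max(q.get((next_state, a), 0.0) for a in (0, 1))
    key = (state, action)
    q[key] = q.get(key, 0.0) + ALPHA * (reward + GAMMA * best_next - q.get(key, 0.0))

# A hypothetical history of (state, action, reward) samples.
history = [
    (0, 1, 0.0),
    (1, 0, 1.0),
    (2, 1, 0.0),
]

def learn(q, history, batch_mode=True):
    """Batch mode replays every sample in the history; otherwise
    only the most recent transition is used."""
    # Pair each sample with the state of the following sample.
    transitions = [
        (s, a, r, history[i + 1][0])
        for i, (s, a, r) in enumerate(history[:-1])
    ]
    if not batch_mode:
        transitions = transitions[-1:]
    for s, a, r, s_next in transitions:
        q_update(q, s, a, r, s_next)
```

Keeping the dataset consistent with the agent's history matters here: each transition's next state is read from the following sample.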

REINFORCE is a gradient-estimation technique by Williams (see
"Simple Statistical Gradient-Following Algorithms for
Connectionist Reinforcement Learning"). It uses optimal
baselines and computes the gradient from the log-likelihoods
of the actions taken.
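A minimal sketch of this estimator, assuming a one-parameter Bernoulli policy p(a=1) = sigmoid(theta): the gradient of the expected return is estimated as the episode return (minus a variance-reducing baseline) times the log-likelihood gradient of the actions taken. The function name, the scalar variance-optimal baseline b = E[g²R]/E[g²], and the toy episode data are illustrative assumptions, not PyBrain's implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reinforce_gradient(theta, episodes):
    """episodes: list of (actions, total_return) pairs.
    Returns a REINFORCE estimate of d/dtheta E[R] for the
    Bernoulli policy p(a=1) = sigmoid(theta)."""
    p = sigmoid(theta)
    # Log-likelihood gradient per episode: sum of (a - p) over its actions.
    grads = [sum(a - p for a in actions) for actions, _ in episodes]
    returns = [r for _, r in episodes]
    # Variance-optimal scalar baseline b = E[g^2 R] / E[g^2].
    denom = sum(g * g for g in grads)
    baseline = (sum(g * g * r for g, r in zip(grads, returns)) / denom
                if denom else 0.0)
    # Gradient estimate: average of g * (R - b) over episodes.
    n = len(episodes)
    return sum(g * (r - baseline) for g, r in zip(grads, returns)) / n
```

Subtracting the baseline leaves the estimate unbiased while reducing its variance, which is the role the optimal baseline plays in Williams' formulation.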