Fuzzy State Aggregation and Off-Policy Reinforcement Learning for Stochastic Environments

Keywords

Abstract

Reinforcement learning is one of the more attractive
machine learning techniques because of its unsupervised
learning structure and its ability to continue learning even
as its operating environment changes. This capability
extends to complex domains through function
approximation of the domain's policy; the function
approximation presented here is fuzzy state aggregation.
This article combines fuzzy state aggregation with two
policy hill climbing methods, Win or Learn Fast (WoLF)
and policy-dynamics-based WoLF (PD-WoLF), and shows
that they exceed the learning rate and performance of
fuzzy state aggregation combined with Q-learning.
Test results in the TileWorld domain demonstrate that
the policy hill climbing methods outperform the existing
Q-learning implementations.