Author

Zhou, Yi

Date of Issue

2007

School

School of Electrical and Electronic Engineering

Abstract

In this thesis, a novel Reinforcement Learning (RL) methodology,
termed Dynamic Self-Generated Fuzzy Q-Learning (DSGFQL), is
developed for generating Fuzzy Neural Networks (FNNs).
In the DSGFQL system, RL is adopted for both structure
identification and parameter estimation of FNNs. Structure and
premise parameters can be dynamically adjusted according to
reinforcement evaluations. Besides evaluation signals for system
performance, a reinforcement sharing mechanism is adopted for
evaluating the contribution of each fuzzy rule. Therefore, both
system performance and the individual contribution of each fuzzy
rule can be evaluated through reinforcement signals. Fuzzy rules
with good contributions can be reinforced while fuzzy rules with
poor contributions will be penalized or eliminated. Therefore,
structure and premise parts of FNNs can be determined in an RL
manner.
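The reinforcement sharing idea above can be illustrated with a minimal sketch. It is not the thesis's actual scheme: it assumes that a global reward is split among rules in proportion to their firing strengths, that each rule keeps an exponential running average of its share, and that rules below a threshold are pruned. The function names, the `rate` parameter, and the threshold test are all hypothetical.

```python
def share_reinforcement(firing_strengths, global_reward):
    """Split one global reward among rules in proportion to firing strength.

    Assumption: proportional sharing; the source does not specify the rule.
    """
    total = sum(firing_strengths)
    if total == 0.0:
        # No rule fired, so no rule receives credit or blame.
        return [0.0 for _ in firing_strengths]
    return [global_reward * f / total for f in firing_strengths]


def update_contributions(contributions, firing_strengths, global_reward, rate=0.1):
    """Blend each rule's new reward share into its running contribution.

    Assumption: an exponential moving average with a hypothetical `rate`.
    """
    shares = share_reinforcement(firing_strengths, global_reward)
    return [(1.0 - rate) * c + rate * s for c, s in zip(contributions, shares)]


def surviving_rules(contributions, threshold):
    """Return indices of rules whose contribution is at or above the threshold."""
    return [i for i, c in enumerate(contributions) if c >= threshold]
```

In this sketch a rule that repeatedly fires when the system is penalized accumulates a low contribution and eventually drops out of `surviving_rules`, which mirrors the promote-or-eliminate behavior described above.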
The DSGFQL offers a novel view of generating FNNs. RL
methodologies are not only applied for selecting optimal actions
(consequent parameters) but also applied in determining the number
of rules, pruning and adjusting premise parameters. Just as good
actions are reinforced and poor actions are penalized in
conventional RL approaches, good rules are promoted while
bad rules are demoted or eliminated in the DSGFQL method.
Therefore, instead of only focusing on applying RL in training
consequent parameters (consequent-generation), RL is adopted at a
higher level (premise-generation level).
As structure and premise parameters of FNNs can be adjusted
according to reinforcement evaluations, an efficient structure can be
determined through the DSGFQL method.
The novel DSGFQL methodology can automatically create, delete and
adjust fuzzy rules according to the evaluation of system
performance as well as contributions from individual fuzzy rules.
The whole learning process is based on evaluative information and
it does not require instructive training data or much human
effort.
Besides self-generating FNNs without an a priori structure,
the DSGFQL approach can also incorporate domain knowledge
from human experts or from previous training. At the premise level,
initial domain knowledge about tasks can be incorporated as bias
into the system by If-Then fuzzy rules. An NN structure for
incorporating bias components is proposed according to the
confidence in the initial knowledge. Therefore, rapid and safe
learning can be achieved. At the consequent-training level, a sharing
mechanism is proposed to initialize the Q-values of newly generated
rules when applying Q-learning. Instead of being assigned randomly,
the Q-values of new rules are initialized according to those of
existing neighboring rules. Therefore, previous knowledge can be
learned from neighboring fuzzy rules and learning speed can
be increased.
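One way to picture this initialization is sketched below. It rests on assumptions the abstract does not state: each rule has a center in input space, and a new rule's Q-values are a similarity-weighted average of the existing rules' Q-values, with Gaussian similarity. The function name, the `width` parameter, and the weighting are hypothetical.

```python
import math


def init_q_from_neighbors(new_center, centers, q_values, n_actions, width=1.0):
    """Initialize a new rule's Q-values from existing rules.

    Assumption: Gaussian similarity between rule centers is used as the
    weight; the source only says that neighboring values are shared.
    """
    if not centers:
        # No existing rules to learn from, so fall back to zeros.
        return [0.0] * n_actions
    weights = []
    for center in centers:
        sq_dist = sum((a - b) ** 2 for a, b in zip(new_center, center))
        weights.append(math.exp(-sq_dist / (width ** 2)))
    total = sum(weights)
    if total == 0.0:
        # All neighbors are too far away to contribute anything.
        return [0.0] * n_actions
    return [
        sum(w * q[a] for w, q in zip(weights, q_values)) / total
        for a in range(n_actions)
    ]
```

Under these assumptions, a rule created midway between two existing rules starts with the average of their Q-values rather than with random values, so it begins with an action preference consistent with its neighborhood.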
Furthermore, extended studies for further developing the DSGFQL
algorithm are carried out. For non-Temporal Difference (TD)-based
RL approaches, a reward function scheme (DSGFQL-reward) is
proposed as a general approach for all RL problems. Global and
local rewards are adopted as evaluation criteria for system and
local performance, respectively. As the reward function is a basic
element of all RL problems, including non-TD-based approaches,
the reward scheme offers a general RL methodology for generating
FNNs.
Moreover, an enhanced version of the DSGFQL, termed Enhanced
Dynamic Self-Generated Fuzzy Q-Learning (EDSGFQL), is proposed by
combining the DSGFQL with an extended Self-Organizing Map (SOM)
algorithm. An extended SOM is proposed and adopted to adjust the
center positions of fuzzy neurons for better feature
representation. With better allocation of fuzzy neurons, the
original DSGFQL is enhanced and the number of fuzzy rules can be
further reduced.
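For orientation, the sketch below shows the standard Kohonen-style SOM update for rule centers, not the extended SOM proposed in the thesis, whose details the abstract does not give. It assumes that the winning center is the one closest to the input and that the other centers move by an amount that decays with their distance from the winner. The names `lr` and `sigma` are hypothetical parameters.

```python
import math


def _sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def som_update(centers, x, lr=0.1, sigma=1.0):
    """Move rule centers toward input x using a plain SOM-style step.

    Assumption: the neighborhood is measured in input space around the
    winning center, since the rules here have no fixed grid topology.
    """
    winner = min(range(len(centers)), key=lambda i: _sq_dist(centers[i], x))
    updated = []
    for center in centers:
        # Neighborhood factor: 1 for the winner, decaying with distance.
        h = math.exp(-_sq_dist(center, centers[winner]) / (2.0 * sigma ** 2))
        updated.append([c + lr * h * (xi - c) for c, xi in zip(center, x)])
    return updated
```

Repeating such a step over observed inputs pulls centers toward regions the system actually visits, which is the sense in which better allocation of fuzzy neurons can allow fewer rules to cover the input space.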
Besides these extensions for determining the premise parameters
of FNNs, continuous-action Q-learning is combined with the DSGFQL
to generate local continuous actions. Therefore, besides
applying fuzzy inference to generate continuous global actions,
continuous rather than discrete local actions can also be obtained
from each local fuzzy rule. In the resulting DSGFQL-CA approach,
continuous consequent parameters are estimated instead of discrete
ones.
The DSGFQL algorithm and its extended methodologies
are applied to robot navigation tasks such as
wall following and obstacle avoidance. Comparison studies
with other existing fuzzy RL approaches demonstrate the
superiority of the proposed methods as more efficient FNNs can be
generated. A number of comparative studies are carried out to
validate the viability of the proposed approaches in both static
and dynamic training environments.