ON DEVELOPMENTAL VARIATION IN HIERARCHICAL SYMBIOTIC POLICY SEARCH

Abstract

A hierarchical symbiotic framework for policy search with genetic programming (GP)
is evaluated in two control-style temporal sequence learning domains. The symbiotic
formulation assumes each policy takes the form of a cooperative team of multiple
symbiont programs. An initial cycle of evolution establishes a diverse range of
host behaviours with limited capability. The second cycle uses these initial policies
as meta actions for reuse by symbiont programs. The relationship between development
and ecology is explored by explicitly altering the interaction between the learning
agent and its environment at fixed points throughout evolution. In both task domains,
this developmental diversity significantly improves performance. Specifically,
ecologies designed to promote good specialists in the first developmental phase,
followed by good generalists, result in much stronger organisms in terms of
generalization ability and efficiency. Conversely, when there is no diversity in the
interaction between task environment and policy learner, the resulting hierarchy is
not as robust or general.
The relative contribution from each cycle of evolution in the resulting hierarchical
policies is measured from the perspective of multi-level selection. These multi-level
policies are shown to be significantly better than the sum of contributing meta actions.