Abstract

Reinforcement learning agents usually learn from scratch, which requires a large number of interactions with the environment. This is quite different from the learning process of humans. When faced with a new task, humans naturally draw on common sense and use prior knowledge to derive an initial policy and to guide the subsequent learning process. Although the prior knowledge may not be fully applicable to the new task, learning is significantly sped up, since the initial policy ensures a quick start and the intermediate guidance avoids unnecessary exploration. Taking this inspiration, we propose the knowledge-guided policy network (KoGuN), a novel framework that combines suboptimal human prior knowledge with reinforcement learning. Our framework consists of a fuzzy rule controller that represents human knowledge and a refine module that fine-tunes the suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing policy-based reinforcement learning algorithms. We conduct experiments on both discrete and continuous control tasks. The empirical results show that our approach, which combines suboptimal human knowledge and RL, significantly improves the learning efficiency of flat RL algorithms, even with very low-performance human prior knowledge.
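The abstract's idea of combining a fuzzy rule prior with a learned policy can be sketched as follows. This is a minimal illustrative toy, not the paper's actual architecture: the rule, the `refine_delta` correction, the mixing weight `w`, and all function names are assumptions introduced here for illustration.

```python
import numpy as np

def fuzzy_rule_prior(state):
    """Toy fuzzy rule for a pole-balancing-style task (illustrative only):
    the more the pole tilts right, the stronger the preference to push right."""
    angle = state[0]
    # Membership degree for "tilting right" via a sigmoid; "left" is its complement.
    right = 1.0 / (1.0 + np.exp(-10.0 * angle))
    left = 1.0 - right
    return np.array([left, right])  # suboptimal prior action preference

def kogun_policy(state, policy_logits, refine_delta, w=0.5):
    """Mix the refined rule prior with the learned policy's action distribution.
    `refine_delta` stands in for a trainable refine-module output; `w` is a
    hypothetical mixing weight, not a quantity from the paper."""
    prior = fuzzy_rule_prior(state) + refine_delta  # fine-tuned prior
    prior = np.clip(prior, 1e-6, None)
    prior /= prior.sum()
    learned = np.exp(policy_logits - policy_logits.max())  # softmax of logits
    learned /= learned.sum()
    mixed = w * prior + (1.0 - w) * learned  # combined action probabilities
    return mixed / mixed.sum()

# A slight right tilt: the prior pulls the combined policy toward "push right"
# even though the untrained policy logits mildly favor the other action.
probs = kogun_policy(np.array([0.2]), np.array([0.1, -0.1]), np.zeros(2))
```

The point of the sketch is the mechanism: the rule prior gives a sensible initial policy before any learning, while the learned component and the refine correction can gradually override the prior where it is suboptimal.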