Abstract

This thesis presents novel work on improving exploration in reinforcement learning through the use of domain knowledge and knowledge-based approaches. It also identifies novel relationships between the parameters of algorithms and domains and the efficiency of exploration.
The goal of solving reinforcement learning problems is to learn how to execute actions in order to maximise the long-term reward. Solving this type of problem is hard when real domains of realistic size are considered, because the state space grows exponentially with the number of state features in the representation of the problem.
In its basic form, reinforcement learning is tabula rasa, i.e. it starts learning with very limited knowledge about the domain. One way of improving the performance of reinforcement learning is the principled use of domain knowledge. The use of knowledge has been successful in related branches of artificial intelligence, and it is becoming increasingly important in reinforcement learning as well. Reinforcement learning algorithms normally face the problem of deciding whether to execute explorative or exploitative actions, and the paramount goal is to limit the number of executions of suboptimal explorative actions. This thesis shows how domain knowledge and an understanding of the properties of algorithms and domains can help to achieve this.
Exploration is an immensely complicated process in reinforcement learning and is influenced by numerous factors. This thesis presents a new range of methods for dealing more efficiently with the exploration-exploitation dilemma, which is a crucial issue when applying reinforcement learning in practice. Reward shaping was used in this research as a well-established framework for incorporating procedural knowledge into model-free reinforcement learning. Two new ways of obtaining heuristics for potential-based shaping were introduced and evaluated: the use of high-level symbolic knowledge, and the application of different hypothesis spaces to learn the heuristic. These techniques make it possible to improve reinforcement learning via reward shaping in situations where no information about the potential function is available. In the work on potential-based reward shaping, the actual shaping reward under different conditions was also specified and empirically evaluated. In the context of model-based reinforcement learning, a novel technique for incorporating knowledge into the initial MDP models was proposed, evaluated, and proven to meet the properties of PAC-MDP learning. Eligibility traces are one of the important factors that influence exploration in reinforcement learning. The last part of this research focused on a detailed analysis of how eligibility traces influence exploration under a multitude of conditions.
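As an illustration of the potential-based reward shaping referred to above, the following minimal sketch uses the standard textbook form of the shaping reward, F(s, s') = gamma * Phi(s') - Phi(s), added to the environment reward in a tabular Q-learning update. It is not taken from the thesis and does not reproduce the specific formulations of the shaping reward that the thesis analyses. The names shaping_reward, q_update, and phi, the dictionary-based Q-table, and the distance-to-goal potential are all assumptions made for this example.

```python
def shaping_reward(phi, s, s_next, gamma):
    """Standard potential-based shaping term: gamma * Phi(s') - Phi(s)."""
    return gamma * phi(s_next) - phi(s)


def q_update(Q, s, a, r, s_next, actions, phi, alpha=0.1, gamma=0.99):
    """One tabular Q-learning update with the shaping term added to the reward.

    Q is a dict mapping (state, action) pairs to values; missing entries
    are treated as 0.0. Returns the updated value of Q[(s, a)].
    """
    f = shaping_reward(phi, s, s_next, gamma)
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    td_error = r + f + gamma * best_next - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return Q[(s, a)]


# Hypothetical heuristic: negative distance to a goal at position 10
# on a one-dimensional line of integer states.
def phi(s):
    return -abs(10 - s)


Q = {}
# Moving from state 3 to state 4 brings the agent closer to the goal,
# so the shaping term is positive even though the environment reward is 0.
q_update(Q, 3, "right", 0.0, 4, ["left", "right"], phi, alpha=0.5, gamma=1.0)
```

In this sketch the potential function only changes the reward signal that the learner sees; the learning rule itself is unchanged, which is why heuristics from different sources (hand-written, symbolic, or learned) can be plugged in as phi.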
The contributions of this thesis are as follows. It shows how to learn the potential function for reward shaping when it is not available, and gives a formal specification of the actual shaping reward under a multitude of conditions. It also shows how to use partial knowledge about the effects of actions to create knowledge-based and theoretically correct implementations of PAC-MDP learning. Novel relationships between eligibility traces and exploration efficiency are also identified. The findings of this thesis extend the current understanding of exploration and improve the exploration efficiency of reinforcement learning algorithms.