Abstract:
Model predictive control (MPC) is becoming an increasingly popular
method for selecting actions to control dynamic systems.
Traditionally, MPC uses a model of the system to be controlled and a
performance function to characterize the desired behavior of the
system. The MPC agent finds actions over a finite horizon that steer
the system in a desired direction. Significant drawbacks of
conventional MPC are the amount of computation required and the
suboptimality of the chosen actions. In this paper we propose the use of
MPC to control systems that can be described as Markov decision
processes. We discuss how a straightforward MPC algorithm for Markov
decision processes can be implemented, and how it can be improved in
terms of speed and decision quality by considering value functions. We
propose the use of reinforcement learning techniques to let the agent
incorporate experience from its interaction with the system into its
decision making. This experience speeds up the agent's decision making
significantly and allows it to base decisions on an infinite rather
than a finite horizon. The proposed approach can be
beneficial for any system that can be modeled as a Markov decision
process, including systems found in areas like logistics, traffic
control, and vehicle automation.
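As a rough sketch of the finite-horizon idea described above, the following Python snippet selects an action for a toy MDP by backward dynamic programming over the model, as a straightforward MPC agent might. The transition probabilities, rewards, and the function name `mpc_action` are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP for illustration only.
# P[a][s][s'] is the probability of moving from s to s' under action a.
P = np.array([
    [[0.9, 0.1, 0.0],   # action 0
     [0.0, 0.8, 0.2],
     [0.1, 0.0, 0.9]],
    [[0.2, 0.8, 0.0],   # action 1
     [0.1, 0.0, 0.9],
     [0.0, 0.3, 0.7]],
])
# R[s][a] is the immediate reward for taking action a in state s.
R = np.array([
    [0.0, 1.0],
    [0.5, 0.0],
    [1.0, 0.2],
])

def mpc_action(state, horizon):
    """Return the first action of an optimal finite-horizon plan,
    computed by backward dynamic programming over the model."""
    V = np.zeros(3)                       # terminal value: all zeros
    for _ in range(horizon):
        # Q[s, a] = R[s, a] + sum_{s'} P[a][s][s'] * V[s']
        Q = R + P.transpose(1, 0, 2) @ V
        V = Q.max(axis=1)                 # value-to-go one step earlier
    return int(Q[state].argmax())
```

In a receding-horizon loop, the agent would call `mpc_action` at every time step, apply the returned action, observe the new state, and replan; maintaining a learned value function as the terminal `V` (instead of zeros) is what lets the decisions approximate an infinite horizon.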