Abstract

We investigate the important problem of certifying stability of reinforcement learning policies when interconnected with nonlinear dynamical systems. We show that by regulating the input-output gradients of policies, strong guarantees of robust stability can be obtained based on a proposed semidefinite programming feasibility problem. The method is able to certify a large set of stabilizing controllers by exploiting problem-specific structures; furthermore, we analyze and establish its (non)conservatism. Empirical evaluations on two decentralized control tasks, namely multi-flight formation and power system frequency regulation, demonstrate that the reinforcement learning agents can have high performance within the stability-certified parameter space, and also exhibit stable learning behaviors in the long run.