Complexity analysis and optimal algorithms for decentralized decision making

by Bernstein, Daniel S

Abstract (Summary)

Coordination of distributed entities is required for problems arising in many areas, including multi-robot systems, networking applications, e-commerce applications, and the control of autonomous space vehicles. Because decisions must often be made without a global view of the system, coordination can be difficult. This dissertation focuses on the development of principled tools for solving problems of distributed decision making. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). This framework is very general, incorporating stochastic action effects, uncertainty about the system state, and limited communication. It has been adopted for use in the fields of control theory, operations research, and artificial intelligence. Despite this adoption, a number of fundamental questions about the computational aspects of the model have gone unanswered. One contribution of this thesis is an analysis of the worst-case complexity of solving DEC-POMDPs. It was previously established that for a single agent, the finite-horizon version of the problem is PSPACE-complete. We show that the general problem is NEXP-complete, even if there are only two agents whose observations together determine the system state. This complexity result illustrates a fundamental difference between single-agent and multiagent decision-making problems. In contrast to the single-agent problem, the multiagent problem provably does not admit a polynomial-time algorithm. Furthermore, assuming that EXP and NEXP are distinct, the problem requires super-exponential time to solve in the worst case. A second contribution is an optimal policy iteration algorithm for solving DEC-POMDPs. Stochastic finite-state controllers are used to represent policies. A controller can include a correlation device, which allows agents to correlate their actions without communicating during execution.
The algorithm alternates between expanding the controller and performing value-preserving transformations, which modify a controller without sacrificing value. We present two efficient value-preserving transformations: one that can reduce the size of the controller and another that can improve its value while keeping the size fixed. Our policy iteration algorithm serves as the first nontrivial exact algorithm for DEC-POMDPs.
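
To make the controller representation more concrete, the following is a minimal sketch of how the value of a pair of stochastic finite-state controllers might be computed for a two-agent DEC-POMDP. It is not the dissertation's algorithm: it covers only the evaluation of a fixed joint controller, it assumes a discounted infinite-horizon objective with discount factor `gamma`, and it omits the correlation device. The function name, the array layouts, and the one-state toy problem are all hypothetical choices made for illustration.

```python
from itertools import product


def evaluate_joint_controller(n_states, P, O, R, ctrl1, ctrl2, gamma=0.9, tol=1e-10):
    """Iteratively compute V(q1, q2, s) for two stochastic finite-state controllers.

    Assumed (hypothetical) layouts:
      P[s][a1][a2][s2]       transition probability
      O[s2][a1][a2][o1][o2]  joint observation probability
      R[s][a1][a2]           shared reward
      ctrl = (psi, eta), where psi[q][a] is the probability of action a in
      node q and eta[q][a][o][q2] is the probability of moving to node q2.
    """
    (psi1, eta1), (psi2, eta2) = ctrl1, ctrl2
    n_q1, n_q2 = len(psi1), len(psi2)
    n_a1, n_a2 = len(psi1[0]), len(psi2[0])
    n_o1, n_o2 = len(eta1[0][0]), len(eta2[0][0])
    keys = list(product(range(n_q1), range(n_q2), range(n_states)))
    V = {k: 0.0 for k in keys}
    while True:
        new_V = {}
        for q1, q2, s in keys:
            total = 0.0
            for a1, a2 in product(range(n_a1), range(n_a2)):
                # Each agent samples its action independently from its own node.
                p_act = psi1[q1][a1] * psi2[q2][a2]
                if p_act == 0.0:
                    continue
                future = 0.0
                for s2, o1, o2 in product(range(n_states), range(n_o1), range(n_o2)):
                    p_env = P[s][a1][a2][s2] * O[s2][a1][a2][o1][o2]
                    if p_env == 0.0:
                        continue
                    # Each agent updates its node from its own observation only.
                    for r1, r2 in product(range(n_q1), range(n_q2)):
                        future += (p_env * eta1[q1][a1][o1][r1]
                                   * eta2[q2][a2][o2][r2] * V[(r1, r2, s2)])
                total += p_act * (R[s][a1][a2] + gamma * future)
            new_V[(q1, q2, s)] = total
        delta = max(abs(new_V[k] - V[k]) for k in keys)
        V = new_V
        if delta < tol:
            return V


# Hypothetical toy problem: one state, two actions and one observation per
# agent, and a reward of 1 whenever the two agents choose the same action.
P = [[[[1.0], [1.0]], [[1.0], [1.0]]]]
O = [[[[[1.0]], [[1.0]]], [[[1.0]], [[1.0]]]]]
R = [[[1.0, 0.0], [0.0, 1.0]]]
stay = [[[[1.0]], [[1.0]]]]               # single node that always loops to itself
always_first = ([[1.0, 0.0]], stay)       # deterministic one-node controller
uniform = ([[0.5, 0.5]], stay)            # independently randomizing controller

V_det = evaluate_joint_controller(1, P, O, R, always_first, always_first)
V_rand = evaluate_joint_controller(1, P, O, R, uniform, uniform)
print(V_det[(0, 0, 0)], V_rand[(0, 0, 0)])
```

In this toy problem the agents score only when their actions match, so two controllers that randomize independently do worse than two that agree on a fixed action. Shared randomness of the kind a correlation device provides is what would let randomized controllers coordinate, but modeling it is beyond this sketch.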