Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract

Multi-agent teams can be used to perform tasks that would be very difficult or impossible for single agents. Although such teams provide additional functionality and robustness over single-agent systems, they also present additional challenges, mainly due to the difficulty of coordinating multiple agents in the presence of uncertainty and partial observability. Agents in a multi-agent team must not only reason about uncertainty in their environment; they must also reason about the collective state and behaviors of the team.

Partially Observable Markov Decision Processes (POMDPs) have been used extensively to model and plan for single agents operating under uncertainty. These models enable decision-theoretic planning in situations where the agent does not have complete knowledge of its current world state. There has been recent interest in Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs), an extension of single-agent POMDPs that can be used to model and coordinate teams of agents. Unfortunately, the problem of finding optimal policies for Dec-POMDPs is known to be highly intractable. However, it is also known that the presence of free communication transforms a multi-agent Dec-POMDP into a more tractable single-agent POMDP. In this thesis, we use this transformation to generate "centralized" policies for multi-agent teams modeled by Dec-POMDPs. Then, we provide algorithms that allow agents to reason about communication at execution-time, in order to facilitate the decentralized execution of these centralized policies. Our approach trades off the need to do some computation at execution-time for the ability to generate policies more tractably at plan-time.
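The free-communication transformation rests on the standard POMDP belief update: when every agent sees the joint observation, the team can maintain a single shared belief over joint states, exactly as a single agent would. The sketch below (illustrative only, not code from the thesis; the dictionary-based representation is an assumption for readability) shows that update, b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s).

```python
# Minimal sketch of the standard POMDP Bayesian belief update.
# belief: dict state -> probability
# T[(s, a)]: dict s' -> P(s' | s, a)    (transition model)
# O[(s', a)]: dict o -> P(o | s', a)    (observation model)
# With free communication, "state" here can be the joint team state and
# "action"/"observation" the joint action and joint observation.

def belief_update(belief, action, observation, T, O):
    new_belief = {}
    # Candidate successor states reachable under this action.
    successors = {sp for (s, a) in T if a == action for sp in T[(s, a)]}
    for s_next in successors:
        # Predicted probability of s_next before seeing the observation.
        prior = sum(T[(s, action)].get(s_next, 0.0) * p
                    for (s, p) in belief.items())
        # Weight by the likelihood of the received observation.
        new_belief[s_next] = O[(s_next, action)].get(observation, 0.0) * prior
    norm = sum(new_belief.values())
    return {s: p / norm for s, p in new_belief.items()} if norm else new_belief
```

When communication is not free, each agent can no longer perform this update over a shared belief, which is what motivates the execution-time reasoning described next.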

This thesis explores the question of how communication can be used effectively to enable the coordination of cooperative multi-agent teams making sequential decisions under uncertainty and partial observability. We identify two fundamental questions that must be answered when reasoning about communication: "When should agents communicate?" and "What should agents communicate?" We present two basic approaches to enabling a team of distributed agents to Avoid Coordination Errors. The first is an algorithm that Avoids Coordination Errors by reasoning over Possible Joint Beliefs (ACE-PJB). We contribute ACE-PJB-COMM, which addresses the question of when agents should communicate. SELECTIVE ACE-PJB-COMM, which answers the question of what agents should communicate, is an algorithm that selects the most valuable subset of observations from an agent's observation history.
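The when-to-communicate question can be illustrated by a much-simplified value-of-communication test (a sketch of the general idea only, not the ACE-PJB-COMM algorithm itself; all names and the scalar cost model are assumptions): an agent communicates when acting on its private information would improve on the action the team is about to take by more than the cost of communicating.

```python
# Simplified value-of-communication test (illustrative, not ACE-PJB-COMM).
# q_values: dict action -> expected value under the agent's private belief
# team_action: the action the team would take without communication
# private_action: the action the agent prefers given its own observations

def should_communicate(q_values, team_action, private_action, comm_cost):
    # Communicate only if the expected gain from changing the team's
    # action outweighs the cost of the communication itself.
    gain = q_values[private_action] - q_values[team_action]
    return gain > comm_cost
```

A rule of this shape keeps silent agents coordinated on the shared plan, while spending communication only where private observations materially change the best joint choice.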

The second basic coordination approach presented in this thesis is an algorithm that Avoids Coordination Errors during execution of an Individual Factored Policy (ACE-IFP). Factored policies provide a means for determining which state features agents should communicate, answering the questions of when and what agents should communicate. Additionally, we use factored policies to identify instances of context-specific independence, in which agents can choose actions without needing to consider the actions or observations of their teammates.
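Context-specific independence in a factored policy can be pictured as a decision tree over state features: along some branches no teammate feature is ever tested, so in those contexts the agent can act without communicating. The sketch below (illustrative only; the tuple-based tree encoding and feature names are assumptions, not the thesis representation) walks such a tree and reports which features a given state actually required.

```python
# Illustrative tree-structured factored policy.
# A node is either ('leaf', action) or ('test', feature, {value: subtree}).

def tree_policy(node, state):
    """Follow feature tests down to a leaf and return its action."""
    while node[0] == 'test':
        _, feature, children = node
        node = children[state[feature]]
    return node[1]

def features_needed(node, state):
    """Features queried on this state's path through the tree. If no
    teammate feature appears, the action choice is context-specifically
    independent of teammates, and no communication is needed."""
    needed = []
    while node[0] == 'test':
        _, feature, children = node
        needed.append(feature)
        node = children[state[feature]]
    return needed
```

For example, a tree that tests `my_loc` first and only consults `teammate_loc` on one branch lets the agent act alone whenever its own location already determines the action.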