Generalizing Inverse Reinforcement Learning for the Real World

Prashant Doshi

About the Event

Inverse reinforcement learning (IRL) investigates ways by which a learner may approximate the preferences of an expert by observing the expert’s actions over time. Usually, the expert is modeled as optimizing its actions using a Markov decision process (MDP), whose parameters except for the reward function are known to the learner. IRL is enjoying much success and attention in applications such as imitation learning and robots learning tasks from demonstrations by human experts.

However, operational settings for traditional IRL are restrictive. In this talk, I will present methods for generalizing IRL to contexts that involve multiple observed robots, which may interact and whose observed trajectories are partially occluded from the learner. I will also explore generalizing IRL to settings where additional parameters of others' MDPs are unknown. These are all challenges encountered in real-world contexts. We evaluate these methods in the context of an application setting involving two mobile robots executing simple cyclic trajectories for perimeter patrolling. Both robots’ patrolling motions are disturbed when they approach each other in narrow corridors, leading to an interaction. A subject robot observes them from a hidden vantage point that affords only partial observability of their trajectories. Its task is to penetrate the patrols and reach a goal location without being spotted. Thus, its eventual actions do not impact the other robots. Videos of demonstration runs will be shown.

Biography

Dr. Prashant Doshi is an Associate Professor of Computer Science, director of the THINC Lab (http://thinc.cs.uga.edu), and a faculty member of the AI Institute at the University of Georgia, USA. He is also serving as the founding director of the Faculty of Robotics at UGA (http://robotics.uga.edu), an OVPR initiative. He received his Ph.D. in Computer Science from the University of Illinois at Chicago in 2005. His research interests lie in decision making under uncertainty in multiagent settings, robotics, and the Semantic Web. His research has been funded by grants from NSF (including the CAREER in 2008), AFOSR, ARO, and ONR. He received UGA's Creative Research Medal in 2011 for his contributions to automated decision making. He has published extensively in JAIR, conferences such as AAAI and AAMAS, and other forums in the fields of agents and AI. Currently, he is on sabbatical with the Cheriton School of Computer Science at the University of Waterloo.