Safe Learning Agents

Papers from the AAAI Spring Symposium

Since Weld and Etzioni's "The First Law of Robotics" at AAAI in 1994, there has been growing concern with the safety of deploying intelligent agents in the real world. Perhaps HAL in Kubrick's 2001: A Space Odyssey is the best image of such an agent gone wrong.

One area often missing from such discussions is the safety of learning agents. This is an important omission since learning/adaptation is a component in most definitions of what it means to be an agent. Some recent work has begun to address some of the issues involved, but the field is still in the initial stages of defining the problem.

A safe agent is one that can efficiently find and execute acceptable solutions to its target problems. Learning can adversely affect which problems the agent can solve, the efficiency with which it comes up with plans to solve them, and the quality of those solutions. Thus learning can cause a "safe" agent to become "unsafe."

Since we would like agents to learn from their environment, either the agent must be able to quickly and cheaply check that its new learning hasn't made it unsafe, or the learning method must guarantee that it preserves the agent's safety.
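The first option, a quick post-learning safety check, can be illustrated with a minimal sketch. Everything here is hypothetical: the agent is reduced to a solve function, "safety" is reduced to coverage on a small benchmark suite, and the names (`is_update_safe`, `tolerance`) are invented for illustration.

```python
def is_update_safe(current_solve, updated_solve, problems, tolerance=0.0):
    """Accept a learned update only if coverage on a benchmark suite
    does not drop by more than `tolerance` (a hypothetical safety check;
    efficiency and solution quality would need analogous checks)."""
    def coverage(solve):
        # Fraction of benchmark problems for which a solution is found.
        return sum(1 for p in problems if solve(p) is not None) / len(problems)
    return coverage(updated_solve) >= coverage(current_solve) - tolerance

# Toy usage: the learned update loses coverage on odd-numbered problems,
# so the check rejects it and the agent keeps its old behavior.
baseline = lambda p: p                           # "solves" every problem
learned = lambda p: p if p % 2 == 0 else None    # solves only even problems
print(is_update_safe(baseline, learned, list(range(10))))  # False
```

Of course, a real agent cannot usually afford to re-solve a representative benchmark after every update, which is exactly why the second option, learning methods with built-in guarantees, is attractive.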

Unfortunately, for problems in general, it is unlikely that any learning method can guarantee it won't decrease coverage, efficiency, or solution quality. So, at least initially, we must identify learning methods that make guarantees with respect to at least one of these performance dimensions and/or with respect to some restricted class of problems. While it may not be possible to guarantee monotonicity along these dimensions, we may be able to bound the degradation in some of them.

In this symposium we are interested in people's experiences with agent learning going wrong, in how to prevent it, and in what end-users both want and fear from learning agents.