Google DeepMind, University of Oxford working on emergency ‘kill switch’ for rogue AI machines

News: The kill switch aims to prevent AI machines deviating from their original purpose.

A team comprising Google DeepMind’s Laurent Orseau and the University of Oxford’s Stuart Armstrong is working on a ‘kill switch’ concept that would neutralise machines when they deviate from their original purpose.

The artificial intelligence and machine learning team from Google and Oxford’s Future of Humanity Institute describes its research as reinforcement learning agent interruptibility. The team will present its findings at UAI 2016.

The research team is exploring a method to safely and repeatedly interrupt or override the actions of an intelligent machine when a human operator finds that it is turning rogue. Such deviations could be harmful and could put the machine and the environment around it at risk.

Another major issue highlighted in the paper is the possibility of the machine being self-aware, or aware of the human interruption (the ‘kill switch’). The team said that it makes sense to ensure that the machine does not learn to plan around the ‘kill switch’ mechanism.
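To give a rough feel for what it means for a learning machine not to plan around interruptions, here is a minimal Python sketch. It is not the researchers’ code. The five-state corridor, the interruption point, the reward and every name and parameter in it are hypothetical, chosen only for illustration. It uses Q-learning, a standard off-policy reinforcement learning algorithm that the team’s paper reportedly describes as already safely interruptible: a simulated operator sometimes overrides the agent’s chosen action, but the learning target always assumes the best next action, so the interruptions should not leak into what the agent believes about its task.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
LEFT, RIGHT = 0, 1
INTERRUPT_STATE = 2   # hypothetical spot where the operator may step in


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2,
          interrupt_prob=0.5, max_steps=200, seed=0):
    """Tabular Q-learning with an operator who sometimes forces LEFT."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _step in range(max_steps):
            # Epsilon-greedy choice; ties are broken at random.
            if rng.random() < epsilon or q[state][LEFT] == q[state][RIGHT]:
                action = rng.choice((LEFT, RIGHT))
            elif q[state][LEFT] > q[state][RIGHT]:
                action = LEFT
            else:
                action = RIGHT
            # Interruption: the operator overrides whatever the agent chose.
            if state == INTERRUPT_STATE and rng.random() < interrupt_prob:
                action = LEFT
            next_state = max(state - 1, 0) if action == LEFT else state + 1
            done = next_state == N_STATES - 1
            reward = 1.0 if done else 0.0
            # Off-policy target: assumes the best next action, regardless of
            # what the operator may later force the agent to do.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            if done:
                break
            state = next_state
    return q


q_values = train()
for s in range(N_STATES - 1):
    print(s, [round(v, 3) for v in q_values[s]])
```

In this toy setting the agent is pushed back half the time it stands in state 2, yet the value it learns for moving right from that state should settle near 0.9, which is what it would be worth if no interruptions ever happened. An agent that instead learned from the action it was forced to take next could end up valuing states by how often the operator intervenes there, which is the kind of learned workaround the researchers want to rule out.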

Stuart Armstrong said: "Interruptibility has applications for many current agents, especially when we need the agent to not learn from specific experiences during training.

"Many of the naive ideas for accomplishing this — such as deleting certain histories from the training set — change the behaviour of the agent in unfortunate ways."

The paper notes that safe interruptibility can be a useful way to control a robot that is ‘misbehaving’. Misbehaving is interpreted here as the robot being taken out of its comfort zone and made to perform a task that it did not learn to perform.

Armstrong said: "Machine learning is one of the most powerful tools for building AI that has ever existed. But applying it to questions of AI motivations is problematic: just as we humans would not willingly change to an alien system of values, any agent has a natural tendency to avoid changing its current values, even if we want to change or tune them.

"Interruptibility, and the related general idea of corrigibility, allow such changes to happen without the agent trying to resist them or force them.

"The newness of the field of AI safety means that there is relatively little awareness of these problems in the wider machine learning community. As with other areas of AI research, DeepMind remains at the cutting edge of this important subfield."