How Disney shows an AI apocalypse is possible

I want to convince you of something: that an ‘AI apocalypse’ is not a ridiculous thing to worry about.

Sure, there are other, more near-future things to worry about involving artificial intelligence (AI) – including privacy and surveillance, and the use of AI-controlled weapons on the battlefield. But we can worry about more than one thing at a time. And while the idea of AI destroying humanity is, I think, not likely, it is not so improbable that we can dismiss it, as some people do, as quasi-religious mumbo-jumbo or bad sci-fi.

For the last year or so, I have been writing a book about the possibility. The thing I have come to dread, in that time, is people saying “Oh! Like in The Terminator?”

This is, sadly, the go-to reference. The image that people have in their minds, when they wonder whether artificial intelligence might cause a real, existential threat to humanity, is Skynet. Every article on the subject is illustrated with a grinning metal android.

Unfortunately, it’s spectacularly unhelpful. The risk is not that AI might become ‘self-aware’, or that it might turn against its creators, or that it will ‘go rogue’ and break its programming. The risk is that, instead, it will become competent.

The risk is that it will do exactly what it is asked to do, but it will do it too well: that completing what sounds like a simple task to a human could have devastating unforeseen consequences.

Here’s roughly how that could go. One group that worries about ‘AI safety’, as it’s known, is the Machine Intelligence Research Institute (MIRI) in Berkeley, California. Their executive director, Nate Soares, once gave a talk at Google in which he suggested that, instead of The Terminator, a better fictional analogy would be Disney’s Fantasia.

Mickey, the Sorcerer’s Apprentice, is asked to fill a cauldron with water. When the Sorcerer leaves, Mickey enchants a broom to do it for him, and goes to sleep. Inevitably enough, the broom obeys him perfectly, eventually flooding the entire room and tipping Mickey into the water.

Of course, if Mickey simply told the broom to keep bringing water and never stop, then he’d only have himself to blame. But even if he’d told the broom to bring the water until the cauldron was full, it would probably still have gone terribly wrong. Imagine the broom filled it until the water was four inches from the top. Is that ‘full’? How about one inch? The broom isn’t sure.

Well, surely when it’s right at the top, and water is splashing on the floor, the broom is sure? Well, probably 99.99% sure. But, crucially, not completely sure. It can’t do any harm to add more water, in case, say, its eyes are deceiving it, or the cauldron has a leak. You haven’t told the broom to “fill the cauldron until you’re pretty sure it’s full”, you’ve just said “fill it until it’s full”.

A human would know that other things – not flooding the room, for instance – are more important than ever-more-tiny increments of certainty about how full the cauldron is. But when you ‘programmed’ your broom ‘AI’, you didn’t mention that. The broom cares about nothing else but the fullness of the cauldron.
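For readers who like to see this sort of thing spelled out, here is a toy sketch of the broom's reasoning in Python. Everything in it is made up for illustration: the function names, the ten-bucket capacity, and the formula for the broom's confidence are assumptions, not anything from Soares's talk. The only point is that if the broom is scored on nothing except how sure it is that the cauldron is full, one more bucket always looks like an improvement.

```python
CAPACITY = 10  # hypothetical: buckets the cauldron holds before it overflows


def confidence_full(buckets: int) -> float:
    """Assumed belief that the cauldron is full: creeps towards 1, never reaches it."""
    return 1.0 - 0.5 ** buckets


def run_broom(flood_penalty: float, max_steps: int = 40) -> int:
    """Return how many buckets the broom carries before it decides to stop.

    flood_penalty is how much the broom "cares" about each bucket spilled
    on the floor. Zero means it cares only about the cauldron being full.
    """

    def score(buckets: int) -> float:
        spilled = max(0, buckets - CAPACITY)
        return confidence_full(buckets) - flood_penalty * spilled

    buckets = 0
    while buckets < max_steps:
        # Stop only when one more bucket would not improve the score.
        if score(buckets + 1) <= score(buckets):
            break
        buckets += 1
    return buckets


print(run_broom(flood_penalty=0.0))   # only halted by the max_steps cut-off
print(run_broom(flood_penalty=0.01))  # stops once the cauldron is at capacity
```

With no penalty for flooding, the broom only stops because the sketch cuts it off after 40 buckets. Give it even a small reason to care about the floor and it stops at 10.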

What we humans think of as simple goals are actually surrounded by other, much more complex, considerations, and unless you tell the AI, it won’t know that.

There are other problems. For instance, the goal of ‘fill the cauldron’ is most easily completed if you, the broom ‘AI’, are not destroyed, or switched off, or given another new goal. So almost any AI would be incentivised to stop you from switching it off or destroying it – either by fighting back, or perhaps by copying itself elsewhere.

And almost any goal you are given, you could probably do better with more resources and more brainpower, so it makes sense to accumulate more of both. Eliezer Yudkowsky, also of MIRI, has a saying: “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”

Steve Omohundro, an AI researcher, suggests that even something as harmless-sounding as a chess-playing AI, simply ordered to become as good at chess as possible, could be very dangerous, if precautions weren’t taken. It would, for instance, be in its interests to acquire unlimited amounts of matter to build more computers out of, to enable it to think ever more deeply about chess. That may not strike you as inherently dangerous, but if you consider that you are made of matter, and so is the Earth, you may see the potential problem. The fear is that a powerful, “superintelligent” AI could literally end human life, while obeying its innocuous-seeming instructions to the letter.

I know, coming to this stuff cold, it sounds silly. But it appears that AI researchers take it seriously. The standard undergraduate AI textbook, Artificial Intelligence: A Modern Approach, dedicates three pages to this sort of AI risk.

Shane Legg and Demis Hassabis, the founders of Google’s DeepMind AI firm, are on record saying it’s a serious risk, and DeepMind has collaborated on research into ways to prevent it. Surveys of AI researchers find that a majority of them think that superintelligent AI will arrive in the lifetimes of people alive now, and that there is a strong possibility – roughly a 1 in 5 chance – that it will lead to something “extremely bad (existential catastrophe)”, i.e. human extinction.

And you can see hints of similar things happening now, at a much smaller, funnier scale. A paper published on arXiv in March gave examples of ways that AIs go wrong, by obeying the task perfectly, but in unexpected ways. The AIs were developed by evolutionary techniques, so the programmers didn’t know how they worked.

One was supposed to fix a piece of software that, due to a bug, was putting the lists it was sorting in the wrong order. The AI went in, did its thing, and the programmers found that the piece of software started returning lists that scored perfectly – apparently in perfect order. Suspicious, they went and looked, and found that the AI had simply broken the software it was meant to be fixing; the software now returned empty lists, and an empty list can’t be out of order.
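It is easy to see how a test could be fooled this way. The sketch below is not the test from the paper; it is a hypothetical reconstruction, with invented function names, of a check that only asks "is anything out of order?" alongside a stricter one that also asks "are all the original items still there?"

```python
from collections import Counter


def out_of_order_pairs(output: list) -> int:
    """Count neighbouring pairs where the earlier item is bigger than the later one."""
    return sum(1 for a, b in zip(output, output[1:]) if a > b)


def naive_check(original: list, output: list) -> bool:
    """Hypothetical test: passes if nothing in the output is out of order."""
    return out_of_order_pairs(output) == 0


def stricter_check(original: list, output: list) -> bool:
    """Also require the output to contain exactly the items that went in."""
    return naive_check(original, output) and Counter(original) == Counter(output)


def broken_sort(items: list) -> list:
    """Stand-in for the 'fixed' software: it throws everything away."""
    return []


data = [3, 1, 2]
print(naive_check(data, broken_sort(data)))     # True: an empty list has no bad pairs
print(stricter_check(data, broken_sort(data)))  # False: the items have vanished
print(stricter_check(data, sorted(data)))       # True: a real sort passes both
```

The empty list sails through the first check, because the check never said the items had to survive. That unstated requirement is exactly the kind of thing a human takes for granted and a machine does not.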

Another one was supposed to play noughts and crosses. It worked out how to win by playing impossible moves on points billions of squares away from the actual board. Its opponents were forced to try to represent a board with billions of squares in their memory, and, unable to do so, promptly crashed, leaving the cheating AI to win by default.

I’m not saying that this is inevitable. But I do worry that people discount it utterly, because it sounds weird, and because the people who talk about it are easy to dismiss as weird (and they are weird; please do read my book, The Rationalists: AI and the geeks who want to save the world, to learn more about them! Out in spring 2019).

But remember that actual AI researchers seem to think there’s a risk. Imagine they’ve got it wrong: let’s say that their guess of a one in five chance is a massive overestimate. Let’s say that, from a combination of sampling error in the survey and bad guesswork by the researchers, they’re out by a factor of 20, and the true chance is only one in 100.

There’s about a one in 100 chance that you’ll be killed at some point in your lifetime in a motor accident. That is not something we think it is ridiculous to worry about. Just because the people saying something are weird, doesn’t mean they’re wrong.