Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk”

My forthcoming paper, “Disjunctive Scenarios of Catastrophic AI Risk”, attempts to introduce a number of considerations to the analysis of potential risks from Artificial General Intelligence (AGI). As the paper is long and occasionally makes for somewhat dry reading, I thought that I would briefly highlight a few of the key points raised in the paper.

The main idea here is that most of the discussion about the risks of AGI has been framed in terms of a scenario that goes something along the lines of “a research group develops AGI, that AGI develops to become superintelligent, escapes from its creators, and takes over the world”. While that is one scenario that could happen, focusing too much on any single scenario makes us more likely to miss alternative scenarios. It also makes the scenarios susceptible to criticism from people who (correctly!) point out that we are postulating very specific scenarios with lots of burdensome details.

To address that, I discuss here a number of considerations that suggest disjunctive paths to catastrophic outcomes: paths that are of the form “A or B or C could happen, and any one of them happening could have bad consequences”.

Superintelligence versus Crucial Capabilities

Bostrom’s Superintelligence, as well as a number of other sources, basically makes the following argument:

1. An AGI could become superintelligent.
2. Superintelligence would enable the AGI to take over the world.

This is an important argument to make and analyze, since superintelligence basically represents the extreme case: if an individual AGI may become as powerful as an agent can possibly get, how do we prepare for that eventuality? As long as there is a plausible chance of such an extreme case being realized, it must be taken into account.

However, it is probably a mistake to focus only on the case of superintelligence. Basically, the reason why we are interested in a superintelligence is that, by assumption, it has the cognitive capabilities necessary for a world takeover. But what about an AGI which likewise had the cognitive capabilities necessary for taking over the world, and only those?

Such an AGI might not count as a superintelligence in the traditional sense, since it would not be superhumanly capable in every domain. Yet, it would still be one that we should be concerned about. If we focus too much on just the superintelligence case, we might miss the emergence of a “dumb” AGI which nevertheless had the crucial capabilities necessary for a world takeover.

That raises the question of what such crucial capabilities might be. I don’t have a comprehensive answer; in my paper, I focus mostly on the kinds of capabilities that could be used to inflict major damage: social manipulation, cyberwarfare, and biological warfare. Others no doubt exist.

A possibly useful framing for future investigations might be: “what level of capability would an AGI need to achieve in a crucial capability in order to be dangerous?”, where the definition of “dangerous” is free to vary based on how serious a risk we are concerned about. One complication is that this is a highly contextual question – with a superintelligence we can assume that the AGI may become essentially omnipotent, but such a simplifying assumption won’t help us here. For example, the level of offensive biowarfare capability that would pose a major risk depends on the level of the world’s defensive biowarfare capabilities. Also, we know that it’s possible to inflict enormous damage on humanity even with just human-level intelligence: whoever is authorized to control the arsenal of a nuclear power could trigger World War III, no superhuman smarts needed.

Crucial capabilities are a disjunctive consideration because they show that superintelligence isn’t the only level of capability that would pose a major risk: there are many different combinations of various capabilities – including ones that we don’t even know about yet – that could pose the same level of danger as superintelligence.

Incidentally, this shows one reason why the common criticism that “superintelligence isn’t something we need to worry about because intelligence isn’t unidimensional” is ill-founded – the AGI doesn’t need to be superintelligent in every dimension of intelligence, just in the ones we care about.

How would the AGI get free and powerful?

In the prototypical AGI risk scenario, we are assuming that the developers of the AGI want to keep it strictly under control, whereas the AGI itself has a motive to break free. This has led to various discussions about the feasibility of “oracle AI” or “AI confinement” – ways to restrict the AGI’s ability to act freely in the world, while still making use of it. This also means that the AGI might have a hard time acquiring the resources that it needs for a world takeover, since it either has to do so while it is under constant supervision by its creators, or while on the run from them.

However, there are also alternative scenarios where the AGI’s creators voluntarily let it free – or even place it in control of e.g. a major corporation, free to use that corporation’s resources as it desires! My chapter discusses several ways by which this could happen: i) economic benefit or competitive pressure, ii) criminal or terrorist reasons, iii) ethical or philosophical reasons, iv) confidence in the AI’s safety, as well as v) desperate circumstances such as being otherwise close to death. See the chapter for more details on each of these. Furthermore, the AGI could remain theoretically confined but be practically in control anyway – such as in a situation where it was officially only giving a corporation advice, but its advice had never been wrong before and nobody wanted to risk their jobs by going against the advice.

Would the Treacherous Turn involve a Decisive Strategic Advantage?

Looking at crucial capabilities in a more fine-grained manner also raises the question of when an AGI would start acting against humanity’s interests. In the typical superintelligence scenario, we assume that it will do so once it is in a position to achieve what Bostrom calls a Decisive Strategic Advantage (DSA): “a level of technological and other advantages sufficient to enable [an AI] to achieve complete world domination”. After all, if you are capable of achieving superintelligence and a DSA, why act any earlier than that?

Even when dealing with superintelligences, however, the case isn’t quite as clear-cut. Suppose that there are two AGI systems, each potentially capable of achieving a DSA if they prepare for long enough. But the longer they prepare, the more likely it becomes that the other AGI sets its plans in motion first and gains the advantage. Thus, if several AGI projects exist, each AGI is incentivized to act at whatever point maximizes its overall probability of success – even if its chances of succeeding in a takeover at that point were rather slim, as long as it thought that waiting longer would make those chances even worse.
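To make that trade-off concrete, here is a minimal toy model of the timing decision. This is my own illustrative sketch rather than anything from the paper, and the growth and preemption rates are made-up assumptions:

```python
# Toy model (illustrative assumptions only, not from the paper): each month of
# preparation raises the AGI's takeover odds, but also the chance that a rival
# AGI acts first. The AGI picks the preparation time that maximizes its
# *overall* probability of success.

def p_success(t, growth=0.05):
    """Probability that a takeover launched after t months of preparation succeeds."""
    return 1 - (1 - growth) ** t      # slowly approaches 1 as preparation continues

def p_not_preempted(t, rival_rate=0.10):
    """Probability that no rival AGI has launched its own plan during those t months."""
    return (1 - rival_rate) ** t      # decays the longer the AGI waits

def best_launch_time(max_t=120):
    """Preparation time (in months) that maximizes the overall success probability."""
    return max(range(1, max_t + 1),
               key=lambda t: p_success(t) * p_not_preempted(t))

t_star = best_launch_time()
print(f"Acts after {t_star} months: takeover odds {p_success(t_star):.0%}, "
      f"overall odds {p_success(t_star) * p_not_preempted(t_star):.0%}")
# With these numbers the AGI acts after about 8 months, when its takeover odds
# are only about a third; with no rivals (rival_rate=0) it would wait far longer.
```

The exact numbers don’t matter; the point is simply that the presence of rivals pushes the optimal launch time earlier, so that acting with slim chances can become the rational choice.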

Indeed, an AGI which defects on its creators may not be going for a world takeover in the first place: it might, for instance, simply be trying to maneuver itself into a position where it can act more autonomously and defeat takeover attempts by other, more powerful AGIs. The threshold for the first treacherous turn could vary quite a bit, depending on the goals and assets of the different AGIs; various considerations are discussed in the paper.

A large reason for analyzing these kinds of scenarios is that, besides caring about existential risks, we also care about catastrophic risks – such as an AGI acting too early and launching a plan which resulted in “merely” hundreds of millions of deaths. My paper introduces the term Major Strategic Advantage, defined as “a level of technological and other advantages sufficient to pose a catastrophic risk to human society”. A catastrophic risk is one that might inflict serious damage to human well-being on a global scale and cause ten million or more fatalities.

“Mere” catastrophic risks could also turn into existential ones, if they contribute to global turbulence (Bostrom et al. 2017), a situation in which existing institutions are challenged, and coordination and long-term planning become more difficult. Global turbulence could then contribute to another out-of-control AI project failing even more catastrophically and causing even more damage.

Summary table and example scenarios

The table below summarizes the various alternatives explored in the paper.

AI’s level of strategic advantage:
- Decisive
- Major

AI’s capability threshold for non-cooperation:
- Very low to very high, depending on various factors

Sources of AI capability:
- Individual takeoff
  - Hardware overhang
  - Speed explosion
  - Intelligence explosion
- Collective takeoff
- Crucial capabilities
  - Biowarfare
  - Cyberwarfare
  - Social manipulation
  - Something else
- Gradual shift in power

Ways for the AI to achieve autonomy:
- Escape
  - Social manipulation
  - Technical weakness
- Voluntarily released
  - Economic or competitive reasons
  - Criminal or terrorist reasons
  - Ethical or philosophical reasons
  - Desperation
  - Confidence
    - in lack of capability
    - in values
- Confined but effectively in control

Number of AIs:
- Single
- Multiple

And here are some example scenarios formed by different combinations of them:

The “classic” AI takeover scenario: an AI is developed, which eventually becomes better at AI design than its programmers. The AI uses this ability to undergo an intelligence explosion, and eventually escapes to the Internet from its confinement. After acquiring sufficient influence and resources in secret, it carries out a strike against humanity, eliminating humanity as a dominant player on Earth so that it can proceed with its own plans unhindered.

Many corporations, governments, and individuals voluntarily turn over functions to AIs, until we are dependent on AI systems. These are initially narrow-AI systems, but continued upgrades push some of them to the level of having general intelligence. Gradually, they start making all the decisions. We know that letting them run things is risky, but now a lot of stuff is built around them, it brings a profit, and they’re really good at giving us nice stuff – for the time being.

Many different actors develop AI systems. Most of these prototypes are unaligned with human values and not yet enormously capable, but many of these AIs reason that some other prototype might be more capable. As a result, they attempt to defect on humanity despite knowing their chances of success to be low, reasoning that they would have an even lower chance of achieving their goals if they did not defect. Society is hit by various out-of-control systems with crucial capabilities that manage to do catastrophic damage before being contained.

Google begins to make decisions about product launches and strategies as guided by their strategic advisor AI. This allows them to become even more powerful and influential than they already are. Nudged by the strategy AI, they start taking increasingly questionable actions that increase their power; they are too powerful for society to put a stop to them. Hard-to-understand code written by the strategy AI detects and subtly sabotages other people’s AI projects, until Google establishes itself as the dominant world power.


One comment

1. It was created, and is maintained, by people.
2. A majority opinion seems to be that it is beneficial to people. Anarchists, hardcore communists, rewilders and such are regarded with suspicion, at least.
3. We don’t really control it, or know how it will react to our actions.
4. It is commonly discussed as having its own will and intentions, though I doubt this is a serious philosophical claim or anything.
5. It has a great effect on our lives, on both local and global scales.

In what sense would this be similar to or different from a powerful AI? To what extent can the economic system be considered to be an AI?
