Abstract

Discourse on AI safety suffers from heated disagreement between those sceptical and those concerned about existential risk from AI. Framing discussion using strategic choice of language is a subtle but potentially powerful method to shape the direction of AI policy and AI research communities. It is argued here that the AI safety community is committing the dual error of frequently using language that hinders constructive dialogue and missing the opportunity to frame discussion using language that assists their aims. It is suggested that the community amend usage of the term ‘AI risk’ and employ more widely the ‘AI accidents’ frame in order to improve external communication, AI policy discussion, and AI research norms.

Contents

Abstract

The state of public discourse on AI safety

Why to care about terminology

Introducing ‘AI accidents’ and why use of ‘AI risk’ can be inaccurate

Why use of ‘AI risk’ is problematic and why use of ‘AI accidents’ is helpful

From the perspective of sceptics

From the perspective of the newcomer to the subject

Shaping policy discussion and research norms

Seizing the opportunity

Footnotes

Works Cited

The state of public discourse on AI safety

Contemporary public discourse on AI safety is often tense. Two technology billionaires have engaged in a regrettable public spat over existential risks from artificial intelligence (Samuelson, 2017); high-profile AI experts have volleyed loud opinion pieces making contradictory calls for concern or for calm (Dafoe & Russell, 2016) (Etzioni, 2016); both factions (the group sceptical of existential risk posed by AI and the group concerned about the risk) grow larger as interest in AI increases, and more voices join the debate. The divide shows little sign of narrowing. If surviving machine superintelligence will require strong coordination or even consensus, humanity’s prospects currently look poor.

In this polarised debate, both factions, especially the AI safety community, should look to ways to facilitate constructive policy dialogue and shape safety-conscious AI research norms. Though it is insufficient on its own, framing discussion using strategic choice of language is a subtle but potentially powerful method to help accomplish these goals (Baum, 2016).

Why to care about terminology

Language choice frames policy debate, assigns the focus of discussion, and thereby influences outcomes. It decides whether the conversation is “Gun control” (liberty reducing) or “Gun violence prevention” (security promoting); “Red tape” or “Safety regulations”; “Military spending” or “Defence spending”. If terminology does not serve discussion well, it should be promptly rectified while the language, the concepts it signifies, and the actions, plans, and institutions guided by those concepts are still relatively plastic. With that in mind, the below advocates that the AI safety community revise its use of the term ‘AI risk’ and employ the ‘AI accidents’ frame more widely.

It will help first to introduce what is argued to be the substantially better term, ‘AI accidents’. The inaccuracy of current language will then be explored, followed by discussion of the problems caused by this inaccuracy and the important opportunities missed by only rarely using the ‘AI accidents’ frame.

Introducing ‘AI accidents’ and why use of ‘AI risk’ can be inaccurate

An AI accident is “unintended and harmful behavior that may emerge from poor design of real-world AI systems” (Amodei, et al., 2016). The earliest description of misaligned AI as an ‘accident’ appears to be in Marvin Minsky’s 1984 afterword to Vernor Vinge's novel, True Names:

“The first risk is that it is always dangerous to try to relieve ourselves of the responsibility of understanding exactly how our wishes will be realized. Whenever we leave the choice of means to any servants we may choose then the greater the range of possible methods we leave to those servants, the more we expose ourselves to accidents and incidents. When we delegate those responsibilities, then we may not realize, before it is too late to turn back, that our goals have been misinterpreted, perhaps even maliciously. We see this in such classic tales of fate as Faust, the Sorcerer's Apprentice, or the Monkey's Paw by W.W. Jacobs.” (Minsky, 1984)

“He (Tallinn) said that in his pessimistic moments he felt he was more likely to die from an AI accident than from cancer or heart disease,” (University of Cambridge, 2012).

There is some evidence that the term was used in the AI safety community prior to this (LessWrong commenter "Snarles", 2010), but other written evidence proved elusive through online search.

The first definition of ‘accidents in machine learning systems’ appears to be provided in the well-known paper Concrete Problems in AI Safety (Amodei, et al., 2016). This is the definition for ‘AI accident’ given above and used here throughout.

Some examples of AI accidents may be illustrative: A self-driving car crash where the algorithm was at fault would be an AI accident; a housekeeping robot cooking the cat for dinner because it was commanded to “Cook something for dinner” would be an AI accident; using algorithms in the justice system that have inadvertently been trained to be racist would be an AI accident; the 2010 Flash Crash or similar future incidents would be an AI accident; deployment of a paperclip maximiser would be an AI accident. There is no presupposed upper bound for the size of AI accidents. AI safety seeks to reduce the risk of AI accidents.

Figure: AI accidents. The relative placement of instances of AI accidents may be subject to debate; the figure is intended for illustration only.

At significant risk of pedantry, close examination of terminology is worthwhile because, despite the appearance of hair-splitting, it yields what will emerge to be useful distinctions.

‘AI risk’ has at least three uses.

‘An AI risk’ - The ‘count noun’ sense, meaning a member of the set of all risks from AI, ‘AI risks’, which can be used interchangeably with ‘dangers from AI’, ‘potential harms of AI’, ‘threats from AI’, etc. Members of the set of AI risks include:

AI accidents

Deliberate misuse of AI systems (e.g. autonomy in weapons systems)

Risks to society deriving from intended use of AI systems, which may result from coordination failures in the deployment of AI (e.g. mass unemployment resulting from automation).

‘AI risk’ – The ‘mass noun’ sense, meaning some amount of risk from AI. In practice, this means to discuss at least one member of the above set of risks, but the source of risk is not implied. It can be used interchangeably with ‘danger from AI’, ‘potential harm of AI’, ‘AI threat’, etc.

Observe that in the third usage, the label used for the second (mass noun) sense is used to refer to an instance of the first (count noun) sense. It would be easy to overlook this small discrepancy of ‘crossed labels’. Nevertheless, below it is argued that using the third sense causes problems and missed opportunities.

Before exploring why use of the third sense might cause problems, note that it has been employed frequently by many of the major institutions in the AI safety community (although the accurate senses are used even more commonly) [2]:

Why use of ‘AI risk’ is problematic and why use of ‘AI accidents’ is helpful

Use of the third sense could be defended on several grounds. It is conveniently short. In a way, it is not even especially inaccurate; if, like many in the AI safety community, one believes that the vast majority of AI risk comes from catastrophic AI accidents, one could be excused for conflating the labels.

Problems arise in the combination of the generality of the mass noun sense and the inaccuracy of the third use. An additional issue is the missed opportunity of not using ‘AI accidents’.

Generality: A key issue is that general terms like ‘AI risk’, ‘AI threat’, etc., when used in their mass noun senses, conjure the most available instances of ‘AI risk’, thus summoning in many listeners images of existential catastrophes induced by artificial superintelligence – this is perhaps one reason why the AI safety community came to employ the third, inaccurate use of ‘AI risk’. The generality of the term permits the psychological availability of existential risks from superintelligent AI to overshadow less sensational risks and accidents. A member of the AI safety community will not necessarily find this problematic; a catastrophic AI accident is indeed their main concern, so they might understandably not care much if general terms like ‘AI risk’, ‘AI threat’, etc. conjure their highest priority risk specifically. There are two groups for which this usage may cause problems: (1) sceptics of risks from catastrophic AI accidents and (2) newcomers to the subject who have not yet given issues surrounding AI much serious consideration. Aside from causing issues, not using a strategically selected frame misses opportunities to influence how groups such as policymakers and AI researchers think about existential risks from AI; using the AI accident frame should prove beneficial.

From the perspective of sceptics

Inaccuracy: Most informed sceptics of catastrophic AI accidents are plausibly still somewhat concerned about small AI accidents and other risks, but they may find it difficult to agree that they are concerned with what the AI safety community might, by the third sense, refer to as ‘AI risk’. The disagreement with ‘AI risk’ (third sense) does not reflect the fact that the two groups are in broad agreement on most risks, disagreeing only on risk from a part of the AI accidents scale. The crossed labels create the illusion of discord regarding mitigation of ‘AI risk’. The confusion helps drive the chorus of retorts that safety-proponents are wrong about ‘AI risk’ and that AI accidents are the ‘wrong risk’ to focus on (Sinders, 2017) (Nogrady, 2016) (Madrigal, 2015) (Etzioni, 2016), and presents AI safety work, which in fact mitigates risk of AI accidents of any size, as the domain of the superintelligence-concerned uniquely.

With the ‘AI accidents’ frame, otherwise-opposing factions can claim to be concerned with different but overlapping areas on the scale of AI accidents; the difference between those concerned about catastrophic AI accidents and those who are not is simply that the former camp sees reason to be cautious about the prospect of AI systems of arbitrary levels of capability or misalignment, while the latter chooses to discount perceived risk at higher levels of these scales. To observers of the debate, this partial unity is much easier to see within the AI accident frame than when the debate concerns ‘AI risk’ or ‘existential risk from AI’. There does not need to be agreement about the probability of accidents on the upper end of the scale to have consensus on the need to prevent smaller ones, thereby facilitating agreement to prioritize research that prevents AI accidents in general.

With both factions now working within the same conceptual category, the primary disagreement between groups becomes only the scope of their concerns rather than the existence of a principal concept. Using the ‘AI accidents’ frame helps find common ground where ‘AI risk’ struggles.

From the perspective of the newcomer to the subject

Missed opportunity: We should conservatively assume that a newcomer to the subject holds priors that are sceptical of existential risks from artificial superintelligence. For these individuals, current language misses an opportunity for sound communication. What ‘AI risk’ and even ‘existential risk from artificial superintelligence’ omit to communicate is the fundamental nature of the risk: that the true risk is of the simple accident of deploying a singularly capable machine with a poorly designed objective function – not something malicious or fantastical. This central point is not communicated by the label, giving the priors of the newcomer free rein over the interpretation, facilitating the ‘dismissal by science fiction’.

Using ‘AI accidents’, it is directly implied that the risk involves no malicious intent. Moreover, one can point to existing examples of AI accidents, such as racist algorithms or the 2010 Flash Crash. AI accidents slightly higher on the capability scale are believable accidents: a housekeeping robot cooking the cat for dinner is an accident well within reach of imagination; likewise the AI that fosters war to maximise its profit objective. Using ‘AI accidents’ thus creates a continuous scale populated by existing examples and facilitates arrival at the comprehension of misaligned superintelligence by simple, believable steps of induction. The framing as an accident on the upper, yet-to-be-realised part of a scale arguably makes the idea feel more tangible than ‘existential risk’.

Shaping policy discussion and research norms

Missed opportunity: This reframing should confer some immediate practical benefits. Since most policy-making organisations are likely to be composed of a mix of sceptics, the concerned, and newcomers to the subject, it may be socially difficult to have frank policy discussion on potential risks from artificial superintelligence; an ill-received suggestion of existential risk from AI may be dismissed as science fiction or ridiculed. If it exists, this difficulty would be especially marked in organizations with pronounced hierarchy (a common attribute of e.g. governments), where there is a greater perceived social cost to making poorly received suggestions. In such organizations, concerns of existential risk from artificial superintelligence may thus be omitted from policy discussion or relegated to a weaker mention than if framed in terms of AI accidents involving arbitrary levels of intelligence. The ‘AI accidents’ frame automatically introduces large scale AI accidents, making it an opt-out discussion item, rather than opt in.

“I think the right approach is to build the issue directly into how practitioners define what they do. No one in civil engineering talks about “building bridges that don't fall down.” They just call it “building bridges.” Essentially all fusion researchers work on containment as a matter of course; uncontained fusion reactions just aren't useful. Right now we have to say “AI that is probably beneficial,” but eventually that will just be called “AI.” [We must] redirect the field away from its current goal of building pure intelligence for its own sake, regardless of the associated objectives and their consequences.” (Bohannon, 2017).

How to realise Russell’s edict? Seth Baum discusses framing as an ‘intrinsic measure’ to influence social norms in AI research to pursue beneficial designs and highlights the importance not only of what is said, but how something is said (Baum, 2016). For engineers, it would be strangely vague to talk about ‘car risk’, ‘bridge risk’ or other broad terms. Instead, they talk about reducing the risk of car accidents or bridge collapses – referring explicitly to the event that they are responsible for mitigating and precluding competing ideas, e.g. the risks from mass use of cars on air pollution, or from disruption to a horse-driven economy. The same should be true for AI. The ‘AI accidents’ frame moves thinking away from abstract argument and analogy and brings the salient concepts closer to the material realm. Giving AI researchers a clear, available, and tangible idea of the class of events they should design to avoid will be important to engender safe AI research norms.

Seizing the opportunity

The count noun and mass noun senses of ‘AI risk’ and ‘existential risk from AI’ etc. still have their place. But opportunities should be sought for the ‘AI accidents’ frame where it is appropriate. Without being prescriptive (and cognisant that not all catastrophic AI risks are of catastrophic AI accidents), instead of ‘reducing AI risk’ or ‘reducing existential risk from AI’, the policy, strategy, and technical AI safety community would claim to work on reducing the risk of AI accidents, at least where they are not also working on other risks.

Shifting established linguistic habits requires effort. The AI safety community is relatively small and cohesive, so establishing this subtle but potentially powerful change in frame at a community level could be an achievable aim. By driving a shift in terminology, a goal of wider adoption by other groups such as policy makers, journalists, and AI researchers is within reach.

Footnotes

[1] For comment and review, I am grateful to Nick Robinson, Hannah Todd, and Jakob Graabak.

[2] In some links, other AI risks are discussed elsewhere in the texts, but nevertheless the sense in which ‘AI risk’ was used was actually the third sense. The list is not exhaustive.

Meh, that makes it sound too narrowly technical - there are a lot of ways that advanced AI can cause problems, and they don't all fit into the narrow paradigm of a system running into bugs/accidents that can be fixed with better programming.

This seems unnecessarily rude to me, and doesn't engage with the post. For example, I don't see the post anywhere characterising accidents as only coming from bugs in code, and it seems like this dismissal of the phrase 'AI accidents' would apply equally to 'AI risk'.

For example, I don't see the post anywhere characterising accidents as only coming from bugs in code, and it seems like this dismissal of the phrase 'AI accidents' would apply equally to 'AI risk'.

But I didn't say that the author is characterizing accidents as coming from bugs in code. I said that the language he is proposing has that effect. The author didn't address this potential problem, so there was nothing for me to engage with.

it seems like this dismissal of the phrase 'AI accidents' would apply equally to 'AI risk'.

It does in fact apply, since AI risk neglects important topics in AI ethics, but it doesn't apply as strongly as it would for "AI accidents."

Hi Kyle, I think that it's worth us all putting effort into being friendly and polite on this forum, especially when we disagree with one another. I didn't find your first comment informative or polite, and just commented to explain why I down-voted it.

I didn't find your first comment informative or polite, and just commented to explain why I down-voted it.

Yeah, and now I'm commenting to explain why I downvoted yours, and how you are failing to communicate a convincing point. If you found my first comment "rude" or impolite then you've lost your grip on ordinary conversation. Saying "meh" is not rude, yikes.

Thanks Ben, for telling us that communities of do-gooders should be considerate. But I wasn't inconsiderate. If you linked an article titled "why communities of do-gooders should be so insanely fragile that they can't handle a small bit of criticism" then it would be relevant.

I agree that more of both is needed. Both need to be instantiated in actual code, though. And both are useless if researchers don't care to implement them.

I admit I would benefit from some clarification on your point - are you arguing that the article assumes a bug-free AI won't cause AI accidents? Is it the case that this arose from Amodei et al.'s definition: “unintended and harmful behavior that may emerge from poor design of real-world AI systems”? Poor design of real-world AI systems isn't limited to bugs, but I can see why this might have caused confusion.

are you arguing that the article assumes a bug-free AI won't cause AI accidents?

I'm not - I'm saying that when you phrase it as accidents then it creates flawed perceptions about the nature and scope of the problem. An accident sounds like a one-time event that a system causes in the course of its performance; AI risk is about systems whose performance itself is fundamentally destructive. Accidents are aberrations from normal system behavior; the core idea of AI risk is that any known specification of system behavior, when followed comprehensively by advanced AI, is not going to work.

You will have to be sure that the researchers actually know what you mean though. AI researchers are already concerned about accidents in the narrow sense, and they could respond positively to the idea of preventing AI accidents merely because they have something else in mind (like keeping self driving cars safe or something like that).

If you accept this switch to language that is appealing at the expense of precision then eventually you will reach a motte-and-bailey situation where the motte is the broad idea of 'preventing accidents' and the bailey is the specific long-term AGI scheme outlined by Bostrom and MIRI. You'll get fewer funny looks, but only by conflating and muddling the issues.

This makes a lot of sense to me - people usually give me a funny look if I mention AI risks. I'll try mentioning "AI accidents" to fellow public policy students and see if that phrase is more intuitive.

However, "AI accidents" don't communicate the scale of a possible disaster. Something like "global catastrophic AI accidents" may be even clearer. Or "permanent loss of control of a hostile AI system".

Potentially money gets misallocated: just as all chemistry got rebranded as nanotech during that phase in the 2000s, if there is money in AI safety, computer departments will rebrand research as AI safety to prevent AI accidents. This might be a problem when governments start to try and fund AI Safety.

I personally want to be able to differentiate different types of work, between AI Safety and AGI Safety. Both are valuable: we are going to be living in a world of AI for a while and it may cause catastrophic problems (including problems that distract us from AGI safety), and learning to mitigate them might help us with AGI Safety. I want us to be able to continue to look at both as potentially separate things, because AI Safety may not help much with AGI Safety.

I think this proposition could do with some refinement. AI safety should be a superset of both AGI safety and narrow-AI safety. Then we don't run into problematic sentences like "AI safety may not help much with AGI Safety", which contradicts how we currently use 'AI safety'.

To address the point on these terms, then:

I don't think AI safety runs the risk of being so attractive that misallocation becomes a big problem. Even if we consider risk of funding misallocation as significant, 'AI risk' seems like a worse term for permitting conflation of work areas.

Yes, it's of course useful to have two different concepts for these two types of work, but this conceptual distinction doesn't go away with a shift toward 'AI accidents' as the subject of these two fields. I don't think a move toward 'AI accidents' awkwardly merges all AI safety work.

But if it did: The outcome we want to avoid is AGI safety getting too little funding. This outcome seems more likely in a world that makes two fields of N-AI safety and AGI safety, given the common dispreference for work on AGI safety. Overflow seems more likely in the N-AI Safety -> AGI Safety direction when they are treated as the same category than when they are treated as different. It doesn't seem beneficial for AGI safety to market the two as separate types of work.

Ultimately, though, I place more weight on the other reasons why I think it's worth reconsidering the terms.

The AGI/narrow AI distinction is beside the point a bit; I'm happy to drop it. I also have an AI/IA bugbear so I'm used to not liking how things are talked about.

Part of the trouble is we have lost the marketing war before it even began: every vaguely advanced technology we have currently is marketing itself as AI, which leaves no space for anything else.

AI accidents brings to my mind trying to prevent robots crashing into things. 90% of robotics work could be classed as AI accident prevention because they are always crashing into things.

It is not just funding confusion that might be a problem. If I'm reading a journal on AI safety or taking a class on AI safety what should I expect? Robot mishaps or the alignment problem? How will we make sure the next generation of people can find the worthwhile papers/courses?

'AI risks' is not perfect, but at least it is not that.

Perhaps we should take a hard left and say that we are looking at studying Artificial Intelligence Motivation? People know that an incorrectly motivated person is bad and that figuring out how to motivate AIs might be important. It covers the alignment problem and the control problem.

Most AI doesn't look like it has any form of motivation and is harder to rebrand as such, so it is easier to steer funding to the right people and tell people what research to read.

It doesn't cover my IA gripe, which briefly is: AI makes people think of separate entities with their own goals/moral worth. I think we want to avoid that as much as possible. General Intelligence augmentation requires its own motivation work, but one so that the motivation of the human is inherited by the computer that human is augmenting. I think that my best hope is that AGI work might move in that direction.

AI accidents brings to my mind trying to prevent robots crashing into things. 90% of robotics work could be classed as AI accident prevention because they are always crashing into things.

It is not just funding confusion that might be a problem. If I'm reading a journal on AI safety or taking a class on AI safety what should I expect? Robot mishaps or the alignment problem? How will we make sure the next generation of people can find the worthwhile papers/courses?

I take the point. This is a potential outcome, and I see the apprehension, but I think it's probably a low risk that users will grow to mistake robotics and hardware accidents for AI accidents (and work that mitigates each) - sufficiently low that I'd argue expected value favours the accident frame. Of course, I recognize that I'm probably invested in that direction.

Perhaps we should take a hard left and say that we are looking at studying Artificial Intelligence Motivation? People know that an incorrectly motivated person is bad and that figuring out how to motivate AIs might be important. It covers the alignment problem and the control problem.

Most AI doesn't look like it has any form of motivation and is harder to rebrand as such, so it is easier to steer funding to the right people and tell people what research to read.

I think this steers close to an older debate on AI “safety” vs “control” vs “alignment”. I wasn't a member of that discussion so am hesitant to reenact concluded debates (I've found it difficult to find resources on that topic other than what I've linked - I'd be grateful to be directed to more). I personally disfavour 'motivation' on grounds of risk of anthropomorphism.

I take the point. This is a potential outcome, and I see the apprehension, but I think it's probably a low risk that users will grow to mistake robotics and hardware accidents for AI accidents (and work that mitigates each) - sufficiently low that I'd argue expected value favours the accident frame. Of course, I recognize that I'm probably invested in that direction.

I would do some research into how well sciences that have suffered brand dilution do.

As far as I understand it Research institutions have high incentives to

You have to frame things with that in mind, give incentives so that people do the hard stuff and can be recognized for doing the hard stuff.

Nanotech is a classic case of a diluted research path. If you have contacts, maybe try and talk to Eric Drexler; he is interested in AI safety so might be interested in how the AI Safety research is framed.

I think this steers close to an older debate on AI “safety” vs “control” vs “alignment”. I wasn't a member of that discussion so am hesitant to reenact concluded debates (I've found it difficult to find resources on that topic other than what I've linked - I'd be grateful to be directed to more). I personally disfavour 'motivation' on grounds of risk of anthropomorphism.

Fair enough, I'm not wedded to motivation (I see animals having motivation as well, so not strictly human). It doesn't seem to cover phototaxis, which seems like the simplest thing we want to worry about. So that is an argument against motivation. I'm worded out at the moment. I'll see if my brain thinks of anything better in a bit.