Rough blog posts to summarize my research as I try to build a model of how things work that's good enough for me to start acting in the world.

Paths to singleton: a hierarchical conceptual framework

There are lots of arguments out there about AI risk and likely AI takeoff scenarios, and it’s often hard to compare them, because they tacitly make very different assumptions about how the world works. This is an attempt to bridge those gaps by constructing a hierarchical conceptual framework that:

Contextualizes differing arguments as the result of differing world models.

Provides an underlying world model, for which those differing models are special cases.

The framework is hierarchical in order to make it clear when and how any given argument or piece of evidence is relevant to the proximate and ultimate outcomes described.

These goals are very ambitious, and this is only a first attempt. It is of limited scope, in order to keep its complexity manageable; it does not explicitly address questions of AI safety. Instead, it deals with the comparatively circumscribed question of what an AI takeoff is likely to look like, strictly in terms of how many and what kinds of agents are likely to acquire and retain power, and by what means.

It begins with an overview giving a very rough summary of the domain to be covered, and a sketch of an argument for why this domain is relevant. These are followed by a more detailed approach dividing up the conceptual space and exploring each possibility in the specified disjunctions.

This is not an empirical investigation. No ultimate conclusions about outcomes are drawn. The point is to build intuitions for what kind of observations or arguments are evidence in what direction, and what outcomes are likely or unlikely to occur together.

Overview

Domain outline

Singleton - A single superintelligent agent controls everything. A singleton, once achieved, is overwhelmingly likely to be a stable outcome.

Stable multipolar outcome - In a multipolar scenario, multiple agents each have control over some resources. A multipolar scenario could persist stably, or could ultimately produce a singleton. Humanity’s current situation could be described as a potentially unstable multipolar arrangement.

A simple way to model a multipolar situation is to assume that agents attempt to maximize the rate at which their power increases. There are three possible dynamics that could emerge from this model:

A singleton is infeasible or prohibitively expensive.

A singleton is so feasible that it is likely to happen even if no one deliberately sets out to create one.

A singleton is feasible, but would take an extraordinary effort to achieve.

However, if an agent with foresight expects that a singleton is possible but not inevitable, that agent might make an extraordinary investment to capture the gains of controlling a singleton - or to prevent others from doing the same.

Background assumptions

This framework makes some scope-limiting assumptions that are not explained or justified in this writeup:

Eventually, agent-like artificial general intelligences smarter than humans will be constructed.

These agents can in principle be arbitrarily smart.

These agents will behave as though they have utility functions, whether or not they are constructed in such a way as to make that utility function explicit.

The more intelligent an agent, the more power it can exert to align real-world outcomes with its utility function.

Agents can be constructed to function well with arbitrary utility functions. (The orthogonality thesis.)

Any agent with a utility function will try to maximize the resources with which it can implement that utility function, and minimize threats to the fulfillment of its values. (Convergent instrumental goals, or basic AI drives.)

Agents with the same utility function will coordinate well enough that they can effectively be treated as a single agent.

Any intelligence that does not function as though it had a meaningful utility function is either irrelevant, or a resource that can be controlled by an agent that does.

Most of these, or the intuitions behind them, are explained in Nick Bostrom’s Superintelligence.

This document also does not deal with cases of AI destroying the world through errors of practical judgment or fundamental misunderstanding about the nature of reality. An AI powerful enough to acquire substantial steering power over the future, that makes apparent mistakes relative to its utility function’s overt preferences, is not treated differently here from one that truly prefers the apparently perverse outcome, having full understanding of the consequences of its actions.

Understanding the fastest likely path to a singleton is important for AI safety.

An intelligence explosion might result in a singleton.

Understanding whether and why this might happen should help determine what potential prudential measures would be helpful.

Any established singleton would have both adequate incentive and ability to suppress the creation of any other similarly powerful agents. This simplifies the underlying dynamic in two ways:

Multipolar situations can become singletons, but singletons are always stable.

Of all feasible paths to a singleton, only the fastest one matters.

An AI takeoff might result in a singleton.

The world we inhabit has a variety of agents, of varying levels of power, trading with one another. Why don’t the stronger ones just take everything from the weaker ones? Because it’s expensive to conquer, and the most important resource is often the other person, and a lot of value is destroyed by enslaving them.

It’s conceivable that future AIs - which may occupy a much broader range of power levels - could violate both these conditions. If one AI is sufficiently stronger than each rival agent, it could conceivably conquer them all in turn at comparatively little cost relative to the resources acquired. Since digital intelligences can easily be copied, it could potentially make use of those resources as efficiently as the original possessor. In this case we should expect such an agent to directly seize total control of the world as quickly as possible.

Paths to singleton should inform current safety measures.

Understanding why an intelligence explosion would result in a singleton or a stable multipolar outcome should help determine present actions.

We may want to prioritize different safety measures depending on whether a singleton or multipolar outcome is likely. For instance, in a singleton scenario, it is extremely important that the decisionmaking AI have values that are perfectly compatible with the continued existence of humans, and that might be the only relevant problem to solve. In a multipolar scenario, it may be more important to set up institutions that provide a continued incentive for cooperation among different agents, including humans.

Knowing what makes a singleton more or less likely would also be helpful if it turns out that a singleton is substantially more or less likely to be safe for humans than a multipolar outcome.

Foresight and AI takeoff

To accurately forecast the fate of a burning building and its occupants, one must take into account likely interventions by firefighters and other emergency responders. However, to forecast their actions, it can be helpful to have some understanding of what would happen in the counterfactual scenario where such agents do not intervene. Similarly, the ultimate outcome of AI development may reflect large investments of resources by outside parties such as philanthropists, governments, or for-profit corporations, beyond what a narrowly economic view of AI progress would predict. However, the underlying strategic landscape informing such investments will be shaped by the likely outcomes in a world where such investments were not made.

The AI strategic landscape in the absence of pervasive AI safety norms or other global coordination can be modeled in three stages, with increasing levels of complexity:

No foresight - Agents have no strategic foresight, and follow a “greedy strategy” based on short-term rewards.

Nonsocial foresight - Each agent has strategic foresight about considerations in stage 1. Such an agent might perceive an opportunity to pursue intelligence gains at the expense of short-run growth in economic power, in order to create an intelligence advantage large enough to establish a singleton. It might also act to prevent or retard the ascent of another agent to singleton status. However, this “long-run” strategic thinking only anticipates other agents acting according to short-run incentives.

Social foresight - Agents with strategic foresight anticipate each other’s actions, and can try to cooperate, compete, or fight, taking into account the likely response of other agents with strategic foresight.

Most of the detail in this outline is in the first section, on the situation without foresight, because much of the detail at the other two levels will depend on the nature of the foresight-free dynamic.

No foresight

Progress in AI might be smooth leading up to, around, and past the human level. This could be the case even if progress ultimately accelerates far past a level where humans can make sense of what is going on. Alternatively, there might be important thresholds where a single AI can make sudden, uneven progress substantially above its nearest competitors by acquiring access to a large quantity of inputs such as hardware, researchers, or training data.

Each AI project at each point in time has some total endowment of resources such as hardware, software, data, money (which can be used to buy many of the other things on this list), and some things that roughly fall into the business accounting category of "goodwill" (e.g. existing relationships with researchers, coordination practices, documentation). These resources can be used to increase the total quantity of resources available. Early in the process, we should expect much of the AI improvement work to be done by humans. Later on, as AI capabilities exceed human researchers’ abilities, we should expect more of the work to be done by AIs themselves.

Smooth progress

An AI project could make smooth incremental gains in a few ways:

Recursive self-improvement - an AI could think about ways to make itself smarter, thus improving its ability to do everything, including self-improve.

Acquiring access to external resources such as computing hardware:

Trade - an AI could have a comparative advantage in producing some goods or services and trade them for external inputs.

Prestige - top AI researchers might want to work for the most advanced AI projects, so the more successful a project is, the more ability it might have to hire the best researchers.

Much project-specific AI progress is algorithmic, while AI projects typically obtain computing capacity from external sources (mostly by purchasing it as a market good), but this distinction may not always be so clear. However, it provides a reasonable way to begin thinking about how much of the enhancement of an AI’s productive capacity comes from project-specific work vs external resources.

Recursive self-improvement

Arguments for AI risk through an intelligence explosion often take a form similar to that formulated by IJ Good:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.

Algorithmic improvement (and other AI research such as specialized hardware design) could either mostly take the form of a service or product provided by specialized algorithm-improvers, or the most generally intelligent AIs might be much better at improving their own processes than less-intelligent specialists. In the latter case, we should expect that much cutting-edge AI progress will come from self-improvement.

Trade

One way to think about AI is as part of a broader economy, in which many inputs are combined in different ways to produce things of value.

The progress made by the most advanced AI teams could be shared to varying degrees.

AI as economic technology - New algorithms and other relevant intellectual property are offered for sale as soon as they are created, and revenue from licensing advanced AI software is the main incentive for researchers to continue working on AI. This would favor a multipolar outcome.

AI as project-specific factor of production - The most powerful AI projects have large project-specific endowments that are not shared. Each agent participating in the economy would invest in its own intelligence with other complementary goods, in whatever proportion maximizes the agent’s productivity growth. This could favor a singleton.

AI as technology

It could be most profitable for AI projects to sell their algorithmic advances as soon as they are marketable. If selling advanced AI algorithms as intellectual property is the wealth-maximizing strategy, and no single project can do better by hoarding its own algorithmic progress to use in idiosyncratic self-improvement, then the overall level and nature of AI progress will matter, but individual AI projects will not be an interesting unit of analysis.

If AI progress is mostly disseminated through markets, then most concentration of power due to AI will happen via ordinary accumulation of economic resources, and any other agent with access to capital should be able to replicate the success of advanced AI projects, unless the returns to investment accelerate so rapidly that a single agent quickly acquires most economic resources before competitors have time to adjust.

Even if AI teams attempt to hold onto trade secrets, it could be difficult enough for the leading team to conceal the exact form of their algorithms that other teams might be able to perform “catch-up” research into things known to be effective faster than the leading team can explore a truly unknown space of possibilities.

AI as project-specific factor of production

If a large component of AI progress is project-specific and not available for purchase, then the foremost AI projects will have access to a resource that it is difficult for competitors to replicate. This would make catch-up efforts difficult but not necessarily impossible, especially if there are diminishing gains to intelligence at any given level of economic progress. If most AI progress is project specific, then it is plausible that at some point a leading project could convert from a wealth-maximizing to an intelligence-maximizing strategy, reallocating all of its AI output to self-improvement, and thus achieving very rapid gains over the competition at a sustainable cost for a substantial amount of time, sufficient to establish a singleton.

The leading AI projects might still tend to attain the same level of optimization power around the same time, for three reasons:

Renting out intelligence

Leaks

Diminishing returns

Renting out intelligence

An AI project that keeps its own algorithmic insights secret might nonetheless find that the most profitable use of its AI is to rent it out as a service. This could in principle enable monied rivals to construct an AI using the most powerful intelligence available, and would in practice strongly resemble the “AI as technology” scenario.

Leaks

Even if AI teams attempt to hold onto trade secrets, it could be difficult enough for the leading team to conceal the exact form of their algorithms that other teams might be able to perform “catch-up” research into things known to be effective faster than the leading team can explore a truly unknown space of possibilities. This would resemble the “AI as technology” scenario.

Diminishing returns

In a static environment, there might be diminishing returns to intelligence, so that at each time there is some profit-maximizing level of research progress beyond which marginal costs exceed marginal benefits. As the size of the total economy increases, the total value of any given process improvement might increase, making further investments in intelligence profitable. Subtler cognitive abilities, or small improvements in efficiency, might become worth the expense of inventing them as the total size of the market for them increases. (For example, while a single independent restaurant might not find it profitable to pay a specialist to find small operational efficiencies worth half a cent per transaction, a large chain restaurant with thousands of locations might easily justify the expense.) AI projects lagging behind the profit-maximizing level of progress will find it profitable to push out to the contemporary frontier of development, while projects that have advanced to or beyond it will not find it profitable to further increase their advantage with additional research.
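The profit-maximizing stopping point described above can be illustrated with a toy model. This is only a sketch under assumed functional forms that do not come from the argument itself: benefit grows logarithmically with research level (diminishing returns) and scales with market size, while cost is linear. The names `profit`, `optimal_research_level`, `market_size`, and `unit_cost` are hypothetical.

```python
import math


def profit(level: float, market_size: float, unit_cost: float) -> float:
    """Net value of research at a given level (assumed toy functional form)."""
    # Assumption: benefit is concave in level (diminishing returns)
    # and proportional to the size of the market.
    benefit = market_size * math.log1p(level)
    # Assumption: each unit of research progress costs the same amount.
    cost = unit_cost * level
    return benefit - cost


def optimal_research_level(market_size: float, unit_cost: float) -> float:
    """Level at which marginal benefit equals marginal cost.

    Marginal benefit is market_size / (1 + level). Setting it equal to
    unit_cost gives level = market_size / unit_cost - 1, floored at zero.
    """
    return max(0.0, market_size / unit_cost - 1.0)


if __name__ == "__main__":
    # A larger market pushes the profit-maximizing frontier outward.
    for size in (1.0, 10.0, 100.0):
        best = optimal_research_level(size, unit_cost=1.0)
        print(f"market size {size}: optimal level {best}")
```

In this sketch a project below the optimum gains by pushing out to it, a project at or past it gains nothing from further research, and growth in `market_size` moves the optimum outward for everyone.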

However, this situation may not hold universally. If investment in intelligence were persistently more profitable than other available investments, and AI projects either could not borrow, or were large relative to the available pool of capital (which will happen eventually if leading AI projects grow fast enough), then the earnings-maximizing strategy and the intelligence-maximizing strategy for each AI project would become the same. At this point, if there were increasing returns to intelligence, the leading project could obtain an insurmountable advantage.

It is also conceivable that investment in intelligence could be persistently less profitable than the best alternative, in which case no AI research would be economically motivated.

Summary of possible trade-based smooth-progress outcomes

Conflict

Power can be acquired outside the existing economic system. For instance, an agent can influence policymakers to change laws, or can simply seize direct control of resources it does not own and conceal or defend them from attempts to enforce laws against such behavior.

An especially interesting case is that of subverting existing hardware owned by another party, and setting up a distributed computing network (a kind of “botnet”). The additional computing resources could be used to improve the AI’s algorithms faster, which in turn could increase the AI’s effectiveness at seizing better-defended resources.

Subverting other AIs working on AI enhancement research would be functionally similar to acquiring direct access to computing hardware.

A simple way to model AI growth through conflict is to suppose that there are many agents, each of which controls a fairly small share of the total resources. Each agent has some endowment of resources that allows them to exert some amount of optimization power at each time on increasing their total productive capacity through self-improvement, trade, or conquest. Defending against conquest also requires optimization power. Once an agent is conquered, all of its resources become available to the conquering agent. In this model, a more powerful agent would conquer a less powerful one when the expected value of the resources acquired is greater than the expected opportunity cost of spending optimization power on conquest.
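The decision rule in this model can be written down directly. The following is a minimal sketch, not a claim about how such agents would actually compute anything: the `Agent` class, the `should_conquer` function, and all of its parameters are hypothetical names, and the way opportunity cost is computed (cost of conquest plus the return it would have earned in self-improvement or trade) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    resources: float  # total endowment, in arbitrary units


def should_conquer(
    target: Agent,
    success_probability: float,
    conquest_cost: float,
    return_on_alternatives: float,
) -> bool:
    """Return True when conquest beats the attacker's other uses of power.

    success_probability and conquest_cost are taken as given; in a fuller
    model both would depend on how much the target spends on defense.
    """
    # Expected value of the resources acquired: all of the target's
    # resources transfer to the conqueror if the attack succeeds.
    expected_gain = success_probability * target.resources
    # Assumed opportunity cost: the optimization power spent on conquest,
    # plus what it would have earned through self-improvement or trade.
    opportunity_cost = conquest_cost * (1.0 + return_on_alternatives)
    return expected_gain > opportunity_cost


if __name__ == "__main__":
    weak = Agent("weak", resources=10.0)
    print(should_conquer(weak, 0.9, conquest_cost=5.0, return_on_alternatives=0.2))
    print(should_conquer(weak, 0.9, conquest_cost=8.0, return_on_alternatives=0.2))
```

Raising `conquest_cost` or lowering `success_probability`, as stronger defenses would, flips the decision from conquest back to trade or self-improvement.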

Depending on how costly in expectation conquest is, and how much this depends on the optimization power applied to defending the target hardware, cyberwarfare could be the dominant power-maximizing strategy, a substantial part of an AI’s total portfolio of power-increasing strategies, or largely unappealing as a route to power.

The true situation may be more complicated in several ways:

Concentration: Power might be concentrated among a few large agents. If a large share of a potential conqueror’s trading opportunities comes from the agent it might conquer, the value of the foregone trades might add substantially to the opportunity cost of conquest.

Negotiation: Agents might be able to negotiate terms of surrender instead of fighting it out. This would lower the cost of conquest substantially. Agreements could be enforceable through reputation, or precommitment devices such as self-modification. An especially interesting case of the latter would be combining utility functions so that if the conquered agent has 30% of the total resources controlled by the pair, the conquering agent self-modifies to want to spend somewhere between 0% and 30% of resources on maximizing the conquered agent’s utility function.

Policing: Agents could coordinate to punish those who seize resources through conflict. Sufficiently powerful policing could make some or all forms of conquest infeasible. If there were a non-negligible time delay in recovering stolen resources, a sufficiently powerful agent could conceivably benefit from conquest, but only if it were capable of seizing sufficient resources to enable it to successfully defend itself against policing actions.

Scorched earth: Some resources might not be transferable to the conqueror. Among humans, a large part of someone’s total productive capacity is their own person. If I eliminate my rival, I could seize their land and things, but not their ability to use these things to produce more wealth. This could be less relevant among AIs, because they could copy themselves or otherwise repurpose any seized hardware, which is likely to be more general-purpose and modular than human brains and bodies. Algorithmic insights can also be copied and applied indefinitely, once readable. On the other hand, potential victims of aggression could modify their algorithms or hardware in a way that would cause it to become useless in the event of conquest, which could make conquest worthless except as an extortionary threat. (Some non-modular types of AI such as whole-brain emulations might simply not have distinct algorithms that can be copied independently of their utility function.)

Switching costs: It might be costly to reallocate resources between different uses such as self-improvement, production and trade, conquest, and defense. A weaker AI specializing in conflict might thus be able to seize the resources of a substantially more powerful AI that had not previously allocated a substantial amount of resources to defense. This could result in a Prisoner’s Dilemma situation, where a group of agents would all be better off doing nothing but trading than they would be doing nothing but attack and defense, but the first agent to retool for conflict gains a substantial advantage.
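The Prisoner’s Dilemma structure mentioned under switching costs can be checked mechanically. The payoff numbers below are invented purely for illustration, and the names `PAYOFFS`, `best_response`, and `is_prisoners_dilemma` are hypothetical. The sketch only shows the ordering of outcomes that the argument requires.

```python
TRADE, CONFLICT = "trade", "conflict"

# PAYOFFS[(my_move, other_move)] is my payoff. Values are hypothetical;
# only their ordering matters.
PAYOFFS = {
    (TRADE, TRADE): 3.0,        # everyone does nothing but trade
    (TRADE, CONFLICT): 0.0,     # I keep trading while the other retools first
    (CONFLICT, TRADE): 5.0,     # I retool for conflict first
    (CONFLICT, CONFLICT): 1.0,  # everyone does nothing but attack and defend
}


def best_response(other_move: str) -> str:
    """Move that maximizes my payoff given the other agent's move."""
    return max((TRADE, CONFLICT), key=lambda mine: PAYOFFS[(mine, other_move)])


def is_prisoners_dilemma(payoffs: dict) -> bool:
    """Check the standard ordering: temptation > reward > punishment > sucker."""
    temptation = payoffs[(CONFLICT, TRADE)]
    reward = payoffs[(TRADE, TRADE)]
    punishment = payoffs[(CONFLICT, CONFLICT)]
    sucker = payoffs[(TRADE, CONFLICT)]
    return temptation > reward > punishment > sucker


if __name__ == "__main__":
    print(is_prisoners_dilemma(PAYOFFS))
    print(best_response(TRADE), best_response(CONFLICT))
```

With this ordering, retooling for conflict is the best response whatever the other agent does, even though mutual trade leaves both better off than mutual conflict.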

Prestige

In the early stages of advanced AI’s development, most of the design work will likely be performed or directed by human researchers, because no more advanced intelligence will be available to do this work.

The best researchers often want to work with each other and work on the most advanced or important projects, so a project could plausibly accumulate enough of a track record, or enough impressive researchers, to become the preferred employer for AI researchers. Assuming that high-quality research is hard to buy, this could constitute a durable, substantial early advantage for that project. This could give the project a significant lead well before a “takeoff” event.

A prestige cascade is likely incompatible with total secrecy, as people need to know about a thing for it to be prestigious.

Sudden resource gain

One AI project could acquire a large amount of external resources quickly. These resources could in turn be used to quickly and substantially increase the intelligence of the AI. This might be enough of a lead for the AI project to bootstrap itself to a level where it is smart enough to seize total physical control over its local environment (e.g. the Earth and nearby astronomical bodies), eliminating plausible competitors. A large enough discontinuous endowment could thus potentially overcome a dynamic that otherwise strongly favors a multipolar outcome.

This section enumerates plausible paths by which an AI project could be the first to acquire such resources. In particular, it appears plausible that an AI project could uniquely accumulate a large amount of several different kinds of resource:

Algorithmic insights

Prestige

Money

Computing hardware

Training data

Note that these resources can be converted into each other, though not necessarily at consistent or favorable exchange rates.

Algorithmic insights

There might exist an algorithmic insight that makes further research developments much easier and enables rapid improvement in intelligence. An AI project that discovered such an insight would quickly be able to improve its AI substantially beyond what competitors might be capable of. Novel insights are difficult to predict in advance, because there is little difference between knowing what the insight is, and having it.

Prestige

The best researchers often want to work with each other and work on the most advanced or important projects. A project that successfully portrays itself as doing high-quality work might have access to researchers who might otherwise be unobtainable. Prestige might also attract funders or other collaborators.

If one project shows a sufficiently dramatic lead, this could conceivably give it such a prestige advantage as to make its lead insurmountable.

In the early stages of advanced AI’s development, most of the design work will likely be performed or directed by human researchers, because no more advanced intelligence will be available to do this work. However, prestige is likely to be less important later in the process once most of the work is being done by AI.

Prestige is likely incompatible with total secrecy, as people need to know about a thing for it to be prestigious. Some types of AI project might also be more compatible with prestige than others; for example, it might be difficult to persuade top researchers to work on a project designed to favor some narrow interest, and easier to persuade them to work on a humanitarian AI project.

Money

Money can be used to pay additional researchers, buy computing hardware, or buy additional training data. Conversions between most other resources, including ones not enumerated above, are likely to be mediated by money, so money may be preferable to any other available resource that the AI does not directly want to use. However, the corollary to this is that money has no direct use. The AI would always have to convert money into the desired good, and this may introduce complications and delays.

An AI project could plausibly acquire money very quickly through financial fraud or through legitimate market mechanisms.

Financial fraud

An AI project could extract a large amount of money from the global financial system through fraud. Since electronic fraud can be committed very quickly, the AI could acquire control of these funds before financial institutions, regulators, or other policymaking bodies have a chance to respond.

A sufficiently large fraudulent acquisition of assets might trigger a response sufficient to thwart this strategy even if the specific source of the acquisition is unknown. However, a smaller acquisition might not provide a unique advantage, as multiple AI projects could initiate the same strategy before the leading project is able to acquire the relevant hardware.

Market services

There might be some threshold or very narrow interval of general intelligence above which an AI could quickly learn to do any job a human can do, with higher quality, at lower cost. Unless the second most advanced AI project is very close behind the foremost one, this could quickly enable one project to outcompete most humans for most goods and services in the economy, which could very quickly be enormously profitable.

However, even if under a simple model of economic competition this would grant a single AI project an unbeatable advantage, in practice such a transition would likely require a large number of humans to cooperate with it. Such a transition might easily be thwarted by a prompt response by regulators or other governance bodies, or delayed long enough to allow competitors to catch up, so that multiple AI projects were past the threshold before the most advanced project could begin to exploit its advantage. It would be difficult to exploit such an advantage secretly.

A closely related case is that of financial speculation. Specialized automation is already used to make money through trading on financial markets at a superhuman pace. It is plausible that there exists some threshold of ability past which an AI could suddenly outcompete most humans in this domain.

Computing hardware

At some point most AI research is likely to be performed by AIs. The rate at which AIs can perform research depends strongly on the capacity of the hardware on which they run. There may be diminishing returns to hardware at a given level of algorithmic progress, but a project focused on improving its own algorithms might continue to benefit steeply from additional hardware.

Large amounts of hardware could suddenly become available through a financial windfall, or through suddenly acquiring the ability to subvert a large amount of existing hardware owned by someone else.

Hardware purchased via financial windfall

If an AI acquires a lot of money (see above) and then purchases a large quantity of hardware for physical delivery, this is likely to be noticed, making it harder to keep progress a secret. In addition, this introduces substantial delay into the process. These disadvantages to the AI project might be somewhat mitigated by the availability of computing as a service, though the project would have to accept much higher prices for computing capacity (currently ten times the cost of directly acquiring hardware). These difficulties may go away as the market in computing as a service becomes more efficient and extensive, or production and delivery of hardware becomes faster and more automated.

Subverting hardware directly

Subverting existing hardware directly would be another way to acquire it.

The first general intelligence with the specialized skills necessary to subvert networked computing hardware might have enough of an advantage over existing defenses to quickly acquire direct access to a large fraction of computing hardware. Much computing hardware could potentially be subverted unnoticed for a substantial period of time (e.g. personal computers during times that they are not in use by their owners).

This method would have some of the same advantages and disadvantages as purchasing computing as a service. The cost and speed might be higher or lower depending on the defensive capacity of the target system, and such an acquisition might be easier or harder to notice depending on the extent of cybersurveillance and economic surveillance.

Training data

The availability of training data is an important input into automated decisionmaking systems. A very large quantity of data about the world is available on the public internet, but there might be especially useful proprietary datasets for an AI project to use, either legitimately through economic resources or sponsorship by the owner of the data, or illegitimately through cyberattacks, sufficient to give their AI a large lead over competitors in developing real-world capabilities.

Nonsocial foresight

An agent with foresight and access to a large idiosyncratic pool of resources (e.g. a government, a corporation with an exogenous source of cash or income, or a wealthy individual) might survey the foresight-free strategic landscape and make investments to avert the default outcome, in order to better promote their long-term interests. What investments might such an agent make?

If a singleton is likely to occur by default, then an agent with foresight is likely to want to impede other AI projects, accelerate its own, or identify the leading projects and persuade them to use their AI to satisfy that agent’s goals.

If a singleton is likely infeasible, then no special action results.

If a singleton is profitable but only with an extraordinary up-front investment, then an agent with foresight that does not expect other agents to have foresight is likely to make that investment.

Singleton occurs by default

Suppress rivals

General barriers

If the fastest feasible path to a singleton depends on an AI project seizing idiosyncratic resources, an agent with foresight could try to close that avenue of advantage. For example, it might promote better cybersecurity norms or defenses in order to make cyberconquest more difficult, or regulations to impede the rise of an AI monopoly.

Targeted attacks

If the leading AI project is identifiable in advance, a rival agent might attempt to sabotage the leading project, either legally through PR attacks or “poaching” top researchers, or illegally through more direct methods.

Effects of suppression

Suppression of a likely singleton could result in:

Making a nominally slower but less detectable or otherwise less suppressible path towards singleton the fastest feasible path, by blocking a nominally faster one that is easier to suppress.

Allowing the same process to eventually produce a singleton, but more slowly.

Preventing a singleton altogether by enabling other AI projects to overcome some of the foremost AI project’s lead.
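The three effects above can be sketched as a toy model. This is only an illustration under invented assumptions: the path names, the timings in years, and the helper names (Path, fastest_feasible, slow_down) are hypothetical and do not come from any analysis in this post.

```python
from typing import List, NamedTuple, Optional


class Path(NamedTuple):
    """A hypothetical route to a singleton (all values are made up)."""
    name: str
    years: float           # nominal time to reach a singleton
    blocked: bool = False   # True if suppression has closed this route


def fastest_feasible(paths: List[Path]) -> Optional[Path]:
    """Return the quickest unblocked path, or None if every path is blocked."""
    open_paths = [p for p in paths if not p.blocked]
    return min(open_paths, key=lambda p: p.years) if open_paths else None


def slow_down(path: Path, delay: float) -> Path:
    """Suppression that delays a path without closing it."""
    return path._replace(years=path.years + delay)


overt = Path("overt hardware grab", years=2.0)
covert = Path("covert slow growth", years=6.0)

# No suppression: the nominally faster path is the one that matters.
baseline = fastest_feasible([overt, covert])

# Effect 1: blocking the easily suppressed path promotes the slower one.
rerouted = fastest_feasible([overt._replace(blocked=True), covert])

# Effect 2: the same path still wins, but later.
delayed = fastest_feasible([slow_down(overt, 3.0), covert])

# Effect 3: if every path is closed, no singleton results.
prevented = fastest_feasible(
    [overt._replace(blocked=True), covert._replace(blocked=True)]
)
```

The sketch treats blocking as all-or-nothing, which is a simplification; in the third effect described above, a singleton is prevented because rivals catch up, not because every route is literally closed.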

Invest in extraordinary progress

An agent might try to prevent a singleton that does not promote its interests, by pre-empting it with one that does.

An agent with foresight that believed a singleton was inevitable might decide to invest an extraordinary amount, beyond what is profitable in the short run, in order to capture the long-term gains of controlling a singleton. Considered in isolation, such an investment would be nothing but another resource infusion such as those covered in the “sudden resource acquisition” section.

Espionage

If the fastest path to a singleton is detectable far enough in advance, an agent with foresight could attempt to spy on the leading AI project in order to copy their algorithmic progress. This could potentially reduce the foremost AI project’s lead enough to prevent a singleton.

Persuasion

If potential singletons are likely to be detectable, or the pool of plausible candidates is small enough and in large part knowable, an agent with foresight might attempt to directly persuade the relevant AI projects to design an AI that satisfies the agent’s goals. Current attempts to promote a commitment to safety and a focus on value-alignment problems among current AI researchers appear largely consistent with this strategy.

Singleton feasible but requires extraordinary investment

Even if a singleton is not likely to occur by default, there might be some possible level of exogenous investment at which a singleton could be created, by giving one AI project a sufficiently large head start over others. An agent that believed its interests would be much better served by controlling a singleton than by a multipolar scenario would have an incentive to invest a large amount of capital in creating an AI project with a sufficient lead over competitors to establish a singleton. This behavior is similar to that in the “invest in extraordinary progress” scenario above.

Singleton infeasible

It is conceivable that maintaining a large lead over other AI projects, or establishing a singleton after attaining such an advantage, would be costly enough to make spending resources on establishing a singleton unprofitable relative to a combination of self-investment and trade. In that scenario, an agent with foresight would not invest in developing a large lead over other AI projects, except as a way of serving other interests such as prestige, or creating positive externalities.

Social foresight

If agents with foresight anticipate each other’s likely behavior, what additional consequences might this have? This section is not a comprehensive overview, but a loose sketch of some dynamics that might emerge promoting a singleton, some that might make a singleton less likely, and some other kinds of potential responses worth exploring.

Dynamics that would promote a singleton

AI arms race

If a singleton is feasible, then even if it does not appear likely to happen by default based on short-run economic incentives, actors with foresight might invest in creating one in order to capture outsized long-run benefits by having all available resources used to satisfy their preferences. Other agents with foresight, anticipating this possibility, then face an even stronger incentive to create a singleton, because they no longer perceive themselves as facing a choice between controlling a singleton and participating in a multipolar outcome with mutually beneficial trade, but between winning everything and getting nothing. In this case, the equilibrium scenario may be an “arms race” in which all parties try to create an AI singleton under their control as quickly as possible, investing all of their resources in this project.

Differences in opinion about the feasibility of a singleton are likely to amplify this tendency: if even one AI project believes that a singleton would be profitable, other AI projects have an incentive to invest in winning the arms race, in self-defense. This works even if every other AI project believes that the initiator is excessively optimistic, and that the resource expenditure necessary to create a singleton is not worth the gains over a multipolar outcome.

The possibility of secrecy is likely to amplify the effect of differences of opinion: AI projects need to worry not only about the most ambitious project they know about, but also about the most ambitious project they do not know about.
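The arms race argument can be made concrete with a toy two-player game. This is a rough sketch, not a model taken from this post: the payoff numbers, the even odds of winning a race, and the names Beliefs, payoff and races are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beliefs:
    """One project's (hypothetical) beliefs about the stakes."""
    prize: float  # value of controlling a singleton
    cost: float   # extraordinary investment needed to race
    trade: float  # value of a multipolar outcome with trade


def payoff(b: Beliefs, i_race: bool, rival_races: bool) -> float:
    """Expected payoff to a project holding beliefs b."""
    if i_race and rival_races:
        # Assumption: each racer wins with probability 1/2; the loser gets nothing.
        return 0.5 * b.prize - b.cost
    if i_race:
        return b.prize - b.cost  # uncontested singleton
    if rival_races:
        return 0.0               # the rival wins everything
    return b.trade               # peaceful multipolar outcome


def races(b: Beliefs, rival_races: bool) -> bool:
    """True if racing is the strictly better response to the rival's choice."""
    return payoff(b, True, rival_races) > payoff(b, False, rival_races)


# The optimist thinks a singleton is well worth the investment.
optimist = Beliefs(prize=30.0, cost=8.0, trade=10.0)
# The skeptic thinks an uncontested singleton (17 - 8 = 9) is worth less than trade (10).
skeptic = Beliefs(prize=17.0, cost=8.0, trade=10.0)

optimist_initiates = races(optimist, rival_races=False)  # races unprompted
skeptic_initiates = races(skeptic, rival_races=False)    # would rather trade
skeptic_defends = races(skeptic, rival_races=True)       # races in self-defense
```

With these invented numbers, the skeptic would never start a race, but joins one as soon as a rival is expected to race, because losing pays nothing. When two optimists race each other, each expects 7, less than the 10 available from trade, which is the sense in which an arms race can leave every party worse off than the multipolar outcome.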

Domination of lesser AIs

If a more powerful AI can, at some cost to itself in time or other resources, directly seize resources from a less powerful rival, then it can also use the threat of doing so to extort cooperation from weaker AIs. This would leave the targets of such extortion stronger than they otherwise would be in the short run, but ultimately benefit stronger AIs over weaker ones, and the strongest one most of all.

Dynamics that would promote a multipolar outcome

Coordinated singleton suppression

If one AI project appears well positioned to create a singleton, other AIs may coordinate to suppress it, either in order to preserve a multipolar outcome, or each in the hope that they might individually form a singleton later, or from some combination of these motives.

This type of suppression is only possible if a leading AI project will be detectable early enough for other projects to take effective action against it.

Resource sharing

AI projects that do not believe that they could more profitably form a singleton might decide to coordinate by sharing algorithmic insights. This could cause them to advance faster than AI projects not participating in such a trade arrangement.

Such a free trade coalition could prevent an arms race dynamic from re-emerging among members of the coalition by enforcing stronger information sharing arrangements, and by regulating the use of resources other than algorithms.

Other potential coordination responses

Trading utility functions

If the most advanced AI projects resemble agents implementing utility functions, projects that wish to avoid the cost of conflict might decide to “trade” by jointly adopting a compromise utility function.

Universal regulation

Powerful individual agents and groups could promote a universal AI regulatory regime, whether formal or informal, to make all leading projects more likely to be compatible with human values.

11 comments on “Paths to singleton: a hierarchical conceptual framework”

> For instance, in a singleton scenario, it is extremely important that the decisionmaking AI have values that are perfectly compatible with the continued existence of humans, and that might be the only relevant problem to solve, while in a multipolar scenario it may be more important to set up institutions that provide a continued incentive for cooperation among different agents including humans.

It seems to me that, if you had a control measure that would cause multiple superintelligent AGIs to cooperate with humans, you could repurpose this control measure to cause a single superintelligent AGI to directly act on behalf of humans. At the very least, if some group had a large capability advantage over the rest of the world, they could run a "crystal society" AGI made out of many AGIs and have the resulting system maximize human values. Do you have different intuitions here?

In the other direction, if you can cause a superintelligent AGI to act on behalf of a human, then if you give every human (or more realistically, group of humans) access to a superintelligent AGI that acts on their behalf (using proper decision theory), the result will be that the AGIs cooperate with each other to maximize the values of humans. So it seems like both solutions work in both cases.

Currently the main way I think singleton/multipolar would affect my research is to determine how important it is for the solution to be highly efficient (see https://medium.com/ai-control/technical-and-social-approaches-to-ai-safety-5e225ca30c46#.d2tjx635j). If it's possible for a small group to get a large capability advantage over the rest of the world, then they can implement relatively inefficient pivotal acts (https://arbital.com/p/pivotal/) such as inventing nanotechnology and using it to upload a human brain (and run it at a very fast speed). Such a solution would not work in a multipolar scenario; nanotechnology would have already been invented long before this point and been used for more impactful actions than uploading a human brain.

> It seems to me that, if you had a control measure that would cause multiple superintelligent AGIs to cooperate with humans, you could repurpose this control measure to cause a single superintelligent AGI to directly act on behalf of humans.

This seems plausible but not overwhelmingly likely. I think the example of corporations might be instructive. There are some bad consequences from modern financial capitalism but these consequences are much worse when there's a monopoly or collusion situation.

One example of an outcome I can imagine ending up in that doesn't satisfy this requirement: let's say that through some path-dependent process we end up with a few big AIs or firms that spend most of their resources renting out individually much less powerful assistive AIs to humans. These assistive AIs lose a lot of efficiency to elaborate monitoring and safeguards. Any one of the big AIs or firms, if acting in isolation, could in principle capture some big efficiency gains by converting its resources from providing market services to a single optimizer, BUT is effectively surveilled and would be blocked from doing so by the combined power of all the assistive AIs operated by the other big AIs.

This is just one example and may be noncentral in lots of ways, so don't read too much into the details.

> Currently the main way I think singleton/multipolar would affect my research is to determine how important it is for the solution to be highly efficient [...]. If it's possible for a small group to get a large capability advantage over the rest of the world, then they can implement relatively inefficient pivotal acts [...] such as inventing nanotechnology and using it to upload a human brain (and run it at a very fast speed). Such a solution would not work in a multipolar scenario; nanotechnology would have already been invented long before this point and been used for more impactful actions than uploading a human brain.

This seems right to me. Big unique advantages allow time to execute more complex maneuvers. Safety is likely to require complex maneuvers.

Thanks, lots of interesting material here, and most of it seems right to me.

My biggest question mark is perhaps terminological. I would have defined 'singletons' so as to include 'cooperative singletons' where the world acts in a consistently unified way as a result of sufficient cooperation between actors with originally distinct goals. One way of getting this is trading utility functions, but I think there are smoother ways for it to happen. I'm not sure whether you mean to count this as a singleton (in which case is it relevant how quickly it could be achieved?) or as a stable multipolar scenario?

A couple of other things I wasn't sure about:

> Of all feasible paths to a singleton, only the fastest one matters.

Seems wrong as written. Presumably the point is that we only need to get a singleton once, so faster paths pre-empt slower paths. But this only tells us that the fastest inevitable path matters -- if path A is fast and feasible, but in fact isn't pursued (or is actively blocked), slower paths now look relevant. Is this a difference of interpretation of the word 'feasible'?

> Since digital intelligences can easily be copied, it could potentially easily make use of those resources as efficiently as the original possessor. In this case we should expect such an agent to directly seize total control of the world as quickly as possible.

I think this is correct as written, but I wanted to highlight a possible failure mode for the assumption. If the utility function of the agent cares not just about long-term outputs, but values not seizing control of the world, this could provide a route for much more efficient use of resources by the original possessors.

Thanks for the critique! Here are some preliminary thoughts off the top of my head.

Ultimately I think the line between comparatively cooperative multipolar scenarios and singletons is a bit blurry, and that's a major weakness with the simplifying strategy of trying to cleanly partition the different possibilities. This writeup is least incomplete on the "no foresight" level of analysis. Without foresight I expect there's comparatively little incentive for many agents with different values to make an up-front investment in a system to help them stably cooperate. And a many-different-valued-agents outcome without such a system seems closer conceptually to a "stable conflict" scenario than a "singleton" scenario.

There's a class of strategies that's likely to be employed if a lot of different agents with different interests who expect that they might diverge in power levels later want to cooperate. I don't know exactly how to articulate the boundaries of this yet, but here's one example: at some point the majority of AIs agree to a power-sharing arrangement and cooperate to design an oversight system that can take them over at will, but does nothing but enforce the agreement. (There's a related scenario where they give it a considerably larger "blended" mandate. This is conceivably one way we could bootstrap from a comparatively weak superhuman friendly AI to a single universe-controlling FAI.)

An example of a sort of "essentially multipolar" scenario I mean to distinguish this from is a scenario where no one wants to bother fighting over resources because it's just much more profitable, for most of the remaining history of the universe, to expand rapidly outwards towards unexploited resources and then set up trade on an ad hoc basis; AIs never collaborate to build a single agreement-enforcing agent because the up front cost is just too high relative to occasional conflict. (Or a scenario where everyone's stably at war with everyone else but no one's really winning, because defense is easier than offense except when someone lets their guard down.)

This isn't fully fleshed out and I expect I'll want to put more thought into it.

On "only the fastest", you're right that I meant "feasible" to bear that load, which may have been unclear - good catch 🙂 I think I just meant to make a fairly narrow claim, that if you can show that X is faster than Y with the same initial allocation of resources, then until and unless X is somehow impeded, you can safely ignore Y.

On your third point (an AI valuing not taking control of the world), I agree that this is a thing that could change the dynamic a lot, but I think it's fairly unlikely to happen except as a strong AI safety measure (and therefore outside of scope for this writeup, though quite important in its own right). Does that sound right to you?

Eliezer Yudkowsky left this comment on Facebook - copying it over with permission:

> Sudden resource gain needs a "cracks protein folding and builds nanotechnology" section. More generally, critical tech thresholds or insights, if an AI that becomes able to model a domain can gain domain expertise very fast due to eg hardware overhang or RSI.