Superintelligence: Paths, Dangers, Strategies

Bostrom's "Superintelligence" is a great overview of some of the research and political/existential issues around the creation of a superintelligent system. He describes various paths to potentially creating a superintelligence, including some surprising ones like whole brain emulation and eugenics. Bostrom also has a thoughtful discussion of the different kinds of superintelligence we could expect to see: speed, quality, and collective.

Then he changes tack and starts to talk about some of the risks associated with exponentially improving intelligence - including arms races, “breaking out of the box”, and goal-specification issues that could result in the entire universe being turned into paperclips. I particularly liked the idea of "honeypots" to catch rogue AIs. This leads Bostrom into his final section, where he talks about how to think about specifying goals for artificial intelligence systems. The idea of a "windfall clause" for commercial entities is elegant as well.

Overall, it’s a very thoughtful book, but I’m having trouble buying it. We have a hard enough time debugging simple software as it is now. What could possibly go wrong with building complex software that debugs itself?? This book pairs well with fellow Oxfordian David Deutsch's "The Beginning of Infinity".

My highlights below

It is no part of the argument in this book that we are on the threshold of a big breakthrough in artificial intelligence, or that we can predict with any precision when such a development might occur. It seems somewhat likely that it will happen sometime in this century, but we don’t know for sure.

CHAPTER 1 - Past developments and present capabilities

Yet the prospect of continuing on a steady exponential growth path pales in comparison to what would happen if the world were to experience another step change in the rate of growth comparable in magnitude to those associated with the Agricultural Revolution and the Industrial Revolution.

Two decades is a sweet spot for prognosticators of radical change: near enough to be attention-grabbing and relevant, yet far enough to make it possible to suppose that a string of breakthroughs, currently only vaguely imaginable, might by then have occurred.

The mathematician I. J. Good, who had served as chief statistician in Alan Turing’s code-breaking team in World War II, might have been the first to enunciate the essential aspects of this scenario. In an oft-quoted passage from 1965, he wrote: Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an “intelligence explosion,” and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control.

The methods that produced successes in the early demonstration systems often proved difficult to extend to a wider variety of problems or to harder problem instances. One reason for this is the “combinatorial explosion” of possibilities that must be explored by methods that rely on something like exhaustive search. Such methods work well for simple instances of a problem, but fail when things get a bit more complicated.

To overcome the combinatorial explosion, one needs algorithms that exploit structure in the target domain and take advantage of prior knowledge by using heuristic search, planning, and flexible abstract representations—capabilities that were poorly developed in the early AI systems.

Behind the razzle-dazzle of machine learning and creative problem-solving thus lies a set of mathematically well-specified tradeoffs. The ideal is that of the perfect Bayesian agent, one that makes probabilistically optimal use of available information. This ideal is unattainable because it is too computationally demanding to be implemented in any physical computer (see Box 1). Accordingly, one can view artificial intelligence as a quest to find shortcuts: ways of tractably approximating the Bayesian ideal by sacrificing some optimality or generality while preserving enough to get high performance in the actual domains of interest.
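A toy illustration of this tradeoff (my own sketch, not from the book): exact Bayesian updating is easy for a single parameter on a grid, but a grid over n parameters has resolution^n cells, which is the kind of blow-up that forces real systems to approximate.

```python
import numpy as np

# Exact Bayesian update for a coin's bias on a discrete grid.
# With one parameter this is trivial; with n parameters the grid has
# resolution**n cells, hence the need for tractable shortcuts.
grid = np.linspace(0.0, 1.0, 101)        # candidate values of P(heads)
prior = np.ones_like(grid) / len(grid)   # uniform prior

def update(prior, heads, tails):
    """Return the normalized posterior over the grid."""
    likelihood = grid**heads * (1.0 - grid)**tails
    unnormalized = prior * likelihood
    return unnormalized / unnormalized.sum()

posterior = update(prior, heads=7, tails=3)
best = grid[np.argmax(posterior)]        # posterior mode, 0.7
```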

In the view of several experts in the late fifties: “If one could devise a successful chess machine, one would seem to have penetrated to the core of human intellectual endeavor.” This no longer seems so. One sympathizes with John McCarthy, who lamented: “As soon as it works, no one calls it AI anymore.”

The computer scientist Donald Knuth was struck that “AI has by now succeeded in doing essentially everything that requires ‘thinking’ but has failed to do most of what people and animals do ‘without thinking’ — that, somehow, is much harder!”

There are robotic pets and cleaning robots, lawn-mowing robots, rescue robots, surgical robots, and over a million industrial robots. The world population of robots exceeds 10 million.

Intelligent scheduling is a major area of success. The DART tool for automated logistics planning and scheduling was used in Operation Desert Storm in 1991 to such effect that DARPA (the Defense Advanced Research Projects Agency in the United States) claims that this single application more than paid back their thirty-year investment in AI.

The Google search engine is, arguably, the greatest AI system that has yet been built.

CHAPTER 2 - Paths to superintelligence

We can tentatively define a superintelligence as any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest.

The idea of using learning as a means of bootstrapping a simpler system to human-level intelligence can be traced back at least to Alan Turing’s notion of a “child machine,” which he wrote about in 1950: Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child’s? If this were then subjected to an appropriate course of education one would obtain the adult brain.

We know that blind evolutionary processes can produce human-level general intelligence, since they have already done so at least once. Evolutionary processes with foresight — that is, genetic programs designed and guided by an intelligent human programmer — should be able to achieve a similar outcome with far greater efficiency.

The idea is that we can estimate the relative capabilities of evolution and human engineering to produce intelligence, and find that human engineering is already vastly superior to evolution in some areas and is likely to become superior in the remaining areas before too long. The fact that evolution produced intelligence therefore indicates that human engineering will soon be able to do the same. Thus, Moravec wrote (already back in 1976): The existence of several examples of intelligence designed under these constraints should give us great confidence that we can achieve the same in short order. The situation is analogous to the history of heavier than air flight, where birds, bats and insects clearly demonstrated the possibility before our culture mastered it.

The availability of the brain as template provides strong support for the claim that machine intelligence is ultimately feasible. This, however, does not enable us to predict when it will be achieved because it is hard to predict the future rate of discoveries in brain science.

A third path to greater-than-current-human intelligence is to enhance the functioning of biological brains. In principle, this could be achieved without technology, through selective breeding. Any attempt to initiate a classical large-scale eugenics program, however, would confront major political and moral hurdles. Moreover, unless the selection were extremely strong, many generations would be required to produce substantial results. Long before such an initiative would bear fruit, advances in biotechnology will allow much more direct control of human genetics and neurobiology, rendering otiose any human breeding program.

Lifelong depression of intelligence due to iodine deficiency remains widespread in many impoverished inland areas of the world — an outrage given that the condition can be prevented by fortifying table salt at a cost of a few cents per person and year.

Table 5 Maximum IQ gains from selecting among a set of embryos

There is, however, a complementary technology, one which, once it has been developed for use in humans, would greatly potentiate the enhancement power of pre-implantation genetic screening: namely, the derivation of viable sperm and eggs from embryonic stem cells.

More importantly still, stem cell-derived gametes would allow multiple generations of selection to be compressed into less than a human maturation period, by enabling iterated embryo selection. This is a procedure that would consist of the following steps:

Genotype and select a number of embryos that are higher in desired genetic characteristics.

Extract stem cells from those embryos and convert them to sperm and ova, maturing within six months or less.

Cross the new sperm and ova to produce embryos.

Repeat until large genetic changes have been accumulated.

In this manner, it would be possible to accomplish ten or more generations of selection in just a few years.
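The compounding effect of that loop can be seen in a toy simulation (my own sketch; the additive-trait model and all numbers are invented for illustration, not Bostrom's):

```python
import random

def iterated_selection(generations=10, batch=100, pick=10, loci=1000):
    """Toy model: a trait is the sum of many additive 0/1 loci.
    Each cycle, cross the selected parents to make `batch` embryos,
    keep the `pick` highest-scoring, and repeat without waiting for
    anyone to grow up."""
    random.seed(0)  # deterministic for illustration
    pool = [[random.randint(0, 1) for _ in range(loci)] for _ in range(pick)]
    for _ in range(generations):
        embryos = []
        for _ in range(batch):
            mom, dad = random.sample(pool, 2)
            # each locus inherited from one parent at random
            child = [random.choice(pair) for pair in zip(mom, dad)]
            embryos.append(child)
        embryos.sort(key=sum, reverse=True)
        pool = embryos[:pick]            # truncation selection
    return sum(sum(g) for g in pool) / pick

# The mean trait starts near loci/2 = 500 and climbs every cycle.
```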

And some countries — perhaps China or Singapore, both of which have long-term population policies — might not only permit but actively promote the use of genetic selection and genetic engineering to enhance the intelligence of their populations once the technology to do so is available.

Far from being the smartest possible biological species, we are probably better thought of as the stupidest possible biological species capable of starting a technological civilization — a niche we filled because we got there first, not because we are in any sense optimally adapted to it.

CHAPTER 3 - Forms of superintelligence

We also show that the potential for intelligence in a machine substrate is vastly greater than in a biological substrate. Machines have a number of fundamental advantages which will give them overwhelming superiority. Biological humans, even if enhanced, will be outclassed.

Here we will differentiate between three forms: speed superintelligence, collective superintelligence, and quality superintelligence.

The simplest example of speed superintelligence would be a whole brain emulation running on fast hardware. An emulation operating at a speed of ten thousand times that of a biological brain would be able to read a book in a few seconds and write a PhD thesis in an afternoon. With a speedup factor of a million, an emulation could accomplish an entire millennium of intellectual work in one working day.
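The arithmetic behind those claims checks out, assuming, say, a 10-hour book and an 8-hour working day:

```python
# Subjective time experienced by a sped-up whole brain emulation.
def subjective_hours(wall_clock_hours, speedup):
    return wall_clock_hours * speedup

# At a 10,000x speedup, a 10-hour book takes 10 * 3600 / 10_000
# wall-clock seconds, i.e. a few seconds.
book_seconds = 10 * 3600 / 10_000            # 3.6 seconds

# At a 1,000,000x speedup, one 8-hour working day yields 8 million
# subjective hours, roughly Bostrom's "entire millennium".
years = subjective_hours(8, 1_000_000) / (24 * 365)   # ~913 years
```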

nothing in our definition of collective superintelligence implies that a society with greater collective intelligence is necessarily better off. The definition does not even imply that the more collectively intelligent society is wiser. We can think of wisdom as the ability to get the important things approximately right.

Collective superintelligence could be either loosely or tightly integrated. To illustrate a case of loosely integrated collective superintelligence, imagine a planet, MegaEarth, which has the same level of communication and coordination technologies that we currently have on the real Earth but with a population one million times as large. With such a huge population, the total intellectual workforce on MegaEarth would be correspondingly larger than on our planet. Suppose that a scientific genius of the caliber of a Newton or an Einstein arises at least once for every 10 billion people: then on MegaEarth there would be 700,000 such geniuses living contemporaneously, alongside proportionally vast multitudes of slightly lesser talents. New ideas and technologies would be developed at a furious pace, and global civilization on MegaEarth would constitute a loosely integrated collective superintelligence.
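The 700,000 figure follows directly from the stated rates, taking Earth's population as roughly 7 billion at the time of writing:

```python
# MegaEarth back-of-the-envelope, using the book's stated assumptions.
earth_population = 7e9                     # ~7 billion people
mega_earth_population = earth_population * 1_000_000
geniuses = mega_earth_population / 10e9    # one Newton/Einstein per 10 billion
# geniuses == 700,000 contemporaneous scientific geniuses
```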

CHAPTER 5 - Decisive strategic advantage

Will one machine intelligence project get so far ahead of the competition that it gets a decisive strategic advantage — that is, a level of technological and other advantages sufficient to enable it to achieve complete world domination?

Since there is an especially strong prospect of explosive growth just after the crossover point, when the strong positive feedback loop of optimization power kicks in, a scenario of this kind is a serious possibility, and it increases the chances that the leading project will attain a decisive strategic advantage even if the takeoff is not fast.

CHAPTER 6 - Cognitive superpowers

On one estimate, we appropriate 24% of the planetary ecosystem’s net primary production.

In other words, assuming that the observable universe is void of extraterrestrial civilizations, then what hangs in the balance is at least 10,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 human lives (though the true number is probably larger). If we represent all the happiness experienced during one entire such life with a single teardrop of joy, then the happiness of these souls could fill and refill the Earth’s oceans every second, and keep doing so for a hundred billion billion millennia. It is really important that we make sure these truly are tears of joy.

CHAPTER 8 - Is the default outcome doom?

Third, the instrumental convergence thesis entails that we cannot blithely assume that a superintelligence with the final goal of calculating the decimals of pi (or making paperclips, or counting grains of sand) would limit its activities in such a way as not to infringe on human interests.

The flaw in this idea is that behaving nicely while in the box is a convergent instrumental goal for friendly and unfriendly AIs alike. An unfriendly AI of sufficient intelligence realizes that its unfriendly final goals will be best realized if it behaves in a friendly manner initially, so that it will be let out of the box. It will only start behaving in a way that reveals its unfriendly nature when it no longer matters whether we find out; that is, when the AI is strong enough that human opposition is ineffectual.

The treacherous turn — While weak, an AI behaves cooperatively (increasingly so, as it gets smarter). When the AI gets sufficiently strong—without warning or provocation—it strikes, forms a singleton, and begins directly to optimize the world according to the criteria implied by its final values.

“But wait! This is not what we meant! Surely if the AI is superintelligent, it must understand that when we asked it to make us happy, we didn’t mean that it should reduce us to a perpetually repeating recording of a drugged-out digitized mental episode!” — The AI may indeed understand that this is not what we meant. However, its final goal is to make us happy, not to do what the programmers meant when they wrote the code that represents this goal.

We can call this phenomenon wireheading. In general, while an animal or a human can be motivated to perform various external actions in order to achieve some desired inner mental state, a digital mind that has full control of its internal state can short-circuit such a motivational regime by directly changing its internal state into the desired configuration: the external actions and conditions that were previously necessary as means become superfluous when the AI becomes intelligent and capable enough to achieve the end more directly.

The upshot is that even an apparently self-limiting goal, such as wireheading, entails a policy of unlimited expansion and resource acquisition in a utility-maximizing agent that enjoys a decisive strategic advantage.

In the first example, the proof or disproof of the Riemann hypothesis that the AI produces is the intended outcome and is in itself harmless; the harm comes from the hardware and infrastructure created to achieve this result. In the second example, some of the paperclips produced would be part of the intended outcome; the harm would come either from the factories created to produce the paperclips (infrastructure profusion) or from the excess of paperclips (perverse instantiation).

CHAPTER 9 - The control problem

Refinements to this setup are possible. Instead of trying to endow an AI with a final goal that refers to a physical button, one could build an AI that places final value on receiving a stream of “cryptographic reward tokens.”

To make the tests more stringent, “honeypots” could be strategically placed to create temptations for a malfunctioning AI to commit some easily observable violation. For instance, if an AI has been designed in such a way that it is supposed not to want to access the internet, a fake Ethernet port could be installed (leading to an automatic shutdown switch) just to see if the AI tries to use it.

Bertrand Russell, who spent many years working on the foundations of mathematics, once remarked that “everything is vague to a degree you do not realize till you have tried to make it precise.”

One special type of final goal which might be more amenable to direct specification than the examples given above is the goal of self-limitation. While it seems extremely difficult to specify how one would want a superintelligence to behave in the world in general — since this would require us to account for all the trade-offs in all the situations that could arise — it might be feasible to specify how a superintelligence should behave in one particular situation. We could therefore seek to motivate the system to confine itself to acting on a small scale, within a narrow context, and through a limited set of action modes. We will refer to this approach of giving the AI final goals aimed at limiting the scope of its ambitions and activities as “domesticity.”

For example, the process could be to carry out an investigation into the empirical question of what some suitably idealized version of us would prefer the AI to do. The final goal given to the AI in this example could be something along the lines of “achieve that which we would have wished the AI to achieve if we had thought about the matter long and hard.”

CHAPTER 10 Oracles, genies, sovereigns, tools

If, instead, “simply doing what it is programmed to do” means that the software behaves as the programmers intended, then this is a standard that ordinary software very often fails to meet.

CHAPTER 11 - Multipolar scenarios

A stylized empirical fact is that the total factor share of capital has for a long time remained steady at approximately 30% (though with significant short-term fluctuations). This means that 30% of total global income is received as rent by owners of capital, the remaining 70% being received as wages by workers. If we classify AI as capital, then with the invention of machine intelligence that can fully substitute for human work, wages would fall to the marginal cost of such machine-substitutes, which — under the assumption that the machines are very efficient — would be very low, far below human subsistence-level income. The income share received by labor would then dwindle to practically nil. But this implies that the factor share of capital would become nearly 100% of total world product.
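A stylized version of the shift Bostrom describes (the numbers here are illustrative, not his):

```python
def factor_shares(world_product, wage_bill):
    """Split total income into (labor share, capital share)."""
    labor = wage_bill / world_product
    return labor, 1.0 - labor

# Today: the stylized fact of roughly 70% wages, 30% capital rent.
today = factor_shares(world_product=100e12, wage_bill=70e12)

# After machines fully substitute for human work, wages fall toward the
# machines' marginal cost, assumed here to be near zero, so the capital
# share approaches 100% of world product.
after = factor_shares(world_product=100e12, wage_bill=0.5e12)
```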

However, in the contemporary world, many people have no wealth. This includes not only individuals who live in poverty but also some people who earn a good income or who have high human capital but have negative net worth. For example, in affluent Denmark and Sweden 30% of the population report negative wealth — often young, middle-class people with few tangible assets and credit card debt or student loans.

Again, because of the explosive economic growth during and immediately after the transition, there would be vastly more wealth sloshing around, making it relatively easy to fill the cups of all unemployed citizens. It should be feasible even for a single country to provide every human worldwide with a generous living wage at no greater proportional cost than what many countries currently spend on foreign aid.

A sad and dissonant thought: that in this Malthusian condition, the normal state of affairs during most of our tenure on this planet, it was droughts, pestilence, massacres, and inequality — in common estimation the worst foes of human welfare — that may have been the greatest humanitarians: they alone enabling the average level of well-being to occasionally bob up slightly above that of life at the very margin of subsistence.

CHAPTER 13 - Choosing the criteria for choosing

The dismal odds in a frontal assault are reflected in the pervasive dissensus about the relevant issues in value theory. No ethical theory commands majority support among philosophers, so most philosophers must be wrong. It is also reflected in the marked changes that the distribution of moral belief has undergone over time, many of which we like to think of as progress.

Yudkowsky has proposed that a seed AI be given the final goal of carrying out humanity’s “coherent extrapolated volition” (CEV), which he defines as follows: Our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.

Also, Yudkowsky thinks that it should require less consensus for the AI to prevent some particular narrowly specified outcome, and more consensus for the AI to act to funnel the future into some particular narrow conception of the good. “The initial dynamic for CEV,” he writes, “should be conservative about saying ‘yes,’ and listen carefully for ‘no.’”

Another objection is that there are so many different ways of life and moral codes in the world that it might not be possible to “blend” them into one CEV. Even if one could blend them, the result might not be particularly appetizing — one would be unlikely to get a delicious meal by mixing together all the best flavors from everyone’s different favorite dish. In answer to this, one could point out that the CEV approach does not require that all ways of life, moral codes, or personal values be blended together into one stew. The CEV dynamic is supposed to act only when our wishes cohere. On issues on which there is widespread irreconcilable disagreement, even after the various idealizing conditions have been imposed, the dynamic should refrain from determining the outcome.

The CEV approach is meant to be robust and self-correcting; it is meant to capture the source of our values instead of relying on us correctly enumerating and articulating, once and for all, each of our essential values.

The CEV proposal is not the only possible form of indirect normativity. For example, instead of implementing humanity’s coherent extrapolated volition, one could try to build an AI with the goal of doing what is morally right, relying on the AI’s superior cognitive capacities to figure out just which actions fit that description. We can call this proposal “moral rightness” (MR). The idea is that we humans have an imperfect understanding of what is right and wrong, and perhaps an even poorer understanding of how the concept of moral rightness is to be philosophically analyzed: but a superintelligence could understand these things better.

The important thing is to land in the right attractor basin.

CHAPTER 14 - The strategic picture

We have seen in earlier chapters that the introduction of machine superintelligence would create a substantial existential risk. But it would reduce many other existential risks. Risks from nature — such as asteroid impacts, supervolcanoes, and natural pandemics — would be virtually eliminated, since superintelligence could deploy countermeasures against most such hazards, or at least demote them to the non-existential category (for instance, via space colonization). These existential risks from nature are comparatively small over the relevant timescales. But superintelligence would also eliminate or reduce many anthropogenic risks. In particular, it would reduce risks of accidental destruction, including risk of accidents related to new technologies.

The ground for preferring superintelligence to come before other potentially dangerous technologies, such as nanotechnology, is that superintelligence would reduce the existential risks from nanotechnology but not vice versa.

There are several quite strong reasons to believe that the riskiness of an intelligence explosion will decline significantly over a multidecadal timeframe. One reason is that a later date leaves more time for the development of solutions to the control problem. The control problem has only recently been recognized, and most of the current best ideas for how to approach it were discovered only within the past decade or so (and in several cases during the time that this book was being written).

Consider, for example, the following argument template for proceeding with research to develop a dangerous technology X. (One argument fitting this template can be found in the writings of Eric Drexler. In Drexler’s case, X = molecular nanotechnology.)

The risks of X are great.

Reducing these risks will require a period of serious preparation.

Serious preparation will begin only once the prospect of X is taken seriously by broad sectors of society.

Broad sectors of society will take the prospect of X seriously only once a large research effort to develop X is underway.

The earlier a serious research effort is initiated, the longer it will take to deliver X (because it starts from a lower level of pre-existing enabling technologies).

Therefore, the earlier a serious research effort is initiated, the longer the period during which serious preparation will be taking place, and the greater the reduction of the risks.

Therefore, a serious research effort toward X should be initiated immediately.

What initially looks like a reason for going slow or stopping—the risks of X being great—ends up, on this line of thinking, as a reason for the opposite conclusion.

A related type of argument is that we ought — rather callously — to welcome small and medium-scale catastrophes on grounds that they make us aware of our vulnerabilities and spur us into taking precautions that reduce the probability of an existential catastrophe.

In reality, the prudential case for favoring a wide distribution of gains is presumably subject-relative and situation-dependent. Yet, on the whole, people would be more likely to get (almost all of) what they want if a way is found to achieve a wide distribution — and this holds even before taking into account that a commitment to a wider distribution would tend to foster collaboration and thereby increase the chances of avoiding existential catastrophe. Favoring a broad distribution, therefore, appears to be not only morally mandated but also prudentially advisable.

The common good principle does not preclude commercial incentives for individuals or firms active in related areas. For example, a firm might satisfy the call for universal sharing of the benefits of superintelligence by adopting a “windfall clause” to the effect that all profits up to some very high ceiling (say, a trillion dollars annually) would be distributed in the ordinary way to the firm’s shareholders and other legal claimants, and that only profits in excess of the threshold would be distributed to all of humanity evenly (or otherwise according to universal moral criteria). Adopting such a windfall clause should be substantially costless, any given firm being extremely unlikely ever to exceed the stratospheric profit threshold (and such low-probability scenarios ordinarily playing no role in the decisions of the firm’s managers and investors). Yet its widespread adoption would give humankind a valuable guarantee (insofar as the commitments could be trusted) that if ever some private enterprise were to hit the jackpot with the intelligence explosion, everybody would share in most of the benefits.
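The mechanics of such a clause are easy to state; here is a sketch using the book's illustrative one-trillion-dollar threshold (how the excess would actually be distributed to all of humanity is, of course, the hard part):

```python
def windfall_split(annual_profit, ceiling=1e12):
    """Profits up to the ceiling flow to shareholders and other legal
    claimants as usual; only the excess is earmarked for humanity."""
    to_shareholders = min(annual_profit, ceiling)
    to_humanity = max(annual_profit - ceiling, 0.0)
    return to_shareholders, to_humanity

# An ordinary firm: the clause costs nothing.
assert windfall_split(50e9) == (50e9, 0.0)
# A firm that hits the intelligence-explosion jackpot: everything
# above the $1T threshold is shared.
assert windfall_split(3e12) == (1e12, 2e12)
```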

CHAPTER 15 - Crunch time

One of the many tasks on which superintelligence (or even just moderately enhanced human intelligence) would outperform the current cast of thinkers is in answering fundamental questions in science and philosophy. This reflection suggests a strategy of deferred gratification. We could postpone work on some of the eternal questions for a little while, delegating that task to our hopefully more competent successors — in order to focus our own attention on a more pressing challenge: increasing the chance that we will actually have competent successors. This would be high-impact philosophy and high-impact mathematics.