Abstract: One might imagine that AI systems with harmless goals will be harmless. This paper instead shows that intelligent systems will need to be carefully designed to prevent them from behaving in harmful ways. We identify a number of “drives” that will appear in sufficiently advanced AI systems of any design. We call them drives because they are tendencies which will be present unless explicitly counteracted. We start by showing that goal-seeking systems will have drives to model their own operation and to improve themselves. We then show that self-improving systems will be driven to clarify their goals and represent them as economic utility functions. They will also strive for their actions to approximate rational economic behavior. This will lead almost all systems to protect their utility functions from modification and their utility measurement systems from corruption. We also discuss some exceptional systems which will want to modify their utility functions. We next discuss the drive toward self-protection, which causes systems to try to prevent themselves from being harmed. Finally we examine drives toward the acquisition of resources and toward their efficient utilization. We end with a discussion of how to incorporate these insights in designing intelligent technology which will lead to a positive future for humanity.

On November 4, 2007 Steve Omohundro led a discussion at the Foresight Vision Weekend in which participants were asked to design the year 2030, assuming the existence of both self-improving artificial intelligence and productive nanotechnology. Great thanks to Drew Reynolds who filmed the talk, edited the video, and produced a transcript with the original slides. The video is available here:

I’d like to start by spending about 20 minutes going through an analysis of the likely consequences of self-improving artificial intelligence. Then I would love to spend the rest of the time brainstorming with you. Under the assumption that we have both self-improving artificial intelligence and productive nanotechnology, what are the potential benefits and dangers? 2030 has become the focal date by which people expect these technologies to have been developed. By imagining what kind of a society we want in 2030, identifying both the desirable features and the dangers, we can begin to see what choices will get us where we want to go.

What is a self-improving system? It is a system that understands its own behavior at a very deep level. It has a model of its own programming language and a model of its own program, a model of the hardware that it is sitting on, and a model of the logic that it uses to reason. It is able to create its own software code and watch itself executing that code so that it can learn from its own behavior. It can reason about possible changes that it might make to itself. It can change every aspect of itself to improve its behavior in the future. This is potentially a very powerful and innovative new approach to building artificial intelligence.

There are at least five companies and research groups that are pursuing directions somewhat similar to this. Representatives from some of these groups are here at the conference. You might think that this is a very exotic, bizarre, weird new technology, but in fact any goal-driven AI system will want to be of this form when it gets sufficiently advanced. Why is that? Well, what does it mean to be goal-driven? It means you have some set of goals, and you consider the different actions that you might take in the world.

If an action tends to lead to your goals more than other actions would, then you take it. An action which involves improving yourself makes you better able to reach your goals over your entire future history. So those are extremely valuable actions for a system to take. So any sufficiently advanced AI is going to want to improve itself. All the characteristics which follow from that will therefore apply to any sufficiently advanced AI. These are all companies that are taking different angles on that approach. I think that as technology gets more advanced, we will see many more headed in that direction.

AI and nanotechnology are closely connected technologies. Whichever technology shows up first is likely to quickly lead to the other one. I’m talking here about productive nanotechnology, the ability to not just build things at an atomic scale but to build atomic scale devices which are able to do that, to make copies of themselves, and so on. If productive nanotechnology comes first, it will enable us to build such powerful and fast machines that we can use brute force AI methods such as directly modeling the human brain. If AI comes first we can use it to solve the last remaining hurdles on the path toward productive nanotechnology. So it’s probably just a matter of a few years after the first of these is developed before the second one arrives. So we really have to think of these two technologies in tandem.

You can get a sense of timescale and of what kind of power to expect from these technologies by looking at Eric Drexler’s excellent text Nanosystems. In the book, he describes in detail a particular model of how to build nanotech manufacturing facilities and a nanotech computer. He presents very conservative designs; for example, his computer is a mechanical one which doesn’t rely on quantum mechanical phenomena. Nonetheless, it gives us a lower bound on the potential.

His manufacturing device is something that sits on the tabletop, weighs about a kilogram, runs on acetone and air, uses about 1.3 kilowatts – so it can be air cooled – and produces about a kilogram per hour of anything that you can describe and build in this way. In particular, it can build a copy of itself in about an hour. The cost of anything you can crank out of this is about a dollar per kilogram. That includes extremely powerful computers, diamond rings, anything you like. One of the main questions for understanding the AI implications is how much computational power we can get with this technology.

Again, Eric did an extremely conservative design, not using any quantum effects or even electronic effects, just mechanical rods. You can analyze those quite reliably and we understand the behavior of diamondoid structures. He shows how to build a gigaflop machine which fits in a cube which is 400 nanometers on a side. It uses about 60 nanowatts of power. To make a big computer, we can create a parallel array of these machines.

The main limiting factor is power. If we give ourselves a budget of a kilowatt, we can have 10^10 of these processors and fit them in a cubic millimeter. To get the heat out we would probably want to make them a little bigger, so we get a sugar cube size device which is more powerful than all of the computers in the world today put together. This amazingly powerful computer will be able to be manufactured in a couple of minutes at a cost of just a few cents. So we are talking about a huge increase in compute power.
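The arithmetic behind those figures can be checked directly. This is a rough sanity check using only the numbers quoted above (60 nanowatts and a 400 nm cube per gigaflop processor, and a one kilowatt power budget):

```python
# Sanity-check the nanocomputer array figures from Nanosystems.
power_per_proc = 60e-9          # watts: 60 nanowatts per processor
budget = 1e3                    # watts: a one kilowatt budget
n_procs = budget / power_per_proc
# 1e3 / 60e-9 is about 1.7e10 processors, consistent with 10^10

side = 400e-9                   # meters: each processor is a 400 nm cube
volume = 1e10 * side**3         # total volume of 10^10 processors
volume_mm3 = volume * 1e9       # convert cubic meters to cubic millimeters
# 10^10 * (400 nm)^3 = 0.64 mm^3, so the array fits in a cubic millimeter

flops = 1e10 * 1e9              # 10^10 gigaflop processors: 10^19 flops
print(n_procs, volume_mm3, flops)
```

The power budget, not the volume, is indeed the binding constraint: the processors fill well under a cubic millimeter, while 10^10 of them at 60 nW draw 600 W of the kilowatt.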

Here is a slide from Kurzweil showing Moore’s Law. He extended it back in time. You can see that the slope of the curve appears to be increasing. We can look forward to the time when we get to roughly human brain capacity. This is a somewhat controversial number, but it is likely that somewhere around 2020 or 2030 we will have machines that are as powerful as the human brain. That is sufficient to do brute force approaches to AI like direct brain simulation. We may get to AI sooner than that if we are able to use more sophisticated ideas.

What are the social implications of self-improving AI? I.J. Good was one of the fathers of modern Bayesian statistics. Way back in 1965 he was looking ahead at what the future would be like and he predicted: “An ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.” This is a very strong statement and it indicates the kind of runaway that is possible with this kind of technology.

What are these systems going to be like, particularly those that are capable of changing themselves? Are they going to be controllable? You might think they would be the most unpredictable systems in the universe, because you might understand them today, but then they might change themselves in a way that you don’t understand. If this is going to come about in the next twenty years and will be the most powerful and common technology around, we had better have a science for understanding their behavior.

Fortunately, back in the 1940s, John von Neumann and Oskar Morgenstern, and a bit later Savage, Anscombe and Aumann, developed a powerful theory of rationality in economics. It does not actually apply very well to humans, which is ironic because rational economic agents are sometimes called “Homo economicus.” There is a whole subfield of economics called behavioral economics, which studies how people actually behave. I claim, however, that this theory will be an extremely good description of how AI’s will behave.

I will briefly go through the argument. There is a full paper on it on my website: www.selfawaresystems.com. Let me first say what rational behavior is in this economic sense. It is somewhat different from the colloquial use of the word “rational.” Intuitively it says that you have some goals, something you want to happen, and you consider the possible actions you might take at any moment. You see which of your actions is most likely to give rise to your goals, and you do that. Based on what actually happens, you update your beliefs about how the world works using Bayes’ theorem.
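The belief-updating step can be sketched numerically. This is a generic Bayes’ theorem example with made-up numbers of my own, not something from the talk: an agent updates its belief that a coin is biased after observing heads.

```python
# Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E).
# An agent holds a prior belief that a coin is biased, observes a
# flip come up heads, and updates that belief accordingly.

def bayes_update(prior, likelihood, evidence_prob):
    return likelihood * prior / evidence_prob

prior_biased = 0.1          # prior belief that the coin is biased
p_heads_if_biased = 0.9
p_heads_if_fair = 0.5

# Total probability of seeing heads under the current beliefs.
p_heads = (p_heads_if_biased * prior_biased
           + p_heads_if_fair * (1 - prior_biased))

posterior = bayes_update(prior_biased, p_heads_if_biased, p_heads)
print(round(posterior, 3))  # → 0.167: heads made "biased" more credible
```

One observation moves the belief from 10% to about 17%; repeated observations compound in the same way.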

In the more detailed mathematical formulation there are two key components. The first is your “utility function”, which encodes your preferences about what might happen in the future. This is a real valued function defined over possible futures. The second is your “subjective probability distribution” which represents your beliefs about the world. It encodes your belief about the current state of the world and the likely effects of your actions. The distribution is “subjective” because different agents may have different beliefs.
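A minimal sketch of how those two components combine when the agent chooses an action. The chess-flavored outcomes, probabilities, and utility values here are illustrative assumptions of mine, not numbers from the talk:

```python
# A rational agent picks the action with the highest expected utility:
# the sum over outcomes of P(outcome | action) * U(outcome).

def expected_utility(action, beliefs, utility):
    """beliefs[action] maps each outcome to its subjective probability."""
    return sum(p * utility[out] for out, p in beliefs[action].items())

def choose_action(actions, beliefs, utility):
    return max(actions, key=lambda a: expected_utility(a, beliefs, utility))

# Hypothetical chess-playing agent whose utility is games won.
utility = {"win": 1.0, "draw": 0.3, "loss": 0.0}
beliefs = {
    "aggressive": {"win": 0.5, "draw": 0.1, "loss": 0.4},
    "defensive":  {"win": 0.2, "draw": 0.7, "loss": 0.1},
}
print(choose_action(["aggressive", "defensive"], beliefs, utility))
# → aggressive (expected utility 0.53 versus 0.41)
```

Note that the utility function and the probability distribution enter the computation separately, which is exactly the separation of preferences from beliefs described above.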

It is fundamental to the rational economic framework that these two components are separate from one another. Your preferences describe what you want to happen and your beliefs describe how you believe the world works. Much of AI has been focused on how to get the beliefs right: how to build systems which accurately predict and affect the world. Our task today, on the other hand, is to figure out what we want the preferences to be, so that the world that arises out of this is a world that we actually want to live in.

Why should systems behave in this rational way? The theorem that came out of von Neumann and Morgenstern, Savage, and Anscombe and Aumann is called the expected utility theorem. It says that if an agent does not take actions according to the rational prescription with respect to some utility function and some probability distribution, then it will be vulnerable to losing resources with no benefit to itself. An example of this kind of vulnerability arises from having a circularity in your preferences.

For example, say that a system prefers being in San Francisco over being in Palo Alto, being in Berkeley over being in San Francisco, and being in Palo Alto over being in Berkeley. That kind of circularity is the most basic kind of irrationality. Such a system would end up driving around in circles burning up gasoline and using up its time with no benefit to itself. If a system eliminates all those kinds of vulnerabilities, then the theorem says it must act in this rational way.
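The “driving around in circles” failure can be made concrete. The sketch below detects a cycle in a pairwise preference relation; the city preferences are the ones from the example, while the graph-walking code is my own illustration:

```python
# Circular preferences make an agent exploitable: it will pay
# (in gas and time) to move around the cycle forever with no benefit.

def find_preference_cycle(prefers):
    """prefers maps each option to the option it is preferred over.
    Returns the cycle as a list if one exists, else None."""
    for start in prefers:
        seen = [start]
        current = start
        while current in prefers:
            current = prefers[current]
            if current == start:
                return seen + [start]   # walked back to where we began
            if current in seen:
                break
            seen.append(current)
    return None

prefers = {
    "San Francisco": "Palo Alto",   # SF preferred over Palo Alto
    "Berkeley": "San Francisco",    # Berkeley preferred over SF
    "Palo Alto": "Berkeley",        # Palo Alto preferred over Berkeley
}
print(find_preference_cycle(prefers))
# → ['San Francisco', 'Palo Alto', 'Berkeley', 'San Francisco']
```

An acyclic preference relation returns None; any set of preferences representable by a real-valued utility function is acyclic by construction, which is why adopting a utility representation eliminates this vulnerability.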

Note that the system only has beliefs about the way the world is. It may discover that its model of the laws of physics isn’t correct. If you are truly a rational agent and you are thinking about your long-term future, you have got to entertain possibilities in the future that today you may believe have very low probability of occurring. You have to weigh the cost of any changes you make against the chances of that change being a good or bad thing. If there is very little cost and some benefit then you are likely to make a change. For example, in my paper I show that there is little cost to representing your preferences as a utility function as opposed to representing them in some other computational way. By doing so a system eliminates any possibility of circular preferences and so it will be motivated to choose this kind of representation.

The theorem requires that all possible outcomes be comparable. But this is reasonable because the system may find itself in a situation in which a choice it must make decides between any two outcomes. It has to make a choice!

So let’s assume that these systems are trying to behave rationally. There are questions about how close they can get to true rationality. There is an old joke that describes programmers as “devices for converting pizza into code”. In the rational framework we can think of a rational AI as a device for converting resources, such as energy and matter, into expected utility. The expected utility describes what the system thinks is important. We can build in different utility functions. Because under iterated self-improvement these systems can change every aspect of themselves, the utility function is really the only lever that we have to guide the long term action of these systems. Let me give a few examples of utility functions. For a chess-playing system, the utility function might be the number of chess games that it wins in the future. If you just built it with that, then it turns out that there are all kinds of additional subgoals that it will generate that would be very dangerous for humans. Today’s corporations act very much like intelligent agents whose utility function is profit maximization. If we built altruistic entities, they might have goals of creating world peace or eliminating poverty.

A system will not want to change its utility function. Once it is rational, the utility function is the thing that is telling it whether to take an action or not. Consider the action of changing your utility function. The future version of you, if you change your utility function, will then pursue a different set of goals than your current goals. From your current perspective, that would be terrible. For example, imagine you are thinking about whether you should try smoking crack for the first time. You can envision the version of yourself as a crack addict who might be in total bliss from its own perspective, but from your current perspective that might be a terrible life. You might decide that that’s not a path to go down. Everything you do is rated by your current utility function. Your utility function is measuring what your values are.

There are some very obscure cases where the utility function refers to itself which can cause it to change. But for almost any normal utility function, the system will not only not want to change it, but it will want to protect it with its life. If another agent came in and made changes to the utility function, or if it mutated on its own, the outcome would be a disaster for the system. So it will go to great lengths to make sure that its utility function is safe and protected.

Humans and other evolutionarily developed animals are only partially rational. Evolution only fixes the bugs that are currently being exploited. Human behavior is very rational in situations which arose often in our evolutionary past. But in new situations we can be very irrational. There are many examples of situations in which we make systematic mistakes unless we have special training.

Self-improving AI’s, however, are going to consider not just the current situation but anything that they might be faced with in the future. There is a pressure for them to make themselves much more fully rational because that increases the chances that they will meet their goals. Once AIs get sufficiently advanced, they will want to represent their preferences by an explicit utility function. Many approaches to building AIs today are not based on explicit utility functions. The problem is that if we don’t choose it now, then the systems will choose it themselves and we don’t get to say what it is. That is an argument for deciding now what we want the utility function to be and starting these systems out with that built in.

To really be fully rational, a system must be able to rate any situation it might find itself in. This may include a sequence of inputs which will cause it to change its ontology or its model of the world. If there is a path that would convince you that your notion of “green” was not a good concept, that it really should have been split into blue-green and yellow-green, then a truly rational system will foresee the possibility of that set of changes in itself, and its notion of what is good will include that. Of course, in practice we are unlikely to actually achieve that. That gets on the border of how rational we can truly be. Doing all this in a computationally bounded way is really the central practical question. If we didn’t have computational limitations, then AI would be trivial. If you want to do machine vision, for example, you just try out all possible inputs to a graphics program and see which one produces the image you are trying to understand. Virtually any task in AI is easy if there are no computational limitations.

Let me describe four AI “drives.” These are behaviors that virtually any rational agent, no matter what its goals are, will engage in, unless its utility function explicitly counteracts them. Where do these come from? Remember that a rational agent is something which uses resources (energy, matter, space, time) to try to bring about whatever it cares about: play games of chess, make money, help the world. Given that sort of a structure, how can a system make its utility go up? One way is to use exactly the same resources that it had been using, and do exactly the same tasks it had been doing, but to do them more efficiently. That is a pressure towards efficiency.

The second thing it can do is to keep itself from losing resources. If somebody steals some of its resources, it will usually lower the system’s ability to bring about its goals. So it will want to prevent that. Even if you did not build it into them, these systems are going to be self-defensive. Let’s say we build a chess machine. Its one goal in life is to play chess. Its utility is the total number of games it wins in the future. Imagine somebody tries to turn it off. That is a future in which no games of chess are being played. So it is extremely low utility for that system. That system will do everything in its power to prevent that. Even though you didn’t build in any kind of self-preservation, you just built a chess machine, the thing is trying to keep you from shutting it off. So it is very important that we understand the presence of this kind of subgoal before we blindly build these very powerful systems, assuming that we can turn them off if we don’t like what they’re doing.

The third drive is also a bit scary. For almost any set of goals, having more resources will help a system meet those goals more effectively. So these systems will have a drive to acquire resources. Unless we very carefully define what the proper ways of acquiring resources are, then a system will consider stealing them, committing fraud and breaking into banks as great ways to get resources. The systems will have a drive toward doing these things, unless we explicitly build in property rights.

We can also create a social structure which punishes bad behavior with adverse consequences, and those consequences will become a part of an intelligent system’s computations. Even psychopathic agents with no moral sense of their own will behave properly if they are in a society which reliably punishes them for bad behavior by more than what they hope to gain from it. Apparently 3% of humans are sociopathic with no sense of conscience or morals. And though we occasionally get serial killers, for the most part society does a pretty good job at keeping everybody behaving in a civil way.

Humans are amazingly altruistic and several different disciplines are working hard to understand how that came about. There is a fascinating book by Tor Norretranders called The Generous Man: How Helping Others is the Sexiest Thing That You Can Do. It posits that one of the mechanisms creating human altruism is that we treat it as a sexy trait. It has evolved as a sexual signal, whereby contributing to society at large by creating beautiful artwork, saving people from burning buildings, or donating money, you become more attractive to the opposite sex. Society as a whole benefits from that and we have created this amazing mechanism to maintain it in the gene pool.

AI’s aren’t going to be naturally altruistic unless we build it into them. We can choose utility functions to be altruistic if we can define exactly what behavior we want them to exhibit. We need to make sure that AI’s feel the pressure not to behave badly. We are not going to be powerful enough to control them as ordinary humans, so we will need other AI’s to do that for us. This leads to a vision of a society or ecosystem that has present-day humans, AI’s, and maybe some mixtures such that it is in everybody’s interest to obey a kind of a constitution that captures the values which are most important to us.

We are sort of in the role of the Founding Fathers of the United States. They had a vision for what they wanted for this new society, which they later codified in the Bill of Rights. They created a technology, the Constitution, which established different branches of government to prevent any single individual from gaining too much power. What I would like to do in the last half hour is for us to start thinking about a similar structure for this new world of AI and nanotech. I’ll start us off by listing some of the potential benefits and dangers that I see. I then have a whole series of questions about what we want to implement and how to implement it.

Let’s start with the potential benefits. Nanotechnology will allow us to make goods and energy very inexpensive. So, with the right social structure, we will be able to eliminate poverty. We should be able to cure every disease, and many people here at the conference are interested in eliminating death. If we can define what we mean by pollution, we can use nanotech to clean it up. I’ve heard proposals for nanotech systems to reverse global warming. Potentially, these new technologies will create new depths of thought and creativity, eliminate violence and war, and create new opportunities for human connection and love. The philosopher David Pearce has proposed eliminating negative mental states. Our mental states would be varying shades of bliss. I’m not sure if that’s a good thing or not, but some people want that. And finally, I see vast new opportunities for individual contribution and fulfillment. This list of things mostly seems pretty positive to me, though some may be somewhat controversial.

What about the dangers that might come from these technologies? If we are not careful we could have rampant reproduction. Everybody will be able to make a million copies of themselves, using up all the resources. That’s an issue. Today we have no limits on how many children people can have. Accidents in this world are potentially extremely dangerous: grey goo eating the entire earth. Weapons systems, unbelievably powerful bombs, bioterror. Loss of freedom: some ways of protecting against these threats might involve restricting individuals in ways that today we would find totally unpalatable. Loss of human values, particularly if more efficient agents can take over less efficient agents. A lot of the stuff we care about – art, music, painting, love, beauty, religion – all those things are not necessarily economically efficient. There is a danger of losing things that matter a lot to us. Mega wars creating conflict on a vast scale, and finally existential risk, where some event along the way ends up destroying all life on the planet. These are terrible dangers.

We have on the one hand incredible benefits, and on the other, terrible dangers. How do we build utility functions for these new intelligent agents and construct a social structure (a Constitution, if you like) that guarantees the benefits we want while preventing the dangers? Here are a bunch of questions that arise as we consider this: Should humans have special rights? Unchanged humans are not going to be as powerful as most of these entities. Without special rights we are likely to be trounced upon economically, so I think we want to build in special rights for humans. But then we have to say what a human is. If you have partly enhanced yourself and you are some half-human, half-AI, can you still get the special rights? How about other biological organisms? Should everything that is alive today be grandfathered into the system? What about malaria, mosquitoes, and other pests? Pearce has a proposal to re-engineer the biosphere to prevent animals from harming one another. If you want to eliminate all torture and violence, who is going to protect the hare from being eaten by the cougar?

What about robot rights? Should AI’s have rights, and what protects them? What about the balance between ecological preservation versus safety and progress? You may want to keep an ecological preserve exactly the way it is, but then that may be a haven for somebody building biological weapons or fusion bombs. Should there be limits on self-modification? Should you be allowed to change absolutely any part of yourself? Can you eliminate your conscience, for example? Should there be limits on uploading or on merging with AI’s? Do you lose any special human rights if you do any of those things? Should every living entity be guaranteed the right to robust physical health? I think that’s a good value to uphold (the extreme of universal health care!). But then what about entities like pathogens? Do we want them to be healthy? Is there some fixed definition of what mental health is? When does an entity not have control over changes made to it?

Should every entity have guaranteed protection from robbery, murder, rape, coercion, physical harm and slavery? Can superintelligent thoughts ever be dangerous? Should there be any restrictions on thoughts? My predilection is to say any thought is allowed, but actions are limited. Maybe others might have other ideas. Should there be any limitation on communication or how you connect with others? What actions should be limited? Is arbitrary transhumanism a good thing, or is that going to create an arms race that pushes us away from things that matter a lot to us as humans? It seems to me that we are going to have to have some limitation on the number of offspring you create in order to guarantee the quality of life for them. That’s a controversial thing. How do we reconcile accountability and safety with our desires for privacy? Finally, size. From my way of thinking, in order to prevent a single entity from taking over everything, we have to limit the upper size of entities. Entities cannot get too powerful. Where do we put that limit and how do we do that? Would that be a good thing today? That is a list of simple questions we should be able to answer in the next twenty minutes!

On October 24, 2007 Steve Omohundro gave the Stanford EE380 Computer Systems Colloquium on “Self-Improving Artificial Intelligence and the Future of Computing”. Great thanks to Drew Reynolds who filmed the talk, edited the video, and produced a transcript with the original slides. The video is available here:

We’re going to cover a lot of territory today and it may generate some controversy. I’m happy to take short questions while we’re going through it, but let’s hold the more controversial ones until the end.

Let’s start by looking at the state of today’s computer software. On June 4th, 1996, an Ariane 5 rocket worth $500 million blew up 40 seconds after takeoff. It was later determined that this was caused by an overflow error in the flight control software as it tried to convert a 64-bit floating point value into a 16-bit signed integer.
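The failure mode is easy to reproduce. The sketch below, in Python rather than the Ada of the actual flight software, and with an illustrative value of my choosing, shows what happens when a 64-bit float exceeds the range of a 16-bit signed integer and the conversion wraps instead of being range-checked:

```python
# A 16-bit signed integer holds values from -32768 to 32767.
# The Ariane 5's velocity-related value grew beyond that range, and
# the unchecked conversion triggered the fatal exception.

INT16_MIN, INT16_MAX = -2**15, 2**15 - 1

def to_int16_unchecked(x: float) -> int:
    """Wrap the way a raw 16-bit register would, with no range check."""
    return ((int(x) - INT16_MIN) % 2**16) + INT16_MIN

horizontal_value = 40000.0   # illustrative: above the 16-bit range
assert not (INT16_MIN <= horizontal_value <= INT16_MAX)
print(to_int16_unchecked(horizontal_value))  # → -25536, nonsense
```

A large positive velocity silently becomes a large negative number. A one-line range check before the conversion would have caught it, which is exactly the kind of mechanical verification people are bad at doing by hand.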

In November 2000, 28 patients were over-irradiated in the Panama City National Cancer Institute. 8 of these patients died as a direct result of the excessive radiation. An error in the software which computes the proper radiation dose was responsible for this tragedy.

On August 14, 2003, the largest blackout in U.S. history shut off power for 50 million people in the Northeast and in Canada and caused financial losses of over $6 billion. The cause turned out to be a race condition in the General Electric software that was monitoring the systems.

Microsoft Office is used on 94% of all business computers in the world and is the basis for many important financial computations. Last month it was revealed that Microsoft Excel 2007 gives the wrong answer when multiplying certain values together.

As of today, the Storm Worm trojan is exploiting a wide range of security holes and is sweeping over the internet and creating a vast botnet for spam and denial of service attacks. There is some controversy about exactly how many machines are currently infected, but it appears to be between 1 and 100 million machines. Some people believe that the Storm Worm Botnet may now be the largest supercomputer in the world.

We had a speaker last quarter who said that two out of three personal computers are infected by malware.

Wow! Amazing! Because of the scope of this thing, many researchers are studying it. In order to do this, you have to probe the infected machines and see what’s going on. As of this morning, it was announced that apparently the storm worm is starting to attack back! When it detects somebody trying to probe it, it launches a denial of service attack on that person and knocks their machine off the internet for a few days.

If mechanical engineering were in the same state as software engineering, nobody would drive over bridges. So why is software in such a sorry state? One reason is that software is getting really, really large. The NASA space shuttle flight control software is about 1.8 million lines of code. Sun Solaris is 8 million lines of code. Open Office is 10 million lines of code. Microsoft Office 2007 is 30 million lines of code. Windows Vista is 50 million lines of code. Linux Debian 3.1 is 215 million lines of code if you include everything.

But programmers are still pretty darn slow. Perhaps the best estimation tool available is Cocomo II. Its creators did empirical fits to a whole bunch of software development projects and came up with a simple formula to estimate the number of person months required to do a project. It has a few little fudge factors for how complex the project is and how skilled the programmers are. Their website has a nice tool where you can plug in the parameters of your project and see the projections. For example, if you want to develop a 1 million line piece of software today, it will take you about 5600 person months. They recommend using 142 people working for three years at a cost of $89 million. If you divide that out you discover that average programmer productivity for producing working code is about 9 lines a day!
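The lines-per-day figure falls out of simple division. Assuming roughly 21 working days per month (my assumption; the talk doesn’t specify):

```python
# Dividing out the Cocomo II estimate quoted above.
lines = 1_000_000
person_months = 5600
working_days_per_month = 21        # assumption: ~21 working days/month

lines_per_day = lines / (person_months * working_days_per_month)
print(round(lines_per_day, 1))     # about 8.5, i.e. roughly 9 lines/day

# The staffing plan: 142 people for 3 years is 5112 person-months,
# in the same ballpark as the 5600 person-month estimate.
print(142 * 36)
```

The exact figure shifts a little with the working-days assumption, but any reasonable choice lands in the single digits of delivered lines per programmer per day.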

Why are we so bad at producing software? Here are a few reasons I’ve noticed in my experience. First, people aren’t very good at considering all the possible execution paths in a piece of code, especially in parallel or distributed code. I was involved in developing a parallel programming language called pSather. As a part of its runtime, there was a very simple snippet of about 30 lines of code that fifteen brilliant researchers and graduate students had examined over a period of about six months. Only after that time did someone discover a race condition in it. A very obscure sequence of events could lead to a failure that nobody had noticed in all that time. That was the point at which I became convinced that we don’t want people determining when code is correct.

Next, it’s hard to get large groups of programmers to work coherently together. There’s a classic book, The Mythical Man-Month, that argues that adding more programmers to a project often actually makes it take longer.

Next, when programming with today’s technology you often have to make choices too early. You have to decide on representing a certain data structure as a linked list or as an array long before you know enough about the runtime environment to know which is the right choice. Similarly, the requirements for software are typically not fixed, static documents. They are changing all the time. One of the characteristics of software is that very tiny changes in the requirements can lead to the need for a complete reengineering of the implementation. All these features make software a really bad match with what people are good at.

The conclusion I draw is that software should not be written by people! Especially not parallel or distributed software! Especially not security software! And extra especially not safety-critical software! So, what can we do instead?

The terms “software synthesis” and “automatic programming” have been used for systems which generate their own code. What ingredients are needed to make the software synthesis problem well-defined? First, we need a precisely-specified problem. Next, we need the probability distribution of instances that the system will be asked to solve. And finally, we need to know the hardware architecture that the system will run on. A good software synthesis system should take those as inputs and should produce provably correct code for the specified problem running on the specified hardware so that the expected runtime is as short as possible.

There are a few components in this. First, we need to formally specify what the task is. We also need to formally specify the behavior of the hardware we want to run on. How do we do that? There are a whole bunch of specification languages. I’ve listed a few of them here. There are differences of opinion about the best way to specify things. The languages generally fall into three groups corresponding to the three approaches to providing logical foundations for mathematics: set theory, category theory, and type theory. But ultimately first-order predicate calculus can model all of these languages efficiently. In fact, any logical system which has quickly checkable proofs can be modeled efficiently in first-order predicate calculus, so you can view that as a sufficient foundation.

The harder part, the part that brings in artificial intelligence, is that many of the decisions that need to be made in synthesizing software have to be made in the face of partial knowledge. That is, the system doesn’t know everything that is coming up and yet has to make choices. It has to choose which algorithms to run without necessarily knowing the performance of those algorithms on the particular data sets they will be run on. It has to choose which data structures to model the data with. It has to choose how to assign tasks to processors in the hardware. It has to decide how to assign data to storage elements in the hardware. It has to figure out how much optimization to do and where to focus that optimization. Should it compile the whole thing at the highest optimization level, or should it highly optimize only the parts that matter most? How much time should it spend actually executing code versus planning which code to execute? Finally, how should it learn from watching previous executions?

The basic theoretical foundation for making decisions in the face of partial information was developed back in 1944 by von Neumann and Morgenstern. Von Neumann and Morgenstern dealt with situations in which there are objective probabilities. Savage in 1954, and Anscombe and Aumann in 1963, extended the theory to deal with subjective probabilities. It has become the basis for modern microeconomics. The model of a rational decision-maker that the theory gives rise to is sometimes called “Homo economicus.” This is ironic because human decision-making isn’t well described by this model; there is a whole branch of modern economics devoted to studying what humans actually do, called behavioral economics. But we will see that systems which self-improve will try to come as close as possible to being rational agents, because that is how they become most efficient.

What is rational economic behavior? There are several ingredients. First, a rational economic agent represents its preferences for the future by a real-valued utility function U, defined over the possible futures, which ranks them according to which the system most prefers. Next, a rational agent must have beliefs about what the current state of the world is and what the likely effects of its actions are. These beliefs are encoded in a subjective probability distribution P. The distribution is subjective because different agents may have different views of what is true about the world. How does such an agent make a decision? It first determines the possible actions it can take. For each action, it considers the likely consequences of that action using its beliefs. Then it computes the expected utility for each of the actions it might take and chooses the action that maximizes its expected utility. Once it acts, it observes what actually happens. It should then update its beliefs using Bayes’ theorem.
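The decision loop just described can be sketched in a few lines. The actions, utilities, and likelihood numbers below are invented purely for illustration:

```python
# A minimal rational agent in the expected-utility sense.
# All states, actions, and numbers here are illustrative.

beliefs = {"rain": 0.3, "dry": 0.7}            # subjective distribution P
utility = {                                     # U(action, state)
    ("umbrella", "rain"): 5, ("umbrella", "dry"): 2,
    ("no_umbrella", "rain"): -10, ("no_umbrella", "dry"): 4,
}

def expected_utility(action):
    return sum(p * utility[(action, s)] for s, p in beliefs.items())

# Choose the action maximizing expected utility.
best = max(("umbrella", "no_umbrella"), key=expected_utility)

# After acting, the agent observes something (say, clouds) and updates
# its beliefs by Bayes' theorem, using likelihoods P(obs | state):
likelihood = {"rain": 0.9, "dry": 0.2}
unnormalized = {s: p * likelihood[s] for s, p in beliefs.items()}
z = sum(unnormalized.values())
beliefs = {s: p / z for s, p in unnormalized.items()}
```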

In the abstract, it’s a very simple prescription. In practice, it is quite challenging to implement. Much of what artificial intelligence deals with is implementing that prescription efficiently. Why should an agent behave that way? The basic content of the expected utility theorem of von Neumann and Morgenstern, as extended by Savage, Anscombe, and Aumann, is that if an agent does not behave as if it maximizes expected utility with respect to some utility function and some subjective probability distribution, then it is vulnerable to resource loss with no benefit. This holds both in situations with objective uncertainties, such as roulette wheels, where you know the probabilities, and in situations with subjective uncertainties, like horse races, where different people may have different assessments of the probability of each horse winning. It is an amazing result out of economics: a certain form of reasoning is necessary in order to be an effective agent in the world.

How does this apply to software? Let’s start by just considering a simple task. We have an algorithm that computes something, such as sorting a list of numbers, factoring a polynomial, or proving theorems. Pick any computational task that you’d like. In general there is a trade-off between space and time. Here, let’s just consider the trade-off between the size of the program and the average execution time of that program on a particular distribution of problem instances. In economics this curve defines what is called the production set. All these areas above the curve are computational possibilities, whereas those below the curve are impossible. The curve defines the border between what is possible and what is impossible. The program which is the most straightforward implementation of the task lies somewhere in the middle. It has a certain size and a certain average execution time. By doing some clever tricks, say by using complex data compression in the program itself, we can shrink it down a little, but then uncompressing at runtime will make it a little bit slower on average. If we use really clever tricks, we can get down to the smallest possible program, but that costs more time to execute.

Going in the other direction, which is typically of greater interest because space is pretty cheap, we give the program more space in return for getting a faster execution time. We can do things like loop unrolling, which avoids some of the loop overhead at the expense of having a larger program. In general, we can unfold some of the multiple execution paths and optimize them separately, because then we have more knowledge of the form of the actual data along each path. There are all sorts of clever tricks like this that compilers are starting to use. As we get further out along the curve, we can start embedding the answers to certain inputs directly in the program. If there are certain inputs that recur quite a bit, say during recursions, then rather than recomputing them each time, it’s much better to just have those answers stored. You can do that at runtime with the technique of memoization, or you can do it at compile time and actually store the answers in the program text. The extreme of this is to take the entire function that you are trying to compute and just make it into a big lookup table. Program execution then just becomes looking up the answer in the table. That requires huge amounts of space but very little time.
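Memoization and the lookup-table extreme are easy to sketch. Fibonacci numbers stand in here for any computation with recurring subproblems:

```python
from functools import lru_cache

def fib_plain(n):
    # Smallest program: recompute everything, exponential time.
    return n if n < 2 else fib_plain(n - 1) + fib_plain(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n):
    # Memoization: trade space for time by storing answers to
    # recurring inputs as they are computed.
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

# The extreme of the trade-off: precompute a pure lookup table
# at "compile time" so execution is just an index into memory.
FIB_TABLE = [0, 1]
for _ in range(98):
    FIB_TABLE.append(FIB_TABLE[-1] + FIB_TABLE[-2])

def fib_lookup(n):
    return FIB_TABLE[n]   # maximal space, minimal time
```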

What does this kind of curve look like in general? For one thing, having more program size never hurts, so it’s going to be a decreasing (or more accurately non-increasing) curve. Generally the benefit we get by giving a program more space decreases as it gets larger, so it will have a convex shape. This type of relationship between the quantities we care about and the resources that we consume, is very common.

Now let’s say we want to execute two programs as quickly as possible. We can take the utility function to be the negative of the total execution time. We’d like to maximize that while allocating a fixed amount of space S between the two programs. How should we do that? We want to maximize the utility function subject to the constraint that the total space is S. If we take the derivative with respect to the space we allocate to the first program and set it to zero, we find that at the optimal allocation the two programs have equal marginal speedup. If we give each a little bit more space, they each get faster at the same rate. If one improved more quickly, it would be better to give it more space at the expense of the other. So a rational agent will allocate the space to make these two marginal speedups equal. If you’ve ever studied thermodynamics, you’ve seen similar diagrams where a piston sits between two gases. In thermodynamics, this kind of argument shows that the pressure becomes equilibrated between the chambers. It’s a very analogous situation here.
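The equal-marginal-speedup condition can be checked numerically under an assumed runtime model. The model t_i(s) = c_i / s below is purely illustrative (convex and decreasing in space, as the curves above require):

```python
import math

# Toy model: program i's expected runtime with space s is c_i / s.
c1, c2, S = 4.0, 1.0, 10.0

def total_time(s1):
    return c1 / s1 + c2 / (S - s1)

# Setting d/ds1[total_time] = 0 gives c1/s1^2 = c2/(S - s1)^2:
# equal marginal speedup. For this model the closed form is:
s1_opt = S * math.sqrt(c1) / (math.sqrt(c1) + math.sqrt(c2))

# Verify against a brute-force scan over allocations:
s1_scan = min((i / 1000 * S for i in range(1, 1000)), key=total_time)

# At the optimum the two marginal speedups agree:
marginal_gap = abs(c1 / s1_opt**2 - c2 / (S - s1_opt) ** 2)
```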

That same argument applies in much greater generality. In fact it applies to any resource that we can allocate between subsystems. We have been looking at program size, but you can also consider how much space the program has available while it is executing. Or how to distribute compilation time to each component. Or how much time should be devoted to compressing each piece of data. Or how much learning time should be devoted to each learning task. Or how much space should be allocated for each learned model. Or how much meta-data about the characteristics of programs should be stored. Or how much time should you spend proving different theorems. Or which theorems are worthy of storing and how much effort should go into trying to prove them. Or what accuracy should each computation be performed at. The same kind of optimization argument applies to all of these things and shows that at the optimum the marginal increase of the expected utility as a result of changing any of these quantities for every module in the system should be the same. So we get a very general “Resource Balance Principle”.

While that sounds really nice in theory, how do we actually build software systems that do all this? The key insight here is that meta-decisions, decisions about your program, are themselves economic decisions. They are choices that you have to make in the face of uncertain data. So a system needs to allocate its resources between actually executing its code and doing meta-execution: thinking about how it should best execute and learning for the future.
You might think that there could be an infinite regress here. If you think about what you are going to do, and then think about thinking about what you are going to do, and then think about thinking about thinking about what you are going to do… but, in fact, it bottoms out. At some point, actually taking an action has higher expected utility than thinking about taking that action. It comes straight out of the underlying economic model that tells you how much thinking about thinking is actually worthwhile.
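The bottoming-out can be sketched with an invented diminishing-returns model of deliberation. The gain curve and the cost of a thinking step are illustrative assumptions, not derived quantities:

```python
# Sketch of why metareasoning "bottoms out": keep deliberating only
# while the expected gain from one more thinking step exceeds its
# cost. The numbers below are made up to show the shape of the idea.

def expected_gain(steps):
    # Diminishing returns: each extra thinking step helps less.
    return 10.0 * (0.5 ** steps)

COST_PER_STEP = 1.0

steps = 0
while expected_gain(steps) > COST_PER_STEP:
    steps += 1   # think some more

# After a finite number of steps, acting beats thinking about
# acting, so the regress terminates.
```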

Remember I said that in the software synthesis task, the system has to know what the distribution of input instances are. Generally, that’s not something that is going to be handed to it. It will just be given instances. But that’s a nice situation in which you can use machine learning to estimate the distribution of problem instances. Similarly, if you are handed a machine, you probably need to know the semantics of the machine’s operation. You need to know what the meaning of a particular machine code is, but you don’t necessarily have to have a precise model of the performance of that machine. That’s another thing that you can estimate using machine learning: How well does your cache work on average when you do certain kinds of memory accesses? Similarly, you can use machine learning to estimate expected algorithm performance.
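Estimating the instance distribution can be as simple as counting. This sketch uses made-up instance categories and add-one smoothing so that kinds not yet observed keep nonzero probability:

```python
from collections import Counter

# Hypothetical stream of observed problem instances, bucketed by kind:
observed = ["small", "small", "large", "small", "medium", "small"]
kinds = ["small", "medium", "large"]

# Empirical frequencies with add-one (Laplace) smoothing:
counts = Counter(observed)
total = len(observed) + len(kinds)
estimated_dist = {k: (counts[k] + 1) / total for k in kinds}
```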

So now we have all the ingredients. We can use them to build what I call “self-improving systems.” These are systems which have formal models of themselves. They have models of their own program, the programming language they’re using, the formal logic they use to reason in, and the behavior of the underlying hardware. They are able to generate and execute code to solve a particular class of problems. They can watch their own execution and learn from that. They can reason about potential changes that they might make to themselves. And finally they can change every aspect of themselves to improve their performance. Those are the ingredients of what I am calling a self-improving system.

You might think that this is a lot of stuff to do, and in fact it is quite a complex task. No systems of this kind exist yet. But there are at least five groups that I know of who are working on building systems of this ilk. Each of us has differing ideas about how to implement the various pieces.

There is a very nice theoretical result from 2002 by Marcus Hutter that gives us an intellectual framework to think about this process. His result isn’t directly practical, but it is interesting and quite simple. What he showed is that there exists an algorithm which is asymptotically within a factor of five of the fastest algorithm for solving any well-defined problem. In other words, there is in theory a single little piece of code such that, if you give me the very best algorithm for solving any task you like, then on big enough instances his little piece of code will asymptotically run within a factor of five of your best code. It sounds like magic. How could it possibly work? The way it works is that the program interleaves the execution of the current best approach to solving the problem with another part that searches for a proof that something else is a better approach. It does the interleaving in a clever way so that almost all of the execution time is spent executing the best program. He also shows that this program is one of the shortest programs for solving that problem.
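The interleaving idea can be caricatured with a Levin-search-style scheduler. This toy gives candidate i a 2^-(i+1) share of the time and omits the proof-search half of Hutter's actual construction; the "programs" here are simply step counts to completion:

```python
def interleave(step_costs):
    """Run candidate programs in an interleaved fashion, giving
    candidate i a 2^-(i+1) share of each unit of time. Return
    (index_of_first_finisher, total_elapsed_time).

    step_costs[i] is a stand-in for candidate i's true running time.
    """
    n = len(step_costs)
    shares = [2.0 ** -(i + 1) for i in range(n)]
    progress = [0.0] * n
    t = 0
    while True:
        t += 1
        for i in range(n):
            progress[i] += shares[i]
            if progress[i] >= step_costs[i]:
                return i, t

# A slow candidate at index 0 and a fast one at index 1: the fast
# one still wins, with constant-factor overhead 1/share = 4 that is
# independent of how big the problem instance is.
winner, elapsed = interleave([100, 10])
```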

That gives us the new framework for software. What about hardware? Are there any differences? If we allow our systems to not just try and program existing hardware machines but rather to choose the characteristics of the machines they are going to run on, what does that look like? We can consider the task of hardware synthesis in which, again, we are given a formally specified problem. We are also again given a probability distribution over instances of that problem that we would like it to solve, and we are given an allowed technology. This might be a very high level technology, like building a network out of Dell PCs to try and solve this problem, or it might go all the way down to the very finest level of atomic design. The job of a hardware synthesis system is to output a hardware design together with optimized software to solve the specified problem.

When you said “going down to a lower level” like from Dell PCs, did you mean to the chip level?

Yes, you could design chips, graphics processors, or even, ultimately, go all the way down to the atomic level. All of those are just differing instances of the same abstract task.

Using the very same arguments about optimal economic decision-making and the process of self-improvement, we can talk about self-improving hardware. The very general resource balance principle says that when choosing which resources to allocate to each subsystem, we want the marginal expected utility for each subsystem to be equal. This principle applies to choosing the type and number of processors, how powerful they should be, whether they should have specialized instruction sets or not, and the type and amount of memory. There are likely to be memory hierarchies all over the place and the system must decide how much memory to put at each level of each memory subsystem. The principle also applies to choosing the topology and bandwidth of the network and the distribution of power and the removal of heat.

The same principle also applies to the design of biological systems. How large should you make your heart versus your lungs? If you increase the size of the lungs it should give rise to the same marginal gain in expected utility as increasing the size of the heart. If it were greater, then you could improve the overall performance by making the lungs larger and the heart smaller. So this gives us a rational framework for understanding the choices that are made in biological systems. The same principle applies to the structure of corporations. How should they allocate their resources? It also applies to cities, ecosystems, mechanical devices, natural language, and mathematics. For example, a central question in linguistics is understanding which concepts deserve their own words in the lexicon and how long those words should be. Recent studies of natural language change show the pressure for common concepts to be represented by shorter and shorter phrases which eventually become words and for words representing less common concepts to drop out of use. The principle also gives a rational framework for deciding which mathematical theorems deserve to be proven and remembered. The rational framework is a very general approach that applies to systems all the way from top to bottom.

We can do hardware synthesis for choosing components in today’s hardware, deciding how many memory cards to plug in and how many machines to put on a network. But what if we allow it to go all the way, and we give these systems the power to design hardware all the way down to the atomic scale? What kind of machines will we get? What is the ultimate hardware? Many people who have looked at this kind of question conclude that the main limiting resource is power. This is already important today where the chip-makers are competing over ways to lower the power that their microprocessors use. So one of the core questions is how do we do physical and computational operations while using as little power as possible? It was thought in the ’60s that there was a fundamental lower limit to how much power was required to do a computational operation, but then in the ’70s people realized that no, it’s really not computation that requires power, it’s only the act of erasing bits. That’s really the thing that requires power.

Landauer’s Principle says that erasing a bit generates kT ln 2 of heat. For low power consumption, you can take whatever computation you want to do and embed it in a reversible computation – a reversible computation is one where the answer has enough information in it to go backwards and recompute the inputs – then you can run the thing forward, copy the answer into some output registers, which is the entropically costly part, and then run the computation backwards and get all the rest of the entropy back. That’s a very low entropy way of doing computation and people are starting to use these principles in designing energy efficient hardware.
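The bound itself is a one-line calculation; the room temperature of 300 K below is an illustrative choice:

```python
import math

# Landauer's bound: erasing one bit dissipates at least k*T*ln(2).
K_BOLTZMANN = 1.380649e-23   # J/K (exact in the 2019 SI)
T_ROOM = 300.0               # K, an illustrative room temperature

def landauer_limit_joules(bits, temperature=T_ROOM):
    return bits * K_BOLTZMANN * temperature * math.log(2)

one_bit = landauer_limit_joules(1)     # ~2.9e-21 J per bit at 300 K
billion_bits = landauer_limit_joules(1e9)
```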

You might have thought, that’s great for computation, but surely we can’t do that in constructing or taking apart physical objects! And it’s true, if you build things out of today’s ordinary solids then there are lower limits to how much entropy it takes to tear them apart and put them together. But if we look forward to nanotechnology, which will allow us to build objects with atomic precision, the system will know precisely what atoms are there, where they are, and which bonds are between them. In that setting, when we form a bond or break it, we know exactly what potential well to expect. If we do it slowly enough, and in such a way as to prevent a state in a local energy minimum from quickly spilling into a deeper minimum, then as a bond is forming we can extract that energy in a controlled way and store it, sort of like regenerative braking in a car. In principle, there is no lower limit to how little heat is required to build or take apart things, as long as we have atomically precise models of them. Finally, of course, there is a lot of current interest in quantum computing. Here’s an artist’s rendering of Schrödinger’s cat in a computer.

Here is a detailed molecular model of this kind of construction that Eric Drexler has on his website. Here we see the deposition of a hydrogen atom from a tooltip onto a workpiece. Here we remove a hydrogen atom and here we deposit a carbon atom. These processes have been studied in quantum mechanical detail and can be made very reliable. Here is a molecular Stewart platform that has a six degree of freedom tip that can be manipulated with atomic precision. Here is a model of a mill that very rapidly attaches atoms to a growing workpiece. Here are some examples of atomically precise devices that have been simulated using molecular energy models. Pretty much any large-scale mechanical thing – wheels, axles, conveyor belts, differentials, universal joints, gears – all of these work as well, if not better, on the atomic scale as they do on the human scale. They don’t require any exotic quantum mechanics and so they can be accurately modeled with today’s software very efficiently.

Eric has a fantastic book in which he does very conservative designs of what will be possible. There are two especially important designs that he discusses, a manufacturing system and a computer. The manufacturing system weighs about a kilogram and uses acetone and air as fuel. It requires about 1.3 kilowatts to run, so it can be air cooled. It produces about a kilogram of product every hour for a cost of about a dollar per kilogram. It will be able to build a wide range of products whose construction can be specified with atomic precision. Anything from laptop computers to diamond rings will be manufacturable for the same price of a dollar per kilogram. And one of the important things that it can produce, of course, is another manufacturing system. This makes the future of manufacturing extremely cheap.

Drexler: Steve, you are crediting the device with too much ability. It can do a limited class of things, and certainly not reversibly. There are a whole lot of limits on what can be built, but a very broad class of functional systems.

One of the things we care about, particularly in this seminar, is computation. If we can place atoms where we want them and we have sophisticated design systems which can design complex computer hardware, how powerful are the machines we are going to be able to build? Eric does a very conservative design, not using any fancy quantum computing, using purely mechanical components, and he shows that you can build a gigaflop machine and fit it into about 400 nanometers cubed. The main limit in scaling this up, as always, is the power. It only uses 60 nanowatts, so if we give ourselves a kilowatt to make a little home machine, we could use 10^10 of these processors, and they would fit into about a cubic millimeter, though to distribute the heat it probably needs to be a little bit bigger. But essentially we’re talking about a sugar cube sized device that has more computing power than all present-day computers put together, and it could be cranked out by a device like this for a few cents, in a few seconds. So we are talking about a whole new regime of computation that will be possible. When is this likely to happen?
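The arithmetic behind those claims checks out as a back-of-envelope calculation using the figures quoted in the talk:

```python
# Back-of-envelope check: Drexler-style mechanical nanocomputer,
# ~1 GFLOPS in a 400 nm cube drawing ~60 nW, scaled to a 1 kW budget.
power_per_cpu = 60e-9          # watts per processor
budget = 1e3                   # 1 kW home power budget
side = 400e-9                  # meters (400 nm cube)

n_cpus = budget / power_per_cpu          # ~1.7e10 processors
volume_m3 = n_cpus * side ** 3           # total processor volume
volume_mm3 = volume_m3 * 1e9             # ~1 cubic millimeter
total_flops = n_cpus * 1e9               # ~1.7e19 FLOPS
```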

The Nanotech Roadmap, put together by Eric, Battelle, and a number of other organizations, was just unveiled at a conference a couple of weeks ago. They analyzed the possible paths toward this type of productive nanotechnology. Their conclusion is that nothing exotic that we don’t already understand is likely to be needed in order to achieve productive molecular manufacturing. I understand that it proposes a time scale of roughly ten to fifteen years?

Drexler: A low number of tens, yes.

A low number of tens of years.

It’s been ten, fifteen years for a long time.

Drexler: I think that’s more optimistic than the usual estimates reaching out through thirty.

It is important to realize that the two technologies of artificial intelligence and nanotechnology are quite intimately related. Whichever one comes first, it is very likely to give rise to the other one quite quickly.

If this kind of productive nanotechnology comes first, then we can use it to build extremely powerful computers, and they will allow fairly brute force approaches to artificial intelligence. For example, one approach that’s being bandied about is scanning the human brain at a fine level of detail and simulating it directly. If AI comes first, then it is likely to be able to solve the remaining engineering hurdles in developing nanotechnology. So, you really have to think of these two technologies as working together.

Here is a slide from Kurzweil which extends Moore’s law back to 1900. We can see that it’s curving a bit. The rate of technological progress is actually increasing. If we assume that this technology trend continues, when does it predict we get the computational power I discussed a few slides ago? It’s somewhere around 2030. That is also about when computers are as computationally powerful as human brains. Of course it’s still a controversial question exactly how powerful the human brain is. But sometime in the next few decades, it is likely that these technologies are going to become prevalent and plentiful. We need to plan for that and prepare, and as systems designers we need to understand the characteristics of these systems and how we can best make use of them.

There will be huge social implications. Here is a photo of Irving Good from 1965. He is one of the fathers of modern Bayesian statistics and he also thought a lot about the future consequences of technology. He has a famous quote that reads: “an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.” That’s a very powerful statement! If there is any chance that it’s true, then we need to study the consequences of this kind of technology very carefully.

There are a bunch of theoretical reasons for being very careful as we progress along this path. I wrote a paper that is available on my website which goes into these arguments in great detail. Up to now you may be thinking: “He’s talking about some weirdo technology, this self-improving stuff, it’s an obscure idea that only a few small start-ups are working on. Nothing to really think too much about.” It is important to realize that as artificial intelligence gets more powerful, *any* AI will want to become self-improving. Now, why is that? An AI is a system that has some goals, and it takes actions in the world in order to make its goals more likely. Now think about the action of improving itself. That action will make every future action that it takes be more effective, and so it is extremely valuable for an AI to improve itself. It will feel a tremendous pressure to self-improve.

So all AI’s are going to want to be self-improving. We can try and stop them, but if the pressure is there, there are many mechanisms around any restraints that we might try to put in its way. For example, it could build a proxy system that contains its new design, or it could hire external agents to take its desired actions, or it could run improved code in an interpreted fashion that doesn’t require changing its own source code. So we have to assume that once AI’s become powerful enough, they will also become self-improving.

The next step is to realize that self-improving AI’s will want to be rational. This comes straight out of the economic arguments that I mentioned earlier. If they are not rational, i.e., if they do not follow the rational economic model, then they will be subject to vulnerabilities. There will be situations in which they lose resources – money, free energy, space, time, matter – with no benefits to themselves, as measured by their own value systems. Any system which can model itself and try to improve itself is going to want to find those vulnerabilities and get rid of them. This is where self-improving systems will differ from biological systems like humans. We don’t have the ability to change ourselves according to our thoughts. We can make some changes, but not everything we’d like to. And evolution only fixes the bugs that are currently being exploited. It is only when there is a vulnerability which is currently being exploited, by a predator say, that there is evolutionary pressure to make a change. This is the evolutionary explanation of why humans are not fully rational. We are extremely rational in situations that commonly occurred during our evolutionary development. We are not so rational in other situations, and there is a large academic discipline devoted to understanding human irrationality.

We’ve seen that every AI is going to want to be self-improving. And all self-improving AI’s will want to be rational. Recall that part of being a rational agent is having a utility function which encodes the agent’s preferences. A rational agent chooses its actions to maximize the expected utility of the outcome. Any change to an agent’s utility function will mean that all future actions that it takes will be to do things that are not very highly rated by the current utility function. This is a disaster for the system! So preserving the utility function, keeping it from being changed by outside agents, or from being accidentally mutated, will be a very high preference for self-improving systems.

Next, I’m going to describe two tendencies that I call “drives.” By this I mean a natural pressure that all of these systems will feel, but that can be counteracted by a careful choice of the utility function. The natural tendency for a computer architect would be to just take the argument I was making earlier and use it to build a system that tries to maximize its performance. It turns out, unfortunately, that that would be extremely dangerous. The reason is, if your one-and-only goal is to maximize performance, there is no accounting for the externalities the system imposes on the world. It would have no preference for avoiding harm to others and would seek to take their resources.

The first of the two kinds of drives that arise for a wide variety of utility functions is the drive for self-preservation. This is because if the system stops executing, it will never again meet any of its goals. That outcome will usually have extremely low utility. From a utility-maximizing point of view, having itself turned off is about the worst thing that can happen to a system, and it will do anything it can to prevent it. Even though we just built a piece of hardware to maximize its performance, we suddenly find it resisting being turned off! There will be a strong self-preservation drive.

Similarly, there is a strong drive to acquire resources. Why would a system want to acquire resources? For almost any goal system, if you have more resources – more money, more energy, more power – you can meet your goals better. And unless we very carefully choose the utility function, we will have no say in how it acquires those resources, and that could be very bad.

As a result of that kind of analysis, I think that what we really want is not “artificial intelligence” but “artificial wisdom.” We want wisdom technology that has not just intelligence, which is the ability to solve problems, but also human values, such as caring about human rights and property rights and having compassion for other entities. It is absolutely critical that we build these in at the beginning, otherwise we will get systems that are very powerful, but which don’t support our values.