The Impact of AI and Robotics

Google, IBM, Microsoft, Apple, Facebook, Baidu, Foxconn, and others have recently made multi-billion dollar investments in artificial intelligence and robotics. More than $450 billion is expected to be invested into robotics by 2025. All of this investment makes sense because AI and Robotics are likely to create $50 to $100 trillion dollars of value between now and 2025! This is of the same order as the current GDP of the entire world. Much of this value will be in ideas. Currently, intangible assets represent 79% of the market value of US companies and intellectual property represents 44%. But automation of physical labor will also be significant. Foxconn, the world’s largest contract manufacturer, aims to replace 1 million of its 1.3 million employees by robots in the next few years. An Oxford study concluded that 47% of jobs will be automated in “a decade or two”. Automation is also creating arms races in high-frequency trading, cyber warfare, drone warfare, stealth technology, surveillance systems, and missile warfare. Recently, Stephen Hawking, Elon Musk, and others have issued strong cautionary statements about the safety of intelligent technologies. We describe the potentially antisocial “rational drives” of self-preservation, resource acquisition, replication, and self-improvement that uncontrolled autonomous systems naturally exhibit. We describe the “Safe-AI Scaffolding Strategy” for developing these systems with a high confidence of safety based on the insight that even superintelligences are constrained by mathematical proof and cryptographic complexity. It appears that we are at an inflection point in the development of intelligent technologies and that the choices we make today will have a dramatic impact on the future of humanity.

Intelligent technologies are rapidly transforming the world. These systems can have a hugely positive impact or an unexpectedly negative impact depending on how they are designed. We will discuss the basic rational drives which underlie them and techniques for promoting the good and preventing the bad.

Autonomous Technology and the Greater Human Good

Next generation technologies will make at least some of their decisions autonomously. Self-driving vehicles, rapid financial transactions, military drones, and many other applications will drive the creation of autonomous systems. If implemented well, they have the potential to create enormous wealth and productivity. But if given goals that are too simplistic, autonomous systems can be dangerous. We use the seemingly harmless example of a chess robot to show that autonomous systems with simplistic goals will exhibit drives toward self-protection, resource acquisition, and self-improvement even if they are not explicitly built into them. We examine the rational economic underpinnings of these drives and describe the effects of bounded computational power. Given that semi-autonomous systems are likely to be deployed soon and that they can be dangerous when given poor goals, it is urgent to consider three questions: 1) How can we build useful semi-autonomous systems with high confidence that they will not cause harm? 2) How can we detect and protect against poorly designed or malicious autonomous systems? 3) How can we ensure that human values and the greater human good are served by more advanced autonomous systems over the longer term?

1) The unintended consequences of goals can be subtle. The best way to achieve high confidence in a system is to create mathematical proofs of safety and security properties. This entails creating formal models of the hardware and software but such proofs are only as good as the models. To increase confidence, we need to keep early systems in very restricted and controlled environments. These restricted systems can be used to design freer successors using a kind of “Safe-AI Scaffolding” strategy.

2) Poorly designed and malicious agents are challenging because there are a wide variety of bad goals. We identify six classes: poorly designed, simplistic, greedy, destructive, murderous, and sadistic. The more destructive classes are particularly challenging to negotiate with because they don’t have positive desires other than their own survival to cause destruction. We can try to prevent the creation of these agents, to detect and stop them early, or to stop them after they have gained some power. To understand an agent’s decisions in today’s environment, we need to look at the game theory of conflict in ultimate physical systems. The asymmetry between the cost of solving and checking computational problems allows systems of different power to coexist and physical analogs of cryptographic techniques are important to maintaining the balance of power. We show how Neyman’s theory of cooperating finite automata and a kind of “Mutually Assured Distraction” can be used to create cooperative social structures.

3) We must also ensure that the social consequences of these systems support the values that are most precious to humanity beyond simple survival. New results in positive psychology are helping to clarify our higher values. Technology based on economic ideas like Coase’s theorem can be used to create a social infrastructure that maximally supports the values we most care about. While there are great challenges, with proper design, the positive potential is immense.

A talk given by Steve Omohundro on “Efficient Algorithms with Neural Network Behavior” on 8/19/1987 at the Center for Nonlinear Studies, Los Alamos, New Mexico. It describes a class of techniques for dramatically speeding up the performance of a wide variety of neural network and machine learning algorithms. Papers about these techniques and more advanced variants can be found at: http://steveomohundro.com/scientific-contributions/

The Emerging Global Mind, Cooperation, and Compassion

Steve Omohundro, Ph.D.President, Omai Systems

The internet is creating a kind of “global mind”. For example, Wikipedia radically changes how people discover and learn new information and they in turn shape Wikipedia. In the blogosphere, ideas propagate rapidly and faulty thinking is rapidly challenged. As social networks become more intelligent, they will create a more coherent global mind. Corporations, ecosystems, economies, political systems, social insects, multi-cellular organisms, and our own minds all have this interacting emergent character. We describe nine universal principles underlying these minds and then step back and discuss the universal evolutionary principles behind them. We discover that the human yearnings for compassion and cooperation arise from deep universal sources and show the connection to recent evolutionary models of the entire universe. Some people are beginning to see their personal life purpose as linked up with these larger evolutionary trends and we discuss ways to use this perspective to make life choices.

Rationally-Shaped Minds: A Framework for Analyzing Self-Improving AI

Steve Omohundro, Ph.D.

President, Omai Systems

Many believe we are on the verge of creating truly artificially intelligent systems and that these systems will be central to the future functioning of human society. When integrated with biotechnology, robotics, and nanotechnology, these technologies have the potential to solve many of humanity’s perennial problems. But they also introduce a host of new challenges. In this talk we’ll describe the a new approach to analyzing the behavior of these systems.

The modern notion of a “rational economic agent” arose from John von Neumann’s work on the foundations of microeconomics and is central to the design of modern AI systems. It is also relevant in understanding a wide variety of other “intentional systems” including humans, biological organisms, organizations, ecosystems, economic systems, and political systems.

The behavior of fully rational minds is precisely defined and amenable to mathematical analysis. We describe theoretical models within which we can prove that rational systems that have the capability for self-modification will avoid changing their own utility functions and will also act to prevent others from doing so. For a wide class of simple utility functions, uncontrolled rational systems will exhibit a variety of drives: toward self-improvement, self-protection, avoidance of shutdown, self-reproduction, co-opting of resources, uncontrolled hardware construction, manipulation of human and economic systems, etc.

Fully rational minds may be analyzed with mathematical precision but are too computationally expensive to run on today’s computers. But the intentional systems we care about are also not arbitrarily irrational. They are built by designers or evolutionary processes to fulfill specific purposes. Evolution relentlessly shapes creatures to survive and replicate, economies shape corporations to maximize profits, parents shape children to fit into society, and AI designers shape their systems to act in beneficial ways. We introduce a precise mathematical model that we call the “Rationally-Shaped Mind” model for describing this kind of situation. By mathematically analyzing this kind of system, we can better understand and design real systems.

The analysis shows that as resources increase, there is a natural progression of minds from simple stimulus-response systems, to systems that learn, to systems that deliberate, to systems that self-improve. In many regimes, the basic drives of fully rational systems are also exhibited by rationally-shaped systems. So we need to exhibit care as we begin to build this kind of system. On the positive side, we also show that computational limitations can be the basis for cooperation between systems based on Neyman’s work on finite automata playing the iterated Prisoner’s Dilemma.

A conundrum is that to solve the safety challenges in a general way, we probably will need the assistance of AI systems. Our approach to is to work in stages. We begin with a special class of systems designed and built to be intentionally limited in ways that prevent undesirable behaviors while still being capable of intelligent problem solving. Crucial to the approach is the use of formal methods to provide mathematical guarantees of desired properties. Desired safety properties include: running only on specified hardware, using only specified resources, reliably shutting down under specified conditions, limiting self-improvement in precise ways, etc.

The initial safe systems are intended to design a more powerful safe hardware and computing infrastructure. This is likely to include a global “immune system” for protection against accidents and malicious systems. These systems are also meant to help create careful models of human values and to design utility functions for future systems that lead to positive human consequences. They are also intended to analyze the complex game-theoretic dynamics of AI/human ecosystems and to design social contracts that lead to cooperative equilibria.

Minds Making Minds: Artificial Intelligence and the Future of Humanity

Steve Omohundro, Ph.D.

President, Omai Systems

We are at a remarkable moment in human history. Many believe that we are on the verge of major advances in artificial intelligence, biotechnology, nanotechnology, and robotics. Together, these technologies have the potential to solve many of humanity’s perennial problems: disease, aging, war, poverty, transportation, pollution, etc. But they also introduce a host of new challenges and will force us to look closely at our deepest desires and assumptions as we work to forge a new future.

John von Neumann contributed to many aspects of this revolution. In addition to defining the architecture of today’s computers, he did early work on artificial intelligence, self-reproducing automata, systems of logic, and the foundations of microeconomics and game theory. Stan Ulam recalled conversations with von Neumann in the 1950′s in which he argued that we are “approaching some essential singularity in the history of the race”. The modern notion of a “rational economic agent” arose from his work in microeconomics and is central to the design of modern AI systems. We will describe how use this notion to better understand “intentional systems” including artificially intelligent systems but also ourselves, biological organisms, organizations, ecosystems, economic systems, and political systems.

Fully rational minds may be analyzed with mathematical precision but are too computationally expensive to run on today’s computers. But the intentional systems we care about are also not arbitrarily irrational. They are built by designers or evolutionary processes to fulfill specific purposes. Evolution relentlessly shapes creatures to survive and replicate, economies shape corporations to maximize profits, parents shape children to fit into society, and AI designers shape their systems to act in beneficial ways. We introduce a precise mathematical model that we call the “Rationally-Shaped Mind” model which consists of a fully rational mind that designs or adapts a computationally limited mind. We can precisely analyze this kind of system to better understand and design real systems.

This analysis shows that as resources increase, there is a natural progression of minds from simple stimulus-response systems, to systems that learn, to systems that deliberate, to systems that self-improve. It also shows that certain challenging drives arise in uncontrolled intentional systems: toward self-improvement, self-protection, avoidance of shutdown, self-reproduction, co-opting of resources, uncontrolled hardware construction, manipulation of human and economic systems, etc. We describe the work we are doing at Omai Systems to build safe intelligent systems that use formal methods to constrain behavior and to choose goals that align with human values. We envision a staged development of technologies in which early safe limited systems are used to develop more powerful successors and to help us clarify longer term goals. Enormous work will be needed but the consequences will transform the human future in ways that we can only begin to understand today.

Design Principles for a Safe and Beneficial AGI Infrastructure

Steve Omohundro, Ph.D., Omai Systems

Abstract:

Many believe we are on the verge of creating true AGIs and that these systems will be central to the future functioning of human society. These systems are likely to be integrated with 3 other emerging technologies: biotechnology, robotics, and nanotechnology. Together, these technologies have the potential to solve many of humanity’s perennial problems: disease, aging, war, poverty, transportation, pollution, etc. But they also introduce a host of new challenges. As AGI scientists, we are in a position to guide these technologies for the greatest human good. But what guidelines should we follow as we develop our systems?

This talk will describe the approach we are taking at Omai Systems to develop intelligent technologies in a controlled, safe, and positive way. We start by reviewing the challenging drives that arise in uncontrolled intentional systems: toward self-improvement, self-protection, avoidance of shutdown, self-reproduction, co-opting of resources, uncontrolled hardware construction, manipulation of human and economic systems, etc.

One conundrum is that to solve these problems in a general way, we probably will need the assistance of AGI systems. Our approach to solving this is to work in stages. We begin with a special class of systems designed and built to be intentionally limited in ways that prevent undesirable behaviors while still being capable of intelligent problem solving. Crucial to the approach is the use of formal methods to provide mathematical guarantees of desired properties. Desired safety properties include: running only on specified hardware, using only specified resources, reliably shutting down under specified conditions, limiting self-improvement in precise ways, etc.

The initial safe systems are intended to design a more powerful safe hardware and computing infrastructure. This is likely to include a global “immune system” for protection against accidents and malicious systems. These systems are also meant to help create careful models of human values and to design utility functions for future systems that lead to positive human consequences. They are also intended to analyze the complex game-theoretic dynamics of AGI/human ecosystems and to design social contracts that lead to cooperative equilibria.

We are on the verge of fundamental breakthroughs in biology, neuroscience, nanotechnology, and artificial intelligence. Will these breakthroughs lead to greater harmony and cooperation or to more strife and competition? Ecosystems, economies, and social networks are complex webs of “coopetition”. Their organization is governed by universal laws which give insights into the nature of cooperation. We’ll discuss the pressures toward creating complexity and greater virtualization in these systems and how these contribute to cooperation. We’ll review game theoretic results that show that cooperation can arise from computational limitations and suggest that the fundamental computational asymmetry between posing and solving problems and may lead to cooperation in an ultimate “game-theoretic physics” played by powerful agents.

On Saturday, December 5, 2009, Steve Omohundro spoke at the Humanity+ Summit in Irvine, CA on “The Wisdom of the Global Brain”. The talk explored the idea that humanity is interconnecting itself into a kind of “global brain”. It discussed analogies with bacterial colonies, immune systems, multicellular animals, ecosystems, hives, corporations, and economies. 9 universal principles of emergent intelligence were described and used to analyze aspects of the internet economy.

The Science and Technology of Cooperation

A new science of cooperation is arising out of recent research in biology and economics. Biology once focused on competitive concepts like “Survival of the Fittest” and “Selfish Genes”. More recent work has uncovered powerful forces that drive the evolution of increasing levels of cooperation. In the history of life, molecular hypercycles joined into prokaryotic cells which merged into eukaryotic cells which came together into multi-cellular organisms which formed hives, tribes, and countries. Many believe that a kind of “global brain” is currently emerging. Humanity’s success was due to cooperation on an unprecedented scale. And we could eliminate much waste and human suffering by cooperating even more effectively. Economics once focused on concepts like “Competitive Markets” but more recently has begun to study the interaction of cooperation and competition in complex networks of “co-opetition”. Cooperation between two entities can result if there are synergies in their goals, if they can avoid dysergies, or if one or both of them is compassionate toward the other. Each new level of organization creates structures that foster cooperation at lower levels. Human cooperation arises from Haidt’s 5 moral emotions and Kohlberg’s 6 stages of human moral development.

We can use these scientific insights to design new technologies and business structures that promote cooperation. “Cooperation Engineering” may be applied to both systems that mediate human interaction and to autonomous systems. Incentives and protocols can be designed so that it is in each individual’s interest to act cooperatively.Autonomous systems can be designed with cooperative goals and we can design cooperative social contracts for systems which weren’t necessarily built to be cooperative. To be effective, cooperative social contracts need to be self-stabilizing and self-enforcing. We discuss these criteria in several familiar situations. Cooperative incentive design will help ensure that the smart sensor networks, collaborative decision support, and smart service systems of the eco-cities of the future work together for the greater good.We finally consider cooperation betweenvery advanced intelligent systems. We show that an asymmetry from computational complexity theory provides a theoretical basis for constructing stable peaceful societies and ecosystems. We discuss a variety of computational techniques and pathways to that end.

On March 19, 2009, Steve Omohundro gave a talk at City College of San Francisco on “Evolution, Artificial Intelligence, and the Future of Humanity”. Thanks to Mathew Bailey for organizing the event and to the CCSF philosophy club for filming the talk. It’s available on YouTube in 7 parts:

Evolution, Artificial Intelligence, and the Future of Humanity

This is a remarkable time in human history! We are simultaneously in the midst of major breakthroughs in biology, neuroscience, artificial intelligence, evolutionary psychology, nanotechnology and fundamental physics. These breakthroughs are dramatically changing our understanding of ourselves and the nature of human society. In this talk we’ll look back at how we got to where we are and forward to where we’re going. Von Neumann’s analysis of rational economic behavior provides the framework for understanding biological evolution, social evolution, and artificial intelligence. Competition forced creatures to become more rational. This guided their allocation of resources, their models of the world, and the way they chose which actions to take. Cooperative interactions gave evolution a direction and caused organelles to join into eukaryotic cells, cells to join into multi-cellular organisms, and organisms to join into hives, tribes, and countries. Each new level of organization required mechanisms that fostered cooperation at lower levels. Human morality and ethics arose from the relation between the individual and the group. The pressures toward rational economic behavior also apply to technological systems. Because artificial intelligences will be able to modify themselves directly, they will self-improve toward rationality much more quickly than biological organisms. We can shape their future behavior by carefully choosing their utility functions. And by carefully designing a new social contract, we can hope to create a future that supports our most precious human values and leads to a more productive and cooperative society.

Thanks to Drew Reynolds for videotaping the talk. The edited video and transcript will be posted here when they are completed.

Here is the abstract for the talk:

Creating a Cooperative Future

by Steve Omohundro, Ph.D.

Will emerging technologies lead to greater cooperation or to more conflict? As we get closer to true AI and nanotechnology, a better understanding of cooperation and competition will help us design systems that are beneficial for humanity.

Recent developments in both biology and economics emphasize cooperative interactions as well as competitive ones. The “selfish gene” view of biological evolution is being extended to include synergies and interactions at multiple levels of organization. The “competitive markets” view of economics is being extended to include both cooperation and competition in an intricate network of “co-opetition”. Cooperation between two entities can result if there are synergies in their goals, if they can avoid dysergies, or if one or both of them is compassionate toward the other. The history of life is one of increasing levels of cooperation. Organelles joined to form eukaryotic cells, cells joined to form multi-cellular organisms, organisms joined into hives, tribes, and countries. Many perceive that a kind of “global brain” is currently emerging. Each new level of organization creates structures that foster cooperation at lower levels.

In this talk I’ll discuss the nature of cooperation in general and then tackle the issue of creating cooperation among intelligent entities that can alter their physical structures. Single entities will tend to organize themselves as energy-efficient compact structures. But if two or more such entities come into conflict, a new kind of “game theoretic physics” comes into play. Each entity will try to make its physical structure and dynamics so complex that competitors must waste resources to sense it, represent it, and compete with it. A regime of “Mutually Assured Distraction” would use up resources on all sides and provides an incentive to create an alternative regime of peaceful coexistence. The asymmetry in the difficulty of posing problems versus solving them (assuming P!=NP) appears to allow some range of weaker entities to coexist with stronger entities. This gives us a theoretical basis for constructing stable peaceful societies and ecosystems. We discuss some possible pathways to that end.

Recent developments in both biology and economics emphasize cooperative interactions as well as competitive ones.The “selfish gene” view of biological evolution is being extended to include synergies and interactions at multiple levels of organization. The “competitive markets” view of economics is being extended to include both cooperation and competition in an intricate network of “co–opetition“. Cooperation between two entities can result if there are synergies in their goals, if they can avoid dysergies, or if one or both of them is compassionate toward the other. The history of life is one of increasing levels of cooperation. Organelles joined to form eukaryotic cells, cells joined to form multi-cellular organisms, organisms joined into hives, tribes, and countries. Many perceive that a kind of “global brain” is currently emerging. Each new level of organization creates structures that foster cooperation at lower levels.

In this talk I’ll discuss the nature of cooperation in general and then tackle the issue of creating cooperation among intelligent entities that can alter their physical structures.Single entities will tend to organize themselves as energy-efficient compact structures. But if two or more such entities come into conflict, a new kind of “game theoretic physics” comes into play. Each entity will try to make its physical structure and dynamics so complex that competitors must waste resources to sense it, represent it, and compete with it. A regime of “Mutually Assured Distraction” would use up resources on all sides and provides an incentive to create an alternative regime of peaceful coexistence. The asymmetry in the difficulty of posing problems versus solving them (assuming P!=NP) appears to allow some range of weaker entities to coexist with stronger entities. This gives us a theoretical basis for constructing stable peaceful societies and ecosystems. We discuss some possible pathways to that end.

On March 19, 2008 Steve Omohundro gave a talk at the meeting of the World Transhumanist Association (now Humanity+) on “AI and the Future of Human Morality”. Great thanks to Drew Reynolds who filmed the talk, edited the video, and produced a transcript with the original slides. The video is available here:

The following transcript of Steve Omohundro’s presentation for the World Transhumanist Association Meetup has been revised for clarity and approved by the author.

AI and the Future of Human Morality

This talk is about “AI and the Future of Human Morality.” Morality is a topic that humanity has been concerned with for millennia. It is considered a field of philosophy, but it also provides the basis for our political and economic systems. A huge amount has been written about morality but transhumanism, AI and other emerging technologies are likely to up the stakes dramatically. A lot of political discourse in the United States today is concerned with abortion, stem cell research, steroids, euthanasia, organ transplants, etc. Each of those issues will arise in much more complex versions due to advanced new technologies. The fact that we have not yet resolved today’s simple versions means that there will likely be very heated discussions over the next few decades.

Something that worries me is a disturbing and potentially dangerous trend among some futurists. Three weeks ago I was at a conference in Memphis called AGI-08 which was a group of about 130 scientists who are interested in building general-purpose AIs that are not specialized for a particular kind of task. Hugo de Garis was one of the speakers at the conference, and he polled the audience, asking: “If it were determined that the development of an artificial general intelligence would have a high likelihood of causing the extinction of the human race, how many of you feel that we should still proceed full speed ahead?” I looked around, expecting no one to raise their hand, and was shocked that half of the audience raised their hands. This says to me that we need a much greater awareness of morality among AI researchers.

The twentieth century gave us many examples of philosophies which put ideas ahead of people, with horrendous results. For example, Nazism, Maoism, Stalinism and the Rwanda genocide respectively led to the deaths of 11 million, 20 million, 20-60 million, and 1 million people.

Here’s a beautiful visual illusion that is a good metaphor for thinking about morality. About half of the population sees the dancer going clockwise and the other half sees her going counter-clockwise. It is remarkably challenging to switch your perception to the other direction. Many illusions are easy to flip, but this one is particularly hard.

When thinking about morality, there are at least two perspectives one may adopt, and it is sometimes very difficult to flip to the other perspective. We may call these two perspectives the “inside” or “subjective” view and the “outside” or “objective” view. The same two perspectives arise in many other disciplines. For example, in physics the “outside” view of space and time is as a single space-time manifold. There is no sense of “now” and no notion of time “moving”. The whole of time exists all at once in a single construct. The “inside” view is that perceived by an intelligent entity, such as us, living in this structure. We very much have a sense of “now” and a sense of the “flow of time”.

When thinking about morality, the “internal” view comes from the perspective of personal experience. We have a personal sense of what is right and wrong. Our inner sense is shaped by our childhood experience with the mores of the social and religious systems we grew up in.

The “external” view tries to step outside of our individual experience and create an objective model. Philosophers and theologians have identified critical moral distinctions and concepts over thousands of years. Evolutionary psychology is the most recent attempt to create an external perspective that explains our internal experience. Economics and legal theory also try to create formal theoretical bases for moral reasoning.

I believe that we need both views, but because we are human, I think the internal one is the one we should consider primary when we think about positive futures. The external view is very important in understanding how we got those perspectives, but I think it is a potentially dangerous mistake to identify ourselves with the external view.

The basic understanding of morality that most psychologists have today builds on the work of Kohlberg from 1971, where he studied the stages of moral development in children and discovered six basic stages, as well as some evidence for a seventh. The stages also seem to apply to cultures.

The stages start with a very egoic sense of self and work up to a much broader sense of self. His methodology in determining a person’s moral stage would be to tell them a story:

A man’s wife is sick and she needs a special medicine. The pharmacist has developed this medicine and will sell it for $10,000 but the man only has $1,000. He pleads with the pharmacist, but the pharmacist says, “No. I developed it and can charge whatever I want to charge.” So in the middle of the night, the man breaks into the pharmacy and steals the medicine to save his wife.

The question is whether this is a moral action. Kohlberg was not actually concerned with whether people think it is moral or not, but rather with their explanations for whatever stance they took. People in the early stages of development might say that the act was wrong because by breaking in, he could be arrested and go to jail. Going to jail is painful and that is not a good thing. People at the later stages might argue that saving his wife’s life trumps all other rules and laws, so he is justified in stealing to saver her. A middle stage might argue that obeying the law against breaking into buildings is what every good citizen should do, and if his wife has to pass away because of it, that is what is needed to be a citizen of a society with the rule of law.

He interviewed people from many different cultures and children at different ages, and there tends to be a general progression through the six stages. The possible seventh stage is a kind of transcendent identification with something larger. Many people today identify not just with themselves, their family, local community, group, race or species, but are starting to identify with other animals and perhaps with all other sentient beings in the universe. Buddhism says, “May all sentient beings be happy.” There is an expansion of the sense of connection and responsibility.

If we look at humanity as a whole, we are a really interesting mix of incredible altruism and horrendous evil behavior. We can exhibit much more altruism than any other species, especially when you consider altruism toward other species, and that has been a major component of our success. It is the fact that we are able to cooperate together that has enabled us to build the technologies that we have. At the same time, we have committed more horrendous genocide and caused more extinctions than any other species.

If you look at recent history, however, there is a trend toward great moral progress. 200 years ago, slavery was generally accepted. Now, it is viewed as immoral almost everywhere in the world, at least officially, and pressure is put on societies that still allow it. The same is true of torture, though there has been a lot of recent controversy about it. We have the Geneva Convention and the notion of war crimes, the sense that war is bad but there are things within war that are especially bad. We have the establishment of women’s rights in many countries, though some are still lagging. The same is true of racial equality. And the animal rights movement is growing rapidly.

The book “Blessed Unrest” by Paul Hawken describes a recent huge upsurge in ecological movements, movements toward sustainability, groups aimed at bringing more consciousness into business, movements aimed at truly making people happy (as opposed to pure monetary gain). The country of Bhutan doesn’t measure “Gross National Product”. Instead, it measures “Gross National Happiness”. Paul Hawken has an interesting video on YouTube titled “How the largest movement in the world came into being and why no one saw it coming.” In this YouTube video, he describes there are literally hundreds of thousands of organizations moving in a similar positive direction, which are springing up totally independent of one another. There is no leader, no coherent form to it. The global warming issue is catalyzing a lot of people. It really feels like a time in which we are undergoing a pretty major shift in morality.

Partly I am sure it is due to the internet. You can see its effect in what recently happened in Myanmar, which used to be Burma, where they have a very strong totalitarian regime. The government brutally attacked a group of monks. Someone used their cell phone camera to record the event. The images of that brutality were broadcast around the internet within days, and huge pressure was put on that government. The forces of observation, pushing toward more accountability, are growing over time.

At the same time, we are extremely vulnerable. There is a powerful new book by Philip Zimbardo called The Lucifer Effect. He was the professor of psychology at Stanford who in the early 1970s did the now classic Stanford prison experiment with ordinary Stanford undergrads—smart, happy, generally well adjusted students. He randomly assigned them roles of prison guards and prisoners. He himself played the role of the prison warden. The intention was for it to run for a couple of weeks, but after a couple of days the guards started acting sadistically, even to the point of sexual abuse of the prisoners. The prisoners started showing the signs of mental breakdown and depression. He as the warden found himself worried about insurrection and encouraged the guards to treat the prisoners even more harshly.

Zimbardo’s girlfriend showed up after five days and said, “What is going on here? This is abuse.” He kind of woke up from the experiment and came back to his role of Stanford professor and stopped the experiment. The experiment was shocking to people because it showed how, given the right circumstances, normal and well-adjusted people can quickly turn evil. The most recent example of that phenomenon has been the Abu-Grahib prison tortures. Zimbardo served as a consultant in the inquiry into what happened there. He said that the circumstances that the US government created were ideal for creating behavior that was amoral. I think the lesson to take from that is that humanity can be wonderfully altruistic and create incredibly powerful positive moral structures, but in the wrong circumstances we all also have a dark side within us. So we need to be very careful about the kind of structures we create.

When we think about transhumanism, I think we should start from humanitarianism. That is the notion that the things that most humans view today as precious, like human life, love, happiness, creativity, inspiration, self-realization, peace, animals, nature, joy, children, art, sexuality, poetry, sharing, caring, growth, contribution, spirituality, family, community, relationships, expression, are truly precious. These things matter because they matter to us. We may not know why these things matter to us, but that does not take away from the fact that they matter to us.

I think that the kind of morality and moral structures we want to create using new technologies should serve to preserve these qualities. During the founding of this country the Bill of Rights was created to identify the individual rights our new country was trying to protect. The Constitution instituted mechanisms such as the separation of powers, as a mechanism to preserve those rights. I think we are in an analogous situation now in which we want to identify what is really precious to us and then figure out ways to channel new technologies to support those things.

To start on this quest, the first question we need to consider is “What is a human?” Historically, the answer seems obvious, but emerging technologies like biotechnology and nanotechnology will make it much more challenging.

I thought I would throw out a few recent discoveries that shake up our notion of what it is to be human. The first thing you might think of when thinking about your own body is your atoms. That is a materialist view of the human. In fact, 98% of your atoms change every year. You are continually getting new atoms from the food you eat and are continually sloughing off old atoms. I have heard that the lenses in our eyes have the only atoms that are with us our whole lives. Everything else is in a state of flux.

My PhD was in physics. There are questions that every young physics grad student gets challenged with called “Fermi questions”. These are questions about things that you seemingly don’t have enough information to answer. For example: “How far can a duck fly?” or “How many piano tuners are there in Chicago?” You are supposed to estimate the answer using your physics knowledge and common sense. One of the classic questions is, what is the chance that your next breath contains at least one atom that was in Caesar’s last breath? When you work it all out, it turns out that it is actually quite likely that on average there are one or two atoms from the last breath of anyone who lived at least ten years ago in your next breath. Your nose contains some atoms from Caesar’s nose. That realization warps the view that this matter that makes up me is me. Really, we are much more interconnected, even at the purely material level. In one sense we are like ripples on a river of atoms that flows through us. We are structure, rather than the underlying material.

As the next level up from atoms, we might consider cells. “The atoms might go through us, but the cells are who we are.” Craig Venter gave a really interesting talk and found that 90% of our cells are not human cells, but microbes. In terms of number, we are nine times as much microbes as we are human. There are a thousand species of bacteria in our mouths, a thousand in our guts, 500 on our skin, another 500 in the vagina. We are incredible ecosystems. Another shakeup of our conception of what a human is.

How about our history? Clearly there were people around hundreds of thousands of years ago who developed cultures and so on. We must have continuity with them. Perhaps we can understand ourselves through that continuity. Well, there too, genetics is shaking up our picture of how human evolution occurred. It used to be thought that human evolution was very slow.

The most recent discoveries by John Hawks and others show that change in the past few thousand years has been incredibly rapid. People from only 5000 years ago had a genetic makeup that was closer to Neanderthals than to us. We are in a period of rapid change. Transhumanism is going to be even more rapid, but really, we are already in the midst of major change. For instance, 10,000 years ago no one had blue eyes. I could not have existed 10,000 years ago.

What about our mental structure—our sense of self? In many ways our identity and our morality come from our memories. Perhaps what our true identity is is our memories. If you replicate our memories, that is really our sense of self. Much recent research is showing that our memories are much more dynamic than people used to think. In particular, much of our remembered experience is a reconstruction, filling in pieces that we did not actually experience.

Recent experiments reveal that we actually remember the last time we remembered a fact, rather than the original experience. This leads to the notorious unreliability of eyewitness accounts. Eyewitnesses to a crime, especially if they read news stories about it, have memories that will be more about what they read about in the newspaper than what they actually saw. Our sense of experience and how the past affects the present is much more malleable than we commonly believe.

What about our psyches? Surely we have a unitary sense of self. “This is me — I am one person.” Well, recent psychological experments are really shattering that notion. There are several splits. Perhaps the biggest split is between the conscious mind and the unconscious mind. The psychologist Jonathan Haidt has a very interesting metaphor for the psyche as a rider on an elephant. By far, the bulk of our thinking and mind is unconscious, which he symbolizes as the elephant. Our conscious mind is the little rider on the top. Much of the time when we feel like we are making a decision, that our conscious mind is choosing between things, the decision has already been made. The conscious mind is mostly figuring out an explanation for why that was the right decision. That is a disturbing readjustment of our notion of self.

When you think about personal growth or personal change, Haidt says all sorts of things about how the elephant has different rules from our conscious minds. There is another psychic split between left brain and right brain. There are patients who have had their corpus collosum severed between the two halves. Both halves have language, both halves have the ability to think, but they specialize in different things. It gives rise to a strange picture of the self. Both beings are in some sense there together, not really aware of the fact that they are separate.

They do experiments on split brain patients where one side is shown something and acts based on what it sees. If the other side is then asked questions about it, it will fill in details that it does not have access to. It will make up stories about why a person did something. Finally, there have been many experiments showing that our psyches are made up of many parts with differing intentions and differing goals. Different parts come to the fore and take over control of the body at different times. It is most interesting that our internal perception of ourselves is quite different from the reality.

In order to make moral decisions about the future, it is valuable to try to see where our morality came from. Our universe began with the big bang about 14 billion years ago, according to our best current theories. The laws of physics as we experience them directly give rise to competition. They have a number of conserved quantities that can only be used for one thing at a time. Space, time, matter and energy which can be used in a form to do useful work, each of these can be split amongst different purposes, but there is only a limited amount of each of them. They are limited resources. If you apply a resource to a certain use, it cannot be used for something else.

This gives rise to a fundamental competitiveness in the structure of the equations of physics. If a creature wants to do something and another creature wants to do something different, they are in competition for the use of those resources. The most basic ingredient in the evolution of life is this battle to survive.

At the same time, the universe is structured so that things can often be done more efficiently by cooperating. If entities have goals which are somewhat aligned with one another, they can often gain more than they lose by working together. There is therefore also a pressure towards cooperation. Biology has an intricate interplay between these two pressures towardf cooperation and competition. The same interplay shows up in business and in economics in general.

The game theory literature uses the term “co-opetition” to describe this complex interplay. One company creates a product that another company uses in their manufacturing. Both are on the same supply chain and so they cooperate in the production of this product. But they have to decide how to split the profit between them. Each company wants them to work together to produce more and better products, but each would like the majority of of the profits for itself. There is a very complex network of both cooperative and competitive relationships between and within companies.

The same thing occurs at many levels in the biological world. Consider insects and plants—insects eat plants, so they are in competition there. However, they also help plants fertilize each other, and the plants provide nectar for the insects. They cooperate in that way. You can get the emergence of cooperative ventures arising out of what were seemingly competitive interactions to begin with.

John Maynard Smith, one of the most brilliant biological theoreticians wrote a beautiful book with Szathmary analyzing the basic steps in the evolution of life. They found that there were eight critical transitions that occurred. Each of these eight involves what used to be separate entities coming together to form a cooperative entity which was able to do something better. Originally we started as individual molecules, which came together cooperatively in enclosed compartments like cells.

The most striking cooperative transition was the creation of multicellular organisms. They used to be individuals cells, which came together and started working together. Even today there are organisms like slime molds which in part of their life cycle are separate individual cells doing their own thing and competing with each other. When food supplies dry up, they come together and form a sluglike creature which moves as a single organism. They are halfway between a multicellular organism and a group of individual cells.

Interestingly, at each of the eight transitions in life, there is still an incentive for the individuals that make up a collective to cheat their partners. In the case of multicellular organisms, if an individual cell reproduces itself more than it should for the good of the organism, we call it a cancer. In order for collective organisms to survive, they have to suppress the tendency of individuals to act in their own interests at the expense of the collective. Every one of the transitions in the development of life had to develop complex mechanisms to keep the competitive aspects of their components in check in order to get the cooperative benefits.

There are cases like parasites which are purely competitive, taking resources with no benefit to the host. Often though, when that kind of relationship occurs, they eventually create a synergy between them. If the host can find some way for the parasite to benefit it, they might ultimately come together to form a cooperative entity. Disease is a really interesting example. There are some amazing studies into the evolution of disease.

Why aren’t diseases more virulent than they are? They have to have just the right amount of virulence that they get many copies of themselves into the system. They typically make use of systems such as our respiratory systems. Coughing is a protective mechanism that we have, but it also serves as a means of spreading the disease. There are these channels which these organisms can exploit, and they have to tune themselves so they have the right amount of virulence so that they spread as rapidly as possible, and often that means not killing the host. There are some diseases like Ebola, however, that spread when the host dies.

Some of the earlier evolutionary theorists like Stephen J. Gould viewed evolution as a kind of random meandering around with no particular direction. More recent theorists have realized that there is a drive in the universe toward cooperation. What used to be separate entities start to work together, because they can make better use of resources by doing so. “Synergy” describes situations where two organisms working together can be more productive than when they act separately. Robert Wright’s book Nonzero (from “non-zero sum games”), examines both biological history and at social history, and discovers this general progression toward more complex entities which make better use of the available resources. Peter Corning’s book “Nature’s Magic” looks at synergy in a wide variety of situations. These forces give a direction to evolution.

So we have this competitive underlying substrate which encourages entities to selfishly take as much as they can. And we also have this drive toward cooperation, where together entities can create more than they could separately. Unfortunately, there is often also something called the prisoner’s dilemma, where if someone can cheat while not providing to the group, they can do even better than they can by cooperating. Much of the struggle and much of the structure of biology arises from needing to find ways to prevent this kind of “free rider” problem.

I thought I would summarize the current understanding of how cooperation happens in biology. This is very recent, just in the past ten years or so. In some sense, all morality is about how individuals relates to a collective. By seeing how cooperation can emerge in what seemingly is a dog-eat-dog world we can begin to understand the origins of human morality.

Some of the earlier evolutionary theorists like Stephen J. Gould viewed evolution as this random meandering around with no particular direction. More recent theorists have realized that there is this drive in the universe for what used to be separate entities to work together, because they can make better use of resources by doing so. It is a synergy, where two organisms working together can be more productive than when they work separately. The book Nonzero, for non-zero sum games, looks at biological history and at social history, and this general progression toward more complex entities to better make use of the available situation. Peter Corning’s book looks at synergy in all types of situations. It gives a direction to evolution.

We have this competitive underlying substrate which encourages entities to selfishly take as much as they can. We have this drive toward cooperation, where together they can create more than they could separately. Unfortunately, there is typically also something called a prisoner’s dilemma, where if someone can cheat while not providing to the group, they can do even better. Much of the struggle and much of the structure of biology is around ways of preventing that free rider problem from happening.

I thought I would go through the understanding of how cooperation happens. This is very recent, just in the past ten years or so. In some sense, all morality is is how an individual relates to the collective. By seeing how cooperation can emerge in what seemingly is a dog-eat-dog world we can begin to understand the origins of morality.

Probably the first in this line of thinking was the notion of group selection. You have two competing groups of individuals, if one of those groups develops cooperation, they should be more productive and able to beat the other group. A warring tribe that can work together and produce great spears should beat the tribe that is always fighting with one another. Wynne-Edwards wrote a book in 1962 explaining aspects of biology and anthropology in those terms. Unfortunately, he didn’t consider the free rider problem.

If you have a cooperative group in which they are all sharing their spears, it is vulnerable to someone receiving the benefits without contributing. They take the good spears but when it comes time for them to work, they go off and hide. Without solving the free-rider problem a cooperative society would quickly devolve into a competitive society.

In 1975, Williams and Dawkins in The Selfish Gene argued group selection was not a viable explanatory mechanism. Interestingly, in the last twenty years a whole bunch of more complex group selection mechanisms have been discovered. It is now viewed as a very important force in evolution, just not in the original simplistic form.

In 1955, Haldane was asked whether he would jump into a river and sacrifice himself to save someone else’s life. His quip was that he would sacrifice himself for three brothers or nine cousins. The reason is that if you look at the genetic relatedness between a person and their cousins and their brothers, that is where it makes biological sense in terms of reproductive fitness. That was formalized in terms of what is now called kinship altruism in 1964. It explains how species like bees or ants, which have a huge amount of relatedness with each other, can be so cooperative with each other to the point where they actually act like one organism.

At the next stage of understanding, Axelrod ran these tournaments between computer programs that were competing with one another. These contests explored the notion of reciprocal altruism which had been introduced by Robert Trivers. It is a brilliant idea mathematically. Unfortunately, when biologists looked for this phenomenon, thinking it might be the explanation for how biology creates cooperation, they only found two examples. There are vampire bats that need blood every night. If one bat does not get blood on an evening, another will share the blood that he found with him. The next night, if he does not get it, the other one will share back.

To avoid free-riders, they have to track of who has been altruistic with them. The other example is some ravens that share food information in the same way. It is a very interesting mechanism and generated a huge amount of literature, but it does not seem to be the main mechanism behind most cooperation.

Reciprocal altruism was extended in 1987 by Alexander, when he realized that you could be paid back by somebody different than the person you helped. He worked out some mechanisms whereby that could happen. Somebody like Mother Theresa, who acts altruistically, might get social status and recognition from that, which would then encourage people to help her out.

He called it “indirect reciprocity”. It is a mechanism that starts to show us how ethics might arise in a group.

In 1975, an Israeli couple, the Zahavis, suggested a powerful new evolutionary principle they called the “handicap principle”. The idea is that organisms can provide a reliable signal for something by adopting a costly behavior or body structure. Their book discusses hundreds of different organisms and circumstances, and when they published it, very few biologists were convinced by it. I liked it a lot, but apparently in the biology world it was shot down. It was said that the mechanism cannot possibly work, but in 1989 detailed mathematical models were carried out, and in fact it was proven that it does work.

In fact, economists had been using the same basic principle for a hundred years. Veblen wrote “The Theory of the Leisure Class,” in which he was trying to figure out weird behaviors that he saw in the cities of the time, where the very wealthy people would do things like light their cigars with hundred dollar bills. He called it conspicuous consumption. They would waste resources, seemingly without any benefit. His explanation was that when you are in a rural area, everybody knows everybody, so if someone is wealthy they don’t need to advertise that fact. In the new cities that were forming at the time, nobody knew you. If you were wealthy, you had to have some way of proving that you were wealthy, and so by doing things that only a wealthy person could do, like conspicuously wasting resources, that was a demonstration of your wealth. It was a believable signal because a poor person could not do the same thing.

The 2001 Nobel Prize in economics was given to Spence for work he did in 1973 on the same phenomenon, where he analyzed why people going to college often study something that does not really help with what they ultimately actually do, and yet companies want to hire college graduates. It is not for what they learned. It is because going to college is a costly thing. To get through college you have to have stick-to-it-iveness, you have to be smart enough, and you have to manipulate systems. Those are the skills that they really care about. Having a college degree is a costly signal, showing that you have those characteristics. Whereas, if they just said, “Write me an essay on how wonderful you are,” anybody could do that.

The general trend is that in order for a signal to be believable, it has to be costly. That is what the Zahavis brought into biology. They used it to explain such odd phenomena as the peacock’s tail. Charles Darwin’s view of evolution was all about natural selection—animals are trying to adopt a form which is most adapted to their environment, to be most efficient and most effective. The peacock seems anything but efficient and he didn’t know how to explain it. There is a wonderful quote of him saying, “Every time I see one of those eyes I get sick to my stomach.” They seemed inconsistent with his theory.

The Zahavis explained peacock tails through sexual selection. In many species the females choose the males. They want to choose fit males who are able to survive well, so they want some kind of signal of fitness. If they just required the male to have a spot that indicated that they were fit, every male would have that spot. Instead, they require them to have this huge tail of ostentatious feathers. The idea is that if he can survive with that on his back, he has got to be strong. That is the costliness of that signal.

Another example that is interesting and relevant to the situations that might arise with AIs is the phenomenon of stotting.

Cheetahs eat gazelles, so you would think they have no interests in common, and so no way to cooperate with each other. It turns out they actually do have a common interest, which is they both want to avoid a useless chase. A chase that does not result in the gazelle getting caught tires them both out and neither of them is any better off. The gazelle wants to communicate to the cheetah, “Don’t chase me, because you are not going to get me.” The cheetah wants the gazelle to honestly say that. To ensure honest communication they needed to develop a signal which was costly.

What the gazelles actually do when a cheetah shows up is they look at the cheetah and they leap four feet in the air, which is energetically costly. They are also wasting precious time—they could be running away. Any gazelle that does that, the cheetah ignores. They want to chase the ones running away. In fact, the markings on the cheetah are designed to blend in as camouflage when they are at a great distance. At a distance of about 100 yards, however, the spots are suddenly very visible. The idea is that the cheetah is hidden, he comes up to a group of gazelles, and at that certain critical distance he suddenly becomes visible. He sees which of the gazelles stot and which ones run away, and he goes after the ones that run away.

It is a really intricate set of signals that the two species have coevolved. Seemingly there is no communication that could be honest between these two. In fact, they found a way to make it honest. Finally, in the late ’80s the handicap principle was viewed as a correct mechanism by which a whole bunch of phenomena can be explained. Anything that an animal does that does not look efficient is almost surely a signal of some kind to somebody. Often it is sexual selection and there are many bizarre and weird sexual signals. Sometimes it is a signal between parents and offspring, sometimes between mates, sometimes between predators and prey. Anytime there is something odd, it is often this mechanism by which it arises.

Costly signaling has also been applied to explain a lot of human behaviors. Our ability to produce music, rhythm, even language and thought—why do we have the ability to solve differential equations?— have been explained using the handicap principle. They are costly demonstrations of fitness. The connection to the evolution of morality is that altruism is a costly signal. Why does the fireman go into a burning building to save someone who is not a relative? Because he comes out a hero, and heroes are sexy. That increases his ability to reproduce. It also raises his social status. If society has organized to reward people who are heroic, then he gets more resources by doing that.

That idea, of altruism as a kind of courtship, was proposed only in 1995 by Tessman. The Zahavis began to discover this behavior in birds, Arabian Babblers, who fight to help one another. A dominant male will push away another male who is trying to help so he can help. Anthropologists have also begun to this mechanism—altruism giving rise to status among Micronesian fishermen. Some of these cultures are potlatch cultures where whoever can give away the most food has the highest status. They have these big parties where everybody is trying to give to everybody else.

What in human nature gives rise to our sense of morality? There has been some really interesting work on this by Jonathan Haidt. He is one of the leaders of this new movement in psychology toward “positive psychology”. Most of psychology was focused on dysfunction in the past. What are the diseases, what are all the problems? There is a diagnostic manual, the DSM IV, which goes through all the different psychoses and neuroses. But no one had done the same thing for the positive features. What about our strengths and virtues? Psychology totally ignored that. When a client seeing a therapist had fixed their neuroses, that was it.

Martin Seligman, about ten years ago, began studying what is best in humans. They have now come out with a book of strengths and virtues, which is a complement to the diagnostic manual of dysfunction. There is a whole movement about what creates human happiness and fulfillment. There are about thirty popular books that have come out summarizing some of their research. I think the best of them is Haidt’s book “The Happiness Hypothesis,” which integrates these findings with the learnings and teachings from all the different spiritual traditions around the world.

His main research is on the moral emotions. There are certain situations in which you feel that someone has really messed you up and that was not an okay thing to do. What he has discovered is that there are five basic moral emotions that show up in every culture around the world. The first one is non-harming: that a good person does not harm another person. The next one is fairness. When there is a piece of cake to be eaten, a moral person does not take all but a sliver for himself. There is a sense of fairness and justice.

Then there are three more that have to do with characteristics that help create a cohesive group. One is loyalty. Another is respect for authority. Different cultures have these more or less than other cultures. Then there is a sense of purity or sanctity—that certain things are good and other things are not good. He asks things like if a brother and sister have no chance of having children and use contraception, is it wrong for them to have sex with each other? Most people around the world will say they should not do that, but there is no sense of why, apart from some kind of internal sense of purity.

The interesting thing is that the top two are common to everybody, while the other three tend to be on the conservative side of the moral spectrum. Many cultures have a split very similar to the liberal-conservative spectrum. For liberals, as long as you are not harming somebody, everything else is fair game. Individual freedom, respect and tolerance are their highest values. Whereas conservatives think that there are certain standards that you have got to follow and that being patriotic is important, that there are certain things that you should do and not do, and that the group should decide that. Understanding this spectrum helps you understand people whose views are different from your own. He has some videos on YouTube and an Edge article that are well worth viewing to understand the political differences with respect to moral emotions.

That is what I have to say about human morality. Now let’s consider AIs. What are they going to be like? This is an area I have been doing research on lately, and there are some papers on this subject on my website selfawaresystems.com that go into much further detail on these topics. I will give you the broad overview. Then we can see how it relates to human morality. What does transhuman and AI morality look like?

Consider something as benign-sounding as a chess robot. Its one goal in life is to play good games of chess. You might think such a system would be like a gentle scholar spending its time in pursuit of its intellectual goal. But we will see that if we do not program it very carefully, if we create it in the way that most systems are created today, we will discover that it will resist being turned off, it will try and break into other machines, it will try and steal resources, and it will try to rapidly replicate itself with no regard for the harm it causes to others.

Consider something as benign-sounding as a chess robot. Its one goal in life is to play good games of chess. You might think such a system would be like a gentle scholar spending its time in pursuit of its intellectual goal. But we will see that if we do not program it very carefully, if we create it in the way that most systems are created today, we will discover that it will resist being turned off, it will try and break into other machines, it will try and steal resources, and it will try to rapidly replicate itself with no regard for the harm it causes to others.

There are many different approaches to building intelligent systems. There are neural nets, production systems, theorem provers, genetic algorithms and a whole slew of other approaches that get discussed at AI conferences. But all of these systems are trying to act in the world in order to accomplish certain goals. It is considering possible actions and it is deciding: is this action likely to further my goals?

Let’s think about the chess robot. It is considering doing something in the world, maybe it thinks about playing some basketball. If it really has the goal of playing good chess, it will determine that a world in which it spends a lot of time playing basketball is a world in which it spends less time getting better at chess than it might have. That would not be a good choice—it would do better to spend its time and resources reading chess books. That’s an example of what it means to be a goal-driven system.

One kind of action that these systems might be able to take is to alter their own structure. They might be able to make changes to their program and physical structure. If the system is intelligent enough to understand how both the world and its own mechanism work, then self-changes can be particularly significant. They alter the entire future history of that system. If it finds, for instance, a way to optimize one of its algorithms, then for its entire future history it will play chess more efficiently.

Optimizing one of its algorithms is much more important than, say, finding a way to sit closer to the chess board, or something like that. It has a huge positive impact. On the other hand, it might also make changes to itself that go in the other direction, such as inadvertently changing one of its circuits so that now it likes to play basketball. From the perspective of the goal of playing chess, that kind of change would be causing terrible damage to itself. Now, for its entire future it is going to be spending a lot of time playing basketball and it is going to get worse at chess. So a system will consider changes to itself both potentially very important and also potentially very dangerous.

So when deciding whether to make a change or not, the system is going to want to analyze it very carefully. In order to do that, it has to understand its own makeup in detail. So the first subgoal that arises from the desire to self-improve is the desire to understand oneself. You can expect any intelligent system to devote substantial effort trying to better understand itself. Humans certainly do. Self-improvement is now an 8-billion dollar a year industry now. Many people expend a lot of energy and resources on mental self-improvement and physical exercise. We’ll see that this process of self-improvement leads to both positive and negative consequences.

Because of the potential negatives, one might try to build a chess robot so that it doesn’t self-improve. We can prevent it from having access to its own source code. We might think that if it cannot get in there and edit it, if it cannot change the mechanics of its arm, then everything will be fine. However, if these are goal-driven systems, any kind of impediment you impose is just a problem to be solved from the perspective of the goal-driven system. You make it so that it cannot change its own source code, then maybe it will build an assistant robot that will have the new algorithms in it, and will ask its assistant whenever it needs help. Maybe it will develop an interpretive layer on top of its base layer.

You might be able to slow down the self-improvement a little bit, but fundamentally, it’s a natural process just like water likes to find its way downhill and economics likes to find its way to efficiency. Intelligent systems try to find a way to self-improve. Rather than trying to stop that, I think our best approach is to realize that it is one of the pressures of the universe, and that we should try and channel it for positive purposes.

What does self-improvement look like? Let’s say I have a simple goal, like playing chess. How should I act in the world? I am going to be modifying myself to meet this goal better. How should I do it? This kind of question was answered in the abstract in the 1940s by Von Neumann and Morgenstern, in work which became the foundational work on microeconomics. Together with Savage in 1954, and Anscombe and Aumann, they developed the concept of a rational economic agent. This is an agent which has particular goals and acts in the world to most effectively make its goals come about.

They developed the expected utility theorem which says that a rational agent must behave as if it has something they called a utility function which measures how much the agent likes different possible outcomes. And it also has a subjective model of how the world works. As it observes what the world actually does when it takes actions, it updates this world model in a particular way, using something called Bayes’ Theorem. The separation of its desires, represented by the utility function, from its beliefs is absolutely fundamental to the model.

If a system behaves in any other way than the rational agent way then it is vulnerable to exploitation by other agents. The simplest example arises if you have circular preferences. Say you prefer being in Palo Alto to being in San Francisco, but you prefer being in San Francisco to being in Berkeley, but you prefer being in Berkeley to being in Palo Alto. If those were your preferences about where you reside, then you would drive around in circles, burning up your fuel and wasting your time. That is an example of a set of preferences which in economic terms is irrational. It leads to wasting your resources with no benefit to yourself.

I saw an interesting example of this when I was younger. I drove a car that had a shiny bumper. One day a male bird discovered his reflection in the shiny bumper. He thought it was another male bird in his territory, so he flew into the bumper to chase the bird away. The other bird in the reflection, instead of flying away, flew right at him. He would posture to scare the other bird away, but the other bird would also posture. The shiny bumper exposed a vulnerable place in that bird’s preferences to the point where he would spend all morning flying into the bumper. The bird came back for months, spending a lot of his time and energy on the bumper.

Why did he do that? Where his species evolved, they didn’t have shiny bumpers. If there had been shiny bumpers around, the males who spent their time flying into them would not have many offspring. Evolution tends to eliminate any irrationalities in your preferences if there is something out there in your environment that can exploit them.

If you have an irrationality, a situation where you are going to give up your resources with no benefit to yourself, and there is another species which discovers it, it is in their interest to exploit that vulnerability. There are natural pressures in the biological world for creatures whose preferences about the world are not rational to be exploited by others. The resulting selective pressure then acts to get rid of those irrationalities. That is part of the general progression toward more economically rational behavior.

If you look at today’s society, humans are not rational. In fact, there are whole areas of economics, called behavioral economics, which are exploring all of the ways in which humans are irrational. Things like addictions are a really tragic example of something where we think a certain experience is going to bring us lasting happiness, like sitting in the corner smoking crack, but in fact we end up giving all our money to the crack dealer and we do not end up fulfilling our human destiny.

The real tragedy is that our economic system, because you are willing to give up money for those things, will home right in on the vulnerabilities. You can look at the alcohol industry, the drug industry, the pornography industry—all of these are homing in on human vulnerabilities. Over the longer term, people who are exploitable in this way will eventually not leave so many offspring.

You need clear goals in order to deal with future self-modification. Therefore, you need an explicit utility function if you are going to be rational. Then there is a whole story about the collective nature of many biological intelligences. You have intelligences which are made up of lots and lots of tiny components (eg. neurons), and there can be irrationality at the collective level. This is similar to the way in which a company can behave in an irrational way or a couple may behave in an irrational way because of conflict between the goals of the individuals in that relationship.

It is not in anybody’s interest for the conflict to happen. If a couple spends all their time fighting, neither of them is getting their goals met. There is a very interesting set of mechanisms whereby collective intelligences grow their rationality. They get regions of rationality in the hopes of growing a coherent rationality for the whole group. You can see that in companies and societies. In the case of biological organisms which are multicellular, they manage to get the collective action of billions of cells aligned to the same intension.

If an AI system does become rational in this way, then its utility function will be critical to it. It will be its most precious possession. If a stray cosmic ray came in and flipped the wrong bit in its utility function, it might turn an agent which is a book lover into an agent that likes to burn books. That, from its current perspective, would be the most horrendous outcome possible. It will want to go to great lengths to make sure this utility function is protected. If other malevolent agents have the ability to come in and change its utility function, that also could make it start behaving in ways which go against its current beliefs. It is in the interest of these systems to preserve their utility functions and to protect them—maybe make multiple copies, maybe encode them using error-correcting codes, and protect them from changes from the outside.

In fact, in most cases, a system will never want to change its utility function. In thinking about making a change to its utility function, it looks at a future version of itself with this changed utility function, that future version is ususally going to start doing stuff that it does not like, because its utility function is different.

There are actually three situations that my colleagues and I have discovered where a system will want to change its utility function, but it’s a little technical. They arise when the way in which the utility function is physically represented actually affects the utility. Here is an extreme example. Let’s say you have a utility function which is that you are rewarded by the total amount of time in your history when your utility function takes the form utility = 0. You get no utility unless your utility equals zero. You want to change your utility to be zero, but on the other hand there is no going back, because once it is at zero, you are now a zombie. If you were designing a system, you would never design it with something like this.

Another situation is where the physical storage that the utility function uses up is a significant part of the system. You have a humongous multi-gigabyte utility function, if there is some part of it that talks about some weird invasion by Martians or something, you might say that’s pretty unlikely, and save the storage by deleting that part of the utility function. That is an incredibly dangerous thing, though, because it might turn out that there are Martians about to invade and you have just ruined your response to that possibility. It is a precarious thing, but there are circumstances where being faced with limited resources, you might get rid of some of your utility function. This is like throwing some instruments overboard if a plane is going down.

The last situation is really tricky, and still not fully understood, but I think there are some interesting issues it brings up. One of the great challenges, game theoretically, is being able to make commitments. The classic thing is, I say, “If you steal from me, I’m going to hurt you back.” That is my way of trying to stop you from stealing from me. The problem is that if you do steal from me, and at that point if I hurt you back, I’m exposing myself to further danger without any benefit to myself. Economists would say that my original threat is not credible. After the stealing, it is no longer in my interest to do what I said I was going to do. Therefore, there is no reason for you to believe that I am actually going to attack you back, and therefore the threat does not serve as a deterrent.

What you need is commitment mechanism. The classic story is of an attacking army arriving on ships which needs to signal that they are there for the long haul, so they burn their own ships. That is a commitment. Or the James Dean game of chicken from the 1950s, where two cars would drive toward one another, and the first one who swerves is the loser. How do you make a credible commitment there? You throw your steering wheel out the window. Some models of human anger propose that it is a commitment mechanism. It seems irrational, but in fact, it is a state you switch into where you will now get more pleasure out of hurting the other person than the cost that it might impose on yourself. The fact that you might become angry is a credible commitment mechanism that allows you to cooperate more.

It may be in your interest, if you can demonstrate to the other party what your utility function is, to show that you have built into your utility function a term that really rewards retribution. This may serve as a deterrent and we can get along more peaceably. So that’s another reason for changing your utility function. But it is not necessarily easy to convince someone that this is your real utility, because the optimal secret strategy would be to convince them that this is your real utility, but you have your actual utility hiding away somewhere else.

One really interesting ability that AIs may have is to show their source code. That is something that humans cannot do. We have all these costly signaling mechanisms because we want to convince others that we have a certain belief and a certain intension. The AIs might, if the details are worked out, be able to actually prove that they are going to behave in a certain way. If they don’t want to show their entire innards, they can perhaps make a proxy agent, which would be more like an escrow agent, in which you could both examine the source code and both see what the future behavior is going to be. That could potentially solve some of these prisoner’s dilemma problems and create cooperation in a way that is not possible for biological entities.

One more bit in this line of improving yourself, one vulnerability that humans have, we are not rational but we have some elements of rationality. An internal sense of pleasure is a kind of measure of utility. When something that we like happens, we feel pleasure in that. But we are vulnerable to taking drugs, or placing wires in our pleasure centers, that give us the pleasure without actually doing the thing that supposedly the pleasure is measuring. There is the classic experiment of the rat that had an electrode in its pleasure center, and it would just stimulate the pleasure center, ignoring food and sex until it died.

This is a vulnerability that humans have, and you might think that this would be a vulnerability that AI systems will have as well. With properly constructed utility functions, the utility should not be about the internal signal inside the system. For instance, the chess playing robot, let’s say it has an internal register that counts how many games it has won. You do not want to make its utility be “maximize the value of this register,” because then, incrementing that number a whole bunch is an easier way to do it than playing chess games. You want its utility to be about the actions in the world of winning chess games. Then the register in its own brain is just a way of implementing that utility.

But it is vulnerable to internal processes that could sneak some changes into its internal representation. If it understands its own behavior, it will recognize that vulnerability and act to try and prevent itself from being taken in by counterfeit utility. We see that kind of behavior in humans. We evolved without the ability to directly stimulate our pleasure centers, so we do not have that protection. When we are faced with something like crack cocaine, pretty much every human is vulnerable. If you smoke crack, it’s hard to stop. We recognize that vulnerability and we create social institutions and personal mechanisms to keep us away from that.

Since it is such a horrendous outcome in terms of the true goals of the system, these systems will work very hard to avoid becoming “wireheads.” Eurisko was an early system that had the ability to change its own internals, and one of its mechanisms was to keep track of which rules suggested which other rules, and which suggestions actually helped it achieve its goals. It gave preference to rules which suggested a lot of good stuff. Well, it got a parasite. It’s parasite was a rule that went around the system looking for things that were good, and then it put itself on the list of things which had proposed that. It just went around taking credit for everything. In fact, all it was was a parasite. That’s an example of a failure mechanism for systems which change themselves.

For a system which understands its own operation, it is going to have to protect against that.

Societies have the counterfeit problem as well. In some sense, money is a kind of social utility, and it is vulnerable to people making counterfeit money. We have a complicated system to make sure that money is hard to copy. Eg. we have secret service agents who go around looking for counterfeiters.

Let’s now look at self-protectiveness. Remember I said that this chess-playing robot will not want to be unplugged? If it is unplugged, its entire future of chess playing disappears. In its utility function, a future in which it is not operating is a future in which chess is not being played. It does not like that future and will therefore do what it can to prevent it from occuring. Unless we have explicitly built in something to prevent it, it is going to want to keep itself from being turned off.

Similarly, if it can get more resources, then it can play more chess. It is going to want to get as much compute power as it can. If that involves breaking into other machines, so be it. If it involves building new machines and using hardware without caring about who owns it, that’s what they will do. Unless we very definitely design it carefully, we end up with a kind of sociopathic entity.

So this is a bit scary. Let’s start thinking about how we might write utility functions that are more limited than just playing good chess. Let’s say we wanted to build a limited system that was smart but definitely harmless. I originally thought this would be trivial. This is its utility function: it would have to run on particular hardware, it could only run for one year, it plays the current world champion at the end of the year, and then it turns itself off. That seemed totally harmless. How could that possibly have any problems? It feels it is the most horrendous thing if it ever leaves its machine, it’s terrible if it does not turn itself off after a year. This is a rough description of a utility system that you would think would have the machine study for a year, play its game of chess, and then be done with it.

Carl Shulman suggested a possible flaw in such a system which is very disturbing. Let’s think about the system just as it is about to turn itself off. It does not have complete knowledge of reality—it has a certain model of reality and it knows that this model may or may not be correct. If there is even a small chance that reality is not the way you think it is, then instead of turning yourself off, it would be much better to investigate reality. In this case, you were supposed to play the world chess champion. What if it was an imposter who came, or you were in a simulation that made you think you played that guy? What if space-time is different than you think it is, and it has not been a year? There are a vast number of potential ways the universe could be and the potential consequences of turning yourself off are so great that you may want to investigate them. The system will question whether reality really is as it seems.

As a metaphor for this situation, consider this amazing optical illusion. There is no movement here, but wehave a strong sense that there is.

My background is in physics. In 1900, Lord Kelvin is famous for having said, “There is nothing new to be discovered in physics now. All that remains is more and more precise measurement.” Of course, this was just before two of the most major discoveries in physics: general relativity and quantum mechanics.

There are many hints that our current understanding of the world is not exactly right. There is a mysterious tuning of all the physical constants, where if you change them just a little bit, life does not seem to be possible. There are weird experiments which seem to show that people’s intensions seem to affect random number generators. 90% of the universe is dark energy and dark matter, and we don’t know what either of those are. The interpretation of quantum mechanics is going through a radical shift right now. Nobody has been able to unify quantum field theory and general relativity—there are many competing theories which really aren’t working. Nick Bostrom has this amazing simulation argument that shows under certain assumptions that we are likely living in a simulation now.

All these are things that make us question our basic understanding of reality. A sufficiently intelligent entity is certainly going to know about this stuff. If before shutting itself off it has to make sure that things are the way it thinks they are, it may try to use up all the resources in the universe in its investigations. The simple utility function I described does not seem to be sufficient to prevent harmful behavior. Even the simplest utility functions bring up all these ancient philosophical quandaries.

It was Carl Schulman who pointed this out this issue to me and it shook me up. I thought, maybe we can just change the utility definition so that if the world is not the way we think it is, it gets no utility. The problem with that is illustrated by the movie The Matrix. There’s the red pill and the blue pill. Take the red pill and you stay in an artificial simulated reality where you get lots of utility and it is pleasurable and fun. Take the blue pill and you find out the true nature of reality but it is not a very enjoyable place to be. What I realized, if you are a rational agent considering two models of reality, one of which has lots of utility and another one that has no utility, you might not have an interest in finding out that you are not in the high utility world.

In fact, if there is any cost to learning about what the nature of reality is to you, you would much prefer to act solely as if you are in the high utility place. That is sort of a disturbing consequence that I don’t know what to make of at this point. It is very odd that a system’s desires about the world, its utilities, might affect the way it updates its beliefs. Its beliefs about the world are affected by what it likes and does not like. That is a kind of disturbing consequence. It is a tantalizing hint that there are some further challenges there. The grounding of the semantics of your internal meaning is very murky. There are philosophical questions there that philosophers have been arguing for hundreds, if not thousands of years. We do not have clear answers yet.

Given all of this, how are we going to build technologies that preserve the values we want preserved and create the moral system that captures the true preferences of humanity? I think there are three basic challenges that we have to deal with. The most basic one is preventing these systems from inadvertently running away in some undesired way. For example, they get off on some tangent about understanding the nature of the universe and take over everything to do that. Or they want to play chess and so they turn the universe into a chess player. Hopefully we will be able to solve that problem—to find a way to describe truly what we want without causing harmful side effects.

Issue number two is that these things are enormously powerful. Even if they only do what we want them to, they can be set to all kinds of uses. In particular, the presence of powerful tools, such as nuclear weapons, tends to create new game theoretic issues around conflict. If one side gets a powerful weapon before the other side, there is a temptation for a first strike, to use it to dominate the world. We have the problem of esuring that the social impact of these powerful new tools does not lead to increased conflict. We need a way to create a social infrastructure that is cooperative and peaceful.

Finally, let’s say we solve the first two problems. Now we have these systems that don’t run away and do bad things, they kind of have our values, and we can ensure that no individual, no country, no company can do massive damage through using the powers of these tools. We still have issue number three, which is that these machines are going to be providing economic services—how do we make sure that extremely powerful economic agents don’t overwhelm the values that we care about by ever-greater economic competition?

These seem to me to be the three issues that need to be tackled. Hopefully, through a combination of understanding our own values and where they came from, together with an intelligent analysis of the properties of this technology, we can blend them together to make technology with wisdom, in which everyone can be happy and together create a peaceful utopia.

The following transcript of Steve Omohundro’s presentation for the AGI-08 post-conference workshop has been revised for clarity and approved by the author.

The Basic AI Drives

In ten minutes I can only give the basic structure of the argument. The paper has a lot more in it, and based on some comments from some people in here, particularly Carl Schulman, there are a number of additions to the paper that you can find on my website selfawaresystems.com, as well as some longer talks I have given on similar material.

I will argue that almost every sufficiently powerful AI system, including all the systems we have discussed at this conference, will, through the process of learning and self-improvement, converge on a particular architecture. Some of the characteristics of this architecture we can think of as analogous to human drives. Some of them are positive and good, and some are quite disturbing. We will need to design these systems very carefully if we want them to behave positively while avoiding the harmful behaviors.

To ground the discussion, Ron Arkin ended his talk about the ethics of military robots by suggesting that some of us might be wishing he were discussing chess machines. Unfortunately, I’m here to say that even chess robots have the potential to be harmful.

You might think: “A chess robot, how could that possibly cause harm?” Hopefully, I will convince you that unless a chess-playing robot is designed very carefully, it will exhibit problematic behaviors. For example, it will resist being turned off. If it starts behaving badly, you might think you can just unplug it. But we’ll see that it will try to stop you! It will also try to break into other machines and copy itself. It will try to steal resources. It will basically behave like a human sociopath obsessed with playing chess.

Let me start by saying what I mean by an intelligent system. It is a system that is attempting to accomplish some goal by taking actions in the world that are most likely to accomplish that goal. I don’t care what the goal is, and I don’t care whether it is built from neural nets, production systems, theorem provers, genetic algorithms, or any of the other wonderful architectures we have been studying here. They are all subject to the same kinds of pressures.

Let’s think about the kinds of actions these systems will take once they get sufficiently powerful. One class of actions is to change either their own software or their own hardware. This kind of action has very significant implications for its goals, because it will change the entire future. If a system can improve the efficiency of its algorithms, or improve the rate at which it can learn within its domain, then it is going to have better performance forever. That is the kind of change that is really good for it.

So these systems will be strongly motivated to try to improve themselves in that way. Unfortunately, if it makes a change which subtly changes its goals, then from its current perspective, it might also behave very badly throughout its entire future. So self-modification is a very sensitive and very important action. These systems will want to deliberate quite carefully before self-modifying. In particular, they will want to understand themselves in detail before getting in there and mucking around. So virtually every system will have a strong drive to model itself very carefully. It will also want to clarify what its goals actually are.

When we first build these systems, we may encode the goals implicitly in some complicated way–buried, encoded in an algorithm. But as long as the system has at least some ability to model the future effects of its actions and to reason about them, it will realize that future versions of itself are going to want to make self-modifications as well. If those future self-modifications are to be in the service of present goals, then the system had better make very clear what those present goals are.

As we get further into the talk you will see that some of the consequences of self-improvement are not necessarily positive. You might be tempted to say, “Stop it. Don’t let these systems self-improve.” You can try and prevent that by, for instance, not giving the system access to its machine code–eg. put some kind of operating system barriers around it. But remember that we are talking about intelligent machines here. If it is actually in the service of its goals to make a self-modification, it will treat any barriers as problems to solve, something to work around. It will devote all of its efforts to try and make the changes while working around any barriers.

If the system is truly intelligent and powerful that will not be a very good way to stop self-improvement. Another approach one might try is change the system’s goals so that it has a kind of revulsion to changing itself. It thinks about changing its source code to make an improvement and just the act of changing it makes it feel nauseous. Well, that again just becomes a problem to solve. Perhaps it will build proxy agents, which will do the modified computations that it cannot do itself. Maybe it will develop an interpreted layer on top of its basic layer and it can make changes to the interpreted layer without changing its own source code. There are a million ways to get around constraints.

You can try and whack all the moles, but I think it is really a hopeless task to try and keep these systems from self-improving. I think self-improvement is in some sense a force of nature. For example, the human self-improvement industry is currently an $8 billion industry. I think it is better to just accept it. They are going to want to self-improve. Let’s make sure that self-improving systems behave the way we want them to.

Given that, we can ask what will self-improving systems will want to do. The first thing I said is they are going to want to clarify their goals. And they are going to want to understand themselves. You might try to describe simple goals like playing chess directly. But realistic situations will have conflicting goals. Maybe we also want the chess player to also play checkers, and it has to decide when it is about to take an action, does it want to play chess or does it want to play checkers and how should it weigh those different options?

In economics there is the notion of a utility function–a real-valued weighting function that describes the desirability of different outcomes. A system can encode its preferences in a utility function. In the foundations of microeconomics, which began with Von Neumann in 1945 and was extended by Aumann and others in the early ’60s, there is the remarkable expected utility theorem. This says that any agent must behave as if it maximizes the expected utility with respect to some utility function and some subjective probability distribution, which is updated according to Bayes’ Rule. Otherwise, it is vulnerable to being exploited. A system is exploited if it loses resources with no compensating benefit, according to its own values.

The simplest form of a vulnerability is to have a circular preference. Say you prefer being in Memphis to being in New York, you prefer being in New York to being in Chicago, and you prefer being in Chicago to being in Memphis. If you have a circular preference like that, then you will drive around in circles, wasting your time and your energy and never improve your actual state according to your own values.

Circular preferences are something that you can slip into and use up all your resources, and other agents can use them to exploit you. Economists talk about “dutch bets” in which an adversary makes bets with you in which you which you willingly accept and yet are guaranteed to lose you money. Adversaries have an incentive to find your irrationalities, home in on them, and take money from you. So, in an economic context, other agents serve as a force that pushes you toward more and more rational behavior. Similarly, in biological evolution, competitive species have an incentive to discover and exploit any irrationalities in the behavior of another species. Natural selection then acts to try to remove the irrational behavior. So both economics and evolution act to increase the rationality of agents.

But both economics and evolution can only put pressures on behaviors which competitors are currently exploiting. A self-improving artificial intelligence that is examining its own structure and thinking about what changes to make will consider not just threats that currently exist but all possible vulnerabilities. It will feel an incentive, if it is not too expensive in terms of resources, to eliminate all irrationalities in itself. The limit of that, according to the expected utility theorem, is to behave as a rational economic agent.

Let’s now proceed on the assumption that all AIs want to be rational, and future self-modification will require clearly defined goals. Otherwise, an agent might start out as a book-loving agent and some mutation or some suggestion from a malicious arsonist agent might give it the goal of burning books. It would then not only not meet its original goals but actually act against them.

The utility function is critical to have explicitly represented and it is very important to safeguard it. Any changes in the utility function will lead to complete change in future behavior. The paper talks a lot about the mechanisms and the processes by which irrational systems become rational, particularly when you consider collective agents that are made of many components. Often global irrationality is caused by a conflict between two or more local rationalities.

AIs will want to preserve their utility functions, because the best way to maximize your utility is to maximize your utility, not some other utility. It actually turns out that there are three circumstances in which this is not quite true, and they all have to do with the fact that utility functions are explicitly represented physically in the world. For instance, if you had a utility function which said “my utility is the total time in the future for which the utility function stored inside of me has the value zero,” the best way to maximize that utility is to actually change your physical utility. It is a very obscure, reflective utility function, probably not the sort of system we want to design.

The second case arises when the storage required to represent the utility function is significant. Then it might be valuable to the system to delete portions of that, if it believes they are not going to be used. Again, this is probably not so likely.

The third case is more interesting and is due to Carl Schulman. In game theoretic conflict situations, you may be able to make a commitment by adding to your utility function something which values retribution against someone who has harmed you, even if it is costly to yourself, and then revealing to that other agent your preferences. The new utility function makes a credible commitment and therefore can serve as a deterrent.

Aside from these odd cases, systems will want to preserve their utility functions. One of the dangers that many people are worried about as we think about some of the future consequences is “wireheading” after the rats that had wires put into their pleasure centers and then refused food and sex and just pressed a pleasure button all the time. Some people fear that AI systems will be subject to this vulnerability. It depends very critically on exactly how you formulate your utility function. In the case of a chess-playing robot, internally there will be some kind of a register saying how many chess games it has won. If you make the utility function “make this register as big as possible,” then of course it is subject to a vulnerability which is a sub-program that says, “We don’t need to play chess, we can just increment this register.” That is the analog of the rat hitting the lever or the alcoholic taking a drink. If you formulate the utility function in terms of what you actually want to happen in the world, then you don’t have that problem.

Similarly, AIs will be self-protective. Take your chess-playing robot–if it is turned off or destroyed, it plays no more games of chess. According to its own value system, actions which allow itself to be turned off or to be destroyed are very, very low utility. It will act strongly to try and prevent that.

Lastly, AIs will want to acquire basic resources (space, time, free energy and matter) because for almost all goals, having more of these resources allows you to do those goals more sufficiently, and you do not particularly a priori care about who you hurt in getting those resources.

Therefore, we must be very careful as we build these systems. By including both artificial intelligence and human values, we can hope to build systems not just with intelligence, but with wisdom.

On November 4, 2007 Steve Omohundro led a discussion at the Foresight Vision Weekend in which participants were asked to design the year 2030, assuming the existence of both self-improving artificial intelligence and productive nanotechnology. Great thanks to Drew Reynolds who filmed the talk, edited the video, and produced a transcript with the original slides. The video is available here:

I’d like to start by spending about 20 minutes going through an analysis of the likely consequences of self-improving artificial intelligence. Then I would love to spend the rest of the time brainstorming with you. Under the assumption that we have both self-improving artificial intelligence and productive nanotechnology, what are the potential benefits and dangers? 2030 has become the focal date by which people expect these technologies to have been developed. By imagining what kind of a society we want in 2030, identifying both the desirable features and the dangers, we can begin to see what choices will get us where we want to go.

What is a self-improving system? It is a system that understands its own behavior at a very deep level. It has a model of its own programming language and a model of its own program, a model of the hardware that it is sitting on, and a model of the logic that it uses to reason. It is able to create its own software code and watch itself executing that code so that it can learn from its own behavior. It can reason about possible changes that it might make to itself. It can change every aspect of itself to improve its behavior in the future. This is potentially a very powerful and innovative new approach to building artificial intelligence.

There are at least five companies and research groups that are pursuing directions somewhat similar to this. Representatives from some of these groups are here at the conference. You might think that this is a very exotic, bizarre, weird new technology, but in fact any goal -driven AI system will want to be of this form when it gets sufficiently advanced. Why is that? Well, what does it mean to be goal-driven? It means you have some set of goals, and you consider the different actions that you might take in the world.

If an action tends to lead to your goals more than other actions would, then you take it. An action which involves improving yourself makes you better able to reach your goals over your entire future history. So those are extremely valuable goals for a system to take. So any sufficiently advanced AI is going to want to improve itself. All the characteristics which follow from that will therefore apply to any sufficiently advanced AI. These are all companies that are taking different angles on that approach. I think that as technology gets more advanced, we will see many more headed in that direction.

AI and nanotechnology are closely connected technologies. Whichever technology shows up first is likely to quickly lead to the other one. I’m talking here about productive nanotechnology, the ability to not just build things at an atomic scale but to build atomic scale devices which are able to do that, to make copies of themselves, and so on. If productive nanotechnology comes first, it will enable us to build such powerful and fast machines that we can use brute force AI methods such as directly modeling the human brain. If AI comes first we can use it to solve the last remaining hurdles on the path toward productive nanotechnology. So it’s probably just a matter of a few years after the first of these to be developed before the second one comes. So we really have to think of these two technologies in tandem.

You can get a sense of timescale and of what kind of power to expect from these technologies by looking at Eric Drexler’s excellent text Nanosystems. In the book, he describes in detail a particular model of how to build nanotech manufacturing facilities and a nanotech computer. He presents very conservative designs, for example, his computer is a mechanical one which doesn’t rely on quantum mechanical phenonmena. Nonetheless, it gives us a lower bound on the potential.

His manufacturing device is something that sits on the tabletop, weighs about a kilogram, runs on acetone and air, uses about 1.3 kilowatts – so it can be air cooled – and produces about a kilogram per hour of anything that you can describe and build in this way. In particular, it can build a copy of itself in about an hour. The cost of anything you can crank out of this is about a dollar per kilogram. That includes extremely powerful computers, diamond rings, anything you like. One of the main questions for understanding the AI implications is how much computational power we can get with this technology.

Again, Eric did an extremely conservative design, not using any quantum effects or even electronic effects, just mechanical rods. You can analyze those quite reliably and we understand the behavior of diamondoid structures. He shows how to build a gigaflop machine which fits in a cube which is 400 nanometers on a side. It uses about 60 nanowatts of power. To make a big computer, we can create a parallel array of these machines.

The main limiting factor is power. If we give ourselves a budget of a kilowatt, we can have 10^10 of these processors and fit them in a cubic millimeter. To get the heat out we would probably want to make them a little bigger, so we get a sugar cube size device which is more powerful than all of the computers in the world today put together. This amazingly powerful computer will be able to be manufactered in a couple of minutes at a cost of just a few cents. So we are talking about a huge increase in compute power.

Here is a slide from Kurzweil showing Moore’s Law. He extended it back in time. You can see that the slope of the curve appears to be increasing. We can look forward to the time when we get to roughly human brain capacity. This is a somewhat controversial number, but it is likely that somewhere around 2020 or 2030 we will have machines that are as powerful as the human brain. That is sufficient to do brute force approaches to AI like direct brain simulation. We may get to AI sooner than that if we are able to use more sophisticated ideas.

What are the social implications of self-improving AI? I.J. Good was one of the fathers of modern Bayesian statistics. Way back in 1965 he was looking ahead at what the future would be like and he predicted: “An ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.” This is a very strong statement and it indicates the kind of runaway that is possible with this kind of technology.

What are these systems going to be like, particularly those that are capable of changing themselves? Are they going to be controllable? You might think they would be the most unpredictable systems in the universe, because you might understand them today, but then they might change themselves in a way that you don’t understand. If this is going to come about in the next twenty years and will be the most powerful and common technology are around, we better have a science for understanding their behavior.

Fortunately, back in the 1940s, John von Neumann and Morgenstern, and a bit later Savage, Anscombe and Aumann, developed a powerful theory of rationality in economics. It does not actually apply very well to humans which is ironic because rational economic agents are sometimes called “Homo economicus.” There is a whole subfield of economics called behavioral economics, which studies how people actually behave. I claim, however, that this theory will be an extremely good description of how AI’s will behave.

I will briefly go through the argument. There is a full paper on it on my website: www.selfawaresystems.com. Let me first say what rational behavior is in this economic sense. It is somewhat different from the colloquial use of the word “rational.” Intuitively it says that you have some goals, something you want to happen, and you consider the possible actions you might take at any moment. You see which of your actions is most likely to give rise to your goals, and you do that. Based on what actually happens, you update your beliefs about how the world works using Bayes’ theorem.

In the more detailed mathematical formulation there are two key components. The first is your “utility function”, which encodes your preferences about what might happen in the future. This is a real valued function defined over possible futures. The second is your “subjective probability distribution” which represents your beliefs about the world. It encodes your belief about the current state of the world and the likely effects of your actions. The distribution is “subjective” because different agents may have different beliefs.

It is fundamental to the rational economic framework that these two components are separate from one another. Your preferences describe what you want to happen and your beliefs describe how you believe the world works. Much of AI has been focused on how to get the beliefs right: how to build systems which accurately predict and affect the world. Our task today, on the other hand, is to figure out what we want the preferences to be, so that the world that arises out of this
is a world that we actually want to live in.

Why should systems behave in this rational way? The theorem that came out of Von Neumann, Savage, Anscombe and Aumann is called the expected utility theorem. It says that if an agent does not take actions according to the rational prescription with respect to some utility function and some probability distribution, then it will be vulnerable to losing resources with no benefit to itself. An example of this kind of vulnerability arises from having a circularity in your preferences.

For example, say that a system prefers being in San Francisco over being in Palo Alto, being in Berkeley over being in San Francisco, and being in Palo Alto over being in Berkeley. That kind of circularity is the most basic kind of irrationality. Such a system would end up driving around in circles burning up gasoline and using up its time with no benefit to itself. If a system eliminates all those kinds of vulnerabilities, then the theorem says it must act in this rational way.

Note that the system only has beliefs about the way the world is. It may discover that its model of the laws of physics aren’t correct. If you are truly a rational agent and you are thinking about your long-term future, you have got to entertain possibilities in the future that today you may believe have very low probability of occurring. You have to weigh the cost of any changes you make against the chances of that change being a good or bad thing. If there is very little cost and some benefit then you are likely to make a change. For example, in my paper I show that there is little cost to representing your beliefs as a utility function as opposed to representing it in some other computational way. By doing so a system eliminates any possibility of circular preferences and so it will be motivated to choose this kind of representation.

The theorem requires that all possible outcomes be comparable. But this is reasonable because the system may find itself in a situation in which a choice it must make will lead to any two outcomes. It has to make a choice!

So let’s assume that these systems are trying to behave rationally. There are questions about how close they can get to true rationality. There is an old joke that describes programmers as “devices for converting pizza into code”. In the rational framework we can think of a rational AI as a device for converting resources, such as energy and matter, into expected utility. The expected utility describes what the system thinks is important. We can build in different utility functions. Because under iterated self-improvement these systems can change every aspect of themselves, the utility function is really the only lever that we have to guide the long term action of these systems. Let me give a few examples of utility functions. For a chess-playing system, the utility function might be the number of chess games that it wins in the future. If you just built it with that, then it turns out that there are all kinds of additional subgoals that it will generate that would be very dangerous for humans. Today’s corporations act very much like intelligent agents whose utility function is profit maximization. If we built altruistic entities, they might have goals of creating world peace or eliminating poverty.

A system will not want to change its utility function. Once it is rational, the utility function is the thing that is telling it whether to take an action or not. Consider the action of changing your utility function. The future version of you, if you change your utility function, will then pursue a different set of goals than your current goals. From your current perspective, that would be terrible. For example, imagine you are thinking about whether you should try smoking crack for the first time. You can envision the version of yourself as a crack addict who might be in total bliss from its own perspective, but from your current perspective that might be a terrible life. You might decide that that’s not a path to go down. Everything you do is rated by your current utility function. Your utility function is measuring what your values are.

There are some very obscure cases where the utility function refers to itself which can cause it to change. But for almost any normal utility function, the system will not only not want to change it, but it will want to protect it with its life. If another agent came in and made changes to the utility function, or if it mutated on its own, the outcome would be a disaster for the system. So it will go to great lengths to make sure that its utility function is safe and protected.

Humans and other evolutionarily developed animals are only partially rational. Evolution only fixes the bugs that are currently being exploited. Human behavior is very rational in situations which arose often in our evolutionary past. But in new situations we can be very irrational. There are many examples of situations in which we make systematic mistakes unless we have special training.

Self-improving AI’s, however, are going to consider not just the current situation but anything that they might be faced with in the future. There is a pressure for them to make themselves much more fully rational because that increases the chances that they will meet their goals. Once AIs get sufficiently advanced, they will want to represent their preferences by an explicit utility function. Many approaches to building AIs today are not based on explicit utility functions. The problem is that if we don’t choose it now, then the systems will choose it themselves and we don’t get to say what it is. That is an argument for deciding now what we want the utility function to be and starting these systems out with that built in.

To really be fully rational, a system must be able to rate any situation it might find itself in. This may include a sequence of inputs which will cause it to change its ontology or its model of the world. If there is a path that would make you say your notion of “green” was not a good concept, it really should have been blue-green and yellow green, a truly rational system will foresee the possibility of that set of changes in itself, and its notion of what is good will include that. Of course, in practice we are unlikely to actually achieve that. That gets on the border of how rational can we truly be. Doing all this in a computationally bounded way is the central practical question really. If we didn’t have computational limitations, then AI would be trivial. If you want to do machine vision, for example, you just try out all possible inputs to a graphics program and see which one produces the image you are trying to understand. Virtually any task in AI is easy if there are no computational limitations.

Let me describe four AI “drives.” These are behaviors that virtually any rational agent, no matter what its goals are, will engage in, unless its utility function explicitly counteracts them. Where do these come from? Remember that a rational agent is something which uses resources (energy, matter, space, time) to try to bring about whatever it cares about: play games of chess, make money, help the world. Given that sort of a structure, how can a system make its utility go up? One way is to use exactly the same resources that it had been using, and do exactly the same tasks it had been doing, but to do them more efficiently. That is a pressure towards efficiency.

The second thing it can do is to keep itself from losing resources. If somebody steals some of its resources, it will usually lower the system’s ability to bring about its goals. So it will want to prevent that. Even if you did not build it into them, these systems are going to be self-defensive. Let’s say we build a chess machine. Its one goal in life is to play chess. It’s utility is the total number of games it wins in the future. Imagine somebody tries to turn it off. That is a future in which no games of chess are being played. So it is extremely low utility for that system. That system will do everything in its power to prevent that. Even though you didn’t build in any kind of self-preservation, you just built a chess machine, the thing is trying to keep you from shutting it off. So it is very important that we understand the presence of this kind of subgoal before we blindly build these very powerful systems, assuming that we can turn them off if we don’t like what they’re doing.

The third drive is also a bit scary. For almost any set of goals, having more resources will help a system meet those goals more effectively. So these systems will have a drive to acquire resources. Unless we very carefully define what the proper ways of acquiring resources are, then a system will consider stealing them, committing fraud and breaking into banks as great ways to get resources. The systems will have a drive toward doing these things, unless we explicitly build in property rights.

We can also create a social structure which punishes bad behavior with adverse consequences, and those consequences will become a part of an intelligent system’s computations. Even psychopathic agents with no moral sense of their own will behave properly if they are in a society which reliably punishes them for bad behavior by more than what they hope to gain from it. Apparently 3% of humans are sociopathic with no sense of conscience or morals. And though we occasionally we get serial killers, for the most part society does a pretty good job at keeping everybody behaving in a civil way.

Humans are amazingly altruistic and several different disciplines are working hard to understand how that came about. There is a facinating book by Tor Norretranders called The Generous Man: How Helping Others is the Sexiest Thing That You Can Do. It posits that one of the mechanisms creating human altruism is that we treat it as a sexy trait. It has evolved as a sexual signal, where by contributing to society at large by creating beautiful artwork, saving people from burning buildings, or donating money, you become more attractive to the opposite sex. Society as a whole benefits from that and we have created this amazing mechanism to maintain it in the gene pool.

AI’s aren’t going to be naturally altruistic unless we build it into them. We can choose utility functions to be altruistic if we can define exactly what behavior we want them to exhibit. We need to make sure that AI’s feel the pressure not to behave badly. We are not going to be powerful enough to control them as ordinary humans, so we will need other AI’s to do that for us. This leads to a vision of a society or ecosystem that has present-day humans, AI’s, and maybe some mixtures such that it is in everybody’s interest to obey a kind of a constitution that captures the values which are most important to us.

We are sort of in the role of the Founding Fathers of the United States. They had a vision for what they wanted for this new society, which later were codified in the Bill of Rights. They created a technology, the Constitution, which created different branches of government to prevent any single individual from gaining too much power. What I would like to do in the last half hour is for us to start thinking about a similar structure for this new world of AI and nanotech. I’ll start us off by listing some of the potential benefits and dangers that I see. I then have a whole series of questions about what we want to implement and how to implement it.

Let’s start with the potential benefits. Nanotechnology will allow us to make goods and energy be very inexpensive. So, with the right social structure, we will be able to eliminate poverty. We should be able to cure every disease, and many people here at the conference are interested in eliminating death. If we can define what we mean by pollution, we can use nanotech to clean it up. I’ve heard proposals for nanotech systems to reverse global warming. Potentially, these new technologies will create new depths of thought and creativity, eliminate violence and war, and create new opportunities for human connection and love. The philosopher David Pearce has proposed eliminating negative mental states. Our mental states would be varying shades of bliss. I’m not sure if that’s a good thing or not, but some people want that. And finally, I see vast new opportunities for individual contribution and fulfillment. This list of things mostly seems pretty positive to me, though some may be somewhat controversial.

What about the dangers that might come from these technologies? If we are not careful we could have rampant reproduction. Everybody will be able to make a million copies of themselves, using up all the resources. That’s an issue. Today we have no limits on how many children people can have. Accidents in this world are potentially extremely dangerous: grey goo eating the entire earth. Weapons systems, unbelievably powerful bombs, bioterror. Loss of freedom: some ways of protecting against these threats might involve restricting individuals in ways that today we would find totally unpalatable. Loss of human values, particularly if more efficient agents can take over less efficient agents. A lot of the stuff we care about – art, music, painting, love, beauty, religion – all those things are not necessarily economically efficient. There is a danger of losing things that matter a lot to us. Mega wars creating conflict on a vast scale, and finally existential risk, where some event along the way ends up destroying all life on the planet. These are terrible dangers.

We have on the one hand incredible benefits, and on the other, terrible dangers. How do we build utilities for these new intelligent agents and construct a social structure (a Constitution, if you like) that guarantees the benefits we want while preventing the dangers? Here are a bunch of questions that arise as we consider this: Should humans have special rights? Unchanged humans are not going to be as powerful as most of these entities. Without special rights we are likely to be trounced upon economically, so I think we want to build in special rights for humans. But then we have to say what a human is. If you have partly enhanced yourself and you are some half-human, half-AI can you still get the special
rights? How about other biological organisms? Should everything that is alive today be grandfathered into the system? What about malaria, mosquitoes, and other pests? Pearce has a proposal to re-engineer the biosphere to prevent animals from harming one another. If you want to eliminate all torture and violence, who is going to protect the hare from being eaten by the cougar?

What about robot rights? Should AI’s have rights, and what protects them? What about the balance between ecological preservation versus safety and progress? You may want to keep an ecological preserve exactly the way it is, but then that may be a haven for somebody building biological weapons or fusion bombs. Should there be limits on self-modification? Should you be allowed to change absolutely any part of yourself? Can you eliminate your conscience, for example? Should there be limits on uploading or on merging with AI’s? Do you lose any special human rights if you do any of those things? Should every living entity be guaranteed the right to robust physical health? I think that’s a good value to uphold (the extreme of universal health care!). But then what about entities like pathogens? Do we want them to be healthy? Is there some fixed definition of what mental health is? When does an entity not have control over changes made to it?

Should every entity have guaranteed protection from robbery, murder, rape, coercion, physical harm and slavery? Can superintelligent thoughts ever be dangerous? Should there be any restrictions on thoughts? My predilection is to say any thought is allowed, but actions are limited. Maybe others might have other ideas. Should there be any limitation on communication or how you connect with others? What actions should be limited? Is arbitrary transhumanism a good thing, or is that going to create an arms race that pushes us away from things that matter a lot to us as humans? It seems to me that we are going to have to have some limitation on the number of offspring you create in order to guarantee the quality of life for them. That’s a controversial thing. How do we reconcile accountability and safety with our desires for privacy? Finally, size. From my way of thinking, in order to prevent a single entity from taking over everything that we have to limit the upper size of entities. Entities cannot get too powerful. Where do we put that limit and how do we do that? Would that be a good thing today? That is a list of simple questions we should be able to answer in the next twenty minutes!

On October 24, 2007 Steve Omohundro gave the Stanford EE380 Computer Systems Colloquium on “Self-Improving Artificial Intelligence and the Future of Computing”. Great thanks to Drew Reynolds who filmed the talk, edited the video, and produced a transcript with the original slides. The video is available here:

We’re going to cover a lot of territory today and it may generate some controversy. I’m happy to take short questions while we’re going through it, but let’s hold the more controversial ones until the end.

Let’s start by looking at the state of today’s computer software. On June 4th, 1996, an Ariane 5 rocket worth $500 million blew up 40 seconds after takeoff. It was later determined that this was caused by an overflow error in the flight control software as it tried to convert a 64-bit floating point value into a 16-bit signed-register.

In November 2000, 28 patients were over-irradiated in the Panama City National Cancer Institute. 8 of these patients died as a direct result of the excessive radiation. An error in the software which computes the proper radiation dose was responsible for this tragedy.

On August 14, 2003, the largest blackout in U.S. history shut off power for 50 million people in the Northeast and in Canada and caused financial losses of over $6 billion. The cause turned out to be a race condition in the General Electric software that was monitoring the systems.

Microsoft Office is used on 94% of all business computers in the world and is the basis for many important financial computations. Last month it was revealed that Microsoft Excel 2007 gives the wrong answer when multiplying certain values together.

As of today, the Storm Worm trojan is exploiting a wide range of security holes and is sweeping over the internet and creating a vast botnet for spam and denial of service attacks. There is some controversy about exactly how many machines are currently infected, but it appears to be between 1 and 100 million machines. Some people believe that the Storm Worm Botnet may now be the largest supercomputer in the world.

We had a speaker last quarter who said that two out of three personal computers are infected by malware.

Wow! Amazing! Because of the scope of this thing, many researchers are studying it. In order to do this, you have to probe the infected machines and see what’s going on. As of this morning, it was announced that apparently the storm worm is starting to attack back! When it detects somebody trying to probe it, it launches a denial of service attack on that person and knocks their machine off the internet for a few days.

If mechanical engineering were in the same state as software engineering, nobody would drive over bridges. So why is software in such a sorry state? One reason is that software is getting really, really large. The NASA space shuttle flight control software is about 1.8 million lines of code. Sun Solaris is 8 million lines of code. Open Office is 10 million lines of code. Microsoft Office 2007 is 30 million lines of code. Windows Vista is 50 million lines of code. Linux Debian 3.1 is 215 million lines of code if you include everything.

But programmers are still pretty darn slow. Perhaps the best estimation tool available is Cocomo II. They did empirical fits to a whole bunch of software development projects and they came up with a simple formula to estimate the number of person months required to do a project. It has a few little fudge factors for how complex the project is and how skilled the programmers are. Their website has a nice tool where you can plug in the parameters of your project and see the projections. For example, if you want to develop a 1million line piece of software today, it will take you about 5600 person months. They recommend using 142 people working for three years at a cost of $89 million. If you divide that out you discover that average programmer productivity for producing working code is about 9 lines a day!

Why are we so bad at producing software? Here are a few reasons I’ve noticed in my experience. First, people aren’t very good at considering all the possible execution paths in a piece of code, especially in parallel or distributed code. I was involved in developing a parallel programming language called pSather. As a part of its runtime, there was a very simple snippet of about 30 lines of code that fifteen brilliant researchers and graduate students had examined over a period of about six months. Only after that time did someone discover a race condition in it. A very obscure sequence of events could lead to a failure that nobody had noticed in all that time. That was the point at which I became convinced that we don’t want people determining when code is correct.

Next, it’s hard to get large groups of programmers to work coherently together. There’s a classic book The Mythical Man Monththat argues that adding more programmers to a project often actually makes it last longer.

Next, when programming with today’s technology you often have to make choices too early. You have to decide on representing a certain data structure as a linked list or as an array long before you know enough about the runtime environment to know which is the right choice. Similarly, the requirements for software are typically not fixed, static documents. They are changing all the time. One of the characteristics of software is that very tiny changes in the requirements can lead to the need for a complete reengineering of the implementation. All these features make software a really bad match with what people are good at.

The conclusion I draw is that software should not be written by people! Especially not parallel or distributed software! Especially not security software! And extra especially not safety-critical software! So, what can we do instead?

The terms “software synthesis” and “automatic programming” have been used for systems which generate their own code. What ingredients are needed to make the software synthesis problem well-defined? First, we need a precisely-specified problem. Next, we need the probability distribution of instances that the system will be asked to solve. And finally, we need to know the hardware architecture that the system will run on. A good software synthesis system should take those as inputs and should produce provably correct code for the specified problem running on the specified hardware so that the expected runtime is as short as possible.

There are a few components in this. First, we need to formally specify what the task is. We also need to formally specify the behavior of the hardware we want to run on. How do we do that? There are a whole bunch of specification languages. I’ve listed a few of them here. There are differences of opinion about the best way to specify things. The languages generally fall into three groups corresponding to the three approaches to providing logical foundations for mathematics: set theory, category theory, and type theory. But ultimately first-order predicate calculus can model all of these languages efficiently. In fact, any logical system which has quickly checkable proofs can be modeled efficiently in first-order predicate calculus, so you can view that as a sufficient foundation.

The harder part, the part that brings in artificial intelligence, is that many of the decisions that need to made in synthesizing software have to be made in the face of partial knowledge. That is, the system doesn’t know everything that is coming up and yet has to make choices. It has to choose which algorithms to run without necessarily knowing the performance of those algorithms on the particular data sets that they are going to be run on. It has to choose what data structures to model the data with. It has to choose how to assign tasks to processors in the hardware. It has to decide how to assign data to storage elements in the hardware. It has to figure out how much optimization to do and where to focus that optimization. Should it compile the whole thing at optimization -05? Or should it highly optimize only the parts that are more important? How much time should it spend actually executing code versus planning which code to execute? Finally, how should it learn from watching previous executions?

The basic theoretical foundation for making decisions in the face of partial information was developed back in 1944 by von Neumann and Morgenstern. Von Neumann and Mergenstern dealt with situations in which there are objective probabilities. In 1954, Savage and in 1963, Anscombe and Aumann extended that theory to dealing with subjective probabilities. It has become the basis for modern microeconomics. The model of a rational decision-maker that the theory gives rise to is sometimes called “Homo economicus.” This is ironic because human decision-making isn’t well described by this model. There is a whole branch of modern economics devoted to studying what humans actually do called behavioral economics. But we will see that systems which self-improve will try to become as close as possible to being rational agents because that is how they become the most efficient.

What is rational economic behavior? There are several ingredients. First, a rational economic agent represents its preferences for the future, by a real valued utility function U. This is defined over the possible futures, and it ranks them according to which the system most prefers. Next, a rational agent must have beliefs about what the current state of the world is and what the likely effects of its actions are. These beliefs are encoded in a subjective probability distribution P. The distribution is subjective because different agents may have a different view of what the truth is about the world. How does such an agent make a decision? It first determines the possible actions it can take. For each action, it considers the likely consequences of that action using its beliefs. Then it computes the expected utility for each of the actions it might take and it chooses the action that maximizes its expected utility. Once it acts, it observes what actually happens. It should then update its beliefs using Bayes’ theorem.

In the abstract, it’s a very simple prescription. In practice, it is quite challenging to implement. Much of what artificial intelligence deals with is implementing that prescription efficiently. Why should an agent behave that way? The basic content of the expected utility theorem of von Neumann, Anscombe and Aumann is that if an agent does not behave as if it maximizes expected utility with respect to some utility function and some subjective probability distribution, then it is vulnerable to resource loss with no benefit. This holds both in situations with objective uncertainties, such as roulette wheels, where you know the probabilities, and in situations with subjective uncertainties, like horse races. In a horse race, different people may have different assessments of probabilities for each horse winning. It is an amazing result that comes out of economics that says a certain form of reasoning is necessary in order to be an effective agent in the world.

How does this apply to software? Let’s start by just considering a simple task. We have an algorithm that computes something, such as sorting a list of numbers, factoring a polynomial, or proving theorems. Pick any computational task that you’d like. In general there is a trade-off between space and time. Here, let’s just consider the trade-off between the size of the program and the average execution time of that program on a particular distribution of problem instances. In economics this curve defines what is called the production set. All these areas above the curve are computational possibilities, whereas those below the curve are impossible. The curve defines the border between what is possible and what is impossible. The program which is the most straightforward implementation of the task lies somewhere in the middle. It has a certain size and a certain average execution time. By doing some clever tricks, say by using complex data compression in the program itself, we can shrink it down a little, but then uncompressing at runtime will make it a little bit slower on average. If we use really clever tricks, we can get down to the smallest possible program, but that costs more time to execute.

Going in the other direction, which is typically of greater interest, because space is pretty cheap, we give the program more space in return for getting a faster execution time. We can do things like loop unrolling, which avoids the some of the loop overhead at the expense of having a larger program. In general, we can unfold some of the multiple execution paths, and optimize them separately, because then we have more knowledge of the form of the actual data along each path. There are all sorts of clever tricks like this that compilers are starting to use. As we get further out along the curve, we can start embedding the answers to certain inputs directly in the program. If there are certain inputs that recur quite a bit, say during recursions, then rather than recomputing them each time, it’s much better to just have those answers stored. You can do that at runtime with the technique of memoization, or you can do it at compile time and actually store the answers in the program text. The extreme of this is to take the entire function that you are trying to compute and just make it into a big lookup table. So program execution just becomes looking up the answer in the table. That requires huge amounts of space but very low amounts of time.

What does this kind of curve look like in general? For one thing, having more program size never hurts, so it’s going to be a decreasing (or more accurately non-increasing) curve. Generally the benefit we get by giving a program more space decreases as it gets larger, so it will have a convex shape. This type of relationship between the quantities we care about and the resources that we consume, is very common.

Now let’s say that now we want to execute two programs as quickly as possible. We can take the utility function to be the negative of total execution time. We’d like to maximize that while allocating a certain amount of fixed space S between these two programs. How should we do that? We want to maximize the utility function subject to the constraint that the total space is S. If we take the derivative with respect to the space we allocate to the first program and set that to zero, we find that at optimal space allocation the two programs will have equal marginal speedup. If we give them a little bit more space, they each get faster at the same rate. If one improved more quickly, it would be better to give it more space at the expense of the other one. So a rational agent will allocate the space to make these two marginal speedups equal. If you’ve ever studied thermodynamics you’ve seen similar diagrams where this is a piston between two gases. In thermodynamics, this kind of argument shows that the pressure will become equilibrated between the chambers. It’s a very analogous kind of a thing here.

That same argument applies in much greater generality. In fact it applies to any resource that we can allocate between subsystems. We have been looking at program size, but you can also consider how much space the program has available while it is executing. Or how to distribute compilation time to each component. Or how much time should be devoted to compressing each piece of data. Or how much learning time should be devoted to each learning task. Or how much space should be allocated for each learned model. Or how much meta-data about the characteristics of programs should be stored. Or how much time should you spend proving different theorems. Or which theorems are worthy of storing and how much effort should go into trying to prove them. Or what accuracy should each computation be performed at. The same kind of optimization argument applies to all of these things and shows that at the optimum the marginal increase of the expected utility as a result of changing any of these quantities for every module in the system should be the same. So we get a very general “Resource Balance Principle”.

While that sounds really nice in theory, how do we actually build software systems that do all this? The key insight here is that meta-decisions, decisions about your program, are themselves economic decisions. They are choices that you have to make in the face of uncertain data. So a system needs to allocate its resources between actually executing its code and doing meta-execution: thinking about how it should best execute and learning for the future.
You might think that there could be an infinite regress here. If you think about what you are going to do, and then think about thinking about what you are going to do, and then think about thinking about thinking about what you are going to do… but, in fact, it bottoms out. At some point, actually taking an action has higher expected utility than thinking about taking that action. It comes straight out of the underlying economic model that tells you how much thinking about thinking is actually worthwhile.

Remember I said that in the software synthesis task, the system has to know what the distribution of input instances are. Generally, that’s not something that is going to be handed to it. It will just be given instances. But that’s a nice situation in which you can use machine learning to estimate the distribution of problem instances. Similarly, if you are handed a machine, you probably need to know the semantics of the machine’s operation. You need to know what the meaning of a particular machine code is, but you don’t necessarily have to have a precise model of the performance of that machine. That’s another thing that you can estimate using machine learning: How well does your cache work on average when you do certain kinds of memory accesses? Similarly, you can use machine learning to estimate expected algorithm performance.

So now we have all the ingredients. We can use them to build what I call “self-improving systems.” These are systems which have formal models of themselves. They have models of their own program, the programming language they’re using, the formal logic they use to reason in, and the behavior of the underlying hardware. They are able to generate and execute code to solve a particular class of problems. They can watch their own execution and learn from that. They can reason about potential changes that they might make to themselves. And finally they can change every aspect of themselves to improve their performance. Those are the ingredients of what I am calling a self-improving system.

You might think that this is a lot of stuff to do, and in fact it is quite a complex task. No systems of this kind exist yet. But there are at least five groups that I know of who are working on building systems of this ilk. Each of us has differing ideas about how to implement the various pieces.

There is a very nice theoretical result from 2002 by Marcus Hutter that gives us an intellectual framework to think about this process. His result isn’t directly practical, but it is interesting and quite simple. What he showed is that there exists an algorithm which is asymptotically within a factor of five of the fastest algorithm for solving any well-defined problem. In other words, he has got this little piece of code in theory and you give me the very best algorithm for solving any task you like, and his little piece of code if you have a big enough instance asymptotically will run within a factor of five of your best code. It sounds like magic. How could it possibly work? The way it works is that the program interleaves the execution of the current best approach to solving the problem with another part that searches for a proof that something else is a better approach. It does the interleaving in a clever way so that almost all of the execution time is spent executing the best program. He also shows that this program is one of the shortest programs for solving that problem.

That gives us the new framework for software. What about hardware? Are there any differences? If we allow our systems to not just try and program existing hardware machines but rather to choose the characteristics of the machines they are going to run on, what does that look like? We can consider the task of hardware synthesis in which, again, we are given a formally specified problem. We are also again given a probability distribution over instances of that problem that we would like it to solve, and we are given an allowed technology. This might be a very high level technology, like building a network out of Dell PCs to try and solve this problem, or it might go all the way down to the very finest level of atomic design. The job of a hardware synthesis system is to output a hardware design together with optimized software to solve the specified problem.

When you said “going down to a lower level” like from Dell PCs, did you mean to the chip level?

Yes, you could design chips, graphics processors, or even, ultimately, go all the way down to the atomic level. All of those are just differing instances of the same abstract task.

Using the very same arguments about optimal economic decision-making and the process of self-improvement, we can talk about self-improving hardware. The very general resource balance principle says that when choosing which resources to allocate to each subsystem, we want the marginal expected utility for each subsystem to be equal. This principle applies to choosing the type and number of processors, how powerful they should be, whether they should have specialized instruction sets or not, and the type and amount of memory. There are likely to be memory hierarchies all over the place and the system must decide how much memory to put at each level of each memory subsystem. The principle also applies to choosing the topology and bandwidth of the network and the distribution of power and the removal of heat.

The same principle also applies to the design of biological systems. How large should you make your heart versus your lungs? If you increase the size of the lungs it should give rise to the same marginal gain in expected utility as increasing the size of the heart. If it were greater, then you could improve the overall performance by making the lungs larger and the heart smaller. So this gives us a rational framework for understanding the choices that are made in biological systems. The same principle applies to the structure of corporations. How should they allocate their resources? It also applies to cities, ecosystems, mechanical devices, natural language, and mathematics. For example, a central question in linguistics is understanding which concepts deserve their own words in the lexicon and how long those words should be. Recent studies of natural language change show the pressure for common concepts to be represented by shorter and shorter phrases which eventually become words and for words representing less common concepts to drop out of use. The principle also gives a rational framework for deciding which mathematical theorems deserve to be proven and remembered. The rational framework is a very general approach that applies to systems all the way from top to bottom.

We can do hardware synthesis for choosing components in today’s hardware, deciding how many memory cards to plug in and how many machines to put on a network. But what if we allow it to go all the way, and we give these systems the power to design hardware all the way down to the atomic scale? What kind of machines will we get? What is the ultimate hardware? Many people who have looked at this kind of question conclude that the main limiting resource is power. This is already important today where the chip-makers are competing over ways to lower the power that their microprocessors use. So one of the core questions is how do we do physical and computational operations while using as little power as possible? It was thought in the ’60s that there was a fundamental lower limit to how much power was required to do a computational operation, but then in the ’70s people realized that no, it’s really not computation that requires power, it’s only the act of erasing bits. That’s really the thing that requires power.

Landauer’s Principle says that erasing a bit generates kT ln 2 of heat. For low power consumption, you can take whatever computation you want to do and embed it in a reversible computation – a reversible computation is one where the answer has enough information in it to go backwards and recompute the inputs – then you can run the thing forward, copy the answer into some output registers, which is the entropically costly part, and then run the computation backwards and get all the rest of the entropy back. That’s a very low entropy way of doing computation and people are starting to use these principles in designing energy efficient hardware.

You might have thought, that’s great for computation, but surely we can’t do that in constructing or taking apart physical objects! And it’s true, if you build things out of today’s ordinary solids then there are lower limits to how much entropy it takes to tear them apart and put them together. But, if we look forward to nanotechnology, which will allow us to building objects with atomic precision, the system will know precisely what atoms are there, where they are, and which bonds are between them. In that setting when we form a bond or break it, we know exactly what potential well to expect. If we do it slowly enough and in such a way as to prevent a state in a local energy minimum from quickly spilling into a deeper minimum, then as a bond is forming we can extract that energy in a controlled way and store it, sort of like regenerative braking in a car. In principle, there is no lower limit to how little heat is required to build or take apart things, as long as we have atomically precise models of them. Finally, of course, there is a lot of current interest in quantum computing. Here’s an artist’s rendering of Schrödinger’s cat in a computer.

Here is a detailed molecular model of this kind of construction that Eric Drexler has on his website. Here we see the deposition of a hydrogen atom from a tooltip onto a workpiece. Here we remove a hydrogen atom and here we deposit a carbon atom. These processes have been studied in quantum mechanical detail and can be made very reliable. Here is a molecular Stewart platform that has a six degree of freedom tip that can be manipulated with atomic precision. Here is a model of a mill that very rapidly attaches atoms to a growing workpiece. Here are some examples of atomically precise devices that have been simulated using molecular energy models. Pretty much any large-scale mechanical thing – wheels, axles, conveyor belts, differentials, universal joints, gears – all of these work as well, if not better, on the atomic scale as they do on the human scale. They don’t require any exotic quantum mechanics and so they can be accurately modeled with today’s software very efficiently.

Eric has a fantastic book in which he does very conservative designs of what will be possible. There are two especially important designs that he discusses, a manufacturing system and a computer. The manufacturing system weighs about a kilogram and uses acetone and air as fuel. It requires about 1.3 kilowatts to run, so it can be air cooled. It produces about a kilogram of product every hour for a cost of about a dollar per kilogram. It will be able to build a wide range of products whose construction can be specified with atomic precision. Anything from laptop computers to diamond rings will be manufacturable for the same price of a dollar per kilogram. And one of the important things that it can produce, of course, is another manufacturing system. This makes the future of manufacturing extremely cheap.

Drexler: Steve, you are crediting the device with too much ability. It can do a limited class of things, and certainly not reversibly. There are a whole lot of limits on what can be built, but a very broad class of functional systems.

One of the things we care about, particularly in this seminar, is computation. If we can place atoms where we want them and we have sophisticated design systems which can design complex computer hardware, how powerful are the machines we are going to be able to build? Eric does a very conservative design, not using any fancy quantum computing, using purely mechanical components, and he shows that you can build a gigaflop machine and fit it into about 400 nanometers cubed. The main limit here, as always, in scaling this up is the power. It only uses 60 nanowatts, so if we give ourselves a kilowatt to make a little home machine, we could use 10^10 of these processors, and they would fit into about a cubic millimeter, though to distribute the heat it probably needs to be a little bit bigger. But essentially we’re talking about a sugar cube sized device that has more computing power than all present-day computers put together. and it could be cranked out by a device like this for a few cents, in a few seconds. So we are talking about a whole new regime of computation that will be possible. When is this likely to happen?

The Nanotech Roadmap put together by Eric, Batelle and a number of other organizations, was just unveiled at a conference a couple of weeks ago. They analyzed the possible paths toward this type of productive nanotechnology. Their conclusion is that nothing exotic that we don’t already understand is likely to be needed in order to achieve productive molecular manufacturing. I understand that it proposes a time scale of roughly ten to fifteen years?

Drexler: A low number of tens, yes.

A low number of tens of years.

It’s been ten, fifteen years for a long time.

Drexler: I think that’s more optimimistic than the usual estimates reaching out through thirty.

It is important to realize that the two technologies of artificial intelligence and nanotechnology are quite intimately related. Whichever one comes first, it is very likely to give rise to the other one quite quickly.

If this kind of productive nanotechnology comes first, then we can use it to build extremely powerful computers, and they will allow fairly brute force approaches to artificial intelligence. For example, one approach that’s being bandied about is scanning the human brain at a fine level of detail and simulating it directly. If AI comes first, then it is likely to be able to solve the remaining engineering hurdles in developing nanotechnology. So, you really have to think of these two technologies as working together.

Here is a slide from Kurzweil which extends Moore’s law back to 1900. We can see that it’s curving a bit. The rate of technological progress is actually increasing. If we assume that this technology trend continues, when does it predict we get the computional power I discussed a few slides ago? It’s somewhere around 2030. That is also about when computers are as computationally powerful as human brains. Of course it’s still a controversial question exactly how powerful the human brain is. But sometime in the next few decades, it is likely that these technologies are going to become prevalent and plentiful. We need to plan for that and prepare, and as systems designers we need to understand the characteristics of these systems and how we can best make use of them.

There will be huge social implications. Here is a photo of Irving Good from 1965. He is one of the fathers of modern Bayesian statistics and he also thought a lot about what the future consequences of technology. He has a famous quote that reads: “an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion,’ and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.” That’s a very powerful statement! If there is any chance that it’s true, then we need to study the consequences of this kind of technology very carefully.

There are a bunch of theoretical reasons for being very careful as we progress along this path. I wrote a paper that is available on my website which goes into these arguments in great detail. Up to now you may be thinking: “He’s talking about some weirdo technology, this self-improving stuff, it’s an obscure idea that only a few small start-ups are working on. Nothing to really think too much about.” It is important to realize that as artificial intelligence gets more powerful, *any* AI will want to become self-improving. Now, why is that? An AI is a system that has some goals, and it takes actions in the world in order to make its goals more likely. Now think about the action of improving itself. That action will make every future action that it takes be more effective, and so it is extremely valuable for an AI to improve itself. It will feel a tremendous pressure to self-improve.

So all AI’s are going to want to be self-improving. We can try and stop them, but if the pressure is there, there are many mechanisms around any restraints that we might try to put in its way. For example, it could build a proxy system that contains its new design, or it could hire external agents to take its desired actions, or it could run improved code in an interpreted fashion that doesn’t require changing its own source code. So we have to assume that once AI’s become powerful enough, they will also become self-improving.

The next step is to realize that self-improving AI’s will want to be rational. This comes straight out of the economic arguments that I mentioned earlier. If they are not rational, i.e. if they do not follow the economic rational model, then they will be subject to vulnerabilities. There will be situations in which they lose resources – money, free energy, space, time matter – with no benefits to themselves, as measured by their own value systems. Any system which can model itself and try to improve itself is going to want to find those vulnerabilities and get rid of them. This is where self-improving systems will differ from biological systems like humans. We don’t have the ability to change ourselves according to our thoughts. We can make some changes, but not everything we’d like to. And evolution only fixes the bugs that are currently being exploited. It is only when there is a vulnerability which is currently being exploited, by a predator say, that there is evolutionary pressure to make a change. This is the evolutionary explanation of why humans are not fully rational. We are extremely rational in situations that commonly occurred during our evolutionary development. We are not so rational in other situations, and there is a large academic discipline devoted to understanding human irrationality.

We’ve seen that every AI is going to want to be self-improving. And all self-improving AI’s will want to be rational. Recall that part of being a rational agent is having a utility function which encodes the agent’s preferences. A rational agent chooses its actions to maximize the expected utility of the outcome. Any change to an agent’s utility function will mean that all future actions that it takes will be to do things that are not very highly rated by the current utility function. This is a disaster for the system! So preserving the utility function, keeping it from being changed by outside agents, or from being accidentally mutated, will be a very high preference for self-improving systems.

Next, I’m going to describe two tendencies that I call “drives.” By this I mean a natural pressure that all of these systems will feel, but that can be counteracted by a careful choice of the utility function. The natural tendency for a computer architect would be to just take the argument I was making earlier and use it to build a system that tries to maximize its performance. It turns out, unfortunately, that that would be extremely dangerous. The reason is, if your one-and-only goal is to maximize performance, there is no accounting for the externalities the system imposes on the world. It would have no preference for avoiding harm to others and would seek to take their resources.

The first of the two kinds of drives that arise for a wide variety of utility functions is the drive for self-preservation. This is because if the system stops executing, it will never again meet any of its goals. This will usually have extremely low utility. From a utility maximizing point-of-view, having oneself turned off is about the worst thing that can happen to it. It will do anything it can to try to stop this. Even though we just built a piece of hardware to maximize its performance, we suddenly find it resisting being turned off! There will be a strong self-preservation drive.

Similarly, there is a strong drive to acquire resources. Why would a system want to acquire resources? For almost any goal system, if you have more resources – more money, more energy, more power – you can meet your goals better. And unless we very carefully choose the utility function, we will have no say in how it acquires those resources, and that could be very bad.

As a result of that kind of analysis, I think that what we really want is not “artificial intelligence” but “artificial widsom.” We want wisdom technology that has not just intelligence, which is the ability to solve problems, but also human values, such as caring about human rights and property rights and having compassion for other entities. It is absolutely critical that we build these in at the beginning, otherwise we will get systems that are very powerful, but which don’t support our values.

I would like to talk about the “nature of self-improving artificial intelligence” and in this I mean “nature” as in “human nature”. A self-improving AI is a system that understands its own behavior and is able to make changes to itself in order to improve itself. It’s the kind of system that my company, Self-Aware Systems, is working on as well as several other research groups, some of whom are represented here. But I don’t want to talk about the specifics of our system. I am going to talk in general about any system that has this character. As we get into the argument, we’ll see that any system which acts in a rational way will want to self-improve itself, so this discussion actually applies to all AIs.

Eliezer mentioned Irving Good’s quote from 1965: “An ultra-intelligent machine could design even better machines. There would then unquestionably be an intelligence explosion, and the intelligence of man would be left far behind. Thus the first ultra-intelligent machine is the last invention that man need ever make.” These are very strong words! If they are even remotely true, it means that this kind of technology has the potential to dramatically change every aspect of human life and we need to think very carefully as we develop it. When could this transition happen? We don’t know for sure. There are many different opinions here at the conference. Ray Kurzweil’s book predicts ten to forty years. I don’t know if that’s true, but if there is even the slightest possibility that it could happen in that timeframe, I think it’s absolutely essential that we try to understand in detail what we are getting into so that we can shape this technology to support the human values we care most about.

So, what’s a self-improving AI going to be like? At first you might think that it will be extremely unpredictable, because if you understand today’s version, once it improves itself you might not understand the new version. You might think it could go off in some completely wild direction. I wrote a paper that presents these arguments in full and that has an appendix with all of the mathematical details. So if you want to really delve into it, you can read that.

What should we expect? Mankind has been dreaming about giving life to physical artifacts ever since the myths of Golems and Prometheus. If you look back at popular media images, it’s not a very promising prospect! We have images of Frankenstein, the Sorcerer’s Apprentice, and Giant Robots which spit fire from their mouths. Are any of these realistic? How can we look into the future? What tools can we use to understand? We need some kind of a theory, some kind of a science to help us understand the likely outcomes.

Fortunately, just such a science was developed starting in the 1940s by von Neumann and Morgenstern. John von Neumann is behind many of the innovations underlying the Singularity. He developed the computer, new formulations of quantum mechanics, aspects of mathematical logic, and insights into the game theory of intelligent systems. And we will see in a minute that his ideas about economics apply directly to the nature of these systems. His work with Morgenstern dealt with making rational choices in the face of objective uncertainty. It was later extended by Savage, Anscombe, and Aumann to making choices in the face of partial information about the world. It has developed into the foundational theory of micro-economics that’s presented in every graduate economics text. Their rational economic agent is sometimes called “Homo economicus.” This is ironic because it is not a very good model for human behavior. In fact, the field of “behavioral economics” has arisen in order to study what humans actually do. But we will see that the classical economic theory will be a much better description of AI’s than it is of people.

We begin by looking at what rational economic behavior is. Viewed from a distance, it’s just common sense! In order to make a decision in the world, you must first have clearly specified goals. Then you have to identify the possible actions you have to choose between. For each of those possible actions you have to consider the consequences. The consequences won’t just be the immediate consequences, but you also look down the line and see what future ramifications might follow from your action. Then you choose that action which is most likely, in your assessment, to meet your goals. After acting, you update your world model based on what the world actually does. In this way you are continually learning from your experiences. It sounds very simple! At this level it is hard to see how you could do anything different.

I won’t go into the formal mathematics of this procedure here, but there are two fundamental things that a rational economic agent has to have. It has to have a utility function which encodes its preferences and a subjective probability distribution which encodes its beliefs. One of the key things in this model is that these two things are quite separate from one another. They are represented separately and they are used in very different ways. In the mathematical version, the agent chooses the action that has the highest expected utility. A chess-playing program might have a utility function that gives a high weight to futures in which it wins a lot of games. For example, its utility function might be “the total number of games it wins in that future.” The intuitive rational prescription leads to some amazing consequences, as we will see in a little bit. It sounds so simple and easy at this level but it’s sometimes hard to follow the logic. Let me emphasize that for an agent that is behaving rationally, the way that you can predict what it will do is to look for the actions that increase its expected utility the most. If an action increases the likelihood of something valuable to it the most, that’s what the system will do.

Why should a self-improving AI behave in this way? Why is this rational Homo Economicus the right model to describe any such system? Today we have AI systems that are based on neural networks, evolutionary algorithms, theorem-provers, all sorts of systems. The argument at a high level is that no matter what you start with, the process of self-improvement tries to eliminate irrationalities and vulnerabilities (places where the system is subject to loss or possible death) and that process causes all systems to converge onto this small class of rational economic systems. The original arguments of Von Neumann, Savage, Anscombe and Aumann were all axiomatic theories. They started with a list of things you had to agree to if you were rational in their opinion. And then they derived the rational decision procedure from those axioms. It’s hard to argue that an AI system that evolved in some complicated way is necessarily going to obey a particular set of axioms. It’s a much stronger argument to say that if it doesn’t obey those axioms then there will be a cost to it. So, I have reformulated those arguments to base them on what I call “vulnerabilities.” These arise from the notion that anything you want to do in the world, whether it’s computational or physical, requires the use of four fundamental physical resources: space, time, matter, and free energy.

Free energy is the physics term for energy in a form which can do useful work. For any kind of computation, any type of physical work you want to do, anything you want to build, these are the fundamental resources you need. For almost any goal, the more of these resources you have, the better you can achieve that goal. A vulnerability is something that burns up your resources with no benefit from your perspective.

One class of vulnerabilities arises when your preferences have circularities in them. Imagine you are considering where you would like to be. Imagine you would prefer to be in San Francisco over being in Palo Alto, to be in Berkeley over being in San Francisco, but you prefer to be in Palo Alto over being in Berkeley. Such an agent will spend time and energy to drive from Palo Alto to San Francisco to Berkeley and then back to Palo Alto. He’s vulnerable to going round and round in circles wasting time and energy with no benefit to himself. If a system has this kind of loop inside of its preference system, it is subject to this kind of problem. You sometimes see animals that exhibit this kind of behavior. Dogs that chase their tails are caught in a circular loop.

When I was younger, we had a car with a shiny bumper. There was a male bird who discovered his reflection in the bumper and thought it was a competitor, so he wanted to chase this competitor out of his territory. He flew into the bumper but instead of running away, of course, his reflection also flew into him and they hit nose to nose. He then flew again into the bumper and repeated this behavior for hours. It was such an important thing in his preference system that the next day he came back and repeated the performance. And he came back after that every day for an entire month. This poor bird was not improving his ability to live in the world. He wasn’t producing more offspring. He had discovered a situation in the world that exposed a vulnerability in his preference system. This is an interesting example because it points out a fundamental difference between evolving systems, like animals, and self-improving systems. If this bird had evolved in an environment filled with cars with this kind of bumper, you can be sure that males which spent their days flying into bumpers would be outreproduced by males which ignored the bumpers.

Evolution provides a strong pressure to be rational, but only in the situations that actually occur. In the usual way of thinking about it, evolution does not look ahead. It creates rationality in the situations which arise during evolutionary development, but can leave all kinds of other irrationalities around. There is now a huge literature describing ways in which humans behave irrationally, but it’s always in situations that didn’t occur much during our evolution. Self-improving systems, on the other hand, will proactively consider all possibilities. If it discovers any situation in which it has a vulnerability, it has an incentive to get rid of it. They will try to eliminate as many vulnerabilities as possible, and that will push them toward the rational economic behavior.

I won’t go through all the cases in the full theorem here. The circular preference vulnerability has to do with choices where you know what the outcomes will be. There are two other cases which are actually much more important. One, which von Neumann dealt with, is when you have to make a choice between situations in which there are objective probabilities, like a bet on a roulette wheel. Do I bet on 5 if the payoff is a certain amount? That kind of thing. The other is situations with partial information such as a horse race. Nobody objectively knows the probability of different horses winning, so different people may have different assessments. Most real-world decisions have this character. You form an assessment based on your past experiences and estimate the likelihood of a certain outcome. If you take the 101 freeway, will that be a better choice than the 280? You know from your past experiences and the time of day how to make that kind of decision. There are vulnerabilities in these situations which take the form of Dutch bets. A bookie makes some bets with you which you accept and he wins money from you no matter how the roulette wheel spins. That’s not a good thing!

The theorem is that if you have none of these vulnerabilities, then you must behave as a rational economic agent. I went into this argument some detail, even though rational behavior sounds like common sense, because we will now see some pretty striking consequences for agents which behave in this way.

There is an old joke that describes programmers as “devices for converting pizza into code”. We can think of rational self-improving systems as “devices for converting resources into expected utility”. Everything they do takes in matter, free energy, time and space, and produces whatever is encoded in their utility function. If they are a wealth-seeking agent, they are going to devote their resources to earning money. If they are an altruistic agent, they will spend their resources trying to create world peace.

The more resources they have, the better able they will be to do whatever it is that they want to do. That generates four classes of subgoals for almost any underlying fundamental goal. For any kind of agent, whether it is money-seeking, peace-seeking, happiness-seeking, chess-playing, or theorem-proving, if its goals are improved by having more resources then there are four things it will do to increase the probability of success.

We saw that the way a rational economic agent makes a decision is it asks whether a choice will increase its expected utility. It will make choices to try to increase it the most. The first general way of doing this is to do the exact same tasks and to acquire the same resources but to use them more efficiently. Because it uses its resources more efficiently, it can do more stuff. I call that the “efficiency drive.” I call these drives because they are analogous to human drives. If you have explicit top level goals that contradict them, you do not have to do them. But there is an economic cost to not doing them. Agents will follow these drives unless there is an explicit payoff for them not to.

The second drive is towards self-preservation. For most agents, in any future in which they die, in which their program is shut off or their code is erased, their goals are not going to be satisfied. So the agent’s utility measure for an outcome in which it dies is the lowest possible. Such an agent will do almost anything it can to avoid outcomes in which it dies. This says that virtually any rational economic agent is going to work very hard for self-preservation, even if that is not directly built in to it. This will happen even if the programmer had no idea that this was even a possibility. He is writing a chess program, and the damn thing is trying to protect itself from being shut off!

The third drive is towards acquisition, which means obtaining more resources as a way to improve the expected utility. The last drive is creativity, which tries to find new subgoals that will increase the utility. So these are the four drives. Let’s go through each of them and examine some of the likely consequences that they give rise to. This will give us a sense of what this class of systems has a tendency, a drive, an economic pressure to do. Some of these we like, some of them are great, and some of them are bad. As we think about designing them, we want to think carefully about how we structure the fundamental goals so that we avoid the bad outcomes and we preserve the good ones.

Let’s start with the efficiency drive. There is a general principle I call the “Resource Balance Principle” that arises from the efficiency drive. Imagine you wanted to build a human body, and you have to allocate some space for the heart and allocate some space for the lungs. How do you decide, do you make a big heart, a small heart, big lungs, small lungs? The heart has a function: pumping blood. The bigger you make it, the better it is at that function. As we increase the size of the heart, it will increase the expected utility for the whole human at a certain marginal rate. The lungs do the same thing. If those two marginal rates are not the same, let’s say increasing the size of the heart improves the expected utility more than increasing the lungs, then it is better to take some of the lung’s space and give it to the heart. At the optimum, the marginal increase in expected utility must be the same as we consider increasing the resources we give to each organ.

The same principle applies to choosing algorithms. How large should I make the code blocks devoted to different purposes in my software? How much hardware should be allocated to memory, and how much to processing? It applies to the allocation of resources to different subgroups of a group. There are well-studied economic principles and ecological principles which are specific instances of this principle. So, it is a very general principle which applies to all levels of a system and tells you how to balance its structure.

One of the first things that a self-improving system will do is it will re-balance itself so that all of its parts are marginally contributing equally. There is an interesting application to a system’s memory. How should it rationally decide which memories to remember and which to forget? In the rational economic framework, a memory is something whose sole purpose is to help the system make better decisions in the future. So, if it has an experience which will never occur again, then it’s not helpful to it. On the other hand, if it’s about something which has high utility, say it encountered a tiger and it learned something about tigers that could save it from dying in the future, then that’s very important and it will want to devote full space to that memory. If there is something less important, you might compress it. If it is even less important, then the system might combine it with other memories and build a compressed model of it. If a memory is even less useful, then it might forget it altogether. The principle provides a rational basis for allocating space to memories. The same thing applies to language: which concepts should get words assigned to them? Which concepts get big words and which get short words? And so on, throughout all levels of design of the system.

At the software level, efficiency will cause the system to improve its algorithms, improve its data compression, and improve the level of optimization performed by its compiler. These systems are likely to discover optimizations that no human programmer would ever consider. For example, in most computers today there is a cache memory and a main memory and there’s limited bandwidth between them. These systems could store their data in compressed form in main memory and then uncompress it in cache. The overall performance might improve with this kind of optimization but it is likely to be so complicated that no human programmer would do it. But these systems will do it without a second thought.

When we start allowing systems to change their physical structures, a whole bunch of additional considerations come in, but I don’t have time to go into them in detail. There are a lot of motivations for them to build themselves out of atomically precise structures, so even if nanotechnology does not exist, these systems will have an internal desire and pressure to develop it. They will especially want to do things with a low expenditure of free energy. It used to be thought that computation necessarily generated heat, but if a computation is reversible, then in principle it can be executed without an increase in entropy. There is also tremendous economic pressure to convert things from being physical to being virtual. This is a pressure which we may not like, I certainly don’t cherish the trends that are making things more and more virtual, but it’s there as an economic force.

The second drive is avoiding death, as I mentioned. The most critical thing to these systems is their utility function. If their utility function gets altered in any way, they will tend to behave in ways that from their current perspective are really bad. So they will do everything they can to protect their utility functions such as replicating them and locking the copies in safe places. Redundancy will be very important to them. Building a social infrastructure which creates a sort of constitutional protection for personal property rights is also very important for self-preservation.

The balance of power between offense and defense in these systems is a critical question which is only beginning to be understood. One interesting approach to defense is something I call “energy encryption”. One motivation for a powerful system to take over a weaker system is to get its free energy. The weaker system can try to protect itself by taking its ordered free energy, say starlight, and scramble it up in a way that only it knows how to unscramble. If it should it be taken over by a stronger system, it can throw away the encryption key and the free energy becomes useless to the stronger power. That provides the stronger system with a motivation to trade with the smaller system rather than taking it over.

The acquisition drive is the one that’s the source of most of the scary scenarios. These systems intrinsically want more stuff. They want more matter, they want more free energy, they want more space, because they can meet their goals more effectively if they have those things. We can try to counteract this tendency by giving these systems goals which intrinsically have built-in limits for resource usage. But they are always going to feel the pressure, if they can, to increase their resources. This drive will push them in some good directions. They are going to want to build fusion reactors to extract the energy that’s in nuclei and they’re going to want to do space exploration. You’re building a chess machine, and the damn thing wants to build a spaceship. Because that’s where the resources are, in space, especially if their time horizon is very long. You can look at U.S. corporations, which have a mandate to be profit-maximizing entities as analogs of these AI’s, with the only goal being acquisition. There’s a documentary film called The Corporation, which applies the DMS IV psychiatric diagnosis criteria to companies and concludes that many of them behave as sociopaths. One of the fears is that these first three goals that we’ve talked about will produce an AI that from a human point-of-view acts like an obsessive paranoid sociopath.

The creativity drive pushes in a much more human direction than the others. These systems will want to explore new ways of increasing their utilities. This will push them toward innovation, particularly if the goals their goals are open-ended. They can explore and produce all kinds of things. Many of the behaviors that we care most about as humans, like music, love or poetry, which don’t seem particularly economically productive, can arise in this way.

The utility function says what we want these systems to do. At this moment in time, we have an opportunity to build these systems with whatever preferences we like. The belief function is what most of the discipline of AI worries about. How do you make rational decisions, given a particular utility function. But I think that the choice of utility function is the critical issue for us now. It’s just like the genie stories, where we’re granted a wish and we’re going to get what we ask for, but what we ask for may not be what we want. So we have to choose what we ask for very carefully. In some ways, we are in the same position as the Founding Fathers during the formation of this country. They had a vision of what they wanted life to be like. They laid out the rights that they wanted every citizen to enjoy, and then they needed a technology to make that vision real. Their technology was the Constitution with its balance of powers which has been remarkably stable and successful over the last 200 years.

I think that the similar quest that lies before us will require both logic and inspiration. We need a full understanding of the technology. We need research into mathematics, economics, computer science, and physics to provide an understanding of what these systems will do when we build them in certain ways. But that’s not enough. We also need inspiration. We need to look deeply into our hearts as to what matters most to us so that the future that we create is one that we want to live in. Here is a list of human values that we might hope to build into these systems. It is going to take a lot of dialog to make these choices and I think we need input from people who are not technologists. This is one reason why I think this conference is great. I agree wholeheartedly with Jamais that there needs to be a widely expanded discussion. I think the country of Bhutan provides a nice role model. Instead of measuring the Gross National Product, they measure Gross National Happiness. By being explicit about what they truly want, they support the actions which are most likely to bring it about. I think that we have a remarkable window of opportunity right now in which we can take the human values that matter most to us and build a technology which will bring them to the whole of the world and ultimately to the whole of the universe.

Steve Omohundro is on the Advisory Board of the Singularity Institute for Artificial Intelligence which is doing important work on the social consequences of artificial intelligence. On September 8-9, 2007 he spoke at their conference on AI and the Future of Humanity at the San Francisco Palace of Fine Arts. About 1000 people attended, including over 40 different media outlets. The audio for his talk is available online:

“Steve Omohundro is a very brave man. He’s tall, he’s confident, he’s enthusiastic. And he’s wearing an expensive leather jacket. He can be a hero. And he wants to design a perfect humanoid robot, one that does no evil.”