JB: There are decent Wikipedia articles on “optimism bias” and “positive illusions”, which suggest that unrealistically optimistic people are more energetic, while more realistic estimates of success go hand-in-hand with mild depression. If this is true, I can easily imagine that most people working on challenging projects like quantum gravity (me, 10 years ago) or artificial intelligence (you) are unrealistically optimistic about our chances of success.

Indeed, I can easily imagine that the first researchers to create a truly powerful artificial intelligence will be people who underestimate its potential dangers. It’s an interesting irony, isn’t it? If most people who are naturally cautious avoid a certain potentially dangerous line of research, the people who pursue that line of research are likely to be less cautious than average.

I’m a bit worried about this when it comes to “geoengineering”, for example—attempts to tackle global warming by large engineering projects. We have people who say “oh no, that’s too dangerous”, and turn their attention to approaches they consider less risky, but that may leave the field to people who underestimate the risks.

So I’m very glad you are thinking hard about how to avoid the potential dangers of artificial intelligence—and even trying to make this problem sound exciting, to attract ambitious and energetic young people to work on it. Is that part of your explicit goal? To make caution and rationality sound sexy?

EY: The really hard part of the problem isn’t getting a few smart people to work on cautious, rational AI. It’s admittedly a harder problem than it should be, because there’s a whole system out there which is set up to funnel smart young people into all sorts of other things besides cautious rational long-term basic AI research. But it isn’t the really hard part of the problem.

The scary thing about AI is that I would guess that the first AI to go over some critical threshold of self-improvement takes all the marbles—first mover advantage, winner take all. The first pile of uranium to have an effective neutron multiplication factor greater than 1, or maybe the first AI smart enough to absorb all the poorly defended processing power on the Internet—there are actually a number of different thresholds that could provide a critical first-mover advantage.

And it is always going to be fundamentally easier, in some sense, to go all out straight for AI and not worry about clean designs or stable self-modification or the problem where a near-miss on the value system destroys almost all of the actual value from our perspective. (E.g., imagine aliens who shared every single term in the human utility function but lacked our notion of boredom. Their civilization might consist of a single peak experience repeated over and over, which would make their civilization very boring from our perspective, compared to what it might have been. That is, leaving a single aspect out of the value system can destroy almost all of the value. So there’s a very large gap in the AI problem between trying to get the value system exactly right, versus throwing something at it that sounds vaguely good.)

You want to keep as much of an advantage as possible for the cautious rational AI developers over the crowd that is just gung-ho to solve this super interesting scientific problem and go down in the eternal books of fame. Now there should in fact be some upper bound on the combination of intelligence, methodological rationality, and deep understanding of the problem which you can possess, and still walk directly into the whirling helicopter blades. The problem is that it is probably a rather high upper bound. And you are trying to outrace people who are trying to solve a fundamentally easier wrong problem. So the question is not attracting people to the field in general, but rather getting the really smart competent people to either work for a cautious project or not go into the field at all. You aren’t going to stop people from trying to develop AI. But you can hope to have as many of the really smart people as possible working on cautious projects rather than incautious ones.

So yes, making caution look sexy. But even more than that, trying to make incautious AI projects look merely stupid. Not dangerous. Dangerous is sexy. As the old proverb goes, most of the damage is done by people who wish to feel themselves important. Human psychology seems to be such that many ambitious people find it far less scary to think about destroying the world, than to think about never amounting to much of anything at all. I have met people like this. In fact all the people I have met who think they are going to win eternal fame through their AI projects have been like this. The thought of potentially destroying the world is bearable; it confirms their own importance. The thought of not being able to plow full steam ahead on their incredible amazing AI idea is not bearable; it threatens all their fantasies of wealth and fame.

Now these people of whom I speak are not top-notch minds, not in the class of the top people in mainstream AI, like say Peter Norvig (to name someone I’ve had the honor of meeting personally). And it’s possible that if and when self-improving AI starts to get real top-notch minds working on it, rather than people who were too optimistic about/attached to their amazing bright idea to be scared away by the field of skulls, then these real stars will not fall prey to the same sort of psychological trap. And then again it is also plausible to me that top-notch minds will fall prey to exactly the same trap, because I have yet to learn from reading history that great scientific geniuses are always sane.

So what I would most like to see would be uniform looks of condescending scorn directed at people who claimed their amazing bright AI idea was going to lead to self-improvement and superintelligence, but who couldn’t mount an adequate defense of how their design would have a goal system stable after a billion sequential self-modifications, or how it would get the value system exactly right instead of mostly right. In other words, making destroying the world look unprestigious and low-status, instead of leaving it to the default state of sexiness and importance-confirmingness.

JB: “Get the value system exactly right”—now this phrase touches on another issue I’ve been wanting to talk about. How do we know what it means for a value system to be exactly right? It seems people are even further from agreeing on what it means to be good than on what it means to be rational. Yet you seem to be suggesting we need to solve this problem before it’s safe to build a self-improving artificial intelligence!

When I was younger I worried a lot about the foundations of ethics. I decided that you “can’t derive an ought from an is”—do you believe that? If so, all logical arguments leading up to the conclusion that “you should do X” must involve an assumption of the form “you should do Y”… and attempts to “derive” ethics are all implicitly circular in some way. This really bothered the heck out of me: how was I supposed to know what to do? But of course I kept on doing things while I was worrying about this… and indeed, it was painfully clear that there’s no way out of making decisions: even deciding to “do nothing” or commit suicide counts as a decision.

Later I got more comfortable with the idea that making decisions about what to do needn’t paralyze me any more than making decisions about what is true. But still, it seems that the business of designing ethical beings is going to provoke huge arguments, if and when we get around to that.

Do you spend as much time thinking about these issues as you do thinking about rationality? Of course they’re linked….

EY: Well, I probably spend as much time explaining these issues as I do rationality. There are also an absolutely huge number of pitfalls that people stumble into when they try to think about, as I would put it, Friendly AI. Consider how many pitfalls people run into when they try to think about Artificial Intelligence. Next consider how many pitfalls people run into when they try to think about morality. Next consider how many pitfalls philosophers run into when they try to think about the nature of morality. Next consider how many pitfalls people run into when they try to think about hypothetical extremely powerful agents, especially extremely powerful agents that are supposed to be extremely good. Next consider how many pitfalls people run into when they try to imagine optimal worlds to live in or optimal rules to follow or optimal governments and so on.

Now imagine a subject matter which offers discussants a lovely opportunity to run into all of those pitfalls at the same time.

That’s what happens when you try to talk about Friendly Artificial Intelligence.

And it only takes one error for a chain of reasoning to end up in Outer Mongolia. So one of the great motivating factors behind all the writing I did on rationality and all the sequences I wrote on Less Wrong was to actually make it possible, via two years’ worth of writing and probably at least a month’s worth of reading, to immunize people against all the usual mistakes.

Lest I appear to dodge the question entirely, I’ll try for very quick descriptions and google keywords that professional moral philosophers might recognize.

In terms of what I would advocate programming a very powerful AI to actually do, the keywords are “mature folk morality” and “reflective equilibrium”. This means that you build a sufficiently powerful AI to do, not what people say they want, or even what people actually want, but what people would decide they wanted the AI to do, if they had all of the AI’s information, could think about for as long a subjective time as the AI, knew as much as the AI did about the real factors at work in their own psychology, and had no failures of self-control.

There are a lot of important reasons why you would want to do exactly that and not, say, implement Asimov’s Three Laws of Robotics (a purely fictional device, and if Asimov had depicted them as working well, he would have had no stories to write), or build a superpowerful AI which obeys people’s commands interpreted in literal English, or create a god whose sole prime directive is to make people maximally happy, or any of the above plus a list of six different patches which guarantee that nothing can possibly go wrong, and various other things that seem like incredibly obvious failure scenarios but which I assure you I have heard seriously advocated over and over and over again.

In a nutshell, you want to use concepts like “mature folk morality” or “reflective equilibrium” because these are as close as moral philosophy has ever gotten to defining in concrete, computable terms what you could be wrong about when you order an AI to do the wrong thing.

For an attempt at nontechnical explanation of what one might want to program an AI to do and why, the best resource I can offer is an old essay of mine which is not written so as to offer good google keywords, but holds up fairly well nonetheless:

You also raised some questions about metaethics, where metaethics asks not “Which acts are moral?” but “What is the subject matter of our talk about ‘morality’?” i.e. “What are we talking about here anyway?” In terms of Google keywords, my brand of metaethics is closest to analytic descriptivism or moral functionalism. If I were to try to put that into a very brief nutshell, it would be something like “When we talk about ‘morality’ or ‘goodness’ or ‘right’, the subject matter we’re talking about is a sort of gigantic math question hidden under the simple word ‘right’, a math question that includes all of our emotions and all of what we use to process moral arguments and all the things we might want to change about ourselves if we could see our own source code and know what we were really thinking.”

The complete Less Wrong sequence on metaethics (with many dependencies to earlier ones) is:

JB: I’ll help you be wise. There are a hundred followup questions I’m tempted to ask, but this has been a long and grueling interview, so I won’t. Instead, I’d like to raise one last big question. It’s about time scales.

Self-improving artificial intelligence seems like a real possibility to me. But when? You see, I believe we’re in the midst of a global ecological crisis—a mass extinction event, whose effects will be painfully evident by the end of the century. I want to do something about it. I can’t do much, but I want to do something. Even if we’re doomed to disaster, there are different sizes of disaster. And if we’re going through a kind of bottleneck, where some species make it through and others go extinct, even small actions now can make a difference.

I can imagine some technological optimists—singularitarians, extropians and the like—saying: “Don’t worry, things will get better. Things that seem hard now will only get easier. We’ll be able to suck carbon dioxide from the atmosphere using nanotechnology, and revive species starting from their DNA.” Or maybe even: “Don’t worry: we won’t miss those species. We’ll be having too much fun doing things we can’t even conceive of now.”

But various things make me skeptical of such optimism. One of them is the question of time scales. What if the world goes to hell before our technology saves us? What if artificial intelligence comes along too late to make a big impact on the short-term problems I’m worrying about? In that case, maybe I should focus on short-term solutions.

Just to be clear: this isn’t some veiled attack on your priorities. I’m just trying to decide on my own. One good thing about having billions of people on the planet is that we don’t all have to do the same thing. Indeed, a multi-pronged approach is best. But for my own decisions, I want some rough guess about how long various potentially revolutionary technologies will take to come online.

What do you think about all this?

EY: I’ll try to answer the question about timescales, but first let me explain in some detail why I don’t think the decision should be dominated by that question.

If you look up “Scope Insensitivity” on Less Wrong, you’ll see that when three different groups of subjects were asked how much they would pay in increased taxes to save 2,000 / 20,000 / 200,000 birds from drowning in uncovered oil ponds, the respective average answers were $80 / $78 / $88. People asked questions like this visualize one bird, wings slicked with oil, struggling to escape, and that creates some amount of emotional affect which determines willingness to pay, and the quantity gets tossed out the window since no one can visualize 200,000 of anything. Another hypothesis to explain the data is “purchase of moral satisfaction”, which says that people give enough money to create a “warm glow” inside themselves, and the amount required might have something to do with your personal financial situation, but it has nothing to do with birds. Similarly, residents of four US states were only willing to pay 22% more to protect all 57 wilderness areas in those states than to protect one area. The result I found most horrifying was that subjects were willing to contribute more when a set amount of money was needed to save one child’s life, compared to the same amount of money saving eight lives—because, of course, focusing your attention on a single person makes the feelings stronger, less diffuse.

So while it may make sense to enjoy the warm glow of doing good deeds after we do them, we cannot possibly allow ourselves to choose between altruistic causes based on the relative amounts of warm glow they generate, because our intuitions are quantitatively insane.

And two antidotes that absolutely must be applied in choosing between altruistic causes are conscious appreciation of scope and conscious appreciation of marginal impact.

By its nature, your brain flushes right out the window the all-important distinction between saving one life and saving a million lives. You’ve got to compensate for that using conscious, verbal deliberation. The Society For Curing Rare Diseases in Cute Puppies has got great warm glow, but the fact that these diseases are rare should call a screeching halt right there—which you’re going to have to do consciously, not intuitively. Even before you realize that, contrary to the relative warm glows, it’s really hard to make a moral case for trading off human lives against cute puppies. I suppose if you could save a billion puppies using one dollar I wouldn’t scream at someone who wanted to spend the dollar on that instead of cancer research.
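The comparison being urged here—scope times marginal impact, with warm glow deliberately excluded—can be sketched as a toy calculation. The cause names and all the numbers below are hypothetical, chosen only to illustrate the shape of the argument, not to estimate anything real:

```python
# Toy illustration (hypothetical numbers throughout) of ranking altruistic
# causes by expected impact -- scope times marginal effect per dollar --
# instead of by the "warm glow" each cause generates.

causes = {
    # name: (lives_at_stake, marginal_effect_per_dollar, warm_glow_0_to_10)
    "cute_puppy_rare_diseases": (1e3,  1e-7,  9),
    "cancer_research":          (1e7,  1e-9,  6),
    "existential_risk":         (1e10, 1e-10, 2),
}

def expected_impact(scope, marginal, _glow):
    # Warm glow is deliberately ignored: only scope and marginal impact count.
    return scope * marginal

ranked = sorted(causes, key=lambda c: expected_impact(*causes[c]), reverse=True)
for name in ranked:
    scope, marginal, glow = causes[name]
    print(f"{name}: expected impact {scope * marginal:.0e}, warm glow {glow}/10")
```

Note that with these (made-up) numbers the expected-impact ranking comes out exactly opposite to the warm-glow ranking, which is the quantitative insanity the passage above is pointing at.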

And similarly, if there are a hundred thousand researchers and billions of dollars annually that are already going into saving species from extinction—because it’s a prestigious and popular cause that has an easy time generating warm glow in lots of potential funders—then you have to ask about the marginal value of putting your effort there, where so many other people are already working, compared to a project that isn’t so popular.

I wouldn’t say “Don’t worry, we won’t miss those species”. But consider the future intergalactic civilizations growing out of Earth-originating intelligent life. Consider the whole history of a universe which contains this world of Earth and this present century, and also billions of years of future intergalactic civilization continuing until the universe dies, or maybe forever if we can think of some ingenious way to carry on. Next consider the interval in utility between a universe-history in which Earth-originating intelligence survived and thrived and managed to save 95% of the non-primate biological species now alive, versus a universe-history in which only 80% of those species are alive. That utility interval is not very large compared to the utility interval between a universe in which intelligent life thrived and intelligent life died out. Or the utility interval between a universe-history filled with sentient beings who experience happiness and have empathy for each other and get bored when they do the same thing too many times, versus a universe-history that grew out of various failures of Friendly AI.

(The really scary thing about universes that grow out of a loss of human value is not that they are different, but that they are, from our standpoint, boring. The human utility function says that once you’ve made a piece of art, it’s more fun to make a different piece of art next time. But that’s just us. Most random utility functions will yield instrumental strategies that spend some of their time and resources exploring for the patterns with the highest utility at the beginning of the problem, and then use the rest of their resources to implement the pattern with the highest utility, over and over and over. This sort of thing will surprise a human who expects, on some deep level, that all minds are made out of human parts, and who thinks, “Won’t the AI see that its utility function is boring?” But the AI is not a little spirit that looks over its code and decides whether to obey it; the AI is the code. If the code doesn’t say to get bored, it won’t get bored. A strategy of exploration followed by exploitation is implicit in most utility functions, but boredom is not. If your utility function does not already contain a term for boredom, then you don’t care; it’s not something that emerges as an instrumental value from most terminal values. For more on this see: “In Praise of Boredom” in the Fun Theory Sequence on Less Wrong.)

Anyway: In terms of expected utility maximization, even large probabilities of jumping the interval between a universe-history in which 95% of existing biological species survive Earth’s 21st century, versus a universe-history where 80% of species survive, are just about impossible to trade off against tiny probabilities of jumping the interval between interesting universe-histories, versus boring ones where intelligent life goes extinct, or the wrong sort of AI self-improves.

I honestly don’t see how a rationalist can avoid this conclusion: At this absolutely critical hinge in the history of the universe—Earth in the 21st century—rational altruists should devote their marginal attentions to risks that threaten to terminate intelligent life or permanently destroy a part of its potential. Those problems, which Nick Bostrom named “existential risks”, have got all the scope. And when it comes to marginal impact, there are major risks outstanding that practically no one is working on. Once you get the stakes on a gut level it’s hard to see how doing anything else could be sane.

So how do you go about protecting the future of intelligent life? Environmentalism? After all, there are environmental catastrophes that could knock over our civilization… but then if you want to put the whole universe at stake, it’s not enough for one civilization to topple, you have to argue that our civilization is above average in its chances of building a positive galactic future compared to whatever civilization would rise again a century or two later. Maybe if there were ten people working on environmentalism and millions of people working on Friendly AI, I could see sending the next marginal dollar to environmentalism. But with millions of people working on environmentalism, and major existential risks that are completely ignored… if you add a marginal resource that can, rarely, be steered by expected utilities instead of warm glows, devoting that resource to environmentalism does not make sense.

Similarly with other short-term problems. Unless they’re little-known and unpopular problems, the marginal impact is not going to make sense, because millions of other people will already be working on them. And even if you argue that some short-term problem leverages existential risk, it’s not going to be perfect leverage and some quantitative discount will apply, probably a large one. I would be suspicious that the decision to work on a short-term problem was driven by warm glow, status drives, or simple conventionalism.

With that said, there’s also such a thing as comparative advantage—the old puzzle of the lawyer who works an hour in the soup clinic instead of working an extra hour as a lawyer and donating the money. Personally I’d say you can work an hour in the soup clinic to keep yourself going if you like, but you should also be working extra lawyer-hours and donating the money to the soup clinic, or better yet, to something with more scope. (See “Purchase Fuzzies and Utilons Separately” on Less Wrong.) Most people can’t work effectively on Artificial Intelligence (some would question if anyone can, but at the very least it’s not an easy problem). But there’s a variety of existential risks to choose from, plus a general background job of spreading sufficiently high-grade rationality and existential risk awareness. One really should look over those before going into something short-term and conventional. Unless your master plan is just to work the extra hours and donate them to the cause with the highest marginal expected utility per dollar, which is perfectly respectable.

Where should you go in life? I don’t know exactly, but I think I’ll go ahead and say “not environmentalism”. There’s just no way that the product of scope, marginal impact, and John Baez’s comparative advantage is going to end up being maximal at that point.

Which brings me to AI timescales.

If I knew exactly how to make a Friendly AI, and I knew exactly how many people I had available to do it, I still couldn’t tell you how long it would take because of Product Management Chaos.

As it stands, this is a basic research problem—which will always feel very hard, because we don’t understand it, and that means when our brain checks for solutions, we don’t see any solutions available. But this ignorance is not to be confused with the positive knowledge that the problem will take a long time to solve once we know how to solve it. It could be that some fundamental breakthrough will dissolve our confusion and then things will look relatively easy. Or it could be that some fundamental breakthrough will be followed by the realization that, now that we know what to do, it’s going to take at least another 20 years to do it.

I seriously have no idea when AI is going to show up, although I’d be genuinely and deeply shocked if it took another century (barring a collapse of civilization in the meanwhile).

If you were to tell me that as a Bayesian I have to put probability distributions on things on pain of having my behavior be inconsistent and inefficient, well, I would actually suspect that my behavior is inconsistent. But if you were to try and induce from my behavior a median expected time where I spend half my effort planning for less and half my effort planning for more, it would probably look something like 2030.

But that doesn’t really matter to my decisions. Among all existential risks I know about, Friendly AI has the single largest absolute scope—it affects everything, and the problem must be solved at some point for worthwhile intelligence to thrive. It also has the largest product of scope and marginal impact, because practically no one is working on it, even compared to other existential risks. And my abilities seem applicable to it. So I may not like my uncertainty about timescales, but my decisions are not unstable with respect to that uncertainty.

JB: Ably argued! If I think of an interesting reply, I’ll put it in the blog discussion. Thanks for your time.


75 Responses to This Week’s Finds (Week 313)

Intra-galactic civilizations??? In order to have that, we’ve got bigger problems than “friendly AI” to solve; we need, at a minimum, a faster-than-light means of communication and exploration (see http://en.wikipedia.org/wiki/List_of_nearest_stars) since a conversation with 4.3 years between answers would be difficult at best…

As for “The scary thing about AI is that I would guess that the first AI to go over some critical threshold of self-improvement takes all the marbles—first mover advantage, winner take all. … the first AI smart enough to absorb all the poorly defended processing power on the Internet—there’s actually a number of different thresholds that could provide a critical first-mover advantage”, there’s a quick answer – just unplug the machine, physically disconnect from the Net, and find the software/firmware that got installed to hijack the machine, put up a decent firewall and other measures, and then get back on the net. If that doesn’t fix the problem, then, by God, it’s time to call Moxie Marlinspike.

Do you really think any firewall that humans could build would block a superintelligent AI? We can’t even block other humans. Anyways, there are other points of failure – see http://yudkowsky.net/singularity/aibox.

Firstly, your thinking is bounded by natural human lifespans. There’s no reason for AGIs to be restricted to the same mortal constraints as ourselves. Post-singularity intelligent agents (whether they be AGIs or modified humans) will have indefinite lifespans. A voyage of several millennia may not be that big of a deal.

Secondly, when you accelerate towards the speed of light, things like time dilation and length contraction start to come into play. It’s actually quite possible to travel across the visible universe within a human lifespan (relative to the frame of reference of the person making the trip) if you can supply at least a constant 1G of acceleration, granted that you can also overcome the other technical problems such as ionizing radiation, a viable and working means of propulsion and so on.

That’s not to say interstellar travel won’t be hard. But to a sufficiently intelligent being, it may be a walk in the park so to speak.
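The time-dilation claim above can be checked with a short back-of-the-envelope calculation. For a trip at constant proper acceleration a, starting from rest, the proper (shipboard) time to cover coordinate distance x is (c/a)·arccosh(1 + ax/c²); an accelerate-to-the-midpoint-then-decelerate trip doubles that with x = d/2. Working in years and light-years with c = 1, 1 g is roughly 1.03 ly/yr²:

```python
import math

# Back-of-the-envelope check of the constant-1g claim above: proper (onboard)
# time for a trip of length d, accelerating at 1g to the midpoint and
# decelerating for the second half. Units: years and light-years, c = 1.
G = 1.032  # 9.81 m/s^2 expressed in ly/yr^2

def proper_time_years(distance_ly, a=G):
    """Shipboard time for an accelerate/decelerate trip of the given length."""
    half = distance_ly / 2.0
    return (2.0 / a) * math.acosh(1.0 + a * half)

print(f"Alpha Centauri (4.3 ly): ~{proper_time_years(4.3):.1f} yr on board")
print(f"Visible universe (~9.3e10 ly): ~{proper_time_years(9.3e10):.0f} yr on board")
```

This gives on the order of 3.5 shipboard years to Alpha Centauri and roughly 50 shipboard years across the visible universe—consistent with the "within a human lifetime, in the traveler's frame" claim, though of course Earth-frame time for the long trip is billions of years, and the propulsion and radiation problems mentioned above remain untouched.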

“Post-singularity intelligent agents (whether they be AGIs or modified humans) will have indefinite lifespans.” Speculation masquerading as fact. How do you know this? If plastics are used in any way, you should know that plastics depolymerize over time, that solar panels lose quantum efficiency by large amounts over even a 25-year period, much less 100 or 1000 years, and that nuclear reactions have a finite lifespan…

Post-singularity intelligent agents (whether they be AGIs or modified humans) will have indefinite lifespans.

streamfortyseven wrote:

How do you know this?

Of course we don’t really know anything about ‘post-singularity intelligent agents’, and the whole idea of the singularity suggests that if it happens, it’ll be hard for us here to predict what it’s like.

But, we can have fun fooling around with scenarios!

Here’s a post-singularity scenario where interstellar travel is common, but intelligent agents may not have indefinite lifespans, because they kill each other.

In the quest for ever faster growth, our civilization evolves toward ever faster exploitation of natural resources. Soon the Earth is not enough. So, to keep our culture alive, we send out one or more fleets of von Neumann probes that eat up solar systems as they go, turning all available raw material into more probes of the same kind.

Different brands—or species—of probes compete among each other, evolving toward ever faster expansion. Eventually, the winners form a wave expanding outwards at nearly the speed of light—demolishing everything behind them, leaving only wreckage. Any civilization unprepared for their onslaught will be torn to shreds… like a farm hit by a plague of locusts.

Unfortunately, even if we don’t let this happen, some other civilization might.

And even if something is unlikely, in a sufficiently large universe it will happen, as long as it’s possible. And then it will perpetuate itself, as long as it’s evolutionarily fit. Our universe seems pretty darn big. So, even if a given strategy is hard to find, if it’s a winning strategy it will get played somewhere.

So, we can expect a small but nonzero density of solar systems from which these expanding spheres of destruction spread.

At points where these spheres collide, perhaps one will “win”—or perhaps both will die, since each has used most of the resources behind it.

However, even in this nightmare scenario of “spheres of von Neumann probes expanding at near lightspeed”, we don’t need to worry about a bleak future for the universe as a whole—any more than we need to worry that viruses will completely kill off all higher life forms. Some fraction of civilizations will probably develop defenses in time to repel the onslaught of these expanding spheres.

It’s interesting that Eliezer Yudkowsky (in this interview) is only considering two possibilities: (i) a cautious team aiming to produce an independent AI succeeds, or (ii) an incautious team aiming to produce an independent AI succeeds. In an echo of my comments in previous parts of the interview, I think there’s a possibility (iii): a team aiming to produce a more sophisticated AI, never envisaged as being independent, happens to produce something that is effectively an independent AI. Clearly it’s desirable to find ways to encourage (i) over (ii), but given the relative speeds of development I think that, assuming AI is possible at all in the near future, the “race” is between (i) and (iii). As a historical analogy, consider the computer industry: the x86 chip was initially envisaged and designed as the pragmatic, “fulfill just current customer needs” solution while the labs worked on a more elegant long-term chip architecture. Arguably, even 30-odd years on, it is still the fastest-developing chip architecture (hidden under the hood) out there.

What do you mean by “independent” AI? Are we talking about a globally-networked AI which exhibits strong emergence into a silicon-based consciousness, as David Chalmers suggests, indirectly:

“We have seen that strong emergence, if it exists, has radical consequences. The question that immediately arises, then, is: are there strongly emergent phenomena?

My own view is that the answer to this question is yes. I think there is exactly one clear case of a strongly emergent phenomenon, and that is the phenomenon of consciousness. We can say that a system is conscious when there is something it is like to be that system; that is, when there is something it feels like from the system’s own perspective. It is a key fact about nature that it contains conscious systems; I am one such. And there is reason to believe that the facts about consciousness are not deducible from any number of physical facts. …

In our world, it seems to be a matter of law that duplicating physical states will duplicate consciousness; but in other worlds with different laws, a system physically identical to me might have no consciousness at all.

This suggests that the lawful connection between physical processes and consciousness is not itself derivable from the laws of physics but is instead a further basic law or laws of its own. The laws that express the connection between physical processes and consciousness are what we might call fundamental psychophysical laws.

I think this account provides a good general model for strong emergence. We can think of strongly emergent phenomena as being systematically determined by low-level facts without being deducible from those facts. In philosophical language, they are naturally but not logically supervenient on low-level facts.

In any case like this, fundamental physical laws need to be supplemented with further fundamental laws to ground the connection between low-level properties and high-level properties.”

On the other hand, with respect to weak emergence, Chalmers states that

“The concept of emergence is often tacitly invoked by theorists in cognitive science and in the theory of complex systems, in such a way that it is clear that a notion other than the notion of strong emergence is intended. We can take it that something like weak emergence is at play here, and we can then use the examples to make sense of just what weak emergence comes to.”

and cites the example of connectionist networks, in which “high-level ‘cognitive’ behaviour emerges from simple interactions between simple threshold logic units.”

and states further that

“one might suggest that weak emergence is the phenomenon wherein complex, interesting high-level function is produced as a result of combining simple low-level mechanisms in simple ways. I think this is much closer to a good definition of emergence. Note that COBOL programs, and many biological systems, are excluded by the requirement that not only the mechanisms but also their principles of combination be simple. (Of course simplicity, complexity, and interestingness are observer-relative concepts, at least for now, although some have tried to explicate them in terms of Chaitin–Kolmogorov–Solomonoff complexity.) Note also that most phenomena that satisfy this definition should also satisfy the previous definition, as complex and interesting consequences of simple processes will typically be non-obvious.

This conclusion captures the feeling that weak emergence is a ‘something for nothing’ phenomenon. And most of our examples fit. The game of Life and connectionist networks are clear cases: interesting high-level behaviour emerges as a consequence of simple dynamic rules for low-level cell dynamics. In evolution, the genetic mechanisms are very simple, but the results are very complex. (Note that there is a small difference, in that in the latter case the emergence is diachronic, i.e. over time, whereas in the first two cases the emergence is synchronic, i.e. not over time but over levels present at a given time.)”
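The Game of Life case is easy to make concrete. Here is a minimal sketch: the entire low-level rule fits in a few lines, yet the “glider”, a five-cell pattern, travels diagonally across the grid, a high-level behaviour stated nowhere in the rule.

```python
from collections import Counter

# Conway's Game of Life: a live cell survives with 2 or 3 live neighbours;
# a dead cell becomes live with exactly 3. That is the whole low-level rule.
def step(live):
    counts = Counter((x + dx, y + dy)
                     for (x, y) in live
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

# The glider: after 4 generations it reappears shifted one cell diagonally.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
state = glider
for _ in range(4):
    state = step(state)
assert state == {(x + 1, y + 1) for (x, y) in glider}
```

Nothing in `step` mentions “movement”; the travelling pattern is exactly the kind of synchronic weak emergence Chalmers describes.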

I don’t really like the term AI, but I use it because everyone else does. AI is literally just defined to mean something that appears to do something a human does using intelligence. So, e.g., a computer system that is capable of “analysing and producing designs for” ever more efficient computer chips, using those chips within itself in a feedback loop to design chips even faster, with no other understanding, would count as an AI, but I don’t think it’s the kind of thing people are worrying about. For me, an independent AI is something that has a sufficient knowledge domain that it is likely to be able to “think” about new domains and initiate some actions based on those analyses (the kind of thing Yudkowsky worries about).

Abstract: The knowledge of the different types of emergence is essential if we want to understand and master complex systems in science and engineering, respectively. This paper specifies a universal taxonomy and comprehensive classification of the major types and forms of emergence in Multi-Agent Systems, from simple types of intentional and predictable emergence in machines to more complex forms of weak, multiple and strong emergence.

Thanks for mentioning my old paper ;-) It is not directly about emergence of consciousness, which is perhaps the most complex form of emergence. And certainly the most interesting. Although consciousness emerges in a brain which consists of billions of neurons, it does not emerge from neurons alone. The environment plays an essential role. I guess the emergence of (artificial) consciousness can be explained by a collision of worlds or connection between different universes. The moment we make the connection or “tunnel” is the “magic” moment of self-awareness. The precise location of the connection is the root of our subjective experience.

OK, what do you think the odds are Yudkowsky hasn’t thought of that? Assuming he has, what do you think the chances are that the reason he didn’t suggest that as an answer to the problem is because he thinks it won’t work? Assuming he thinks it won’t work, what are the chances that it’s because he has what he thinks are crushing refutations of that idea? Assuming he thinks that, what are the chances he’s right, considering that his job is to think about this issue every day? Assuming he’s not right, what are the chances that it’s because of a reason obscure to both you and him, such that you have no defense of your suggestion and it isn’t really a “quick answer”?

If you think about it for the amount of time you spent typing your comment, I fully expect you to be able to think of at least two reasons why that doesn’t solve the problem.

Historically, the Philippines were administered by Spain through Mexico, a colony. I consider places able to communicate over the span of years potentially part of one civilization for many of the important things I would want a vast civilization for.

“(ii) a “non-cautious” team that’s aiming to produce an independent AI succeeds. In an echo of my comments in previous parts of the interview, I think there’s a possibility (iii), namely that a team aiming to produce a more sophisticated AI, not envisaged as being independent, happens to produce something that is effectively an independent AI.”

You think that by the time technology and computer science have advanced enough that people are likely to have accidentally created AI, it’s still very unlikely that people deliberately trying to build AI will have already done so?

Note that I said that if unplugging the machine didn’t work, or if it were impossible, then it would be time to call Moxie Marlinspike…

If you’re concerned about things getting out of control, the first thing you do is to design your system so that if you get indications of an out of control state, you can shut down the system or get it in a stable controlled state. There’s no indication in Yudkowsky’s work that he’s taken this critical design element into consideration.

As for “(ego)-crushing refutations” he’s present on this forum, and is perfectly capable of explaining his design for such a system. I avoid making assumptions about what people are thinking unless I know them really well, and even then it’s risky. Finally, so far as ego-crushing goes, that’s sensei’s job.

Right now, with no intelligence beyond the mechanistic designs of mobsters, spooks, and script kiddies, tens of millions of computers are compromised components of hostile botnets. You, personally, either installed updated software packages on your computer this month (fixing bugs that would have allowed any sufficiently intelligent attacker to compromise it last month), or you didn’t (leaving unpatched bugs that will allow it to be compromised at any time).

Are you expecting this situation to get any better once a self-improving intelligence is among the attackers? Wouldn’t it instead get much, much worse?

I seriously have no idea when AI is going to show up, although I’d be genuinely and deeply shocked if it took another century (barring a collapse of civilization in the meanwhile).

Here is the problem as I see it: We don’t understand how humans think, we don’t have any ideas about how brains work, and we did not make much progress during the last centuries. We could not even set up a breeding program to breed super-intelligent humans, because we don’t even understand if and how intelligence is passed on from generation to generation.

So, in order to enhance human intelligence, we’ll need a series of major breakthroughs in our understanding that no one can extrapolate.

And speculations that our current computer technology could be used to create AI seem to me as realistic as speculations that a telephone switchboard could develop a consciousness, if only the operators would increase their switching speed.

I understand that nomenclature of informatics is used metaphorically in the discussion of AI, just like people compared human brains to telephone switchboards a century ago. I think it is optimistic to say that maybe computer technology will be advanced enough in a century for people to laugh at this comparison as I do at the telephone switchboard one now.

I didn’t see Tim van Beek say “even in theory, a large and interconnected telephone switchboard could not serve as a substrate for consciousness”.

Personally, I’m pretty sure that in principle we could get an enormous telephone switchboard to become conscious, at least if it were able to interact with the outside world somehow. But in practice, we’re clueless as to how to achieve this.

Could you explain why you think that a conscious mind could not, even in theory, use a sufficiently large and interconnected telephone switchboard as a substrate?

My comment is all about practical feasibility. An AI is obviously a very complex system; switchboards, or microprocessors, are very simple systems. Since Yudkowsky expects the construction of an AI in this century, there needs to be some adequate technology that can be used to implement it, and current microprocessor technology is not complex enough.

We don’t understand how humans think, we don’t have any ideas about how brains work, and we did not make much progress during the last centuries.

Last centuries? Detailed neuropsychology and good cognitive science are barely half a century old; I wouldn’t hold past failures like phrenology, Freudianism, animism, strict behaviourism, etc. against the expected rate of progress of the current sciences dealing with the brain and mind as their subject matter, because the approach is fundamentally different. I’m sure if you averaged the amount we know about the mind over all of the time that people have been trying to understand the mind, you’d get a very disappointing rate of progress, but that’s probably not a good way to extrapolate.

And speculations that our current computer technology could be used to create AI seem to me as realistic as speculations that a telephone switchboard could develop a consciousness, if only the operators would increase their switching speed.

I understand that nomenclature of informatics is used metaphorically in the discussion of AI, just like people compared human brains to telephone switchboards a century ago. I think it is optimistic to say that maybe computer technology will be advanced enough in a century for people to laugh at this comparison as I do at the telephone switchboard one now.

It’s not a metaphor! There are massive qualitative differences between trying to make a mind out of a telephone switchboard and trying to make a mind out of a computer (even a mainframe from 1975). A century ago we didn’t have Turing. We didn’t understand the theoretical limits on what processes could and couldn’t be implemented physically, and, more generally, what processes could implement each other. If, as currently appears to be the case, physics is computable or at least computably approximable to arbitrary precision, then we know for a fact that intelligence and consciousness could, at least in principle, be implemented using current computer technology, because our brains exist within physics, and physics could be implemented using current computer technology. (If you only meant that current computer technology doesn’t have enough space or speed to do intelligence efficiently, then I’d say that’s possible but I’d still be a bit surprised. Intelligence seems like the sort of thing that’ll turn out to be simple in retrospect; it may be another 20 or 40 or 60 years before we know how to code a working mind, but I’d bet that once we do, we find that at least a basic self-improving seed AI would have been able to run on a high-end computer in 2011.)

And of course it could turn out that the universe is uncomputable and human brains actually require some of the exotic uncomputable aspects to run, but that seems like a hugely overkill hypothesis for why we currently don’t understand intelligence. Alternate hypothesis: it’ll eventually turn out to be just as normal as every other mysterious-seeming thing humans have ever studied, and it’s just taking us a while because it’s the most complex object we know of and we’ve been studying it for less than a century (look how long it took for, say, physics to start approaching correctness, even just measuring from the beginning of its serious scientific study).

Not that I think AI will actually require reverse-engineering the human brain, but that puts an upper bound on how difficult it should be, at least.

Intelligence seems like the sort of thing that’ll turn out to be simple in retrospect…

Why?

Note that it took quite a while of evolution to achieve intelligence of the human sort. Starting from the formation of the Earth as a ball of hot rock, it took only 450 million years for life to arise. For intelligence to arise, it took about 10 times as long.

So, one question is whether we’ll make self-sustaining artificial life ‘from scratch’ before or after we make artificial intelligence.

But that raises another question: will we ever bother making artificial life from scratch, or will we just keep tweaking existing life forms until we get them to do whatever we want?

And another: will we start by making artificial intelligence ‘from scratch’, carefully designing every bit of it and understanding how it all works… or will we piggy-back heavily on natural intelligence: that is, copy things brains do to build smart machines, even without understanding exactly why those things work?

It seems quite possible that we’ll copy things brains do, and never quite understand why they work. In this case it might be tricky to “get the value system exactly right”, as Yudkowsky put it.

Note that it took quite a while of evolution to achieve intelligence of the human sort. Starting from the formation of the Earth as a ball of hot rock, it took only 450 million years for life to arise. For intelligence to arise, it took about 10 times as long.

Be very careful here, it’s difficult to infer anything about the algorithmic complexity of the brain from evolutionary timelines, and it’s difficult to infer anything about the set of “intelligent algorithms” (however you might define that) from the brain. In the real world, not only did evolution have to discover the right algorithm to implement, it also had to figure out how to build all of the physical structures in the brain that would implement that algorithm, using an opaque and limited set of commands to build the thing. It’s hard to estimate exactly how much that implementation layer complicates the evolution and encoding of the algorithm, but it certainly bloats the dimensionality of the search space substantially in a way that makes looking for brain algorithms less efficient.

Furthermore, while general intelligence is clearly useful in nature, it’s not the dominant factor involved in natural selection, so it is reasonable to assume that a guided search specifically targeting general intelligence would realize an effective algorithm in much less time than nature did.

It’s worth thinking about the numbers. The pieces of DNA that encode the brain can be compressed down to about 25 million bytes (biologists will object here and say that even if it only takes that many bytes to store the DNA the brain is not really “encoded” in the DNA in any meaningful way, yadda yadda – suffice it to say their objections are valid, but irrelevant to my purposes here, because I’m not saying that we’ll actually simulate the brain from this representation, I’m just getting a complexity estimate). Unless DNA happens to be a particularly compact representation mechanism for expressing intelligence (which is “overwhelmingly unlikely”, a hand-waving statement that I can make more precise if asked), this means that 25 MB is probably a good upper bound on the minimum program size needed to create an intelligent algorithm.
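The sense in which compressed size bounds program size is easy to demonstrate: any data can be reproduced by the short program “decompress this blob”, so the compressed size is an upper bound on the minimum description length. A toy sketch with made-up data (not actual genome sequence):

```python
import zlib

# "decompress(blob)" is itself a program that reproduces the data, so
# len(blob) upper-bounds the size of the smallest program producing it.
data = b"ACGT" * 1_000_000          # 4 MB of maximally regular "sequence"
blob = zlib.compress(data, 9)

assert zlib.decompress(blob) == data   # the blob really encodes the data
assert len(blob) < len(data) // 100    # highly regular data compresses enormously
```

Real genomes are far less regular than this toy string, which is why they only compress to hundreds of megabytes rather than kilobytes.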

Now start trimming all the crap out from our brain that we don’t really need if all we want is an intelligent algorithm, and cut out all the garbage that’s coding specifics of the physical implementation, and we’re starting to whittle that 25 MB estimate substantially. In the end, I’d be pretty shocked if we couldn’t do away with more than 90% of the brain’s implementation code and still end up implementing effectively the same underlying algorithm.

So let’s say we can get it down to 5 megabytes or so, which I think is fairly conservative. That’s now looking a lot more like programs that we know how to write; granted we don’t know how to write this particular program, but looking at it this way it seems a lot closer than thinking about billions of neurons all wired up in just the right way.

And we still haven’t even covered the possibility that there’s a much simpler algorithm to solve whatever problem an “atom of intelligence” solves that is not possible (or not easy) to implement in the brain – while DNA may not be an especially efficient way to encode intelligent algorithms, we have every reason to believe that our programming languages are quite a bit better, as they were specifically designed to allow us to pack the types of logic that tend to be useful in problem solving into relatively small spaces. There are a lot of algorithms that are very easy to code that we would never expect evolution to figure out in the brain because the primitive operations in the brain don’t make them very easy.

I’m not saying picking an intelligent algorithm out of the bag of all algorithms less than 5 (or 10, or 2…) megabytes will be easy, by any means. But I think it would be a mistake to think that just because nature took a long time to do it it’s necessarily a difficult problem. The fact that it did it at all indicates that it’s probably not as tough a nut to crack as it seems…

Detailed neuropsychology and good cognitive science are barely half a century old;…

Alright, I’ll not extrapolate future progress based on past failures, but still: What we know today about the functioning of human brains could be compared to being able to predict when a floppy drive will make noise and knowing that electrons move inside a computer. It is not enough to understand how the thing works, it is not enough to reconstruct it, it is not even enough to know what it will still be capable of if you damage a part, let alone enough to actually improve it.

Besides that, we know that brains can grow new cells and new connections, and that parts of a brain can take over some functionality of other parts when those get damaged. This seems to move brains even farther away from current technologies.

If you only meant that current computer technology doesn’t have enough space or speed to do intelligence efficiently, then I’d say that’s possible but I’d still be a bit surprised. Intelligence seems like the sort of thing that’ll turn out to be simple in retrospect; it may be another 20 or 40 or 60 years before we know how to code a working mind, but I’d bet that once we do, we find that at least a basic self-improving seed AI would have been able to run on a high-end computer in 2011.

I’m also interested in the discussion of what is needed to implement an AI in principle, but this comment of mine was concerned with the practical feasibility of implementing an AI with current computer technology. Two points:

1. There does not exist any computer program that I’m aware of that is capable of altering itself to fix bugs. Not even the most obvious ones (obvious from the viewpoint of humans). If you have any idea how to achieve this I’d be very interested in investing in your startup :-)

2. Presuming that it can be done in principle (programming an AI): How many lines of code in a 4th generation language do you estimate would be needed to implement an AI? If I had to guess I’d say certainly more than a billion. So, I don’t think that we’ll have the capacity to even type this kind of program in the 21st century. And we certainly would not be able to understand it, think it through, debug it, test it etc.

1. I’ve never heard of a computer program that can alter itself to fix bugs, but computer programs which can alter themselves to improve efficiency are practically standard undergraduate CS material now.

The idea behind a “seed” AI is not that it might be coded with the wrong utility function and then figure out the right one. “If you put into the machine wrong figures, will the right answers come out?” was a ridiculous question a century ago and it will remain ridiculous a century from now. But the possibility of an AI being created which can make exponential improvements to its own accuracy and efficiency is much more serious.

2. “Lines of code” has never been a serious metric; anyone with experience programming can relate an anecdote in which they made a program much better and much shorter at the same time. You are starting to hit on part of the real problem, though. If you’re writing a program whose failure modes look less like “suck up too much memory and crash the computer” and more like “spread across the internet and crash civilization”, then cycles of hacking it together and running it with some debugging options turned on start to look a lot less appealing.

Interesting, do you have a reference? I only know about recompiling mechanisms like Java’s HotSpot compiler, or trial-and-error recombination of predeployed code snippets as done in FFTW (the “Fastest Fourier Transform in the West” library). Both would not count as a “code changing” mechanism.

“Lines of code” has never been a serious metric…

Sure, there isn’t one metric that says it all, but knowing the LoC is better than knowing nothing.

I’m a bit at a loss to explain my point, which is that I don’t think we could make microprocessors do things that would qualify as intelligent behaviour with current programming paradigms. Speculating about the necessary LoC isn’t a good start, but I did not have a better idea.

For example, there are already billions of lines of code executing to show you what I type over here, without exhibiting any behaviour that I would classify as “intelligent”. Of course that is not a good proof of anything.

The very concept of an optimizing compiler is what I was alluding to; take a look at the stages in the “bootstrapping process” for building the gcc compilers from a third-party compiler, for an example of a binary effectively rewriting itself. Sorry to disappoint, but I suspect if you came upon the concept for the first time (after writing in bare binary or assembly, or after using an interpreter) you’d find it pretty amazing. A computer program that translates code, even its own code, into other code that produces the exact same results faster?
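To give a flavour of “code producing faster code” outside a full compiler, here is a toy sketch (names are invented for illustration) of a program that generates a specialized, loop-free version of one of its own functions for fixed inputs, a baby version of partial evaluation:

```python
# A generic polynomial evaluator, and a routine that emits a specialized,
# straight-line version of it for fixed coefficients. This is a toy
# illustration of runtime code generation, not a claim about how an AI
# would improve itself.

def poly(coeffs, x):
    # generic Horner evaluation: loops over the coefficients at runtime
    acc = 0
    for c in coeffs:
        acc = acc * x + c
    return acc

def specialize(coeffs):
    # emit source code with the loop unrolled and the constants inlined
    expr = "0"
    for c in coeffs:
        expr = f"({expr}) * x + {c}"
    src = f"def fast(x):\n    return {expr}\n"
    ns = {}
    exec(src, ns)        # compile the freshly generated source
    return ns["fast"]

fast = specialize([2, 0, 1])   # represents 2*x**2 + 1
assert fast(3) == poly([2, 0, 1], 3) == 19
```

The generated `fast` does the exact same arithmetic with the loop and the coefficient lookups optimized away; the gcc bootstrap is the same idea applied to the compiler’s own source.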

I suspect that the biggest limitation there is “exact same” – even with today’s compilers you can often get numerical results that run twice as fast if you turn on some –enable-scary-fp flags that add a little extra floating point error. And that’s basically just from a little expression reordering and caching. If an artificial intelligence program was heavily dependent on optimizable subproblems like say, uncertainty quantification estimation, where different mathematical approximations can provide orders of magnitude difference in computational efficiency, then similar improvements might be possible but on a vastly more dramatic scale.

How many lines of code in a 4th generation language do you estimate would be needed to implement an AI? If I had to guess I’d say certainly more than a billion.

I mentioned this above, but wanted to respond directly: IMO you’re way off in your estimate. Even if you won’t accept my logic there to its conclusion (which is that the smallest working AI algorithm could probably fit on a couple of floppy disks from the 80s), we can put an upper bound on the minimum code size by looking at the human genome, which can be stored in about 800 megabytes.
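The 800 MB figure is just the raw information content of the genome; a sketch of the arithmetic, using the commonly quoted approximate length:

```python
# Back-of-the-envelope for the genome storage figure quoted above.
base_pairs = 3.2e9      # approximate length of the human genome
bits_per_base = 2       # four letters (A, C, G, T) = 2 bits each

raw_bytes = base_pairs * bits_per_base / 8
assert raw_bytes == 800e6   # i.e. about 800 megabytes
```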

A large part of that 800 megabytes is unrelated to the problem of intelligence, a lot of it is non-coding, etc., so there’s a whole lot of trimming you can do. Worst case, you can easily get the upper bound down to about 50 megabytes of relevant information, and I’d argue that you can go a lot further by making some assumptions about how inefficient the encoding of a neural algorithm is likely to be when it’s done in DNA.

Personally, I would be shocked if it took more than 100k LoC to implement some algorithm that is worthy of the label “intelligent”. I suspect it could quite reasonably be done in more like 20k LoC, possibly far less if we really understood what we were doing and how to properly define and chunk up the problem of intelligence. Whether it would run very well on today’s machines is another matter altogether, and shoving enough information through it so that it could understand human tasks might require massive amounts of training – it won’t come fresh out of the box speaking English, that’s for sure! But the learning algorithm itself could be coded in a very reasonable amount of space.

Personally, I would be shocked if it took more than 100k LoC to implement some algorithm that is worthy of the label “intelligent”. I suspect it could quite reasonably be done in more like 20k LoC, possibly far less if we really understood what we were doing and how to properly define and chunk up the problem of intelligence.

I think the weak point is the comparison of a 4th generation programming language and the genome.

(I’ll skip the part where I argue that the whole genome paradigm – the genome determines everything – is unconvincing. I think we could also get completely lost in a discussion about what an intelligent algorithm is supposed to be, let’s say we would recognise it the moment we see one).

My hypothesis is: The genome is a data encoding for a much more complex and capable machine than a 4th generation programming language is; the latter is broken down in several steps to machine code that is executable by a microprocessor. A very silly example is the comparison to an instruction manual – read by an intelligent person, the manual will prompt intelligent, problem-solving behaviour, but this originates from the instance that interprets the instructions, not from the instructions themselves.

The problem is that we would have to know more about the mechanism of emergence of intelligent behaviour to discuss if – and if “yes”: where – the analogy really breaks down.

Part of the problem in getting quantitative estimates (beyond upper bounds on existence) from the human brain is that the human brain has evolved to provide certain important capabilities from birth (eg, breathing, muscular control) and to make other capabilities “just a matter of development time” (eg, eye focussing and binocular vision). It’s more advantageous to the species that essentially every individual will reliably develop a certain level of sight than to allow a greater level of experimentation in “growing” eye neural control systems which might eventually yield more compact results.

In some areas of AI it’s a commonly taken position that if you don’t provide some framework constraining the form of the result (which can be viewed as adding a prior), optimisation processes for learning interesting “complex tasks” generally converge to poor, uninteresting solutions, and this inevitably bulks up the amount you’ve got to specify. But the degree to which one has to do that to have a good chance of getting a decent solution, and how much one could leave the system to learn itself if one had sufficient computing power, is unknown. (It’s worth remembering the human brain is essentially the result of an evolutionary optimisation using all the genetic information in the brains of all those who have ever lived, not just all the genetic information in one individual.)

Some people are “someone else” with respect to their specialized skill levels and usual income levels. If a lawyer can work an extra hour and make enough money to pay five unemployed people to work at a soup kitchen for an hour each, then clearly something efficient has happened. And, assuming for argument’s sake that we consider soup kitchens to be doing valuable work, it would seem that causing five hours’ worth of soup kitchen work to get done is better than causing one hour, and trading that off for the sake of fuzzy we’re-all-in-this-together feelings is missing the point.

I believe the stupidest people in the world are the only people working on AI. These people honestly believe that the human mind is simple enough for us to figure out how it works, but if the mind were that simple to figure out, we would all be simple-minded. Using the mind to figure out how the mind works amounts to circular reasoning, which isn’t a very scientific thing to do. The scientific community can’t even come up with an agreed upon definition of IQ, nor have they ever come up with a way to reliably and objectively measure IQ in humans, much less other animals, yet these very same people are going to develop AI? How can something stupid make something as smart or smarter than itself? It’s a ridiculous idea.

Can you imagine a chess-playing computer that was better at chess than its programmers?

The problem with this specific example is that chess programs don’t have much similarity with how humans think about chess: the programs that I know use a brute-force computation of the graph of all possible positions that may occur up to a certain depth, which is determined by time and by a heuristic that tells the program whether it can stop prematurely. The leaves of the tree are then evaluated based on a set of heuristics again. That is far from how humans think, and from the viewpoint of AI a dead end.
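For concreteness, the skeleton of such a search is tiny. Here is a toy version (negamax on one-heap Nim rather than chess, so it fits in a few lines; real engines add alpha-beta pruning, depth limits, and a heuristic evaluation at the cutoff, but the exhaustive-search skeleton is the same):

```python
# Toy game: one heap of stones, each player takes 1-3, taking the last wins.
# best_move exhaustively searches the game tree, like a chess engine does.

def best_move(stones):
    # returns (score, move): score is +1 if the side to move can force a win,
    # -1 if every move loses against best play (move is then None)
    best = (-1, None)
    for take in (1, 2, 3):
        if take > stones:
            break
        if take == stones:
            return (1, take)                  # taking the last stone wins
        score = -best_move(stones - take)[0]  # opponent's best result, negated
        if score > best[0]:
            best = (score, take)
    return best

# From 5 stones, taking 1 leaves the opponent 4, a losing position.
assert best_move(5) == (1, 1)
assert best_move(4)[0] == -1
```

Note the program “knows” nothing about strategy: winning play falls out of mechanically enumerating positions, which is exactly the contrast with human instinct being discussed here.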

It’s more like someone constructing a car that can drive faster than any human can run.

But I agree with your general point: I don’t see any problem in principle that humans could understand their own brain.

The problem with this specific example is that chess programs don’t have much similarity with how humans think about chess

It’s far from obvious that an “intelligent machine” has to work the same way humans think. Even then, it’s unclear that machines are doing something qualitatively different from chess masters when playing actual games; it’s just that the human chess player can apply pattern recognition (and memorised endgames) to many more branches of the search tree to see which ones are going to be unproductive, whereas the chess machine tends to do more work before coming to that conclusion.

It’s also not completely clear that this approach is a dead end. My personal view is that it is, but probably for a different reason than yours: a basic assumption of the tree search is that one wants judgements to be completely accurate, whereas humans employ far more heuristics, pattern recognition and approximation, and tend to do OK-ish at self-directed intelligent enquiry (even if people get many exact logical questions wrong), so the heuristic route might well work for “general artificial intelligences” as well.

It’s far from obvious that an “intelligent machine” has to work in the same way that humans think.

Sure, but animals (including humans) are the only clue we have about how intelligence works. Besides, making computers think like humans is an old dream, dating back at least to Mikhail Botvinnik. He would be disappointed, I guess, at the success of the brute-force approach over the “humanoid” approach.

…it’s unclear that machines are doing something qualitatively different to chess masters when playing actual games…

The only clue we have here are chess masters who explain what they think – and the main point is that they choose the best two or three moves instinctively and almost instantly, then contemplate those, mostly never calculating more than a couple of moves ahead. The difference between world-class players and me is not that they think faster and more clearly on a conscious level, but that they pick the best moves instinctively and spend their time on them, while I pick suboptimal moves and think about those. We don’t understand how this instinct works.

It’s also not completely clear that this approach is a dead-end.

It is a dead end in the sense that people have given up trying to understand how humans think and to get computers to think in a similar way, as was attempted in the 20th century. In this sense no one is working on a humanoid chess AI anymore. (OK, almost no one – I do know about different approaches; we’ll see.)

Brilliant comments by Yudkowsky. His observation that ambitious people would rather destroy the world than be unimportant really gets to the heart of the problem. I’m kind of ambivalent on this issue myself, because as a Cosmist I realize that the world is going to end anyway, and in a universe this vast and meaningless, in which my ego is all I have, why shouldn’t I try for godhood, and if things go badly at least I’ll get to watch the world being destroyed? It beats being a nobody and living a boring life, doesn’t it?

I think this is a serious philosophical problem that needs a solution; i.e. finding grand challenges for smart, ambitious people that are more interesting than taking over/destroying the world. The more timid and risk-averse our civilization becomes, the more dangerous the marginalized “mad geniuses” become. The only thing I’ve come up with myself is space exploration, but that seems to have gone out of fashion among the young & the brilliant. :(

But with or without new frontiers and great goals for such people, I fear things are going to end badly, and this problem of preventing an unfriendly AI is essentially unsolvable. I see the Singularity as almost a cosmic fact like supernovas and black holes that we are just going to have to try to adapt to, but probably won’t survive.

I’m kind of ambivalent on this issue myself, because as a Cosmist I realize that the world is going to end anyway, and in a universe this vast and meaningless, in which my ego is all I have, why shouldn’t I try for godhood, and if things go badly at least I’ll get to watch the world being destroyed? It beats being a nobody and living a boring life doesn’t it?

Two things:

1. I always understood power as “power to create”, not “power to destroy”. A “powerful” dictator that becomes famous because he destroys much is not “powerful” from my viewpoint. John said:

It seems people are even further from agreeing on what it means to be good than on what it means to be rational.

but the ethics of the most popular religions, for example, are remarkably similar: they all encompass a basic respect for life, and demand its protection. Wherever this comes from, it seems to be widespread; I don’t think children learn it from their parents – it seems to be inborn. This is why the topic is of central importance for AI: since we do not know how to teach ethics to ourselves, and don’t need to, we certainly don’t know (yet) how to teach it to an AI.

2. I was a very bored teenager, but I was lucky: I grew up in a society with fascinating universities and the freedom and opportunity to study there. In a pause during my first calculus class, the professor said:

Just imagine, if you care about mathematics, you’ll never be bored again!

I agree and I wholeheartedly hope that others experience the same (maybe not through mathematics).

It seems people are even further from agreeing on what it means to be good than on what it means to be rational.

the ethics of the most popular religions, for example, are remarkably similar: they all encompass a basic respect for life, and demand its protection. Wherever this comes from, it seems to be widespread; I don’t think children learn it from their parents – it seems to be inborn.

I agree that there’s a remarkable amount of agreement about ethics: enough to make it not completely laughable that we might someday agree on what it means to program an artificial intelligence to be ‘good’.

But there’s also a significant amount of disagreement about ethics, especially when it comes to formalizing ethics as a set of principles. And that’s what I was talking about.

For example: will our friendly AI use ethics based on virtue ethics, deontology, or consequentialism… or none of the above, or some hybrid? Among philosophers, each of these approaches has strong adherents. Ordinary people in ordinary life seem to use some complicated mixture.

It would be nice if attempts to create friendly AI led to greater understanding of ethics as well as rationality. That would be an incredibly important spinoff, even if we never built the darned machines!

… if you care about mathematics, you’ll never be bored again!

Digressing somewhat: when I was a teenager I was bored a lot even though I cared about mathematics. Quite often I was really miserable with boredom. In retrospect there were two main reasons. First, I was lonely, and no matter how exciting math is, it’s not a good substitute for human contact. Second, I wasn’t good enough at math. I could learn stuff pretty quickly, but I wasn’t yet able to invent good ideas of my own — and that was pretty frustrating.

Later on I solved both those problems, and I stopped being bored. Now I’m essentially never bored. Or more precisely: I’m only bored until I decide to start working on one of the many projects I’ve got going.

Florifulgurator wrote:

Once you start gardening, you’ll never get bored again…

I only got interested in gardening when I was around 40. First I had to buy a house and some land. Then I had to decide that I shouldn’t be spending as much time as possible on mathematics and physics. After a while of taking gardening seriously, I found it got more and more interesting. It takes a while to see plants actually growing, and then they become quite fascinating.

The Cosmist wrote:

The only thing I’ve come up with myself is space exploration, but that seems to have gone out of fashion among the young & the brilliant.

I haven’t tried it, but I suspect that space exploration is vastly more boring than either mathematics or gardening. Space, after all, is mainly vast expanses of extremely dilute hydrogen — not my idea of fun. With sufficient patience one might find very interesting things. But in mathematics, I find astounding new things every day, and I don’t even need to go anywhere: I can do it sitting in the bathtub, even! Admittedly, it took a couple of decades of hard work to reach this point.

I spent a lot of time every day reading about math, thinking about math, and talking to people about math. I spent a lot of time taking math classes and working hard in them. I spent a lot of time in good university libraries, trying to read all the math books: most of them were too hard at first, but I didn’t give up. I kept (and keep) a series of notebooks in which I wrote down all my thoughts and calculations.

I spent a lot of time trying to think of math as a unified whole. It’s crucial to focus on tiny details a lot of the time, but it’s also crucial to think a bit more vaguely about the ‘big picture’ a lot of the time, without getting bogged down in details.

Eventually, after a decade or two, math started making more and more sense, and a lot of things became obvious… including things that weren’t obvious to other people! The ‘tao of mathematics’ started becoming clear.

I did a lot of this before the internet existed… but now if I were trying to get up to speed I’d do all this and read and discuss math a lot on math blogs and Mathoverflow, trying to always learn more than I argue. Also, I’d read lots of papers on the arXiv.

But while the internet is wonderful, it is still no substitute for spending hours in a good university math library, trying to understand all the books, sometimes reading one in detail, sometimes skimming a dozen in an hour.

It’s also great to work on problems and brainstorm new ideas with a good, smart friend. Two heads are better than one.

Good luck! Just as water can slowly carve the landscape into fantastic rock formations, prolonged intelligent thought can do amazing things.

Brilliant comments by Yudkowsky. His observation that ambitious people would rather destroy the world than be unimportant really gets to the heart of the problem.

I think he was trying to say that certain ambitious people are like this. He said:

Now these people of whom I speak are not top-notch minds, not in the class of the top people in mainstream AI, like say Peter Norvig (to name someone I’ve had the honor of meeting personally).

I doubt that Norvig and others in mainstream AI lack ambition. On the other hand, Yudkowsky says:

In fact all the people I have met who think they are going to win eternal fame through their AI projects have been like this…

which must mean that Norvig doesn’t think he’s going to “win eternal fame through his AI project”.

Anyway, I think there’s a sane kind of ambition that recognizes that being personally ‘important’ is really rather minor in the grand scheme of things, and also an insane kind of ambition that would be happier destroying the universe than accepting this fact.

“Don’t worry, things will get better. … We’ll be able to suck carbon dioxide from the atmosphere using nanotechnology …”

I think predictions that nanotechnology will let us do ‘X’ are merely a silly way of saying we have no clue how to do ‘X’. We do, however, know a simple, low-energy way of sucking carbon dioxide from the atmosphere.

hey, we could kill two birds with one stone, because we could use the labor of those geniuses on Wall Street to break up the rocks, sentence them to 20 years of rock hockey, which gets both them and excess CO2 out of circulation…

Robots’ friendliness towards humans is but one component of robot ethics. For just avoiding apocalyptic AI, active benevolence is overkill, and simple principles such as “first do no harm” and “don’t bite the hand that feeds you” are as effective as hardwiring “honor thy father and thy mother”. Overactive benevolence has its own problems, as in several science fiction stories about coddled humans in protective custody.

But “do no harm” is not the only principle. For example “seek out new life and new civilizations” could be another. And yes they are simple separately; the fun is in trying to satisfy multiple constraints.

Anyway: In terms of expected utility maximization, even large probabilities of jumping the interval between a universe-history in which 95% of existing biological species survive Earth’s 21st century, versus a universe-history where 80% of species survive, are just about impossible to trade off against tiny probabilities of jumping the interval between interesting universe-histories, versus boring ones where intelligent life goes extinct, or the wrong sort of AI self-improves.

This is true as stated, but it ignores an important issue: there is feedback between more mundane current events and the eventual potential extinction of the human race. For example, the United States’ involvement in Libya has a (small) influence on existential risk (I don’t have an opinion as to what sort). Any impact on human society due to global warming has some influence on existential risk.

Eliezer’s points about comparative advantage and of existential risk in principle dominating all other considerations are valid, important, and well-made, but passing from principle to practice is very murky in the complex human world that we live in.
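For what it’s worth, the expected-utility asymmetry in the quote above can be illustrated with toy numbers. The probabilities and utilities below are entirely invented; the only point is that a tiny probability times an astronomically larger stake can still dominate.

```python
def expected_utility(prob, utility):
    # Expected utility of an intervention: probability of making the
    # difference, times how much the difference matters.
    return prob * utility

# Preserving 95% rather than 80% of species: a large probability of
# affecting a (comparatively) small difference between futures.
conservation = expected_utility(prob=0.5, utility=1.0)

# Averting extinction or an unfriendly AI: a tiny probability of
# affecting an astronomically larger difference.
existential = expected_utility(prob=1e-6, utility=1e10)

print(conservation, existential)  # → 0.5 10000.0
```

On numbers like these the existential term wins by four orders of magnitude, which is the principle; the murkiness Asher points to is that in practice nobody knows the real values of either factor.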

Eliezer Yudkowsky says: “This means that you build a sufficiently powerful AI to do, not what people say they want, or even what people actually want, but what people would decide they wanted the AI to do, if they had all of the AI’s information, could think about for as long a subjective time as the AI, knew as much as the AI did about the real factors at work in their own psychology, and had no failures of self-control.

There’s a lot of important reasons why you would want to do exactly that”

And there is a reason why you would want to avoid it at all costs. The reason is called freedom, a central concept to our traditional way of life. I’ll explain.

The first thing to note is that there’s nothing new about AI. We have several thousand years of experience in how to deal with it, for governments (and all our institutions, including universities and private corporations) are nothing but ways to organize people to do something superhuman. What could never be accomplished by any single individual is an easy task for these organizations.

Whenever intelligence (problem-solving capability) is needed, they simply hire the right people with relevant expertise and sufficient brain power (while breaking the task into manageable chunks).

Now, these institutions are of course dangerous: they are never friendly and have no built-in ethics whatsoever. If there is anything to learn from history, it’s that there is nothing more destructive than a government gone berserk. In fact far more people were forced to live miserable lives or died painful deaths from this kind of conflict than from any natural disaster.

On the other hand these institutions are indispensable for our well being. When they collapse (like at the end of the Western Roman Empire or in present day Somalia), human population is at great peril.

Therefore a way had to be found to maintain them while letting people have control – people who do have ethics (even if imperfect and unexplained). Unfortunately individuals, even smart ones, lack the capacity to process all the information that would be required to exercise such control. The solution itself is an old one: it is called checks and balances.

That is, the bulk of control is done by other, independent institutions and people only have to use them as levers to have their way.

The same principle goes for AI. There’s no need to formulate a perfect ethics to be implemented; one just needs multiple independent AIs to cross-check answers and arrive at a reasonable (and ethical and responsible) decision.

To see the deep misunderstanding in its entirety behind your line of reasoning, I give a simple translation here.

“This means that you build a sufficiently powerful Party to do, not what people say they want, or even what people actually want, but what people would decide they wanted the Party to do, if they had all of the Party’s information, could think about for as long a subjective time as the Party, knew as much as the Party did about the real factors at work in their own psychology, and had no failures of self-control”

This is the Bolshevik programme. It was tried (for seven decades, over a wide area) and failed, with a death toll between forty and a hundred million.

I don’t even think an AI would be able to act as an agent in itself. Intelligence is nothing more than problem solving capability. If an entity is able to do that, that’s just a specific ability, which does not automatically imply other, independent abilities like desires, volition or lust for power.

These latter faculties are not in short supply in any human population, so there will be no incentive to develop artificial surrogates for them. The case with problem solving is just the opposite, as at any level of intelligence it is much easier (requires less logical depth) to create problems than to solve them.

Therefore the danger is not independent action by a would-be AI, but people who would use it as an instrument of power. If we are not alert, someone could do so unchecked. So great care should be taken that access to AI is not restricted to small groups, and that control is distributed according to the time-tested principles of (constitutional) politics and (market) competition.

Just saying “implement checks and balances” is underspecified, though, and I can see a lot of ways in which it can go wrong.

Trivially, if there are two AIs running the same code (and inputs), they’ll always agree on everything, so you get no additional measure of safety.

If you have two AIs whose code differs arbitrarily, and you restrict actions to only what they both agree on, you’ve narrowed down the space of possible actions, but not necessarily in any way that is safer for humans. It would be like asking a bunch of people who’ve never seen or heard of the Statue of Liberty to design it, structurally engineer it and so on, then building only what they agree on – and expecting to actually get a copy of the Statue of Liberty, as opposed to some incoherent mess.

Lastly, if you put humans (or, by extension, stupider AIs) in the checks-and-balances loop, there seem to be two possibilities. If you let the AI make suggestions, a sufficiently smart AI will figure out how to fool the humans into accepting its proposal and it will achieve its own unfriendly goals anyway, regardless of any checks. If you don’t let any party in the checks-and-balances loop make suggestions, then you’re restricted to ideas which a human can come up with, and you’ve lost the benefit of the smarter AI in the first place.

So that’s why it doesn’t seem to me that checks and balances actually achieves anything useful.
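To make the two failure modes concrete, here is a minimal sketch of the “only act where both agree” scheme. The “models” below are toy stand-in predicates, nothing like real AIs, and the action names are invented for illustration.

```python
def approved_actions(model_a, model_b, candidates):
    """Execute only the actions both models approve of."""
    return [a for a in candidates if model_a(a) and model_b(a)]

candidates = ["feed the cat", "reroute the power grid", "launch everything"]

# Two copies of the same code agree on everything: no added safety.
same = lambda action: action != "launch everything"
print(approved_actions(same, same, candidates))
# → ['feed the cat', 'reroute the power grid']

# Two arbitrarily different models: the intersection is narrower, but
# nothing guarantees that what survives is the *safe* subset.
other = lambda action: action != "feed the cat"
print(approved_actions(same, other, candidates))
# → ['reroute the power grid']
```

Identical checkers add nothing, and differing checkers merely shrink the action set along whatever axis they happen to differ on; neither version encodes “safe for humans” anywhere.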

Here’s a good “friendly AI” question. Say the recently announced success of a new MIT-developed technology had been achieved by a robot with “friendly AI” programming, deciphering some of nature’s inscrutable chemical codes and operation sequences.

How does the AI robot then know to ask whether artificial leaves would fit the image held by the person asking for them – say, creating the habitat needed to maintain biodiversity, or serving the other roles of “things called leaves” in the environment? The “specification for leaves” as a source of hydrogen fuel can be fairly narrowly defined, but it would be impossible to write a performance specification covering responses from undefined and independent parts of an open environment. So any such performance requirements would naturally be left out.

How would the robot know that its problem solving instructions are incomplete, then? It’s quite likely the person writing the instructions won’t have understood what question to ask to do that, nor even have a way to write a specification to convey their own actual intent. What’s a robot to do?

In business an employer can just blame their employees for not understanding that they needed to get the right answer, whether the boss had asked the right question or not, expecting them to treat their instructions as questions about what to do as a natural part of their jobs. Could we ask “friendly AI” bots to do the same?

While friendliness and intelligence are not mutually exclusive, intelligence as we think of it in the human sense can never be made to depend on friendliness – such that, if no friendly action can be found, an intelligent agent would be unable to choose an unfriendly option.

Yes, that’s close to what I was asking, that an intelligent response sometimes needs to be able to recognize that the question asked was stupid (even if “friendly” in not making a big issue of it).

So, could a hypothetical FAI robot be expected to see how to correct the question it was asked, or would its master’s misunderstanding of the circumstances, and asking of the wrong question, mislead it? It seems to me that would require a kind of holistic awareness of complex circumstances that is hard to imagine being definable for coding into a computer.

How would a robot understand that its ‘master’ misunderstood the problem in making the request for a solution in the first place, as employees so often need to do for their employers?

P.S. Just to suggest where I would go with that, since I did not leave it clear: I’d ask whether we’ve exhausted the traditional relationship between man and tools. Maybe the achievable end for AI is not to program robots to understand deep questions and give insightful and honest answers, replacing those jobs only humans are known for so far. Maybe, in the traditional way, the technology could be an aid rather than a replacement for people, with its task being to enhance human learning.

A robot might direct human attention to where the robot observes emergent systems developing, and ask the human if that would add new dimensions to the problems they are concerned with, for example.

I really enjoyed this series, it was an interview that I was really thrilled to see – I’ve respected both John Baez and Eliezer Yudkowsky for a long time, it’s great to see this meeting of minds happen!

John, it would be great if you could follow up at some point with your thoughts and responses to what Eliezer said here. He’s got a pretty firm view that environmentalism would be a waste of your talents, and it’s obvious where he’d like to see you turn your thoughts instead. I’m especially curious to hear what you think of his argument that there are already millions of bright people working for the environment, so your personal contribution wouldn’t be as important as it would be in a less crowded field.

Oh good, I’m glad someone is asking me about my response to Eliezer’s points. Is it okay if I post your 2nd paragraph here and my response as a separate blog entry? It’s a big question.

If I could use your last name in this blog entry, that would be great. (For one thing, there’s another guy named Eric who hangs out here.) But if you don’t want me to mention your last name, that’s fine too.

Quote away, my full name is Eric Jordan. I very much look forward to hearing your thoughts; also, don’t feel constrained to address only the question that I asked, I’m really interested in what you’re thinking about any of this stuff in general.

Okay, great. I’ve got a blog post I want to do now but yours will be the next one. You raised an issue that’s been on my mind a lot, so the only problem is organizing my thoughts and saying something coherent.

there are already millions of bright people working for the environment, so your personal contribution wouldn’t be as important as it would be in a less crowded field.

Are there more people working for the environment than on AI? It clearly depends on where you draw the line: is someone who’s doing phylogenetic classification purely for scientific interest working for the environment? Are the people working on tweaking the web search algorithms at Google working on artificial intelligence? IMO, if one thinks that AI may turn out to be achieved from lots of incremental progress rather than only from projects aiming to develop a complete AI, then it’s not clear that AI is less “crowded” than “environmental research”.

Are there more people working for the environment than on AI? It clearly depends on where you draw the line

Right, “AI” is a very overloaded term at the moment, and in the broadest sense it might include anyone working on any sort of machine learning application.

But I think if you restrict attention to full frontal assaults on general intelligence (as opposed to narrow AI), or to people directly addressing the problem of friendliness (which, I gather, is where Eliezer would prefer to see more great minds becoming active), the field is much thinner. I can only think of a small handful of people seriously taking on either problem. My impression is that the mainstream consensus these days is that there’s no “kernel of intelligence” that can be discovered in any meaningful way, so people have mostly moved on to more practical and restricted tasks. And I’d imagine most people consider work on Friendly AI to be about as useful as devising safety harnesses for interstellar transportation pods: the danger is (or seems, in the case of AI) so far in the future that it’s a waste of time to say anything about it now. That’s the response from most geeks whenever one of these posts about Friendliness hits any popular news site, at least.

Don’t get me wrong, I definitely have some sympathy for those feelings about Friendly AI too; apart from extensions to decision theory, concrete open problems are hard to find (or at least they’re not collected in any one place, and there’s little consensus on what’s worth looking into), making me wonder whether FAI can fully occupy many more people at all, especially if they’re operating outside the SIAI proper.

Intelligence, artificial or otherwise, in itself is not dangerous, it’s neither good or evil, friendly or hostile. It’s just a tool, an ability to solve problems. It can be “dangerous” to some humans when placed in hands of other humans but that danger originates in humans themselves, not the intelligence.

For the runaway self-improving AI scenario to work, the AI would have to be equipped with the equivalent of the human survival instinct, which has nothing to do with intelligence itself – it is simply hardwired into our biology.

It is far from obvious that such a drive can be imparted onto AI in any permanent way if this AI is going to be capable of self-modification. What will stop it from erasing this part of the code? Or for that matter what will stop it from erasing itself? This is an example of the more general problem with self-modifying agents – to achieve anything constructive there has to be some external framework which is off-limits and which sets the course, otherwise all you get is garbage. For us our biological instincts serve this purpose. But imagine for a moment what would happen if we could freely modify them.

But even assuming that a survival instinct equivalent can be somehow imparted to create a “motivated” AI, such an AI wouldn’t pose any real threat to humanity until it were able to control all the real world aspects of its functioning: resource extraction, energy generation, manufacturing, defense and so on. It’s certainly not technologically feasible now or in the near future but even if it were it is extremely unlikely humans would allow things to get that far out of control.

But the biggest problem with this scenario lies with self-improvement. For a “motivated” AI to self-improve, it would have to have ways of either testing or predicting the fitness of the next iteration. Prediction is only possible if the next iteration is simpler, but simplification can only go so far. Viable long-term improvement would require raising the complexity, but a simpler AI cannot simulate a more complex AI and therefore cannot predict its fitness. The only way to improve fitness while raising complexity is by trial and error. But there is a problem – most changes decrease fitness. So a single AI cannot just evolve itself; evolution requires millions of independent agents testing different modifications, each with full access to the environment for which its fitness is to improve, and in this case that means each agent having full control over the real-world assets I mentioned before: resource extraction, energy generation, manufacturing, defense and so on. This is pure science fiction now and for the foreseeable future.
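The “most changes decrease fitness” point can be seen in a toy mutation-and-selection loop. Everything below is invented for illustration: a one-dimensional “agent”, a made-up fitness function with its optimum at 10, and Gaussian random modifications.

```python
import random

def mutate(x):
    return x + random.gauss(0, 1)   # a random modification

def fitness(x):
    return -abs(x - 10)             # made-up target: the best agent sits at 10

random.seed(0)
agent, improvements, trials = 0.0, 0, 2000
for _ in range(trials):
    candidate = mutate(agent)
    if fitness(candidate) > fitness(agent):  # keep only the rare wins
        agent = candidate
        improvements += 1

# The agent creeps toward the optimum, but only a small fraction of the
# random modifications were actually improvements.
print(round(agent, 2), improvements, "improvements out of", trials)
```

Of course, this loop only works because the toy fitness function can be evaluated cheaply in simulation; whether the fitness of a real agent in a real environment can be evaluated this way is exactly the point under dispute in the replies below.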

I think your arguments are heuristic to the point of being erroneous. To pick one of your points, you say

evolution requires millions of independent agents testing different modifications, each with a full access to the environment for which its fitness is to improve, and in this case it means each agent with full control over real world assets I mentioned before: …

To pick an incontrovertible real-world example, consider the Microsoft Kinect. The goal there was to produce a simple-enough-to-mass-produce classifier for human body parts from a scanner. By your argument, that would require each “proposed modification” to have access to a human being doing movements in front of a scanner. However, this paper points out that it was possible to obtain the desired goal, namely an optimised classifier, using a cluster of machines with access to training and test data. (This is a much, much, much simpler task than refining an artificial general intelligence, but it shows that you can’t rely on vague high-level reasoning to figure out what’s possible.) This is not some theoretical argument; this is bread-and-butter stuff people are doing today.
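Schematically, the offline-evaluation idea looks like this. The “recorded data” and single-threshold “classifier” below are toy stand-ins, nothing like the actual Kinect pipeline; the point is only that candidate modifications are scored against stored examples rather than against live access to the environment.

```python
# Recorded examples: (feature, label) pairs, standing in for scan data.
test_data = [(1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]

def make_classifier(threshold):
    # A candidate "modification": classify by a single threshold.
    return lambda x: 1 if x >= threshold else 0

def score(classifier, data):
    """Fraction of the recorded examples the candidate gets right."""
    return sum(classifier(x) == y for x, y in data) / len(data)

# Compare many candidate modifications entirely offline -- no live
# access to the real environment is needed.
best = max((make_classifier(t) for t in range(7)),
           key=lambda c: score(c, test_data))
print(score(best, test_data))  # → 1.0
```

The engineering question is then how faithful the recorded data is to the environment you care about, not whether each candidate needs its own live test subject.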

Similar actual examples could be produced to contradict some of your other reasoning. AI might prove not to be possible, but if so, I suspect it will be because creating the initial AI is beyond the reach of even technology-augmented humans.

Seriously. Evaluating the fitness for purpose of such a Kinect classifier is trivial.

Evaluating fitness of a sentient agent who is to live in a complex real world environment filled with other sentient beings whose actions cannot be predicted is next to impossible, it can only be attempted after the fact and even then the best you get is a rough estimate since random chance plays a large part.

I’ll disagree with you about the triviality of this: it took a lot of work to get to this point. Likewise, I’m not remotely convinced that “evaluating fitness of a sentient agent who is to live in a complex real-world environment filled with other sentient beings whose actions cannot be predicted is next to impossible”. (I’m sure I’ve read somewhere about a viewpoint where everything is either “trivial” or “impossible”, and things mysteriously move between the two as evidence is pointed out.) It’s certainly hard, but my opinion is that, for those who actually roll up their sleeves and work on the problem, it will turn out to be feasible to do a good enough job that a reasonable amount of learning is possible. I really think that if you took the logic you’re using and tried to apply it to various things that are already happening in the world, you’d see things happening that you claim should be next to impossible.
