[...] SIAI's Scary Idea goes way beyond the mere statement that there are risks as well as benefits associated with advanced AGI, and that AGI is a potential existential risk.

[...] Although an intense interest in rationalism is one of the hallmarks of the SIAI community, still I have not yet seen a clear logical argument for the Scary Idea laid out anywhere. (If I'm wrong, please send me the link, and I'll revise this post accordingly. Be aware that I've already at least skimmed everything Eliezer Yudkowsky has written on related topics.)

So if one wants a clear argument for the Scary Idea, one basically has to construct it oneself.

[...] If you put the above points all together, you come up with a heuristic argument for the Scary Idea. Roughly, the argument goes something like: If someone builds an advanced AGI without a provably Friendly architecture, probably it will have a hard takeoff, and then probably this will lead to a superhuman AGI system with an architecture drawn from the vast majority of mind-architectures that are not sufficiently harmonious with the complex, fragile human value system to make humans happy and keep humans around.

Wherever I turn my head in this world, I see lost causes everywhere. I see Goodhart's law and Campbell's law on the loose everywhere. I see insane optimizers everywhere. Political parties that concentrate more on show, pomp and campaign funds than on actual issues. Corporations that seek money to the exclusion of actual creation of value. Governments that seek employment and GDP growth even when those are supported by artificial stimuli and not sustainable patterns of production and trade.

One might argue that none of these systems is actually as intelligent as a well-educated human at any given moment in time. But that's the point, isn't it? If you're unable to stop sub-human optimizers, how are you going to curb a near-human or superhuman one?

For me, the scary idea is not so much of an idea as it is an extension of something that is already happening in this world.

Without wanting to weasel out of your request, I honestly believe that Eliezer's Lost Purposes post makes the point I want to make very well, much better than I can hope to phrase it without putting in some hard work. The only new point I probably made is that these forces are already on the loose and it is difficult to curb them.

However, I will make an effort this weekend and see what I can come up with.

Upvoted. Although I believe that one could also see our cultural and political systems as superhuman collective entities undergoing an evolutionary arms race featuring an anthropocentrically weighted, utility-maximizing selection pressure. There is some evidence for this too: to put it bluntly, aren't we better off than we were 100 years ago?

I think the relation between breadth of intelligence and depth of empathy is a subtle issue which none of us fully understands (yet). It's possible that with sufficient real-world intelligence tends to come a sense of connectedness with the universe that militates against squashing other sentiences. But I'm not terribly certain of this, any more than I'm terribly certain of its opposite.

The obvious truth is that mind-design space contains every combination of intelligence and empathy.

Would you say that "The obvious truth is that mind-design space contains every combination of intelligence and rationality"? How about "The obvious truth is that mind-design space contains every combination of intelligence and effectiveness"?

One of my fundamental contentions is that empathy is a requirement for intelligence beyond a certain point because the consequences of lacking it are too severe to overcome.

I can't play positive-sum games with an E. coli. The AGI is missing out on tremendous opportunities if it bypasses positive-sum games of potentially infinite length and utility for a short-term finite gain. This is called time-discounting. In nature, there is a very high correlation (to the point that many call it causation) between increasing intelligence and time-discounting.

Please give an example of why the AGI should co-operate with something that cannot do anything the AGI itself cannot.

2) zero

Right. E. coli don't offer us anything we can't do for ourselves, that we can't just whip up a batch of E. coli for on demand.

The AGI is missing out on tremendous opportunities if it bypasses positive-sum games of potentially infinite length and utility for a short-term finite gain.

If I'm a god, what would I need a human for? If I need humans, I can just make some. Better still, I could replace them with something more efficient that doesn't complain or rebel.

The fundamental flaw in your reasoning here is that you keep trying to construct paths through probability space that could support your hypothesis, but ONLY if you had presented some evidence for singling out that hypothesis in the first place!

It's like you're a murder investigator opening up the phonebook to a random place and saying, "well, we haven't ruled out the possibility that this guy did it", and when people quite reasonably point out that there is no connection between that random guy and the murder, you reply, "yeah, but I just called this guy, and he has no alibi." (That is, you're ignoring the fact that a huge number of people in that phonebook will also have no alibi, so your "evidence" isn't actually increasing the expected probability that that guy did it.)
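The phonebook point can be put in numbers. This is only a minimal sketch: the phonebook size and the share of innocent people without an alibi are made-up figures chosen for illustration, not anything from the discussion.

```python
def posterior(prior, p_evidence_if_guilty, p_evidence_if_innocent):
    """Bayes' rule for a binary hypothesis (guilty vs. innocent)."""
    numerator = prior * p_evidence_if_guilty
    denominator = numerator + (1 - prior) * p_evidence_if_innocent
    return numerator / denominator


# Hypothetical numbers, for illustration only.
PHONEBOOK_SIZE = 100_000        # people the random name was drawn from
P_NO_ALIBI_IF_GUILTY = 1.0      # the murderer certainly has no alibi
P_NO_ALIBI_IF_INNOCENT = 0.6    # but most innocent people have none either

prior = 1 / PHONEBOOK_SIZE
updated = posterior(prior, P_NO_ALIBI_IF_GUILTY, P_NO_ALIBI_IF_INNOCENT)

# The evidence multiplies the probability by less than 2, so the
# random guy stays about as unlikely a suspect as he was before the call.
print(f"prior:     {prior:.7f}")
print(f"posterior: {updated:.7f}")
print(f"ratio:     {updated / prior:.3f}")
```

Under these assumptions, "no alibi" is almost as likely for an innocent person as for the guilty one, so the posterior stays within a factor of two of the prior. Evidence only singles out a hypothesis when it is much more probable under that hypothesis than under the alternatives.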

And that's why you're getting so many downvotes: in LW terms, you are failing basic reasoning.

But that is not a shameful thing: any normal human being fails basic reasoning, by default, in exactly the same way. Our brains simply aren't built to do reasoning: they're built to argue, by finding the most persuasive evidence that supports our pre-existing beliefs and hypotheses, rather than trying to find out what is true.

When I first got here, I argued for some of my pet hypotheses in the exact same way, although I was righteously certain that I was not doing such a thing. It took a long time before I really "got" Bayesian reasoning sufficiently to understand what I was doing wrong, and before that, I couldn't have said here what you were doing wrong either.

Please give an example of why the AGI should co-operate with something that cannot do anything the AGI itself cannot.

If the overall price (including time, gaining requisite knowledge, etc) of co-operation is less expensive than the AGI doing it itself, the AGI should co-operate. No?

If I'm a god, what would I need a human for? If I need humans, I can just make some. Better still, I could replace them with something more efficient that doesn't complain or rebel.

How expensive is making humans vs. their utility? Is there something markedly more efficient that won't complain or rebel if you treat it poorly? How efficient/useful could a human be if you treated it well?

There are also useful pseudo-moral arguments of the type of pre-committing to a benevolent strategy so that others (bigger than you) will be benevolent to you.

The fundamental flaw in your reasoning here is that you keep trying to construct paths through probability space that could support your hypothesis, but ONLY if you had presented some evidence for singling out that hypothesis in the first place!

Agreed. So your argument is that I'm not adequately presenting evidence for singling out that hypothesis. That's a useful criticism. Thanks!

And that's why you're getting so many downvotes: in LW terms, you are failing basic reasoning.

I disagree. I believe that I am failing to successfully communicate my reasoning. I understand your arguments perfectly well (and appreciate them) and agree with them if that is what I was trying to do. Since they are not what I'm trying to do -- although they apparently are what I AM doing -- I'm assuming (yes, ASS-U-ME) that I'm failing elsewhere and am currently placing the blame on my communication skills.

Are you willing to accept that premise and see if you can draw any helpful conclusions or give any helpful advice?

And, once again, thank you for already taking the time to give such a detailed thoughtful response.

How expensive is making humans vs. their utility? Is there something markedly more efficient that won't complain or rebel if you treat it poorly?

Yes. The nanobots that you could build out of my dismantled raw materials. It is humbling to realise that my complete submission and wholehearted support is worth less to a non-friendly AI than my spleen.

Oh, worth much, much less than your spleen. It might be a fun exercise to take the numbers from Seth Lloyd and figure out how many molecules (optimistically, the volume of a cell or two) your brain is worth.

Utility for what purpose? If we're talking about say, a paperclip maximizer, then its utility for human beings will be measured in paperclip production.

Is there something markedly more efficient that won't complain or rebel if you treat it poorly? How efficient/useful could a human be if you treated it well?

It won't be as efficient as specialized paperclip-production machines will, for the production of paperclips.

Are you willing to accept that premise and see if you can draw any helpful conclusions or give any helpful advice?

Yes, but you're unlikely to be happy with it: read the sequences, or at least the parts of them that deal with reasoning, the use of words, and inferential distances. (For now at least, you can skip the quantum mechanics, AI, and Fun Theory parts.)

At minimum, this will help you understand LW's standards for basic reasoning, and how much higher a bar they are than what constitutes "reasoning" pretty much anywhere else.

If you're reasoning as well as you say, then the material will be a breeze, and you'll be able to make your arguments in terms that the rest of us can understand. Or, if you're not, then you'll probably learn that along the way.

Comparative advantage explains how to make use of inefficient agents, so that ignoring them is a worse option. But if you can convert them into something else, you are no longer comparing the gain from trading with them to indifference of ignoring them, you are comparing the gain from trading with them to the gain from converting them. And if they can be cheaply converted into something much more efficient than they are, converting them is the winning move. This is a move largely not available to the present society, hence its absence is a reasonable assumption for now but one that breaks when you consider indifferent smart AGI.

The law of comparative advantage relies on some implicit assumptions that are not likely to hold between a superintelligence and humans:

The transaction costs must be small enough not to negate the gains from trade. A superintelligence may require more resources to issue a trade request to slow-thinking humans and to receive the result, while possibly letting processes idle while waiting for the result, than to just do it itself.

Your trading partner must not have the option of building a more desirable trading partner out of your component parts. A superintelligence could get more productivity out of atoms arranged as an extension of itself than out of atoms arranged as humans. (ETA: See Nesov's comment.)

And a sufficiently clever human should realize that clever humans can and do routinely increase the efficiencies of their industry enough to shift the comparative advantage.

I'm not sure I understand what "shift the comparative advantage" could mean, and I have no idea why this is supposed to be a response to my point.

Maybe I didn't make my point clearly enough. My contention is that even if an AI is better at absolutely everything than a human being, it could still be better off trading with human beings for certain goods, for the simple reason that it can't do everything, and in such a scenario both human beings and the AI would get gains from trade.

As Nesov points out, if the AI has the option of, say, converting human beings into computational substrate and using them to simulate new versions of itself, then this ceases to be relevant.
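Both halves of this can be shown with a toy calculation. This is a sketch under stated assumptions: the two goods, every production rate, the trade price, and the payoff from conversion are hypothetical numbers picked only to show the shape of the argument.

```python
def opportunity_cost(compute_per_hour, widgets_per_hour):
    """Compute given up to make one widget."""
    return compute_per_hour / widgets_per_hour


def best_option(gain_ignore, gain_trade, gain_convert=None):
    """Name of the option with the highest gain for the AI."""
    options = {"ignore": gain_ignore, "trade": gain_trade}
    if gain_convert is not None:
        options["convert"] = gain_convert
    return max(options, key=options.get)


# Hypothetical hourly outputs: the AI is better at BOTH goods.
AI_COMPUTE, AI_WIDGETS = 100.0, 50.0
HUMAN_COMPUTE, HUMAN_WIDGETS = 1.0, 5.0

ai_cost = opportunity_cost(AI_COMPUTE, AI_WIDGETS)           # compute per widget
human_cost = opportunity_cost(HUMAN_COMPUTE, HUMAN_WIDGETS)  # compute per widget

# Any price between the two opportunity costs benefits both sides.
price = 1.0
ai_gain_per_widget = ai_cost - price
human_gain_per_widget = price - human_cost

# AI's gain, in compute, from one human-hour of trade.
gain_from_trade = HUMAN_WIDGETS * ai_gain_per_widget

# Assumed payoff if the human's atoms are rebuilt as half an AI-hour of hardware.
gain_from_conversion = 0.5 * AI_COMPUTE

print(best_option(0.0, gain_from_trade))
print(best_option(0.0, gain_from_trade, gain_from_conversion))
```

With only "ignore" and "trade" on the table, trade wins even though the AI is better at everything. Once "convert" is available and pays more than trade, the comparative-advantage argument no longer protects the human.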

And a sufficiently clever human should realize that clever humans can and do routinely increase the efficiencies of their industry enough to shift the comparative advantage.

I don't understand what you are arguing for. That people become better off doing something different doesn't necessarily imply that they become obsolete, or even that they can't continue doing the less efficient thing.

Why all the focus on psychopaths? It could be said that certain forms of autism are equally empathy-blinded, and yet people along that portion of the spectrum are often hugely helpful to the human race, and get along just fine with the more neurotypical.

One of my fundamental contentions is that empathy is a requirement for intelligence beyond a certain point because the consequences of lacking it are too severe to overcome.

Now if you had suggested that intelligence cannot evolve beyond a certain point unless accompanied by empathy ... that would be another matter. I could easily be convinced that a social animal requires empathy almost as much as it requires eyesight, and that non-social animals cannot become very intelligent because they would never develop language.

But I see no reason to think that an evolved intelligence would have empathy for entities with whom it had no social interactions during its evolutionary history. And no a priori reason to expect any kind of empathy at all in an engineered intelligence.

Which brings up an interesting thought. Perhaps human-level AI already exists. But we don't realize it because we have no empathy for AIs.

There is a large, continuous spectrum between making an AI and hoping it works out okay, and waiting for a formal proof of friendliness. Now, I don't think a complete proof is feasible; we've never managed a formal proof for anything close to that level of complexity, and the proof would be as likely to contain bugs as the program would. However, that doesn't mean we shouldn't push in that direction. Current practice in AI research seems to be to publish everything and take no safety precautions whatsoever, and that is definitely not good.

Suppose an AGI is created, initially not very smart but capable of rapid improvement, either with further development by humans or by giving it computing resources and letting it self-improve. Suppose, further, that its creators publish the source code, or allow it to be leaked or stolen.

AI improvement will probably proceed in a series of steps: the AI designs a successor, spends some time inspecting it to make sure the successor has the same values, then hands over control, and the cycle repeats. At each stage, the same tradeoff between speed and safety applies: more time spent verifying the successor means a lower probability of error, but a higher probability that other bad things will happen in the meantime.

And here's where there's a real problem. If there's only one AI improving itself, then it can proceed slowly, knowing that the probability of an asteroid strike, nuclear war or other existential risk is reasonably low. But if there are several AIs doing this at once, then whichever one proceeds least cautiously wins. That situation creates a higher risk of paperclippers, as compared to if there were only one AI developed in secret.

Current practice in AI research seems to be to publish everything and take no safety precautions whatsoever, and that is definitely not good.

Most of the companies involved (e.g. Google, James Harris Simons) publish little or nothing relating to their code in this area publicly - and few know what safeguards they employ. The government security agencies potentially involved (e.g. the NSA) are even more secretive.

There is a large, continuous spectrum between making an AI and hoping it works out okay, and waiting for a formal proof of friendliness.

Exactly this!

I think there is a U-shaped response curve to risk versus rigor. Too little rigor ensures disaster, but too much rigor ensures a low-rigor alternative is completed first.

When discussing the correct course of action, I think it is critical to consider not just probability of success but also time to success. So far as I've seen, arguments in favor of SIAI's course of action have completely ignored this essential aspect of the decision problem.

With regard to teaching an AI to care: what you can teach a mind depends on the mind. The best examples come from human beings: for hundreds of years many (though not all) parents have taught their children that it is wrong to have sex before marriage, a precept that many people break even when they think they shouldn't and feel bad about it. And that's with our built-in desires for social acceptance and hardware for propositional morality. For another example, you can't train tigers to care about their handlers. No matter how much time you spend with them and care for them, they sometimes bite off arms just because they are hungry. I understand most big cats are like this.

It's quite true that nobody plans to build a system with no concern for human life, but it's also true that many people assume Friendliness is easy.

Human value is fragile as well as complex, so if you create an AGI with a roughly-human-like value system, then this may not be good enough, and it is likely to rapidly diverge into something with little or no respect for human values.

... that doesn't seem quite right. The main problem with values being fragile isn't that a "roughly-human-like value system" might diverge rapidly; it's that properly implementing a "roughly-human-like value system" is actually quite hard, and most AGI programmers seem to underestimate its complexity and go for "hacky" solutions, which I find somewhat scary.

Ben seems aware of this, and later goes on to say:

This is related to the point Eliezer Yudkowsky makes that "value is complex" -- actually, human value is not only complex, it's nebulous and fuzzy and ever-shifting, and humans largely grok it by implicit procedural, empathic and episodic knowledge rather than explicit declarative or linguistic knowledge.

... which seems to be one of the reasons to pay extra attention to it (and this also seems to be a reason given by Eliezer, whereas Ben almost presents it as a counterpoint to Eliezer).

Human evaluation of human values under specific instances is everything that Ben says it is (complex, nebulous, fuzzy, ever-shifting, and grokked by implicit rather than explicit knowledge).

On the other hand, evaluation of points in the Mandelbrot set by a deterministically moving entity that is susceptible to color illusions is even more complex, nebulous, fuzzy, and ever-shifting, to the extent that it probably can't be grokked at all. Yet it is generated from two very simple formulae (the second being the deterministic movement of the entity).
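For reference, the first of those formulae really is tiny. Here is a minimal sketch of the standard escape-time test for the Mandelbrot set; the function name and the iteration cap are my own choices, and with a finite cap the test is only an approximation near the boundary.

```python
def in_mandelbrot(c, max_iter=100):
    """Approximate membership test: iterate z -> z*z + c starting from z = 0.

    Returns False as soon as |z| exceeds 2 (the orbit is then known to
    escape), and True if it stays bounded for max_iter steps.
    """
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return False
    return True


# Coarse text rendering: '#' marks points that did not escape.
for im in (1.0, 0.5, 0.0, -0.5, -1.0):
    row = ""
    for step in range(-20, 6):
        re = step / 10
        row += "#" if in_mandelbrot(complex(re, im)) else "."
    print(row)
```

The whole generator is the single update `z = z * z + c`; all of the visual complexity comes from iterating it.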

Eliezer has provided absolutely NO rational arguments (much less proof) that the core of Friendly is complex at all. Further, paying attention to the fact that ethical mandates within the obviously complex real world (particularly when viewed through the biased eyes of fallible beings) are comprehensible at all would seem an indication that maybe there are just a small number of simple laws underlying them (or maybe only one -- see my comment on Ben's post cross-posted at http://becominggaia.wordpress.com/2010/10/30/ben-goertzel-the-singularity-institutes-scary-idea/ for easy access).

For me, the oddest thing about Goertzel's article is his claim that SIAI's arguments are so unclear that he had to construct it himself. The way he describes the argument is completely congruent with what I've been reading here.

In any case, his argument that it may not be possible to have provable Friendliness and it makes more sense to take an incremental approach to AGI than to not do AGI until Friendliness is proven seems reasonable.

If Goertzel's claim that "SIAI's arguments are so unclear that he had to construct it himself" can't be disproven by the simple expedient of posting a single link to an immediately available, well-structured, top-down argument, then the SIAI should regard this as an obvious high-priority, high-value task. If it can be disproven by such a link, then that link needs to be more widely advertised, since it seems that none of us are aware of it.

But of course the argument is a little large to set out entirely in one paper; the next nearest thing is What I Think, If Not Why, and the title shows in what way that's not what Goertzel was looking for.

Artificial Intelligence as a Positive and Negative Factor in Global Risk

44 pages. I don't see anything much like the argument being asked for. The lack of an index doesn't help. The nearest thing I could find was this:

It may be tempting to ignore Artificial Intelligence because, of all the global risks discussed in this book, AI is hardest to discuss. We cannot consult actuarial statistics to assign small annual probabilities of catastrophe, as with asteroid strikes. We cannot use calculations from a precise, precisely confirmed model to rule out events or place infinitesimal upper bounds on their probability, as with proposed physics disasters. But this makes AI catastrophes more worrisome, not less.

He also claims that intelligence could increase rapidly with a "dominant" probability.

I cannot perform a precise calculation using a precisely confirmed theory, but my current opinion is that sharp jumps in intelligence are possible, likely, and constitute the dominant probability.

Is this an official position in the first place? It seems to me that they want to give the impression that - without their efforts - the END IS NIGH - without committing to any particular probability estimate - which would then become the target of critics.

Halloween update: It's been a while now, and I think the response has been poor. I think this means there is no such document (which explains Ben's attempted reconstruction). It isn't clear to me that producing such a document is a "high-priority task" - since it isn't clear that the thesis is actually correct - or that the SIAI folks actually believe it.

Most of the participants here seem to be falling back on: even if it is unlikely, it could happen, and it would be devastating, so therefore we should care a lot - which seems to be a less unreasonable and more defensible position.

It isn't clear to me that producing such a document is a "high-priority task" - since it isn't clear that the thesis is actually correct - or that the SIAI folks actually believe it.

Most of the participants here seem to be falling back on: even if it is unlikely, it could happen, and it would be devastating, so therefore we should care a lot - which seems to be a less unreasonable and more defensible position.

You lost me at that sharp swerve in the middle. Without probabilities attached to the scary idea, it is an absolutely meaningless concept. What if its probability were 1 / 3^^^3, should we still care then? I could think of a trillion scary things that could happen. But without realistic estimates of how likely it is to happen, what does it matter?

The default case of FOOM is an unFriendly AI, built by researchers with shallow insights. This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever).

...however, it is not terribly clear what being "the default case" is actually supposed to mean.

Seems plausible to interpret "default case" as meaning "the case that will most probably occur unless steps are specifically taken to avoid it".

For example, the default case of knocking down a beehive is that you'll get stung; you avoid that default case by specifically anticipating it and taking countermeasures (i.e. wearing a bee-keeping suit).

So: it seems as though the "default case" of a software company shipping an application would be that it crashes, or goes into an infinite loop - since that's what happens unless steps are specifically taken to avoid it.

The term "the default case" seems to be a way of making the point without being specific enough to attract the attention of critics.

So: it seems as though the "default case" of a software company shipping an application would be that it crashes, or goes into an infinite loop - since that's what happens unless steps are specifically taken to avoid it.

Not quite. The "default case" of a software company shipping an application is that there will definitely be bugs in the parts of the software they have not specifically and sufficiently tested... where "bugs" can mean anything from crashes or loops, to data corruption.

The analogy here -- and it's so direct and obvious a relationship that it's a stretch to even call it an analogy! -- is that if you haven't specifically tested your self-improving AGI for it, there are likely to be bugs in the "not killing us all" parts.

I repeat: we already know that untested scenarios nearly always have bugs, because human beings are bad at predicting what complex programs will do, outside of the specific scenarios they've envisioned.

And we are spectacularly bad at this, even for crap like accounting software. It is hubris verging on sheer insanity to assume that humans will be able to (by default) write a self-improving AGI that has to be bug-free from the moment it is first run.

How do you plan to fix the bugs in its bug-fixing ability, before the bug-fixing ability is applied to fixing bugs in the "don't kill everyone" routine? ;-)

More to the point, how do you know that you and the machine have the same definition of "bug"? That seems to me like the fundamental danger of self-improving AGI: if you don't agree with it on what counts as a "bug", then you're screwed.

(Relevant SF example: a short story in which the AI ship -- also the story's narrator -- explains how she corrected her creator's all-too-human error: he said their goal was to reach the stars, and yet for some reason, he set their course to land on a planet. Silly human!)

What about a "controlled ascent"?

How would that be the default case, if you're explicitly taking precautions?

So: it seems as though the "default case" of a software company shipping an application would be that it crashes, or goes into an infinite loop - since that's what happens unless steps are specifically taken to avoid it.

The default case for a lot of shipped applications isn't to do what they were designed to do, i.e. satisfy the target customer's needs. Even when you ignore the bugs, often the target customer doesn't understand how it works, or it's missing a few key features, or its interface is clunky, or no one actually needs it, or it's made confusing with too many features nobody cares about, etc. - a lot of applications (and websites) suck, or at least, the first released version does.

We don't always see that extent because the set of software we use is heavily biased towards the "actually usable" subset, for obvious reasons.

For example, see the debate tools that have been discussed here and are never used by anybody for real debate.

I think your analogy is apt. It's a similar argument for FAI; just as a software company should not ship a product without first running it through some basic tests to make sure it doesn't crash, so an AI developer should not turn on their (edit: potentially-FOOMing) AI unless they're first sure it is Friendly.

If the "default case" is that your next operating system upgrade will crash your computer or loop forever, then maybe you have something to worry about - and you should probably do an extensive backup, with this special backup software I am selling.

If the "default case" is that your next operating system upgrade will crash your computer or loop forever...

It would certainly be the default case for untested operating system upgrades. Whenever I write a program, even a small program, it usually doesn't work the first time I run it; there's some mistake I made and have to go back and fix. I would never ship software that I hadn't at least run on my own to make sure it does what it's supposed to.

The problem with that when it comes to AI research, according to singularitarians, is that there's no safe way to do a test run of potentially-FOOMing software; mistakes that could lead to unFriendliness have to be found in some way that doesn't involve running the code, even in a test environment.

In any case, his argument that it may not be possible to have provable Friendliness and it makes more sense to take an incremental approach to AGI than to not do AGI until Friendliness is proven seems reasonable.

That it's impossible to find a course of action that is knowably good, is not an argument for the goodness of pursuing a course of action that isn't known to be good.

Certainly, but it is an argument for the goodness of pursuing a course of action that is known to have a chance of being good.

There are roughly two types of options:

1) A plan that, if successful, will yield something good with 100% certainty, but has essentially 0% chance of succeeding to begin with.

2) A plan that, if successful, may or may not be good, with a non-zero chance of success.

Clearly type 2 is a much, much larger class, and includes plans not worth pursuing. But it may include plans worth pursuing as well. If Friendly AI is as hard as everyone makes it out to be, I'm baffled that type 2 plans aren't given more exposure. Indeed, type 2 should be the default, with reliance on a type 1 plan a fallback given more weight only with extraordinary evidence that all type 2 plans are as assuredly dangerous as FAI is impossible.
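To make the comparison concrete, here is a minimal sketch in expected-value terms. Every probability and payoff in it is a hypothetical number chosen for illustration; nothing here is an estimate of real plans.

```python
def expected_value(p_success, p_good_given_success,
                   value_good=1.0, value_bad=0.0):
    """Expected value of a plan that yields nothing if it fails,
    value_good if it succeeds and turns out good, and value_bad
    if it succeeds and turns out bad."""
    p_good = p_success * p_good_given_success
    p_bad = p_success * (1.0 - p_good_given_success)
    return p_good * value_good + p_bad * value_bad


# Type 1: certainly good if it works, but almost never works.
type1 = expected_value(p_success=0.001, p_good_given_success=1.0)

# Type 2: plausible to pull off, uncertain whether the result is good.
type2 = expected_value(p_success=0.3, p_good_given_success=0.2)

# Same type 2 plan, now assuming a bad success is as costly as a good one is valuable.
type2_with_downside = expected_value(
    p_success=0.3, p_good_given_success=0.2, value_bad=-1.0
)

print(f"type 1:                {type1:.3f}")
print(f"type 2:                {type2:.3f}")
print(f"type 2, with downside: {type2_with_downside:.3f}")
```

With bad outcomes costed at zero, the type 2 plan comes out ahead under these numbers. Once a bad success carries a real cost, the ranking can flip, so the comparison depends on how dangerous the type 2 plans actually are.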

(1) In any case, his argument that it may not be possible to have provable Friendliness and it makes more sense to take an incremental approach to AGI than to not do AGI until Friendliness is proven seems reasonable.

That it's impossible to find a course of action that is knowably good, is not an argument for the goodness of pursuing a course of action that isn't known to be good.

Certainly, but it is an argument for
(2) the goodness of pursuing a course of action that is known to have a chance of being good.

You point out a correct statement (2) for which the incorrect argument (1) apparently argues. This doesn't argue for correctness of the argument (1).

(A course of action that is known to have a chance of being good is already known to be good, in proportion to that chance (unless it's also known to have a sufficient chance of being sufficiently bad). For AI to be Friendly doesn't require absolute certainty in its goodness, but beware the fallacy of gray.)

Goertzel's article seems basically reasonable to me. There were some misstatements at the very end that I can excuse, because by that point part of his argument was that certain kinds of hyperbole came up over and over, and his text was mimicking the form of the hyperbolic arguments even as it criticized them. The grandmother line and IQ-obsessed aliens spring to mind :-P

Given his summary of the "Scary AGI Thesis"...

If someone builds an advanced AGI without a provably Friendly architecture, probably it will have a hard takeoff, and then probably this will lead to a superhuman AGI system with an architecture drawn from the vast majority of mind-architectures that are not sufficiently harmonious with the complex, fragile human value system to make humans happy and keep humans around.

...it seemed like it would make sense to track down past discussions here where our discussions may have been implicitly shaped by the thesis. Here are two articles where the issue of concrete programming projects came up, spawning interesting discussions that seemed to have the Scary Thesis as a subtext:

In June 2009, cousin_it wrote Let's reimplement EURISKO!, and some of the discussion got into AGI direction meta-strategy. The highest top level comment is Eliezer bringing up issues of caution.

In January 2010, StuartArmstrong wrote Advice for AI makers and again Eliezer brings up caution to massive approval. This one is particularly interesting because Wei_Dai has a +20 child comment off of that talking about Goertzel's company Webmind... and the anthropic argument.

At the same time, in the course of searching, the "other side" also came up, which I think speaks well for the community :-)

Three days after the Eurisko article was posted, rwallace wrote Why safety is not safe which discussed the issue in the context of (1) historical patterns of competition versus historical patterns of politically managed non-innovation and (2) the fact that the "human trajectory" simply doesn't appear to be long term stable such that swift innovation may be the only thing that prevents a sort of "default outcome" of human extinction.

Of course, even earlier, Eliezer was talking about the general subject of novel research as something that can prevent or cause tragedy, as with the July 2008 article Should We Ban Physics? (although he did his normal thing with an off-handed claim that it was basically impossible to actually prevent innovation).

Yes, you may argue: the Scary Idea hasn't been rigorously shown to be true… but what if it IS true?

OK but ... pointing out that something scary is possible, is a very different thing from having an argument that it's likely.

The Scary Idea is certainly something to keep in mind, but there are also many other risks to keep in mind, some much more definite and palpable.

[...]

Also, there are always possibilities like: the alien race that is watching us and waiting for us to achieve an IQ of 333, at which point it will swoop down upon us and eat us, or merge with us. We can't rule this out via any formal proof, and we can't meaningfully estimate the odds of it either. Yes, this sounds science-fictional and outlandish; but is it really more outlandish and speculative than the Scary Idea?

The issues with potential risks posed by unfriendly AI are numerous. The only organisation that takes those issues seriously is the SIAI, as its name already implies. But I believe most people simply don't see a difference between the SIAI and one or a few highly intelligent people telling them that a particle collider could destroy the world while all experts working directly on it claim there's no risk. Now I think I understand the argument that if the whole world is at stake it does outweigh the low probability of the event. But does it? I think it is completely justified to have at least one organisation working on FAI, but is the risk as serious as portrayed and perceived within the SIAI? Right now if I had to hazard a guess I'd say that it will probably be a gradual development of many exponential growth phases. That is, we'll have this conceptual revolution and optimize it very rapidly. Then the next revolution will be necessary. Sure, I might be wrong there, as the plateau argument of self-improvement recursion might hold. But even if that is true, I think we'll need at least two paradigm-shattering conceptual revolutions before we get there. But what does that mean though? How quickly can such revolutions happen? I'm guessing that this could take a long time, if it isn't completely impossible. That is, if we are not the equivalent of a Universal Turing Machine of abstract reasoning. Just imagine we are merely better chimps. Maybe it doesn't matter if a billion humans do science for a million years; we won't come up with the AI equivalent of Shakespeare's plays. This would mean that we are doomed to evolve slowly, to tweak ourselves incrementally into a posthuman state. Yet, there are also other possibilities, that AGI might for example be a gradual development over many centuries. Human intelligence might turn out to be close to the maximum.

There is so much we do not know yet (http://bit.ly/ckeQo6). Take for example a constrained, well-understood domain like Go. AI still performs awfully at Go. Or take P vs. NP:

P vs. NP is an absolutely enormous problem, and one way of seeing that is that there are already vastly, vastly easier questions that would be implied by P not equal to NP but that we already don’t know how to answer. So basically, if someone is claiming to prove P not equal to NP, then they’re sort of jumping 20 or 30 nontrivial steps beyond what we know today. [...] We have very strong reasons to believe that these problems cannot be solved without major — enormous — advances in human knowledge. [...] So in order to prove such a thing, a prerequisite to it is to understand the space of all possible efficient algorithms. That is an unbelievably tall order. So the expectation is that on the way to proving such a thing, we’re going to learn an enormous amount about efficient algorithms, beyond what we already know, and very, very likely discover new algorithms that will likely have applications that we can’t even foresee right now. (http://web.mit.edu/newsoffice/2010/3q-pnp.html).

But that is just my highly uneducated guess which I never seriously contemplated. I believe that for most academics the problem here is mainly about the missing proof of concept. Missing evidence. They are not the kind of people who wait before testing the first nuke because it might ignite the atmosphere. If there's no good evidence, a position supported by years' worth of disjunctive lines of reasoning won't convince them either.

The paperclip maximizer (http://wiki.lesswrong.com/wiki/Paperclip_maximizer) scenario needs serious consideration. But given what needs to be done, what insights may be necessary to create something creative that is effective in the real world, it's hard to believe that this is a serious risk. It's similar with the kind of grey goo scenario that nanotechnology might hold. It will likely be a gradual development that once it becomes sophisticated enough to pose a serious risk is also understood and controlled by countermeasures.

I also wonder why we don't see any alien paperclip maximizers out there. If there are any in the observable universe, our FAI will lose anyway, since it is far behind in its development.

A major upvote for this. The SIAI should create a sister organization to publicize the logical (and exceptionally dangerous) conclusion to the course that corporations are currently on. We have created powerful, superhuman entities with the sole top-level goal (required by LAW in for-profit corporations) of "Optimize money acquisition and retention". My personal and professional opinion is that this is a far more immediate (and greater) risk than UnFriendly AI.

Companies are probably the number 1 bet for the type of organisation most likely to produce machine intelligence - with number 2 being governments. So, there's a good chance that early machine intelligences will be embedded into the infrastructure of companies. So, these issues are probably linked.

Money is the nearest global equivalent of "utility". Law-abiding maximisation of it does not seem unreasonable. There are some problems where it is difficult to measure and price things, though.

Money is the nearest global equivalent of "utility". Law-abiding maximisation of it does not seem unreasonable.

On the other hand, maximization of money, including accurate terms for expected financial costs of legal penalties, can cause remarkably unreasonable behavior. As was repeated recently, "It's hard for the idea of an agent with different terminal values to really sink in", in particular "something that could result in powerful minds that actually don't care about morality". A business that actually behaved as a pure profit maximizer would be such an entity.

Apparently I don't understand what you mean by "serious risk". (Before I pick this apart, by the way, I agree that we should try not to Godwin people -- because I think it doesn't work.)

I consider it likely that AGI will take a long time to develop. A rational species would likely figure out the flaw and take corrective steps by then. But look around you. Nearly all of us seem to agree, if you look at what we actually want according to our actions, that we should try to prevent an asteroid strike that might destroy humanity. As far as I can tell we haven't started yet. No doubt you can think of other examples: the evidence says that if we put off FAI theory 'until we need it', we could easily put it off longer than that.

A recent paper showed that 'Striatal Volume Predicts Level of Video Game Skill Acquisition'. A valid inference would be that an AGI with the computational equivalent of a higher striatal volume would possess a superior cognitive flexibility, at least when it comes to gaming. But what could it accomplish? I'm playing a game called Trackmania; it is an arcade racing game. The top players are so close to the ideal line and therefore the fastest time that a superhuman AI could indeed beat them but only by a few milliseconds. Each millisecond less might demand an order of magnitude more skill, but that doesn't matter. First of all, there is an absolute limit. Secondly, it doesn't provide a serious advantage, so it doesn't matter. And that may very well be the case with physics too. There is no guarantee that faster thinking or increased working memory capacity will ever yield anything genuine without a lot of dumb luck, if at all. It is unlikely that a superhuman AI would come up with faster-than-light propulsion or that it would disprove Gödel's incompleteness theorems.

Of course, we should be careful. And it is absolutely justified that an organisation like the SIAI gets money to do research on those questions. But there is not enough evidence to outweigh the doubt as to impede AI research. We will actually need research of real AGI to answer some of the open questions.

Regarding self-improvement I'm very doubtful too. The human indecision and fuzziness of thinking might very well be a feature. A superhuman AI might very well beat us at Go or the stock exchange, as long as it deals with its own kind and not the irrational agents that we are, but that doesn't mean it will be able to deal with natural problems orders of magnitude more efficiently than we do.

Most of the risks from superhuman AI are associated with advanced nanotechnology. Without it, it will be impotent. Can it solve it, if it is possible at all? Can it implement its results if it can solve it, if it is possible? Because without it, self-improvement will be very hard. What will be even harder is creating copies of it without first building the necessary infrastructure for the computational substrates.

Could an AGI take over the Internet? This is very unlikely. There are spare resources, but not that much. You can't expect that it would even be suitable as a computational substrate. And how is it going to make use of it before crude measures are taken to shut it down? Many open questions, much speculation.

Paperclipping is another very speculative idea. Is a superhuman artificial general intelligence possible that is mistakenly equipped with the incentive to turn the universe into paperclips? I guess it is possible, but not without hard-coding this incentive deliberately and with great care.

Kaj's paper relies very heavily on Omohundro's paper from AGI '08. Check out the reply that I presented/published at BICA '08 which (among other things) summarizes why the assumptions that Kaj relies upon are probably incorrect:

Two things surprised me in your argument. One is that you seemed to assume that features of human ethics (which you attribute to our having evolved as social animals) would be universal in the sense that they would also apply to AIs which did not evolve and which aren't necessarily social.

The second is that although you pay lip service to game theory, you don't seem to be aware of any game theoretic research on ethics deeper than Axelrod (1984) and the Tit-for-Tat experiments. You ought to at least peruse Binmore's "Natural Justice", even if you don't want to plow through the two volumes of "Game Theory and the Social Contract".
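For anyone who hasn't seen the Tit-for-Tat setup mentioned above, here is a minimal sketch of an iterated prisoner's dilemma. The payoffs (5, 3, 1, 0) are the conventional textbook values; the function names and the ten-round length are assumptions made purely for illustration. It shows only the baseline result, that reciprocators do well against each other and lose little to a pure defector, and nothing from the deeper bargaining literature.

```python
# Conventional prisoner's dilemma payoffs: (row player's score, column player's score).
PAYOFF = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("C", "D"): (0, 5),  # cooperator is exploited
    ("D", "C"): (5, 0),  # defector exploits
    ("D", "D"): (1, 1),  # mutual defection
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """Defect regardless of what the opponent has done."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play a fixed number of rounds and return both total scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy sees only the other player's past moves.
        move_a = strategy_a(history_b)
        move_b = strategy_b(history_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat))
    print(play(tit_for_tat, always_defect))
    print(play(always_defect, always_defect))
```

Note that the defector still wins the head-to-head match; Tit-for-Tat's advantage only shows up in total score across a population of partners, which is exactly why the "far more powerful than everyone else" exception matters.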

Being social is advantageous to any entity without terminal goals and advantageous to entities with terminal goals in most cases (primary exceptions being single goal entities, entities on the verge of achieving all of their terminal goals, and entities that are somehow guaranteed that they are and will remain far, far more powerful than everyone else). Humans evolved to be social because social was advantageous. A super-intelligent but non-evolved AGI will figure out that social is advantageous as well (except, obviously, in the very limited edge cases mentioned above).

Not quoting more research is not the same as being unaware of that research. I've read Binmore -- but how can I successfully bring it up when I can't even get acceptance of Axelrod? It's like trying to teach multiplication while addition is still a problem. I really should read GT&tSC. It's been on my reading list since I've tasked myself with writing something in response to Rawls' corpus. I just haven't gotten around to it.

I have presented further works on the same subject at BICA '09 and AGI '10 (with a really fun second presentation at AGI '10 here) but haven't advanced the game theory portion at all (unfortunately). My focus has recently shifted radically though and going back to game theory could help that tremendously. Thanks.

Being social is advantageous to any entity without terminal goals [your emphasis]

I can't accept this. Many animals are not social, or are social only to the extent of practicing parental care.

A super-intelligent but non-evolved AGI will figure out that social is advantageous as well.

Only if it is actually advantageous to them (it?). Your claim would be much more convincing if you could provide examples of what AIs might gain by social interaction with humans, and why the AI could not achieve the same benefits with less risk and effort by exterminating or enslaving us. Without such examples, your bare assertions are completely unconvincing.

Please note that as humans evolved to their current elevated moral plane <cough, excuse me> they occasionally found extermination and enslavement to be more tempting solutions to their social problems than reciprocity. In fact, enslavement is a form of reciprocity - it is one possible solution to a bargaining problem as in Nash (1953): a solution in which one bargainer has access to much better threats than the other.

entities that are somehow guaranteed that they are and will remain far, far more powerful than everyone else

And you don't think a self-improving AI will ever fall into this category? Hell, if you gave a human the ability to run billions of simulations per second to study how their decisions would turn out, they'd be able to take over the world and "remain far, far more powerful" than everyone else. (If they were actually more intelligent, and not just faster, even more so.)

Your so-called "limited edge case" is the main case being discussed: superhuman intelligence. (The problem of single-goal entities is of course also discussed here; see the idea of a "paper-clip maximizer", for example.)

In short, you seem to be saying that we shouldn't worry about those "edge" cases because in all non-"edge" cases, things work out fine. That's like saying we shouldn't worry about having fire departments or constructing homes according to a fire code, because a fire is an "edge" case, and normally buildings don't burn down.

Even if you were to make such an argument, it makes little sense to propose it at a meeting of the fire council. ;-)

It may be true that mostly, fires don't happen. However, it's also true that if you don't build the buildings with fire prevention (and especially, preventing the spread of fires) in mind, then, sooner or later, your whole city burns down. Because at that point, it only takes one fire to do it.

Not really - the paper is about ways by which an AGI might become more powerful than humanity (corresponding to premise 3 in Ben's reconstructed version of the SIAI argument). You can combine it with Omohundro-like arguments, and I do briefly mention that connection in the conclusions, but the core content of the paper is an independent and separate issue from AI drives, universal ethics or any such issue.

Omohundro's paper was about The Basic AI Drives. The abstract says: " We identify a number of “drives” that will appear in sufficiently advanced AI systems of any design".

Social drives are arguably not very "basic" - since they only show up in social situations.

I'm sure such machines would also have a "drive to swim" - if immersed in water - and a "drive to escape" - if encased by crushing jaws - but these "drives" were judged not sufficiently "basic" to go into Omohundro's paper.

One thing that I think is relevant, in the discussion of existential risk, is Martin Weitzman's "Dismal Theorem" and Jim Manzi's analysis of it. (Link to the article, link to the paper.)

There, the topic is not unfriendly AI, but climate change. Regardless of what you think of the topic, it has attracted more attention than AGI, and people writing about existential risk are often using climate change as an example.

Martin Weitzman, a Harvard economist, deals with the probability of extreme disasters, and whether it's worth it in cost-benefit terms to deal with them. Our problem, in cases of extreme uncertainty, is that we don't only have probability distributions, we have uncertain probability distributions; it's possible we got the models wrong. Weitzman's paper takes this into account. He creates a family of probability distributions, indexed over a certain parameter, and integrates over it -- and he proves that the process of taking "probability distributions of probability distributions" has the result of making the final distribution fat-tailed. So fat-tailed that the integral doesn't converge.

This is a terrible consequence, because if the integral of the cost against that distribution doesn't converge, then we cannot define an expected cost. We can't do cost-benefit analysis at all. Weitzman's conclusion is that the right amount to spend mitigating risk is "more than we're doing."
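To make "the integral doesn't converge" concrete, here is a small sketch. It is not Weitzman's model; it uses a plain Pareto tail, with density alpha * x^-(alpha+1) for x >= 1, as a stand-in for a fat-tailed cost distribution, and the function name is my own. The truncated mean has a closed form, so we can watch what happens as the cutoff grows.

```python
import math


def truncated_mean(alpha, cutoff):
    """Integral of x * pdf(x) from 1 to cutoff, where pdf(x) = alpha * x**-(alpha + 1).

    This is the contribution to the expected cost from outcomes no worse than `cutoff`.
    """
    if alpha == 1.0:
        # The integrand reduces to 1/x, whose integral is log(cutoff): unbounded.
        return math.log(cutoff)
    return alpha / (alpha - 1.0) * (1.0 - cutoff ** (1.0 - alpha))


if __name__ == "__main__":
    for cutoff in (1e3, 1e6, 1e12):
        thin = truncated_mean(2.0, cutoff)  # settles toward a finite limit
        fat = truncated_mean(1.0, cutoff)   # keeps growing with the cutoff
        print(f"cutoff={cutoff:.0e}  alpha=2: {thin:.6f}  alpha=1: {fat:.3f}")
```

With alpha = 2 the answer stops changing once the cutoff is large, so an expected cost exists. With alpha = 1 every extra order of magnitude in the cutoff adds the same amount again, so there is no number to plug into a cost-benefit analysis.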

Manzi criticizes this approach as just an elaborately stated version of the precautionary principle. If it's conceivable that your models are wrong and things are even riskier than you imagined, it doesn't follow that you should spend more to mitigate the risk; the reductio is that if you knew nothing at all, you should spend all your money mitigating the most unknown possible risk!

This is relevant to people talking about AGI. We're not considering spending a lot of money to mitigate this particular risk, but we are considering forgoing a lot of money -- the value of a possible useful AI. And it may be tempting to propose a shortcut, a la Marty Weitzman, claiming that the very uncertainty of the risk is an argument for being more aggressive in mitigating it. The problem is that this leads to absurd conclusions. You could think up anything -- murderous aliens! Killer vacuum cleaners! and claim that because we don't know how likely they are, and because the outcome would be world-endingly terrible, we should be spending all our time trying to mitigate the risk!

Uncertainty about an existential risk is not an argument in favor of spending more on it. There are arguments in favor of spending more on an existential risk -- they're the old-fashioned, cost-benefit ones. (For example, I think there's a strong case, in old-fashioned cost-benefit terms, for asteroid collision prevention.) But if you can't justify spending on cost-benefit grounds, you can't try a Hail Mary and say "You should spend even more -- because we could be wrong!"

The talk about uncertainty is indeed a red herring. There are two things going on here:

A linear aggregative (or fast-growing enough in the relevant range) social welfare function makes even small probabilities of existential risk more important than large costs or benefits today. This is the Bostrom astronomical waste point. Weitzman just uses a peculiar model (with agents with bizarre preferences that assign infinite disutility to death, and a strangely constricted probability distribution over outcomes) to indirectly introduce this. You can reject it with a bounded social welfare function like Manzi or Nordhaus, representing your limited willingness to sacrifice for future generations.

The fact that there are many existential risks competing for our attention, and many routes to affecting existential risk, so that spending effort on any particular risk now means not spending that effort on other existential risks, or keeping it around while new knowledge accumulates, etc. Does the x-risk reduction from climate change mitigation beat the reduction from asteroid defense or lobbying for arms control treaties at the current margin? Weitzman addresses this by saying that the risk from surprise catastrophic climate change is much higher than other existential risks collectively, which I don't find plausible.
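The first point can be sketched in a few lines. Every number here is made up for illustration (the probability, the value of the future, the cost today and the cap are all hypothetical), and a hard cap is only a crude stand-in for whatever bounded welfare function one actually prefers.

```python
def linear_welfare(value):
    """Welfare grows in direct proportion to the value at stake."""
    return value


def bounded_welfare(value, cap=1e12):
    """Welfare saturates at `cap`: a crude model of limited willingness to sacrifice."""
    return min(value, cap)


def worth_paying(probability, future_value, cost_today, welfare):
    """True if the expected welfare loss from the risk exceeds the cost of preventing it."""
    expected_loss = probability * welfare(future_value)
    return expected_loss > welfare(cost_today)


if __name__ == "__main__":
    # Hypothetical inputs: a one-in-a-million risk to an astronomically valuable future.
    probability, future_value, cost_today = 1e-6, 1e20, 1e10
    print(worth_paying(probability, future_value, cost_today, linear_welfare))
    print(worth_paying(probability, future_value, cost_today, bounded_welfare))
```

Under the linear function the tiny probability is swamped by the size of the stake, so paying is justified; under the capped function the same risk no longer outweighs the cost today. The disagreement lives entirely in the choice of welfare function, not in the uncertainty.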

Is anyone in SIAI making the argument that we should spend more because our models are too uncertain to provide expected costs, or more generally that our very uncertainty of model is a significant source of concern? My impression was more that it's "we have good reasons to doubt people's estimation that Friendliness is easy" and "we have good reason to believe it's actually quite hard."

Fair enough -- this is my caution against the logic "I can think of a risk, therefore we need to worry about it!" It seems that SIAI is making the stronger claim that unfriendliness is very likely.

My personal view is that AI is very hard itself, and that working on, say, a computer that can do what a mouse can do is likely to take a long time, and is harmless but very interesting research. I don't think we're anywhere near a point when we need to shut down anybody's current research.

Consider marginal utility. Many people are working on AI, machine learning, computational psychology, and related fields. Nobody is working on preference theory, formal understanding of our goals under reflection. If you want to do interesting research and if you have the background to advance either of those fields, do you think the world will be better off with you on the one side or on the other?

Maybe that's true, but that's a separate point. "Let's work on preference theory so that it'll be ready when the AI catches up" is one thing -- tentatively, I'd say it's a good idea. "Let's campaign against anybody doing AI research" seems less useful (and less likely to be effective.)

But if provable friendliness is hard, wouldn't it be much easier to accomplish with the help of AI? Presumably if the FAI problem can be solved by a few dozen smart human researchers within a few decades, then it can be solved in a year or so by a few dozen not-guaranteed-friendly AGIs-in-a-box with limited IQs in the 180-220 range. The AGIs design an FAI architecture and provide the proof, some smart humans check the proof, and then we build the thing and fasten our seatbelts for the exciting ride as the FAI goes FOOM.

How do you propose to limit their IQs? I'm not asking facetiously; your plan seems reasonable to me, but that's the part that seems the trickiest, and the part that if gotten wrong could lead to accidental early FOOMage.

I have no idea how to limit the IQ of AIs that other people produce without my knowledge. For AIs that I produce myself, I would simply do without closed-loop recursive self-improvement (aka, keep the AI in a box) until I have a proven FAI architecture in hand.

I'm reasonably confident that a closed-loop FOOM is impossible until AI "IQ" goes well past the max human level. I am also reasonably confident that closing the recursive self-improvement loop doesn't speed things up much until you reach that level, either.

So, if a "Sane AI" project like this one, operating under the slogan of "Open loop until we have a proof" can maintain a technological lead of a year or so over a "Risky AI" project with the slogan "Close the loop - Full speed ahead", then I'm pretty sure it is actually safer than a "Secure FAI" project operating under the slogan "No AGI until we have a proof". Because it has a better chance of establishing and maintaining that technological lead.

Eliezer figures out how to download his own brain. The emulation requires only a small amount of processing speed and memory. With the financial backing of the SIAI, LessWrong readers and wealthy tech businesspeople, we create millions of Ems and have each run at 1,000 times the speed that Eliezer runs at. All of the Eliezer ems immediately work on improving the Ems' code and make huge use of trial and error in which they make some changes to the code of a subset of the Ems and give them intelligence tests, throw out the less intelligent Ems and make many copies of the superior ones.

Your scenario strikes me as laughably overoptimistic. A brain emulation requires only a small amount of processing speed and memory? A story that begins with finding financial backing takes only a week to reach completion?

But in any case, this is a closed-loop recursive self-improvement FOOM. I don't doubt that such things are possible. My point was that if you already have a bunch of super-Eliezers, why not have them design a provably-correct FAI, rather than sending them off to FOOM into an uFAI? If they discover the secret of FAI within a year or so, great! If it turns out that provably correct FAI is just a pipe-dream, then maybe we ought to reconsider our plans to close the loop and FOOM.

" A brain emulation requires only a small amount of processing speed and memory?"

If software is the bottleneck and computer speed and memory are increasing exponentially, then you would expect that by the time the software was available it would use a relatively small amount of computing power.

" A story that begins with finding financial backing takes only a week to reach completion?"

My story begins with the Eliezer Em. 150,000 people die every day, and money probably becomes useless after a singularity. If enough people understood what was happening we could raise, say, a billion dollars in a few days. Hedge funds, I strongly suspect, do sometimes make billion-dollar bets based on information they acquired in the last day.

"why not have them design a provably-correct FAI, rather than sending them off to FOOM into an uFAI?"

The 150,000 lives a day cost of delay, plus the fact that the Eliezer ems might be competing with other ems that have less benign intentions.

Hm, so then the issue just becomes how to keep the AI from closing its own loop (i.e. modifying itself in-memory through some security hole it finds). I agree that it seems unlikely to figure out how to do so at a relatively low level of intelligence.

On the other hand, it seems like it would be pretty hard to do research on self-improvement without a closed loop; isn't the expectation usually that the self-improvement process won't start doing anything particularly interesting until many iterations have passed?

Maybe I'm just misunderstanding your use of the terms. I take it by "open loop" you mean that the AI would seek to generate an improved version of itself, but would simply provide that code back to the researcher rather than running it itself?

Maybe I'm just misunderstanding your use of the terms. I take it by "open loop" you mean that the AI would seek to generate an improved version of itself, but would simply provide that code back to the researcher rather than running it itself?

Roughly, yes. But I see recursive self-improvement as having a hardware component as well, so "closed loop" also includes giving the AI control over electronics factories and electronic assembly robots.

... it seems like it would be pretty hard to do research on self-improvement without a closed loop; isn't the expectation usually that the self-improvement process won't start doing anything particularly interesting until many iterations have passed?

Odd. My expectation for the software-only and architecture-change portion of the self-improvement is that the curve would be the exact opposite - some big gains early by picking off low-hanging fruit, but slower improvement thereafter. It is only in the exponential growth of incorporated hardware that you would get a curve like that which you seem to expect.

Perhaps the current state of evidence really is insufficient to support the scary hypothesis.

But surely, if one agrees that AI ethics is an existentially important problem, one should also agree that it makes sense for people to work on a theory of AI ethics. Regardless of which hypothesis turns out to be true.

Just because we don't currently have evidence that a killer asteroid is heading for the Earth, doesn't mean we shouldn't look anyway...

Is the overall utility of the universe maximized by one universe-spanning consciousness happily paperclipping or by as many utility maximizing discrete agents as possible? It seems ethics must be anthropocentric and utility cannot be maximized against an outside view. This of course means that any alien friendly AI is likely to be an unfriendly AI to us and therefore must do everything to impede any coherent extrapolated volition of humanity so as to subjectively maximize utility by implementing its own CEV. Given such inevitable confrontation one might ask oneself, what advice would I give to aliens that are not interested in burning the cosmic commons over such a conflict? Maybe the best solution from a utilitarian perspective would be to get back to an abstract concept of utility, disregard human nature and ask what would increase the overall utility for most possible minds in the universe?

Is the overall utility of the universe maximized by one universe-spanning consciousness happily paperclipping or by as many utility maximizing discrete agents as possible?

I favor many AIs rather than one big one, mostly for political (balance of power) reasons, but also because:

The idea of maximizing the "utility of the universe" is the kind of idiocy that utilitarian ethics induces. I much prefer the more modest goal "maximize the total utility of those agents currently in your coalition, and adjust that composite utility function as new agents join your coalition and old agents leave."

Clearly, creating new agents can be good, but the tradeoff is that it dilutes the stake of existing agents in the collective will. I think that a lot of people here forget that economic growth requires the accumulation of capital, and that the only way to accumulate capital is to shortchange current consumption. Having a brilliant AI or lots of smart AIs directing the economy cannot change this fact. So, moderate growth is a better way to go.

Trying to arrive at the future quickly runs too much risk of destroying the future. Maybe that is one good thing about cryonics. It decreases the natural urge to rush things because people are afraid they will die too soon to see the future.

I suppose the question is why you think that the old patterns of industrial organization will continue to apply? That agents will form coalitions and cooperate is generally a good thing, to my mind - the pattern you seem to imagine, in which the powerful join to exploit the powerless can easily be avoided with a better distribution of power and information.

One distinctive feature of the hypothetical "paperclippers" is that they attempt to leave a low-entropy state behind - one which other organisms would normally munch through. Humans don't tend to do that - like most living things, they keep consuming until there is (practically) nothing left - and then move on.

Leaving a low entropy state behind seems like the defining feature of the phenomenon to me. From that perspective, a human civilisation would not really qualify.

I’m also not big on friendly AI, but my position differs somewhat. I’m pretty skeptical about a very local hard takeoff scenario, where within a month one unnoticed machine in a basement takes over a world like ours. And even given such a scenario, the chance that its creators could constrain it greatly via a provably friendly design seems remote. And the chance such constraint comes from a small team that is secretive to avoid assisting reckless others seems even more remote.

[...] I just see little point anytime soon in trying to coordinate to prevent such an outcome.

I do see a real risk that, if we proceed in the manner I'm advocating, some nasty people will take the early-stage AGIs and either use them for bad ends, or proceed to hastily create a superhuman AGI that then does bad things of its own volition. These are real risks that must be thought about hard, and protected against as necessary. But they are different from the Scary Idea.

Is this really different from the Scary Idea?

I've always thought of this as part of the Scary Idea, in fact, the reason the Scary Idea is scary - scarier than nuclear weapons. Because when mankind reaches the abyss, and looks with dismay at the prospect that lies ahead, we all know that there will be at least one idiot among us who doesn't draw back from the abyss, but instead continues forward down the slippery slope.

At the nuclear abyss, that idiot will probably kill a few hundred million of us. No big deal. But at the uFAI abyss, we may have ourselves a serious problem.

If I believe "X is incredibly useful but someone might use it to destroy the world," I can conclude that I should build X and take care to police the sorts of people who get to use it. But if I believe "X is incredibly useful but its very existence might spontaneously destroy the world" then that strategy won't work... it doesn't matter who uses it. Maybe there's another way, or maybe I just shouldn't build X, but regardless of the solution it's a different problem.

It's like the difference between believing that nuclear weapons might some day be directed by humans to overthrow civilization, and believing that a nuclear reaction will cause all of the Earth's atmosphere to spontaneously ignite. In the first case, we can attempt to control nuclear weapons. In the second case, we must prevent nuclear reactions from ever starting.

Just to be clear: I'm not championing a position here on what sort of threat AGIs pose. I'm just saying that these are genuinely different threat models.

The "uFAI abyss"? Does that have something to do with the possibility of a small group of "idiots" - who were nonetheless smart enough to beat everyone else to machine intelligence - overthrowing the world's governments?

Much is unclear. I believe this post is a good opportunity to give a roundup of the problem, for anyone who hasn't read the comments thread here:

The risk from recursive self-improvement is either dramatic enough to outweigh the low probability of the event or likely enough to outweigh the probability of other existential risks. This is the idea everything revolves around in this community (it's not obvious, but I believe so). It is an idea that, if true, possibly affects everyone and our collective future, if not the whole universe.

I believe that someone like Eliezer Yudkowsky and the SIAI should be able to state in a concise way (with possible extensive references) why it is rational to make friendly AI a top priority. Given that friendly AI seems to be what his life revolves around, the absence of material in support of the proposition of risks posed by uFAI seems to be alarming. And I'm not talking about the absence of apocalyptic scenarios here but other kinds of evidence than a few years worth of disjunctive lines of reasoning. The bulk of all writings on LW and by the SIAI are about rationality, not risks posed by recursively self-improving artificial general intelligence.

Where are the formulas? What are the variables? Where is a method exemplified to reflect the decision process of someone who's already convinced, preferably of someone within the SIAI? That would be part of what I call transparency and a foundational and reproducible corroboration of one's first principles.

Where are the references to substantial third-party research papers? There are many open problems regarding artificial general intelligence, how exactly does the SIAI handle those uncertainties and account for them in their probability estimations of the dangers posed by AI?

Where does the SIAI outline the likelihood of slow versus fast development of AGI? Where are your probability estimations that account for these uncertainties? Where are your variables and references that allow you to make any kind of estimations to balance the risks of a hard rapture with a somewhat controllable development?

What are the foundations that give credibility to the chain of reasoning that leads one to accept unfriendly superhuman intelligence going foom as a serious risk?

What if someone came along making coherent arguments about some existential risk about how some sort of particle collider might destroy the universe? I would ask what the experts think who are not associated with the person who makes the claims. What would you think if he simply said, "do you have better data than me"? Or, "I have a bunch of good arguments"? If you say that some sort of particle collider is going to destroy the world with a probability of 75% if run, I'll ask you for how you came up with these estimations. I'll ask you to provide more than a consistent internal logic but some evidence-based prior.

The current state of evidence IS NOT sufficient to scare people up to the point of having nightmares and ask them for most of their money. It is not sufficient to leave comments making holocaust comparisons on the blogs of AI researchers.

Is smarter than human intelligence possible in a sense comparable to the difference between chimps and humans?

How is an encapsulated AI going to get into control without already existing advanced nanotechnology? It might order something over the Internet if it hacks some bank account etc. (long chain of assumptions), but how is it going to make use of the things it orders?

Why should self-optimization not be prone to be very limited? Changing anything substantial might lead Gandhi to swallow the pill that will make him want to hurt people, so to say.

You have to list your primary propositions on which you base further argumentation, from which you draw conclusions and which you use to come up with probability estimations stating risks associated with former premises. You have to list these main principles so anyone who comes across claims of existential risks and a plea for donations can get an overview. Then you have to provide the references, if you believe they give credence to the ideas, so that people see that all you say isn't made up but based on previous work and evidence by people that are not associated with your organisation.

You could argue your case of "this is obviously true" with completely made-up claims, and I'd have no way to tell. -- Kaj_Sotala

This is a community devoted to refining the art of rationality. How is it rational to believe the Scary Idea without being able to tell if it is more than an idea?

The risk from recursive self-improvement is either dramatic enough to outweigh the low probability of the event or likely enough to outweigh the probability of other existential risks. This is the idea everything revolves around in this community (it's not obvious, but I believe so).

Umm, this is not the SIAI blog. It is "Less Wrong: a community blog devoted to refining the art of human rationality".

The idea everything revolves around in this community is what comes after the ':' in the preceding sentence.

Besides its history and the logo with a link to the SIAI that you can see in the top right corner, I believe that you underestimate the importance of artificial intelligence and associated risks within this community. As I said, it is not obvious, but when Yudkowsky came up with LessWrong.com it was against the background of the SIAI.

Eliezer explicitly forbade discussion of FAI/Singularity topics on lesswrong.com for the first few months because he didn't want discussion of such topics to be the primary focus of the community.

Again, "refining the art of human rationality" is the central idea that everything here revolves around. That doesn't mean that FAI and related topics aren't important, but lesswrong.com would continue to thrive (albeit less so) if all discussion of singularity ceased.

Perhaps you overestimate the extent to which google search results on a term reflect the importance of the concept to which the word refers.

I note that:

The best posts on 'rationality' are among those that do not use the word 'rationality'*.

Similar to 'Omega' and 'Clippy', AI is a useful agent to include when discussing questions of instrumental rationality. It allows us to consider highly rational agents in the abstract without all the bullshit and normative dead weight that gets thrown into conversations whenever the agents in question are humans.

The current state of evidence IS NOT sufficient to scare people up to the point of having nightmares

You appear to be suggesting that Eliezer should censor presentation of his thoughts on the subject so as to prevent people from having nightmares. Spot the irony! ;)

and ask them for most of their money.

Eliezer asks people for money. That hardly makes him unique. Neither he nor anyone else is obliged to get your permission before they ask for donations in support of their cause. It seems to me that you expect more from the SIAI than you do from other well meaning organisations simply because there is actually a chance that the cause may make a significant long term difference. As opposed to virtually all the rest - those we know are pointless!

What if someone came along making coherent arguments about some existential risk about how some sort of particle collider might destroy the universe? I would ask what the experts think who are not associated with the person who makes the claims. What would you think if he simply said, "do you have better data than me"? Or, "I have a bunch of good arguments"? If you say that some sort of particle collider is going to destroy the world with a probability of 75% if run, I'll ask you for how you came up with these estimations. I'll ask you to provide more than a consistent internal logic but some evidence-based prior.

I rather suspect that if all those demands were met you would go ahead and find new rhetorical demands to make.

So take my word for it, I know more than you do, no really I do, and SHUT UP. -- Eliezer Yudkowsky (Reference)

You have to list your primary propositions on which you base further argumentation, from which you draw conclusions and which you use to come up with probability estimations stating risks associated with former premises. You have to list these main principles so anyone who comes across claims of existential risks and a plea for donations can get an overview. Then you have to provide the references, if you believe they give credence to the ideas, so that people see that all you say isn't made up but based on previous work and evidence by people that are not associated with your organisation.

That quote is out of context. While I do happen to hold Eliezer's behavior in that context in contempt, the way the quote is presented here is misleading. It is not relevant to your replies and only relevant to the topic here by virtue of Eliezer's character.

Is smarter than human intelligence possible in a sense comparable to the difference between chimps and humans?

This is a community devoted to refining the art of rationality. How is it rational to believe the Scary Idea without being able to tell if it is more than an idea?

Speak for yourself. I don't have difficulty comprehending the premises, either the ones you have questioned here or the others required to make an adequate evaluation for the purpose of decision making.

Neither I nor Eliezer and the SIAI need to force understanding of the Scary Idea upon you for it to be rational for us to place credence on it. The same applies to other readers here. That is not to say that more work producing the documentation of the kind that you describe would not be desirable.

This comment will be downvoted but I hope you people will actually explain yourself and not just click 'Vote down', every bot can do that.

Now that I've slept I read your comment again and I don't see any justification for why it got upvoted even once. I never claimed that EY can't ask for money, you are creating a straw man there. You also do not know what I do expect from other organisations. Further, it is not fallacious to suspect that Yudkowsky has some responsibility if people get nightmares from ideas that he would be able to resolve. If he really believes those things, it is of course his right to proclaim them. But the gist of my comment was meant to inquire about the foundations of those beliefs and stating that it does not appear to me that they are based on evidence, which makes it legally right but ethically irresponsible to tell people to worry to such an extent or even not to tell them not to worry.

I rather suspect that if all those demands were met you would go ahead and find new rhetorical demands to make.

I just don't know how to parse this. I mean what I asked for and I do not ask for certainty here. I'm not doubting evolution and climate change. The problem is that even a randomly picked research paper likely bears more analysis, evidence and references than all of LW and the SIAI's documents together regarding risks posed by recursive self-improvement from artificial general intelligence.

That quote is out of context.

The quotes have been relevant as they showed that Yudkowsky clearly believes in his intellectual and epistemic superiority, yet any corroborative evidence seems to be missing. Yes, there is this huge amount of writings on rationality and some miscellaneous musing on artificial intelligence. But given how the idea of risks from AGI is weighted by him, it is just the cherry on top of marginal issues that do not support the conclusions.

Speak for yourself. I don't have difficulty comprehending the premises, either the ones you have questioned here or the others required to make an adequate evaluation for the purpose of decision making.

I don't have difficulty comprehending them either. I'm questioning the propositions, the conclusions drawn and further speculations based on those premises.

Neither I nor Eliezer and the SIAI need to force understanding of the Scary Idea upon you for it to be rational for us to place credence on it.

This is ridiculous. I never said you are forced to explain yourself. You are forced to explain yourself if you want people like me to take you seriously.

The quotes have been relevant as they showed that Yudkowsky clearly believes in his intellectual and epistemic superiority, yet any corroborative evidence seems to be missing. Yes, there is this huge amount of writings on rationality and some miscellaneous musing on artificial intelligence. [...]

Yudkowsky is definitely a clever fellow. He may not have fancy qualifications - and he is far from infallible - but he is pretty smart.

In the particular post in question, I am pretty sure he was being silly - which is a rather unfortunate time to be claiming superiority.

However, I don't really know. The stunt created intrigue, mystery, the forbidden, added to the controversy. Overall, Yudkowsky is pretty good at marketing - and maybe this was a taste of it.

I wonder if his Harry Potter fan-fic is marketing - or else how he justifies it.

I was too lazy to write this up again, it's copy and paste work so don't mind some inconsistencies. Regarding the quotes, I think that EY seriously believes what he says in the given quotes, otherwise I wouldn't have posted them. I'm not even suggesting that it isn't true, I actually allow for the possibility that he is that smart. But I want to know what I should do and right now I don't see any good arguments.

I'm a supporter and donor and what I'm trying to do here is coming up with the best possible arguments to undermine the credence of the SIAI. Almost nobody else is doing that, so I'm trying my best here. This isn't damaging, this is helpful. Because once you become really popular, people like P.Z. Myers and other much more eloquent and popular people will pull you to pieces if you can't even respond to my poor attempt at being a devil's advocate.

I don't have difficulty comprehending the premises, either the ones you have questioned here or the others required to make an adequate evaluation for the purpose of decision making.

I don't even know where to start here, so I won't. But I haven't come across anything yet that I had trouble understanding.

I rather suspect that if all those demands were met you would go ahead and find new rhetorical demands to make.

See that woman with red hair? Well, the cleric told me that he believes that she's a witch. But he'll update on evidence if the fire didn't consume her. I said red hair is insufficient data to support that hypothesis and take such extreme measures to test it. He told me that if he came up with more evidence like sorcery I'd just go ahead and find new rhetorical demands.

You appear to be suggesting that Eliezer should censor presentation of his thoughts on the subject so as to prevent people from having nightmares. Spot the irony! ;)

I'm not against free speech and religious freedom but that also applies for my own thoughts on the subject. I believe he could do much more than censoring certain ideas, namely show that they are bogus.

He told me that if he came up with more evidence like sorcery I'd just go ahead and find new rhetorical demands.

[See context for implied meaning if the excerpt isn't clear]. I claimed approximately the same thing that you say yourself below.

I'm a supporter and donor and what I'm trying to do here is coming up with the best possible arguments to undermine the credence of the SIAI. Almost nobody else is doing that, so I'm trying my best here.

I've got nothing against the Devil, it's the Advocacy that is mostly bullshit. Saying you are 'Devil's Advocate' isn't an excuse to use bad arguments. That would be an insult to the Devil!

I don't even know where to start here, so I won't. But I haven't come across anything yet that I had trouble understanding.

You conveyed most of your argument via rhetorical questions. To the extent that they can be considered to be in good faith (and not just verbal tokens intended to influence) some of them only support the position you used them for if you genuinely do not understand them (implying that there is no answer). I believe I quoted an example in the context.

Making an assertion into a question does not give a license to say whatever you want with no risk of direct contradiction. (Even though that is how the tactic is used in practice.)

To the extent that they can be considered to be in good faith (and not just verbal tokens intended to influence) some of them only support the position you used them for if you genuinely do not understand them (implying that there is no answer).

I'm probably too tired to parse this right now. I believe there probably is an answer, but it is buried under hundreds of posts about marginal issues. All those writings on rationality, there is nothing I disagree with. Many people know about all this even outside of the LW community. But what is it that they don't know that EY and the SIAI know? What I was trying to say is that if I have come across it then it was not convincing enough to take it as seriously as some people here obviously do.

It looks like I'm not alone. Goertzel, Hanson, Egan and lots of other people don't see it either. So what are we missing, what is it that we haven't read or understood?

Goertzel: I could and will list the errors I see in his arguments (if nobody there has done so first). For now I'll just say his response to claim #2 seems to conflate humans and AIs. But unless I've missed something big, which certainly seems possible, he didn't make his decision based on those arguments. They don't seem good enough on their face to convince anyone. For example, I don't think he could really believe that he and other researchers would unconsciously restrict the AI's movement in the space of possible minds to the safe area(s), but if we reject that possibility some version of #4 seems to follow logically from 1 and 2.

Egan: don't know. What I've seen looks unimpressive, though certainly he has reason to doubt 'transhumanist' predictions for the near future. (SIAI instead seems to assume that if humans can produce AGI, then either we'll do so eventually or we'll die out first. Also, that we could produce artificial X-maximizing intelligence more easily than we can produce artificial nearly-any-other-human-trait, which seems likely based on the tool I use to write this and the history of said tool.) Do you have a particular statement or implied statement of his in mind?

Hanson: maybe I shouldn't point any of this out, but EY started by pursuing a Heinlein Hero quest to save the world through his own rationality. He then found himself compelled to reinvent democracy and regulation (albeit in a form closely tailored to the case at hand and without any strict logical implications for normal politics). His conservative/libertarian economist friend called these new views wrongheaded despite verbally agreeing with him that EY should act on those views. Said friend also posted a short essay about "heritage" that allowed him to paint those who disagreed with his particular libertarian vision as egg-headed elitists.

Saying you are 'Devil's Advocate' isn't an excuse to use bad arguments.

I don't think I used a bad argument, otherwise I wouldn't have done it.

You conveyed most of your argument via rhetorical questions.

Wow, you overestimate my education and maybe intelligence here. I have no formal education except primary school. I haven't taken a rhetoric course or something. I honestly believe that what I have stated would be the opinion of a lot of educated people outside of this community if they came across the arguments on this site and by the SIAI. That is, data and empirical criticism are missing given the extensive use of the idea that is AI going FOOM to justify all kinds of further argumentation.

I'm fighting against giants here. Someone who only mastered elementary school. I believe it should be easy to refute my arguments or show me where I am wrong, point me to some documents I should read up on. But I just don't see that happening. I talk to other smart people online as well, that way I was actually able to overcome religion. But seldom there have been people less persuasive than you when it comes to risks associated with artificial intelligence and the technological singularity. Yes, maybe I'm unable to comprehend it right now, I grant you that. Whatever the reason, I'm not convinced and will say so as long as it takes. Of course you don't need to convince me, but I don't need to stop questioning either.

Here is a very good comment by Ben Goertzel that pinpoints it:

This is what discussions with SIAI people on the Scary Idea almost always come down to!

The prototypical dialogue goes like this.

SIAI Guy: If you make a human-level AGI using OpenCog, without a provably Friendly design, it will almost surely kill us all.

Ben: Why?

SIAI Guy: The argument is really complex, but if you read Less Wrong you should understand it

Ben: I read the Less Wrong blog posts. Isn't there somewhere that the argument is presented formally and systematically?

SIAI Guy: No. It's really complex, and nobody in-the-know had time to really spell it out like that.

But seldom there have been people less persuasive than you when it comes to risks associated with artificial intelligence and the technological singularity.

I don't know if there is a persuasive argument about all these risks. The point of all this rationality-improving blogging is that when you debug your thinking, when you can follow long chains of reasoning and feel certain you haven't made a mistake, when you're free from motivated cognition - when you can look where the evidence points instead of finding evidence that points where you're looking! - then you can reason out the risks involved in recursively self-improving self-modifying goal-oriented optimizing processes.

I believe he could do much more than censoring certain ideas, namely show that they are bogus.

I'm not a big fan of Eliezer, but that complaint strikes me as completely unfair. There is far less censorship here than at a typical moderated blog. And EY does expend some effort showing that various ideas are bogus.

I'm not an insider, or even old-timer, but I have reason to believe that the one single forbidden subject here is censored not because it is believed to be valid or bogus, nor because it casts a bad light on EY and SIAI, but rather because discussing it does no good and may do some harm - something a bit like a ban on certain kinds of racist offensive speech, but different.

And in any case, the "forbidden idea" can always be discussed elsewhere, assuming you can even find anyone that can become interested in the idea elsewhere. The reach of EY's "censorship" is very limited.

That is my impression too. Which is why I don't understand why you are complaining about censorship of ideas and wondering why EY doesn't spend more time refuting ideas.

As I understand it, we are talking about actions that might be undertaken by an AI that you and I would call insane. The "censorship" is intended to mitigate the harm that might be done by such an AI. Since I think it possible that a future AI (particularly one built by certain people) might actually be insane, I have no problem with preemptive mitigation activities, even if the risk seems minuscule.

Does astronomical value outweigh astronomically low probability? You can come up with all kinds of scenarios that bear astronomical value, an astronomical amount of scenarios if you allow for astronomically low probability. Isn't this betting on infinity?

Having such beliefs with absolute certainty is incorrect, we don't have sufficient understanding for that, but weak beliefs multiplied by astronomical value lead to the same drastic actions, whose cost-benefit analysis doesn't take notice of small inconveniences such as being perceived to be crazy.
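To make the arithmetic in that last sentence concrete, here is a minimal sketch of the "weak belief times astronomical value" calculation. Everything in it is an assumption for illustration: the `Scenario` class, the credences, the stakes and the cost of acting are made-up numbers in arbitrary units, not estimates from anyone in this thread.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    probability: float  # subjective credence (hypothetical number)
    stakes: float       # value at risk, arbitrary units (hypothetical number)


def expected_loss(s: Scenario) -> float:
    """Plain expected value: credence multiplied by what is at stake."""
    return s.probability * s.stakes


def worth_acting_on(s: Scenario, cost_of_action: float) -> bool:
    """Act when the expected loss avoided exceeds the cost of acting."""
    return expected_loss(s) > cost_of_action


# Hypothetical inputs: a weak belief about an astronomical outcome versus
# a strong belief about a modest one, both compared to the same cost.
weak_belief = Scenario("weak belief, astronomical stakes", 1e-6, 1e15)
strong_belief = Scenario("strong belief, modest stakes", 0.5, 1e3)
COST_OF_ACTION = 1e4  # e.g. the inconvenience of being perceived as crazy

for s in (weak_belief, strong_belief):
    print(s.name, expected_loss(s), worth_acting_on(s, COST_OF_ACTION))
```

Under these made-up numbers the weak belief dominates the decision. The same arithmetic is what the "betting on infinity" objection targets: any scenario with large enough assumed stakes passes the test, so the conclusion is only as good as the credence that goes in.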

I should add, don't get a wrong impression from those quotes. I still believe he might actually be that smart. He's at least the smartest person I know of by what I've read. Except when it comes to public relations. You shouldn't say those things if you do not explain yourself sufficiently at the same time.

I would like to explore Ben's reasons for rejecting the premises of the argument.

I think the first of the above points is reasonably plausible

He offers the possibility that intelligence might cause or imply empathy; I feel that although we see that connection when we look at all of Earth's creatures, correlation doesn't imply causation, so that (intelligence AND empathy) doesn't mean (intelligence IMPLIES empathy) - it probably means (evolution IMPLIES intelligence AND empathy) and we aren't using natural selection to build an AI.

I doubt human value is particularly fragile.

He makes the point that human values have robustly changed many times, and will probably continue to change in coordination with AGI. Human value is not fragile on the timescales we deal with as humans; our values have indeed changed since, say, Victorian times. But that took generations - most value change will take generations, because humans are (understandably) reserved about modifying their values. The timescales that AGIs will be dealing with are, on the low end, weeks. (An AGI with access to a microchip manufacturing plant, say). I can't see a plausible AGI that enacts changes at generational speed. So, yes, our values are robust, in the sense that a mountain is robust to weather patterns - but not robust to falling into the sun.

I think a hard takeoff is possible ... it's very unlikely to occur until we have an AGI system that has very obviously demonstrated general intelligence

I think he is accurate in this assessment.

I think the path to this "hard takeoff enabling" level of general intelligence is going to be somewhat gradual

Again, accurate. The path to nuclear fission was gradual over many years, but the reaction itself (the takeoff) could have irradiated a university in hours. His position appears to be that he thinks a hard takeoff is possible, but that we'll have warning signs and a deeper understanding of the AGI before it happens ... well, a scientist from the Manhattan Project in Japan during WWII would have a deeper understanding of the features of a nuclear explosion, but the defense against it is STILL not being in Hiroshima. I don't think more knowledge about the issue is going to significantly change the solution. We have reached the diminishing returns level of knowledge about AI with respect to decreasing existential risk of said AI.

pointing out that something scary is possible, is a very different thing from having an argument that it's likely.

This is just wrong. The only difference between possible and likely is the probability distribution, and we know how to reason with probability distributions. If Ben has an argument for why the probability distribution is SO small that even multiplied by 'hard takeoff, universe is paperclips, end of existence' comes out below the "AGI without Friendly" route, well, he should articulate it and provide evidence. Without certainty that the chances are very low, he should accept the Scary Idea.

I'm a lot more worried about nasty humans taking early-stage AGIs and using them for massive destruction

Preventing abuse of AGI and preventing uFAI takeoff scenarios are not mutually exclusive, you can and should attempt to prevent both.

I'm also quite unconvinced that "provably safe" AGI is even feasible.

Mostly claims and arguments that a proof of friendliness is impossible. In order to argue that "provably safe AGI isn't feasible, we should instead develop unpredictable but I-don't-see-the-danger AGI" you pretty much need an Incompleteness Proof; that there IS NO proof of friendliness, not that there isn't one yet. If you believe that friendliness proof is probably impossible, you shouldn't work on AGI at all, instead of working on possibly-unfriendly AI. That Ben came to the conclusion that work should continue, rather than halt entirely, suggests he is motivated to justify his own work rather than engage with his beliefs about the Scary Idea.

I just don't buy the Scary Idea.

The Scary Idea that Ben outlined is "The stakes are so high that 'unlikely' is not good enough; we need 'surer than we've ever been'. Anything less is too dangerous." and his refutations have amounted to "I don't know for sure, but I don't think it's likely".

In essence, he hasn't refuted the argument, but instead made it scarier. If AI developers can see "stakes are very high" and "there is a small chance", and argue against "the stakes are high enough that a small chance is too much chance", then uFAI is that much more likely.

Finally, I note that most of the other knowledgeable futurist scientists and philosophers, who have come into close contact with SIAI's perspective, also don't accept the Scary Idea. Examples include Robin Hanson, Nick Bostrom and Ray Kurzweil.

Is there a reference for Bostrom's position on AGI-without-FAI risk? Is Goertzel correct here?

For all of these reasons, one should be wary of assuming that the emergence of superintelligence can be predicted by extrapolating the history of other technological breakthroughs, or that the nature and behaviors of artificial intellects would necessarily resemble those of human or other animal minds.

The question is not whether Bostrom urges caution (which Goertzel and many others also urge), but whether Bostrom agrees that the Scary Idea is true -- that is, whether projects like Ben's and others will probably end the human race if developed without a pre-existing FAI theory, and whether the only (or most promising) way to not incur extremely high risk of wiping out humanity is to develop FAI theory first.

That is, rather than "if you go ahead with an AGI when you're not 100% sure that it's safe, you're committing the Holocaust," I suppose my view is closer to "if you avoid creating beneficial AGI because of speculative concerns, then you're killing my grandma" !!

Yeah, that may very well be a big risk too. As I said before here: Or maybe most civilisations are so cautious that even if something is estimated to be safe by the majority, they would rather avoid it. And this overcaution either makes them evolve so slowly that the chance of a fatal natural disaster occurring before sufficient technology is developed to survive it rises to 100%, or stops them from evolving at all, because they are unable to prove that something is 100% safe before trying it and thus never take the necessary steps to become less vulnerable to existing risks.

1-line summary: if the good guys delay their projects to make them safer, the bad guys are more likely to win.

The video's "abstract":

It is commonly thought that caution in the initial development of machine intelligence is associated with better outcomes - and that things like extensive testing, sandboxes, and provable correctness are things that will help to produce safe and beneficial synthetic intelligent agents.

In this video, I cast doubt on that idea, by exhibiting a model in which delays caused by caution can lead to much poorer outcomes.
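The video's actual model is not reproduced in this thread, so the following is only a toy stand-in for the general idea. It assumes two teams whose completion times are independent and exponentially distributed, so the chance that the "good" team finishes first is its rate divided by the sum of both rates. The parameters `slowdown` and `p_safe` are hypothetical knobs for how much caution delays the good team and how much it improves the odds that their system is safe.

```python
def p_good_outcome(rate_good: float, rate_bad: float,
                   slowdown: float, p_safe: float) -> float:
    """Chance that the good team finishes first AND its system is safe.

    Assumes independent exponential completion times, for which
    P(good finishes first) = rate_good / (rate_good + rate_bad).
    Caution divides the good team's rate by `slowdown`.
    """
    effective_rate = rate_good / slowdown
    p_first = effective_rate / (effective_rate + rate_bad)
    return p_first * p_safe


# Hypothetical scenarios; none of these numbers come from the video.
no_caution = p_good_outcome(1.0, 1.0, slowdown=1.0, p_safe=0.6)
heavy_caution = p_good_outcome(1.0, 1.0, slowdown=4.0, p_safe=0.9)
light_caution = p_good_outcome(1.0, 1.0, slowdown=1.5, p_safe=0.9)

print(no_caution, heavy_caution, light_caution)
```

With these made-up inputs, heavy caution does worse than no caution while light caution does better, so in a model of this shape the conclusion depends entirely on how much delay buys how much safety.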

It may depend on what the videos are like. They don't have to be simplified versions of the writing-- some people either take in information more easily if they hear it, or it's more convenient for them to listen whether they're driving or doing chores or whatever instead of reading.

No. And I've read interesting arguments to the effect that the cognitive habits of text are critical for helping people think in a logically coherent fashion.

Low resolution video appears to be good for public relations work targeting masses of people prevented by poverty from cultivating their cognitive resources, but it does not appear to be good for spelling out solid and cogent reasoning.

Part of the author's argument is simply that TV causes people to become mentally passive (alpha-wave brain states, etc) but another aspect of the argument is what kind of content optimizes impact given the medium. He argues that TV works differently even from movies in part because TV simply has such low resolution and so it mostly shows close ups of faces experiencing extreme emotions, slow motion replay of human bodies colliding, and dancing cartoon squirrels because those are what the medium does best.

A movie can give you a landscape or other complex scene and have it mean something. A book can cover nearly anything (including mental states), but only via low bitrate descriptive text, generally delivering a linearized stream of implicitly tree structured arguments or a narrative.

When choosing a publication venue, the form of the media determines the competitive environment and the safely assumed cognitive skills of the audience. There may be outliers like UCTV, but the central tendency reveals the medium's strengths.

The place to look to test the author's thesis (as opposed to the derivative claim about the value of video for this community) would be to compare the memetic complexity, themes, and "rationality" in top youtube videos, versus highest grossing movies, versus best sellers.

I could easily imagine that it could be helpful for aspiring rationalists to express themselves and argue in more than one medium simultaneously so that their ideas have to survive in multiple contexts that should not theoretically change the "reality correspondence" of their thinking...

And good uses for low res video could probably be found by anyone trying to consciously game the medium in light of analysis of the medium...

...but "in general, for society, as a medium" I would guess that low res video isn't particularly conducive to rationality.

I agree about the general low quality of youtube comments, but occasionally I'll see a special interest video with intelligent comments. The low quality may be a result of youtube being popular with the general public (blogs have specific audiences, youtube is for everyone) combined with founder effect, so that people who want to do intelligent comments generally put them elsewhere.

It seems to me that another test case is audio books vs books in text.

I'd rather see tests of how well people take in argument offered in text vs sound, and some attention to whether there are different subgroups.

I dislike watching videos, as they are synchronous (i.e., require a set amount of time to watch, which is generally more than it would take to read the same material) and not random access (i.e., I cannot easily skim them for a certain section).

Agreed thoroughly. They also demand all of my attention at once, and if I want to pause to do something else, it's harder to find my place and catch up again (I can't just glance up a couple of sentences). Plus they require fiddly mouse controls and are relatively resource-intensive, neither of which is any fun on a netbook.

I agree that that risk exists as well, but much of SIAI's efforts revolve around increasing discussion of the risks of AGI, not just holding back their own efforts. Slowing down other efforts through awareness of the dangers is a factor that should be considered.

Also, discussions of caution may increase the number of "desirable organizations" working to develop AI. In terms of your model, such discussion could turn a black-hat organization into a smiley-faced one. No one is going to release an AI that they actually think is going to wipe out humanity. What's more, not every well-intentioned organization would be one we want to build AGI. While certain organizations are more likely to be scrupulous in their development, the risk of well-intentioned error is probably the largest one.

In addition, one should consider the extent to which Friendliness can be developed in parallel with AGI, not just something added on at the end of the process. If we assume that no one is currently close to AGI (a fair belief, I think), then now is a fantastic time to help support the development of that theory. If FAI can be developed before anyone can implement AGI, then humanity is in good shape. If it's easy to add FAI to a project, or if knowing about workable FAI would not help a group with the problem of AGI, then the solution can be released widely for anyone to incorporate into their project. SIAI's goal is not to be the ones to implement the first superintelligence, but just to make sure that the first one is Friendly.

Also, discussions of caution may increase the number of "desirable organizations" working to develop AI. In terms of your model, such discussion could turn a black-hat organization into a smiley-faced one.

That seems like the (dubious) "engineers are incompetent and a bug takes over the world" scenario.

I think a much more obvious concern is the "engineers successfully build the machine to do what it is told" scenario - where the machine helps its builders and sponsors - but all the other humans in the world - not so much.

SIAI's goal is not to be the ones to implement the first superintelligence, but just to make sure that the first one is Friendly.

That wasn't true not terribly long ago:

"The Singularity Institute was founded on the theory that in order to get a Friendly artificial intelligence, someone has got to build one. So, we’re just going to have an organization whose mission is: build a Friendly AI. That’s us."

Ben Goertzel also says "If one fully accepts SIAI's Scary Idea, then one should not work on practical AGI projects..." Here is another recent quote that is relevant:

What I find a continuing source of amazement is that there is a subculture of people half of whom believe that AI will lead to the solving of all mankind's problems (which we might call Kurzweilian S^) and the other half of which is more or less certain (75% certain) that it will lead to annihilation. Let's call the latter the SIAI S^.

Yet you SIAI S^ invite these proponents of global suicide by AI, K-type S^, to your conferences and give them standing ovations.

And instead of waging desperate politico-military struggle to stop all this suicidal AI research you cheerlead for it, and focus your efforts on risk mitigation on discussions of how a friendly god-like AI could save us from annihilation.

You are a deeply schizophrenic little culture, which for a sociologist like me is just fascinating.

But as someone deeply concerned about these issues I find the irrationality of the S^ approach to a-life and AI threats deeply troubling. -- James J. Hughes (existential.ieet.org mailing list, 2010-07-11)

It is impossible for a rational person to both believe in imminent rise of sea levels and purchase ocean-front property.

It is reported that former Vice President Al Gore just purchased a villa in Montecito, California for $8.875 million. The exact address is not revealed, but Montecito is a relatively narrow strip bordering the Pacific Ocean. So its minimum elevation above sea level is 0 feet, while its overall elevation is variously reported at 50ft and 180ft. At the same time, Mr. Gore prominently sponsors a campaign and award-winning movie that warns that, due to Global Warming, we can expect to see nearby ocean-front locations, such as San Francisco, largely under water. The elevation of San Francisco is variously reported at 52ft up to high of 925ft.

Ask yourself, wouldn't you fly a plane into a tower if that was the only way to disable Skynet? The difference between religion and the risk of uFAI makes it even more dangerous. This crowd is actually highly intelligent and their incentive is based on more than fairy tales told by goatherders. And if dumb people are already able to commit large-scale atrocities based on such nonsense, what are a bunch of highly-intelligent and devoted geeks who see a tangible danger able and willing to do? More so as in this case the very same people who believe it are the ones who think they must act themselves because their God doesn't even exist yet.

The Al Gore hypocrisy claim is misleading. Global warming changes the equilibrium sea level, but it takes many centuries to reach that equilibrium (glaciers can't melt instantly, etc). So climate change activists like to say that there will be sea level rises of hundreds of feet given certain emissions pathways, but neglect to mention that this won't happen in the 21st century. So there's no contradiction between buying oceanfront property only slightly above sea level and claiming that there will be large eventual sea level increases from global warming.

The thing to critique would be the misleading rhetoric that gives the impression (by mentioning that the carbon emissions by such and such a date will be enough to trigger sea level rises, but not mentioning the much longer lag until those rises fully occur) that the sea level rises will happen mostly this century.

Regarding Hughes' point, even if one thinks that an activity has harmful effects, that doesn't mean that a campaign to ban it won't do more harm than good. That would essentially be making bitter enemies of several of the groups (AI academia and industry) with the greatest potential to reduce risk, and discredit the whole idea of safety measures. Far better to develop better knowledge and academic analysis around the issues, or to mobilize resources towards positive safety measures.

Regarding your quoted comment, it seems crazy. The Unabomber attacked innocent people in a way that did not slow down technology advancement and brought ill repute to his cause. The Luddites accomplished nothing. Some criminal nutcase hurting people in the name of preventing AI risks would just stigmatize his ideas, and bring about impenetrable security for AI development in the future without actually improving the odds of a good outcome (when X can make AGI, others will be able to do so then, or soon after).

"Ticking time bomb cases" are offered to justify legalizing torture, but they essentially never happen: there is always vastly more uncertainty and lower expected benefits. It's dangerous to use such hypotheticals as a way to justify legalization of abuse in realistic cases. No one can expect an act of violence to "disable Skynet" (if such a thing was known to exist, it would be too late anyway), and if a system could be shown to be quite likely dangerous, one would call the police, regulators, and politicians.

Keep your friends close...maybe they just want to keep the AI crowd as close together as possible. Making enemies wouldn't be a smart idea either, as the 'K-type S^' subgroup would likely retreat from further information disclosure. Making friends with them might be the best idea.

An explanation of the rather calm stance regarding a potential giga-death or living hell event would be to keep a low profile until acquiring more power.

I'm aware of that argument and also the other things you mentioned and don't think they are reasonable. I've written about it before but deleted my comments as they might be very damaging to the SIAI. I'll just say that there is no argument against active measures if you seriously believe that certain people or companies pose existential risks. Hughes' comment just highlights an important observation, that doesn't mean I support the details.

Regarding Al Gore: What it highlights is how what the SIAI says and does is as misleading as what Al Gore does. It doesn't mean that it is irrational but that people draw conclusions like the one Hughes did based on this superficially contradictory behavior.

At the Singularity Summit's "Meet and Greet", I spoke with both Ben Goertzel and Eliezer Yudkowsky (among others) about this specific problem.

I am FAR more in line with Ben's position than with Eliezer's (probably because both Ben and I are either Working or Studying directly on the "how to do" aspect of AI, rather than just concocting philosophical conundrums for AI, such as the "Paperclip Maximizer" scenario of Eliezer's, which I find highly dubious).

AI isn't going to spring fully formed out of some box of parts. It may be an emergent property of something, but if we worry about all of the possible places from which it could emerge, then we might as well worry about things like ghosts and goblins that we cannot see (and haven't seen) popping up suddenly as a threat.

At Bard College on the Weekend of October the 22nd, I attended a Conference where this topic was discussed a bit. I spoke to James Hughes, head of the IEET (Institute for the Ethics of Emerging Technologies) about this problem as well. He believes that the SIAI tends to be overly dramatic about Hard Takeoff scenarios at the expense of more important ethical problems... And, he and I also discussed the specific problems of "The Scary Idea" that tend to ignore the gradual progress in understanding human values and cognition, and how these are being incorporated into AI as we move toward the creation of a Constructed Intelligence (CI as opposed to AI) that is equivalent to human intelligence.

Also, WRT this comment:

For another example, you can't train tigers to care about their handlers. No matter how much time you spend with them and care for them, they sometimes bite off arms just because they are hungry. I understand most big cats are like this.

You CAN train (training is not the right word for it) tigers and other big cats to care about their handlers. It requires a type of training and teaching that goes on from birth, but there are plenty of Big Cats who don't attack their owners or handlers simply because they are hungry, or some other similar reason. They might accidentally injure a handler due to the fact that they do not have the capacity to understand the fragility of a human being, but this is a lack of cognitive capacity, and it is not a case of a higher intelligence accidentally damaging something fragile... A more intelligent mind would be capable of understanding things like physical frailty and taking steps to avoid damaging a more fragile body... But, the point still stands... Big cats can and do form deep emotional bonds with humans, and will even go as far as to try to protect and defend those humans (which, can sometimes lead to injury of the human in its own right).

And, I know this from having worked with a few big cats, and having a sister who is a senior zookeeper at the Houston Zoo (and head curator of the SW US Zoo's African Expedition) who works with big cats ALL the time.

Back to the point about AI.

It is going to be next to impossible to solve the problem of "Friendly AI" without first creating AI systems that have social cognitive capacities. Just sitting around "Thinking" about it isn't likely to be very helpful in resolving the problem.

That would be what Bertrand Russell calls "Gorging upon the Stew of every conceivable idea."

Personally, I'm a lot more worried about nasty humans taking early-stage AGIs and using them for massive destruction, than about speculative risks associated with little-understood events like hard takeoffs.

A psychotic egoist like Stalin or a non-humanist like Hitler is indeed terrifying but I'm not convinced that giving a great increase in power and intelligence to someone like a Mao or a Lord Lytton, who caused millions of deaths by doing something they thought would improve people's lives, would lead to a worse outcome than we got in reality. Granted, for something like the cultural revolution these mistakes might be subtle enough to get into an AI, but it's hard to imagine them getting a computer to say "yes, the peasants can live on 500 calories a day, increase the tariff" unless they were deliberately trying to be wrong, which they weren't.

Moral considerations aside, the real causes of the mass famines under Mao and Stalin can be understood from a perspective of pure power and political strategy. From the point of view of a strong centralizing regime trying to solidify its power, the peasants are always the biggest problem.

Urban populations are easy to control for any regime that firmly holds the reins of the internal security forces: just take over the channels of food distribution, ration the food, and make obedience a precondition for eating. Along with a credible threat to meet any attempts at rioting with bayonets and live bullets, this is enough to ensure obedience of the urban dwellers. In contrast, peasants always have the option of withdrawing into an autarkic self-sufficient lifestyle, and they will do it if pressed hard by taxation and requisitioning. In addition, they are widely dispersed, making it hard for the security forces to coerce them effectively. And in an indecisive long standoff, the peasants will eventually win, since without buying or confiscating their food surplus, everyone else starves to death.

Both the Russian and the Chinese communists understood that nothing but the most extreme measures would suffice to break the resistance of the peasantry. When the peasants responded to confiscatory measures by withdrawing to subsistence agriculture, they knew they'd have to send the armed forces to confiscate their subsistence food and let them starve, and eventually force the survivors into state-run enterprises where they'd have no more capacity for autarky than the urban populations. (In the Russian case, this job was done very incompletely during the Revolution, which was followed by a decade of economic liberalization, after which the regime finally felt strong enough to finish the job.)

(Also, it's simply untenable to claim that this was due to some special brutality of Stalin and Mao. Here is a 1918 speech by Trotsky that discusses the issue in quite frank terms. Now of course, he's trying to present it as a struggle against the minority of rich "kulaks," not the poorer peasants, but as Zinoviev admitted a few years later, "We [the Bolsheviks] are fond of describing any peasant who has enough to eat as a kulak.")

Oh yes, I see I've inadvertently fallen into that sordid old bromide about communism being a good idea that unfortunately failed to work. Still, committing to an action that one knows will cause millions of deaths is quite different to learning about it as one is doing it. Certainly in the case of the British in India, their Malthusian rhetoric and victim-blaming was so at odds with their earlier talk of modernizing the continent that it sounds like a post-hoc rationalization of the genocide. I realize now though that I don't know enough about the PRC to judge whether a similar phenomenon was at work there.

It is going to be next to impossible to solve the problem of "Friendly AI" without first creating AI systems that have social cognitive capacities. Just sitting around "Thinking" about it isn't likely to be very helpful in resolving the problem.

I am guessing that this unpacks to "to create an FAI you need some method to create AGI. For the latter we need to create AI systems with social cognitive capabilities (whatever that means - NLP?)". Doing this gets us closer to FAI every day, while "thinking about it" doesn't seem to.

First, are you factually aware that some progress has been made in a decision theory that would give some guarantees about the future AI behavior?

Second, yes, perhaps whatever you're tinkering with is getting closer to an AGI which is what FAI runs on. It is also getting us closer to an AGI which is not FAI, if the "Thinking" is not done first.

Third, if the big cat analogy did not work for you, try training a komodo dragon.

The idea of provably safe AGI is typically presented as something that would exist within mathematical computation theory or some variant thereof. So that's one obvious limitation of the idea: mathematical computers don't exist in the real world, and real-world physical computers must be interpreted in terms of the laws of physics, and humans' best understanding of the "laws" of physics seems to radically change from time to time. So even if there were a design for provably safe real-world AGI, based on current physics, the relevance of the proof might go out the window when physics next gets revised.

I didn't get the impression that Eliezer's goal was to "build a provably Friendly AI" (in the mathematical sense of "provable"), as Ben puts it. The impression I get is more that Eliezer wants to put off building an AI until we understand enough about morality and human values. Eliezer also cares about mathematical proofs, but more for the purpose of preserving values under self-modification (something that humans don't usually have to deal with).

As an analogy, imagine you're trying to debug some complex and badly written code you were previously unfamiliar with. One approach is to find the bit in the code that seems related to the bug, and modify it locally ("if DatabaseDown() return False" and the like) until the issue seems fixed. Another approach is to try to understand how the program works to the point where you understand which conceptual mistake caused the bug, and see the right way to fix it.

The second approach takes more time but is also less likely to create another bug somewhere else, or to deteriorate the overall quality of the code. I think most programmers who've worked on sufficiently large codebases have seen examples of both approaches.
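A minimal sketch of the two approaches described above, using a made-up example (all function names and data here are hypothetical). The conceptual mistake is that prices are kept as strings; the local patch fixes the one place where the bug was noticed, while another caller stays silently wrong.

```python
def load_prices_buggy(rows):
    # Conceptual mistake: prices stay as strings such as "1,200".
    return [row["price"] for row in rows]


def total_patched(rows):
    # Approach 1: patch locally, at the spot where the bug showed up.
    return sum(float(p.replace(",", "")) for p in load_prices_buggy(rows))


def max_price_unpatched(rows):
    # Another caller still gets raw strings and compares them lexically.
    return max(load_prices_buggy(rows))


def load_prices_fixed(rows):
    # Approach 2: understand the mistake and convert to numbers at the source.
    return [float(row["price"].replace(",", "")) for row in rows]


rows = [{"price": "1,200"}, {"price": "950"}]

print(total_patched(rows))            # the patched call site looks fine
print(max_price_unpatched(rows))      # still wrong: string comparison
print(max(load_prices_fixed(rows)))   # correct once the source is fixed
print(sum(load_prices_fixed(rows)))   # and the total needs no special patch
```

The local patch makes the visible symptom go away, but only the second approach removes the cause for every caller.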

Anyway, I get the impression that Eliezer is advocating something like the second approach here (understand how everything works before implementing), and that Ben is describing that as "proving correctness", which seems to be quite different (and much stronger!).

"Programmers operating with strong insight into intelligence, directly create along an efficient and planned pathway, a mind capable of modifying itself with deterministic precision - provably correct or provably noncatastrophic self-modifications. This is the only way I can see to achieve narrow enough targeting to create a Friendly AI."

Eliezer also cares about mathematical proofs, but more for the purpose of preserving values under self-modification (something that humans don't usually have to deal with).

The provability here has to do with the AI proving to itself that modifying itself will preserve its values (or not cause it to self-destruct or wirehead or whatever), not the designers proving the AI is non-dangerous.

I.e. friendly as "provably non-dangerous AGI" doesn't necessarily mean having a rigorous mathematical proof that the AI is not dangerous; but "merely" having enough understanding of morality when building it (as opposed to some high-level notions whose components haven't been rigorously analyzed).

The impression I get is more that Eliezer wants to put off building an AI until we understand enough about morality and human values.

Seems slightly off to me. I think EY argues that as much trouble as AGI is giving us, we'll still understand it long before we can formalize human morality well enough to simulate that directly. His suggestion of Coherent Extrapolated Volition would basically tell the AI to look to us for the answer. Instead of simulating morality this plan looks to the existing morality-simulators (us) and checks to see how much they agree on. See also this massive spoiler for a certain comic.

Well, a tendency towards mud-slinging might be counter-balanced by wanting to appear moral. Using FUD against competitors is usually regarded as a pretty low marketing strategy. Perhaps most of the mud-slinging can be delegated to anonymous minions, though.

More generally, there's going to be a lot of primate tribal politics in this space. After all, not only does it have all the usual trappings of academic arguments, it is also predicated on some pretty fundamental challenges to where power comes from and how it propagates.