In deference to Mitch Howe, I think that the idea of containment is
played out. But we're not exactly discussing keeping an AI in a box
against its will, but whether or not its existence in the world means
our immediate destruction, or if we have some game-theoretic chance of
defending ourselves.

I tend to think there is a chance that a self-improving AI could be so
smart so fast that it doesn't make sense to try to evaluate its
'power' relative to us. We could be entirely at its mercy (if it
had any). This bare possibility is enough to warp the planning of any
AI theorist, as it's not very good to continue with plans that have
big open existential risks in them.

The question is, how likely is that? This is important, quite aside
from the problem that it's only one part of the equation. Suppose I,
Eliezer and other self improvement enthusiasts are quite wrong about
the scaling speed of self-improving cognition. We might see AGI
designs stuck at infrahuman intelligence for ten years (or a thousand,
but that's a different discussion). In those ten years, do you think
that even a project that started out as friendly-compliant (whatever
that means) would remain so? I imagine even I might have trouble
continuing to treat it with the respect and developmental fear that it
deserves. To be frank, if AGIs are stuck at comparatively low levels
of intelligence for any amount of time, they're going to be
productized, and used everywhere. That's an entirely different kind of
safety problem. Which one is the first up the knee of the curve?
Yours? IBM's? An unknown's? There are safety concerns here that must
be addressed societally, or at least not at the level of a single AI.

There are other possible scenarios. Broken AGIs may exist in the
future, which could be just as dangerous as unFriendly ones. Another
spectre I worry about is something I call a commandline intelligence:
a system with no upper-level goal system, just mindlessly optimizing
against the input commands of a privileged user. Such a system would
be fantastically dangerous, even before it started any unintended
optimization. It would be the equivalent of a stunted human upload:
all the powers, none of the intelligence upgrade.

These and other scenarios may not be realistic, but I have thought
about them. If they are possible, they deserve consideration, just as
much (relative to their probability, severity, and possibility of
resolution) as the possibility of a single super-fast self-improving
AI, which will be either Friendly or unFriendly, depending on who
writes it.

So what's likely? What can be planned for, and what solutions may
there be? None of the above are particularly well served by planning
a Friendly architecture and nothing else. Someone else mentioned
safety measures; I think that may be a realistic thing to think about
*for some classes of error*. The question is, how likely are those
classes, and does it make sense to worry about them more than the edge
cases?

For those of you who are still shaking your heads at the impossibility
of defending against a transhuman intelligence, let me point out some
scale. If you imagine that an ascendant AI might take two hours from
first getting out to transcension, that's more than enough time for a
forewarned military from one of the superpowers to physically destroy
a significant portion of internet infrastructure (mines, perhaps), and
EMP the whole world into the 17th century. ICBMs set for high-altitude
airburst would take less than 45 minutes from anywhere in the world
(and America, at least, has dedicated EMP weapons), and the number of
shielded computing centers is minuscule. We may be stupid monkeys, but
we've
spent a lot of time preparing for the use of force. Arguing that we
would be impotent in front of a new threat requires some fancy
stepping. I, for one, tend to think there might be classes of danger
we could defend against, which are worth defending against.