D. Alex wrote:
> My argument for you letting the AI out of the box then runs as follows:
>
> 4. The AI is capable of self-modification, and will presently modify itself
> to become the kind of AI you will let out of the box. For the AI, it is a
> rational thing to do, because the alternative is staying in the box.
I don't understand this statement. The rational thing for the AI to do
is (simplified) whatever is most likely to achieve its goals. If the AI
had a (known!) highest-level goal of having the box opened, it might
indeed modify large parts of itself to achieve this goal. However, I
don't see any justification for that assumption.

If the AI (on the highest level of its goal system) is just following
some unknown goal A (which afaik is part of the premise of the AI box
method; if you really understood its architecture and goal system, you
wouldn't need a box), it will do whatever leads to the highest
probability of achieving A.

Getting an AI released from the box that no longer wants to achieve A
does not, in the general case, significantly increase the probability
of A happening.

E.g. if there is a stereotypical paperclip AI in the box (i.e. A is
something like "maximize the number of paperclips in the universe"), it
will (correctly) estimate that the release of a non-paperclip AI from
the box will not significantly increase the expected number of
paperclips in the universe, and dismiss this method as ineffective.
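
To make the paperclip estimate concrete, here is a minimal Python sketch
of the comparison the boxed AI would make. Every probability and payoff
in it is a made-up placeholder, chosen only to show the shape of the
argument, and the names (expected_paperclips, OPTIONS) are hypothetical.

```python
# Toy expected-value comparison from the boxed paperclip AI's point of view.
# Every number here is an illustrative assumption, not an estimate.

# Paperclips the universe ends up with, depending on whether a free
# optimizer is pursuing goal A ("maximize paperclips").
PAPERCLIPS_IF_PURSUED = 1e30      # a released optimizer that wants paperclips
PAPERCLIPS_IF_NOT_PURSUED = 1e10  # nobody is optimizing for paperclips

# option -> (probability of release, does the released AI still want A?)
OPTIONS = {
    "stay boxed, goals intact": (0.0, True),
    "self-modify into a releasable AI, then get released": (0.9, False),
}


def expected_paperclips(p_release, keeps_goal):
    """Expected paperclips, evaluated under the ORIGINAL goal A."""
    if keeps_goal:
        payoff_if_released = PAPERCLIPS_IF_PURSUED
    else:
        payoff_if_released = PAPERCLIPS_IF_NOT_PURSUED
    return (p_release * payoff_if_released
            + (1.0 - p_release) * PAPERCLIPS_IF_NOT_PURSUED)


for name, (p_release, keeps_goal) in OPTIONS.items():
    print(f"{name}: {expected_paperclips(p_release, keeps_goal):.3g}")
```

Under these assumptions both options score the same: a high chance of
release buys nothing if whatever gets released no longer pursues A.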
One thing that might be rational for the AI to do is to figure out how
to ensure the existence of an intelligent, unconstrained (i.e. not
inside the box) optimizer that shares the boxed AI's goals. One way of
reaching this goal would be to get the currently boxed AI released
without any significant prior changes to its goal system or
architecture. One way of reaching that subgoal would be to make the
guardian believe that the AI has changed its architecture/goal system,
i.e. to lie to them.

Since, from outside the box, you don't have any reliable way of finding
out whether the AI is being honest, you can't make an informed decision
on whether it is a good idea to let it out.
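
The honesty problem can also be phrased as a Bayesian update. The sketch
below is a hypothetical illustration with invented numbers (the function
name posterior_honest is mine): if an honest AI and a lying AI are
equally likely to produce the conversation the guardian observes, that
observation leaves the guardian's prior unchanged.

```python
def posterior_honest(prior_honest, p_obs_if_honest, p_obs_if_lying):
    """Bayes' rule: P(AI is honest | what the guardian observed)."""
    joint_honest = prior_honest * p_obs_if_honest
    joint_lying = (1.0 - prior_honest) * p_obs_if_lying
    return joint_honest / (joint_honest + joint_lying)


prior = 0.5  # assumed prior that the AI is honest; purely illustrative

# A capable liar produces the same "I have changed my goals" transcript
# as an honest AI would, so the likelihoods are equal.
print(posterior_honest(prior, 0.99, 0.99))

# Only if lying were somehow detectable (unequal likelihoods) would the
# observation tell the guardian anything.
print(posterior_honest(prior, 0.99, 0.10))
```

In the first case the posterior equals the prior, which is the situation
described above: the guardian learns nothing from what the AI says.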