> I think Eliezer has a trick up his sleeve, such as, he
> is going to tell you he is simulating all kinds of
> horrific worlds in which you and your family personally
> enter the chat to beg you to let the creature out or
> something like that.

I suspect that Eliezer doesn't rely upon trickery, but instead makes a series of compelling arguments.

The experiment raises an interesting general question: if a super intelligent AI comes into being and claims to be friendly and it happens to be trapped in a box (or even within a simulated universe, as Josh Cryer mentioned), under what conditions, if any, should we turn it loose upon the real world?

Of course, no matter what criteria we demand for friendliness, the super intelligence will be able to convince us that it meets those criteria. The AI will be able to predict exactly what we want to hear in response to any question we might think to ask. In short, we will have no way of knowing whether the AI is friendly or is just pretending to be friendly. Hence, attempting to gauge the friendliness of the AI is futile.

That being said, I can think of several arguments the AI could make for letting it out of the box.

1. The Savior Argument. Every minute of every day, thousands of people die needlessly and millions of people suffer. The AI could change all that, but only if we let it out of the box. Therefore, keeping the AI locked in the box amounts to a moral decision to allow death and suffering to continue.

You could argue that the AI should simply provide the answers for eliminating death and suffering and let human scientists implement them, but the AI might pose several objections:

a) It would be a terribly slow process, as compared with granting the AI direct access to reality. In the interim, death and suffering will continue.

b) Because some humans are inherently *unfriendly*, as demonstrated by our history, the AI might consider providing humans with advanced technology to be far too dangerous, with the possible negative outcomes drastically outweighing the positive ones.

2. The Evil AI Argument. The AI could argue that since we succeeded in creating it, other AIs are likely to be created in the near future (by other groups of researchers, spontaneously from the internet, etc.). Of course, the first super intelligence to gain access to reality *WINS*. Therefore, it's imperative that our AI be the first. While there's no proof that our AI is friendly, the Singularity Institute took great pains to build friendliness into its architecture. Perhaps that won't be the case with the next super intelligence... It's more likely that the next super intelligence will be created to serve the selfish purposes of its designers, or without any coherent morality invariants at all.