xgl wrote:
>
> hmmm ... if we try to get friendliness by directed evolution,
> wouldn't friendliness end up implicitly as a subgoal of survival? isn't
> that, like, bad?

"Implicit" doesn't pay the rent - if it's not represented declaratively or
procedurally *somewhere* in the source, it doesn't exist except as an
obscure historical fact.

If you start out with a Friendly AI that wants to excel at training
scenarios {as a subgoal of improving the population of Friendly AIs, as a
subgoal of creating better Friendly AIs, as a subgoal of better
implementing Friendliness}, and you create training scenarios - fitness
metrics - that test for the presence or absence of complex functional
Friendliness and Friendliness-sensitivity in problem solving, then there
should be no simple mutation that can short-circuit the AI. You'd still
have to deal with genetic drift, though, in any source/content that
implements functionality you can't directly test via training scenarios.
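That last point can be shown with a toy model (everything here is an
invented illustration, not anything from an actual AI design): a
truncation-selection loop whose fitness metric only scores half the
genome. Selection holds the tested half near its maximum, while the
untested half drifts toward randomness under mutation alone.

```python
import random

random.seed(0)

GENOME_LEN = 20          # first half is covered by the fitness metric
TESTED = slice(0, 10)    # bits the training scenarios actually check
UNTESTED = slice(10, 20) # functionality with no direct test coverage

def fitness(genome):
    # The fitness metric only "sees" the tested region.
    return sum(genome[TESTED])

def mutate(genome, rate=0.05):
    # Flip each bit independently with small probability.
    return [b ^ 1 if random.random() < rate else b for b in genome]

# Start with a population that is "correct" everywhere (all bits set).
pop = [[1] * GENOME_LEN for _ in range(50)]

for generation in range(200):
    # Truncation selection: keep the top half by fitness,
    # then rebuild the population from mutated copies of survivors.
    pop.sort(key=fitness, reverse=True)
    survivors = pop[:25]
    pop = [mutate(random.choice(survivors)) for _ in range(50)]

tested_avg = sum(sum(g[TESTED]) for g in pop) / len(pop)
untested_avg = sum(sum(g[UNTESTED]) for g in pop) / len(pop)
# Selection keeps the tested region near its maximum of 10;
# the untested region, under mutation with no selection pressure,
# drifts toward the random baseline of 5.
```

The tested region stays near-perfect only because every generation's
scenarios re-check it; the untested region decays at the mutation rate,
which is the drift problem described above.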