Luke, there's a serious and common misconception in your explanation of the independence axiom (serious enough that I don't consider this nitpicking). If you could, please fix it as soon as you can to prevent the spread of this unfortunate misunderstanding. I wrote a post to try to dispel misconceptions such as this one, because utility theory is used in a lot of toy decision theory problems, versions of which might actually be encountered by utility-seeking AIs:

For example, the independence axiom of expected utility theory says that if you prefer one apple to one orange, you must also prefer one apple plus a tiny bit more apple over one orange plus that same tiny bit of apple. If a subject prefers A to B, then the subject can't also prefer B+C to A+C. But Allais (1953) found that subjects do violate this basic assumption under some conditions.

This is not what the independence axiom says. What it says is that, for example, if you prefer an apple over an orange, then you must prefer the gamble [72% chance you get an apple, otherwise you get a cat] over the gamble [72% chance you get an orange, otherwise you get a cat]. The axiom is about mixing probabilistic outcomes, not mixing amounts of various commodities.
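To make the gamble version concrete, here is a minimal Python sketch. The utility numbers (apple 1.0, orange 0.5) and the helper names `mix`, `expected_utility`, and `prefers_apple_gamble` are my own illustrative assumptions, not anything from the post. It shows that for an expected-utility maximizer who prefers the apple, the apple gamble beats the orange gamble no matter how the cat is valued:

```python
def mix(p, x, y):
    """Lottery that yields outcome x with probability p, otherwise y."""
    lottery = {}
    # Accumulate so that mix(p, "orange", "orange") is still a valid lottery.
    lottery[x] = lottery.get(x, 0.0) + p
    lottery[y] = lottery.get(y, 0.0) + (1 - p)
    return lottery


def expected_utility(lottery, utility):
    """Probability-weighted average of the utilities of the outcomes."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())


def prefers_apple_gamble(cat_utility, p=0.72):
    """True if [p apple, else cat] has higher expected utility than [p orange, else cat]."""
    # Hypothetical utilities: the agent prefers an apple to an orange.
    utility = {"apple": 1.0, "orange": 0.5, "cat": cat_utility}
    apple_gamble = expected_utility(mix(p, "apple", "cat"), utility)
    orange_gamble = expected_utility(mix(p, "orange", "cat"), utility)
    return apple_gamble > orange_gamble


if __name__ == "__main__":
    for cat_utility in (-10.0, 0.0, 0.7, 100.0):
        print(cat_utility, prefers_apple_gamble(cat_utility))
```

The cat contributes the same 0.28 × u(cat) to both gambles, so it cancels out of the comparison. That cancellation is all the independence axiom asks for.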

This distinction is important because, for example, if you'd rather have 1 apple than 1 orange, but you'd rather have 1 orange and 0.2 apples than 1.2 apples, you're not violating the independence axiom, nor are you exhibiting the Allais paradox. You simply don't like having too much apple, which is fine as far as EU is concerned: apple can have negative marginal utility after a certain point. Such explanations are an essential feature, not a shortcoming, of utility theory.
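Here is a small sketch of such a utility function in Python. The specific functional forms (a quadratic for apples that peaks at 1 apple, and 0.8 per orange) are assumptions I picked purely for illustration:

```python
def apple_value(apples):
    """Hypothetical value of apples: rises until 1 apple, then declines."""
    return 2 * apples - apples ** 2


def orange_value(oranges):
    """Hypothetical value of oranges: a constant 0.8 per orange."""
    return 0.8 * oranges


def bundle_utility(apples, oranges):
    """Utility of a sure bundle of fruit (no gambles involved)."""
    return apple_value(apples) + orange_value(oranges)


if __name__ == "__main__":
    print(bundle_utility(1, 0) > bundle_utility(0, 1))      # 1 apple vs 1 orange
    print(bundle_utility(0.2, 1) > bundle_utility(1.2, 0))  # orange + 0.2 apples vs 1.2 apples
```

Both comparisons come out true under these assumed numbers: the agent prefers 1 apple to 1 orange, yet prefers 1 orange plus 0.2 apples to 1.2 apples. Every bundle here is a sure outcome with a single utility number, so nothing in this says anything about how the agent ranks gambles, which is where the independence axiom applies.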

The Allais paradox is a legitimate failure of utility theory in describing human behavior, though, so you're of course right that expected utility theory is quite poor as a predictive tool. I doubt any powerful AGI would exhibit the Allais paradox, though.
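For reference, here is a sketch of why the Allais pattern cannot be reproduced by any utility assignment. The gambles below are the commonly cited version of Allais's problem ($1M for sure versus an 89%/10%/1% mix of $1M/$5M/$0, and so on); those numbers are not given in this thread, and the function names are my own. The code uses exact fractions and simply searches a grid of candidate utilities:

```python
from fractions import Fraction as F
from itertools import product

# Commonly cited Allais gambles (an assumption; not quoted from the post).
GAMBLE_1A = {"1M": F(1)}
GAMBLE_1B = {"1M": F(89, 100), "5M": F(10, 100), "0": F(1, 100)}
GAMBLE_2A = {"1M": F(11, 100), "0": F(89, 100)}
GAMBLE_2B = {"5M": F(10, 100), "0": F(90, 100)}


def expected_utility(lottery, utility):
    """Probability-weighted average of the utilities of the outcomes."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())


def shows_allais_pattern(utility):
    """True if this utility assignment strictly prefers 1A to 1B and 2B to 2A."""
    prefers_1a = expected_utility(GAMBLE_1A, utility) > expected_utility(GAMBLE_1B, utility)
    prefers_2b = expected_utility(GAMBLE_2B, utility) > expected_utility(GAMBLE_2A, utility)
    return prefers_1a and prefers_2b


def count_allais_utilities(grid=range(0, 21)):
    """Count integer utility assignments on the grid that show the Allais pattern."""
    return sum(
        shows_allais_pattern({"0": u0, "1M": u1, "5M": u5})
        for u0, u1, u5 in product(grid, repeat=3)
    )


if __name__ == "__main__":
    print(count_allais_utilities())
```

The search finds no such assignment, and the algebra says why: preferring 1A to 1B requires 0.11·u($1M) > 0.10·u($5M) + 0.01·u($0), while preferring 2B to 2A requires the reverse inequality.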

When I read about the Allais paradox (in lukeprog's post, after he addressed your objection), my first thought was that this violation would occur when the cat was actually something very like an orange, such as a grapefruit. For example, suppose that the cat actually is an orange. So you prefer an apple to an orange, but you prefer an orange to a gamble which is 70% apple and 30% orange. The neoclassical utility theorist would explain this by saying that you prefer certainty to uncertainty, and would add a term for certainty to the utility function. Then, if the choice is really between 70% apple and 30% grapefruit versus 70% orange and 30% grapefruit, the latter is still more certain than the former (although not completely certain), and so might well be preferred.

This sounds like I'm trying to come up with a way to save utility theory, but actually that's not how it went. My immediate intuitive reaction to reading lukeprog's paraphrase of your example was ‹I'll bet that this happens when the cat is similar to the orange.›, without any conscious reasoning behind it, and it was only after thinking about this hypothesis that I realised that it suggested a way to save utility theory. So I'm quite curious: Does the Allais paradox appear only when the cat is similar to an orange, or does it also appear when the cat is (as the terms ‘apple’, ‘orange’, and ‘cat’ imply) really quite different?