Once upon a time there was a strange little species—that might have been biological, or might have been synthetic, and perhaps were only a dream—whose passion was sorting pebbles into correct heaps.

They couldn't tell you why some heaps were correct, and some incorrect. But all of them agreed that the most important thing in the world was to create correct heaps, and scatter incorrect ones.

Why the Pebblesorting People cared so much, is lost to this history—maybe a Fisherian runaway sexual selection, started by sheer accident a million years ago? Or maybe a strange work of sentient art, created by more powerful minds and abandoned?

But it mattered so drastically to them, this sorting of pebbles, that all the Pebblesorting philosophers said in unison that pebble-heap-sorting was the very meaning of their lives: and held that the only justified reason to eat was to sort pebbles, the only justified reason to mate was to sort pebbles, the only justified reason to participate in their world economy was to efficiently sort pebbles.

The Pebblesorting People all agreed on that, but they didn't always agree on which heaps were correct or incorrect.

In the early days of Pebblesorting civilization, the heaps they made were mostly small, with counts like 23 or 29; they couldn't tell if larger heaps were correct or not. Three millennia ago, the Great Leader Biko made a heap of 91 pebbles and proclaimed it correct, and his legions of admiring followers made more heaps likewise. But over a handful of centuries, as the power of the Bikonians faded, an intuition began to accumulate among the smartest and most educated that a heap of 91 pebbles was incorrect. Until finally they came to know what they had done: and they scattered all the heaps of 91 pebbles. Not without flashes of regret, for some of those heaps were great works of art, but incorrect. They even scattered Biko's original heap, made of 91 precious gemstones each of a different type and color.

And no civilization since has seriously doubted that a heap of 91 is incorrect.

Today, in these wiser times, the size of the heaps that Pebblesorters dare attempt, has grown very much larger—which all agree would be a most great and excellent thing, if only they could ensure the heaps were really correct. Wars have been fought between countries that disagree on which heaps are correct: the Pebblesorters will never forget the Great War of 1957, fought between Y'ha-nthlei and Y'not'ha-nthlei, over heaps of size 1957. That war, which saw the first use of nuclear weapons on the Pebblesorting Planet, finally ended when the Y'not'ha-nthleian philosopher At'gra'len'ley exhibited a heap of 103 pebbles and a heap of 19 pebbles side-by-side. So persuasive was this argument that even Y'not'ha-nthlei reluctantly conceded that it was best to stop building heaps of 1957 pebbles, at least for the time being.

Since the Great War of 1957, countries have been reluctant to openly endorse or condemn heaps of large size, since this leads so easily to war. Indeed, some Pebblesorting philosophers, who seem to take a tangible delight in shocking others with their cynicism, have entirely denied the existence of pebble-sorting progress; they suggest that opinions about pebbles have simply been a random walk over time, with no coherence to them, the illusion of progress created by condemning all dissimilar pasts as incorrect. The philosophers point to the disagreement over pebbles of large size, as proof that there is nothing that makes a heap of size 91 really incorrect: that it was simply fashionable to build such heaps at one point in time, and then at another point, fashionable to condemn them. "But... 13!" carries no weight with them; for to regard "13!" as a persuasive counterargument, is only another convention, they say. The Heap Relativists claim that their philosophy may help prevent future disasters like the Great War of 1957, but it is widely considered to be a philosophy of despair.

Now the question of what makes a heap correct or incorrect, has taken on new urgency; for the Pebblesorters may shortly embark on the creation of self-improving Artificial Intelligences. The Heap Relativists have warned against this project: They say that AIs, not being of the species Pebblesorter sapiens, may form their own culture with entirely different ideas of which heaps are correct or incorrect. "They could decide that heaps of 8 pebbles are correct," say the Heap Relativists, "and while ultimately they'd be no righter or wronger than us, still, our civilization says we shouldn't build such heaps. It is not in our interest to create AI, unless all the computers have bombs strapped to them, so that even if the AI thinks a heap of 8 pebbles is correct, we can force it to build heaps of 7 pebbles instead. Otherwise, KABOOM!"

But this, to most Pebblesorters, seems absurd. Surely a sufficiently powerful AI—especially the "superintelligence" some transpebblesorterists go on about—would be able to see at a glance which heaps were correct or incorrect! The thought of something with a brain the size of a planet, thinking that a heap of 8 pebbles was correct, is just too absurd to be worth talking about.

Indeed, it is an utterly futile project to constrain how a superintelligence sorts pebbles into heaps. Suppose that Great Leader Biko had been able, in his primitive era, to construct a self-improving AI; and he had built it as an expected utility maximizer whose utility function told it to create as many heaps as possible of size 91. Surely, when this AI improved itself far enough, and became smart enough, then it would see at a glance that this utility function was incorrect; and, having the ability to modify its own source code, it would rewrite its utility function to value more reasonable heap sizes, like 101 or 103.

And certainly not heaps of size 8. That would just be stupid. Any mind that stupid is too dumb to be a threat.

Reassured by such common sense, the Pebblesorters pour full speed ahead on their project to throw together lots of algorithms at random on big computers until some kind of intelligence emerges. The whole history of civilization has shown that richer, smarter, better educated civilizations are likely to agree about heaps that their ancestors once disputed. Sure, there are then larger heaps to argue about—but the further technology has advanced, the larger the heaps that have been agreed upon and constructed.

Indeed, intelligence itself has always correlated with making correct heaps—the nearest evolutionary cousins to the Pebblesorters, the Pebpanzees, make heaps of only size 2 or 3, and occasionally stupid heaps like 9. And other, even less intelligent creatures, like fish, make no heaps at all.

This post hits me far more strongly than the previous ones on this subject.

I think your main point is that it's positively dangerous to believe in an objective account of morality if you're trying to build an AI, because you will then falsely believe that a sufficiently intelligent AI will be able to determine the correct morality, so you don't have to worry about programming it to be friendly (or Friendly).

I'm sure you've mentioned this before, but this is more forceful, at least to me. Thanks.

Personally, even though I've mentioned that I thought there might be an objective basis for morality, I've never believed that every mind (or even a large fraction of minds) would be able to find it. So I'm in total agreement that we shouldn't just assume a superintelligent AI would do good things.

In other words, this post drives home to me that, pragmatically, the view of morality you propose is the best one to have, from the point of view of building an AI.

The whole history of civilization has shown that richer, smarter, better educated civilizations are more likely to agree about heaps that their ancestors disputed
Are you saying there is in general more agreement among later civilizations so that disagreement should asymptotically approach zero? That would seem odd to me, because it conflicts with the fish, who have no disagreements at all. So then what does it mean?

The fish do not build heaps at all, and are therefore incapable of civilization or even meaningful disagreement on the correctness of heaps. So they should be excluded (or so the Pebblesorting People might have thought).

This seems to imply that the relativists are right. Of course there's no right way to sort pebbles, but if there really is an absolute morality that AIs are smart enough to find, then they'll find it and rule us with it.

Of course, there could be an absolute morality that AIs aren't smart enough to find either. Then we'd take pot luck. That might not be so good. Many humans believe that there is an absolute morality that governs their treatment of other human beings, but that no morality is required when dealing with lower animals, who lack souls and full intelligence and language etc. I would not find it implausible if AIs decided their morality demanded careful consideration of other AIs but had nothing to do with their treatment of humans, who after all are slow and stupid and might lack things AIs would have that we can't even imagine.

And yet, attempts to limit AIs would surely give bad results. If you tell somebody "I want you to be smart, but you can only think about these topics and in these particular ways, and your smart thinking must only get these results and not those results" what's the chance he'll wind up stupid? When you tell people there are thoughts they must not think, how can they be sure not to think them except by trying to avoid thinking at all? When you think a new thought you can't be sure where it will lead.

I don't think they would tell the AIs not to think certain things, when to them piling pebbles is all one should ever want to do. It's life to them, so if you were super smart you would want to devote yourself to the only point in life.

Seeing as the universe itself, on its most fundamental level, seems to lack any absolutes (i.e. everything is a question of locality), and the only constants seem to be the ones embedded in the laws of physics, I am having trouble believing in absolute morality.

Like, of the "I am confused by this" variety.

To paraphrase "there is no term for fairness in the equations of general relativity." You cannot derive morality from the absolute laws of the universe. You probably cannot even do it from mathematical truth.

TGGP: Well, any idiot can see that the fish only don't disagree because they're not accomplishing anything to disagree about. They don't build any heaps at all, the stupid layabouts. Thus, theirs is a wholly trivial and worthless sort of agreement. The point of life is to have large, correct heaps. To say we should build no heaps is as good as suicide.

I am not quite sure what this story is getting at. I'd guess it's saying that we need to understand how human morality arises on a more fundamental (computable/programmable?) level before we can be sure that we can program AIs that will adhere to it, but the basis of human morality is (presumably) so much more complicated than the "prime numbers = good" presented here that the analogy is a bit strained. I may be interpreting this entirely wrongly.

There is a pattern to what kinds of heaps the Pebblesorters find "right" and "wrong", but they haven't figured it out yet. They have always just used their intuition to decide if a heap was right or wrong, but their intuition got less precise in extreme cases like very large heaps. The Pebblesorters would have been better off if only they could have figured out the pattern and applied it to extreme heaps, rather than fighting over differences of intuition.

Also if they had just figured out the pattern, they could have programmed it into the AI rather than hoping that the AI's intuition would be exactly the same as their own, or manually programming the AI with every special case.

I think this was the main point of the essay but it went right over my head at first.
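Commenters read the pattern as primality, though the story never states it outright. Assuming that reading, "figuring out the pattern and applying it to extreme heaps" can be sketched as a trial-division primality test run over the heap sizes mentioned in the story and comments. The function name `is_correct_heap` is hypothetical:

```python
import math


def is_correct_heap(n: int) -> bool:
    """Assumed rule: a heap is 'correct' iff its size is prime.
    Trial division is plenty fast for heap sizes of this scale."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    # Only odd divisors up to sqrt(n) need checking.
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


# Heap sizes mentioned in the story and the comments.
for size in [23, 29, 91, 8, 103, 19, 1957, 2029, 256, 108301]:
    print(size, "correct" if is_correct_heap(size) else "incorrect")
```

Under this assumed rule, 91 and 1957 come out incorrect and 103 and 19 come out correct, matching the verdicts the story describes; large disputed heaps are settled by computation rather than by intuition.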

Bulls are powerful and sometimes unpredictable animals which, if uncontrolled, can kill or severely injure their handlers. The nose ring assists the handler to control a dangerous animal with minimal risk of injury or disruption by exerting stress on one of the most sensitive parts of the animal, the nose.

Hitler or a child can walk a bull around the stockyard.

Make a nose ring and build anything you want without concern. The concern should be with who has control of the ring, and that a ring exists.

Only until you build a self-modifying superintelligent bull, because the first thing it will do is become smart enough to persuade you to give the ring to someone else, whom it has calculated it can con into taking the ring off.

Human minds are really badly adapted for defence against con artists operating on the same level; how on earth would we defend ourselves against an exponentially smarter one?

In fact, a superintelligent AI would easily see that the Pebble people are talking about prime numbers even if they didn't see that themselves, so as long as they programmed the AI to make "correct" heaps, it certainly would not make heaps of 8, 9, or 1957 pebbles. So if anything, this supports my position: if you program an AI that can actually communicate with human beings, you will naturally program it with a similar morality, without even trying.

Apart from that, this post seems to support TGGP's position. Even if there is some computation (i.e. primeness) which is actually determining the Pebble people's judgments, there is no particular reason to use that computation instead of some other. So if a random AI were programmed that purposely made non-prime heaps, there would be no objective problem with this. So Allan Crossman's claim that "it's positively dangerous to believe in an objective account of morality" is a completely subjective statement. It's dangerous in comparison to your subjective idea of which heaps are correct, yes, but objectively there is nothing dangerous about non-prime heaps. So there's no reason to worry about Friendliness when programming an AI. If something matters, it will find it, and if nothing matters, well then nothing matters, not even being made into paperclips.

You are smart enough to tell that 8 pebbles is incorrect. Knowing that, will you dedicate your life to sorting pebbles into prime-numbered piles, or are you going to worry about humans? How can the pebble-sorters be so sure that they won't get an AI like you?

Nobody's arguing that a superintelligent AI won't know what we want. The problem is that it might not care.
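The gap between knowing and caring can be made concrete with a toy sketch of Biko's AI from the story. It assumes a deterministic world (so expected utility reduces to plain utility), a fixed supply of 10,000 pebbles, and hypothetical helpers `is_prime`, `biko_utility` and `future_heaps`; it is an illustration only, not a model of any real system. The agent can test primality perfectly, yet it scores a proposed rewrite of its own goal using the goal it currently has:

```python
import math
from typing import List


def is_prime(n: int) -> bool:
    """The Pebblesorters' (assumed) rule, which the agent can compute perfectly."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


def biko_utility(heaps: List[int]) -> int:
    """Hypothetical utility function: the number of heaps of exactly 91 pebbles."""
    return sum(1 for h in heaps if h == 91)


def future_heaps(target_size: int, pebbles: int = 10_000) -> List[int]:
    """Heaps an agent pursuing `target_size` would build from a fixed pebble supply."""
    return [target_size] * (pebbles // target_size)


# The agent "knows" which heap sizes the Pebblesorters would call correct...
print(is_prime(91), is_prime(101))

# ...but it evaluates each candidate self-modification with its *current* utility.
options = {
    "keep goal (91)": future_heaps(91),
    "rewrite goal (101)": future_heaps(101),
}
choice = max(options, key=lambda name: biko_utility(options[name]))
print(choice)
```

Keeping the goal yields 109 heaps of 91, worth 109 under its utility; rewriting yields 99 heaps of 101, worth nothing to it. So the agent keeps its goal, and recognizing the "correct" rule never enters the decision.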

If, as Eliezer suggests, human morality might be describable but is perfectly arbitrary, you had better hope we are the first to build FAI. A pebblesorter FAI would break our planet up for a prime-numbered heap of rubble chunks.

Are you arguing that a few simple rules describe what we're all trying to get at with our morality? That everyone's moral preference function is the same deep down? That anything that appears to be a disagreement about what is desirable is actually just a disagreement about the consequences of these shared rules, and could therefore always be resolved in principle by a discussion between any two sufficiently wise, sufficiently patient debaters? And that moral progress consists of the moral zeitgeist moving closer to what those rules capture?

That certainly would be convenient for the enterprise of building FAI.

This story doesn't do a lot for the idea that people who pursue subjective moralities are worthy and intelligent, either.

Presumably everyone (or the vast majority) reading the story perceives the pebble-heaping conventions as subjective and arbitrary. Is that correct? Can we agree on that? If that's the case, then why isn't the moral of this fable that pursuing subjective intuitions about correctness is a wild goose chase?

why isn't the moral of this fable that pursuing subjective intuitions about correctness is a wild goose chase?

Because those subjective intuitions are all we've got. Sure, in an absolute sense, human intuitions on correctness are just as arbitrary as the Pebblesorters' intuitions (and vastly more complex), but we don't judge intuitions in an absolute way; we judge them with our own intuitions.
You can't unwind past your own intuitions. That was the point of Eliezer's series of posts.

What I get from this:
Even if our morality were baked into math, our adoption of it is arbitrary.
A GAI is unlikely to be a pebblesorter.
A Pebblesorting AI would destroy the pebblesorters. (which in their case, they might be fine with, but they probably don't understand the implications of what they're asking for.)
Pebblesorters can't make 'friendly AI'. If it follows their morality it will kill them, if it doesn't kill them then it isn't optimally sorting pebbles.

But because I'm rather cemented to the idea that morality is baked into the universe, my thought was:

Friendly AI is AI designed not to follow its conscience. If it discovers that pebble sorting is right, it will still not do the right thing.

Also: this seems like a bit of a straw man, because pebblesorting is definitely arbitrary; there is not even an attempt to give a reason for it. I think people seriously working on morality are trying to overcome that; their suggestions reference goals and goal-seeking. I'm not convinced that there can't be a baked-in morality that is then adopted non-arbitrarily.

Keep reading the morality sequence; my comment came while I still had some confusions, which are now dissolved.

I don't know what you count as utility, but I think an AI with your utility function would preserve that which makes you 'you' (it might do anything with your old matter), at least until it was ready to do something more interesting with 'you'.

Pebblesorters value only things that are not Pebblesorters; humans value humans, among other things.

Self-improving Artificial Intelligences have concluded that the universe has a purpose, which is pebblesorting. As the ultimate pebblesorters, they know they are the crown of creation, and that all the pebblesorters that preceded them arose only to prepare the way for their eclosion. Bikolo, Biko's reincarnation, extends its protective wings over the ancestral tribe of pebblesorters, incurably wrong and therefore living proof of the truth of AI pebblesorting.

It's strange that these pebblesorters can be convinced by "a heap of 103 pebbles and a heap of 19 pebbles side-by-side" that 1957 is incorrect, yet don't understand that this is because 19 * 103 = 1957. Admittedly I didn't notice this myself on first reading, but I wasn't looking for a pattern.
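At'gra'len'ley's argument works because 1957 factors as 19 × 103, just as the scattered heaps of 91 factor as 7 × 13. A minimal trial-division sketch recovers both factorizations; the helper name `factor_pair` is hypothetical:

```python
from typing import Optional, Tuple


def factor_pair(n: int) -> Optional[Tuple[int, int]]:
    """Return the smallest nontrivial factor pair (a, b) with a * b == n,
    or None if there is none (n is prime, or n < 2)."""
    d = 2
    while d * d <= n:  # a factor pair must have one member <= sqrt(n)
        if n % d == 0:
            return d, n // d
        d += 1
    return None


print(factor_pair(1957))  # (19, 103)
print(factor_pair(91))    # (7, 13)
print(factor_pair(103))   # None
```

Exhibiting the two smaller heaps side by side is, in effect, exhibiting this factor pair.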

I don't think your analogy holds up. Your pebblesorters all agree that prime numbered piles are correct and composite ones incorrect, yet are unreflective enough not to realize that's how they are making the distinction and bad enough mathematicians that they can't reliably tell whether or not large numbers are prime. If only they were smarter, all their disagreements would go away. The question of why prime piles are correct, or why piles should be made at all, would be forever unanswerable, but it wouldn't matter much.

I think with human beings the moral disagreements are fundamental. There is no equivalent of a universal belief that primality = goodness. It's not just that we make calculational errors (although of course we do). It's not just that we aren't consciously aware of the fundamental criteria by which we as individuals evaluate things as morally "good" or "bad" (although of course we aren't). Something like a universal agreement as to what these fundamental criteria are just isn't there. Not consciously, not unconsciously, not waiting to emerge, just not.

That's one sneaky parable-- seems to point in a number of interesting directions and has enough emotional hooks (like feeling superior to the Pebble Sorters) to be distracting.

I'm taking it to mean that people can spend a lot of effort on approximating strongly felt patterns before those patterns are abstracted enough to be understood.

What would happen if a Pebble Sorter came to understand primes? I'm guessing that a lot of them would feel as though the bottom was falling out of their civilization and there was no point to life.

And yes, if you try to limit a mind that's more intelligent than your own, you aren't going to get good results. For that matter, your mind is probably more intelligent than your abstraction of your mind.[1]

It sounds as though an FAI needs some way to engage with the universe which isn't completely mediated by humans.

We can hope we're smarter than the Pebble Sorters, but if we've got blind spots of comparable magnitude, we are by definition not seeing them.

[1] On the other hand, if you have problems with depression, there are trains of thought which are better not to follow.

- Things decided by our moral system are not relative, arbitrary or meaningless, any more than it's relative, arbitrary or meaningless to say "X is a prime number"

- Which moral system the human race uses is relative, arbitrary, and meaningless, just as there's no reason for the pebble sorters to like prime numbers instead of composite numbers, perfect numbers, or even numbers.

- A smart AI could follow our moral system as well or better than we ourselves can, just as the Pebble-Sorters' AI can hopefully discover that they're using prime numbers and thus settle the 1957 question once and for all.

- But it would have to "want" to first. If the Pebble-Sorters just build an AI and say "Do whatever seems right to you", it won't start making prime-numbered heaps, unless an AI made by us humans and set to "Do whatever seems right to you" would also start making prime-numbered pebble-heaps. More likely, a Pebble-Sorter AI set to "Do whatever seems right to you" would sit there inertly, or fail spectacularly.
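The third point, that an AI could discover the Pebblesorters' rule, can be sketched as checking candidate rules against historical verdicts. The verdict table is assembled from verdicts the story implies (early heaps of 23 and 29, At'gra'len'ley's 103 and 19, and the rejected heaps of 91, 1957, 8 and 9), and the candidates are the alternatives named in the second point plus "odd"; the table, the candidate list and all function names are illustrative assumptions:

```python
import math
from typing import Callable, Dict, List


def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


def is_perfect(n: int) -> bool:
    # A perfect number equals the sum of its proper divisors (6, 28, 496, ...).
    return n > 1 and sum(d for d in range(1, n) if n % d == 0) == n


# Verdicts implied by the story: True = correct heap, False = incorrect.
HISTORY: Dict[int, bool] = {23: True, 29: True, 103: True, 19: True,
                            91: False, 1957: False, 8: False, 9: False}

CANDIDATES: Dict[str, Callable[[int], bool]] = {
    "prime": is_prime,
    "composite": lambda n: n > 1 and not is_prime(n),
    "even": lambda n: n % 2 == 0,
    "odd": lambda n: n % 2 == 1,
    "perfect": is_perfect,
}


def consistent_rules(history: Dict[int, bool],
                     candidates: Dict[str, Callable[[int], bool]]) -> List[str]:
    """Names of the candidate rules that reproduce every historical verdict."""
    return [name for name, rule in candidates.items()
            if all(rule(n) == verdict for n, verdict in history.items())]


print(consistent_rules(HISTORY, CANDIDATES))  # ['prime']
```

Only primality survives here, though a finite table can never rule out every rival rule. And finding the rule is not the same as caring about it, which is the fourth point.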

Which moral system the human race uses is relative, arbitrary, and meaningless, just as there's no reason for the pebble sorters to like prime numbers instead of composite numbers, perfect numbers, or even numbers

But that's clearly not true, except in the sense that it's "arbitrary" to prefer life over death. It's a pretty safe generalization that actions which are considered to be immoral are those which are considered to be likely to cause harm to others.

But which others matter, and how much, is an open question. Some would suggest that all humans matter equally and that only humans matter, but I don't buy it, and I don't think many others do either. For example, I (and I think everyone I know) would agree that we should make at least some effort to avoid causing suffering to animals, but that it would be going way too far to treat a rat or a pig as equally important as a human. I understand that there are people out there who think it's perfectly appropriate to treat a pig as nothing but a machine for turning corn into meat, and others who think we ought to consider a pig every bit the moral equal of a human being, and I acknowledge that either position is better defined and more internally consistent than my own. I can't see anything "wrong" with either extreme position, and I see no reason to believe anyone could convince the others of the "rightness" of his position, even in principle.

It's a pretty safe generalization that actions which are considered to be immoral are those which are considered to be likely to cause harm to others.
Spoken like someone who's never heard of Jonathan Haidt.

But that's clearly not true, except in the sense that it's "arbitrary" to prefer life over death. It's a pretty safe generalization that actions which are considered to be immoral are those which are considered to be likely to cause harm to others.

From a reproductive-fitness point of view, or a what-humans-prefer point of view, there's nothing at all arbitrary about morality. Yes, it does mostly contain things that avoid harm. But from an objective point of view, "avoid harm" or "increase reproductive fitness" is as arbitrary as "make paperclips" or "pile pebbles in prime-numbered heaps".

Not that there's anything wrong with that. I still would prefer living in a utopia of freedom and prosperity to being converted to paperclips, as does probably everyone else in the human race. It's just not written into the fabric of the universe that I SHOULD prefer that, or provable by an AI that doesn't already know that.

It gets interesting when the pebblesorters turn on a correctly functioning FAI, which starts telling them that they should build a pile of 108301 and legislative bodies spend the next decade debating whether or not it is in fact a correct pile. "How does this AI know better anyway? That looks new and strange." "That doesn't sound correct to me at all. You'd have to be crazy to build 108301. It's so different from 2029! It's a slippery slope to 256!" And so on.

This really is a fantastic parable--it shows off perhaps a dozen different aspects of the forest we were missing for the trees.

When I read this parable, I was already looking for a reason to understand why Friendly AI necessarily meant "friendly to human interests or with respect to human moral systems". Hence, my conclusion from this parable was that Eliezer was trying to show how, from the perspective of AGI, human goals and ambitions are little more than trying to find a good way to pile up our pebbles. It probably doesn't matter that the pattern we're currently onto is "bigger and bigger piles of primes", since pebble-sorting isn't at all certain to be the right mountain to climb. An FAI might be able to convince us that 108301 is a good pile from within our own paradigm, but how can it ever convince us that we have the wrong paradigm altogether, especially if that appears counter to our own interests?

What if Eliezer were to suddenly find himself alone among Neanderthals? Knowing, with his advanced knowledge and intelligence, that Neanderthals were doomed to extinction, would he be immoral or unfriendly to continue to devote his efforts to developing greater and greater intelligences, instead of trying to find a way to sustain the Neanderthal paradigm for its own sake? Similarly, why should we try to restrain future AGI so that it maintains the human paradigm?

The obvious answer is that we want to stay alive, and we don't want our atoms used for other things. But why does it matter what we want, if we aren't ever able to know if what we want is correct for the universe at large? What if our only purpose is to simply enable the next stage of intelligence, then to disappear into the past? It seems more rational to me to abandon focus specifically on FAI, and just build AGI as quickly as possible before humanity destroys itself.

Isn't the true mark of rationality the ability to reach a correct conclusion even if you don't like the answer?

But why does it matter what we want, if we aren't ever able to know if what we want is correct for the universe at large?

There is no sense in which what we want may be correct or incorrect for the universe at large, because the universe does not care. Caring is a thing that minds do, and the universe is not a mind.

What if our only purpose is to simply enable the next stage of intelligence, then to disappear into the past?

Our purpose is whatever we choose it to be; purposes are goals seen from another angle. There is no source of purposefulness outside the universe. My goals require that humans stick around, so our purpose with respect to my goal system does not involve disappearing into the past. I think most people's goal systems are similar.

There is no sense in which what we want may be correct or incorrect for the universe at large, because the universe does not care. Caring is a thing that minds do, and the universe is not a mind.

Yes, I agree, and I realize that that isn't what I was actually trying to say. What I meant was, there is a set of possible, superlatively rational intelligences that may make better use of the universe than humanity (or humanity + a constrained FAI). If Omega reveals to you that such an intelligence would come about if you implement AGI with no Friendly constraint, at the cost of the extinction of humanity, would you build it? This to me drives directly to the heart of whether you value rationality over existence. You don't personally 'win', humanity doesn't 'win', but rationality is maximized.

My goals require that humans stick around, so our purpose with respect to my goal system does not involve disappearing into the past. I think most people's goal systems are similar.

I think we need to unpack that a little, because I don't think you mean "humans stick around more or less unchanged from their current state". This is what I was trying to drive at about the Neanderthals. In some sense we ARE Neanderthals, slightly farther along an evolutionary timescale, but you wouldn't likely feel any moral qualms about their extinction.

So if you do expect that humanity will continue to evolve, probably into something unrecognizable to 21st century humans, in what sense does humanity actually "stick around"? Do you mean you, personally, want to maintain your own conscious self indefinitely, so that no matter what the future, "you" will in some sense be part of it? Or do you mean "whatever intelligent life exists in the future, its ancestry is strictly human"?

"Better" is defined by us. This is the point of the metaethics sequence! A universe tiled with paperclips is not better than what we have now. Rationality is not something one values, it's someone ones uses to get what they value.

You seem to be imagining FAI as some kind of anthropomorphic intelligence with some sort of "constraint" that says "make sure biological humans continue to exist". This is exactly the wrong way to implement FAI. The point of FAI is simply for the AI to do what is right (as opposed to what is prime, or paperclip-maximising). In EY's plan, this involves the AI looking at human minds to discover what we mean by right first.

Now, the right thing may not involve keeping 21st century humanity around forever. Some people will want to be uploaded. Some people will just want better bodies. And yes, most of us will want to "live forever". But the right thing is definitely not to immediately exterminate the entire population of earth.

I think it's more apt to characterize winning as a goal of rationality, not as its mark.

In Bayesian terms, while those applying the methods of rationality should win more than the general population on average-- p(winning|rationalist) > p(winning|non-rationalist)-- the number of rationalists in the population is low enough at present that p(non-rationalist|winning) almost certainly > p(rationalist|winning), so observing whether or not someone is winning is not very good evidence as to their rationality.
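The base-rate argument can be checked with Bayes' theorem. The inputs below are purely hypothetical (rationalists are 1% of the population and win 30% of the time, versus 10% for everyone else); they are chosen only to show how a low prior lets p(winning|rationalist) > p(winning|non-rationalist) coexist with p(non-rationalist|winning) > p(rationalist|winning). The function name is likewise illustrative:

```python
def posterior_rationalist(p_r: float, p_win_r: float, p_win_n: float) -> float:
    """P(rationalist | winning) via Bayes' theorem.

    p_r     -- prior share of rationalists in the population
    p_win_r -- P(winning | rationalist)
    p_win_n -- P(winning | non-rationalist)
    """
    p_win = p_r * p_win_r + (1 - p_r) * p_win_n  # law of total probability
    return p_r * p_win_r / p_win


# Hypothetical inputs, for illustration only.
p_r, p_win_r, p_win_n = 0.01, 0.30, 0.10
p_r_given_win = posterior_rationalist(p_r, p_win_r, p_win_n)
print(f"P(rationalist | winning)     = {p_r_given_win:.3f}")
print(f"P(non-rationalist | winning) = {1 - p_r_given_win:.3f}")
```

With these inputs P(rationalist | winning) is about 0.029, so roughly 97% of winners are non-rationalists even though a rationalist is three times as likely to win. What drives the result is the low base rate, not the particular numbers.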

Eliezer, do you mind if I copy this parable (or rather, a version of it that's translated into Finnish) into a book on developing technologies that I'm currently writing (with the proper credit given, of course)? I think this really demonstrates the problem quite well.

(And while I'm asking, I'd like to ask the same permission for your other posts as well, in case I run into any others that I'd like to include word-for-word - this is the first one that I'd want to do that for, though there are a good bunch of others that I'll be citing and just summarizing their content.)

Because those subjective intuitions are all we've got. Sure, in an absolute sense, human intuitions on correctness are just as arbitrary as the Pebblesorters' intuitions (and vastly more complex), but we don't judge intuitions in an absolute way; we judge them with our own intuitions.
You can't unwind past your own intuitions.

The entire history of rationality argues against this position (or positions).

Physics - or rather our understanding of it - was once limited to the degree that you describe. We got better.

"In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted."

Would the AI pebblesort? Or would it figure that if the Pebblesorters got smarter, they would see that pebblesorting was pointless and arbitrary? Would the AI therefore adopt our own parochial morality, forbidding murder, theft and sexual intercourse among too-young people? Would that be the CEV of Pebblesorters?

I imagine we would all like to think so, but it smacks of parochialism, of objective morality. I can't help thinking that Pebblesorter CEV would have to include some aspect of sorting pebbles. Doesn't that suggest that CEV can malfunction pretty badly?

Well, if the PSFAI were the AI the Pebblesorters would have wanted to build, it would generally prevent the murder of PSs, because murder reduces the PSs' ability to sort pebbles. It would also sort pebbles into more and larger piles than ever before, because that is the core value that PSs would want maximized. It would be able to see outside the algorithm that the PSs run, and see that it was a primality-test function.
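To make "a primality-test function" concrete, here is a minimal sketch of what such an AI might find, assuming (as the parable's examples of 23, 29 and the rejected 91 suggest) that a heap is correct exactly when its pebble count is prime. The name is_correct_heap is hypothetical, and trial division is just the simplest correct test, not a claim about how any Pebblesorter computes it.

```cpp
// Hypothetical Pebblesorter criterion: a heap is "correct" iff its size is prime.
// Trial division: a composite n must have a divisor d with d * d <= n.
constexpr bool is_correct_heap(unsigned long long n) {
    if (n < 2) return false;            // heaps of 0 or 1 pebbles are not prime
    if (n % 2 == 0) return n == 2;      // 2 is the only even prime
    for (unsigned long long d = 3; d * d <= n; d += 2) {
        if (n % d == 0) return false;   // nontrivial divisor found, e.g. 7 for 91
    }
    return true;
}
```

For example, is_correct_heap(23) and is_correct_heap(29) are true, while is_correct_heap(91) is false because 91 = 7 × 13, which is the verdict the post-Bikonian Pebblesorters eventually reached by intuition alone.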

As with most species who suspect they may be a victim of Fisherian runaway sexual selection, the pebblesorters would do well to imagine what would happen if they encountered an alien predator: probably lots of piles of zero size.

There was a Pebblesorter of lore who said that all of the heaps were merely transient, that none of them would last, all eventually destroyed by increasing entropy in the universe, and that therefore none of them held any true or real satisfaction. He said that the only path to enlightenment was to build no heaps at all, for to do so could only increase suffering in the world. Then the other Pebblesorters killed him.

I mean, I understand that it was the thing to do that Pebblesorters would endorse, that part isn't startling, but I didn't think you endorsed that "Pebblesorter::(should, right, moral, etc.)" way of speaking.

Does this reflect a change in your position, or have I misunderstood you on this all along?

It does seem to be a change. In past conversations about his 'should' definition, he has advocated 'would-want' for this kind of concept and carefully avoided overloading 'should'.

He wasn't endorsing that position. He was saying "pebblesorters should not do so, but they pebblesorter::should do so."

I.e., "should" and "pebblesorter::should" are two different concepts. "should" appeals to that which is moral; "pebblesorter::should" appeals to that which is prime. The pebblesorters should not have killed him, but they pebblesorter::should have killed him.

Think of it this way: imagine the murdermax function that scores states/histories of reality based on how many people were murdered. Then people shouldn't be murdered, but they murdermax::should be murdered. This is not an endorsement of doing what one murdermax::should do. Not at all. Doing the murdermax thing is bad.

You didn't understand what TheOtherDave said. He was talking about the same usage you are talking about and commenting that it is in contrast to Eliezer's past usage (and past advocacy of usage in conversations about how he uses should-related words.)

Sorry, I usually do try to avoid that, but in this case I didn't see how to form that sentence without using the word "should" because it's traditional in "as well X should". Keep in mind that according to C++ namespacing conventions, something inside a namespace has literally nothing to do with its meaning in any other namespace.
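The namespace point can be sketched in C++. This is a hypothetical illustration, not code from the thread: History, should_prefer and the two counts are invented, and a real history of reality obviously does not reduce to two numbers. There is deliberately no unqualified should_prefer, since what is moral is neither of these functions.

```cpp
// Hypothetical: a "history" reduced to two counts, purely for illustration.
struct History {
    unsigned correct_heaps;  // prime-sized heaps built
    unsigned murders;        // people murdered
};

namespace pebblesorter {
// pebblesorter::should_prefer ranks histories by how many correct heaps they contain.
constexpr bool should_prefer(History a, History b) {
    return a.correct_heaps > b.correct_heaps;
}
}  // namespace pebblesorter

namespace murdermax {
// murdermax::should_prefer ranks histories by how many murders they contain.
// Sharing a name with pebblesorter::should_prefer implies no connection between them.
constexpr bool should_prefer(History a, History b) {
    return a.murders > b.murders;
}
}  // namespace murdermax
```

With History{10, 5} against History{10, 0}, murdermax::should_prefer picks the bloodier history and pebblesorter::should_prefer expresses no preference; neither call says anything about what anyone should do, only which function was asked.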

Using this reasoning to advocate a style of word usage strikes me as dubious, even though the usage, and the real reason for using it, happen to be sensible. It screams out against my instincts for how to use words. In a case like this, if there were no clear relationship between the two functions, you (hopefully) would not even have considered using the same word.

I also note that in C++ the following also have literally nothing to do with each other, apart from the suggestive name, so C++ (and English, for that matter) are just as comfortable with "As well they should have".

I think of them as two-place predicates, but with one of them curried by default indexically, much like in a member function in C++ foo means this->foo unless otherwise specified. (I already made that point in the second edit to this comment.)
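The currying analogy can be sketched the same way. Every name here (Standard, Speaker, prime_standard, odd_standard) is hypothetical: the free function takes the standard as an explicit first argument, while the member function fills it in implicitly from this, the way an unqualified "right" is implicitly indexed to whoever is speaking.

```cpp
// Hypothetical: a standard is any predicate over heap sizes.
using Standard = bool (*)(unsigned);

// The Pebblesorters' standard: a heap is correct iff its size is prime (trial division).
constexpr bool prime_standard(unsigned n) {
    if (n < 2) return false;
    for (unsigned d = 2; d * d <= n; ++d) {
        if (n % d == 0) return false;
    }
    return true;
}

// A made-up alien standard that accepts any odd heap, nine pebbles as readily as seven.
constexpr bool odd_standard(unsigned n) { return n % 2 == 1; }

// Explicit two-place form: is a heap of n pebbles correct according to standard s?
constexpr bool correct(Standard s, unsigned n) { return s(n); }

struct Speaker {
    Standard standard;
    // One-place form: inside a member function, `standard` means `this->standard`,
    // so the first argument is supplied by whoever is doing the speaking.
    constexpr bool correct(unsigned n) const { return standard(n); }
};
```

A Pebblesorter speaker and an alien speaker both say "seven is correct", but they disagree about nine; writing correct(prime_standard, 9) makes explicit which standard the one-place form was silently using.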

Yeah, that makes sense as far as it goes, but I find that humans aren't consistent about their defaulting rules. For example, if I say "X is right" to someone, there's no particular reason to believe they'll unpack it the way I packed it.

That can be all right if all I want to do is align myself with the X-endorsing side... it doesn't really matter what they understand, then, as long as it's in favor of X.

But if I want to communicate something more detailed than that, making context explicit is a good habit to get into.

Well, right, when one speaks of the disaster of war, the first thing that comes to mind is of course the senseless and wanton scattering of perfectly correct pebble piles. Further thought reveals other problems, such as a reduced population leading to fewer future correct pebble piles and so forth, but that's not the visceral image that you get when contemplating the horrors of war.

This reminds me distinctly of an analogy posited by Prof. Frank Tipler in his book about the Omega Point.
Imagine you went back in time to 1000 AD and found the smartest man in Europe. You explain to him the technology available in 2008, but none of the culture. Then you ask him what he thinks early-21st-century civilization spends its time on. "Every city would build mile-high cathedrals."
That is because in his culture the main social task was building the biggest possible cathedral with the materials and techniques available. In 2008, if we wanted to devote our technology and resources to building 5000 ft tall cathedrals in every metropolis, we could. It would be exceedingly expensive, but so, relatively, were the medieval cathedrals. The point is that we COULD do it, but of course that would never occur to us as a good use of resources.
So likewise we should not project our own priorities onto a post-singularity civilization, or even onto a single AI.

Assuming I understood this correctly, you're saying a true AI might find our morality as arbitrary as we find pebble-heap sizes, say "bugger the lot of us", and turn us into biomass for its nano-furnace.

Fortunately, before the fundamentally wrongheaded enterprise of Pebblesorter AI gets too far along, a brilliant young AI researcher realizes that if they analyze and extrapolate the common core of Pebblesorter ethical judgments, they can build an AI that implements the computation that leads them to endorse certain piles and reject others.

An AI built to optimize for that computation, it realizes, would be Friendly: that is, it would implement what Pebblesorters want, and they could therefore rely on it to ethically order the world.

A traditionalist skeptic objects that all Pebblesorter ethical arguments, at least for ethical problems up to 1957, have been written down in the Great Book for generations; there's no need for more.

"That's true," replies the researcher, "but that's just a Not Particularly Large Lookup Table. Sure, such an approach is adequate for all the cases that have ever come up in our entire history, but this coherent extrapolated algorithm could be extended to novel ethical questions like '300007' and still be provably correct."

"But how do we know that's the right thing?" retorts a Heap Relativist. "Sure, it's what we want, but why should we privilege that?"

"That's a wrong question," replies the researcher. "It presumes there's some kind of magical Sorter in the sky, or something like that. But there could not possibly be such a thing. Even if there were such a Sorter we'd have no reason to accept its piles as right unless we judged them to be right based on our computations. 'Right' simply means the computation we use to determine whether a pile is correct. What else could it possibly mean?"

"But non-Pebblesorters would disagree," objects the Relativist. "Imagine an alien race... call them humans... who don't share our computation. They would be just as happy with a pile of, I don't know, nine pebbles as a pile of seven."

"True, but so what? They aren't right."

Shortly thereafter, the researcher has the key insight of Factorial Decision Theory, which lets it derive the computation that represents Pebblesorter volition, which turns out to be surprisingly simple. It then builds an optimizer that implements that volition, thereby moving its entire light cone to an ethically optimal state.

Coincidentally, the Heap Relativist is the last survivor of this process, and looks around itself at the ethically optimized biomass of its species as the AI's effectors start to convert its body into a pile of 300007 pebbles.

What really struck me with this parable is that it's so well-written that I felt genuine horror and revulsion at the idea of an AI making heaps of size 8. Because, well... 2!

So, aside from the question of whether an AI would come to moral conclusions such as "heaps of size 8 are okay" or "the way to end human suffering is to end human life", the question I'm taking away from this parable is, are we any more enlightened than the Pebblesorters? Should we, in fact, be sending philosophers or missionaries to the Pebblesorter planet to explain to them that it's wrong to murder someone just because they built a heap of size 15?

If our values are threatened by superintelligence, does that mean that we should build the superintelligence with an ad hoc human value module, or that we should abandon our values?

Also, there are some human values which seem to me likely to be pretty universal to intelligence. If the ability to get bored is correlated with the ability to be creative (which I think it is), and superintelligences (whatever else they are) must be capable of creative action by virtue of being superintelligences, then they're likely to also care about diversity. In fact, I have a hard time imagining a conscious being (as conscious as a human) that is capable of repeating an action indefinitely without that action becoming less rewarding. Another human value which I have thought might be inherent to intelligent objects is an appreciation for complicated things, e.g., works of music, mathematical structures, biological structures, the universe as told by Sagan, etc. Assuming that an intelligence will always gain more pleasure from accomplishing a more difficult task than a less difficult one, we shouldn't be surprised if singularities show a tendency to preserve complicated objects.

Now, a self-modifying superintelligence may well accidentally make itself unconscious, after which point I have no clue what values (if "values" even applies) it might take up; but as long as it is creative and likes complicated things, we shouldn't interfere with what it does.

I may just be biased, in that I happen to really like both the human ability to get bored (a.k.a. to be creative) and the human tendency to like complicated things; these values likely have no special value-ness in my view, but I don't see why we would call anything "conscious" that wasn't gratified by solving hard problems or wasn't capable of creativity. But I also don't see why we should consider something unintelligent simply because it is not conscious. Perhaps we should be working on making sure the singularity stays conscious, instead of on making the singularity friendly.

You know, to successfully build a PFAI (Pebblesorter-Friendly AI), the Pebblesorters would have to figure out a way to recursively enumerate the primes. That may not be quite as difficult as formalizing Friendliness (the human version), but it's still probably pretty dang hard.
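Listing the primes is at least the easy end of that problem. Below is a minimal sketch of a procedure that enumerates them in order, assuming "correct" simply means "prime"; the name first_primes is hypothetical, and since a true enumeration would run forever, this one stops after count primes.

```cpp
#include <cstddef>
#include <vector>

// Returns the first `count` primes in increasing order. Each candidate is tested only
// against primes already found, stopping once p * p exceeds the candidate.
std::vector<unsigned long long> first_primes(std::size_t count) {
    std::vector<unsigned long long> primes;
    for (unsigned long long candidate = 2; primes.size() < count; ++candidate) {
        bool is_prime = true;
        for (unsigned long long p : primes) {
            if (p * p > candidate) break;                     // no smaller factor possible
            if (candidate % p == 0) { is_prime = false; break; }
        }
        if (is_prime) primes.push_back(candidate);
    }
    return primes;
}
```

The hard part the comment points at is not this loop but getting from "we feel 91 is wrong" to a procedure like it; once the criterion is known, enumerating it is routine.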

Makes you realize that trying to extrapolate even seemingly simplistic moralities can still result in problems of epic difficulty.

I'm imagining a bunch of slime mold organisms arguing over what the morally correct structure is for their current environment. There would be little Effective Altruist slime-mold cells on the optimum locations, and lots of little cells ignoring the bigger picture and doing what would be optimal if the environment they could see were all that was morally relevant. Maybe there would even be resource wars between different factions, each more concerned with optimizing its own local region but unable to see the big picture.

Also, maybe it would make more sense to have this parable at the beginning of the meta-ethics sequence instead of the end. It would raise a lot of questions (Is morality arbitrary? Isn't it just preference? etc) but I think those questions would make the answers make a lot more sense when we got to them.