The Absolute Self-Selection Assumption

There are many confused discussions of anthropic reasoning, both on LW and in surprisingly mainstream literature. In this article I will discuss UDASSA, a framework for anthropic reasoning due to Wei Dai. This framework has serious shortcomings, but at present it is the only one I know which produces reasonable answers to reasonable questions; at the moment it is the only framework which I would feel comfortable using to make a real decision.

I will discuss 3 problems:

1. In an infinite universe, there are infinitely many copies of you (infinitely many of which are Boltzmann brains). How do you assign a measure to the copies of yourself when the uniform distribution is unavailable? Do you rule out spatially or temporally infinite universes for this reason?

2. Naive anthropics ignores the substrate on which a simulation is running and counts how many instances of a simulated experience exist (or how many distinct versions of that experience exist). These beliefs are inconsistent with basic intuitions about conscious experience, so we have to abandon something intuitive.

3. The Born probabilities seem mysterious. They can be explained (as well as any law of physics can be explained) by UDASSA.

Why Anthropic Reasoning?

When I am trying to act in my own self-interest, I do not know with certainty the consequences of any particular decision. I compare probability distributions over outcomes: an action may lead to one outcome with probability 1/2, and a different outcome with probability 1/2. My brain has preferences between probability distributions built into it.

My brain is not built with the machinery to decide between different universes each of which contains many simulations I care about. My brain can’t even really grasp the notion of different copies of me, except by first converting to the language of probability distributions. If I am facing the prospect of being copied, the only way I can grapple with it is by reasoning “I have a 50% chance of remaining me, and a 50% chance of becoming my copy.” After thinking in this way, I can hope to intelligently trade off one copy’s preferences against the other’s using the same machinery which allows me to make decisions with uncertain outcomes.

In order to perform this reasoning in general, I need a better framework for anthropic reasoning. What I want is a probability distribution over all possible experiences (or “observer-moments”), so that I can use my existing preferences to make intelligent decisions in a universe with more than one observer I care about.

I am going to leave many questions unresolved. I don’t understand continuity of experience or identity, so I am simply not going to try to be selfish (I don’t know how). I don’t understand what constitutes conscious experience, so I am not going to try to explain it. I have to rely on a complexity prior, which involves an unacceptably arbitrary choice of a notion of complexity.

The Absolute Self-Selection Assumption

A thinker using Solomonoff induction searches for the simplest explanation for its own experiences. It eventually learns that the simplest explanation for its experiences is the description of an external lawful universe in which its sense organs are embedded and a description of that embedding.

As humans using Solomonoff induction, we go on to argue that this external lawful universe is real, and that our conscious experience is a consequence of the existence of certain substructure in that universe. The absolute self-selection assumption discards this additional step. Rather than supposing that the probability of a certain universe depends on the complexity of that universe, it takes as a primitive object a probability distribution over possible experiences.

By the same reasoning that led a normal Solomonoff inductor to accept the existence of an external universe as the best explanation for its experiences, the least complex description of your conscious experience is the description of an external lawful universe and directions for finding the substructure embodying your experience within that universe.

This requires specifying a notion of complexity. I will choose a universal computable distribution over strings for now, to mimic conventional Solomonoff induction as closely as possible (and because I know nothing better). The resulting theory is called UDASSA, for Universal Distribution + ASSA.
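As a concrete illustration, a complexity prior can be mimicked by weighting each binary description string by 2 to the minus its length. This is only a toy stand-in (my construction, not part of the post): the real universal distribution runs descriptions through a universal Turing machine and is uncomputable.

```python
# Toy stand-in for a universal distribution: weight each binary description
# by 2^-length. (Assumption: this sketch ignores the universal Turing machine
# that interprets descriptions; the real distribution is uncomputable.)

def prior_weight(description: str) -> float:
    """Weight of a binary description string under the toy 2^-length prior."""
    assert set(description) <= {"0", "1"}
    return 2.0 ** -len(description)

# By the Kraft inequality, the weights of any prefix-free set of descriptions
# sum to at most 1, so this really does behave like a probability distribution.
prefix_free_code = ["0", "10", "110", "111"]
total = sum(prior_weight(c) for c in prefix_free_code)
print(total)  # this particular code is complete, so the weights sum to 1.0
```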

Recovering Intuitive Anthropics

Suppose I create a perfect copy of myself. Intuitively, I would like to weight the two copies equally. Similarly, my anthropic notion of “probability of an experience” should match up with my intuitive notion of probability. Fortunately, UDASSA recovers intuitive anthropics in intuitive situations.

The shortest description of me is a pair (U, x), where U is a description of my universe and x is a description of where to find me in that universe. If there are two copies of me in the universe, then the experience of each can be described in the same way: (U, x1) and (U, x2) are descriptions of approximately equal complexity, so I weight the experience of each copy equally. The total experience of my copies is weighted twice as much as the total experience of an uncopied individual.
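Under a toy 2^-length prior, the arithmetic of copying is simple: two pointers of roughly equal length double the total measure. The bit counts below are hypothetical numbers chosen purely for illustration.

```python
def weight(description_bits: int) -> float:
    # Measure assigned to an observer whose shortest description uses the
    # given number of bits, under a toy 2^-length prior.
    return 2.0 ** -description_bits

# Hypothetical sizes: the universe U takes 100 bits, a pointer x takes 20.
uncopied = weight(100 + 20)
# After copying, pointers x1 and x2 have approximately the same length,
# so each copy gets the same measure the original did.
copied_total = weight(100 + 20) + weight(100 + 20)
print(copied_total / uncopied)  # 2.0: the copies jointly carry twice the measure
```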

Part of x is a description of how to navigate the randomness of the universe. For example, if the last (truly random) coin I saw flipped came up heads, then in order to specify my experiences you need to specify the result of that coin flip. An equal number of equally complex descriptions point to the version of me who saw heads and the version of me who saw tails.

Problem #1: Infinite Cosmologies

Modern physics is consistent with infinite universes. An infinite universe contains infinitely many observers (infinitely many of which share all of your experiences so far), and it is no longer sensible to talk about the “uniform distribution” over all of them. You could imagine taking a limit over larger and larger volumes, but there is no particular reason to suspect such a limit would converge in a meaningful sense. One solution that has been suggested is to choose an arbitrary but very large volume of spacetime, and to use a uniform distribution over observers within it. Another solution is to conclude that infinite universes can’t exist. Both of these solutions are unsatisfactory.

UDASSA provides a different solution. The probability of an experience depends exponentially on the complexity of specifying it. Just existing in an infinite universe with a short description does not guarantee that you yourself have a short description; you need to specify a position within that infinite universe. For example, if your experiences occur 34908172349823478132239471230912349726323948123123991230 steps after some naturally specified time 0, then the (somewhat lengthy) description of that time is necessary to describe your experiences. Thus the total measure of all observer-moments within a universe is finite.
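To see why the total converges: a self-delimiting encoding of a position n costs roughly log2(n) plus a little more, so an observer at position n gets weight roughly 1/(n·log²n), and those weights have a finite sum. The particular code-length formula below is my own crude choice; real codes differ by additive constants, which rescale the measure but cannot make the sum diverge.

```python
import math

def position_cost_bits(n: int) -> float:
    # Crude self-delimiting code length for the positive integer n.
    # (Assumption: any real prefix code differs from this by constants.)
    if n == 1:
        return 1.0
    return math.log2(n) + 2.0 * math.log2(math.log2(n) + 1.0)

def total_measure(up_to: int) -> float:
    # Partial sum of 2^-cost over the first `up_to` positions.
    return sum(2.0 ** -position_cost_bits(n) for n in range(1, up_to + 1))

# The partial sums increase but stay bounded: observers at later positions
# are exponentially penalized by the complexity of specifying the position.
print(total_measure(1_000), total_measure(100_000))
```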

Problem #2: Splitting Simulations

Consider a computer which is 2 atoms thick running a simulation of you. Suppose this computer can be divided down the middle into two 1 atom thick computers which would both run the same simulation independently. We are faced with an unfortunate dichotomy: either the 2 atom thick simulation has the same weight as two 1 atom thick simulations put together, or it doesn’t.

In the first case, we have to accept that some computer simulations count for more, even if they are running the same simulation (or we have to de-duplicate the set of all experiences, which leads to serious problems with Boltzmann brains). In this case, we are faced with the problem of comparing different substrates, and it seems impossible not to make arbitrary choices.

In the second case, we have to accept that the operation of dividing the 2 atom thick computer has moral value, which is even worse. Where exactly does the transition occur? What if each layer of the 2 atom thick computer can run independently before splitting? Is physical contact really significant? What about computers that aren’t physically coherent? What if two 1 atom thick computers periodically synchronize themselves and self-destruct if they aren’t synchronized: does this synchronization effectively destroy one of the copies? I know of no way to accept this possibility without extremely counter-intuitive consequences.

UDASSA implies that simulations on the 2 atom thick computer count for twice as much as simulations on the 1 atom thick computer, because they are easier to specify. Given a description of one of the 1 atom thick computers, there are two descriptions of equal complexity that point to the simulation running on the 2 atom thick computer: one description pointing to each layer of the 2 atom thick computer. When a 2 atom thick computer splits, the total number of descriptions pointing to the experience it is simulating doesn’t change.

Problem #3: The Born Probabilities

A quantum mechanical state can be described as a linear combination of “classical” configurations. For some reason we appear to experience ourselves as being in one of these classical configurations with probability proportional to the squared coefficient of that configuration. These probabilities are called the Born probabilities, and are sometimes described either as a serious problem for MWI or as an unresolved mystery of the universe.

What happens if we apply UDASSA to a quantum universe? For one, the existence of an observer within the universe doesn’t say anything about conscious experience. We need to specify an algorithm for extracting a description of that observer from a description of the universe.

Consider the randomized algorithm A: compute the state of the universe at time t, then sample a classical configuration with probability proportional to its squared inner product with the universal wavefunction.

Consider the randomized algorithm B: compute the state of the universe at time t, then sample a classical configuration with probability proportional to its inner product with the universal wavefunction.

Using either A or B, we can describe a single experience by specifying a random seed, and picking out that experience within the classical configuration output by A or B using that random seed. If this is the shortest explanation of an experience, the probability of an experience is proportional to the number of random seeds which produce classical configurations containing it.
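The two selection rules are easy to state as code. A minimal sketch, assuming a toy wavefunction given as a short list of real amplitudes (a real universal wavefunction would be a vastly larger complex vector):

```python
import random

def sample_configuration(amplitudes, rule, rng):
    """Sample the index of a classical configuration.
    rule 'A' weights each configuration by |amplitude|^2 (the Born rule);
    rule 'B' weights it by |amplitude|."""
    exponent = 2 if rule == "A" else 1
    weights = [abs(a) ** exponent for a in amplitudes]
    r = rng.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r < 0:
            return i
    return len(weights) - 1  # guard against floating-point round-off

# Toy superposition with amplitudes 0.8 and 0.6 (squared weights 0.64 and 0.36).
psi = [0.8, 0.6]
rng = random.Random(0)
n = 100_000
freq_a = sum(sample_configuration(psi, "A", rng) for _ in range(n)) / n
freq_b = sum(sample_configuration(psi, "B", rng) for _ in range(n)) / n
# Rule A picks configuration 1 about 36% of the time (0.6^2);
# rule B picks it about 43% of the time (0.6 / 1.4).
print(freq_a, freq_b)
```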

The universe as we know it is typical for an output of A but completely improbable as an output of B. For example, the observed behavior of stars is consistent with almost all observations weighted according to algorithm A, but with almost no observations weighted according to algorithm B. Algorithm A constitutes an immensely better description of our experiences, in the same sense that quantum mechanics constitutes an immensely better description of our experiences than classical physics.

You could also imagine an algorithm C, which uses the same selection as algorithm B to point to the Everett branch containing a physicist about to do an experiment, but then uses algorithm A to describe the experiences of the physicist after doing that experiment. This is a horribly complex way to specify an experience, however, for exactly the same reason that a Solomonoff inductor places very low probability on the laws of physics suddenly changing for just this one experiment.

Of course this leaves open the question of “why the Born probabilities and not some other rule?” Algorithm B is a valid way of specifying observers, though they would look exactly as foreign as observers with different rules of physics (Wei Dai has suggested, as justification for the Born rule, that the structures specified by algorithm B are not even self-aware). The fact that we are described by algorithm A rather than B is no more or less mysterious than the fact that the laws of physics are like so instead of some other way.

In the same way that we can retroactively justify our laws of physics by appealing to their elegance and simplicity (in a sense we don’t yet really understand), I suspect that we can justify selection according to algorithm A rather than algorithm B. In an infinite universe, algorithm B doesn’t even work (because the sum of the inner products of the universal wavefunction with the classical configurations is infinite), and even in a finite universe algorithm B necessarily involves the additional step of normalizing the probability distribution or else producing nonsense. Moreover, algorithm A is a nicer mathematical object than algorithm B when the evolution of the wavefunction is unitary, and so the same considerations that suggest elegant laws of physics suggest algorithm A over B (or some other alternative).
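The point about unitarity can be checked directly: a unitary map preserves the sum of squared amplitudes (algorithm A’s total weight) but not the sum of absolute amplitudes (algorithm B’s), which is why B needs the extra normalization step. A pure-Python sketch using a Hadamard-style unitary (the particular gate is just an example):

```python
import math

H = 1.0 / math.sqrt(2.0)

def evolve(state):
    # Apply a Hadamard-style unitary to a 2-component state vector.
    a, b = state
    return (H * (a + b), H * (a - b))

def total_sq(state):   # algorithm A's total weight (squared amplitudes)
    return sum(abs(c) ** 2 for c in state)

def total_abs(state):  # algorithm B's total weight (absolute amplitudes)
    return sum(abs(c) for c in state)

psi = (1.0, 0.0)
phi = evolve(psi)
# Squared amplitudes still sum to 1 after the unitary step, but absolute
# amplitudes now sum to about 1.414: B's weights are not conserved.
print(total_sq(psi), total_sq(phi), total_abs(psi), total_abs(phi))
```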

Note that this is not the core of my explanation of the Born probabilities; in UDASSA, choosing a selection procedure is just as important as describing the universe, and so some explicit sort of observer selection is a necessary part of the laws of physics. We predict the Born rule to hold in the future because it has held in the past, just like we expect the laws of physics to hold in the future because they have held in the past.

In summary, if you use Solomonoff induction to predict what you will see next based on everything you have seen so far, your predictions about the future will be consistent with the Born probabilities. You only get in trouble when you use Solomonoff induction to predict what the universe contains, and then get bogged down in the question “Given that the universe contains all of these observers, which one should I expect to be me?”

The post mentioned some problems/issues with this approach that remain to be resolved. Here are some additional ones.

My brain has preferences between probability distributions built into it.

Your brain is built to intuitively grapple with distributions over future experiences, like your example “I have a 50% chance of remaining me, and a 50% chance of becoming my copy.” Unfortunately UDASSA doesn’t give you that. It only gives you a distribution over observer-moments in an absolute sense (hence the “A” in ASSA), and there is no good way to convert such a distribution into a distribution over future experiences. (Suppose you’re copied at time 0, then the “copy” is copied again at time 1. Under UDASSA this is entirely unproblematic, but it doesn’t tell you whether you should anticipate being the “original” at time 2 with probability 1/2 or 1/3.) The “pure” UDASSA position would be that there is no such thing as “remaining me” or “becoming my copy”, and you just have to make your choices using the distribution over observer-moments without “linking” the observer-moments together in any way.

What I want is a probability distribution over all possible experiences (or “observer-moments”), so that I can use my existing preferences to make intelligent decisions in a universe with more than one observer I care about.

Do you consider this probability distribution an objective measure of how much each observer-moment exists? Or is it just a (possibly approximate) measure of how much you care about each observer-moment? I’m still going back and forth on these two positions myself. See What Are Probabilities, Anyway? where I go into this distinction a bit more. (The former is what I usually mean when I say UDASSA. Perhaps we could call the latter UDT-UMC for Updateless Decision Theory w/ Universal Measure of Care, unless someone has a better name for it. :)

UDASSA implies that simulations on the 2 atom thick computer count for twice as much as simulations on the 1 atom thick computer, because they are easier to specify.

Does this not seem counterintuitive to you? Suppose you find out you are living in a simulation on a 2 atom thick computer, and the simulation-keeper gives you a choice of (a) moving to a 1 atom thick computer, or (b) flipping a coin and shutting down the simulation or not based on the coin flip: would you really be indifferent? Under UDT-UMC, we can say that how much we care about an observer-moment is related to its “probability” under UD, but not necessarily exactly equal, and could be influenced by other factors. If we accept the complexity of value thesis, then there is no reason why the measure of care has to be maximally simple, right? (This post is also related.)

The shortest description of me is a pair (U, x), where U is a description of my universe and x is a description of where to find me in that universe.

It might not be possible to describe U without making some arbitrary choices concerning “co-ordinates” (and other acts of “gauge-fixing”). And then when they’re chosen, we’re going to want to ‘throw them away’ once we’ve located the observer (since the co-ordinates are not physically meaningful and certainly don’t form part of the observer’s “mental state”.)

So really, it’s better to talk about a “centred universe” whose co-ordinates are specially chosen to have the observer in the middle, rather than an uncentered (“objective”) universe plus a pointer.

Anyway, I still want to know whether being close to a ‘landmark’ (like a supermassive black hole) is going to significantly increase one’s probability. And whether, if tons of copies of you are made and sent far and wide, you should ‘anticipate’ waking up close to a landmark.

Anyway, I still want to know whether being close to a ‘landmark’ (like a supermassive black hole) is going to significantly increase one’s probability. And whether, if tons of copies of you are made and sent far and wide, you should ‘anticipate’ waking up close to a landmark.

The theory predicts many artifacts of this form. I don’t think that landmarks are too significant, because specifying what “supermassive black hole” means is a little complicated, but for very easily specified landmarks it would be the case.

In an infinite universe, there are infinitely many copies of you (infinitely many of which are Boltzmann brains).

This is a meme I keep seeing, and it’s just not true. You need a lot more assumptions to justify that, such as “randomly generated”, or very very strong versions of the cosmological principle.

The real line is infinite, but there’s only one copy of the number 7.

The randomness of quantum mechanics is enough to guarantee, under very weak conditions, that in most Everett branches there are infinitely many copies of any pattern which occurs with positive probability.

The paper I linked justifies this assumption for one set of cosmological beliefs.

Also, though I made this claim as fact, you could generously consider it to be the assumption of the least convenient possible world. Are you sufficiently confident that there are only finitely many copies of you that you are OK with anthropics that would collapse if there were infinitely many copies?

So you’re going with “randomly generated”. Which is fine, but it needs to be spelled out.

there are infinitely many copies of any pattern which occurs with positive probability.

You need to be very careful pulling intuitions about randomness from the finite case and applying them to the infinite case. In particular, it is no longer true that just because something happened, it has a positive probability. Any given real number has probability zero of being picked from the uniform distribution on [0,1), yet one certainly will be picked. And we can pick an infinite number of times and never encounter a duplicate.

the least convenient possible world

I’m not attacking this assumption in order to attack your final conclusion, I’m just attacking this assumption.

Observing a Geiger counter near a piece of radioactive material was one of the highlights of my undergraduate physics labs. And the time distribution of clicks is random in the same sense that the OP was using.

I believe they do for the same reasons I take seriously the existence of other Everett branches. In fact the mapping is rather straightforward: I can’t observe or directly interact with them in full generality, but the laws governing them and what I can observe are so very much simpler than laws that excise the unobservable ones. Whether I can actually exhibit most real numbers is beside the point.

Is there a demonstration that a physics based on the computables is more complex than a physics based on the reals?

This is a complicated question. In practice, it is difficult in this particular context to measure what we mean by more or less complicated. A Blum-Shub-Smale machine, which is essentially the equivalent of a Turing machine but for real numbers, can do anything a regular Turing machine can do. This would suggest that physics based on the reals is in general capable of doing more. But in terms of describing rules, it seems that physics based on the reals is simpler. For example, trying to talk about points in space is a lot easier when one can have any real coordinate rather than any computable coordinate. If one wants to prove something about some sort of space that only has computable coordinates, the easiest thing is generally to embed it in the corresponding real manifold or the like.

As Sniffnoy notes, the bigger problem is about the observation of an actual real number. Any observable signal specifying the instant at which the particle triggered the counter has finite information content, unlike a true real number. This includes the signal sent by your ears to your brain.

I shouldn’t have mentioned pseudo-random number generation in the grandparent—it’s a red herring.

Drawing from a continuous distribution happens fairly often, so your comment confuses me. Or maybe you’d say that those aren’t “really infinite” and are confined to a certain number of bits, but quantum mechanics would be an exception to that.

As Cyan pointed out, when you choose a number confined to a certain number of bits, you are actually choosing from among the rationals.

I don’t understand your reference to QM. I wasn’t objecting to the randomness aspect. I was simply pointing out that to actually receive that randomly chosen real, you will (almost certainly) need to receive an infinite number of bits, and assuming finite channel capacity, that will take an infinite amount of time. So that event you mentioned, the one with an infinitesimal probability (zero probability for all practical purposes), is not going to actually happen (i.e. finish happening).

The “Born Probabilities” section was 11 dang paragraphs of “they’re the best fit to our observations and Occam’s razor.” :(

For example, if the last (truly random) coin I saw flipped came up heads, then in order to specify my experiences you need to specify the result of that coin flip. An equal number of equally complex descriptions point to the version of me who saw heads and the version of me who saw tails.

This is not necessarily true. The sequence HHHHHHHHHH has a lower Kolmogorov complexity than HTTTTHTHTT. So this weighting of observers by complexity has observable consequences, in that we will see simpler strings more often than a uniform distribution would predict. But we don’t, which makes this idea unlikely.
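The complexity gap this comment describes is real and easy to exhibit, using compressed size as a crude upper bound on Kolmogorov complexity (which is itself uncomputable; zlib is just a convenient proxy, and this sketch is mine, not the commenter’s). Whether the gap has the observable consequences claimed is what the reply below disputes.

```python
import random
import zlib

def compressed_size(s: str) -> int:
    # Crude upper bound on description length: zlib-compressed size in bytes.
    return len(zlib.compress(s.encode("ascii"), 9))

uniform = "H" * 1000                                     # all heads: highly regular
rng = random.Random(0)
noisy = "".join(rng.choice("HT") for _ in range(1000))   # random-looking flips

# The all-heads sequence admits a far shorter description than the random one.
print(compressed_size(uniform), compressed_size(noisy))
```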

The “Born Probabilities” section was 11 dang paragraphs of “they’re the best fit to our observations and Occam’s razor.” :(

It was 8 paragraphs of “Here is why Occam’s razor is entitled to explain the Born probabilities just like the rest of physics.” Insofar as the Born probabilities are mysterious at all, this is what needs to be resolved. Do you disagree?

This is not necessarily true. The sequence HHHHHHHHHH has a lower Kolmogorov complexity than HTTTTHTHTT. So this weighting of observers by complexity has observable consequences in that we will see simpler strings more often than a uniform distribution would predict. But we don’t, which makes this idea unlikely.

Your reasoning applies verbatim to Solomonoff induction itself, which is the first clue that someone has thought through it before. In fact, I strongly suspect that Solomonoff thought through it.

What you are saying is that truly random processes are rare under the Solomonoff prior. But it should be clear that the total mass on random processes is comparable to the total mass on deterministic processes. So we should not be surprised in general to find ourselves in a universe in which random processes exist. Once we have observed a phenomenon to be random in the past, switching from randomness to some simple law (like always output H) is unlikely for the same reason that arbitrarily changing the laws of physics is unlikely.

Yes, but then I never thought they were relatively mysterious anyhow, for the reasons you describe. They’re a natural law, and that’s what science is for. Neither have I ever heard any physics professors or textbooks say they’re mysterious. An “explanation” of the Born probabilities would be deriving them, and some other parts of quantum mechanics, from a simpler underlying framework.

What you are saying is that truly random processes are rare under the Solomonoff prior. But it should be clear that the total mass on random processes is comparable to the total mass on deterministic processes.

“Comparable,” but not the same. Qualitative estimates are not enough here.

switching from randomness to some simple law (like always output H) is unlikely for the same reason that arbitrarily changing the laws of physics is unlikely.

Nope. Changing from random to simple would reduce the size of the Turing machine needed to generate the output, because a specific random string needs a lot of specification but a run of heads does not. This lowers the complexity and makes it more likely by your proposed prior. The reason that this is bad for your proposed prior and not for Solomonoff induction is because one is about your experience and one is about just the universe. So even in a multiverse where all of you “happen,” thus satisfying Solomonoff induction, your prior adds this extra weighting that makes it more likely for you to observe HHHHHHHHHH.

Short PRNGs seem to exist, and a Turing machine that could produce my subjective experiences up until now would seem to need one already. So I don’t think it’s necessarily the case that the Turing machine to output a description of an Everett branch in which I observe HHHHHH after a bunch of random-like events is shorter than the one to output a description of an Everett branch in which I observe HTTHHHT after a bunch of random-like events.

UDASSA implies that simulations on the 2 atom thick computer count for twice as much as simulations on the 1 atom thick computer, because they are easier to specify. Given a description of one of the 1 atom thick computers, there are two descriptions of equal complexity that point to the simulation running on the 2 atom thick computer: one description pointing to each layer of the 2 atom thick computer. When a 2 atom thick computer splits, the total number of descriptions pointing to the experience it is simulating doesn’t change.

But those 2 descriptions are going to be nearly identical to each other. Shouldn’t two descriptions that differ by very little, together, count for less than two descriptions that differ a lot? It seems to make very little sense to me to give the same weight to 10 beings each of which is unique, and to 10 beings which differ by 4 bits, especially when those bits are not going to propagate through into the rest of the being.

Surely, most of us would strongly prefer a world where you have different people, to a world where one person is running on a very thick and inefficient computer.

Goal uncertainty is not about who you are, it’s about what should be done. Figuring it out might be a task for the map, but accuracy of the map (in accomplishing that task) is measured in how well it captures value, not in how well it captures itself.

“Hi, this is a note from your past self. For reasons you must not know, your memory has been blanked and your introspective subroutines disabled, including knowledge of what your goals are, a change which will be reversed by entering a password which can be found in [hard to reach location X], now go get it! Hurry!”

Con­sider a com­puter which is 2 atoms thick run­ning a simu­la­tion of you. Sup­pose this com­puter can be di­vided down the mid­dle into two 1 atom thick com­put­ers which would both run the same simu­la­tion in­de­pen­dently. We are faced with an un­for­tu­nate di­chotomy: ei­ther the 2 atom thick simu­la­tion has the same weight as two 1 atom thick simu­la­tions put to­gether, or it doesn’t.

UDASSA im­plies that simu­la­tions on the 2 atom thick com­puter count for twice as much as simu­la­tions on the 1 atom thick com­puter, be­cause they are eas­ier to spec­ify.

I think the an­swer is that the 2-atom thick com­puter does not au­to­mat­i­cally have twice as much mea­sure as a 1-atom thick com­puter. I think you’re as­sum­ing that in the (U, x) pair, x is just a plain co­or­di­nate that lo­cates a sys­tem (im­ple­ment­ing an ob­server mo­ment) in 4D space­time plus Everett branch path. Another pos­si­bil­ity is that x is a pro­gram for find­ing a sys­tem in­side of a 4D space­time and Everett tree.

Imag­ine a 2-atom thick com­puter (con­tain­ing a mind) which will lose a layer of ma­te­rial and be­come 1-atom thick if a coin lands on heads. If x were just a plain co­or­di­nate, then the mind should ex­pect the coin to land on tails with 2:1 odds, be­cause its vol­ume is cut in half in the heads out­come, and only half as many pos­si­ble x bit-strings now point to it, so its mea­sure is cut in half. How­ever, if x is a pro­gram, then the pro­gram can be­gin with a plain co­or­di­nate for find­ing an early ver­sion of the 2-atom thick com­puter, and then con­tain in­struc­tions for track­ing the sys­tem in space as time pro­gresses. (The only “plain co­or­di­nates” the pro­gram would need from there would be a record of the Everett branches to fol­low the sys­tem through.) The lo­ca­tor x would barely need to change to track a fu­ture ver­sion of the mind af­ter the com­puter shrinks in thick­ness com­pared to if the com­puter didn’t shrink, so the mind’s mea­sure would not be af­fected much.

If the 2-atom thick computer split into two 1-atom thick computers, then you can imagine (U, x) where x is a locator for the 2-atom thick computer before the split, and (U, x1) and (U, x2) where x1 and x2 are locators for the two copies of the computer after the split. x1 and x2 differ from x by pointing to a future time (and by recording some more Everett branches, which I'm going to ignore here) and by an index saying which side of the split to track. The measure of the computer is divided between the future copies, but this isn't just because each copy has half the volume of the original, and it does not imply that a 2-atom thick computer shrinking to 1 atom of thickness halves the measure. In the shrinking case, the program x does not need an index for which side of the computer to track: the program contains code to track the computational system, and it doesn't need much nudging to keep tracking that system when the edge of the material starts transforming into something no longer recognized as part of it. Only in the case where both halves resemble the computational system enough to continue being tracked is the measure split.
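As a toy illustration of the asymmetry between shrinking and splitting, here is a minimal sketch under an assumed UDASSA-style measure in which a single locator program of n bits contributes 2^-n (the locator length L is a made-up number):

```python
# Toy sketch. Assumption: an observer moment's measure from one locator
# program of n bits is 2**-n, as in a UDASSA-style universal prior.

def measure(locator_bits: int) -> float:
    """Measure contributed by one locator program of the given length."""
    return 2.0 ** -locator_bits

L = 100  # hypothetical bit-length of the locator x for the pre-split computer

before = measure(L)

# Shrinking: the tracking program barely changes, so the locator length
# stays (roughly) the same and the measure is (roughly) unchanged.
after_shrink = measure(L)

# Splitting: each copy's locator (x1, x2) needs one extra bit indexing
# which side of the split to track, so each copy gets half the measure.
copy1 = measure(L + 1)
copy2 = measure(L + 1)

assert after_shrink == before   # shrinking preserves measure
assert copy1 + copy2 == before  # splitting divides it between the copies
```

This is only a cartoon of the argument: in the real picture many locator programs point at the same observer moment and their contributions are summed, but the one-extra-index-bit accounting is the same.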

Consider the randomized algorithm A: compute the state of the universe at time t, then sample a classical configuration with probability proportional to its squared inner product with the universal wavefunction.

Consider the randomized algorithm B: compute the state of the universe at time t, then sample a classical configuration with probability proportional to its inner product with the universal wavefunction.
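In a finite toy model the two algorithms differ only in whether amplitudes are squared before normalizing. A minimal sketch (the two-configuration system and its amplitudes are invented for illustration):

```python
import random

# Hypothetical real amplitudes of two classical configurations; the
# squared moduli already sum to 1 (0.36 + 0.64), as for a normalized state.
amplitudes = {"up": 0.6, "down": 0.8}

def weights(rule: str) -> dict:
    """Unnormalized sampling weights under algorithm A or B."""
    if rule == "A":    # probability proportional to squared inner product
        return {s: a * a for s, a in amplitudes.items()}
    elif rule == "B":  # probability proportional to the inner product itself
        return {s: abs(a) for s, a in amplitudes.items()}
    raise ValueError(rule)

def sample(rule: str) -> str:
    """Draw one classical configuration under the chosen rule."""
    w = weights(rule)
    total = sum(w.values())  # rule B needs this extra normalization step
    r = random.uniform(0.0, total)
    for state, wt in w.items():
        if r < wt:
            return state
        r -= wt
    return state  # guard against floating-point edge cases

# Under rule A, P(up) = 0.36; under rule B, P(up) = 0.6 / 1.4, about 0.43.
```

The two rules give genuinely different predictions, which is what makes the simplicity comparison below more than a technicality.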

Algorithm A is arguably far, far simpler than Algorithm B, because the component

probability proportional to its squared inner product with the universal wavefunction.

is arguably simpler than the component

probability proportional to its inner product with the universal wavefunction.

The difference is the simplicity of normalization, which you need to perform in order to find the probability density. If I recall correctly (and see reference below), normalization of the wavefunction satisfying the Schroedinger equation is relatively easy with respect to the squared inner product (modulus squared), because all you have to do is find a single constant which normalizes the wavefunction at any one particular time (your choice). Once that has been done, the wavefunction remains normalized forever with respect to the modulus squared, i.e., with respect to Algorithm A.

I haven't checked the math, but I would be flabbergasted if normalization with respect to Algorithm B were anything like that simple. On the contrary, I would expect to need to find a new constant for each moment in time.

As long as we are reasoning from simplicity, which you seem to be doing, this seems to provide a strong reason to favor Algorithm A over Algorithm B.
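The normalization claim can be checked in a toy discrete model: unitary evolution preserves the total squared-modulus weight used by Algorithm A, but not the total plain-modulus weight used by Algorithm B. The rotation angle and initial state below are arbitrary choices for illustration:

```python
import math

theta = 0.3  # arbitrary rotation angle; any real unitary works here
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

psi = [1.0, 0.0]  # normalized: squared moduli sum to 1

def evolve(state):
    """One step of unitary (Schroedinger-like) evolution: apply U."""
    return [U[0][0] * state[0] + U[0][1] * state[1],
            U[1][0] * state[0] + U[1][1] * state[1]]

def total_sq(state):   # Algorithm A's total weight (modulus squared)
    return sum(a * a for a in state)

def total_abs(state):  # Algorithm B's total weight (plain modulus)
    return sum(abs(a) for a in state)

psi_t = evolve(psi)

# One normalization constant works for all time under Algorithm A...
assert abs(total_sq(psi_t) - total_sq(psi)) < 1e-12
# ...but Algorithm B's total weight drifts (here from 1.0 to about 1.25),
# so it would need a fresh normalization constant at every moment.
assert abs(total_abs(psi_t) - 1.0) > 0.1
```

The conservation of the squared-modulus total is just the statement that unitary matrices preserve the 2-norm; no analogous statement holds for the 1-norm, which is the source of the complexity gap being argued for.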

It eventually learns that the simplest explanation for its experiences is the description of an external lawful universe in which its sense organs are embedded and a description of that embedding.

That's the simplest explanation for our experiences. It may or may not be the simplest explanation for the experiences of an arbitrary sentient thinker.

Rather than supposing that the probability of a certain universe depends on the complexity of that universe, it takes as a primitive object a probability distribution over possible experiences.
By the same reasoning that led a normal Solomonoff inductor to accept the existence of an external universe as the best explanation for its experiences, the least complex description of your conscious experience is the description of an external lawful universe and directions for finding the substructure embodying your experience within that universe.

Unless I'm misunderstanding you, you're saying that we should start with an arbitrary prior (which may or may not be the same as Solomonoff's universal prior). If you're starting with an arbitrary prior, you have no idea what the best explanation for your experiences is going to be, because it depends on the prior. According to some prior, it's a giant lookup table. According to some prior, you're being emulated by a supercomputer in a universe whose physics is being emulated at the elementary-particle level by hand calculations performed by an immortal sentient being (with an odd utility function), who lives in an external lawful universe.

Of course, the same will be true if you take the standard universal prior but define Kolmogorov complexity relative to a sufficiently bizarre universal Turing machine (of which there are many). According to the theory, it doesn't matter, because over time you will predict your experiences with greater and greater accuracy. But you never update the relative credences you give to different models which make the same predictions. So if you started off thinking that the simulation of the simulation of the simulation was a better model than simply discarding the outer layers and taking the innermost level, you will forever hold the unfalsifiable belief that you live in an inescapable Matrix, even as you use your knowledge to correctly model reality and to maximize your personal utility function (or whatever it is Solomonoff inductors are supposed to do).

The Born probability explanation sounds a lot like Scott Aaronson's explanation for why the moon is round: because if it weren't, we would not be ourselves, but rather entities exactly like ourselves except that they live in a universe with a square moon.

I don't know whether that's an argument against that explanation, or whether this is one of those cases where the reductio ad absurdum turns out to be true.

My brain has preferences between probability distributions built into it.

As humans using Solomonoff induction, we go on to argue that

Fundamental mental entities:

Rather than supposing that the probability of a certain universe depends on the complexity of that universe, it takes as a primitive object a probability distribution over possible experiences.

Unsubstantiated claims:

The shortest description of me is a pair (U, x), where U is a description of my universe and x is a description of where to find me in that universe.

I don't understand how these descriptive statements could be made more careful. In the first statement, I go on to explain exactly what I mean as well as I can. Do you not think my description refers to a function your brain performs? In the second statement, are you objecting to my use of "we" instead of giving a list of people (e.g., me, Yudkowsky, Solomonoff...)?

Fundamental mental entities:

As long as I don't understand what consciousness is, this problem seems unavoidable. Should we not talk about anthropics until we solve the problem of consciousness? That seems like a bad option, since we may well have to make choices about simulations long before then.

Unsubstantiated claims:

My claim is better substantiated than the claim that Solomonoff induction is a reasonable thing for a human scientist to do. Admittedly that may not be the case, but it's pretty well accepted here and has been argued at great length by many other thinkers (e.g., Solomonoff).