Request for input on multiverse-wide superrationality (MSR)

I am currently working on a research project as part of CEA’s summer research fellowship. I am building a simple model of so-called “multiverse-wide cooperation via superrationality” (MSR). The model should incorporate the most relevant uncertainties for determining possible gains from trade. To make this model maximally useful, I would like to ask others for their opinions on the idea of MSR. For instance, what are the main reasons you think MSR might be irrelevant or might not work as it is supposed to? Which questions are unanswered and need to be addressed before the merit of the idea can be assessed? I would be happy to receive any input in the comments on this post or via mail to johannes@foundational-research.org.

An overview of resources on MSR, including introductory texts, can be found at the link above. To briefly illustrate the idea, consider two artificial agents with identical source code playing a prisoner’s dilemma. Even though the two agents cannot causally interact, one agent’s action provides strong evidence about the other agent’s action. Evidential decision theory, functional decision theory (Yudkowsky and Soares, 2018), and some recently proposed variants of causal decision theory (Spohn, 2003; Poellinger, 2013) say that agents should take such evidence into account when making decisions. MSR is based on the idea that (i) humans on Earth are in a situation similar to that of the two AI agents: there is probably a large or infinite multiverse containing many exact copies of humans on Earth (Tegmark 2003, p. 464), but also agents that are similar but not identical to humans; and (ii) if humans and these other, similar agents take each other’s preferences into account, then, due to gains from trade, everyone is better off than if everyone pursued only their own ends. It follows from (i) and (ii) that humans should take the preferences of other, similar agents in the multiverse into account, thereby producing evidence that those agents in turn take humans’ preferences into account, which leaves everyone better off.
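To make the correlation concrete, here is a minimal sketch of the twin prisoner’s dilemma (the payoff numbers are illustrative, not taken from any of the cited papers): an agent that knows its twin runs the very same source code only needs to compare the two attainable symmetric outcomes.

```python
# Standard prisoner's dilemma payoffs for the row player
# (illustrative numbers; any ordering T > R > P > S works).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def twin_choice():
    # The twin runs this exact source code, so whatever this function
    # returns, the twin returns too: only the symmetric outcomes
    # (C, C) and (D, D) are attainable.
    return max(["C", "D"], key=lambda action: PAYOFF[(action, action)])

print(twin_choice())  # "C": mutual cooperation beats mutual defection
```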

According to Oesterheld (2017, sec. 4), this idea could have far-reaching implications for prioritization. For instance, given MSR, some forms of moral advocacy could become ineffective: advocating for their particular values provides agents with evidence that others do the same, potentially neutralizing each other’s efforts. Moreover, MSR could play a role in deciding which strategies to pursue in AI alignment: it could become especially valuable to ensure that an AGI will engage in multiverse-wide trade.

It seems like MSR requires a multiverse large enough to contain many well-correlated agents, but not so large that it runs into the problems of infinite ethics. Most of my credence is on either no multiverse or an infinite multiverse, although I’m not particularly well-read on this issue.

My broad intuition is something like “Insofar as we can know about the values of other civilisations, they’re probably similar to our own. Insofar as we can’t, MSR isn’t relevant.” There are probably exceptions, though (e.g. we could guess the direction in which an r-selected civilisation’s values would vary from our own).

I worry that MSR is susceptible to self-mugging of some sort. I don’t have a particular example, but the general idea is that you’re correlated with other agents even when you’re being very irrational, and so you might end up doing things which seem arbitrarily irrational. But this is just a half-formed thought, not a proper objection.

And lastly, I would have much more confidence in FDT and superrationality in general if there were a sensible metric of similarity between agents, apart from correlation. If you always cooperate in prisoner’s dilemmas, then your choices are perfectly correlated with CooperateBot’s, but intuitively it would still be more rational to defect against CooperateBot, because your decision algorithm isn’t similar to CooperateBot’s in the way that it’s similar to your psychological twin’s. I guess this requires a solution to logical uncertainty, though.
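A minimal illustration of the CooperateBot point (toy policies of my own, not from the literature): two policies can agree on every input they have actually received while differing in exactly the counterfactual structure that superrationality cares about.

```python
def cooperate_bot(opponents_last_move):
    return "C"                                         # unconditional

def conditional_cooperator(opponents_last_move):
    return "C" if opponents_last_move == "C" else "D"  # responsive

# In a world where everyone has so far cooperated, the two policies'
# observed choices are perfectly correlated ...
observed_history = ["C"] * 1000
print(all(cooperate_bot(m) == conditional_cooperator(m)
          for m in observed_history))                  # True

# ... yet they diverge on the counterfactual input that never occurred,
# which is precisely the difference correlation alone cannot see.
print(cooperate_bot("D"), conditional_cooperator("D"))  # C D
```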

Happy to discuss this more with you in person. Also, I suggest you cross-post to Less Wrong.

Re 4): Correlation or similarity between agents is not really a necessary condition for cooperation in the open-source PD. LaVictoire et al. (2012) and related papers showed that ‘fair’ agents with completely different implementations can cooperate. A fair agent, roughly speaking, can have any structure, as long as it implements “I’ll cooperate with you if I can show that you’ll cooperate with me”. So maybe that’s the measure you’re looking for.
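For intuition only, here is a toy, simulation-based stand-in for that result (the actual construction in the paper uses bounded proof search and Löb’s theorem, not simulation, and the optimistic base case below is my own simplification): two fair agents with visibly different implementations end up cooperating, while a defector is refused.

```python
def fairbot(opponent, fuel=3):
    # Toy fair agent: cooperate iff a resource-limited look at the
    # opponent shows it cooperating back. The optimistic base case is
    # a crude stand-in for the Löbian step in the real construction.
    if fuel <= 0:
        return "C"
    return "C" if opponent(fairbot, fuel - 1) == "C" else "D"

def mirrorbot(opponent, fuel=3):
    # A differently implemented fair agent: same contract, other code.
    if fuel <= 0:
        return "C"
    return opponent(mirrorbot, fuel - 1)

def defectbot(opponent, fuel=3):
    return "D"

print(fairbot(mirrorbot), mirrorbot(fairbot))    # C C
print(fairbot(defectbot), mirrorbot(defectbot))  # D D
```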

A population of fair agents is also typically a Nash equilibrium in such games, so you might expect that they sometimes do evolve.

The example you’ve given me shows that agents which implement exactly the same (high-level) algorithm can cooperate with each other. The metric I’m looking for is: how can we decide how similar two agents are when their algorithms are non-identical? Presumably we want a smoothness property for that metric, such that if our algorithms are very similar (e.g. they only differ with respect to some radically unlikely edge case), the reduction in cooperation is negligible. But it doesn’t seem like anyone knows how to do this.
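One naive candidate for such a metric (my own sketch, not something from the literature) is the empirical agreement rate over sampled inputs. It has the smoothness property described above (a single divergent edge case among many samples barely moves the score), but it also inherits the CooperateBot problem: it only sees behaviour on the sampled distribution, not the algorithms themselves.

```python
import random

def behavioural_similarity(agent_a, agent_b, sample_inputs):
    """Fraction of sampled inputs on which the two agents agree."""
    agreements = sum(agent_a(x) == agent_b(x) for x in sample_inputs)
    return agreements / len(sample_inputs)

# Two agents that differ only on one radically unlikely edge case
# score close to 1.0, as desired ...
def agent_a(x):
    return "D" if x == 123456789 else "C"

def agent_b(x):
    return "C"

inputs = [random.randrange(10**9) for _ in range(10_000)]
print(behavioural_similarity(agent_a, agent_b, inputs))  # ~1.0
# ... but CooperateBot would also score 1.0 against an
# always-cooperator, so this cannot be the whole story.
```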

Hey, a rough point on a doubt I have. Not sure if it’s useful/novel.

Going through the mental processes of a utilitarian (roughly defined) will correlate with others making more utilitarian decisions as well (especially when they’re similar in relevant personality traits and their past exposure to philosophical ideas).

For example, if you act less scope-insensitive, omission-bias-y, or ingroup-y, others will tend to do so as well. This includes edge cases, e.g. people who otherwise would have made decisions that roughly fall into the deontologist or virtue-ethics bucket.

Therefore, for every moment you end up shutting off utilitarian-ish mental processes in favour of ones where you think you’re doing moral trade (including hidden motivations like rationalising acting on social proof or discomfort at diverging from your peers), your multiversal compatriots will do likewise (especially in similar contexts).

(In case it looks like I’m justifying being a staunch utilitarian here: I have a more nuanced anti-realist view, mixed in with a lot of uncertainty about what makes sense.)

I still have doubts as to whether you should pay in Counterfactual Mugging, since I believe that (non-quantum) probability is in the map rather than the territory. I haven’t had the opportunity to write up these thoughts yet, as my current posts are building up towards it, but I can link you when I do.
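For reference, with the conventional numbers from the thought experiment (a fair coin; on one outcome Omega asks you for $100; on the other it pays $10,000 iff it predicts you would have paid), the policy-level comparison that motivates paying looks like this:

```python
# Evaluated before the coin flip: commit to paying vs. refusing.
p_heads = 0.5
ev_pay = p_heads * 10_000 + (1 - p_heads) * (-100)  # 4950.0
ev_refuse = 0.0                                     # nothing happens
print(ev_pay > ev_refuse)  # True; yet after seeing tails, paying looks
                           # like a pure $100 loss, which is where the
                           # map-versus-territory worry above bites.
```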

I remain unsure how, under MSR, to calculate the measure of agents in worlds who hold positions to trade with, so that we can figure out how much we should acausally trade with each. I’m also unsure how to address uncertainty about whether anyone will independently arrive at the same position you hold, and so be able to acausally trade with you, given that you can’t tell them what you would actually prefer.
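To make that open question concrete, here is the shape of the quantity that would need estimating. Every input below (existence credence, measure, correlation strength, gain per unit of trade) is a hypothetical placeholder; defining and estimating these numbers is precisely the unsolved part.

```python
from dataclasses import dataclass

@dataclass
class PartnerPopulation:
    credence: float     # P(this kind of agent exists at all)
    measure: float      # how much of the multiverse they occupy
    correlation: float  # how strongly their decisions track ours
    gain: float         # value to us per unit of resources they shift

def expected_trade_value(partners):
    # Illustrative only: a simple linear weighting of candidates.
    return sum(p.credence * p.measure * p.correlation * p.gain
               for p in partners)

# E.g. a single well-correlated but speculative population:
print(expected_trade_value([PartnerPopulation(0.1, 0.5, 0.8, 2.0)]))  # 0.08
```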