This blog post is written for a very specific audience: people involved in the effective altruism community who are familiar with cause prioritization and arguments for the overwhelming importance of the far future. It might read as strange and confusing to people without that domain knowledge. Please consider reading the articles linked in the Context section to get your bearings. This post is also very long, but the sections are fairly independent, as each covers a distinct consideration.

Many thanks for helpful feedback to Jo Anderson, Tobias Baumann, Jesse Clifton, Max Daniel, Michael Dickens, Persis Eskander, Daniel Filan, Kieran Greig, Zach Groff, Amy Halpern-Laff, Jamie Harris, Josh Jacobson, Gregory Lewis, Caspar Oesterheld, Carl Shulman, Gina Stuessy, Brian Tomasik, Johannes Treutlein, Magnus Vinding, Ben West, and Kelly Witwicki. I also forwarded Ben Todd and Rob Wiblin a small section of the draft that discusses an 80,000 Hours article.

Abstract

When people in the effective altruism (EA) community have worked to affect the far future, they’ve typically focused on reducing extinction risk, especially risks associated with superintelligence or artificial general intelligence, addressed through work on AI alignment (AIA). I agree with the arguments for the far future being extremely important in our EA decisions, but I tentatively favor improving the quality of the far future by expanding humanity’s moral circle over increasing the likelihood of the far future, or of humanity’s continued existence, by reducing extinction risk through AIA, because: (1) the far future seems to not be very good in expectation, and there’s a significant likelihood of it being very bad, and (2) moral circle expansion seems highly neglected both in EA and in society at large. Also, I think considerations of bias are very important here, given that intuitive and subjective judgment calls necessarily make up the bulk of differences in opinion on far future cause prioritization. I find the argument in favor of AIA that technical research might be more tractable than social change to be the most compelling counterargument to my position.

Context

This post largely aggregates existing content on the topic, rather than making original arguments. I offer my views, mostly intuitions, on the various arguments, but of course I remain highly uncertain given the limited amount of empirical evidence we have on far future cause prioritization.

Many in the effective altruism (EA) community think the far future is a very important consideration when working to do the most good. The basic argument is that humanity could continue to exist for a very long time and could expand its civilization to the stars, creating a very large amount of moral value. The main narrative has been that this civilization could be a very good one, and that in the coming decades, we face sizable risks of extinction that could prevent us from obtaining this “cosmic endowment.” The argument goes that these risks also seem like they can be reduced with a fairly small amount of additional resources (e.g. time, money), and therefore extinction risk reduction is one of the most important projects of humanity and the EA community.

However, one can accept the first part of this argument — that there is a very large amount of expected moral value in the far future and it’s relatively easy to make a difference in that value — without concluding that reducing extinction risk is the most important project. In slightly different terms, one can decide not to work on reducing population risks, risks that could reduce the number of morally relevant individuals in the far future (of course, these are only risks of harm if one believes having more individuals is a good thing), and instead work on reducing quality risks, risks that could reduce the quality of morally relevant individuals’ existence. One specific type of quality risk often discussed is a risk of astronomical suffering (s-risk), defined as “events that would bring about suffering on an astronomical scale, vastly exceeding all suffering that has existed on Earth so far.”

This blog post makes the case for focusing on quality risks over population risks. More specifically, though also more tentatively, it makes the case for focusing on reducing quality risk through moral circle expansion (MCE), the strategy of impacting the far future through increasing humanity’s concern for sentient beings who currently receive little consideration (i.e. widening our moral circle so it includes them), over AI alignment (AIA), the strategy of impacting the far future through increasing the likelihood that humanity creates an artificial general intelligence (AGI) that behaves as its designers want it to (known as the alignment problem).[1][2]

The basic case for MCE is very similar to the case for AIA. Humanity could continue to exist for a very long time and could expand its civilization to the stars, creating a very large number of sentient beings. The sort of civilization we create, however, seems highly dependent on our moral values and moral behavior. In particular, it’s uncertain whether many of those sentient beings will receive the moral consideration they deserve based on their sentience, i.e. whether they will be in our “moral circle,” or whether they will be left outside it like the many sentient beings who have suffered intensely over the course of human history (e.g. from torture, genocide, oppression, war). It seems the moral circle can be expanded with a fairly small amount of additional resources (e.g. time, money), and therefore MCE is one of the most important projects of humanity and the EA community.

Note that MCE is a specific kind of values spreading, the parent category of MCE that describes any effort to shift the values and moral behavior of humanity and its descendants (e.g. intelligent machines) in a positive direction to benefit the far future. (Of course, some people attempt to spread values in order to benefit the near future, but in this post we’re only considering far future impact.)

I’m specifically comparing MCE and AIA because AIA is probably the most favored method of reducing extinction risk in the EA community. AIA seems to be the default cause area to favor if one wants to have an impact on the far future, and I’ve been asked several times why I favor MCE instead.

This discussion risks conflating AIA with reducing extinction risk. These are two separate ideas, since an unaligned AGI could still lead to a future containing a large number of sentient beings, and an aligned AGI could still potentially cause extinction or population stagnation (e.g. if, according to the designers’ values, even the best civilization the AGI could help build is still worse than nonexistence). However, most EAs focused on AIA seem to believe that the main risk is something quite like extinction, such as the textbook example of an AI that seeks to maximize the number of paperclips in the universe. I’ll note when the distinction between AIA and reducing extinction risk is relevant. Similarly, there are sometimes important prioritization differences between MCE and other types of values spreading, and those will be noted when they matter. (This paragraph is an important qualification for the whole post. The possibility of unaligned AGI that involves a civilization (and, less so because it seems quite unlikely, the possibility of an AGI that causes extinction) is important to consider for far future cause prioritization. Unfortunately, elaborating on this would make this post far more complicated and far less readable, and would not change many of the conclusions. Perhaps I’ll be able to make a second post that adds this discussion at some point.)

It’s also important to note that I’m discussing specifically AIA here, not all AI safety work in general. AI safety, which just means increasing the likelihood of beneficial AI outcomes, could be interpreted as including MCE, since MCE plausibly makes it more likely that an AI would be built with good values. However, MCE doesn’t seem like a very plausible route to increasing the likelihood that AI is simply aligned with the intentions of its designers, so I think MCE and AIA are fairly distinct cause areas.

AI safety can also include work on reducing s-risks, such as specifically reducing the likelihood of an unaligned AI that causes astronomical suffering, rather than reducing the likelihood of all unaligned AI. I think this is an interesting cause area, though I am unsure about its tractability and am not considering it in the scope of this blog post.

The post’s publication was supported by Greg Lewis, who was interested in this topic and donated $1,000 to Sentience Institute (the think tank I co-founded, which researches effective strategies to expand humanity’s moral circle), conditional on this post being published to the Effective Altruism Forum. Lewis doesn’t necessarily agree with any of its content. He decided on the conditional donation prior to the post being written, and I did ask him to review the post prior to publication; it was edited based on his feedback.

The expected value of the far future

Whether we prioritize reducing extinction risk partly depends on how good or bad we expect human civilization to be in the far future, given that it continues to exist. In my opinion, the assumption that it will be very good is tragically underexamined in the EA community.

What if it’s close to zero?

If we think the far future is very good, that clearly makes reducing extinction risk more promising. And if we think the far future is very bad, that makes reducing extinction risk not just unpromising, but actively very harmful. But what if it’s near the middle, i.e. close to zero?[3] In response to the argument that reducing extinction risk is not an EA priority on the basis of the expected moral value of the far future, 80,000 Hours wrote,

...even if you’re not sure how good the future will be, or suspect it will be bad, you may want civilisation to survive and keep its options open. People in the future will have much more time to study whether it’s desirable for civilisation to expand, stay the same size, or shrink. If you think there’s a good chance we will be able to act on those moral concerns, that’s a good reason to leave any final decisions to the wisdom of future generations. Overall, we’re highly uncertain about these big-picture questions, but that generally makes us more concerned to avoid making any irreversible commitments...

This reasoning seems mistaken to me because wanting “civilisation to survive and keep its options open” depends on optimism that civilization will do research, make good[4] decisions based on that research, and be capable of implementing those decisions.[5] While preventing extinction keeps options open for good things to happen, it also keeps options open for bad things to happen, and desiring this option value depends on an optimism that the good things are more likely. In other words, the reasoning assumes the optimism (thinking the far future is good, or at least that humans will make good decisions and be able to implement them[6]), which is also its conclusion.

Having that optimism makes sense in many decisions, which is why keeping options open is often a good heuristic. In EA, for example, people tend to do good things with their careers, which means career option value is a useful thing. This doesn’t readily translate to decisions where it’s not clear whether the actors involved will have a positive or negative impact. (Note 80,000 Hours isn’t making this comparison. I’m just making it to explain my own view here.)

There’s also a sense in which preventing extinction risk decreases option value because if humanity progresses past certain civilizational milestones that make extinction more unlikely — say, the rise of AGI or expansion beyond our own solar system — it might become harder or even impossible to press the “off switch” (ending civilization). However, I think most would agree that there’s more overall option value in a civilization that has gotten past these milestones because there’s a much wider variety of non-extinct civilizations than extinct civilizations.[7]

If you think that the expected moral value of the far future is close to zero, even if you think it’s slightly positive, then reducing extinction risk is a less promising EA strategy than if you think it’s very positive.

Key considerations

I think the considerations on this topic are best represented as questions where people’s beliefs (mostly just intuitions) vary on a long spectrum. I’ll list these in roughly descending order of how strongly I would guess I disagree with people who believe the far future is highly positive in expected value (shortened as HPEV-EAs), and I’ll note where I don’t think I would disagree or might even have a more positive-leaning belief than the average such person.

I think there’s a significant[8] chance that the moral circle will fail to expand to reach all sentient beings, such as artificial/small/weird minds (e.g. a sophisticated computer program used to mine asteroids, but one that doesn’t have the normal features of sentient minds like facial expressions). In other words, I think there’s a significant chance that powerful beings in the far future will have low willingness to pay for the welfare of many of the small/weird minds in the future.[9]

I think it’s likely that the powerful beings in the far future (analogous to humans as the powerful beings on Earth in 2018) will use large numbers of less powerful sentient beings, such as for recreation (e.g. safaris, war games), a labor force (e.g. colonists to distant parts of the galaxy, construction workers), scientific experiments, threats (e.g. threatening to create and torture beings that a rival cares about), revenge, justice, religion, or even pure sadism.[10] I believe this because there have been less powerful sentient beings for all of humanity’s existence and well before (e.g. predation), many of whom are exploited and harmed by humans and other animals, and there seems to be little reason to think such power dynamics won’t continue to exist.

Alternative uses of resources include simply working to increase one’s own happiness directly (e.g. changing one’s neurophysiology to be extremely happy all the time), and constructing large non-sentient projects like a work of art. Though each of these types of project could still include sentient beings, such as for experimentation or a labor force.

With the exception of threats and sadism, the less powerful minds seem like they could suffer intensely because their intense suffering could be instrumentally useful. For example, if the recreation is nostalgic, or human psychology persists in some form, we could see powerful beings causing intense suffering in order to see good triumph over evil or in order to satisfy curiosity about situations that involve intense suffering (of course, the powerful beings might not acknowledge the suffering as suffering, instead conceiving of it as simulated but not actually experienced by the simulated entities). For another example, with a sentient labor force, punishment could be a stronger motivator than reward, as indicated by the history of evolution on Earth.[11][12]

I place significant moral value on artificial/small/weird minds.

I think it’s quite unlikely that human descendants will find the correct morality (in the sense of moral realism, finding mind-independent moral facts), and I don’t think I would care much about that correct morality even if it existed. For example, I don’t think I would be compelled to create suffering if the correct morality said this is what I should do. Of course, such moral facts are very difficult to imagine, so I’m quite uncertain about what my reaction to them would be.[13]

I’m skeptical about the view that technology and efficiency will remove the need for powerless, high-suffering, instrumental moral patients. An example of this predicted trend is that factory farmed animals seem unlikely to be necessary in the far future because of their inefficiency at producing animal products. Therefore, I’m not particularly concerned about the factory farming of biological animals continuing into the far future. I am, however, concerned about similar but more efficient systems.

An example of how technology might not render sentient labor forces and other instrumental sentient beings obsolete is how humans seem motivated to have power and control over the world, and in particular seem more satisfied by having power over other sentient beings than by having power over non-sentient things like barren landscapes.

I do still believe there’s a strong tendency towards efficiency and that this has the potential to render much suffering obsolete; I just have more skepticism about it than I think is often assumed by HPEV-EAs.[14]

I’m skeptical about the view that human descendants will optimize their resources for happiness (i.e. create hedonium) relative to optimizing for suffering (i.e. create dolorium).[15] Humans currently seem more deliberately driven to create hedonium, but creating dolorium might be more instrumentally useful (e.g. as a threat to rivals[16]).

On this topic, I similarly do still believe there’s a higher likelihood of creating hedonium; I just have more skepticism about it than I think is often assumed by EAs.

I’m largely in agreement with the average HPEV-EA in my moral exchange rate between happiness and suffering. However, I think those EAs tend to greatly underestimate how much the empirical tendency towards suffering over happiness (e.g. wild animals seem to endure much more suffering than happiness) is evidence of a future empirical asymmetry.

My view here is partly informed by the capacities for happiness and suffering that have evolved in humans and other animals, by the capacities that seem to be driven by cultural forces (e.g. corporations seem to care more about downsides than upsides, perhaps because it’s easier in general to destroy and harm things than to create and grow them), and by speculation about what could be done in more advanced civilizations, such as my best guess on what a planet optimized for happiness and a planet optimized for suffering would look like. For example, I think a given amount of dolorium/dystopia (say, the amount that can be created with 100 joules of energy) is far larger in absolute moral expected value than hedonium/utopia made with the same resources.

I’m unsure of how much I would disagree with HPEV-EAs about the argument that we should be highly uncertain about the likelihood of different far future scenarios because of how highly speculative our evidence is, which pushes my estimate of the expected value of the far future towards the middle of the possible range, i.e. towards zero.

I’m unsure of how much I would disagree with HPEV-EAs about the persistence of evolutionary forces into the future (i.e. how much future beings will be determined by fitness, rather than characteristics we might hope for like altruism and happiness).[17]

From a historical perspective, it worries me that many historical humans seem like they would be quite unhappy with the way human morality changed after them, such as the way Western countries today are far less concerned than their ancestors in 500 CE were about previously-considered-immoral behavior like homosexuality and gluttony. (Of course, one might think historical humans would agree with modern humans upon reflection, or think that much of humanity’s moral changes have been due to improved empirical understanding of the world.)[18]

I’m largely in agreement with HPEV-EAs that humanity’s moral circle has a track record of expansion and seems likely to continue expanding. For example, I think it’s quite likely that powerful beings in the far future will care a lot about charismatic biological animals like elephants or chimpanzees, or whatever beings have a similar relationship to those powerful beings as humanity has to elephants and chimpanzees. (As mentioned above, my pessimism about the continued expansion is largely due to concern about the magnitude of bad-but-unlikely outcomes and the harms that could occur due to MCE stagnation.)

Unfortunately, we don’t have much empirical data or solid theoretical arguments on these topics, so the disagreements I’ve had with HPEV-EAs have mostly just come down to differences in intuition. This is a common theme for prioritization among far future efforts. We can outline the relevant factors and a little empirical data, but the crucial factors seem to be left to speculation and intuition.

Most of these considerations are about how society will develop and utilize new technologies, which suggests we can develop relevant intuitions and speculative capacity by studying social and technological change. So even though these judgments are intuitive, we could potentially improve them with more study of big-picture social and technological change, such as Sentience Institute’s MCE research or Robin Hanson’s book The Age of Em, which analyzes what a future of brain emulations would look like. (This sort of empirical research is what I see as the most promising future research avenue for far future cause prioritization. I worry EAs overemphasize armchair research (like most of this post, actually) for various reasons.[19])

I’d personally be quite interested in a survey of people with expertise in the relevant fields of social, technological, and philosophical research, in which they’re asked about each of the considerations above, though it might be hard to get a decent sample size, and I think it would be quite difficult to debias the respondents (see the Bias section of this post).

I’m also interested in quantitative analyses of these considerations — calculations including all of these potential outcomes and associated likelihoods. As far as I know, this kind of analysis has so far only been attempted by Michael Dickens in “A Complete Quantitative Model for Cause Selection,” in which Dickens notes that, “Values spreading may be better than existential risk reduction.” While this quantification might seem hopelessly speculative, I think it’s highly useful even in such situations. Of course, rigorous debiasing is also very important here.
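To illustrate the kind of calculation involved, here is a minimal sketch in Python of a scenario-weighted expected value comparison. All scenario names, probabilities, and values below are hypothetical placeholders chosen purely for illustration; they are not estimates from Dickens’ model, from this post, or from any other source.

```python
# A toy, scenario-weighted expected value comparison of the kind described
# above. All scenario names, probabilities, and values are hypothetical
# placeholders for illustration only.

# Each far-future scenario gets a (probability, moral value) pair; values
# are in arbitrary units and may be negative (net suffering).
scenarios = {
    "flourishing, wide moral circle": (0.3, 100),
    "mediocre, narrow moral circle": (0.4, -20),
    "dystopian / s-risk outcome": (0.1, -200),
    "extinction / no sentience": (0.2, 0),
}

def expected_value(scens):
    """Probability-weighted sum of scenario values."""
    return sum(p * v for p, v in scens.values())

def shift_probability(scens, source, target, delta):
    """Move `delta` probability mass from one scenario to another,
    modeling the hypothetical effect of an intervention."""
    out = dict(scens)
    p_src, v_src = out[source]
    p_tgt, v_tgt = out[target]
    out[source] = (p_src - delta, v_src)
    out[target] = (p_tgt + delta, v_tgt)
    return out

baseline = expected_value(scenarios)

# Hypothetical MCE-style effect: some probability moves from the
# narrow-circle scenario to the wide-circle scenario.
mce = shift_probability(scenarios, "mediocre, narrow moral circle",
                        "flourishing, wide moral circle", 0.02)

# Hypothetical extinction-risk-style effect: some probability moves from
# extinction to whichever non-extinct scenario one expects on average.
xrisk = shift_probability(scenarios, "extinction / no sentience",
                          "mediocre, narrow moral circle", 0.02)

print("Baseline expected value:", baseline)
print("Change from MCE-style shift:", expected_value(mce) - baseline)
print("Change from extinction-risk-style shift:",
      expected_value(xrisk) - baseline)
```

Even a toy model like this makes explicit which probability shifts and scenario values are doing the work in a given prioritization, which is exactly where the debiasing effort mentioned above would need to be focused.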

Overall, I think the far future is close to zero in expected moral value, meaning it’s not nearly as good as is commonly assumed, implicitly or explicitly, in the EA community.

Scale

Range of outcomes

It’s difficult to compare the scale of far future impacts since they are all astronomical, and overall I don’t find the consideration of scale here to be very useful.

Technically, it seems like MCE involves a larger range of potential outcomes than reducing extinction risk through AIA because, at least from a classical consequentialist perspective (giving weight to both negative and positive outcomes), it could make the difference between some of the worst far futures imaginable and the best far futures. Reducing extinction risk through AIA only makes the difference between nonexistence (a far future of zero value) and whatever world comes to exist. If one believes the far future is highly positive, this could still be a very large range, but it would still be less than the potential change from MCE.

How much less depends on one’s views of how bad the worst future is relative to the best future. If the absolute values are the same (say the best future is worth +V and the worst is worth −V, with extinction at zero), then MCE’s range (2V) is twice as large as that of extinction risk (V).

As mentioned in the Context section above, the change in the far future that AIA could achieve might not exactly be extinction versus non-extinction. While an aligned AI would probably not involve the extinction of all sentient beings, since that would require the values of its creators to prefer extinction over all other options, an unaligned AI might not necessarily involve extinction. To use the canonical AIA example of a “paperclip maximizer” (used to illustrate how an AI could easily have a harmful goal without any malicious intention), the rogue AI might create sentient beings as a labor force to implement its goal of maximizing the number of paperclips in the universe, or create sentient beings for some other goal.[20]

This means that the range of AIA is the difference between the potential universes with aligned AI and unaligned AI, which could be very good futures contrasted with very bad futures, rather than just very good futures contrasted with nonexistence.

Brian Tomasik has written out a thoughtful (though necessarily speculative and highly uncertain) breakdown of the risks of suffering in both aligned and unaligned AI scenarios, which weakly suggests that an aligned AI would lead to more suffering in expectation.

All things considered, it seems that the range of quality risk reduction (including MCE) is larger than that of extinction risk reduction (including AIA, depending on one’s view of what difference AI alignment makes), but this seems like a fairly weak consideration to me because (i) it’s a roughly two-fold difference, which is quite small relative to the differences of ten times, a thousand times, etc. that we frequently see in cause prioritization, and (ii) there are numerous fairly arbitrary judgment calls (like considering reducing extinction risk from AI versus AIA versus AI safety) that lead to different results.[21]

Likelihood of different far future scenarios[22][23]

MCE is relevant for many far future scenarios where AI doesn’t undergo the sort of “intelligence explosion” or similar progression that makes AIA important; for example, if AGI is developed by an institution, such as a foreign government, that has little interest in AIA, or if AI is never developed, or if it’s developed slowly in a way that makes safety adjustments quite easy as that development occurs. In each of these scenarios, the way society treats sentient beings, especially those currently outside the moral circle, seems like it could still be affected by MCE. As mentioned earlier, I think there is a significant chance that the moral circle will fail to expand to reach all sentient beings, and I think a small moral circle could very easily lead to suboptimal or dystopian far future outcomes.

On the other hand, some possible far future civilizations might not involve moral circles, such as an egalitarian society where each individual is able to fully represent their own interests in decision-making and this societal structure was not reached through MCE because these beings are all equally powerful for technological reasons (and no other beings exist and they have no interest in creating additional beings). Some AI outcomes might not be affected by MCE, such as an unaligned AI that does something like maximizing the number of paperclips for reasons other than human values (such as a programming error), or one whose designers create its value function without regard for humanity’s current moral views (“coherent extrapolated volition” could be an example of this, though I agree with Brian Tomasik that current moral views will likely be important in this scenario).

Given my current, highly uncertain estimates of the likelihood of various far future scenarios, I would guess that MCE is applicable in somewhat more cases than AIA, suggesting it’s easier to make a difference to the far future through MCE. (This is analogous to saying the risk of MCE-failure seems greater than the risk of AIA-failure, though I’m trying to avoid simplifying these into binary outcomes.)

Tractability

How much of an impact can we expect our marginal resources to have on the probability of extinction, or on the moral circle of the far future?

Social change versus technical research

One may believe changing people’s attitudes and behavior is quite difficult, and direct work on AIA involves a lot less of that. While AIA likely involves influencing some people (e.g. policymakers, researchers, and corporate executives), MCE is almost entirely influencing people’s attitudes and behavior.[24]

However, one could instead believe that technical research is more difficult in general, pointing to potential evidence such as the large amount of money spent on technical research (e.g. by Silicon Valley) with often very little to show for it, while huge social change sometimes seems to be effected by small groups of advocates with relatively little money (e.g. organizers of revolutions in Egypt, Serbia, and Turkey). (I don’t mean this as a very strong or persuasive argument, just as a possibility. There are plenty of examples of tech done with few resources and social change done with many.)

It’s hard to speak so generally, but I would guess that technical research tends to be easier than causing social change. And this seems like the strongest argument in favor of working on AIA over working on MCE.

Track record

In terms of EA work explicitly focused on the goals of AIA and MCE, AIA has a much better track record. The past few years have seen significant technical research output from organizations like MIRI and FHI, as documented by user Larks on the EA Forum for 2016 and 2017. I’d refer readers to those posts, but as a brief example, MIRI had an acclaimed paper on “Logical Induction,” which used a financial market process to estimate the likelihood of logical facts (e.g. mathematical propositions like the Riemann hypothesis) that we aren’t yet sure of. This is analogous to how we use probability theory to estimate the likelihood of empirical facts (e.g. a dice roll). In the bigger picture of AIA, this research could help lay the technical foundation for building an aligned AGI. See Larks’ post for a discussion of more papers like this, as well as non-technical work done by AI-focused organizations such as the Future of Life Institute’s open letter on AI safety, signed by leading AI researchers and cited by the White House’s “Report on the Future of Artificial Intelligence.”

Using an analogous definition for MCE, EA work explicitly focused on MCE (meaning expanding the moral circle in order to improve the far future) basically only started in 2017 with the founding of Sentience Institute (SI), though there were various blog posts and articles discussing it before then. SI has basically finished four research projects: (1) Foundational Question Summaries that summarize the evidence we have on important effective animal advocacy (EAA) questions, including a survey of EAA researchers, (2) a case study of the British antislavery movement to better understand how it achieved one of the first major moral circle expansions in modern history, (3) a case study of nuclear power to better understand how some countries (e.g. France) enthusiastically adopted this new technology, but others (e.g. the US) didn’t, and (4) a nationally representative poll of US attitudes towards animal farming and animal-free food.

With a broader definition of MCE that includes activities that people prioritizing MCE tend to think are quite indirectly effective (see the Neglectedness section for discussion of definitions), we’ve seen EA achieve quite a lot more, such as the work done by The Humane League, Mercy For Animals, Animal Equality, and other organizations on corporate welfare reforms to animal farming practices, and the work done by The Good Food Institute and others on supporting a shift away from animal farming, especially through supporting new technologies like so-called “clean meat.”

Since I favor the narrower definition, I think AIA outperforms MCE on track record, but the difference in track record seems largely explained by the greater resources spent on AIA, which makes it a less important consideration. (Also, when I personally decided to focus on MCE, SI did not yet exist, so the lack of track record was an even stronger consideration in favor of AIA, though MCE was also more neglected at that time.)

To be clear, the track records of all far future projects tend to be weaker than those of near-term projects, where we can directly see the results.

Robustness

If one values robustness, meaning a higher certainty that one is having a positive impact, either for instrumental or intrinsic reasons, then AIA might be more promising because once we develop an aligned AI (one that continues to be aligned over time), the work of AIA is done and won’t need to be redone in the future. With MCE, assuming the advent of AI or similar developments won’t fix society’s values in place (known as “value lock-in”), MCE progress could more easily be undone, especially if one believes there’s a social setpoint that humanity drifts back towards when moral progress is made.[25]

I think the assumptions of this argument make it quite weak: I’d guess an “intelligence explosion” has a significant chance of value lock-in,[26][27] and I don’t think there’s a setpoint in the sense that positive moral change increases the risk of negative moral change. I also don’t value robustness intrinsically at all, or instrumentally very much; I think there is so much uncertainty in all of these strategies, and such weak prior beliefs,[28] that differences in certainty of impact matter relatively little.

Miscellaneous

Work on either cause area runs the risk of backfiring. The main risk for AIA seems to be that the technical research done to better understand how to build an aligned AI will increase AI capabilities generally, meaning it’s also easier for humanity to produce an unaligned AI. The main risk for MCE seems to be that certain advocacy strategies will end up having the opposite effect as intended, such as a confrontational protest for animal rights that ends up putting people off of the cause.

It’s unclear which project has better near-term proxies and feedback loops to assess and increase long-term impact. AIA has technical problems with solutions that can be mathematically proven, but these might end up having little bearing on final AIA outcomes, such as if an AGI isn’t developed using the method that was advised or if technical solutions aren’t implemented by policymakers. MCE has metrics like public attitudes and practices. My weak intuition here, and the weak intuition of other reasonable people I’ve discussed this with, is that MCE has better near-term proxies.

It’s unclear which project has more historical evidence that EAs can learn from to be more effective. AIA has previous scientific, mathematical, and philosophical research and technological successes and failures, while MCE has previous psychological, social, political, and economic research and advocacy successes and failures.

Finally, I do think that we learn a lot about tractability just by working directly on an issue. Given how little effort has gone into MCE itself (see Neglectedness below), I think we could resolve a significant amount of uncertainty with more work in the field.

Overall, considering only direct tractability (i.e. ignoring information value due to neglectedness, which would help other EAs with their cause prioritization), I’d guess AIA is a little more tractable.

Neglectedness

With neglectedness, we also face a challenge of how broadly to define the cause area. In this case, we have a fairly clear goal with our definition: to best assess how much low-hanging fruit is available. To me, it seems like there are two simple definitions that meet this goal: (i) organizations or individuals working explicitly on the cause area, or (ii) organizations or individuals working on the strategies that are seen as top-tier by people focused on the cause area. How much one favors (i) versus (ii) depends largely on whether one thinks the top-tier strategies are fairly well-established and thus (ii) makes sense, or whether they will change over time such that one should favor (i) because those organizations and individuals will be better able to adjust.[29]

With the explicit-focus definitions of AIA and MCE (recall this includes having a far future focus), it seems that MCE is much more neglected and has more low-hanging fruit.[30] For example, there is only one organization that I know of explicitly committed to MCE in the EA community (SI), while numerous organizations (MIRI, CHAI, part of FHI, part of CSER, even parts of AI capabilities organizations like the Montreal Institute for Learning Algorithms, DeepMind, and OpenAI, etc.) are explicitly committed to AIA. Because MCE seems more neglected, we could learn a lot about MCE through SI’s initial work, such as how easily advocates have achieved MCE throughout history.

If we include those working on the cause area without an explicit focus, then that seems to widen the definition of MCE to include some of the top strategies being used to expand the moral circle in the near term, such as farmed animal work done by Animal Charity Evaluators and its top-recommended charities, which had a combined budget of around $7.5 million in 2016. The combined budgets of top-tier AIA work are harder to estimate, but the Centre for Effective Altruism estimates all AIA work in 2016 was around $6.6 million. The AIA budgets seem to be increasing more quickly than the MCE budgets, especially given the grant-making of the Open Philanthropy Project. We could also include EA movement-building organizations that place a strong focus on reducing extinction risk, and even AIA specifically, such as 80,000 Hours. The categorization for MCE seems to have more room to broaden, perhaps all the way to mainstream animal advocacy strategies like the work of People for the Ethical Treatment of Animals (PETA), which might make AIA more neglected. (It could potentially go even farther, such as advocating for human sweatshop laborers, but that seems too far removed, and I don’t know any MCE advocates who think it’s plausibly top-tier.)

I think there’s a difference in aptitude that suggests MCE is more neglected. Moral advocacy, while quite crowded, seems like a field in which it’s relatively easy for deliberate, thoughtful people to vastly outperform the average advocate,[31] which can lead to surprisingly large impact (e.g. EAs have already had far more success in publishing their writing, such as books and op-eds, than most writers hope for).[32] Additionally, despite centuries of advocacy, very little quality research has been done to critically examine what advocacy is effective and what’s not, while the fields of math, computer science, and machine learning involve substantial self-reflection and are largely worked on by academics who seem to use more critical thinking than the average activist (e.g. there’s far more skepticism in these academic communities, a demand for rigor and experimentation that’s rarely seen among advocates). In general, I think the aptitude of the average social change advocate is much lower than that of the average technological researcher, suggesting MCE is more neglected, though of course other factors also count.

The relative neglectedness of MCE also seems likely to continue, given the greater self-interest humanity has in AIA relative to MCE and, in my opinion, the net biases towards AIA described in the Bias section of this blog post. (This self-interest argument is a particularly important consideration for prioritizing MCE over AIA in my view.[33])

However, while neglectedness is typically thought to make a project more tractable, it seems that existing work in the extinction risk space has made marginal contributions more impactful in some ways. For example, talented AI researchers can find work relatively easily at an organization dedicated to AIA, while the path for talented MCE researchers is far less clear and easy. This points to the difference in tractability that might exist between labor resources and funding resources, as it currently seems like MCE is much more funding-constrained[34] while AIA is largely talent-constrained.

As another example, there are already solid inroads between the AIA community and AI decision-makers, and AI decision-makers have already expressed interest in AIA, suggesting that influencing them with research results will be fairly easy once those research results are in hand. This means both that our estimation of AIA’s neglectedness should decrease, and that our estimation of its non-neglectedness tractability should increase, in the sense that neglectedness is a part of tractability. (The definitions in this framework vary.)

All things considered, I find MCE to be more compelling from a neglectedness perspective, particularly due to the current EA resource allocation and the self-interest humanity has, and will most likely continue to have, in AIA. When I decided to focus on MCE, there was an even stronger case for neglectedness because no organization committed to that goal existed (SI was founded in 2017), though there was an increased downside to MCE: the even more limited track record.

Cooperation

Values spreading as a far future intervention has been criticized on the following grounds: People have very different values, so trying to promote your values and change other people’s could be seen as uncooperative. Cooperation seems to be useful both directly (e.g. how willing are other people to help us out if we’re fighting them?) and in a broader sense because of superrationality, an argument that one should help others even when there’s no causal mechanism for reciprocation.[35]

I think this is certainly a good consideration against some forms of values spreading. For example, I don’t think it’d be wise for an MCE-focused EA to disrupt the Effective Altruism Global conferences (e.g. yell on stage and try to keep the conference from continuing) if they have an insufficient focus on MCE. This seems highly ineffective because of how uncooperative it is, given the EA space is supposed to be one for having challenging discussions and solving problems, not merely advocating one’s positions like a political rally.

However, I don’t think it holds much weight against MCE in particular, for two reasons. First, I don’t think MCE is particularly uncooperative. For example, I never bring up MCE with someone and hear, “But I like to keep my moral circle small!” I think this is because there are many different components of our attitudes and worldview that we refer to as values and morals. People have some deeply-held values that seem strongly resistant to change, such as their religion or the welfare of their immediate family, but very few people seem to have small moral circles as a deeply-held value. Instead, the small moral circle seems to mostly be a superficial, casual value (though it’s often connected to the deeper values) that people are okay with — or even happy about — changing.[36]

Second, insofar as MCE is uncooperative, I think a large number of other EA interventions, including AIA, are similarly uncooperative. Many people, even in the EA community, are concerned about, or even opposed to, AIA, for example if they believe an aligned AI would create a worse far future than an unaligned AI, or if they think AIA harmfully distracts from more important issues and gives EA a bad name. This isn’t to say I think AIA is bad because it’s uncooperative — on the contrary, this seems like a level of uncooperativeness that’s often necessary for dedicated EAs. (In a trivial way, basically all action involves uncooperativeness because it’s always about changing the status quo or preventing the status quo from changing.[37] Even inaction can involve uncooperativeness if it means not working to help someone who would like your help.)

I do think it’s more important to be cooperative in some other situations, such as if one has a very different value system than some of their colleagues, as might be the case for the Foundational Research Institute, which advocates strongly for cooperation with other EAs.

Cooperation with future do-gooders

Another argument against values spreading goes something like, “We can worry about values after we’ve safely developed AGI. Our tradeoff isn’t, ‘Should we work on values or AI?’ but instead ‘Should we work on AI now and values later, or values now and maybe AI later if there’s time?’”

I agree with one interpretation of the first part of this argument, that urgency is an important factor and AIA does seem like a time-sensitive cause area. However, I think MCE is similarly time-sensitive because of risks of value lock-in where our descendants’ morality becomes much harder to change, such as if AI designers choose to fix the values of an AGI, or at least to make them independent of other people’s opinions (they could still be amenable to self-reflection of the designer and new empirical data about the universe other than people’s opinions)[38]; if humanity sends out colonization vessels across the universe that are traveling too fast for us to adjust based on our changing moral views; or if society just becomes too wide and disparate to have effective social change mechanisms like we do today on Earth.

I disagree with the stronger interpretation, that we can count on some sort of cooperation with or control over future people. There might be some extent to which we can do this, such as via superrationality, but that seems like a fairly weak effect. Instead, I think we’re largely on our own, deciding what we do in the next few years (or perhaps in our whole career), and just making our best guess of what future people will do. It sounds very difficult to strike a deal with them that will ensure they work on MCE in exchange for us working on AIA.

Bias

I’m always cautious about bringing considerations of bias into an important discussion like this. Such considerations easily turn into messy personal attacks, and accusations of bias can often be met with roughly equal accusations of counter-bias. However, I think we should give them serious consideration in this case. First, I want to be exhaustive in this blog post, and that means throwing every consideration on the table, even messy ones. Second, my own cause prioritization “journey” led me first to AIA and other non-MCE/non-animal-advocacy EA priorities (mainly EA movement-building), and it was considerations of bias that allowed me to look at the object-level arguments with fresh eyes and decide that I had been way off in my previous assessment.

Third and most importantly, people’s views on this topic are inevitably driven mostly by intuitive, subjective judgment calls. One could easily read everything I’ve written in this post and say they lean in the MCE direction on every topic, or the AIA direction, and there would be little object-level criticism one could make against that if they just based their view on a different intuitive synthesis of the considerations. This subjectivity is dangerous, but it is also humbling. It requires us to take an honest look at our own thought processes in order to avoid the subtle, irrational effects that might push us in either direction. It also requires caution when evaluating “expert” judgment, given how much experts could be affected by personal and social biases themselves.

The best way I know of to think about bias in this case is to consider the biases and other factors that favor either cause area and see which case seems more powerful, or which particular biases might be affecting our own views. The following lists are presumably not exhaustive, but they lay out what I think are some common key parts of people’s journeys to AIA or MCE. Of course, these factors are not entirely deterministic and probably not all will apply to you, nor do they necessarily mean that you are wrong in your cause prioritization. Based on the circumstances that apply more to you, consider taking a more skeptical look at the project you favor and your current views on the object-level arguments for it.

One might be biased towards AIA if...

They are a part of the EA community, and therefore drift towards the status quo of EA leaders and peers. (The views of EA leaders can of course be genuine evidence of the correct cause prioritization, but they can also lead to bias.)

The idea of “saving the world” appeals to them.

They take pride in their intelligence, and would love if they could save the world just by doing brilliant technical research.

They are competitive, and like the feeling/mindset of doing astronomically more good than the average do-gooder, or even the average EA. (I’ve argued in this post that MCE has this astronomical impact, but it lacks the feeling of literally “saving the world” or otherwise having a clear impact that makes a good hero’s journey climax, and it’s closely tied to lesser, near-term impacts.)

They have little personal experience of extreme suffering, the sort that makes one pessimistic about the far future, especially regarding s-risks. (Personal experience could be one’s own experience or the experiences of close friends and family.)

They have little personal experience of oppression, such as due to their gender, race, disabilities, etc.

They are generally a happy person.

They are generally optimistic, or at least averse to thinking about bad outcomes like how humanity could cause astronomical suffering. (Though some pessimism is required for AIA in the sense that they don’t count on AI capabilities researchers ending up with an aligned AI without their help.)

One might be biased towards MCE if...

They are vegan, especially if they went vegan for non-animal or non-far-future reasons, such as for better personal health.

Their gut reaction when they hear about extinction risk or AI risk is to judge it nonsensical.

They have personal connections to animals, such as growing up with pets.

They are or have been a fan of social movement/activism literature and media, especially if they dreamed of being a movement leader.

They have a tendency towards social projects over technical research.

They have benefitted from above-average social skills.

They are inclined towards social science.

They have a positive perception of activists, perhaps seeing them as the true leaders of history.

They have social ties to vegans and animal advocates. (The views of these people can of course be genuine evidence of the correct cause prioritization, but they can also lead to bias.)

The idea of “helping the worst off” appeals to them.

They take pride in their social skills, and would love if they could help the worst off just by being socially savvy.

They are not competitive, and like the thought of being a part of a friendly social movement.

They have a lot of personal experience of extreme suffering, the sort that makes one pessimistic about the far future, especially regarding s-risks. (Personal experience could be one’s own experience or the experiences of close friends and family.)

They have a lot of personal experience of oppression, such as due to their gender, race, disabilities, etc.

They are generally an unhappy person.

They are generally pessimistic, or at least don’t like thinking about good outcomes. (Though some optimism is required for MCE in the sense that they believe work on MCE can make a large positive difference in social attitudes and behavior.)

They care a lot about directly seeing the impact of their work, even if the bulk of their impact is hard to see. (E.g. seeing improvements in the conditions of farmed animals, which can be seen as a proxy for helping farmed-animal-like beings in the far future.)

Implications

I personally found myself far more compelled towards AIA in my early involvement with EA, before I had thought in detail about the issues discussed in this post. I think the list items in the AIA section apply to me much more strongly than those in the MCE list. When I considered these biases, in particular speciesism and my desire to follow the status quo of my EA friends, and took a fresh look at the object-level arguments, I changed my mind.

From my reading and conversations in EA, I think the biases in favor of AIA are also quite a bit stronger in the community, though of course some EAs — mainly those already working on animal issues for near-term reasons — probably feel a stronger pull in the other direction.

How you think about these bias considerations also depends on how biased you think the average EA is. If you, for example, think EAs tend to be quite biased in another way, like “measurement bias” or “quantifiability bias” (a tendency to focus too much on easily-quantifiable, low-risk interventions), then considerations of biases on this topic should probably be more compelling to you than they will be to people who think EAs are less biased.

Notes

[1] This post attempts to compare these cause areas overall, but since that’s sometimes too vague, I specifically mean the strategies within each cause area that seem most promising. I think this is basically equal to “what EAs working on MCE most strongly prioritize” and “what EAs working on AIA most strongly prioritize.”

[2] There’s a sense in which AIA is a form of MCE sim­ply be­cause AIA will tend to lead to cer­tain val­ues. I’m ex­clud­ing that AIA ap­proach of MCE from my anal­y­sis here to avoid over­lap be­tween these two cause ar­eas.

[3] Depending on how close to zero we're talking about, this could be quite unlikely. If we're discussing the range of outcomes from dystopia across the universe to utopia across the universe, then a range like "between modern earth and the opposite value of modern earth" seems like a very tiny fraction of the total possible range.

[4] I mean “good” in a “pos­i­tive im­pact” sense here, so it in­cludes not just ra­tio­nal­ity ac­cord­ing to the de­ci­sion-maker but also value al­ign­ment, luck, be­ing em­piri­cally well-in­formed, be­ing ca­pa­ble of do­ing good things, etc.

[5] One rea­son for op­ti­mism is that you might think most ex­tinc­tion risk is in the next few years, such that you and other EAs you know to­day will still be around to do this re­search your­selves and make good de­ci­sions af­ter those risks are avoided.

[6] Technically one could believe the far future is negative but also that humans will make good decisions about extinction, such as if one believes the far future (given non-extinction) will be bad only due to nonhuman forces, such as aliens or evolutionary trends, but has optimism about human decision-making, including both that humans will make good decisions about extinction and that they will be logistically able to make those decisions. I think this is an unlikely view to settle on, but it would make option value a good thing in a "close to zero" scenario.

[7] Non-extinct civilizations could be maximized for happiness, maximized for interestingness, set up like Star Wars or another sci-fi scenario, etc., while extinct civilizations would all be devoid of sentient beings, perhaps with some variation in physical structure like different planets or remnant structures of human civilization.

[8] My views on this are currently largely qualitative, but if I had to put a number on the word "significant" in this context, it'd be somewhere around 5-30%. This is a very intuitive estimate, and I'm not prepared to justify it.

[9] Paul Christiano made a general argument in favor of humanity reaching good values in the long run due to reflection in his post "Against Moral Advocacy" (see the "Optimism about reflection" section), though he doesn't specifically address concern for all sentient beings as a potential outcome, which might be less likely than other good values that are more driven by cooperation.

[10] Nick Bostrom has considered some of these risks of artificial suffering using the term "mind crime," which specifically refers to harming sentient beings created inside a superintelligence. See his book, Superintelligence.

[11] The Foundational Research Institute has written about risks of astronomical suffering in "Reducing Risks of Astronomical Suffering: A Neglected Priority." The TV series Black Mirror is an interesting dramatic exploration of how the far future could involve vast amounts of suffering, such as the episodes "White Christmas" and "USS Callister." Of course, the details of these situations often veer towards entertainment over realism, but their exploration of the potential for dystopias in which people abuse sentient digital entities is thought-provoking.

[12] I'm highly uncertain about what sort of motivations (like happiness and suffering in humans) future digital sentient beings will have. For example, is punishment being a stronger motivator in earth-originating life just an evolutionary fluke that we can expect to dissipate in artificial beings? Could they be just as motivated to attain reward as we are to avoid punishment? I think this is a promising avenue for future research, and I'm glad it's being discussed by some EAs.

[13] Brian Tomasik discusses this in his essay on "Values Spreading is Often More Important than Extinction Risk," suggesting that "there's not an obvious similar mechanism pushing organisms toward the things that I care about." However, Paul Christiano notes in "Against Moral Advocacy" that he expects "[c]onvergence of values" because "the space of all human values is not very broad," though this seems quite dependent on how one defines the possible space of values.

[14] This efficiency argument is also discussed in Ben West's article on "An Argument for Why the Future May Be Good."

[15] The term "resources" is intentionally quite broad. It means whatever the limitations are on the ability to produce happiness and suffering, such as energy or computation.

[16] One can also create hedonium as a promise to get things from rivals, but promises seem less common than threats because threats tend to be more motivating and easier to implement (e.g. it's easier to destroy than create). However, some social norms encourage promises over threats because promises are better for society as a whole. Additionally, threats against powerful beings (e.g. other citizens in the same country) do less than threats against less powerful or more distant beings, and the latter category might be increasingly common in the future. Additionally, threats and promises matter less when one considers that they are often unfulfilled because the other party doesn't do the action that was the subject of the threat or promise.

[17] Paul Christiano's blog post on "Why might the future be good?" argues that "the future will be characterized by much higher influence for altruistic values [than self-interest]," though he seems to just be discussing the potential of altruism and self-interest to create positive value, rather than their potential to create negative value.

Brian Tomasik discusses Christiano's argument and others in "The Future of Darwinism" and concludes, "Whether the future will be determined by Darwinism or the deliberate decisions of a unified governing structure remains unclear."

[18] One discussion of changes in morality on a large scale is Robin Hanson's blog post, "Forager, Farmer Morals."

[19] Armchair research is relatively easy, in the sense that all it requires is writing and thinking rather than also digging through historical texts, running scientific studies, or engaging in substantial conversation with advocates, researchers, and/or other stakeholders. It's also more similar to the mathematical and philosophical work that most EAs are used to doing. And it's more attractive as a demonstration of personal prowess to think your way into a crucial consideration than to arrive at one through the tedious work of research. (These reasons are similar to the reasons I feel most far-future-focused EAs are biased towards AIA over MCE.)

[20] These sentient beings probably won't be the biological animals we know today, but instead digital beings who can more efficiently achieve the AI's goals.

[21] The neglectedness heuristic involves a similar messiness of definitions, but the choices seem less arbitrary to me, and the different definitions lead to more similar results.

[22] Arguably this consideration should be under Tractability rather than Scale.

[23] There's a related framing here of "leverage," with the basic argument being that AIA seems more compelling than MCE because AIA is specifically targeted at an important, narrow far future factor (the development of AGI) while MCE is not as specifically targeted. This also suggests that we should consider specific MCE tactics focused on important, narrow far future factors, such as ensuring the AI decision-makers have wide moral circles even if the rest of society lags behind. I find this argument fairly compelling, including the implication that MCE advocates should focus more on advocating for digital sentience and advocating in the EA community than they would otherwise.

[24] Though plausibly MCE involves only influencing a few decision-makers, such as the designers of an AGI.

[25] Brian Tomasik discusses this in "Values Spreading is Often More Important than Extinction Risk," arguing that, "Very likely our values will be lost to entropy or Darwinian forces beyond our control. However, there's some chance that we'll create a singleton in the next few centuries that includes goal-preservation mechanisms allowing our values to be "locked in" indefinitely. Even absent a singleton, as long as the vastness of space allows for distinct regions to execute on their own values without take-over by other powers, then we don't even need a singleton; we just need goal-preservation mechanisms."

[26] Brian Tomasik discusses the likelihood of value lock-in in his essay, "Will Future Civilization Eventually Achieve Goal Preservation?"

[27] The advent of AGI seems like it will have similar effects on the lock-in of values and alignment, so if you think AI timelines are shorter (i.e. advanced AI will be developed sooner), then that increases the urgency of both cause areas. If you think timelines are so short that we will struggle to successfully reach AI alignment, then that decreases the tractability of AIA, but MCE seems like it could more easily have a partial effect on AI outcomes than AIA could.

[28] In the case of near-term, direct interventions, one might believe that "most social programmes don't work," which suggests that we should have low, strong priors for intervention effectiveness that we need robustness to overcome.

[29] Caspar Oesterheld discusses the ambiguity of neglectedness definitions in his blog post, "Complications in evaluating neglectedness." Other EAs have also raised concern about this commonly-used heuristic, and I almost included this content in this post under the "Tractability" section for this reason.

[30] This is a fairly intuitive sense of the word "matched." I'm taking the topic of ways to affect the far future, dividing it into population risk and quality risk categories, then treating AIA and MCE as subcategories of each. I'm also thinking in terms of each project (AIA and MCE) being in the category of "cause areas with at least pretty good arguments in their favor," and I think "put decent resources into all such projects until the arguments are rebutted" is a good approach for the EA community.

[31] I mean "advocate" quite broadly here, just anyone working to effect social change, such as people submitting op-eds to newspapers or trying to get pedestrians to look at their protest or take their leaflets.

[32] It's unclear what the explanation is for this. It could just be demographic differences such as high IQ, going to elite universities, etc., but it could also be exceptional "rationality skills" like finding loopholes in the publishing system.

[33] In Brian Tomasik's essay on "Values Spreading is Often More Important than Extinction Risk," he argues that "[m]ost people want to prevent extinction" while, "In contrast, you may have particular things that you value that aren't widely shared. These things might be easy to create, and the intuition that they matter is probably not too hard to spread. Thus, it seems likely that you would have higher leverage in spreading your own values than in working on safety measures against extinction."

[34] This is just my personal impression from working in MCE, especially with my organization Sentience Institute. With indirect work, The Good Food Institute is a potential exception since they have struggled to quickly hire talented people after receiving large amounts of funding.

[35] See "Superrationality" in "Reasons to Be Nice to Other Value Systems" for an EA introduction to the idea. See "In favor of 'being nice'" in "Against Moral Advocacy" as an example of cooperation as an argument against values spreading. In "Multiverse-wide Cooperation via Correlated Decision Making," Caspar Oesterheld argues that superrational cooperation makes MCE more important.

[36] This discussion is complicated by the widely varying degrees of MCE. While, for example, most US residents seem perfectly okay with expanding concern to vertebrates, there would be more opposition to expanding to insects, and even more to some simple computer programs that some argue should fit into the edges of our moral circles. I do think the farthest expansions are much less cooperative in this sense, though if the message is just framed as "expand our moral circle to all sentient beings," I still expect strong agreement.

[37] One exception is a situation where everyone wants a change to happen, but nobody else wants it badly enough to put the work into changing the status quo.

[38] My impression is that the AI safety community currently wants to avoid fixing these values, though they might still be trying to make them resistant to advocacy from other people, and in general I think many people today would prefer to fix the values of an AGI when they consider that they might not agree with potential future values.

Thanks for writ­ing this, I thought it was a good ar­ti­cle. And thanks to Greg for fund­ing it.

My pushback would be on the cooperation and coordination point. It seems that a lot of other people, with other moral values, could make a very similar argument: that they need to promote their values now, as the stakes are very high with possible upcoming value lock-in. To people with those values, these arguments should seem roughly as important as the above argument is to you.

Chris­ti­ans could ar­gue that, if the sin­gu­lar­ity is ap­proach­ing, it is vi­tally im­por­tant that we en­sure the uni­verse won’t be filled with sin­ners who will go to hell.

Egal­i­tar­i­ans could ar­gue that, if the sin­gu­lar­ity is ap­proach­ing, it is vi­tally im­por­tant that we en­sure the uni­verse won’t be filled with wider and wider di­ver­si­ties of wealth.

Liber­tar­i­ans could ar­gue that, if the sin­gu­lar­ity is ap­proach­ing, it is vi­tally im­por­tant that we en­sure the uni­verse won’t be filled with prop­erty rights vi­o­la­tions.

Naturalists could argue that, if the singularity is approaching, it is vitally important that we ensure the beauty of nature won't be despoiled all over the universe.

Na­tion­al­ists could ar­gue that, if the sin­gu­lar­ity is ap­proach­ing, it is vi­tally im­por­tant that we en­sure the uni­verse will be filled with peo­ple who re­spect the flag.

But it seems that it would be very bad if ev­ery­one took this ad­vice liter­ally. We would all end up spend­ing a lot of time and effort on pro­pa­ganda, which would prob­a­bly be great for ad­ver­tis­ing com­pa­nies but not much else, as so much of it is zero sum. Even though it might make sense, by their val­ues, for ex­pand­ing-moral-cir­cle peo­ple and pro-abor­tion peo­ple to have a big pro­pa­ganda war over whether foe­tuses de­serve moral con­sid­er­a­tion, it seems plau­si­ble we’d be bet­ter off if they both de­cided to spend the money on anti-malaria bed­nets.

In contrast, preventing the extinction of humanity seems to occupy a privileged position, not exactly comparable with the above agendas, though I can't exactly cash out why it seems this way to me. Perhaps to devout Confucians a preoccupation with preventing extinction seems to be just another distraction from the important task of expressing filial piety, though I doubt this.

(Mo­ral Real­ists, of course, could ar­gue that the situ­a­tion is not re­ally sym­met­ric, be­cause pro­mot­ing the true val­ues is dis­tinctly differ­ent from pro­mot­ing any other val­ues.)

How­ever, MCE is com­pet­ing in a nar­rower space than just val­ues. It’s in the MC space, which is just the space of ad­vo­cacy on what our moral cir­cle should look like. So I think it’s fairly dis­tinct from the list items in that sense, though you could still say they’re in the same space be­cause all ad­vo­cacy com­petes for news cov­er­age, ad buys, re­cruit­ing ad­vo­cacy-ori­ented peo­ple, etc. (Tech­nol­ogy pro­jects could also com­pete for these things, though there are sep­a­ra­tions, e.g. jour­nal­ists with a so­cial beat ver­sus jour­nal­ists with a tech beat.)

I think the comparably narrow space for ERR (extinction risk reduction) is ER (extinction risk), which also includes people who don't want extinction risk reduced (or even want it increased), such as some hardcore environmentalists, antinatalists, and negative utilitarians.

I think these are le­gi­t­i­mate co­op­er­a­tion/​co­or­di­na­tion per­spec­tives, and it’s not re­ally clear to me how they add up. But in gen­eral, I think this mat­ters mostly in situ­a­tions where you ac­tu­ally can co­or­di­nate. For ex­am­ple, in the US gen­eral elec­tion when Democrats and Repub­li­cans come to­gether and agree not to give to their re­spec­tive cam­paigns (in ex­change for their coun­ter­part also not do­ing so). Or if there were anti-MCE EAs with whom MCE EAs could co­or­di­nate (which I think is ba­si­cally what you’re say­ing with “we’d be bet­ter off if they both de­cided to spend the money on anti-malaria bed­nets”).

As EA has grown as a movement, the community appears to have converged on a process of rationalization whereby most of us have come to see what is centrally morally important as the experiences of well-being of a relatively wide breadth of moral patients, with relatively equal moral weight assigned to the well-being of each moral patient. The difference between SI and those who focus on AIA is primarily their differing estimates of the expected value of the far future in terms of average or total well-being. Among the examples you provided, it seems some worldviews are more amenable to the rationalization process which lends itself to consequentialism and EA. Many community members were egalitarians and libertarians who now find common cause in trying to figure out whether to focus on AIA or MCE. I think your point is important in that ultimately advocating for this type of values spreading could be bad. However, what appears to be an extreme amount of diversity could end up looking less fraught in a competition among values, as divergent worldviews converge on similar goals.

Different types of worldviews, like any amenable to aggregate consequentialist frameworks, can coalesce around a single goal such as MCE. The relevance of your point, then, would hinge upon how universal MCE really is or can be across worldviews, relative to other types of values, such that it wouldn't clash with many worldviews in a values-spreading contest. This is a matter of debate I haven't thought much about. It seems an important way to frame solutions to the challenge you raise to Jacy's point.

But it seems that it would be very bad if ev­ery­one took this ad­vice liter­ally.

For­tu­nately, not ev­ery­one does take this ad­vice liter­ally :).

This is very similar to the tragedy of the commons. If everyone acts purely out of their own self-interest, then everyone will be worse off. However, the situation as you described it does not fully reflect reality, because none of the groups you mentioned are actually trying to influence AI researchers at the moment. Therefore, MCE has a decisive advantage. Of course, this is always subject to change.

In con­trast, pre­vent­ing the ex­tinc­tion of hu­man­ity seems to oc­cupy a priv­ileged position

I find that it is often the case that people will dismiss any specific moral recommendation for AI except this one. Personally I don't see a reason to think that there are certain universal principles of minimal alignment. You may argue that human extinction is something that almost everyone agrees is bad, but now the principle of minimal alignment has shifted to "have the AI prevent things that almost everyone agrees are bad," which is another privileged moral judgement that I see no intrinsic reason to hold.

In truth, I see no neu­tral as­sump­tions to ground AI al­ign­ment the­ory in. I think this is made even more difficult be­cause even rel­a­tively small differ­ences in moral the­ory from the point of view of in­for­ma­tion the­o­retic de­scrip­tions of moral val­ues can lead to dras­ti­cally differ­ent out­comes. How­ever, I do find hope in moral com­pro­mise.

Thank you for writ­ing this post. An ev­er­green difficulty that ap­plies to dis­cussing top­ics of such a broad scope is the large num­ber of mat­ters that are rele­vant, difficult to judge, and where one’s judge­ment (what­ever it may be) can be rea­son­ably challenged. I hope to offer a crisper sum­mary of why I am not per­suaded.

I un­der­stand from this the pri­mary mo­ti­va­tion of MCE is avoid­ing AI-based dystopias, with the im­plied causal chain be­ing along the lines of, “If we en­sure the hu­mans gen­er­at­ing the AI have a broader cir­cle of moral con­cern, the re­sult­ing post-hu­man civ­i­liza­tion is less likely to in­clude dystopic sce­nar­ios in­volv­ing great mul­ti­tudes of suffer­ing sen­tiences.”

There are two con­sid­er­a­tions that speak against this be­ing a greater pri­or­ity than AI al­ign­ment re­search: 1) Back-chain­ing from AI dystopias leaves rel­a­tively few oc­ca­sions where MCE would make a cru­cial differ­ence. 2) The cur­rent port­fo­lio of ‘EA-based’ MCE is poorly ad­dressed to avert­ing AI-based dystopias.

Re. 1): MCE may prove neither necessary nor sufficient for ensuring AI goes well. On one hand, AI designers, even if speciesist themselves, might nonetheless provide the right apparatus for value learning such that the resulting AI will not propagate the moral mistakes of its creators. On the other, even if the AI-designers have the desired broad moral circle, they may have other crucial moral faults (maybe parochial in other respects, maybe selfish, maybe insufficiently reflective, maybe some mistaken particular moral judgements, maybe naive approaches to cooperation or population ethics, and so on); even if they do not, there are manifold ways in the wider environment (e.g. arms races), or in terms of technical implementation, that may lead to disaster.

It seems clear to me that, pro tanto, the less speciesist the AI-designer, the better the AI. Yet for this issue to be of such fundamental importance as to be comparable to AI safety research generally, the implication is an implausible doctrine of 'AI immaculate conception': only by ensuring we ourselves are free from sin can we conceive an AI which will not err in a morally important way.

Re 2): As Plant notes, MCE does not arise from animal causes alone: global poverty and climate change also act to extend moral circles, as well as propagating other valuable moral norms. Looking at things the other way, one should expect the animal causes found most valuable from the perspective of avoiding AI-based dystopia to diverge considerably from those picked on face-value animal welfare. Companion animal causes are far inferior from the latter perspective, but unclear on the former if this is a good way of fostering concern for animals; if the crucial thing is for the AI-creators, rather than the general population, not to be speciesist, targeted interventions like 'Start a petting zoo at Deepmind' look better than broader ones, like the abolition of factory farming.

The up­shot is that, even if there are some par­tic­u­larly high yield in­ter­ven­tions in an­i­mal welfare from the far fu­ture per­spec­tive, this should be fairly far re­moved from typ­i­cal EAA ac­tivity di­rected to­wards hav­ing the great­est near-term im­pact on an­i­mals. If this post her­alds a pivot of Sen­tience In­sti­tute to di­rec­tions pretty or­thog­o­nal to the prin­ci­pal com­po­nent of effec­tive an­i­mal ad­vo­cacy, this would be wel­come in­deed.

Notwith­stand­ing the above, the ap­proach out­lined above has a role to play in some ideal ‘far fu­ture port­fo­lio’, and it may be rea­son­able for some peo­ple to pri­ori­tise work on this area, if only for rea­sons of com­par­a­tive ad­van­tage. Yet I aver it should re­main a fairly ju­nior mem­ber of this port­fo­lio com­pared to AI-safety work.

Those con­sid­er­a­tions make sense. I don’t have much more to add for/​against than what I said in the post.

On the comparison between different MCE strategies, I'm pretty uncertain which are best. The main reasons I currently favor farmed animal advocacy over your examples (global poverty, environmentalism, and companion animals) are that (1) farmed animal advocacy is far more neglected, (2) farmed animal advocacy is far more similar to potential far future dystopias, mainly just because it involves vast numbers of sentient beings who are largely ignored by most of society. I'm relatively unworried about, for example, far future dystopias where dog-and-cat-like-beings (e.g. small, entertaining AIs kept around for companionship) are suffering in vast numbers. And environmentalism is typically advocating for non-sentient beings, which I think is quite different from MCE for sentient beings.

I think the bet­ter com­peti­tors to farmed an­i­mal ad­vo­cacy are ad­vo­cat­ing broadly for an­ti­speciesism/​fun­da­men­tal rights (e.g. Non­hu­man Rights Pro­ject) and ad­vo­cat­ing speci­fi­cally for digi­tal sen­tience (e.g. a larger, more so­phis­ti­cated ver­sion of Peo­ple for the Eth­i­cal Treat­ment of Re­in­force­ment Learn­ers). There are good ar­gu­ments against these, how­ever, such as that it would be quite difficult for an ea­ger EA to get much trac­tion with a new digi­tal sen­tience non­profit. (We con­sid­ered found­ing Sen­tience In­sti­tute with a fo­cus on digi­tal sen­tience. This was a big rea­son we didn’t.) Whereas given the cur­rent ex­cite­ment in the farmed an­i­mal space (e.g. the com­ing re­lease of “clean meat,” real meat grown with­out an­i­mal slaugh­ter), the farmed an­i­mal space seems like a fan­tas­tic place for gain­ing trac­tion.

I'm currently not very excited about "Start a petting zoo at Deepmind" (or similar direct outreach strategies) because it seems like it would produce a ton of backlash, given how adversarial and aggressive it comes across. There are additional considerations for/against (e.g. I worry that it'd be difficult to push a niche demographic like AI researchers very far away from the rest of society, at least the rest of their social circles; I also have the same traction concern I have with advocating for digital sentience), but this one just seems quite damning.

The up­shot is that, even if there are some par­tic­u­larly high yield in­ter­ven­tions in an­i­mal welfare from the far fu­ture per­spec­tive, this should be fairly far re­moved from typ­i­cal EAA ac­tivity di­rected to­wards hav­ing the great­est near-term im­pact on an­i­mals. If this post her­alds a pivot of Sen­tience In­sti­tute to di­rec­tions pretty or­thog­o­nal to the prin­ci­pal com­po­nent of effec­tive an­i­mal ad­vo­cacy, this would be wel­come in­deed.

I agree this is a valid ar­gu­ment, but given the other ar­gu­ments (e.g. those above), I still think it’s usu­ally right for EAAs to fo­cus on farmed an­i­mal ad­vo­cacy, in­clud­ing Sen­tience In­sti­tute at least for the next year or two.

(FYI for read­ers, Gre­gory and I also dis­cussed these things be­fore the post was pub­lished when he gave feed­back on the draft. So our com­ments might seem a lit­tle re­hearsed.)

The main rea­sons I cur­rently fa­vor farmed an­i­mal ad­vo­cacy over your ex­am­ples (global poverty, en­vi­ron­men­tal­ism, and com­pan­ion an­i­mals) are that (1) farmed an­i­mal ad­vo­cacy is far more ne­glected, (2) farmed an­i­mal ad­vo­cacy is far more similar to po­ten­tial far fu­ture dystopias, mainly just be­cause it in­volves vast num­bers of sen­tient be­ings who are largely ig­nored by most of so­ciety.

Wild animal advocacy is far more neglected than farmed animal advocacy, and it involves even larger numbers of sentient beings ignored by most of society. If the superiority of farmed animal advocacy over global poverty along these two dimensions is a sufficient reason for not working on global poverty, why isn't the superiority of wild animal advocacy over farmed animal advocacy along those same dimensions also a sufficient reason for not working on farmed animal advocacy?

I per­son­ally don’t think WAS is as similar to the most plau­si­ble far fu­ture dystopias, so I’ve been pri­ori­tiz­ing it less even over just the past cou­ple of years. I don’t ex­pect far fu­ture dystopias to in­volve as much natur­o­genic (na­ture-caused) suffer­ing, though of course it’s pos­si­ble (e.g. if hu­mans cre­ate large num­bers of sen­tient be­ings in a simu­la­tion, but then let the simu­la­tion run on its own for a while, then the simu­la­tion could come to be viewed as natur­o­genic-ish and those at­ti­tudes could be­come more rele­vant).

I think if one wants some­thing very ne­glected, digi­tal sen­tience ad­vo­cacy is ba­si­cally across-the-board bet­ter than WAS ad­vo­cacy.

That being said, I'm highly uncertain here and these reasons aren't overwhelming (e.g. WAS advocacy pushes on more than just the "care about naturogenic suffering" lever), so I think WAS advocacy is still, in Gregory's words, an important part of the 'far future portfolio.' And often one can work on it while working on other things, e.g. I think Animal Charity Evaluators' WAS content (e.g. [guest blog post by Oscar Horta](https://animalcharityevaluators.org/blog/why-the-situation-of-animals-in-the-wild-should-concern-us/)) has helped them be more well-rounded as an organization, and didn't directly trade off with their farmed animal content.

But humanity/AI is likely to expand to other planets. Won't those planets need to have complex ecosystems that could involve a lot of suffering? Or do you think it will all be done with some fancy tech that'll be too different from today's wildlife for it to be relevant? It's true that those ecosystems would (mostly?) be non-naturogenic, but I'm not that sure that people would care about them; it'd still be animals/diseases/hunger etc. hurting animals. Maybe it'd be easier to engineer an ecosystem without predation and diseases, but that is a non-trivial assumption and suffering could then arise in other ways.

Also, some hu­mans want to spread life to other planets for its own sake and rel­a­tively few peo­ple need to want that to cause a lot of suffer­ing if no one works on pre­vent­ing it.

This could be less rele­vant if you think that most of the ex­pected value comes from simu­la­tions that won’t in­volve ecosys­tems.

Yes, ter­raform­ing is a big way in which close-to-WAS sce­nar­ios could arise. I do think it’s smaller in ex­pec­ta­tion than digi­tal en­vi­ron­ments that de­velop on their own and thus are close-to-WAS.

I don't think terraforming would be done very differently from today's wildlife, e.g. engineered without predation and diseases.

Ul­ti­mately I still think the digi­tal, not-close-to-WAS sce­nar­ios seem much larger in ex­pec­ta­tion.

Ostensibly it seems like much of Sentience Institute's (SI) current research is focused on identifying those MCE strategies which historically have turned out to be more effective among the strategies which have been tried. I think SI as an organization is based on the experience of EA as a movement in having significant success with MCE in a relatively short period of time. Successfully spreading the meme of effective giving, increasing concern for the far future in notable ways, and corporate animal welfare campaigns are all dramatic achievements for a young social movement like EA. While these aren't on the scale of shaping MCE over the course of the far future, these achievements make it seem more possible that EA and allied movements can have an outsized impact by pursuing neglected strategies for values-spreading.

On terminology: to say the focus is on non-human animals, or even on the moral patients which typically come to mind when describing 'animal-like' minds (i.e., familiar vertebrates), is inaccurate. "Sentient being," "moral patient," and "non-human agents/beings" are terms which are inclusive of non-human animals and the other types of potential moral patients posited. Admittedly these aren't catchy terms.

I tend to think of moral val­ues as be­ing pretty con­tin­gent and pretty ar­bi­trary, such that what val­ues you start with makes a big differ­ence to what val­ues you end up with even on re­flec­tion. Peo­ple may “im­print” on the val­ues they re­ceive from their cul­ture to a greater or lesser de­gree.

I’m also skep­ti­cal that so­phis­ti­cated philo­soph­i­cal-type re­flec­tion will have sig­nifi­cant in­fluence over posthu­man val­ues com­pared with more or­di­nary poli­ti­cal/​eco­nomic forces. I sup­pose philoso­phers have some­times had big in­fluences on hu­man poli­tics (re­li­gions, Marx­ism, the En­light­en­ment), though not nec­es­sar­ily in a clean “care­fully con­sider lots of philo­soph­i­cal ar­gu­ments and pick the best ones” kind of way.

I’d qual­ify this by adding that the philo­soph­i­cal-type re­flec­tion seems to lead in ex­pec­ta­tion to more moral value (pos­i­tive or nega­tive, e.g. he­do­nium or do­lorium) than other forces, de­spite over­all hav­ing less in­fluence than those other forces.

Hm, yeah, I don’t think I fully un­der­stand you here ei­ther, and this seems some­what differ­ent than what we dis­cussed via email.

My con­cern is with (2) in your list. “[T]hey do not wish to be con­vinced to ex­pand their moral cir­cle” is ex­tremely am­bigu­ous to me. Pre­sum­ably you mean they—with­out MCE ad­vo­cacy be­ing done—wouldn’t put in wide-MC* val­ues or val­ues that lead to wide-MC into an al­igned AI. But I think it’s be­ing con­flated with, “they ac­tively op­pose” or “they would an­swer ‘no’ if asked, ‘Do you think your val­ues are wrong when it comes to which moral be­ings de­serve moral con­sid­er­a­tion?’”

I think they don't actively oppose it, they would mostly answer "no" to that question, and it's very uncertain if they will put the wide-MC-leading values into an aligned AI. I don't think CEV or similar reflection processes reliably lead to wide moral circles. I think they can still be heavily influenced by their initial set-up (e.g. what the values of humanity are when reflection begins).

This leads me to think that you only need (2) to be true in a very weak sense for MCE to mat­ter. I think it’s quite plau­si­ble that this is the case.

I don't think CEV or similar reflection processes reliably lead to wide moral circles. I think they can still be heavily influenced by their initial set-up (e.g. what the values of humanity are when reflection begins).

Why do you think this is the case?
Do you think there is an al­ter­na­tive re­flec­tion pro­cess (ei­ther im­ple­mented by an AI, by a hu­man so­ciety, or com­bi­na­tion of both) that could be defined that would re­li­ably lead to wide moral cir­cles? Do you have any thoughts on what would it look like?

If we go through some kind of reflection process to determine our values, I would much rather have a reflection process that wasn't dependent on whether or not MCE occurred beforehand, and I think not leading to a wide moral circle should be considered a serious bug in any definition of a reflection process. It seems to me that working on producing this would be a plausible alternative, or at least a parallel path, to directly performing MCE.

I think that there’s an in­evitable trade­off be­tween want­ing a re­flec­tion pro­cess to have cer­tain prop­er­ties and wor­ries about this vi­o­lat­ing goal preser­va­tion for at least some peo­ple. This blog­post is not about MCE di­rectly, but if you think of “BAAN thought ex­per­i­ment” as “we do moral re­flec­tion and the out­come is such a wide cir­cle that most peo­ple think it is ex­tremely coun­ter­in­tu­itive” then the rea­son­ing in large parts of the blog­post should ap­ply perfectly to the dis­cus­sion here.

That is not to say that try­ing to fine tune re­flec­tion pro­cesses is pointless: I think it’s very im­por­tant to think about what our desider­ata should be for a CEV-like re­flec­tion pro­cess. I’m just say­ing that there will be trade­offs be­tween cer­tain com­monly men­tioned desider­ata that peo­ple don’t re­al­ize are there be­cause they think there is such a thing as “gen­uinely free and open-ended de­liber­a­tion.”

Thanks for com­ment­ing, Lukas. I think Lukas, Brian To­masik, and oth­ers af­fili­ated with FRI have thought more about this, and I ba­si­cally defer to their views here, es­pe­cially be­cause I haven’t heard any rea­son­able peo­ple dis­agree with this par­tic­u­lar point. Namely, I agree with Lukas that there seems to be an in­evitable trade­off here.

I thought this piece was good. I agree that MCE work is likely quite high im­pact—per­haps around the same level as X-risk work—and that it has been gen­er­ally ig­nored by EAs. I also agree that it would be good for there to be more MCE work go­ing for­ward. Here’s my 2 cents:

You seem to be say­ing that AIA is a tech­ni­cal prob­lem and MCE is a so­cial prob­lem. While I think there is some­thing to this, I think there are very im­por­tant tech­ni­cal and so­cial sides to both of these. Much of the work re­lated to AIA so far has been about rais­ing aware­ness about the prob­lem (eg the book Su­per­in­tel­li­gence), and this is more a so­cial solu­tion than a tech­ni­cal one. Also, avoid­ing a tech­nolog­i­cal race for AGI seems im­por­tant for AIA, and this also is more a so­cial prob­lem than a tech­ni­cal one.

For MCE, the 2 best things I can imag­ine (that I think are plau­si­ble) are both tech­ni­cal in na­ture. First, I ex­pect clean meat will lead to the moral cir­cle ex­pand­ing more to an­i­mals. I re­ally don’t see any ve­gan so­cial move­ment suc­ceed­ing in end­ing fac­tory farm­ing any­where near as much as I ex­pect clean meat to. Se­cond, I’d imag­ine that a ma­ture sci­ence of con­scious­ness would in­crease MCE sig­nifi­cantly. Many peo­ple don’t think an­i­mals are con­scious, and al­most no one thinks any­thing be­sides an­i­mals can be con­scious. How would we even know if an AI was con­scious, and if so, if it was ex­pe­rienc­ing joy or suffer­ing? The only way would be if we de­velop the­o­ries of con­scious­ness that we have high con­fi­dence in. But right now we’re very limited in study­ing con­scious­ness, be­cause our tools at in­ter­fac­ing with the brain are crude. Ad­vanced neu­rotech­nolo­gies could change that—they could al­low us to po­ten­tially test hy­pothe­ses about con­scious­ness. Again, de­vel­op­ing these tech­nolo­gies would be a tech­ni­cal prob­lem.

Of course, these are just the first ideas that come into my mind, and there very well may be so­cial solu­tions that could do more than the tech­ni­cal solu­tions I men­tioned, but I don’t think we should rule out the po­ten­tial role of tech­ni­cal solu­tions, ei­ther.

Pre­sum­ably we want some peo­ple work­ing on both of these prob­lems, some peo­ple have skills more suited to one than the other, and some peo­ple are just go­ing to be more pas­sion­ate about one than the other.

If one is con­vinced non-ex­tinc­tion civ­i­liza­tion is net pos­i­tive, this seems true and im­por­tant. Sorry if I framed the post too much as one or the other for the whole com­mu­nity.

Much of the work re­lated to AIA so far has been about rais­ing aware­ness about the prob­lem (eg the book Su­per­in­tel­li­gence), and this is more a so­cial solu­tion than a tech­ni­cal one.

Maybe. My im­pres­sion from peo­ple work­ing on AIA is that they see it as mostly tech­ni­cal, and in­deed they think much of the so­cial work has been net nega­tive. Per­haps not Su­per­in­tel­li­gence, but at least the work that’s been done to get me­dia cov­er­age and wide­spread at­ten­tion with­out the tech­ni­cal at­ten­tion to de­tail of Bostrom’s book.

I think the more important social work (from a pro-AIA perspective) is about convincing AI decision-makers to use the technical results of AIA research, but my impression is that AIA proponents still think getting those technical results is probably the more important project.

There’s also so­cial work in co­or­di­nat­ing the AIA com­mu­nity.

First, I ex­pect clean meat will lead to the moral cir­cle ex­pand­ing more to an­i­mals. I re­ally don’t see any ve­gan so­cial move­ment suc­ceed­ing in end­ing fac­tory farm­ing any­where near as much as I ex­pect clean meat to.

Sure, though one big is­sue with tech­nol­ogy is that it seems like we can do far less to steer its di­rec­tion than we can do with so­cial change. Clean meat tech re­search prob­a­bly just helps us get clean meat sooner in­stead of mak­ing the tech progress hap­pen when it wouldn’t oth­er­wise. The di­rec­tion of the far fu­ture (e.g. whether clean meat is ever adopted, whether the moral cir­cle ex­pands to ar­tifi­cial sen­tience) prob­a­bly mat­ters a lot more than the speed at which it ar­rives.

Of course, this gets very com­pli­cated very quickly, as we con­sider things like value lock-in. Sen­tience In­sti­tute has a bit of ba­sic sketch­ing on the topic on this page.

Se­cond, I’d imag­ine that a ma­ture sci­ence of con­scious­ness would in­crease MCE sig­nifi­cantly. Many peo­ple don’t think an­i­mals are con­scious, and al­most no one thinks any­thing be­sides an­i­mals can be conscious

I disagree that "many people don't think animals are conscious." I almost exclusively hear that view from the rationalist/LessWrong community. A recent survey suggested that 87.3% of US adults agree with the statement, "Farmed animals have roughly the same ability to feel pain and discomfort as humans," and presumably even more think they have at least some ability.

Ad­vanced neu­rotech­nolo­gies could change that—they could al­low us to po­ten­tially test hy­pothe­ses about con­scious­ness.

I’m fairly skep­ti­cal of this per­son­ally, partly be­cause I don’t think there’s a fact of the mat­ter when it comes to whether a be­ing is con­scious. I think Brian To­masik has writ­ten elo­quently on this. (I know this is an un­for­tu­nate view for an an­i­mal ad­vo­cate like me, but it seems to have the best ev­i­dence fa­vor­ing it.)

I’m fairly skep­ti­cal of this per­son­ally, partly be­cause I don’t think there’s a fact of the mat­ter when it comes to whether a be­ing is con­scious.

I would guess that in­creas­ing un­der­stand­ing of cog­ni­tive sci­ence would gen­er­ally in­crease peo­ple’s moral cir­cles if only be­cause peo­ple would think more about these kinds of ques­tions. Of course, un­der­stand­ing cog­ni­tive sci­ence is no guaran­tee that you’ll con­clude that an­i­mals mat­ter, as we can see from peo­ple like Den­nett, Yud­kowsky, Peter Car­ruthers, etc.

Se­cond, I’d imag­ine that a ma­ture sci­ence of con­scious­ness would in­crease MCE sig­nifi­cantly. Many peo­ple don’t think an­i­mals are con­scious, and al­most no one thinks any­thing be­sides an­i­mals can be con­scious. How would we even know if an AI was con­scious, and if so, if it was ex­pe­rienc­ing joy or suffer­ing? The only way would be if we de­velop the­o­ries of con­scious­ness that we have high con­fi­dence in. But right now we’re very limited in study­ing con­scious­ness, be­cause our tools at in­ter­fac­ing with the brain are crude. Ad­vanced neu­rotech­nolo­gies could change that—they could al­low us to po­ten­tially test hy­pothe­ses about con­scious­ness. Again, de­vel­op­ing these tech­nolo­gies would be a tech­ni­cal prob­lem.

I think that’s right. Speci­fi­cally, I would ad­vo­cate con­scious­ness re­search as a foun­da­tion for prin­ci­pled moral cir­cle ex­pan­sion. I.e., if we do con­scious­ness re­search cor­rectly, the equa­tions them­selves will tell us how con­scious in­sects are, whether al­gorithms can suffer, how much moral weight we should give an­i­mals, and so on.

On the other hand, if there is no fact of the mat­ter as to what is con­scious, we’re headed to­ward a very weird, very con­tentious fu­ture of con­flict­ing/​in­com­pat­i­ble moral cir­cles, with no ‘ground truth’ or shared prin­ci­ples to ar­bi­trate dis­putes.

Edit: I'd also like to thank Jacy for posting this; I find it a notable contribution to the space, and clearly a product of a lot of hard work and deep thought.

The main risk for AIA seems to be that the tech­ni­cal re­search done to bet­ter un­der­stand how to build an al­igned AI will in­crease AI ca­pa­bil­ities gen­er­ally, mean­ing it’s also eas­ier for hu­man­ity to pro­duce an un­al­igned AI.

This doesn’t seem like a big con­sid­er­a­tion to me. Even if un­friendly AI comes sooner by an en­tire decade, this mat­ters lit­tle on a cos­mic timescale. An ar­gu­ment I find more com­pel­ling: If we plot the ex­pected util­ity of an AGI as a func­tion of the amount of effort put into al­ign­ing it, there might be a “valley of bad al­ign­ment” that is worse than no at­tempt at al­ign­ment at all. (A pa­per­clip max­i­mizer will quickly kill us and not gen­er­ate much long-term suffer­ing, whereas an AI that un­der­stands the im­por­tance of hu­man sur­vival but doesn’t un­der­stand any other val­ues will im­prison us for all eter­nity. Some­thing like that.)

I’d like to know more about why peo­ple think that our moral cir­cles have ex­panded. I sus­pect ac­tivism plays a smaller role than you think. Steven Pinker talks about pos­si­ble rea­sons for de­clin­ing vi­o­lence in his book The Bet­ter An­gels of Our Na­ture. I’m guess­ing this is highly re­lated to moral cir­cle ex­pan­sion.

One the­ory I haven’t seen el­se­where is that self-in­ter­est plays a big role in moral cir­cle ex­pan­sion. Con­sider the ex­am­ple of slav­ery. The BBC writes:

It be­comes clear that hu­man­i­tar­i­anism and im­pe­rial muscling were able bed­fel­lows...

One can be cer­tain that the high ideals of abo­li­tion and the pro­mo­tion of le­gi­t­i­mate trade were equally matched by eco­nomic and ter­ri­to­rial am­bi­tions, im­pulses which brought for­ward par­ti­tion and colo­nial rule in Africa in the late 19th cen­tury.

You'll note that the villains of the slave story are the slavers, people with an interest in slavery. The heroes seem to have been Britons who would not lose much if slavery was outlawed (though I guess boycotting sugar would go against their self-interest?). Similarly, I think I remember reading that poor northern whites were motivated to fight in the US Civil War because they were worried their labor would be displaced by slave labor.

According to this story, the expanding circle is a side effect of the world growing wealthier. As lower levels of Maslow's hierarchy are met, people care more about humanitarian issues. (I'm assuming that genetic relatedness predicts where on the hierarchy another being falls.) Conquest is less common now because it's more profitable to control a multinational company than to control lots of territory. Slavery is less common because unskilled laborers are less of an asset & more of a liability, and it's hard to coerce skilled labor. Violence has declined because sub-replacement fertility means we're no longer in a zero-sum competition for resources. (Note that the bloodiest war in recent memory happened in the Democratic Republic of Congo, a country where women average six children each (source). Congo has a lot of mineral wealth, which seems to incentivize conflict. Probably this wealth doesn't diminish in the presence of conflict as much as e.g. manufacturing wealth would.)

I sup­pose a quick test for the Maslow’s hi­er­ar­chy story is to check whether wealthy peo­ple are more likely to be ve­gan (con­trol­ling for meat calories be­ing as ex­pen­sive as non-meat calories).

I don’t think ev­ery­one is com­pletely self-in­ter­ested all the time, but I think peo­ple are self-in­ter­ested enough that it makes sense for ac­tivists to ap­ply lev­er­age strate­gi­cally.

Re: a com­puter pro­gram used to mine as­ter­oids, I’d ex­pect cer­tain AI al­ign­ment work to be use­ful here. If we un­der­stand AI al­gorithms more deeply, an as­ter­oid miner can be sim­pler and less likely sen­tient. Con­trast with the sce­nario where AI progress is slow, brain em­u­la­tions come be­fore AGI, and the as­ter­oid miner is pi­loted us­ing an em­u­la­tion of some­one’s brain.

I’m not com­fortable rely­ing on in­nate hu­man good­ness to deal with moral dilem­mas. I’d rather elimi­nate in­cen­tives for im­moral be­hav­ior. In the pres­ence of bad in­cen­tives, I worry about ac­tivism back­firing as peo­ple come up with ra­tio­nal­iza­tions for their im­moral be­hav­ior. See e.g. bibli­cal jus­tifi­ca­tions for slav­ery in the an­te­bel­lum south. In­stead of see­ing the EA move­ment as some­thing that will sweep the globe and make ev­ery­one al­tru­is­tic, I’m more in­clined to see it as a team of spe­cial forces work­ing to ad­just the in­cen­tives that ev­ery­one else op­er­ates un­der in or­der to cre­ate good out­comes as a side effect of ev­ery­one else work­ing to­wards their in­cen­tives.

Singer and Pinker talk a lot about the im­por­tance of rea­son and em­pa­thy to the ex­pand­ing moral cir­cle. This might be achieved through bet­ter on­line dis­cus­sion plat­forms, wide­spread adop­tion of med­i­ta­tion, etc.

Anyway, I think that if we take a broad view of moral circle expansion, the best way to achieve it might be some unexpected thing: improving the happiness of voters who control nuclear weapons, helping workers deal with technological job displacement, and so on. IMO, more EAs should work on world peace.

I thought this was very interesting, thanks for writing it up. Two comments:

It was useful to have a list of reasons why you think the EV of the future could be around zero, but I still found it quite vague and hard to imagine (why exactly would more powerful minds be mistreating less powerful minds? etc.), so I would have liked to see that sketched in slightly more depth.

It's not obvious to me it's correct/charitable to draw the neglectedness of MCE so narrowly. Can't we conceive of a huge amount of moral philosophy, as well as social activism, both new and old, as MCE? Isn't all EA outreach an indirect form of MCE?

1) I considered that, and in addition to time constraints, I know others haven't written on this because there's a big concern that talking about it makes it more likely to happen. I err more towards sharing it despite this concern, but I'm pretty uncertain. Even the detail of this post was more than several people wanted me to include.

But mostly, I’m just limited on time.

2) That’s rea­son­able. I think all of these bound­aries are fairly ar­bi­trary; we just need to try to use the same stan­dards across cause ar­eas, e.g. con­sid­er­ing only work with this as its ex­plicit fo­cus. The­o­ret­i­cally, since Ne­glect­ed­ness is ba­si­cally just a heuris­tic to es­ti­mate how much low-hang­ing fruit there is, we’re aiming at “The space of work that might take such low-hang­ing fruit away.” In this sense, Ne­glect­ed­ness could vary widely. E.g. there’s limited room for ad­vo­cat­ing (e.g. pass­ing out leaflets, giv­ing lec­tures) di­rectly to AI re­searchers, but this isn’t af­fected much by ad­vo­cacy to­wards the gen­eral pop­u­la­tion.

I do think moral philosophy that leads to expanding moral circles (e.g. writing papers supportive of utilitarianism), moral-circle-focused social activism (e.g. anti-racism, not as much something like campaigning for increased arts funding that seems fairly orthogonal to MCE), and EA outreach (in the sense that the A of EA means a wide moral circle) are MCE in the broadest somewhat-useful definition.

Cas­par’s blog post is a pretty good read on the nu­ances of defin­ing/​uti­liz­ing Ne­glect­ed­ness.

I’m pretty un­cer­tain about the best lev­ers, and I think re­search can help a lot with that. Ten­ta­tively, I do think that MCE ends up al­ign­ing fairly well with con­ven­tional EAA (per­haps it should be un­sur­pris­ing that the most im­por­tant lev­ers to push on for near-term val­ues are also most im­por­tant for long-term val­ues, though it de­pends on how nar­rowly you’re draw­ing the lines).

A few ex­cep­tions to that:

Digi­tal sen­tience prob­a­bly mat­ters the most in the long run. There are good rea­sons to be skep­ti­cal we should be ad­vo­cat­ing for this now (e.g. it’s quite out­side of the main­stream so it might be hard to ac­tu­ally get at­ten­tion and change minds; it’d prob­a­bly be hard to get fund­ing for this sort of ad­vo­cacy (in­deed that’s one big rea­son SI started with farmed an­i­mal ad­vo­cacy)), but I’m pretty com­pel­led by the gen­eral claim, “If you think X value is what mat­ters most in the long-term, your de­fault ap­proach should be work­ing on X di­rectly.”
Ad­vo­cat­ing for digi­tal sen­tience is of course ne­glected ter­ri­tory, but Sen­tience In­sti­tute, the Non­hu­man Rights Pro­ject, and An­i­mal Ethics have all worked on it. Peo­ple for the Eth­i­cal Treat­ment of Re­in­force­ment Learn­ers has been the only ded­i­cated or­ga­ni­za­tion AFAIK, and I’m not sure what their sta­tus is or if they’ve ever paid full-time or part-time staff.

I think views on value lock-in mat­ter a lot be­cause of how they af­fect food tech (e.g. sup­port­ing The Good Food In­sti­tute). I place sig­nifi­cant weight on this and a few other things (see this sec­tion of an SI page) that make me think GFI is ac­tu­ally a pretty good bet, de­spite my con­cern that tech­nol­ogy pro­gresses mono­ton­i­cally.

Be­cause what might mat­ter most is so­ciety’s gen­eral con­cern for weird/​small minds, we should be more sym­pa­thetic to in­di­rect an­ti­speciesism work like that done by An­i­mal Ethics and the fun­da­men­tal rights work of the Non­hu­man Rights Pro­ject. From a near-term per­spec­tive, I don’t think these look very good be­cause I don’t think we’ll see fun­da­men­tal rights be a big re­ducer of fac­tory farm suffer­ing.

This is a less-re­fined view of mine, but I’m less fo­cused than I used to be on wild an­i­mal suffer­ing. It just seems to cost a lot of weird­ness points, and natur­o­genic suffer­ing doesn’t seem nearly as im­por­tant as an­thro­pogenic suffer­ing in the far fu­ture. Fac­tory farm suffer­ing seems a lot more similar to far fu­ture dystopias than does wild an­i­mal suffer­ing, de­spite WAS dom­i­nat­ing util­ity calcu­la­tions for the next, say, 50 years.

I could talk more about this if you’d like, es­pe­cially if you’re fac­ing spe­cific de­ci­sions like where ex­actly to donate in 2018 or what sort of job you’re look­ing for with your skil­lset.

Next to the coun­ter­points men­tioned by Gre­gory Lewis, I think there is an ad­di­tional rea­son why MCE seems less effec­tive than more tar­geted in­ter­ven­tions to im­prove the qual­ity of the long-term fu­ture: Gains from trade be­tween hu­mans with differ­ent val­ues be­come eas­ier to im­ple­ment as the reach of tech­nol­ogy in­creases. As long as a non-triv­ial frac­tion of hu­mans end up car­ing about an­i­mal wellbe­ing or digi­tal minds, it seems likely it would be cheap for other coal­i­tions to offer trades. So whether 10% of fu­ture peo­ple end up with an ex­panded moral cir­cle or 100% may not make much of a differ­ence to the out­come: It will be rea­son­ably good ei­ther way if peo­ple reap the gains from trade.

One might object that it is unlikely that humans would be able to cooperate efficiently, given that we don’t see this type of cooperation happening today. However, I think it’s reasonable to assume that staying in control of technological progress beyond the AGI transition requires a degree of wisdom and foresight that is very far away from where most societal groups are at today. And if humans do stay in control, then finding a good solution for value disagreements may be the easier problem, or at worst similarly hard. So it feels to me that most likely, either we get a future that goes badly for reasons related to lack of coordination and sophistication in the pre-AGI stage, or we get a future where humans set things up wisely enough to actually design an outcome that is nice (or at least not amongst the 10% of worst outcomes) by the lights of nearly everyone.

Brian Tomasik made the point that conditional on human values staying in control, we may be very unlikely to get something like broad moral reflection. Instead, values could be determined by a very small group of individuals who happened to be in power by the time AGI arrives (as opposed to individuals ending up there because they were unusually foresighted and also morally motivated). This feels possible too, but it seems to not be the likely default to me, because I suspect that you’d necessarily need to increase your philosophical sophistication in order to stay in control of AGI, and that probably gives you more pleasant outcomes (a correlational claim). Iterated amplification, for instance, as an approach to AI alignment, has several uses for humans: humans are not only where the resulting values come from, but they’re also in charge of keeping the bootstrapping process on track and corrigible. And as this post on factored cognition illustrates, this requires sophistication to set up. So if that’s the bar that AGI creators need to pass before they can determine how “human values” are to be extrapolated, maybe we shouldn’t be too pessimistic about the outcome. It seems kind of unlikely that someone would go through all of that only to be like “I’m going to implement my personal best guess about what matters to me, with little further reflection, and no other humans get a say here.” Similarly, it also feels unlikely that people would go through with all that and not find a way to make subparts of the population reasonably content about how sentient subroutines are going to be used.

Now, I feel a bit confused about the feasibility of AI alignment if you were to do it somewhat sloppily and with lower standards. I think that there’s a spectrum from “it just wouldn’t work at all and not be competitive” (and then people would have to try some other approach) to “it would produce a capable AGI, but it would be vulnerable to failure modes like adversarial exploits or optimization daemons, and so it would end up with something other than human values”. These failure modes, to the very small degree I currently understand them, sound like they would not be sensitive to whether the human whose approval you tried to approximate had an expanded moral circle or not. I might be wrong about that. If people mostly want sophisticated alignment procedures because they care about preserving the option for philosophical reflection, rather than because they also think that you simply run into large failure modes otherwise, then it seems like (conditional on some kind of value alignment) whether we get an outcome with broad moral reflection is not so clear. If it’s technically easier to build value-aligned AI with very parochial values, then MCE could make a relevant difference to these non-reflection outcomes.

But all in all, my argument is that it’s somewhat strange to assume that a group of people could succeed at building an AGI optimized for its creators’ values without having to put in so much thinking about how to get this outcome right that they almost couldn’t help but become reasonably philosophically sophisticated in the process. And sure, philosophically sophisticated people can still have fairly strange values by your own lights, but it seems like there’s more convergence. Plus I’d at least be optimistic about their propensity to strive towards positive-sum outcomes, given how little scarcity you’d have if the transition does go well.

Of course, maybe value-al­ign­ment is go­ing to work very differ­ently from what peo­ple cur­rently think. The main way I’d crit­i­cize my above points is that they’re based on heavy-handed in­side-view think­ing about how difficult I (and oth­ers I’m up­dat­ing to­wards) ex­pect the AGI tran­si­tion to be. If AGI will be more like the In­dus­trial Revolu­tion rather than some­thing that is even more difficult to stay re­motely in con­trol of, or if some other tech­nol­ogy proves to be more con­se­quen­tial than AGI, then my ar­gu­ment has less force. I mainly see this as yet an­other rea­son to caveat that the ex ante plau­si­ble-seem­ing po­si­tion that MCE can have a strong im­pact on AGI out­comes starts to feel more and more con­junc­tive the more you zoom in and try to iden­tify con­crete path­ways.

In­ter­est­ing points. :) I think there could be sub­stan­tial differ­ences in policy be­tween 10% sup­port and 100% sup­port for MCE de­pend­ing on the costs of ap­peas­ing this fac­tion and how pas­sion­ate it is. Or be­tween 1% and 10% sup­port for MCE ap­plied to more fringe en­tities.

philo­soph­i­cally so­phis­ti­cated peo­ple can still have fairly strange val­ues by your own lights, but it seems like there’s more con­ver­gence.

I’m not sure if so­phis­ti­ca­tion in­creases con­ver­gence. :) If any­thing, peo­ple who think more about philos­o­phy tend to di­verge more and more from com­mon­sense moral as­sump­tions.

Yud­kowsky and I seem to share the same meta­physics of con­scious­ness and have both thought about the topic in depth, yet we oc­cupy al­most an­tipo­dal po­si­tions on the ques­tion of how many en­tities we con­sider moral pa­tients. I tend to as­sume that one’s start­ing points mat­ter a lot for what views one ends up with.

I agree with this. It seems like the world where Mo­ral Cir­cle Ex­pan­sion is use­ful is the world where:

1. The creators of AI are philosophically sophisticated (or persuadable) enough to expand their moral circle if they are exposed to the right arguments or work is put into persuading them.

2. They are not philosophically sophisticated enough to realize the arguments for expanding the moral circle on their own (seems plausible).

3. They are not philosophically sophisticated enough to realize that they might want to consider a distribution of arguments that they could have faced and that could have persuaded them about what is morally right, and to design AI with this in mind (i.e. CEV), or with the goal of achieving a period of reflection where they can sort out the sort of arguments that they would want to consider.

I think I’d pre­fer push­ing on point 3, as it also en­com­passes a bunch of other po­ten­tial philo­soph­i­cal mis­takes that AI cre­ators could make.

I think there’s a sig­nifi­cant[8] chance that the moral cir­cle will fail to ex­pand to reach all sen­tient be­ings, such as ar­tifi­cial/​small/​weird minds (e.g. a so­phis­ti­cated com­puter pro­gram used to mine as­ter­oids, but one that doesn’t have the nor­mal fea­tures of sen­tient minds like fa­cial ex­pres­sions). In other words, I think there’s a sig­nifi­cant chance that pow­er­ful be­ings in the far fu­ture will have low will­ing­ness to pay for the welfare of many of the small/​weird minds in the fu­ture.[9]

I think it’s likely that the pow­er­ful be­ings in the far fu­ture (analo­gous to hu­mans as the pow­er­ful be­ings on Earth in 2018) will use large num­bers of less pow­er­ful sen­tient beings

So I’m cu­ri­ous for your thoughts. I see this con­cern about “in­ci­den­tal suffer­ing of worker-agents” stated fre­quently, which may be likely in many fu­ture sce­nar­ios. How­ever, it doesn’t seem to be a cru­cial con­sid­er­a­tion, speci­fi­cally be­cause I care about small/​weird minds with non-com­plex ex­pe­riences (your first con­sid­er­a­tion).

Caring about small minds seems to imply that “Opportunity Cost/Lost Risks” is the dominant consideration: if small minds have moral value comparable to large minds, then the largest-EV risk is not optimizing for small minds and instead wasting resources on large minds with complex/expensive experiences (or on something even less efficient, like biological beings, any non-total-consequentialist view, etc.). This would lose you many orders of magnitude of optimized happiness, and this loss would be worse than the other scenarios’ aggregate incidental suffering.
Even if this inefficient moral position merely reduced optimized happiness by 10%, far less than an order of magnitude, this would dominate incidental suffering, even if the incidental suffering scenarios were significantly more probable. And even if you very heavily weight suffering compared to happiness, my math still suggests this conclusion survives by a significant margin.
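As a rough illustration of the arithmetic here, a minimal sketch in Python (every number below is a hypothetical placeholder, not an estimate from the post or this comment):

```python
# Rough illustration of the comparison above. All numbers are hypothetical
# placeholders chosen only to show the structure of the argument, not estimates.

optimized_value = 1.0  # value of a future fully optimized for small minds (arbitrary units)

# Scenario A: a benevolent but inefficient future that loses 10% of optimized happiness.
p_inefficient = 0.5                      # hypothetical probability of this scenario
loss_from_inefficiency = 0.10 * optimized_value

# Scenario B: incidental suffering of worker-agents, weighted heavily,
# but tying up only a small fraction of the future's resources.
p_incidental_suffering = 0.9             # hypothetical: "significantly more probable"
suffering_weight = 3.0                   # count suffering 3x as strongly as happiness
resource_fraction = 0.01                 # fraction of resources producing that suffering
loss_from_incidental_suffering = suffering_weight * resource_fraction * optimized_value

ev_loss_a = p_inefficient * loss_from_inefficiency
ev_loss_b = p_incidental_suffering * loss_from_incidental_suffering
print(f"Expected loss from inefficiency:         {ev_loss_a:.3f}")
print(f"Expected loss from incidental suffering: {ev_loss_b:.3f}")
# With these placeholders, 0.050 > 0.027, so the opportunity-cost term dominates.
```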

Also note that Mo­ral Cir­cle Ex­pan­sion is rele­vant con­di­tional on solv­ing the al­ign­ment prob­lem, so we’re in the set of wor­lds where the al­ign­ment prob­lem was ac­tu­ally solved in some way (hu­man­ity’s val­ues are some­what in­tact). So, the risk is that what­ever-we’re-op­ti­miz­ing-the-fu­ture-for is far less effi­cient than ideal he­do­nium could have been, be­cause we’re wast­ing it on com­plex minds, ex­pe­riences that re­quire lots of ma­te­rial in­put, or other not-effi­ciently-value-cre­at­ing things. “Oh, what might have been”, etc. Note this still says val­ues spread­ing might be very im­por­tant, but I think this ver­sion has a slightly differ­ent fla­vor that im­plies some­what differ­ent ac­tions. Thoughts?

On this topic, I similarly do still be­lieve there’s a higher like­li­hood of cre­at­ing he­do­nium; I just have more skep­ti­cism about it than I think is of­ten as­sumed by EAs.

This is the main reason I think the far future is high EV. I think we should be focusing on p(Hedonium) and p(Dolorium) more than anything else. I’m skeptical that, from a hedonistic utilitarian perspective, byproducts of civilization could come close to matching the expected value from deliberately tiling the universe (potentially multiverse) with consciousness optimized for pleasure or pain. If p(H) > p(D), the future of humanity is very likely positive EV.
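As a back-of-the-envelope sketch of that last claim (hypothetical probabilities, and assuming hedonium and dolorium are symmetric in magnitude and dominate everything else):

```python
# Hypothetical probabilities; assumes |value of hedonium| == |disvalue of dolorium|
# per unit of resources, and that everything else is negligible by comparison.
p_hedonium = 0.03
p_dolorium = 0.01
value_per_unit_resource = 1.0  # normalization

expected_value = (p_hedonium - p_dolorium) * value_per_unit_resource
print("Sign of far-future EV under these assumptions:",
      "positive" if expected_value > 0 else "non-positive")
```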

I agree that peo­ple of­ten un­der­es­ti­mate the value of strate­gic value spread­ing. Of­ten­times, pro­posed moral mod­els that AI agents will fol­low have some lin­ger­ing nar­row­ness to them, even when they at­tempt to ap­ply the broad­est of moral prin­ci­ples. For in­stance, in Chap­ter 14 of Su­per­in­tel­li­gence, Bostrom high­lights his com­mon good prin­ci­ple:

Su­per­in­tel­li­gence should be de­vel­oped only for the benefit of all of hu­man­ity and in the ser­vice of widely shared eth­i­cal ideals.

Clearly, even some­thing as broad as that can be con­tro­ver­sial. Speci­fi­cally, it doesn’t speak at all about any non-hu­man in­ter­ests ex­cept in­so­far as hu­mans ex­press widely held be­liefs to pro­tect them.

I think one thing to add is that AIA re­searchers who hold more tra­di­tional moral be­liefs (as op­posed to wide moral cir­cles and tran­shu­man­ist be­liefs) are prob­a­bly less likely to be­lieve that moral value spread­ing is worth much. The rea­son for this is ob­vi­ous: if ev­ery­one around you holds, more or less, the same val­ues that you do, then why change any­one’s mind? This may ex­plain why many peo­ple dis­miss the ac­tivity you pro­posed.

I think one thing to add is that AIA re­searchers who hold more tra­di­tional moral be­liefs (as op­posed to wide moral cir­cles and tran­shu­man­ist be­liefs) are prob­a­bly less likely to be­lieve that moral value spread­ing is worth much.

Historically this doesn’t seem to have been true. As AIA becomes more mainstream, it’ll be attracting a wider diversity of people, which may induce a form of common grounding and normalization of the values in the community. We should be looking for opportunities to collect data on this in the future to see how attitudes within AIA change. Of course this could lead to attempts to directly influence the proportionate representation of different values within EA. That’d be prone to all the hazards of an internal tug of war pointed out in other comments on this post. Because the vast majority of the EA movement focused on the impact of advanced AI on the far future is relatively coordinated and has sufficiently similar goals, there isn’t much risk of internal fracturing in the near future. I think organizations from MIRI to FRI are also averse to growing AIA in ways which drive the trajectory of the field away from what EA currently values.

My cur­rent po­si­tion is that the amount of plea­sure/​suffer­ing that con­scious en­tities will ex­pe­rience in a far-fu­ture tech­nolog­i­cal civ­i­liza­tion will not be well-defined. Some ar­gu­ments for this:

The clean sep­a­ra­tion of our civ­i­liza­tion into many differ­ent in­di­vi­d­u­als is an ar­ti­fact of how evolu­tion op­er­ates. I don’t ex­pect far fu­ture civ­i­liza­tion to have a similar di­vi­sion of its in­ter­nal pro­cesses into agents. There­fore the method of count­ing con­scious en­tities with differ­ent lev­els of plea­sure is in­ap­pli­ca­ble.

The­o­ret­i­cal com­puter sci­ence gives many ways to em­bed one com­pu­ta­tional pro­cess within an­other so that it is un­clear whether or how many times the in­ner pro­cess “oc­curs”, such as run­ning iden­ti­cal copies of the same pro­gram, us­ing a quan­tum com­puter to run the same pro­gram with many in­puts in su­per­po­si­tion, and ho­mo­mor­phic en­cryp­tion. Similar meth­ods we don’t know about will likely be dis­cov­ered in the fu­ture.

Our notions of pleasure and suffering are mostly defined extensionally with examples from the present and the past. I see no reason for such an extensionally-derived concept to have a natural definition that applies to extremely different situations. Uncharitably, it seems like the main reason people assume this is a sort of wishful thinking due to their normal moral reasoning breaking down if they allow pleasure/suffering to be undefined.

I’m currently uncertain about how to make decisions relating to the far future in light of the above arguments. My current favorite position is to try to understand the far future well enough that I find something I have strong moral intuitions about.

If you as­sume bits mat­ter, then I think this nat­u­rally leads into a con­cept cluster where speak­ing about util­ity func­tions, prefer­ence satis­fac­tion, com­plex­ity of value, etc, makes sense. You also get a lot of weird un­re­solved thought-ex­per­i­ments like ho­mo­mor­phic en­cryp­tion.

If you as­sume atoms mat­ter, I think this sub­tly but un­avoid­ably leads to a very differ­ent con­cept cluster—qualia turns out to be a nat­u­ral kind in­stead of a leaky reifi­ca­tion, for in­stance. Talk­ing about the ‘unity of value the­sis’ makes more sense than talk­ing about the ‘com­plex­ity of value the­sis’.

TL;DR: I think you’re right that if we as­sume com­pu­ta­tion­al­ism/​func­tion­al­ism is true, then plea­sure and suffer­ing are in­her­ently ill-defined, not crisp. They do seem well-defin­able if we as­sume phys­i­cal­ism is true, though.

Thanks for reminding me that I was implicitly assuming computationalism. Nonetheless, I don’t think physicalism substantially affects the situation. My arguments #2 and #4 stand unaffected; you have not backed up your claim that qualia is a natural kind under physicalism. While it’s true that physicalism gives clear answers for the value of two identical systems or a system simulated with homomorphic encryption, it may still be possible to have quantum computations involving physically instantiated conscious beings, by isolating the physical environment of this being and running the CPT reversal of this physical system after an output has been extracted to maintain coherence. Finally, physicalism adds its own questions, namely: given a bunch of physical systems that all exhibit behavior that appears to be conscious, which ones are actually conscious and which are not? If I understood you correctly, physicalism as a statement about consciousness is primarily a negative statement, “the computational behavior of a system is not sufficient to determine what sort of conscious activity occurs there”, which doesn’t by itself tell you what sort of conscious activity occurs.

It seems to me your #2 and #4 still im­ply com­pu­ta­tion­al­ism and/​or are speak­ing about a straw man ver­sion of phys­i­cal­ism. Differ­ent phys­i­cal the­o­ries will ad­dress your CPT re­ver­sal ob­jec­tion differ­ently, but it seems pretty triv­ial to me.

If I understood you correctly, physicalism as a statement about consciousness is primarily a negative statement, “the computational behavior of a system is not sufficient to determine what sort of conscious activity occurs there”, which doesn’t by itself tell you what sort of conscious activity occurs.

I would gen­er­ally agree, but would per­son­ally phrase this differ­ently; rather, as noted here, there is no ob­jec­tive fact-of-the-mat­ter as to what the ‘com­pu­ta­tional be­hav­ior’ of a sys­tem is. I.e., no way to ob­jec­tively de­rive what com­pu­ta­tions a phys­i­cal sys­tem is perform­ing. In terms of a pos­i­tive state­ment about phys­i­cal­ism & qualia, I’m as­sum­ing some­thing on the or­der of dual-as­pect monism /​ neu­tral monism. And yes in­so­far as a for­mal the­ory of con­scious­ness which has broad pre­dic­tive power would de­part from folk in­tu­ition, I’d definitely go with the for­mal the­ory.

In prac­tice, is it a use­ful ap­proach to look for com­pu­ta­tional struc­tures ex­hibit­ing plea­sure/​suffer­ing in the dis­tant fu­ture as a means to judge pos­si­ble out­comes?

Brian Tomasik answers these questions “No/Yes”, and a supporter of the Sentience Institute would probably answer “Yes” to the second question. Your answers are “Yes/No”, and so you prefer to work on finding the underlying theory for pleasure/suffering. My answers are “No/No”, and I am at a loss.

I see two reasons why a person might think that pleasure/pain of conscious entities is a solid enough concept to answer “Yes” to either of these questions (not counting conservative opinions over what futures are possible for question 2). The first is a confusion caused by subtle implicit assumptions in the way we talk about consciousness, which makes a sort of conscious experience, one that includes within it pleasure and pain, seem more ontologically basic than it really is. I won’t elaborate on this in this comment, but for now you can round me off as an eliminativist.

The second is what I was calling “a sort of wishful thinking” in argument #4: These people have moral intuitions that tell them to care about others’ pleasure and pain, which implies not fooling themselves about how much pleasure and pain others experience. On the other hand, there are many situations where their intuition does not give them a clear answer, but also tells them that picking an answer arbitrarily is like fooling themselves. They resolve this tension by telling themselves, “there is a ‘correct answer’ to this dilemma, but I don’t know what it is. I should act to best approximate this ‘correct answer’ with the information I have.” People then treat these “correct answers” like other things they are ignorant about, and in particular imagine that a scientific theory might be able to answer these questions in the same way science answered other things we used to be ignorant about.

How­ever, this ex­pec­ta­tion in­fers some­thing ex­ter­nal, the ex­is­tence of a cer­tain kind of sci­en­tific the­ory, from ev­i­dence that is in­ter­nal, their own cog­ni­tive ten­sions. This seems fal­la­cious to me.

Another re­frame to con­sider is to dis­re­gard talk about pain/​plea­sure, and in­stead fo­cus on whether value is well-defined on phys­i­cal sys­tems (i.e. the sub­ject of Teg­mark’s worry here). Con­fla­tion of emo­tional valence & moral value can then be split off as a sub­ar­gu­ment.

Generally speaking, I think if one accepts that it’s possible in principle to talk about qualia in a way that ‘carves reality at the joints’, it’s not much of a stretch to assume that emotional valence is one such natural kind (arguably the ‘C. elegans of qualia’). I don’t think we’re logically forced to assume this, but I think it’s prima facie plausible, and paired with some of our other work it gives us a handhold for approaching qualia in a scientific/predictive/falsifiable way.

TL;DR: If con­scious­ness is a ‘crisp’ thing with dis­cov­er­able struc­ture, we should be able to build/​pre­dict use­ful things with this that can­not be built/​pre­dicted oth­er­wise, similar to how dis­cov­er­ing the struc­ture of elec­tro­mag­netism let us build/​pre­dict use­ful things we could not have oth­er­wise. This is prob­a­bly the best route to solve these meta­phys­i­cal dis­agree­ments.

It wasn’t clear to me from your com­ment, but based on your link I am pre­sum­ing that by “crisp” you mean “amenable to gen­er­al­iz­able sci­en­tific the­o­ries” (rather than “on­tolog­i­cally ba­sic”). I was us­ing “plea­sure/​pain” as a catch-all term and would not mind sub­sti­tut­ing “emo­tional valence”.

It’s worth emphasizing that just because a particular feature is crisp does not imply that it generalizes to any particular domain in any particular way. For example, a single ice crystal has a set of directions in which the molecular bonds are oriented which is the same throughout the crystal, and this surely qualifies as a “crisp” feature. Nonetheless, when the ice melts, this feature becomes undefined: no direction is distinguished from any other direction in water. When figuring out whether a concept from one domain extends to a new domain, positing that there’s a crisp theory describing the concept does not answer this question without any information on what that theory looks like.

So even if there ex­isted a the­ory de­scribing qualia and emo­tional valence as it ex­ists on Earth, it need not ex­tend to be­ing able to de­scribe ev­ery phys­i­cally pos­si­ble ar­range­ment of mat­ter, and I see no rea­son to ex­pect it to. Since a far fu­ture civ­i­liza­tion will be likely to ap­proach the phys­i­cal limits of mat­ter in many ways, we should not as­sume that it is not one such ar­range­ment of mat­ter where the no­tion of qualia is in­ap­pli­ca­ble.

This is an im­por­tant point and seems to hinge on the no­tion of refer­ence, or the ques­tion of how lan­guage works in differ­ent con­texts. The fol­low­ing may or may not be new to you, but try­ing to be ex­plicit here helps me think through the ar­gu­ment.

Mostly, words gain meaning from contextual embedding, i.e. they’re meaningful as nodes in a larger network. Wittgenstein observed that often, philosophical confusion stems from taking a perfectly good word and trying to use it outside its natural remit. His famous example is the question, “what time is it on the sun?”. As you note, maybe notions about emotional valence are similar: trying to ‘universalize’ valence may be like trying to universalize time zones, an improper move.

But there’s another notable theory of meaning, where parts of language gain meaning through deep structural correspondence with reality. Much of physics fits this description, for instance, and it’s not a type error to universalize the notion of the electromagnetic force (or electroweak force, or whatever the fundamental unification turns out to be). I am essentially asserting that qualia is like this: that we can find universal principles for qualia that are equally and exactly true in humans, dogs, dinosaurs, aliens, conscious AIs, etc. When I note I’m a physicalist, I intend to inherit many of the semantic properties of physics, how meaning in physics ‘works’.

I sus­pect all con­scious ex­pe­riences have an emo­tional valence, in much the same way all par­ti­cles have a charge or spin. I.e. it’s well-defined across all phys­i­cal pos­si­bil­ities.

Do you think we should move the con­ver­sa­tion to pri­vate mes­sages? I don’t want to clut­ter a dis­cus­sion thread that’s mostly on a differ­ent topic, and I’m not sure whether the av­er­age reader of the com­ments benefits or is dis­tracted by long con­ver­sa­tions on a nar­row subtopic.

Your com­ment ap­pears to be just re­fram­ing the point I just made in your own words, and then af­firm­ing that you be­lieve that the no­tion of qualia gen­er­al­izes to all pos­si­ble ar­range­ments of mat­ter. This doesn’t an­swer the ques­tion, why do you be­lieve this?

By the way, although there is no evidence for this, it is commonly speculated by physicists that the laws of physics allow multiple metastable vacuum states, that the observable universe only occupies one such vacuum, and that near different vacua there are different fields and forces. If this is true then the electromagnetic field and other parts of the Standard Model are not much different from my earlier example of the alignment of an ice crystal. One reason this view is considered plausible is simply the fact that it’s possible: It’s not considered so unusual for a quantum field theory to have multiple vacuum states, and if the entire observable universe is close to one vacuum then none of our experiments give us any evidence on what other vacuum states are like or whether they exist.

This example is meant to illustrate a broader point: I think that making a binary distinction between contextual concepts and universal concepts is oversimplified. Rather, here’s how I would put it: Many phenomena generalize beyond the context in which they were originally observed. Taking advantage of this, physicists deliberately seek out the phenomena that generalize as far as possible, and over history they have broadened their grasp very far. Nonetheless, they avoid thinking about any concept as “universal”, and often when they do think a concept generalizes they have a specific explanation for why it should, while if there’s a clear alternative to the concept generalizing they keep an open mind.

So again: Why do you think that qualia and emo­tional valence gen­er­al­ize to all pos­si­ble ar­range­ments of mat­ter?

I don’t think you’re fully ac­count­ing for the differ­ence in my two mod­els of mean­ing. And, I think the ob­jec­tions you raise to con­scious­ness be­ing well-defined would also ap­ply to physics be­ing well-defined, so your ar­gu­ments seem to prove too much.

To at­tempt to ad­dress your spe­cific ques­tion, I find the hy­poth­e­sis that ‘qualia (and emo­tional valence) are well-defined across all ar­range­ments of mat­ter’ con­vinc­ing be­cause (1) it seems to me the al­ter­na­tive is not co­her­ent (as I noted in the piece on com­pu­ta­tion­al­ism I linked for you) and (2) it seems gen­er­a­tive and to lead to novel and plau­si­ble pre­dic­tions I think will be proven true (as noted in the linked piece on quan­tify­ing bliss and also in Prin­cipia Qualia).

I haven’t re­sponded to you for so long firstly be­cause I felt like we got to the point in the dis­cus­sion where it’s difficult to get across any­thing new and I wanted to be at­ten­tive to what I say, and then be­cause af­ter a while with­out writ­ing any­thing I be­came dis­in­clined from con­tin­u­ing. The con­ver­sa­tion may close soon.

Some quick points:

My whole point in my pre­vi­ous com­ment is that the con­cep­tual struc­ture of physics is not what you make it out to be, and so your anal­ogy to physics is in­valid. If you want to say that my ar­gu­ments against con­scious­ness ap­ply equally well to physics you will need to ex­plain the anal­ogy.

My views on con­scious­ness that I men­tioned ear­lier but did not elab­o­rate on are be­com­ing more rele­vant. It would be a good idea for me to ex­plain them in more de­tail.

I read your linked piece on quan­tify­ing bliss and I am unim­pressed. I con­cur with the last para­graph of this com­ment.

Here are my reasons for the belief that the wild animal/small minds/… suffering agenda is based mostly on errors and uncertainties. Some of the uncertainties should warrant research effort, but I do not believe the current state of knowledge justifies prioritization of any kind of advocacy or value spreading.

1] The en­deav­our seems to be based on ex­trap­o­lat­ing in­tu­itive mod­els far out­side the scope for which we have data. The whole suffer­ing calcu­lus is based on ex­trap­o­lat­ing the con­cept of suffer­ing far away from the do­main for which we have data from hu­man ex­pe­rience.

2] A big part of it seems arbitrary. When expanding the moral circle toward small computational processes and simple systems, why not expand it toward large computational processes and complex systems? E.g. we can think of DNA-based evolution as a large computational/optimization process: suddenly “wild animal suffering” has a purpose, and traditional environment and biodiversity protection efforts make sense.

(Similarly we could ar­gue much “hu­man util­ity” is in the larger sys­tem struc­ture above in­di­vi­d­ual hu­mans)

3] We do not know how to measure and aggregate the utility of mind states. Like, really don’t know. E.g. it seems to me completely plausible that the utility of 10 people reaching some highly joyful mindstates is the dominant contribution over all human and animal minds.

4] Part of the reasoning usually seems contradictory. If the human cognitive processes are in the privileged position of creating meaning in this universe … well, then they are in the privileged position, and there _is_ a categorical difference between humans and other minds. If they are not in the privileged position, how come humans should impose their ideas about meaning on other agents?

5] MCE efforts directed toward AI researchers with the intent of influencing the values of some powerful AI may increase x-risk. E.g. if the AI is not “speciesist” and gives the same weight to satisfying the preferences of all humans and all chickens, the chickens would outnumber humans.

You raise some good points. (The fol­low­ing re­ply doesn’t nec­es­sar­ily re­flect Jacy’s views.)

I think the answers to a lot of these issues are somewhat arbitrary matters of moral intuition. (As you said, “A big part of it seems arbitrary.”) However, in a sense, this makes MCE more important rather than less, because it means expanded moral circles are not an inevitable result of better understanding consciousness/etc. For example, Yudkowsky’s stance on consciousness is a reasonable one that is not based on a mistaken understanding of present-day neuroscience (as far as I know), yet some feel that Yudkowsky’s view about moral patienthood isn’t wide enough for their moral tastes.

Another pos­si­ble re­ply (that would sound bet­ter in a poli­ti­cal speech than the pre­vi­ous re­ply) could be that MCE aims to spark dis­cus­sion about these hard ques­tions of what kinds of minds mat­ter, with­out claiming to have all the an­swers. I per­son­ally main­tain sig­nifi­cant moral un­cer­tainty re­gard­ing how much I care about what kinds of minds, and I’m happy to learn about other peo­ple’s moral in­tu­itions on these things be­cause my own in­tu­itions aren’t set­tled.

E.g. we can think of DNA-based evolution as a large computational/optimization process: suddenly “wild animal suffering” has a purpose, and traditional environment and biodiversity protection efforts make sense.

Or if we take a suffer­ing-fo­cused ap­proach to these large sys­tems, then this could provide a fur­ther ar­gu­ment against en­vi­ron­men­tal­ism. :)

If the human cognitive processes are in the privileged position of creating meaning in this universe … well, then they are in the privileged position, and there is a categorical difference between humans and other minds.

I self­ishly con­sider my moral view­point to be “priv­ileged” (in the sense that I pre­fer it to other peo­ple’s moral view­points), but this view­point can have in its con­tent the de­sire to give sub­stan­tial moral weight to non-hu­man (and hu­man-but-not-me) minds.

Ran­dom thought: (fac­tory farm) an­i­mal welfare is­sues will likely even­tu­ally be solved by cul­tured (lab grown) meat when it be­comes cheaper than grow­ing ac­tual an­i­mals. This may take a few decades, but so­cial change might take even longer. The ar­ti­cle even sug­gests tech­ni­cal is­sues may be eas­ier to solve, so why not fo­cus more on that (rather than on MCE)?

I just took it as an as­sump­tion in this post that we’re fo­cus­ing on the far fu­ture, since I think ba­si­cally all the the­o­ret­i­cal ar­gu­ments for/​against that have been made el­se­where. Here’s a good ar­ti­cle on it. I per­son­ally mostly fo­cus on the far fu­ture, though not over­whelm­ingly so. I’m at some­thing like 80% far fu­ture, 20% near-term con­sid­er­a­tions for my cause pri­ori­ti­za­tion de­ci­sions.

This may take a few decades, but so­cial change might take even longer.

To clar­ify, the post isn’t talk­ing about end­ing fac­tory farm­ing. And I don’t think any­one in the EA com­mu­nity thinks we should try to end fac­tory farm­ing with­out tech­nol­ogy as an im­por­tant com­po­nent. Though I think there are good rea­sons for EAs to fo­cus on the so­cial change com­po­nent, e.g. there is less for-profit in­ter­est in that com­po­nent (most of the tech money is from for-profit com­pa­nies, so it’s less ne­glected in this sense).

Thank you for this piece. I en­joyed read­ing it and I’m glad that we’re see­ing more peo­ple be­ing ex­plicit about their cause-pri­ori­ti­za­tion de­ci­sions and open­ing up dis­cus­sion on this cru­cially im­por­tant is­sue.

I know that it’s a weak con­sid­er­a­tion, but I hadn’t, be­fore I read this, con­sid­ered the ar­gu­ment for the scale of val­ues spread­ing be­ing larger than the scale of AI al­ign­ment (per­haps be­cause, as you pointed out, the num­bers in­volved in both are huge) so thanks for bring­ing that up.

I’m in agreement with Michael_S that hedonium and dolorium should be the most important considerations when we’re estimating the value of the far future, and from my perspective the higher probability of hedonium likely does make the far future robustly positive, despite the valid points you bring up. This doesn’t necessarily mean that we should focus on AIA over MCE (I don’t), but it does make it more likely that we should.

Another use­ful con­tri­bu­tion, though oth­ers may dis­agree, was the bi­ases sec­tion: the bi­ases that could po­ten­tially favour AIA did res­onate with me, and they are use­ful to keep in mind.

That makes sense. If I were con­vinced he­do­nium/​do­lorium dom­i­nated to a very large de­gree, and that he­do­nium was as good as do­lorium is bad, I would prob­a­bly think the far fu­ture was at least mod­er­ately +EV.

Isn’t he­do­nium in­her­ently as good as do­lorium is bad? If it’s not, can’t we just nor­mal­ize and then treat them as the same? I don’t un­der­stand the point of say­ing there will be more he­do­nium than do­lorium in the fu­ture, but the do­lorium will mat­ter more. They’re vague and made-up quan­tities, so can’t we just set it so that “more he­do­nium than do­lorium” im­plies “more good than bad”?

He defines he­do­nium/​do­lorium as the max­i­mum pos­i­tive/​nega­tive util­ity you can gen­er­ate with a cer­tain amount of en­ergy:

“For ex­am­ple, I think a given amount of do­lorium/​dystopia (say, the amount that can be cre­ated with 100 joules of en­ergy) is far larger in ab­solute moral ex­pected value than he­do­nium/​utopia made with the same re­sources.”
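Under that definition, the disagreement comes down to a single line of arithmetic: with an asymmetric per-joule weighting, p(Hedonium) > p(Dolorium) no longer guarantees positive expected value. A minimal sketch with hypothetical numbers:

```python
# Hypothetical numbers illustrating how an asymmetric per-joule weighting can
# flip the sign of the expected value even when p(hedonium) > p(dolorium).
p_hedonium = 0.03
p_dolorium = 0.01
hedonium_value_per_joule = 1.0
dolorium_disvalue_per_joule = 5.0  # "far larger in absolute moral expected value"

ev = p_hedonium * hedonium_value_per_joule - p_dolorium * dolorium_disvalue_per_joule
print(f"EV with asymmetric weighting: {ev:+.3f}")  # negative with these placeholders
```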

All of these are influenced both by strategies such as activism, improving institutions, and improving education, and by AIA. I am inclined to think of AIA as a particularly high-leverage point at which we can have influence on these.

However, these issues are widely encountered. Consider 2b: we have to decide how to educate the next generation of humans, and they may well end up with ethical beliefs that are different from ours, so we must judge how much to try to influence or constrain them, and how much to accept that the changes are actually progress. This is similar to the problem of defining CEV: we have some vague idea of the direction in which better values lie (more empathy, more wisdom, more knowledge), but we can’t say exactly what the values should be. For this intervention, working on AIA may be more important than activism because it has more leverage: it is likely to be more tractable and have greater influence on the future than the more diffuse ways in which we can push on education and intergenerational moral progress.

This frame­work also sug­gests that MCE is just one ex­am­ple of a col­lec­tion of similar in­ter­ven­tions. MCE in­volves push­ing for a fairly spe­cific be­lief and be­havi­our change on a prin­ci­ple that’s fairly un­con­tro­ver­sial. You could also imag­ine similar in­ter­ven­tions—for in­stance, helping peo­ple re­duce un­wanted ag­gres­sive or sadis­tic be­havi­our. We could call this some­thing like ‘un­con­tro­ver­sial moral progress’: helping in­di­vi­d­u­als and civil­i­sa­tion to live by their val­ues more. (on a side note: some­times I think of this as the min­i­mal core of EA: try­ing to live ac­cord­ing to your best guess of what’s right)

The choice be­tween work­ing on 2a and 2b de­pends, among other things, on your level of moral un­cer­tainty.

I am in­clined to think that AIA is the best way to work on 1 and 2b, as it is a par­tic­u­larly high-lev­er­age in­ter­ven­tion point to shape the power struc­tures and moral be­liefs that ex­ist in the fu­ture. It gives us more of a clean slate to de­sign a good sys­tem, rather than hav­ing to work within a faulty sys­tem.

I would re­ally like to see more work on MCE and other ex­am­ples of ‘un­con­tro­ver­sial moral progress’. His­tor­i­cal case stud­ies of value changes seem like a good start­ing point, as well as ac­tu­ally test­ing the tractabil­ity of chang­ing peo­ple’s be­havi­our.

I also re­ally ap­pre­ci­ated your per­spec­tive on differ­ent trans­for­ma­tive AI sce­nar­ios, as I’m wor­ried I’m think­ing about it in an overly nar­row way.

Im­pres­sive ar­ti­cle—I es­pe­cially liked the bi­ases sec­tion. I would recom­mend do­ing a quan­ti­ta­tive model of cost effec­tive­ness com­par­ing to AIA, as I have done for global agri­cul­tural catas­tro­phes, es­pe­cially be­cause ne­glect­ed­ness is hard to define in your case.
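For what it’s worth, here is a minimal sketch of the skeleton such a quantitative model could take; every input is a made-up placeholder, and the point is only the structure (probability of shifting the outcome, times value at stake, divided by marginal cost):

```python
# Skeleton of a simple cost-effectiveness comparison. Every input is a made-up
# placeholder, not an actual estimate for MCE or AIA.

def value_per_dollar(p_shift_outcome, value_at_stake, marginal_cost):
    """Expected far-future value gained per dollar, under a crude linear model."""
    return p_shift_outcome * value_at_stake / marginal_cost

mce = value_per_dollar(p_shift_outcome=1e-9, value_at_stake=1e15, marginal_cost=1e7)
aia = value_per_dollar(p_shift_outcome=1e-9, value_at_stake=1e15, marginal_cost=1e8)

print(f"MCE (placeholder inputs): {mce:.2e} value units per dollar")
print(f"AIA (placeholder inputs): {aia:.2e} value units per dollar")
```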

I think a given amount of do­lorium/​dystopia (say, the amount that can be cre­ated with 100 joules of en­ergy) is far larger in ab­solute moral ex­pected value than he­do­nium/​utopia made with the same resources

Could you elab­o­rate more on why this is the case? I would tend to think that a prior would be that they’re equal, and then you up­date on the fact that they seem to be asym­met­ri­cal, and try to work out why that is the case, and whether those fac­tors will ap­ply in fu­ture. They could be fun­da­men­tally asym­met­ri­cal, or evolu­tion­ary pres­sures may tend to cre­ate minds with these asym­me­tries. The ar­gu­ments I’ve heard for why are:

The worst thing that can hap­pen to an an­i­mal, in terms of ge­netic suc­cess, is much worse than the best thing.

This isn’t entirely clear to me: I can imagine that a large genetic win such as securing a large harem could be comparable to the genetic loss of dying, and many animals will in fact risk death for this. This seems particularly true considering that dying while leaving no offspring doesn’t make your contribution to the gene pool zero; it just means that contribution comes only via your relatives.

There is se­lec­tion against strong pos­i­tive ex­pe­riences in a way that there isn’t against strong nega­tive ex­pe­riences.

The ar­gu­ment here is, I think, that strong pos­i­tive ex­pe­riences will likely re­sult in the an­i­mal stick­ing in the bliss­ful state, and ne­glect­ing to feed, sleep, etc, whereas strong nega­tive ex­pe­riences will just re­sult in the an­i­mal avoid­ing a par­tic­u­lar state, which is less mal­adap­tive. This ar­gu­ment seems stronger to me but still not en­tirely satis­fy­ing—it seems to be quite sen­si­tive to how you define states.

@Matthew_Bar­nett
As a senior electrical engineering student, proficient in a variety of programming languages, I do believe that AI is important to think about and discuss. The theoretical threat of a malevolent strong AI would be immense. But that does not mean one has cause or a valid reason to support CS grad students financially.

A large, significant asteroid collision with Earth would also be quite devastating. Yet to fund and support aerospace grads does not follow. Perhaps I really mean this: AI safety is an Earning to Give non sequitur.

Lastly, again, there is no evidence and there are no results. Effective Altruism is about being beneficent instead of merely benevolent (meaning well). In other words, making decisions based on well-researched initiatives (e.g., bed nets). Since strong AI does not exist, it does not make sense to support it through E2G. (I’m not saying it will never exist; that is unknown.) Of course, there are medium-term (systematic change) approaches with results that more or less rely on historical-type empiricism, but that’s still some type of evidence. For poverty we have RCTs and developmental economics. For AI safety [something?]. For animal suffering we have proof that less miserable conditions can become a reality.

I agree that sim­ply be­cause an as­ter­oid col­li­sion would be dev­as­tat­ing, it does not fol­low that we should nec­es­sar­ily fo­cus on that work in par­tic­u­lar. How­ever, there are vari­ables which I think you might be over­look­ing.

The reason why people are concerned with AI alignment is not necessarily because of the scope of the issue, but also the urgency and tractability of the problem. The urgency of the problem comes from the idea that advanced AI will probably be developed this century. The tractability of the problem comes from the idea that there exists a set of goals that we could in theory put into an AI that are congruent with ours; you might want to read up on the Orthogonality Thesis.

Fur­ther­more, it is dan­ger­ous to as­sume that we should judge the effec­tive­ness of cer­tain ac­tivi­ties merely based on prior ev­i­dence or re­sults. There are some ac­tivi­ties which are just in­fea­si­ble to give post hoc judge­ments about—and this is­sue is one of them. The in­her­ent na­ture of the prob­lem is that we will prob­a­bly only get about one chance to de­velop su­per­in­tel­li­gence—be­cause if we fail, then we will all prob­a­bly die or oth­er­wise be per­ma­nently un­able to al­ter its goals.

To give you an anal­ogy, few would agree that be­cause cli­mate change is an un­prece­dented threat, it there­fore fol­lows that we should wait un­til af­ter the dam­age has been done to as­sess the best ways of miti­gat­ing it. Un­for­tu­nately for is­sues that have global scope, it doesn’t look like we get a redo if things start go­ing badly.

If you want to learn more about the re­search, I recom­mend read­ing Su­per­in­tel­li­gence by Nick Bostrom. The vast ma­jor­ity of AI al­ign­ment re­searchers are not wor­ried about malev­olent AI de­spite your state­ment. I mean this is in the kind­est way pos­si­ble, but if you re­ally want to be sure that you’re on the right side of a de­bate, it’s worth un­der­stand­ing the best ar­gu­ments against your po­si­tion, not the worst.

Please, what AIA organizations? MIRI? And do not worry about offending me. I do not intend to offend. If I do or did, through my tone or otherwise, I am sorry.

That being said, I wish you would’ve examined the actual claims I presented. I did not claim AI researchers are worried about a malevolent AI. I am not against researchers; research in robotics, industrial PLCs, nanotech, and so on are fields in their own right. It is donating my income, as an individual, that I take issue with. People can fund whatever they want: a new planetary wing at a museum, research in robotics, research in CS, research in CS philosophy.

Although, Earn­ing to Give does not fol­low. Think­ing about and dis­cussing the risks of strong AI does make sense, and we both seem to agree it is im­por­tant. The CS grad stu­dents be­ing sup­ported, how­ever, what makes them differ­ent from a ran­dom CS grad? Just be­cause they claim to be re­search­ing AIA? Fol­low­ing the money, there is not a clear an­swer on which CS grad stu­dents are re­ceiv­ing it. Low or zero trans­parency. MIRI or no? Am I miss­ing some pub­lic in­for­ma­tion?

Se­cond, what do you define as ad­vanced AI? Be­fore, I said strong AI. Is that what you mean? Is there some sort of AI in be­tween? I’m not aware. This is cru­cially where I split with AI safety. The the­ory is an idea of a be­lief about the far fu­ture. To claim that we’re close to de­vel­op­ing strong AI is un­founded to me. What in this cen­tury is so close to strong AI? Neu­ral net­works do not seem to be (from my light re­search).

I do not believe climate change is as simple as defining a “before” and “after.” Perhaps a large rogue solar flare or the Yellowstone supervolcano. Or perhaps even a time travel analogy would suffice ~ time travel safety research. There is no tractability/solvability. [Blank] cannot be defined because it doesn’t exist; unfounded and unknown phenomena cannot be solved. Climate change exists. It is a very real reality. It has solvability. A belief in an idea about the future is a poor reason for claiming some sort of tractability for funding. Strong AI safety (singularity safety) has “solvability” for thinking about and discussing, but, again, it does not follow that one should give monetarily. I feel like I’m beating a dead horse with this point.

For the book recom­men­da­tion, I looked into it. I’d rather read about moral­ity/​ethics di­rectly or fur­ther delve into bet­ter learn­ing Java, Python, Logix5000, LabVIEW, etc.

That be­ing said, I wish you would’ve ex­am­ined the ac­tual claims I pre­sented. I did not claim AI re­searchers are wor­ried about a malev­olent AI.

You did, how­ever, say “The the­o­ret­i­cal threat of a malev­olent strong AI would be im­mense. But that does not mean one has cause or a valid rea­son to sup­port CS grad stu­dents fi­nan­cially.” I as­sumed you meant that you be­lieved some­one was giv­ing an ar­gu­ment along the lines of “since malev­olent AI is pos­si­ble, then we should sup­port CS grads.” If that is not what you meant, then I don’t see the rele­vance of men­tion­ing malev­olent AI.

Since you also stated that you had an is­sue with me not be­ing char­i­ta­ble, I would re­cip­ro­cate like­wise. I agree that we should be char­i­ta­ble to each other’s opinions.

Having truthful views is not about winning a debate. It’s about making sure that you hold good beliefs for good reasons, end of story. I encourage you to imagine this conversation not as a way to convince me that I’m wrong, but more as a case study about what the current arguments are, and whether they are valid. In the end, you don’t get points for winning an argument. You get points for actually holding correct views.

There­fore, it’s good to make sure that your be­liefs ac­tu­ally hold weight un­der scrutiny. Not in a, “you can’t find the flaw af­ter 10 min­utes of self-sab­o­taged think­ing” sort of way, but in a very deep un­der­stand­ing sort of way.

It is donating my income, as an individual, that I take issue with. People can fund whatever they want: a new planetary wing at a museum, research in robotics, research in CS, research in CS philosophy.

I agree people can fund whatever they want. It’s important to make a distinction between normative questions and factual ones. It’s true that people can fund whatever project they like; however, it’s also true that some projects have high value from an impersonal utilitarian perspective. It is this latter category that I care about, which is why I want to find projects with particularly high value. I believe that existential risk mitigation and AI alignment are among these projects, although I fully admit that I may be mistaken.

Although, Earn­ing to Give does not fol­low. Think­ing about and dis­cussing the risks of strong AI does make sense, and we both seem to agree it is im­por­tant.

If you agree that thinking about something is valuable, why not also agree that funding that thing is valuable? It seems you think that the field should just get a certain threshold of funding that allows certain people to think about the problem just enough, but not too much. I don’t see a reason to believe that the field of AI alignment has reached that critical threshold. On the contrary, I believe the field is far from it at the moment.

Fol­low­ing the money, there is not a clear an­swer on which CS grad stu­dents are re­ceiv­ing it. Low or zero trans­parency. MIRI or no? Am I miss­ing some pub­lic in­for­ma­tion?

I sup­pose when you make a dona­tion to MIRI, it’s true that you can’t be cer­tain about how they spend that money (al­though I might be wrong about this, I haven’t ac­tu­ally donated to MIRI). Gen­er­ally though, fund­ing an or­ga­ni­za­tion is about whether you think that their mis­sion is ne­glected, and whether you think that fur­ther money would make a marginal im­pact in their cause area. This is no differ­ent than any other char­ity that EA al­igned peo­ple en­dorse.

Se­cond, what do you define as ad­vanced AI? Be­fore, I said strong AI. Is that what you mean? Is there some sort of AI in be­tween? I’m not aware.

It might be con­fus­ing that there are all these terms for AI. To taboo the words “ad­vanced AI”, “strong AI”, “AGI” or oth­ers—what I am wor­ried about is an in­for­ma­tion pro­cess­ing sys­tem that can achieve broad suc­cess in cog­ni­tive tasks in a way that ri­vals or sur­passes hu­mans. I hope that makes it clear.

This is cru­cially where I split with AI safety. The the­ory is an idea of a be­lief about the far fu­ture. To claim that we’re close to de­vel­op­ing strong AI is un­founded to me.

I’m not quite clear what you mean here. If you mean we are worried about AI in the far future, fine. But then in the next sentence you say that we’re worried about being close to strong AI. How can we simultaneously believe both? If AI is near, then I care about the near-term future. If AI is not near, then I care about the long-term future. I do not claim either, however. I think it is an important consideration even if it’s a long way off.

Neu­ral net­works do not seem to be (from my light re­search).

This is what I’m referring to when I talk about how important it is to really, truly understand something before developing an informed opinion about it. If you admit that you have only done light research, how can you be confident that you are right? Doing a bit of research might give you an edge for debate purposes, but we are talking about the future of life on Earth here. We really need to know the answers to these questions.

Perhaps a large rogue solar flare or the Yellowstone supervolcano. Or perhaps even a time travel analogy would suffice ~ time travel safety research. There is no tractability/solvability.

Lumping all existential risks into a single category and then asserting that there’s no tractability is an oversimplified approach. First, what we need is the probability of any given existential risk occurring. For instance, if scientists discovered that the Yellowstone supervolcano was probably about to erupt sometime in the next few centuries, I’d definitely agree we should do research in that area, and we should fund that research as well. In fact, some research is being done in that area and I’m happy that it’s being done.

A be­lief in an idea about the fu­ture is a poor rea­son for claiming some sort of tractabil­ity for fund­ing.

I’d agree with you if it was an idea as­serted with­out ev­i­dence or rea­son. But there’s a whole load of ar­gu­ments about why it is a tractable field, and how we can do things now—yes right now—about mak­ing the fu­ture safer. Ig­no­rance of these ar­gu­ments does not mean they do not ex­ist.

Re­mem­ber, ask your­self first what is true. Then form your opinion. Do not go the other way.

I am not try­ing to “win” any­thing. I am stat­ing why MIRI is not trans­par­ent, and does not deal in scal­able is­sues. As an in­di­vi­d­ual, Earn­ing to Give, it does not fol­low to fund such things un­der the guise of Effec­tive Altru­ism. Ex­is­ten­tial risk is im­por­tant to think about and dis­cuss as in­di­vi­d­u­als. How­ever, fund­ing CS grad stu­dents does not make sense in the light of Effec­tive Altru­ism.

Funding does not increase “thinking.” The whole point of EA is to not give blindly. For example, giving food aid, although well-intentioned, can have a very negative effect (e.g., the crowding-out effect on the local market). Nonmaleficence should be one’s initial position in regards to funding.

Lastly, no, I rarely accept something as true first. I do not first accept the null hypothesis. “But there’s a whole load of arguments about why it is a tractable field”: What are they? Again, none of the actual arguments were examined: How is MIRI going about tractable/solvable issues? Who at MIRI is getting the funds? How is time travel safety not as relevant as AI safety?

The lists were interesting in how they allude to the different psychologies and motivations EAs in the two camps have. I hope someday I can have a civil discussion with someone not directly benefiting from AIA (such as being involved in the research). As an aside, I have a friend who’s crazy about futurism, the 2045 Initiative/propaganda, and in love with everything Musk says on Twitter.

“Their reaction when they look about extinction risk or AI safety is nonsensical”, imaginary and completely unknown, with zero tractability. No evidence to go off of, since such technology does not exist. Why give to CS grad students? It’s like trying to fund a mission to Mars: not a priority. It’s like funding time travel safety research: a non sequitur.

“They are gen­er­ally an un­happy per­son.” I just had to laugh and com­pare how one in­ter­ested in AI safety matched up. A neo-Freudian Jung/​MBTI type of deal. Al­most like zo­diac signs. Although, the Min­nesota Mul­tipha­sic Per­son­al­ity In­ven­tory (MMPI) is rigor­ous—so who am I to judge this in­for­mal in­ven­tory.

Anyway, I simply do not see that individual action or donation to AIA research has measurable outcomes. We’re talking about Strong AI here; it doesn’t even exist! Not that it couldn’t, though. In the future, even the medium-term future, general standards of living could be significantly improved. Synthetic meat on a production scale is a much more realistic research area (or even anti-malaria mosquitoes) than making a fuss about imaginary, theoretical events. We’re at a unique sliver in time where it is extremely practical to help lessen the suffering of humans and animals in the near and medium-term future. (I.e., we have rapid transportation and instant information transfer.)

Just be­cause an event is the­o­ret­i­cal doesn’t mean that it won’t oc­cur. An as­ter­oid hit­ting the Earth is the­o­ret­i­cal, but some­thing I think you might re­al­ize is quite real when it im­pacts.

Some say that su­per­in­tel­li­gence doesn’t have prece­dence, but I think that’s over­look­ing a key fact. The rise of homo sapi­ens has rad­i­cally al­tered the world—and all signs point to­ward in­tel­li­gence as the cause. We think at the mo­ment that in­tel­li­gence is just a mat­ter of in­for­ma­tion pro­cess­ing, and there­fore, there should be a way that it could be done by our own com­put­ers some day, if only we figured out the right al­gorithms to im­ple­ment.

If we learn that superintelligence is impossible, that means our current most descriptive scientific theories are wrong, and we will have learned something new. That’s because it would indicate that humans are somehow cosmically special, or at least have hit the ceiling for general intelligence. On the flipside, if we create superintelligence, none of our current theories of how the world operates need to be wrong.

That’s why it’s im­por­tant to take se­ri­ously. Be­cause the best ev­i­dence we have available tells us that it’s pos­si­ble, not that it’s im­pos­si­ble.