I think it would be useful to give your sense of how Embedded Agency fits into the more general problem of AI Safety/Alignment. For example, what percentage of the AI Safety/Alignment problem do you think Embedded Agency represents, and what are the other major chunks of the larger problem?

(It is the part of the picture that I can give while being only descriptive, and not prescriptive. For epistemic hygiene reasons, I want to avoid discussions of how much of different approaches we need in contexts (like this one) that would make me feel like I was justifying my research in a way that people might interpret as an official statement from the agent foundations team lead.)

I think that Embedded Agency is basically a refactoring of Agent Foundations in a way that gives one central curiosity-based goalpost, rather than making it look like a bunch of independent problems. It is mostly all the same problems, but it was previously packaged as “Here are a bunch of things we wish we understood about aligning AI,” and is now repackaged as “Here is a central mystery of the universe, and here are a bunch of things we don’t understand about it.” It is not a coincidence that they are the same problems, since they were generated in the first place by people paying close attention to which mysteries of the universe related to AI we haven’t solved yet.

I think of Agent Foundations research as having a different type signature than most other AI Alignment research, in a way that looks kind of like Agent Foundations:other AI alignment::science:engineering. I think of AF as more forward-chaining and other stuff as more backward-chaining. This may seem backwards if you think about AF as reasoning about superintelligent agents, and other research programs as thinking about modern ML systems, but I think it is true. We are trying to build up a mountain of understanding, until we collect enough that the problem seems easier. Others are trying to make direct plans on what we need to do, see what is wrong with those plans, and try to fix the problems. One consequence of this is that AF work is more likely to be helpful given long timelines, partially because AF is trying to be the start of a long journey of figuring things out, but also because AF is more likely to be robust to huge shifts in the field.

I actually like to draw an analogy with this (taken from this post by Evan Hubinger):

I was talking with Scott Garrabrant late one night recently and he gave me the following problem: how do you get a fixed number of DFA-based robots to traverse an arbitrary maze (if the robots can locally communicate with each other)? My approach to this problem was to come up with and then try to falsify various possible solutions. I started with a hypothesis, threw it against counterexamples, fixed it to resolve the counterexamples, and iterated. If I could find a hypothesis which I could prove was unfalsifiable, then I’d be done.

When Scott noticed I was using this approach, he remarked on how different it was than what he was used to when doing math. Scott’s approach, instead, was to just start proving all of the things he could about the system until he managed to prove that he had a solution. Thus, while I was working backwards by coming up with possible solutions, Scott was working forwards by expanding the scope of what he knew until he found the solution.

(I don’t think it quite communicates my approach correctly, but I don’t know how to do better.)

A consequence of the type signature of Agent Foundations is that my answer to “What are the other major chunks of the larger problem?” is “That is what I am trying to figure out.”

Promoted to curated: I think this post (and the following posts in the sequence) might be my favorite posts that have been written in the last year. I think this has a lot to do with how this post feels like it is successfully doing one of the things that Eliezer’s sequences did, which is to combine a wide range of solid theoretical results with practical implications for rationality, as well as broad explanations that improve my understanding not just of the specific domain at hand, but of a range of related domains as well.

I also really like the change of pace with the format, and quite like how easy certain things become to explain when you use a more visual format. I think it’s an exceptionally accessible post, even given its highly technical nature. I am currently rereading GEB, and this post reminds me of that book in a large variety of ways, and in a very good way.

Good point. We just uploaded the images that Abram gave us, but I just realized that they are quite large and have minimal compression applied to them.

I just experimented with some compression and it looks like we can get a 5x size reduction without any significant loss in quality, so we will go and replace all the images with the compressed ones. Thanks for pointing that out!

Ah, yep. Sorry, that was the size after I experimented with cropping them some (until I realized that pixels don’t mean pixels anymore, and we need to serve higher-resolution images to retina screens).

Yeah, we could try changing the image format as well. Though I think it’s mostly fine now.
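For anyone curious about the mechanics, here is a minimal sketch of the kind of batch recompression described above, using Pillow; the quality setting and the "images/" folder are illustrative assumptions, not the exact pipeline we used:

```python
# Toy sketch: re-encode large PNGs as optimized JPEGs while keeping their
# pixel dimensions (retina screens need the full-resolution source).
from pathlib import Path

from PIL import Image  # pip install Pillow

def compress(src: Path, dst: Path, quality: int = 85) -> None:
    """Re-encode one image with lossy compression at the same resolution."""
    img = Image.open(src).convert("RGB")
    img.save(dst, format="JPEG", quality=quality, optimize=True)

for path in Path("images").glob("*.png"):  # "images/" is a hypothetical folder
    compress(path, path.with_suffix(".jpg"))
```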

Bayesian reasoning works by starting with a large collection of possible environments, and as you observe facts that are inconsistent with some of those environments, you rule them out. What does reasoning look like when you’re not even capable of storing a single valid hypothesis for the way the world works? Emmy is going to have to use a different type of reasoning, and make updates that don’t fit into the standard Bayesian framework.

I think maybe this paragraph should say “Solomonoff induction” instead of “Bayesian reasoning”. If I’m reasoning about a coin, and I have a model with a single parameter representing the coin’s bias, there’s a sense in which I’m doing Bayesian reasoning and there is some valid hypothesis for the coin’s bias. Most applied Bayesian ML work looks more like discovering a coin’s bias than thinking about the world at a sufficiently high resolution for the algorithm to be modeling itself, so this seems like an important distinction.
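To make the coin example concrete, here is a toy illustration (my own sketch, not anything from the post): the single-parameter hypothesis space easily fits in memory, and no hypothesis ever gets “ruled out”; probability mass just shifts toward biases consistent with the flips observed.

```python
# Toy sketch: Bayesian inference on a coin's bias over a discretized
# single-parameter hypothesis space (illustrative numbers throughout).
import numpy as np

biases = np.linspace(0.01, 0.99, 99)                 # candidate values of the bias
posterior = np.full_like(biases, 1.0 / len(biases))  # uniform prior

def update(posterior, flip):
    """One Bayes update on a flip (1 = heads, 0 = tails)."""
    likelihood = biases if flip == 1 else 1.0 - biases
    unnormalized = posterior * likelihood
    return unnormalized / unnormalized.sum()

for flip in [1, 1, 0, 1, 1, 1, 0, 1]:                # hypothetical observations
    posterior = update(posterior, flip)

print("posterior mean bias:", float((biases * posterior).sum()))
```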

This sounds similar in effect to what philosophy of mind calls “embodied cognition”, but it takes a more abstract tack. Is there a recognized background link between the two ideas already? Is that a useful idea, regardless of whether it already exists, or am I off track?

Our old name for embedded agency was “naturalized agency”; we switched because we kept finding that CS people wanted to know what we meant by “naturalized”, and we’d always say “embedded”, so...

“Embodiment” is less relevant because it’s about, well, bodies. Embedded agency just says that the agent is embedded in its environment in some fashion; it doesn’t say that the agent has a robot body, in spite of the cute pictures of robots Abram drew above. An AI system with no “body” it can directly manipulate or sense will still be physically implemented on computing hardware, and that on its own can raise all the issues above.

In my view, embodied cognition says that the way in which an agent is embodied is important to its cognition, whereas embedded agency says that the fact that an agent is embodied is important to its cognition.

(This is probably a repetition, but it’s shorter and more explicit, which could be useful.)


I don’t understand why being an embedded agent makes Bayesian reasoning impossible. My intuition is that a hypothesis doesn’t have to be perfectly correlated with reality to be useful. Furthermore, suppose you conceived of hypotheses as being a conjunction of elementary hypotheses; then I see no reason why you cannot perform Bayesian reasoning of the form “hypothesis X is one of the constituents of the true hypothesis”, even if the agent can’t perfectly describe the true hypothesis.

Also, “the agent is larger/smaller than the environment” is not very clear, so I think it would help if you would clarify what those terms mean.
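A minimal sketch of the constituent-hypothesis idea above (my own toy construction, with made-up names and numbers): keep a marginal credence for each elementary hypothesis and update it directly, without ever representing the full, possibly too-large-to-store, conjunction that describes the world.

```python
# Toy sketch: Bayesian updates on whether elementary hypothesis X is
# "one of the constituents of the true hypothesis", without representing
# the full conjunction (illustrative only).

def bayes_update(prior: float, lik_if_true: float, lik_if_false: float) -> float:
    """Posterior P(X | evidence) from P(X), P(evidence | X), P(evidence | not X)."""
    numerator = prior * lik_if_true
    return numerator / (numerator + (1.0 - prior) * lik_if_false)

# Elementary hypotheses with prior credences (hypothetical values).
credences = {"coin_is_fair": 0.5, "room_is_dark": 0.5}

# Evidence bearing on one constituent: an observation twice as likely
# if the coin is fair as if it is not.
credences["coin_is_fair"] = bayes_update(credences["coin_is_fair"], 0.6, 0.3)

print(credences)  # marginal beliefs about constituents of the true hypothesis
```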

I don’t see the point in adding so much complexity to such a simple matter. AIXI is an incomputable agent whose proofs of optimality require a computable environment. It requires a specific configuration of the classic agent-environment loop where the agent and the environment are independent machines. That specific configuration is only applicable to a subset of real-world problems in which the environment can be assumed to be much “smaller” than the agent operating upon it: problems that don’t involve other agents and have very few degrees of freedom relative to the agent operating upon them.

Marcus Hutter already proposed computable versions of AIXI like AIXI_lt. In the context of agent-environment loops, AIXI_lt is actually more general than AIXI because AIXI_lt can be applied to all configurations of the agent-environment loop, including the embedded agent configuration. AIXI is a special case of AIXI_lt where the limits of “l” and “t” go to infinity.
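For reference, and as my own gloss rather than anything from the comment: AIXI’s action selection (roughly following Hutter’s formulation) scores each candidate action by summing over every computable environment $q$, modeled as a program on a universal Turing machine $U$ that is a separate machine from the agent, which is exactly the dualistic agent-environment split at issue here. AIXI_lt (usually written AIXItl) restricts this to programs of length at most $l$ and per-step runtime at most $t$, which is what makes it computable.

$$a_k \;:=\; \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[r_k + \cdots + r_m\big] \sum_{q \,:\, U(q,\, a_{1:m}) \,=\, o_{1:m} r_{1:m}} 2^{-\ell(q)}$$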

Some of the problems you bring up seem to be concerned with the problem of reconciling logic with probability, while others seem to be concerned with real-world implementation. If your goal is to define concepts like “intelligence” with mathematical formalizations (which I believe is necessary), then you need to delineate that from real-world implementation. Discussing both simultaneously is extremely confusing. In the real world, an agent only has its empirical observations. It has no “seeds” to build logical proofs upon. That’s why scientists talk about theories and evidence supporting them rather than proofs and axioms.

You can’t prove that the sun will rise tomorrow; you can only show that it’s reasonable to expect the sun to rise tomorrow based on your observations. Mathematics is the study of patterns, and mathematical notation is a language we invented to describe patterns. We can prove theorems in mathematics because we are the ones who decide the fundamental axioms. When we find patterns that don’t lend themselves easily to mathematical description, we rework the tool (add concepts like zero, negative numbers, complex numbers, etc.). It happens that we live in a universe that seems to follow patterns, so we try to use mathematics to describe the patterns we see, and we design experiments to investigate the extent to which those patterns actually hold.

The branch of mathematics for characterizing systems with incomplete information is probability. If you want to talk about real-world implementations, most non-trivial problems fall under this domain.
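The sunrise point above can even be made quantitative. As a standard illustration (my addition, not part of the comment), Laplace’s rule of succession says that with a uniform prior over the unknown rate $p$ at which the sun rises, observing $n$ sunrises in a row gives

$$P(\text{sunrise tomorrow} \mid n \text{ sunrises observed}) \;=\; \frac{\int_0^1 p \cdot p^n \, dp}{\int_0^1 p^n \, dp} \;=\; \frac{n+1}{n+2},$$

which is never a proof, just a degree of confidence that grows with the evidence.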

When thinking about embedded agency it might be helpful to sometimes drop the notions of ‘agency’ and ‘agents’, because they might be confusing or underdefined. Instead one could think of [sub-]processes running according to the laws of physics, or of algorithms running on a stack of interpreters running on the hardware of the universe.

I have been thinking about this for a while and I’d like to propose a different viewpoint on this same idea. It’s equivalent, but personally I prefer this perspective. Instead of saying ‘embedded agency’, which implies an embedding of the agent in the environment, perhaps we could say ‘embedded environment’, which implies that the environment is within the agent. To illustrate: the same way ‘I’ can ‘observe’ ‘my’ ‘thoughts’, I can observe my laptop. Are my thoughts me, or is my laptop me? I think the correct way to look at it is that my computer is just as much a part of me as my thoughts are. Even though surely everyone is convinced that this dualistic notion between agent and environment does not make sense, there seems to be a difference between what one believes on the level of ‘knowledge’ and what one can phenomenologically observe. I believe this phrasing allows people to more easily experience for themselves the truth of the embedded environment/agent idea, which is important for further idea generation! :)