In my pre­vi­ous posts, I have been build­ing up a model of mind as a col­lec­tion of sub­agents with differ­ent goals, and no straight­for­ward hi­er­ar­chy. This then raises the ques­tion of how that col­lec­tion of sub­agents can ex­hibit co­her­ent be­hav­ior: af­ter all, many ways of ag­gre­gat­ing the prefer­ences of a num­ber of agents fail to cre­ate con­sis­tent prefer­ence or­der­ings.

We can roughly de­scribe co­her­ence as the prop­erty that, if you be­come aware that there ex­ists a more op­ti­mal strat­egy for achiev­ing your goals than the one that you are cur­rently ex­e­cut­ing, then you will switch to that bet­ter strat­egy. If an agent is not co­her­ent in this way, then bad things are likely to hap­pen to them.

Now, we all know that hu­mans some­times ex­press in­co­her­ent be­hav­ior. But on the whole, peo­ple still do okay: the me­dian per­son in a de­vel­oped coun­try still man­ages to sur­vive un­til their body starts giv­ing up on them, and typ­i­cally also man­ages to have and raise some num­ber of ini­tially-hel­pless chil­dren un­til they are old enough to take care of them­selves.

For a subagent theory of mind, we would like to have some explanation of when exactly the subagents manage to be collectively coherent (that is, change their behavior to some better one), and of the situations in which they fail to do so. The conclusion of this post will be:

We are ca­pa­ble of chang­ing our be­hav­iors on oc­ca­sions when the mind-sys­tem as a whole puts suffi­ciently high prob­a­bil­ity on the new be­hav­ior be­ing bet­ter, when the new be­hav­ior is not be­ing blocked by a par­tic­u­lar highly weighted sub­agent (such as an IFS-style pro­tec­tor) that puts high prob­a­bil­ity on it be­ing bad, and when we have enough slack in our lives for any new be­hav­iors to be eval­u­ated in the first place. Akra­sia is sub­agent dis­agree­ment about what to do.

(Those of you who read my pre­vi­ous post might re­mem­ber that I said this post would be about “unifi­ca­tion of mind”—that is, about how to make sub­agents agree with each other bet­ter. Turns out that I spent so many words ex­plain­ing when sub­agents dis­agree, that I had to put off the post on how to get them to agree. Maybe my next post will man­age to be about that…)

Cor­rect­ing your be­hav­ior as a default

There are many situ­a­tions in which we ex­hibit in­co­her­ent be­hav­ior sim­ply be­cause we’re not aware of it. For in­stance, sup­pose that I do my daily chores in a par­tic­u­lar or­der, when do­ing them in some other or­der would save more time. If you point this out to me, I’m likely to just say “oh”, and then adopt the bet­ter sys­tem.

Similarly, several of the experiments which get people to exhibit incoherent behavior rely on showing different groups of people different formulations of the same question, and then noting that the different framings elicit different answers. It doesn’t work quite as well if you show the different formulations to the same people, because then many of them will realize that differing answers would be inconsistent.

But there are also situ­a­tions in which some­one re­al­izes that they are be­hav­ing in a non­sen­si­cal way, yet will con­tinue be­hav­ing in that way. Since peo­ple usu­ally can change sub­op­ti­mal be­hav­iors, we need an ex­pla­na­tion for why they some­times can’t.

Tow­ers of pro­tec­tors as a method for coherence

In my post about In­ter­nal Fam­ily Sys­tems, I dis­cussed a model of mind com­posed of sev­eral differ­ent kinds of sub­agents. One of them, the de­fault plan­ning sub­agent, is a mod­ule just try­ing to straight­for­wardly find the best thing to do and then ex­e­cute that. On the other hand, pro­tec­tor sub­agents ex­ist to pre­vent the sys­tem from get­ting into situ­a­tions which were catas­trophic be­fore. If they think that the de­fault plan­ning sub­agent is do­ing some­thing which seems dan­ger­ous, they will over­ride it and do some­thing else in­stead. (Pre­vi­ous ver­sions of the IFS post called the de­fault plan­ning agent, “a re­in­force­ment learn­ing sub­agent”, but this was po­ten­tially mis­lead­ing since sev­eral other sub­agents were re­in­force­ment learn­ing ones too, so I’ve changed the name.)

Thus, your be­hav­ior can still be co­her­ent even if you feel that you are failing to act in a co­her­ent way. You sim­ply don’t re­al­ize that a pro­tec­tor is car­ry­ing out a rou­tine in­tended to avoid dan­ger­ous out­comes—and this might ac­tu­ally be a very suc­cess­ful way of keep­ing you out of dan­ger. Some sub­agents in your mind think that do­ing X would be a su­pe­rior strat­egy, but the pro­tec­tor thinks that it would be a hor­rible idea—so from the point of view of the sys­tem as a whole, do­ing X is not a bet­ter strat­egy, so not switch­ing to it is ac­tu­ally bet­ter.

On the other hand, it may also be the case that the pro­tec­tor’s be­hav­ior, while keep­ing you out of situ­a­tions which the pro­tec­tor con­sid­ers un­ac­cept­able, is caus­ing other out­comes which are also un­ac­cept­able. The de­fault plan­ning sub­agent may re­al­ize this—but as already es­tab­lished, any pro­tec­tor can over­rule it, so this doesn’t help.

Evolu­tion’s an­swer here seems to be spaghetti tow­ers. The de­fault plan­ning sub­agent might even­tu­ally figure out the bet­ter strat­egy, which avoids both the thing that the pro­tec­tor is try­ing to block and the new bad out­come. But it could be dan­ger­ous to wait that long, es­pe­cially since the de­fault plan­ning agent doesn’t have di­rect ac­cess to the pro­tec­tor’s goals. So for the same rea­sons why a sep­a­rate pro­tec­tor sub­agent was cre­ated to avoid the first catas­tro­phe, the mind will cre­ate or re­cruit a pro­tec­tor to avoid the sec­ond catas­tro­phe—the one that the first pro­tec­tor keeps caus­ing.

Example: Eric grows up in an environment where he learns that disagreeing with other people is unsafe, and that he should always agree to do things that other people ask of him. So Eric develops a protector subagent running a pleasing, submissive behavior.

Unfortunately, while this tactic worked in Eric’s childhood home, as an adult he starts saying “yes” to too many things, without leaving any time for his own needs. But saying “no” to anything still feels unsafe, so he can’t just stop saying “yes”. Instead he develops a protector which tries to keep him out of situations where people would ask him to do anything. This way, he doesn’t need to say “no”, and also won’t get overwhelmed by all the things that he has promised to do. The two protectors together form a composite strategy.

While this helps, it still doesn’t en­tirely solve the is­sue. After all, there are plenty of rea­sons that might push Eric into situ­a­tions where some­one would ask some­thing of him. He still ends up agree­ing to do lots of things, to the point of ne­glect­ing his own needs. Even­tu­ally, his brain cre­ates an­other pro­tec­tor sub­agent. This one causes ex­haus­tion and de­pres­sion, so that he now has a so­cially-ac­cept­able rea­son for be­ing un­able to do all the things that he has promised to do. He con­tinues say­ing “yes” to things, but also keeps apol­o­giz­ing for be­ing un­able to do things that he (hon­estly) in­tended to do as promised, and even­tu­ally peo­ple re­al­ize that you prob­a­bly shouldn’t ask him to do any­thing that’s re­ally im­por­tant to get done.

And while this kind of a process of stacking protector on top of protector is not perfect, for most people it mostly works out okay. Almost everyone ends up having their own unique set of minor neuroses and situations where they don’t quite behave rationally, but as they learn to understand themselves better, their default planning subagent gets better at working around those issues. This might also let the various protectors relax a bit: since the threats they guard against are generally being avoided anyway, there is less need for them to intervene.

But sometimes, especially for people in highly stressful environments where almost any mistake may get them punished, or when they end up in an environment that their old tower of protectors is no longer well-suited for (distributional shift), things don’t go as well. In that situation, their minds may end up as a hopelessly tangled web. Something happens in their environment, which sets off one protector, which sets off another, which sets off another—leaving them with almost no room for flexibility or rational planning, but rather forcing them to act in a way which is almost bound to only make matters worse.

This kind of an out­come is ob­vi­ously bad. So be­sides build­ing spaghetti tow­ers, the sec­ond strat­egy which the mind has evolved to em­ploy for keep­ing its be­hav­ior co­her­ent while piling up pro­tec­tors, is the abil­ity to re-pro­cess mem­o­ries of past painful events.

As I dis­cussed in my origi­nal IFS post, the mind has meth­ods for bring­ing up the origi­nal mem­o­ries which caused a pro­tec­tor to emerge, in or­der to re-an­a­lyze them. If end­ing up in some situ­a­tion is ac­tu­ally no longer catas­trophic (for in­stance, you are no longer in your child­hood home where you get pun­ished sim­ply for not want­ing to do some­thing), then the pro­tec­tors which were fo­cused on avoid­ing that out­come can re­lax and take a less ex­treme role.

For this purpose, there seems to be a built-in tension. Exiles (the IFS term for subagents containing memories of past trauma) “want” to be healed and will do things like occasionally sending painful memories or feelings into consciousness so as to become the center of attention, especially if there is something about the current situation which resembles the past trauma. This also acts as what my IFS post called a fear model—something that warns of situations which resemble the past trauma enough to be considered dangerous in their own right. At the same time, protectors “want” to keep the exiles hidden and inactive, doing anything they can to keep them that way. Various schools of therapy—IFS among them—seek to tap into this existing tension so as to reveal the trauma, trace it back to its original source, and heal it.

Co­her­ence and con­di­tioned responses

Besides the presence of protectors, another possible reason for why we might fail to change our behavior is strongly conditioned habits. Most human behavior involves automatic habits: behavioral routines which are triggered by some sort of a cue in the environment, and lead to or have once led to a reward. (Previous discussion; see also.)

The problem with this is that people might end up with habits that they wouldn’t want to have. For instance, I might develop a habit of checking social media on my phone when I’m bored, creating a loop of boredom (cue) → looking at social media (behavior) → seeing something interesting on social media (reward).

Reflect­ing on this be­hav­ior, I no­tice that back when I didn’t do it, my mind was more free to wan­der when I was bored, gen­er­at­ing mo­ti­va­tion and ideas. I think that my old be­hav­ior was more valuable than my new one. But even so, my new be­hav­ior still de­liv­ers enough mo­men­tary satis­fac­tion to keep re­in­forc­ing the habit.

Sub­jec­tively, this feels like an in­creas­ing com­pul­sion to check my phone, which I try to re­sist since I know that long-term it would be a bet­ter idea to not be check­ing my phone all the time. But as the com­pul­sion keeps grow­ing stronger and stronger, even­tu­ally I give up and look at the phone any­way.

The ex­act neu­ro­science of what is hap­pen­ing at such a mo­ment re­mains only par­tially un­der­stood (Simp­son & Balsam 2016). How­ever, we know that when­ever differ­ent sub­sys­tems in the brain pro­duce con­flict­ing mo­tor com­mands, that con­flict needs to be re­solved, with only one at a time be­ing granted ac­cess to the “fi­nal com­mon mo­tor path”. This is thought to hap­pen in the basal gan­glia, a part of the brain closely in­volved in ac­tion se­lec­tion and con­nected to the global neu­ronal workspace.

One model (e.g. Red­grave 2007, McHaf­fie 2005) is that the basal gan­glia re­ceives in­puts from many differ­ent brain sys­tems; each of those sys­tems can send differ­ent “bids” sup­port­ing or op­pos­ing a spe­cific course of ac­tion to the basal gan­glia. A bid sub­mit­ted by one sub­sys­tem may, through looped con­nec­tions go­ing back from the basal gan­glia, in­hibit other sub­sys­tems, un­til one of the pro­posed ac­tions be­comes suffi­ciently dom­i­nant to be taken.
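To make the bidding dynamic a bit more concrete, here is a minimal toy sketch (my own illustration, with made-up numbers and an invented select_action helper, not anything from Redgrave et al.): each subsystem submits a bid, and every action’s activity is suppressed in proportion to its competitors’ activity until one bid dominates.

```python
import numpy as np

def select_action(bids, inhibition=0.05, steps=50):
    """Toy winner-take-all selection over competing action bids.

    bids: dict mapping action name -> initial bid strength (>= 0).
    On each step, every action is suppressed in proportion to the combined
    activity of its rivals, loosely mimicking the looped inhibitory
    connections described above."""
    names = list(bids)
    activity = np.array([bids[n] for n in names], dtype=float)
    for _ in range(steps):
        rivals = activity.sum() - activity          # total activity of competitors
        activity = np.clip(activity - inhibition * rivals, 0.0, None)
    return names[int(np.argmax(activity))]

# Velociraptors burst in: the threat system's bid jumps from weak to strong.
print(select_action({"keep eating": 0.9, "run away": 2.5}))  # -> run away
```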

A figure in Redgrave (2007) gives a conceptual image of the model, with two example subsystems shown. Suppose that you are eating at a restaurant in Jurassic Park when two velociraptors charge in through the window. Previously, your hunger system was submitting successful bids for the “let’s keep eating” action, which then caused inhibitory impulses to be sent to the threat system. This inhibition prevented the threat system from making bids for silly things like jumping up from the table and running away in a panic. However, as your brain registers the new situation, the threat system gets significantly more strongly activated, sending a strong bid for the “let’s run away” action. As a result of the basal ganglia receiving that bid, an inhibitory impulse is routed from the basal ganglia to the subsystem which was previously submitting bids for the “let’s keep eating” action. This makes the threat system’s bids even stronger relative to the (inhibited) eating system’s bids.

Soon the basal gan­glia, which was pre­vi­ously in­hibit­ing the threat sub­sys­tem’s ac­cess to the mo­tor sys­tem while al­low­ing the eat­ing sys­tem ac­cess, with­draws that in­hi­bi­tion and starts in­hibit­ing the eat­ing sys­tem’s ac­cess in­stead. The re­sult is that you jump up from your chair and be­gin to run away. Un­for­tu­nately, this is hope­less since the ve­lo­cirap­tor is faster than you. A few mo­ments later, the ve­lo­cirap­tor’s basal gan­glia gives the rap­tor’s “eat­ing” sub­sys­tem ac­cess to the rap­tor’s mo­tor sys­tem, let­ting it hap­pily munch down its lat­est meal.

But let’s leave ve­lo­cirap­tors be­hind and go back to our origi­nal ex­am­ple with the phone. Sup­pose that you have been try­ing to re­place the habit of look­ing at your phone when bored, to in­stead smil­ing and di­rect­ing your at­ten­tion to pleas­ant sen­sa­tions in your body, and then let­ting your mind wan­der.

Un­til the new habit es­tab­lishes it­self, the two habits will com­pete for con­trol. Fre­quently, the old habit will be stronger, and you will just au­to­mat­i­cally check your phone with­out even re­mem­ber­ing that you were sup­posed to do some­thing differ­ent. For this rea­son, be­hav­ioral change pro­grams may first spend sev­eral weeks just prac­tic­ing notic­ing the situ­a­tions in which you en­gage in the old habit. When you do no­tice what you are about to do, then more goal-di­rected sub­sys­tems may send bids to­wards the “smile and look for nice sen­sa­tions” ac­tion. If this hap­pens and you pay at­ten­tion to your ex­pe­rience, you may no­tice that long-term it ac­tu­ally feels more pleas­ant than look­ing at the phone, re­in­forc­ing the new habit un­til it be­comes preva­lent.

To put this in terms of the sub­agent model, we might dras­ti­cally sim­plify things by say­ing that the neu­ral pat­tern cor­re­spond­ing to the old habit is a sub­agent re­act­ing to a spe­cific sen­sa­tion (bore­dom) in the con­scious­ness workspace: its re­ac­tion is to gen­er­ate an in­ten­tion to look at the phone. At first, you might train the sub­agent re­spon­si­ble for mon­i­tor­ing the con­tents of your con­scious­ness, to out­put mo­ments of in­tro­spec­tive aware­ness high­light­ing when that in­ten­tion ap­pears. That in­tro­spec­tive aware­ness helps alert a goal-di­rected sub­agent to try to trig­ger the new habit in­stead. Grad­u­ally, a neu­ral cir­cuit cor­re­spond­ing to the new habit gets trained up, which starts send­ing its own bids when it de­tects bore­dom. Over time, re­in­force­ment learn­ing in the basal gan­glia starts giv­ing that sub­agent’s bids more weight rel­a­tive to the old habit’s, un­til it no longer needs the goal-di­rected sub­agent’s sup­port in or­der to win.
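As a purely illustrative sketch of that dynamic (all the numbers, the payoffs, and the fifty-percent “noticing” rate are invented, and the update rule is just a generic reward-prediction-error nudge rather than a claim about the actual basal ganglia), something like the following shows how a weak new habit can win first with goal-directed support and later on its own:

```python
import random

# Two habit "subagents" whose bid weights drift toward the average payoff of
# acting on them, standing in (very loosely) for reinforcement learning in
# the basal ganglia. All numbers here are made up for illustration.
weights = {"check phone": 1.0, "smile and mind-wander": 0.1}
payoffs = {"check phone": 0.4, "smile and mind-wander": 0.7}  # assumed long-run values
goal_directed_boost = 0.6   # extra bid added when introspective awareness fires
learning_rate = 0.1

for trial in range(300):
    noticed = random.random() < 0.5          # did you catch the intention in time?
    bids = dict(weights)
    if noticed:
        bids["smile and mind-wander"] += goal_directed_boost
    chosen = max(bids, key=bids.get)
    # the chosen habit's weight is nudged toward the payoff it actually delivers
    weights[chosen] += learning_rate * (payoffs[chosen] - weights[chosen])

print(weights)  # the new habit eventually outbids the old one even without the boost
```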

Now this model helps in­cor­po­rate things like the role of hav­ing a vivid emo­tional mo­ti­va­tion, a sense of hope, or psych­ing your­self up when try­ing to achieve habit change. Do­ing things like imag­in­ing an out­come that you wish the habit to lead to, may ac­ti­vate ad­di­tional sub­sys­tems which care about those kinds of out­comes, caus­ing them to sub­mit ad­di­tional bids in fa­vor of the new habit. The ex­tent to which you suc­ceed at do­ing so, de­pends on the ex­tent to which your mind-sys­tem con­sid­ers it plau­si­ble that the new habit leads to the new out­come. For in­stance, if you imag­ine your ex­er­cise habit mak­ing you strong and healthy, then sub­agents which care about strength and health might ac­ti­vate to the ex­tent that you be­lieve this to be a likely out­come, send­ing bids in fa­vor of the ex­er­cise ac­tion.

On this view, one way for the mind to main­tain co­her­ence and read­just its be­hav­iors, is its abil­ity to re-eval­u­ate old habits in light of which sub­sys­tems get ac­ti­vated when re­flect­ing on the pos­si­ble con­se­quences of new habits. An old habit hav­ing been strongly re­in­forced re­flects that a great deal of ev­i­dence has ac­cu­mu­lated in fa­vor of it be­ing benefi­cial, but the be­hav­ior in ques­tion can still be over­rid­den if enough in­fluen­tial sub­sys­tems weigh in with their eval­u­a­tion that a new be­hav­ior would be more benefi­cial in ex­pec­ta­tion.

Some subsystems having concerns (e.g. immediate survival) which are ranked more highly than others (e.g. creative exploration) means that the decision-making process ends up carrying out an implicit expected utility calculation. The strengths of bids submitted by different systems do not just reflect the probability that those subsystems put on an action being the most beneficial. There are also different mechanisms giving the bids from different subsystems varying amounts of weight, depending on how important the concerns represented by that subsystem happen to be in that situation. This ends up doing something like weighting the probabilities by utility, with the weightings having been chosen by evolution and culture so as to maximize genetic fitness on average. Protectors, of course, are subsystems whose bids are weighted particularly strongly, since the system puts high utility on avoiding the kinds of outcomes they are trying to avoid.
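Here is one way to picture that weighting as a toy calculation (again my own illustration; the subsystem names, probabilities, and weights are invented): each subsystem’s effective support for an action is roughly its probability estimate that the action is best, scaled by how heavily that subsystem’s concerns currently count.

```python
def effective_support(estimates, weights):
    """estimates: {subsystem: {action: probability it judges the action best}}
    weights: {subsystem: how heavily that subsystem's concerns count right now}
    Returns each action's total weighted support."""
    support = {}
    for subsystem, probs in estimates.items():
        for action, p in probs.items():
            support[action] = support.get(action, 0.0) + weights[subsystem] * p
    return support

estimates = {
    "protector": {"avoid saying no": 0.9, "say no": 0.1},
    "planner":   {"avoid saying no": 0.3, "say no": 0.7},
}
weights = {"protector": 5.0, "planner": 1.0}  # protector bids are weighted strongly

print(effective_support(estimates, weights))
# {'avoid saying no': 4.8, 'say no': 1.2} -> the protector's concern carries the day
```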

The origi­nal ques­tion which mo­ti­vated this sec­tion was: why are we some­times in­ca­pable of adopt­ing a new habit or aban­don­ing an old one, de­spite know­ing that to be a good idea? And the an­swer is: be­cause we don’t know that such a change would be a good idea. Rather, some sub­sys­tems think that it would be a good idea, but other sub­sys­tems re­main un­con­vinced. Thus the sys­tem’s over­all judg­ment is that the old be­hav­ior should be main­tained.

In­ter­lude: Min­sky on mu­tu­ally bid­ding subagents

I was try­ing to con­cen­trate on a cer­tain prob­lem but was get­ting bored and sleepy. Then I imag­ined that one of my com­peti­tors, Pro­fes­sor Challenger, was about to solve the same prob­lem. An an­gry wish to frus­trate Challenger then kept me work­ing on the prob­lem for a while. The strange thing was, this prob­lem was not of the sort that ever in­ter­ested Challenger.

What makes us use such round­about tech­niques to in­fluence our­selves? Why be so in­di­rect, in­vent­ing mis­rep­re­sen­ta­tions, fan­tasies, and out­right lies? Why can’t we sim­ply tell our­selves to do the things we want to do? [...]

Ap­par­ently, what hap­pened was that my agency for Work ex­ploited Anger to stop Sleep. But why should Work use such a de­vi­ous trick?

To see why we have to be so in­di­rect, con­sider some al­ter­na­tives. If Work could sim­ply turn off Sleep, we’d quickly wear our bod­ies out. If Work could sim­ply switch Anger on, we’d be fight­ing all the time. Direct­ness is too dan­ger­ous. We’d die.

Ex­tinc­tion would be swift for a species that could sim­ply switch off hunger or pain. In­stead, there must be checks and bal­ances. We’d never get through one full day if any agency could seize and hold con­trol over all the rest. This must be why our agen­cies, in or­der to ex­ploit each other’s skills, have to dis­cover such round­about path­ways. All di­rect con­nec­tions must have been re­moved in the course of our evolu­tion.

This must be one rea­son why we use fan­tasies: to provide the miss­ing paths. You may not be able to make your­self an­gry sim­ply by de­cid­ing to be an­gry, but you can still imag­ine ob­jects or situ­a­tions that make you an­gry. In the sce­nario about Pro­fes­sor Challenger, my agency Work ex­ploited a par­tic­u­lar mem­ory to arouse my Anger’s ten­dency to counter Sleep. This is typ­i­cal of the tricks we use for self-con­trol.

Most of our self-con­trol meth­ods pro­ceed un­con­sciously, but we some­times re­sort to con­scious schemes in which we offer re­wards to our­selves: “If I can get this pro­ject done, I’ll have more time for other things.” How­ever, it is not such a sim­ple thing to be able to bribe your­self. To do it suc­cess­fully, you have to dis­cover which men­tal in­cen­tives will ac­tu­ally work on your­self. This means that you—or rather, your agen­cies—have to learn some­thing about one an­other’s dis­po­si­tions. In this re­spect the schemes we use to in­fluence our­selves don’t seem to differ much from those we use to ex­ploit other peo­ple—and, similarly, they of­ten fail. When we try to in­duce our­selves to work by offer­ing our­selves re­wards, we don’t always keep our bar­gains; we then pro­ceed to raise the price or even de­ceive our­selves, much as one per­son may try to con­ceal an unattrac­tive bar­gain from an­other per­son.

Hu­man self-con­trol is no sim­ple skill, but an ever-grow­ing world of ex­per­tise that reaches into ev­ery­thing we do. Why is it that, in the end, so few of our self-in­cen­tive tricks work well? Be­cause, as we have seen, di­rect­ness is too dan­ger­ous. If self-con­trol were easy to ob­tain, we’d end up ac­com­plish­ing noth­ing at all.

-- Marvin Min­sky, The So­ciety of Mind

Akra­sia is sub­agent disagreement

You might feel that the above discussion still doesn’t entirely resolve the original question. After all, sometimes we do manage to change even strongly conditioned habits pretty quickly. Why is it sometimes hard and sometimes easy?

Red­grave et al. (2010) dis­cuss two modes of be­hav­ioral con­trol: goal-di­rected ver­sus ha­bit­ual. Goal-di­rected con­trol is a rel­a­tively slow mode of de­ci­sion-mak­ing, where “ac­tion se­lec­tion is de­ter­mined pri­mar­ily by the rel­a­tive util­ity of pre­dicted out­comes”, whereas ha­bit­ual con­trol in­volves more di­rectly con­di­tioned stim­u­lus-re­sponse be­hav­ior. Which kind of sub­sys­tem is in con­trol is com­pli­cated, and de­pends on a va­ri­ety of fac­tors (the fol­low­ing quote has been ed­ited to re­move foot­notes to refer­ences; see the origi­nal for those):

Ex­per­i­men­tally, sev­eral fac­tors have been shown to de­ter­mine whether the agent (an­i­mal or hu­man) op­er­ates in goal-di­rected or ha­bit­ual mode. The first is over-train­ing: here, ini­tial con­trol is largely goal-di­rected, but with con­sis­tent and re­peated train­ing there is a grad­ual shift to stim­u­lus–re­sponse, ha­bit­ual con­trol. Once habits are es­tab­lished, ha­bit­ual re­spond­ing tends to dom­i­nate, es­pe­cially in stress­ful situ­a­tions in which quick re­ac­tions are re­quired. The sec­ond re­lated fac­tor is task pre­dictabil­ity: in the ex­am­ple of driv­ing, talk­ing on a mo­bile phone is fine so long as ev­ery­thing pro­ceeds pre­dictably. How­ever, if some­thing un­ex­pected oc­curs, such as some­one step­ping out into the road, there is an im­me­di­ate switch from ha­bit­ual to goal-di­rected con­trol. Mak­ing this switch takes time and this is one of the rea­sons why sev­eral coun­tries have banned the use of mo­bile phones while driv­ing. The third fac­tor is the type of re­in­force­ment sched­ule: here, fixed-ra­tio sched­ules pro­mote goal-di­rected con­trol as the out­come is con­tin­gent on re­spond­ing (for ex­am­ple, a food pel­let is de­liv­ered af­ter ev­ery n re­sponses). By con­trast, in­ter­val sched­ules (for ex­am­ple, sched­ules in which the first re­sponse fol­low­ing a speci­fied pe­riod is re­warded) fa­cil­i­tate ha­bit­ual re­spond­ing be­cause con­tin­gen­cies be­tween ac­tion and out­come are vari­able. Fi­nally, stress, of­ten in the form of ur­gency, has a pow­er­ful in­fluence over which mode of con­trol is used. The fast, low com­pu­ta­tional re­quire­ments of stim­u­lus–re­sponse pro­cess­ing en­sure that ha­bit­ual con­trol pre­dom­i­nates when cir­cum­stances de­mand rapid re­ac­tions (for ex­am­ple, pul­ling the wrong way in an emer­gency when driv­ing on the op­po­site side of the road). Chronic stress also favours stim­u­lus–re­sponse, ha­bit­ual con­trol. For ex­am­ple, rats ex­posed to chronic stress be­come, in terms of their be­havi­oural re­sponses, in­sen­si­tive to changes in out­come value and re­sis­tant to changes in ac­tion–out­come con­tin­gency. [...]

Although these fac­tors can be seen as pro­mot­ing one form of in­stru­men­tal con­trol over the other, real-world tasks of­ten have mul­ti­ple com­po­nents that must be performed si­mul­ta­neously or in rapid se­quences. Tak­ing again the ex­am­ple of driv­ing, a driver is re­quired to con­tinue steer­ing while chang­ing gear or brak­ing. Dur­ing the first few driv­ing les­sons, when steer­ing is not yet un­der au­to­matic stim­u­lus–re­sponse con­trol, things can go hor­ribly awry when the new driver at­tempts to change gears. By con­trast, an ex­pe­rienced (that is, ‘over-trained’) driver can steer, brake and change gear au­to­mat­i­cally, while hold­ing a con­ver­sa­tion, with only fleet­ing con­tri­bu­tions from the goal-di­rected con­trol sys­tem. This sug­gests that many skills can be de­con­structed into se­quenced com­bi­na­tions of both goal-di­rected and ha­bit­ual con­trol work­ing in con­cert. [...]

Nev­er­the­less, a fun­da­men­tal prob­lem re­mains: at any point in time, which mode should be al­lowed to con­trol which com­po­nent of a task? Daw et al. have used a com­pu­ta­tional ap­proach to ad­dress this prob­lem. Their anal­y­sis was based on the recog­ni­tion that goal-di­rected re­spond­ing is flex­ible but slow and car­ries com­par­a­tively high com­pu­ta­tional costs as op­posed to the fast but in­flex­ible ha­bit­ual mode. They pro­posed a model in which the rel­a­tive un­cer­tainty of pre­dic­tions made by each con­trol sys­tem is tracked. In any situ­a­tion, the con­trol sys­tem with the most ac­cu­rate pre­dic­tions comes to di­rect be­havi­oural out­put.

Note those last sen­tences: be­sides the sub­sys­tems mak­ing their own pre­dic­tions, there might also be a meta-learn­ing sys­tem keep­ing track of which other sub­sys­tems tend to make the most ac­cu­rate pre­dic­tions in each situ­a­tion, giv­ing ex­tra weight to the bids of the sub­sys­tem which has tended to perform the best in that situ­a­tion. We’ll come back to that in fu­ture posts.
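To sketch that arbitration idea in code (a rough rendering of the general principle with assumed details, not Daw et al.’s actual model): each controller tracks a running estimate of its own prediction error, and whichever controller has been predicting outcomes most accurately in the current situation gets control.

```python
class Controller:
    """Tracks how reliably one control system has been predicting outcomes."""
    def __init__(self, name, uncertainty=1.0):
        self.name = name
        self.uncertainty = uncertainty      # running estimate of prediction error

    def observe(self, prediction_error, rate=0.2):
        self.uncertainty += rate * (abs(prediction_error) - self.uncertainty)

def arbitrate(controllers):
    # hand control to whichever system has the most accurate recent predictions
    return min(controllers, key=lambda c: c.uncertainty)

habitual = Controller("habitual")
goal_directed = Controller("goal-directed")
habitual.observe(0.05)       # routine situation: the habit predicts outcomes well
goal_directed.observe(0.40)  # the slower planner is less sure what will happen
print(arbitrate([habitual, goal_directed]).name)  # -> habitual
```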

This seems compatible with my experience, in that I feel like it’s possible for me to change even entrenched habits relatively quickly—assuming that the new habit really is unambiguously better. In that case, while I might forget and lapse into the old habit a few times, there’s still a rapid feedback loop which quickly indicates that the goal-directed system is simply right about the new habit being better.

Or, the be­hav­ior in ques­tion might be suffi­ciently com­plex and I might be suffi­ciently in­ex­pe­rienced at it, that the goal-di­rected (de­fault plan­ning) sub­agent has always mostly re­mained in con­trol of it. In that case change is again easy, since there is no strong ha­bit­ual pat­tern to over­ride.

In con­trast, in cases where it’s hard to es­tab­lish a new be­hav­ior, there tends to be some kind of gen­uine un­cer­tainty:

The benefits of the old be­hav­ior have been val­i­dated in the form of di­rect ex­pe­rience (e.g. un­healthy food that tastes good, has in fact tasted good each time), whereas the benefits of the new be­hav­ior come from a less trusted in­for­ma­tion source which is harder to val­i­date (e.g. I’ve read sci­en­tific stud­ies about the long-term health risks of this food).

Im­me­di­ate vs. long-term re­wards: the more re­mote the re­wards, the larger the risk that they will for some rea­son never ma­te­ri­al­ize.

High vs. low var­i­ance: some­times when I’m bored, look­ing at my phone pro­duces gen­uinely bet­ter re­sults than let­ting my thoughts wan­der. E.g. I might see an in­ter­est­ing ar­ti­cle or dis­cus­sion, which gives me novel ideas or in­sights that I would not oth­er­wise have had. Ba­si­cally look­ing at my phone usu­ally pro­duces worse re­sults than not look­ing at it—but some­times it also pro­duces much bet­ter ones than the al­ter­na­tive.

Situational variables affecting the value of the behaviors: looking at my phone can be a way to escape uncomfortable thoughts or sensations, for which purpose it’s often excellent. This then also tends to reinforce the behavior of looking at the phone in otherwise similar situations, even when there are no uncomfortable sensations that I’d like to escape.

As the above ex­cerpt noted, the ten­dency to fall back to old habits is ex­ac­er­bated dur­ing times of stress. The au­thors at­tribute it to the need to act quickly in stress­ful situ­a­tions, which seems cor­rect—but I would also em­pha­size the fact that nega­tive emo­tions in gen­eral tend to be signs of some­thing be­ing wrong. E.g. El­dar et al. (2016) note that pos­i­tive or nega­tive moods tend to be re­lated to whether things are go­ing bet­ter or worse than ex­pected, and sug­gest that mood is a com­pu­ta­tional rep­re­sen­ta­tion of mo­men­tum, act­ing as a sort of global up­date to our re­ward ex­pec­ta­tions.

For instance, if an animal finds more fruit than it had been expecting, that may indicate that spring is coming. A shift to a good mood and being “irrationally optimistic” about finding fruit even in places where the animal hasn’t seen fruit in a while, may actually serve as a rational pre-emptive update to its expectations. In a similar way, things going less well than expected may be a sign of some more general problem, calling for fewer exploratory behaviors, less risk-taking, and a fallback to behaviors which have a higher certainty of working out.
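As a toy rendering of that idea (my own illustration with made-up numbers, not the actual model from Eldar et al.): mood tracks a running average of recent reward prediction errors, and a positive mood then biases reward expectations upward even for patches the animal hasn’t recently checked.

```python
def forage(outcomes, mood_rate=0.3, mood_bias=0.5):
    """Mood as a smoothed 'momentum' of reward prediction errors."""
    mood, expectation = 0.0, 0.5
    for reward in outcomes:
        surprise = reward - expectation          # better or worse than expected?
        mood += mood_rate * (surprise - mood)    # mood tracks recent surprises
        expectation = 0.5 + mood_bias * mood     # good mood -> optimistic expectations
    return mood, expectation

# A run of unexpectedly fruitful patches lifts mood and raises expectations.
print(forage([1.0, 1.0, 0.9, 1.0]))
```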

So to re­peat the sum­mary that I had in the be­gin­ning: we are ca­pa­ble of chang­ing our be­hav­iors on oc­ca­sions when the mind-sys­tem as a whole puts suffi­ciently high prob­a­bil­ity on the new be­hav­ior be­ing bet­ter, when the new be­hav­ior is not be­ing blocked by a par­tic­u­lar highly weighted sub­agent (such as an IFS pro­tec­tor whose bids get a lot of weight) that puts high prob­a­bil­ity on it be­ing bad, and when we have enough slack in our lives for any new be­hav­iors to be eval­u­ated in the first place. Akra­sia is sub­agent dis­agree­ment about what to do.

I have earlier stated that “to understand, study edge cases”, and the model of subagents in a single brain would benefit from studying just such an edge case, namely Dissociative Identity Disorder (formerly known as Multiple Personality Disorder), where the subagents are plainly visible and their interaction and inter-communication is largely broken, making some of your suppositions and conjectures easy to study and test. There are many sites devoted to this largely misunderstood disorder, and its estimated prevalence in the general population is somewhere around 1-3%, so, odds are, you know someone with it personally without realizing it. One good introduction to the topic is the documentary Many Sides of Jane, which may give you some very basic understanding of how subagents might function (and dysfunction) in the mind. Akrasia, fights for control, mutual sabotage, and various subagent roles and behaviors are covered in the documentary in an accessible way, and could serve as much-needed feedback for your ideas.

Thanks! I’ve been in­tend­ing to first work out a pre­limi­nary ver­sion of the in­tu­itive model that I’ve got in my head in suffi­cient de­tail to know ex­actly what claims I’m even mak­ing (these posts), and then delve into var­i­ous other sources once I’ve finished writ­ing down my ini­tial rough sketch. (As I’ve found that try­ing to read too broadly about a re­search ques­tion be­fore I’ve got a men­tal skele­ton to “hang the con­tent on” just causes me to for­get most of the stuff that would ac­tu­ally have been rele­vant.) I’ll add your recom­men­da­tions to the list of things to look at.

I con­tinue to ap­pre­ci­ate Kaj’s write­ups of this paradigm. As I men­tioned in a pre­vi­ous cu­ra­tion no­tice, the “in­te­grat­ing sub­agents” paradigm has or­gan­i­cally gained some trac­tion in the ra­tio­nal­sphere but hasn’t been ex­plic­itly writ­ten up in a way that lets peo­ple build off or cri­tique it in de­tail.

Par­tic­u­lar things I liked

The use of the spaghetti tower di­a­grams to illus­trate what may be go­ing on.

The Min­sky in­ter­lude was en­ter­tain­ing and pro­vided a nice change of pace.

The crys­tal­liza­tion of why it’s hard to re­solve par­tic­u­lar kinds of sub­agent dis­agree­ments was use­ful.

One thing that I felt some­what un­cer­tain about was this pas­sage:

So be­sides build­ing spaghetti tow­ers, the sec­ond strat­egy which the mind has evolved to em­ploy for keep­ing its be­hav­ior co­her­ent while piling up pro­tec­tors, is the abil­ity to re-pro­cess mem­o­ries of past painful events.

Some­thing about this felt like a big­ger leap and/​or stronger claim than I’d been ex­pect­ing. Spec­i­fy­ing “the sec­ond strat­egy which the mind has evolved” felt odd. Partly be­cause it seems to im­plic­itly claim there are ex­actly 2 strate­gies, or that they evolved in a spe­cific or­der. Partly be­cause “re-pro­cess mem­o­ries of past painful events” reifies a par­tic­u­lar in­ter­pre­ta­tion that I’d want to ex­am­ine more.

It seems like non­hu­man an­i­mals need to deal with similar kinds of spaghetti code, but I’d be some­what sur­prised if the way they ex­pe­rienced that made most sense to clas­sify as “re-pro­cess­ing mem­o­ries.”

Partly be­cause it seems to im­plic­itly claim there are ex­actly 2 strate­gies, or that they evolved in a spe­cific or­der.

Oh, I didn’t mean to imply either of those. (Those are the two strategies that I know of, but there could obviously be others as well.)

It seems like non­hu­man an­i­mals need to deal with similar kinds of spaghetti code, but I’d be some­what sur­prised if the way they ex­pe­rienced that made most sense to clas­sify as “re-pro­cess­ing mem­o­ries.”

I haven’t thought that much about it, but “re-process memories” feels like… it sort of requires language, and orientation around narratives. Or maybe it’s just that that’s what it feels like from the inside when I do it; I have a hard time imagining other ways it could be.

When I think about, say, a rab­bit re-pro­cess­ing mem­o­ries, I’m not sure what the qualia of that would be like.

My current guess is that for non-social reprocessing, I’d expect it to look more like tacking on additional layers of spaghetti code, or simply the fading away of unused spaghetti code.

Say that one time visiting an open field got you almost killed, so you avoided open fields. But eventually you found an open field where there weren’t predators. And the warning flags that would get thrown when you see a bird or dog (which would reinforce the “ahh! open field === predators === run!” loop) would turn out to be false alarms (“oh, that’s not a dog, that’s some non-predator animal”). So gradually those loops would fire less often until they stop firing.

But that doesn’t feel like “reprocessing”, just continuous processing. Reprocessing feels like something that requires you to have an ontology, where you actually realize you were classifying something incorrectly and then actually believe the new reclassification, which I don’t expect a rabbit to do. It’s plausible that smarter birds or apes might, but it still feels off.

I think I’d still expect most primitive social interactions to be similarly a matter of reinforcement learning. Maybe at some point a bully or alpha threatened you and you were scared of them. But then later, when they achieved dominance (or were driven out of dominance by a rival), they stopped bullying you, and the “threat! ahh! submit to them!” loop stopped firing as often or as hard, and then eventually faded.

I’d pre­dict it’s (cur­rently) a uniquely lan­guage-species thing to go “oh, I had made a mis­take” and then re­pro­cess mem­o­ries in a way that changes your in­ter­pre­ta­tion of them. (I’m not that con­fi­dent in this, am mul­ling over what sorts of ex­per­i­ments would dis­t­in­guish this)

Note that memory re-consolidation was originally discovered in rats, so there at least appears to be preliminary evidence that goes against this perspective. Although “memory” here refers to something different from what we normally think about, the process is basically the same.

There’s also been some in­ter­est­ing spec­u­la­tion that what’s ac­tu­ally go­ing on in modal­ities like IFS and Fo­cus­ing is the ex­act same pro­cess. The spec­u­la­tion comes from the fact that the re­quire­ments seem to be the same for both an­i­mal mem­ory re­con­soli­da­tion and ther­a­pies that have fast/​in­stant changes such as co­her­ence ther­apy, IFS, EMDR, etc. I’ve used some of these in­sights to cre­ate novel ther­a­peu­tic modal­ities that seem to anec­do­tally have strong effects by ap­ply­ing the same re­quire­ments in their most dis­til­led form.

I origi­nally learned about the the­ory from the book I linked to, which is a good place to start but also clearly bi­ased be­cause they’re try­ing to make the case that their ther­apy uses mem­ory re­con­soli­da­tion. Wikipe­dia seems to have a use­ful sum­mary.

I haven’t thought that much about it, but “re-pro­cess mem­o­ries” feels like… it sort of re­quires lan­guage, and ori­en­ta­tion around nar­ra­tives.

Hmm. I’m not sure to what ex­tent, if any, I’m us­ing lan­guage when I’m re-pro­cess­ing mem­o­ries? Ex­cept when I’m ex­plic­itly think­ing about what I want to say to some­one, or what I might want to write, I gen­er­ally don’t feel like I think in a lan­guage: I feel like I think in men­tal images and felt senses.

“Nar­ra­tives”, I think, are ba­si­cally im­pres­sions of cause and effect or sim­ple men­tal mod­els, and any an­i­mals that could be de­scribed as “in­tel­li­gent” in any rea­son­able sense do need to have those. “Me­mory re-pro­cess­ing”, would then just be an up­date to the men­tal model that you in­ter­preted the mem­ory in terms of.

I feel like this ex­cerpt from “Don’t Shoot the Dog” could be an ex­am­ple of very short-term mem­ory re­pro­cess­ing:

I once video­taped a beau­tiful Ara­bian mare who was be­ing clicker-trained to prick her ears on com­mand, so as to look alert in the show ring. She clearly knew that a click meant a hand­ful of grain. She clearly knew her ac­tions made her trainer click. And she knew it had some­thing to do with her ears. But what? Hold­ing her head erect, she ro­tated her ears in­di­vi­d­u­ally: one for­ward, one back; then the re­verse; then she flopped both ears to the sides like a rab­bit, some­thing I didn’t know a horse could do on pur­pose. Fi­nally, both ears went for­ward at once. Click! Aha! She had it straight from then on. It was charm­ing, but it was also sad: We don’t usu­ally ask horses to think or to be in­ven­tive, and they seem to like to do it.

This (and other similar anec­dotes in the book) doesn’t look to me like it’s just sim­ple re­in­force­ment learn­ing: rather, it looks to me more like the horse has a men­tal model of the trainer want­ing some­thing, and is then sys­tem­at­i­cally ex­plor­ing what that some­thing might be, un­til it hits on the right al­ter­na­tive. And when it does, there’s a rapid re-in­ter­pre­ta­tion of the mem­ory just a mo­ment ago: from “in this situ­a­tion, my trainer wants me to do some­thing that I don’t know what”, to “in this situ­a­tion, my trainer wants me to prick my ears”.

My mom once hit a dog with her car, and then brought it to a vet. She tried to find the original owner but couldn’t, and eventually adopted him formally. He was very small, had been living in the woods for weeks at least, and had lots of injuries.

For sev­eral months af­ter bring­ing the dog home, it would sit and stare blankly into the cor­ner of the wall.

Eventually, my sister started spending hours at a time leaving food next to herself while lying motionless. Eventually, he started eating the food. Eventually, he started letting her touch him (but not other humans). Nowadays, he appears to be generally psychologically healthy.

This seems a lot more like clas­sic PTSD, and some­thing like ac­tual ther­apy. It still doesn’t seem like it re­quires re­pro­cess­ing of mem­o­ries, al­though it might. I also don’t ex­pect this sort of situ­a­tion hap­pens that of­ten in the wild.

The origi­nal ques­tion which mo­ti­vated this sec­tion was: why are we some­times in­ca­pable of adopt­ing a new habit or aban­don­ing an old one, de­spite know­ing that to be a good idea? And the an­swer is: be­cause we don’t know that such a change would be a good idea. Rather, some sub­sys­tems think that it would be a good idea, but other sub­sys­tems re­main un­con­vinced. Thus the sys­tem’s over­all judg­ment is that the old be­hav­ior should be main­tained.

To me this is the key insight for working with subagent models. Just to add something about the phenomenology of it, I think many people struggle with this because the conflicts can feel like failures to update on evidence, which feels like a failure as a result of identifying with a particular subagent (see a recent article I posted on akrasia that makes this same claim and tries to convince the reader of it in terms of dual-process theory). Thus this is a case of easily said, difficult to do, but I think just having this frame is extremely helpful for making progress, because at least you have a way of thinking of yourself as not fighting against yourself but manipulating complex machinery that decides what you do.

As a bonus to my developmental psychology friends out there, I think this points to the key insight for making the Kegan 3 to 4 transition (and for my Buddhist friends out there, the insight that, once grokked, will produce stream entry), although your mileage may vary.

No, I meant 3 to 4. What I think of as the 4 to 5 key in­sight builds on this one to say that not only can you think of your­self as a ma­nipu­la­ble com­plex sys­tem/​ma­chin­ery and work with that, it takes a step back and says what you choose to make the sys­tem do is also able to be ma­nipu­lated. That’s of course a nat­u­ral con­se­quence of the first in­sight, but re­ally be­liev­ing it and know­ing how to work with it takes time and con­sti­tutes the tran­si­tion to an­other level be­cause get­ting that in­sight re­quires the abil­ity to in­tu­itively work with an ad­di­tional level of ab­strac­tion in your think­ing.

Fol­low­ing on 5 to 6 is about step­ping back from what you choose to make the sys­tem do and find­ing you can treat as ob­ject/​ma­nipu­late how you choose (prefer­ences; the sys­tem that does the choos­ing). Then 6 to 7 is about get­ting back one more level and see­ing you can ma­nipu­late not just prefer­ences but per­cep­tions since they con­trol the in­puts that pro­duce prefer­ences.

So, just to check, we are still talking about the Kegan stage 4 that, according to Kegan, 35% of the adult population has attained? Are you saying that getting to stage 4 actually is the same as attaining stream entry, or just that the work to get to stream entry involves similar insights?

So I do think stream entry is way more common than most people would think, because the thing that is stream entry is amazing and useful but also incredibly normal, and I think lots of folks are walking around having no idea they attained it (this relies, though, on a very parsimonious approach to what counts as stream entry). Whether my identifying it with Kegan 4 means the same thing as what that study from which that number comes does (which was itself, as I recall, not that great a study, and was led by Lahey) is questionable, since it depends on where you choose to draw the borders for each stage (the Subject-Object Interview manual provides one way of doing this, and is the method by which the number you mention was obtained).

My sus­pi­cion is that the num­ber is much lower as I would count it, which I calcu­late as closer to 7% do­ing a Fermi es­ti­mate based on my own ob­ser­va­tions and other ev­i­dence I know of, even though a lot of folks (this is where I would say the 35% num­ber makes sense) are some­where in what I would con­sider the 3.5 to 4 range where they might be able to pass as 4 but have not yet had the im­por­tant in­sight that would put them fully into the 4 stage.

So all those caveats aside, yes, I con­sider stream en­try to be point­ing at the same thing as Ke­gan 4.

Having studied and achieved stream entry (in the Vipassana tradition), I very much doubt many people have stumbled into it. Although for clarity, what % of the population are we talking about? Quick Fermi: from what I’ve seen in the spiritual community, about 1/1000 have achieved stream entry spontaneously / easily. Out of my bubble, I’d say 1/100 is spiritually inclined. Then I’d add another factor of at least 1/100 to control for my bubble being the Bay Area.

The reason why I doubt it is because most people will tell you (and have written) that it has taken them (and people they know) many years and intense practice to get it.

To be clear, I am appropriating stream entry here the same way Ingram has, which has much more inclusive (because they are much smaller and more specific) criteria than what is traditional. I agree with your point about A&P, and maybe I am typical-minding here, because I made it through to what matches with once-returner without formal practice (although I did engage in a lot of informal practices that dragged me along the same way).

Are In­gram’s crite­ria par­tic­u­larly in­clu­sive? He has talked a bunch about most peo­ple who think them­selves be­ing stream en­ter­ers not ac­tu­ally be­ing that, e.g.:

The A&P is so com­monly mis­taken for things like Equa­nim­ity, higher jhanas (third and fourth, as well as form­less realms), and Stream En­try, or even some higher path, even on its first oc­cur­rence, that I now have to ac­tively check my­self when re­spond­ing to emails and fo­rum posts so that I don’t au­to­mat­i­cally as­sume that this is what has gone on, as it is prob­a­bly 50:1 that some­one claiming stream en­try has ac­tu­ally just crossed the A&P. [...]

Over­call­ing at­tain­ments has be­come some­thing of an en­demic dis­ease in those ex­posed to the maps. It an­noys the heck out of dharma teach­ers who feel some re­spon­si­bil­ity to keep prac­ti­tion­ers on the rails and in the realms of re­al­ity.

Right, if you’ve not had the later ex­pe­riences (equa­nim­ity, fruition lead­ing to at­tain­ment) you’re likely to mis­take oth­ers for them, es­pe­cially if you have a very squishy model of en­light­en­ment and es­pe­cially es­pe­cially if you are try­ing hard to at­tain the path. My com­ment was more a refer­ence to the fact that In­gram seems to view stream en­try as a very pre­cise thing rel­a­tive to how it is talked about in ther­avada, which is why it seems pos­si­ble that some of the above dis­agree­ment on num­bers might be due to a differ­ent sense of what qual­ifies as stream en­try.

I have my own fairly precise way of describing it, which is that you develop the capacity to always reason at Commons’ MHC level 13 (this is placed about halfway along the 4 to 5 transition in the normal Kegan model by Wilber, but I consider that to be an inflation of what’s really core 4), i.e. you S1 reason that way; deliberative S2 reasoning at that level is going to happen first but doesn’t count. At least as of right now I think that, but I could probably be convinced to wiggle the location a little bit, because I’m trying to project my internal model of it back out to other existing models that I can reference.

Jump up one more level to Kegan 5 (<1% of the population) and it jibes much more closely with survey estimates of 0.5% of the population having some sort of permanent attainment (the survey does not use the Theravadan map).

Have you—or any­one, re­ally—put much thought into the im­pli­ca­tions of these ideas to AI al­ign­ment?

If it’s true that mod­el­ing hu­mans at the level of con­sti­tu­tive sub­agents ren­ders a more ac­cu­rate de­scrip­tion of hu­man be­hav­ior, then any true solu­tion to the al­ign­ment prob­lem will need to re­spect this in­ter­nal in­co­her­ence in hu­mans.

This is po­ten­tially a very pos­i­tive de­vel­op­ment, I think, be­cause it sug­gests that a hu­man can be mod­eled as a col­lec­tion of rel­a­tively sim­ple sub­agent util­ity func­tions, which in­ter­act and com­pete in com­plex but pre­dictable ways. This sounds closer to a gears-level por­trayal of what is hap­pen­ing in­side a hu­man, in con­trast to de­scrip­tions of hu­mans as hav­ing a sin­gle con­voluted and im­pos­si­ble-to-pin-down util­ity func­tion.

I don’t know if you’re at all fa­mil­iar with Mark Lipp­man’s Fold­ing ma­te­rial and his on­tol­ogy for men­tal phe­nomenol­ogy. My at­tempt to sum­ma­rize his frame­work of men­tal phe­nom­ena is as fol­lows: there are be­lief-like ob­jects (ex­pec­ta­tions, tacit or ex­plicit, com­plex or sim­ple), goal-like ob­jects (de­sir­able states or set­tings or con­texts), af­for­dances (con­text-ac­ti­vated rep­re­sen­ta­tions of the cur­rent po­ten­tial ac­tion space) and in­ten­tion-like ob­jects (plans co­or­di­nat­ing im­me­di­ate felt in­ten­tions, via af­for­dances, to­ward goal-states). All cog­ni­tion is “gen­er­ated” by the ac­tions and in­ter­ac­tions of these fun­da­men­tal units, which I in­fer must be some­thing like neu­rolog­i­cally fun­da­men­tal. Fish and maybe even worms prob­a­bly have some­thing like be­liefs, goals, af­for­dances and in­ten­tions. Ours are just big­ger, more lay­ered, more nested and more in­ter­con­nected.

The rea­son I bring this up is that Fold­ing was a bit of a kick in the head to my view on sub­agents. In­stead of see­ing sub­agents as be­ing fun­da­men­tal, I now see sub­agents as ex­pres­sions of la­tent goal-like and be­lief-like ob­jects, and the brain is im­ple­ment­ing some kind of pas­sive pro­gram that pur­sues goals and avoids ex­pec­ta­tions of suffer­ing, even if you’re not aware you have these goals or these ex­pec­ta­tions. In other words, the sense of there be­ing a sub­agent is your brain run­ning a back­ground pro­gram that ac­ti­vates and acts upon the im­pli­ca­tions of these more fun­da­men­tal yet hid­den goals/​be­liefs.

None of this is at all in con­tra­dic­tion to any­thing in your Se­quence. It’s more like a slightly differ­ent fram­ing, where a “Pro­tec­tor Subagent” is re­duced to an ex­pres­sion of a be­lief-like ob­ject via a self-pro­tec­tive back­ground pro­cess. It all adds up to the same thing, pretty much, but it might be more gears-level. Or maybe not.

I definitely have some thoughts on the AI al­ign­ment im­pli­ca­tions, yes. Still work­ing out ex­actly what they are. :-) A few frag­mented thoughts, here’s what I wrote in the ini­tial post of the se­quence:

In a recent post, Wei Dai mentioned that “the only apparent utility function we have seems to be defined over an ontology very different from the fundamental ontology of the universe”. I agree, and I think it’s worth emphasizing that the difference is not just “we tend to think in terms of classical physics but actually the universe runs on particle physics”. Unless they’ve been specifically trained to do so, people don’t usually think of their values in terms of classical physics, either. That’s something that’s learned on top of the default ontology.

The ontology that our values are defined over, I think, shatters into a thousand shards of disparate models held by different subagents with different priorities. It is mostly something like “predictions of receiving sensory data that has been previously classified as good or bad, the predictions formed on the basis of doing pattern matching to past streams of sensory data”. Things like e.g. intuitive physics simulators feed into these predictions, but I suspect that even intuitive physics is not the ontology over which our values are defined; clusters of sensory experiences are that ontology, with intuitive physics being a tool for predicting how to get those experiences. This is the same sense in which you might e.g. use your knowledge of social dynamics to figure out how to get into situations which have made you feel loved in the past, but your knowledge of social dynamics is not the same thing as the experience of being loved.

1) My brain is com­posed of var­i­ous sub­agents, each of which has differ­ent pri­ori­ties or in­ter­ests. One way of de­scribing them would be to say that there are con­se­quen­tial­ist, de­on­tol­o­gist, virtue eth­i­cal, and ego­ist sub­agents, though that too seems po­ten­tially mis­lead­ing. Subagents prob­a­bly don’t re­ally care about eth­i­cal the­o­ries di­rectly, rather they care about sen­sory in­puts and ex­pe­riences of emo­tional tone. In any case, they have differ­ing in­ter­ests and will of­ten dis­agree about what to do. The _per­sonal_ pur­pose of ethics is to come up with the kinds of prin­ci­ples that all sub­agents can broadly agree upon as serv­ing all of their in­ter­ests, to act as a guide for per­sonal de­ci­sion-mak­ing.

(There’s an obvious connection from here to moral parliament views of ethics, but in those views the members of the parliament are often considered to be various ethical theories—and like I mentioned, I do not think that subagents really care about ethical theories directly. Also, the decision-making procedures within a human brain differ substantially from those of a parliament. E.g. some subagents will get more voting power at times when the person is afraid or sexually aroused, and there need to be commonly-agreed-upon principles which prevent temporarily-powerful agents from using their power to take actions which would then be immediately reversed when the balance of power shifted back.)

2) Be­sides dis­agree­ments be­tween sub­agents within the same mind, there are also dis­agree­ments among peo­ple in a so­ciety. Here the pur­pose of ethics is again to act as pro­vid­ing com­mon prin­ci­ples which peo­ple can agree to abide by; mur­der is wrong be­cause the over­whelming ma­jor­ity of peo­ple agree that they would pre­fer to live in a so­ciety where no­body gets mur­dered.

You men­tion that per­son-af­fect­ing views are in­tractable as a solu­tion to gen­er­at­ing bet­ter­ness-rank­ings be­tween wor­lds. But part of what I was try­ing to ges­ture at when I said that the whole ap­proach may be flawed, is that gen­er­at­ing bet­ter­ness-rank­ings be­tween wor­lds does not seem like a par­tic­u­larly use­ful goal to have.

On my view, ethics is some­thing like an on­go­ing pro­cess of ne­go­ti­a­tion about what to do, as ap­plied to par­tic­u­lar prob­lems: try­ing to de­cide which kind of world is bet­ter in gen­eral and in the ab­stract, seems to me like try­ing to de­cide whether a ham­mer or a saw is bet­ter in gen­eral. Nei­ther is: it de­pends on what ex­actly is the prob­lem that you are try­ing to de­cide on and its con­text. Differ­ent con­texts and situ­a­tions will elicit differ­ent views from differ­ent peo­ple/​sub­agents, so the im­plicit judg­ment of what kind of a world is bet­ter than an­other may differ based on which con­tex­tual fea­tures of any given de­ci­sion hap­pen to ac­ti­vate which par­tic­u­lar sub­agents/​peo­ple.

> I’ve increasingly come to think that living one’s life according to the judgments of any formal ethical system gets it backwards—any such system is just a crude attempt at formalizing our various intuitions and desires, and they’re mostly useless in determining what we should actually do. To the extent that the things that I do resemble the recommendations of utilitarianism (say), it’s because my natural desires happen to align with utilitarianism’s recommended courses of action, and if I say that I lean towards utilitarianism, it just means that utilitarianism produces the fewest recommendations that would conflict with what I would want to do anyway.

Similarly, I can en­dorse the claim that “we should some­times act as if the per­son-af­fect­ing view was true”, and I can men­tion in con­ver­sa­tion that I sup­port a per­son-af­fect­ing view. When I do so, I’m treat­ing it as a short­hand for some­thing like “the judg­ments gen­er­ated by my in­ter­nal sub­agents some­times pro­duce similar judg­ments as the prin­ci­ple called ‘per­son-af­fect­ing view’ does, and I think that adopt­ing it as a so­cietal prin­ci­ple in some situ­a­tions would cause good re­sults (in terms of be­ing some­thing that would pro­duce the kinds of be­hav­ioral crite­ria that both my and most peo­ple’s sub­agents could con­sider to pro­duce good out­comes)”.

Also a bunch of other thoughts which par­tially con­tra­dict the above com­ments, and are too time-con­sum­ing to write in this mar­gin. :)

Re: Folding, I started reading the document and found the beginning valuable, but didn’t get around to reading it to the end. I’ll need to read the rest, thanks for the recommendation. I definitely agree that this

In­stead of see­ing sub­agents as be­ing fun­da­men­tal, I now see sub­agents as ex­pres­sions of la­tent goal-like and be­lief-like ob­jects, and the brain is im­ple­ment­ing some kind of pas­sive pro­gram that pur­sues goals and avoids ex­pec­ta­tions of suffer­ing, even if you’re not aware you have these goals or these ex­pec­ta­tions. In other words, the sense of there be­ing a sub­agent is your brain run­ning a back­ground pro­gram that ac­ti­vates and acts upon the im­pli­ca­tions of these more fun­da­men­tal yet hid­den goals/​be­liefs.

sounds very plau­si­ble. I think I was already hint­ing at some­thing like that in this post, when I sug­gested that es­sen­tially the same sub­sys­tem (habit-based learn­ing) could con­tain com­pet­ing neu­ral pat­terns cor­re­spond­ing to differ­ent habits, and treated those as sub­agents. Similarly, a lot of “sub­agents” could emerge from es­sen­tially the same kind of pro­gram act­ing on con­tra­dic­tory be­liefs or goals… but I don’t know how I would em­piri­cally test one pos­si­bil­ity over the other (un­less read­ing the Fold­ing doc­u­ment gives me ideas), so I’ll just leave that part of the model un­defined.

