The Prediction Problem: A Variant on Newcomb’s

When trying to understand a problem, it is often helpful to reduce it to something simpler. Even if the problem seems as simple as possible, it may still admit further simplification. This post will demystify Newcomb's Problem by reducing it to the Prediction Problem, which works as follows:

Step 1: An AI called Alpha scans you and makes a prediction about what action you are going to take. A thousand people have already played this game and Alpha was correct every time.

Step 2: You pull either the left or the right lever. Neither does anything.

Step 3: Alpha gives you $1 million if it predicted that you'd pull the left lever; otherwise it gives you $1000.
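To make the payoff structure concrete, here is a minimal sketch of the game under the assumption that Alpha is a perfect predictor (the function name and dollar amounts simply mirror the setup above):

```python
# The Prediction Problem with a perfect predictor: Alpha's prediction
# always matches the lever you actually pull, so the payoff depends
# only on what kind of agent you are, not on any causal effect of the pull.

def payoff(action):
    prediction = action  # a perfect Alpha predicts your actual action
    return 1_000_000 if prediction == "left" else 1_000

print(payoff("left"))   # 1000000
print(payoff("right"))  # 1000
```

The levers themselves do nothing; the entire difference in payoff comes from which branch Alpha predicted, which is why the "decision" only matters via the kind of agent making it.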

The empirical answer seems to be that you ought to pull the left lever. On the other hand, someone strictly following Causal Decision Theory ought to be indifferent between the two options. After all, the reasoning goes, Alpha has already made its prediction; nothing you do now can change this.

At this point, someone who thinks they are smarter than they actually are might decide that pulling the left lever may have an upside but doesn't have a downside, so they may as well pull it, and then go about their lives without thinking about this problem any more. This is the way to win if you were actually thrust into such a situation, but a losing strategy if your goal is to actually understand decision theory. I've argued before that practice problems don't need to be realistic; it's also fine if they are trivial. If we can answer exactly why you ought to pull the left lever, then we should also be able to justify one-boxing for Newcomb's Problem and also Timeless Decision Theory.

“Decision Theory” is misleading

The name “decision theory” seems to suggest a focus on making an optimal decision, which then causes the optimal outcome. For the Prediction Problem, the actual decision does absolutely nothing in and of itself, while if I'm correct, the person who pulls the left lever gains an extra $1 million. However, this is purely a result of the kind of agent that they are; all the agent has to do in order to trigger this is exist. The decision doesn't actually have any impact, apart from the fact that it would be impossible to be the kind of agent that always pulls the left lever without actually pulling the left lever.

The question then arises: do you (roughly) wish to be the kind of agent that gets good outcomes or the kind of agent that makes good decisions? I need to clarify this before it can be answered. “Good outcomes” are evaluated by the expected utility that an agent receives, with the counterfactual being that an agent with a different decision-making apparatus encountered this scenario instead. To avoid confusion, we'll refer to these counterfactuals as timeless-counterfactuals, the outcomes as holistic-outcomes, and the optimal such counterfactual as holistically-optimal. I'm using “good decisions” to refer to the causal impact of a decision on the outcome. The counterfactuals here are the agent “magically” making a different decision at that point, with everything that happened before being held static, even the decision-making faculties of the agent itself. To avoid confusion, we'll refer to these counterfactuals as point-counterfactuals, the decisions over them as point-decisions, and the optimal such counterfactual as point-optimal.

I will argue that we should choose good outcomes, as the method by which they are obtained is irrelevant. In fact, I would almost suggest using the term Winning Theory instead of Decision Theory. Eliezer made a similar case very elegantly in Newcomb's Problem and Regret of Rationality, but this post aims to identify the exact flaw in the two-boxing argument. Since one-boxing obtains the holistically-optimal outcome, while two-boxing produces the point-optimal decision, I need to show why the former deserves preference.

At this point, we can make two interesting observations. Firstly, two-boxers gain an extra $1000 as a result of their decision, but miss out on $1 million as a result of who they are. Why are these two figures accounted for differently? Secondly, both of these approaches are self-affirming after the prediction: the point-optimal decision is to choose point-optimality, and the holistically-optimal decision is to choose holistic-optimality. This might appear to be a stalemate, but we can resolve the conflict by investigating why point-optimality is usually considered important.

Why do we care about point-optimality anyway?

Both the Prediction Problem and Newcomb's Problem assume that agents don't have libertarian free will; that is, the ability to make decisions unconstrained by the past. For if they did, Alpha wouldn't be able to perfectly or near-perfectly predict the agent's future actions from their past state without some kind of backwards causation, which would then make one-boxing the obvious choice. So we can assume a deterministic or probabilistically deterministic universe. For simplicity, we'll just work with the former and assume that agents are deterministic.

The absence of free will is important because it affects what exactly we mean by making a decision. Here's what a decision is not: choosing from a variety of options, all of which were (in the strictest sense) possible at the time given the past. Technically, only one choice was possible and that was the choice taken. The other choices only become strictly possible when we imagine the agent counterfactually having a different brain state.

The following example may help: Suppose a student has a test on Friday. Reasoning that determinism means that the outcome is already fixed, the student figures that they may as well not bother to study. What's wrong with this reasoning?

The answer is that the outcome is only known to be fixed because whether or not the student studies is fixed. When making a decision, you don't loop over all of the strictly possible options, because there is only one of them: whatever you actually choose. Instead, you loop over a set of counterfactuals (and the one actual outcome, though you don't know which it is at the time). While the outcome of the test is fixed in reality, the counterfactuals can have a different outcome, as they aren't reality.

So why do we actually care about the point-optimal decision if it can't strictly change what you choose, this having been fixed from the beginning of time? Well, even if you can't strictly change your choice, you can still be fortunate enough to be an agent that always was going to try to calculate the best point-decision and then carry it out (this is effective for standard decision theory problems). If such an agent can't figure out the best point-decision itself, it would choose to pay a trivial amount (say 1 cent) to an oracle to find this out, assuming that the differences in the payoffs aren't similarly trivial. And over a wide class of problems, so long as this process is conducted properly, the agent ends up in the world with the highest expected utility.
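This process can be sketched as a simple loop over point-counterfactuals: evaluate each available action with everything prior held fixed, and carry out the best one (the names here are my own illustration, not standard notation):

```python
# A point-optimal agent: score each action as a counterfactual with
# the past held fixed, then carry out the highest-scoring action.

def best_point_decision(actions, utility):
    # utility maps an action to its causal payoff; the agent's own
    # past state is treated as unchangeable.
    return max(actions, key=utility)

# A standard decision problem, where the outcome really does depend
# only on the action taken (the student example above):
payoffs = {"study": 90, "slack": 40}
print(best_point_decision(payoffs, payoffs.get))  # study
```

In ordinary problems like this, looping over point-counterfactuals and looping over timeless-counterfactuals pick out the same action, which is why the distinction usually goes unnoticed.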

So what about the Prediction Problem?

The process described for point-optimality assumes that outcomes are purely a result of actions. But for the Prediction Problem, the outcome isn't dependent on actions at all, but instead on the internal algorithm at the time of prediction. Even if our decision doesn't cause the past state that is analysed by Alpha to create its prediction, the two are clearly linked in some manner. But point-optimality assumes outcomes are fixed independently of our decision algorithm. The outcomes are fixed for a given agent, but it is empty to say they are fixed for a given agent whatever its choice, as each agent can only make one choice. So allowing any meaningful variation over choices requires allowing variation over agents, in which case we can no longer assume that the outcomes are fixed. At this point, whatever the specific relationship, we are outside the intended scope of point-optimal decision making.

Taking this even further, asking “What choice ought I make?” is misleading, because given who you are, you can only make a single choice. Indeed, it seems strange that we care about point-optimality, even in regular decision theory problems, given that point-counterfactuals indicate impossible situations. An agent cannot be such that it would choose X, but then magically choose Y instead, with no causal reason. In fact, I'd suggest that the only reason why we care about point-counterfactuals is that they are equivalent to the actually-consistent timeless-counterfactuals in normal decision theory problems. After all, in most decision theory problems, we can alter an agent to carry out a particular action at a particular point of time without affecting any other elements of the problem.

Getting more concrete, for the version of the Prediction Problem where we assume Alpha is perfect, you simply cannot pull the right lever and have Alpha predict the left lever. This counterfactual doesn't correspond to anything real, let alone anything that we care about. Instead, it makes much more sense to consider the timeless-counterfactuals, which are the most logical way of producing consistent counterfactuals from point-counterfactuals. In this example, the timeless-counterfactuals are pulling the left lever and having Alpha predict left, or pulling the right lever and having Alpha predict right.

In the probabilistic version, where Alpha correctly identifies you pulling the right lever 90% of the time and the left lever 100% of the time, we will imagine that a ten-sided die is rolled and Alpha correctly identifies you pulling the right lever as long as the die doesn't show a ten. You simply cannot pull the right lever with the die showing a number that is not ten and have Alpha predict you will pull the left lever. Similarly, you cannot pull the right lever with the die showing a ten and have Alpha predict the correct result. The point-counterfactuals allow this, but these situations are inconsistent. In contrast, the timeless-counterfactuals insist on consistency between the die roll and your decision, so they actually correspond to something meaningful.
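Under these accuracies, the expected payoffs of the two consistent (timeless) counterfactual branches can be checked directly. This is a sketch assuming the payoffs from the original setup: $1 million if Alpha predicted left, $1000 if it predicted right:

```python
# Expected payoffs in the probabilistic version: Alpha predicts a
# left-pull with certainty and a right-pull 90% of the time.

def expected_payoff(action):
    if action == "left":
        return 1.0 * 1_000_000        # always predicted correctly
    # right lever: predicted correctly 90% of the time ($1000),
    # mistaken for a left-pull 10% of the time ($1 million)
    return 0.9 * 1_000 + 0.1 * 1_000_000

print(round(expected_payoff("left")))   # 1000000
print(round(expected_payoff("right")))  # 100900
```

Even with an imperfect Alpha, the left-lever agent does far better in expectation, which is the holistic comparison the post argues for.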

If you are persuaded to reject point-optimality, I would suggest switching to a metric built upon a notion of good outcomes instead, for two reasons. Firstly, point-optimality is ultimately motivated by the fact that it provides good outcomes within a particular scope. Secondly, both the one-boxers and two-boxers see their strategy as producing better outcomes.

In order to make this work, we just need to formulate good outcomes in a way that accounts for agents being predestined to perform strategies, as opposed to agents exercising some kind of libertarian free will. The natural way to do this is to work with timeless-counterfactuals instead of point-counterfactuals.

But doesn't this require backwards causation?

How can a decision affect a prediction at an earlier time? Surely this should be impossible. If a human adopts the timeless approach in the moment, it's because either:

a) They were fooled into it by reasoning that sounded convincing, but was actually flawed

b) They realised that the timeless approach best achieves their intrinsic objectives, even accounting for the past being fixed. For example, if they value whatever currency is offered in the experiment and they ultimately value achieving the best outcome in those terms, then they realise that Timeless Decision Theory delivers this.

Remember that an agent's “choice” of what decision theory to adopt is already predestined, even if the agent only figured this out when faced with the situation. You don't really make a decision in the sense we usually think about it; instead, you are just following an inevitable process. For an individual who ultimately values outcomes as per b), the only question is whether the individual will carry out this process of producing a decision theory that matches their intrinsic objectives correctly or incorrectly. An individual who adopts the timeless approach wins because Alpha knew that they were going to carry out this process correctly, while an individual who adopts point-optimality loses because Alpha knew they were always going to make a mistake in this process.

The two-boxers are right that you can only be assured of gaining the million if you are pre-committed in some manner, although they don't realise that determinism means we are all pre-committed, in a general sense, to whatever action we end up taking. That is, in addition to explicit pre-commitments, we can also talk about implicit pre-commitments. An inevitable flaw in reasoning as per a) is equivalent to pre-commitment, although from the inside it will feel as though you could have avoided it. So are unarticulated intrinsic objectives that are only identified and clarified at the point of the decision as per b); clarifying these objectives doesn't cause you to become pre-committed, it merely reveals what you were pre-committed to. Of course, this only works with super-human predictors. Normal people can't be relied upon to pick up on these deep aspects of personality, and so require more explicit pre-commitment in order to be convinced (I expanded this into a full article here).

What about agents that are almost pre-committed to a particular action? Suppose 9 times out of 10 you follow the timeless approach, but 1 time in 10 you decide to do the opposite. More specifically, we'll assume that when a ten-sided die shows a 10, you experience a mood that convinces you to take the latter course of action. Since we're assuming determinism, Alpha will be aware of this before it makes its prediction. When the die shows a ten, you feel really strongly that you have exercised free will, as you would have acted differently in the counterfactual where your mood was slightly different. However, given that the die did show a ten, your action was inevitable. Again, you've discovered your decision rather than made it. For example, if you decide to be irrational, the predictor knew that you were in that mood at the start, even if you did not.

Or going further, a completely rational agent that wants to end up in the world with the most dollars doesn't make that decision in the Prediction Problem so that anything happens; it makes that decision because it can make no other. If you make another decision, you either have different objectives or you have an error in your reasoning, so you weren't the agent that you thought you were.

When you learn arguments in favour of one side or another, it changes what your choice would have been in the counterfactual where you were forced to make a decision just before you made that realisation, but what happens in reality is fixed. It doesn't change the past either, but it does change your estimation of what Alpha would have predicted. When you lock in your choice, you've finalised your estimate of the past, but this looks a lot like changing the past, especially if you had switched to favouring a different decision at the last minute. Additionally, when you lock in your choice, it isn't as though the future was locked in only at that moment, as it was already fixed. Actually, making a decision can be seen as a process that makes the present line up with past predictions, and again this can easily be mistaken for changing the past.

But further than this, I want to challenge the question: “How does my decision affect a past prediction?” Just like “What choice ought I make?”, if we contemplate a fixed individual, then we must fix the decision as well. If instead we consider a variety of individuals taking a variety of actions, then the question becomes, “How does an individual/decision pair affect a prediction prior to the decision?”, which isn't exactly a paradox.

Thank you for posting this! I'm posting here for the first time, although I've spent a significant amount of time reading the Sequences already (I just finished Seeing with Fresh Eyes). The comments on determinism cleared up a few uncertainties about Newcomb's Problem for me.

When I have explained the problem to others, I have usually used the phrasing where Alpha is significantly better than average at predicting what you will choose, but not perfect. (This helps reduce incredulity on the part of the average listener.) I have also used the assumption that Alpha does this by examining your mental state, rather than by drawing causal arrows backward in time. One of my friends suggested precommitting to a strategy that one-boxes 51% of the time and two-boxes 49% of the time, chosen at the time you receive the boxes by some source that is agreed to be random, such as rolling two d10s. His logic is that Alpha would probably read your mind accurately, and that if he did, he would decide based on your mental state to put the money in the box, since you are more likely to one-box than not.

This seemed like a very good strategy (assuming the logic and the model of the problem are correct, which is far from certain), and I wondered why this strategy wasn't at least being discussed more. It seems that most other people were assuming determinism while I was assuming libertarian free will.

What do all of you think of my friend's strategy?

Is the assumption of determinism a comment on the actual state of the universe, or simply a necessary assumption to make the problem interesting?

Well, that would work for a predictor that predicts your most likely strategy 100% of the time. If the predictor has even a slight chance of predicting the 49% strategy instead of the 51% strategy, you'll lose out, as you're risking a million to gain a thousand. But yes, the discussion in my post assumes that the predictor can predict any sources of randomness that you have access to.
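To see why, here is a rough expected-value sketch. Assume the standard Newcomb payoffs ($1 million in the opaque box if Alpha predicts one-boxing, plus a guaranteed $1000 for taking the second box), let p be your probability of one-boxing, and let q be the chance that Alpha nevertheless predicts the 49% side of the mix and leaves the opaque box empty. The function and parameterisation are my own illustration, not from the comment above:

```python
# Expected value of a mixed Newcomb strategy. p: probability you
# one-box; q: probability Alpha predicts the two-boxing side of the
# mix and leaves the opaque box empty.

def ev(p, q):
    box = (1 - q) * 1_000_000           # expected opaque-box contents
    return p * box + (1 - p) * (box + 1_000)

print(round(ev(1.00, 0.00)))  # pure one-boxer: 1000000
print(round(ev(0.51, 0.00)))  # 51/49 mix against a fooled Alpha: 1000490
print(round(ev(0.51, 0.05)))  # a mere 5% chance of being caught: 950490
```

The mix only beats pure one-boxing if Alpha is guaranteed to round you up to a one-boxer; even a small q wipes out the $490 gain many times over, which is the "risking a million to gain a thousand" point.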

I usually just think about which decision theory we'd want to program into an AI which might get copied, have its source code inspected, etc. That lets you get past the basic stuff, like Newcomb's Problem, and move on to more interesting things. Then you can see which intuitions can be transferred back to problems involving humans.

It turns out that many of the complications (multiple players, amnesia, copying, predictors, counterfactuals) lead to the same idea: that we should model things game-theoretically and play the globally optimal strategy no matter what, instead of trying to find the optimal decision locally at each node. That idea summarizes a large part of UDT. (Wei's original proposals of UDT also included dealing with logical uncertainty, but that turned out to be much harder.) Hence my recent posts on how to model anthropic updates and predictors game-theoretically.

“I usually just think about which decision theory we'd want to program into an AI which might get copied, its source code inspected” — well, everyone agrees that if you can pre-commit to one-boxing, you ought to. The question is what to do if you're in the situation and you haven't pre-committed. My answer is that if you take a choice, then you were implicitly pre-committed.

People who follow UDT don't need to precommit; they have a perfectly local decision procedure: think back, figure out the best strategy, and play a part in it. The question of precommitment only arises if you follow CDT, but why would you follow CDT?

The idea of playing the best strategy can stand on its own; it doesn't need to be justified by precommitment. I'd say the idea of myopically choosing the next move needs justification more.

For example, when you're dealt a weak hand in poker, the temptation to fold is strong. But all good players know you must play aggressively on your weakest hands, because if you fold, you might as well light up a neon sign saying “I have a strong hand” whenever you do play aggressively, allowing your opponent to fold and cut their losses. In this case it's clear that playing the best strategy is right, and myopically choosing the next move is wrong. You don't need precommitment to figure it out. Sure, it's a repeated game where your opponent can learn about you, but Newcomb's Problem has a predictor, which amounts to the same thing.

I often think of “determinism” as too strong a word for what's going on. The past is fixed, and the past influences the present, but that doesn't exactly mean that the present is determined wholly by the past. Rather, the past as we see it from the present can be no other way than what it is, and can have no other effect than what it has. This doesn't mean the present and future are fixed, unless we want to commit to a particular metaphysical claim about the universe; instead it just means that the past is “perfect” or complete, and we move forward from there. We can then reasonably admit all sorts of ways the past need not determine the future, while also acknowledging that the future is causally linked to a fixed past.

To me this is not really a matter of whether or not we have libertarian free will. In fact, I think we don't, and we need not posit it to explain anything. My point is perhaps more that when we talk about “determinism”, it's often mixed up with ideas about a clockwork universe that flows forward in a particular way that we could calculate in advance. But the computation is so complex that the only way to do it is to actually let time advance so that the computation plays out in the real world; thus, although the present and future may be linked to the past, they can't be known as well as we could possibly know them until we get there.