Ben Garfinkel: How sure are we about this AI stuff?

This transcript of an EA Global talk, which CEA has lightly edited for clarity, is crossposted from effectivealtruism.org. You can also watch the talk on YouTube here.

It is increasingly clear that artificial intelligence is poised to have a huge impact on the world, potentially of comparable magnitude to the agricultural or industrial revolutions. But what does that actually mean for us today? Should it influence our behavior? In this talk from EA Global 2018: London, Ben Garfinkel makes the case for measured skepticism.

The Talk

Today, work on risks from artificial intelligence constitutes a noteworthy but still fairly small portion of the EA portfolio.

Only a small portion of donations made by individuals in the community are targeted at risks from AI. Only about 5% of the grants given out by the Open Philanthropy Project, the leading grant-making organization in the space, target risks from AI. And in surveys of community members, most do not list AI as the area that they think should be most prioritized.

At the same time, though, work on AI is prominent in other ways. Leading career advising and community building organizations like 80,000 Hours and CEA often highlight careers in AI governance and safety as especially promising ways to make an impact with your career. Interest in AI is also a clear element of community culture. And lastly, I think there's also a sense of momentum around people's interest in AI. I think especially over the last couple of years, quite a few people have begun to consider career changes into the area, or made quite large changes in their careers. I think this is true more for work around AI than for most other cause areas.

So I think all of this together suggests that now is a pretty good time to take stock. It's a good time to look backwards and ask how the community first came to be interested in risks from AI. It's a good time to look forward and ask how large we expect the community's bet on AI to be: how large a portion of the portfolio we expect AI to be five or ten years down the road. It's a good time to ask: are the reasons that we first got interested in AI still valid? And if they're not, are there perhaps other reasons that are more compelling?

To give a brief roadmap of the talk: first I'm going to run through what I see as an intuitively appealing argument for focusing on AI. Then I'm going to say why this argument is a bit less forceful than you might anticipate. Then I'll discuss a few more concrete arguments for focusing on AI and highlight some missing pieces of those arguments. And then I'll close by giving concrete implications for cause prioritization.

The intuitive argument

So first, here's what I see as an intuitive argument for working on AI: the "AI is a big deal" argument.

There are three concepts underpinning this argument:

The future is what matters most, in the sense that if you could have an impact that carries forward and affects future generations, then this is likely to be more ethically pressing than having an impact that only affects the world today.

Technological progress is likely to make the world very different in the future: just as the world is very different than it was a thousand years ago because of technology, it's likely to be very different again a thousand years from now.

If we’re look­ing at tech­nolo­gies that are likely to make es­pe­cially large changes, then AI stands out as es­pe­cially promis­ing among them.

So given these three premises, we have the conclusion that working on AI is a really good way to have leverage over the future, and that shaping the development of AI positively is an important thing to pursue.

I think that a lot of this argument works. I think there are compelling reasons to try to focus on your impact on the future. I think that it's very likely that the world will be very different in the far future. I also think it's very likely that AI will be one of the most transformative technologies. It seems at least physically possible to have machines that can eventually do all the things that humans can do, and perhaps do all these things much more capably. If this eventually happens, then whatever the world looks like, we can be pretty confident it will look pretty different than it does today.

What I find less compelling, though, is the idea that these premises entail the conclusion that we ought to work on AI. Just because a technology will produce very large changes, that doesn't necessarily mean that working on that technology is a good way to actually have leverage over the future. Look back at the past and consider the most transformative technologies that have ever been developed: things like electricity, the steam engine, the wheel, or steel. It's very difficult to imagine what individuals early in the development of these technologies could have done to have a lasting and foreseeably positive impact. An analogy is sometimes made to the industrial revolution and the agricultural revolution. The idea is that in the future, the impacts of AI may be substantial enough to produce changes comparable to these two revolutionary periods in history.

The issue here, though, is that it's not really clear that either of these periods actually was a period of especially high leverage. If you were, say, an Englishman in 1780, trying to figure out how to make this industry thing go well in a way that would have a lasting and foreseeable impact on the world today, it's really not clear you could have done all that much. The basic point here is that from a long-termist perspective, what matters is leverage. This means finding something that could go one way or the other, and that's likely to stick in a foreseeably good or bad way far into the future. Long-term importance is perhaps a necessary condition for leverage, but certainly not a sufficient one, and it's a flawed indicator in its own right.

Three concrete cases

So now I’m go­ing to move to three some­what more con­crete cases for po­ten­tially fo­cus­ing on AI. You might have a few con­cerns that lead you to work in this area:

Instability. You might think that there are certain dynamics around the development or use of AI systems that will increase the risk of permanently damaging conflict or collapse, for instance war between great powers.

Lock-in. Certain decisions regarding the governance or design of AI systems may permanently lock in, in a way that propagates forward into the future in a lastingly positive or negative way.

Accidents. It might be quite difficult to use future systems safely. There may be accidents with more advanced systems that cause lasting harm that, again, carries forward into the future.

Instability

First, the case from instability. A lot of the thought here is that it's very likely that countries will compete to reap the economic and military benefits of applications of AI. This is already happening to some extent. And you might think that as the applications become more significant, the competition will become greater. In this context, you might think that this all increases the risk of war between great powers. So one concern here is that there may be the potential for power transitions: shifts in which countries are powerful compared to which others.

A lot of people in the field of international security think that these are conditions under which conflict becomes especially likely. You might also be concerned about changes in military technology that, for example, increase the odds of accidental escalation, or make offense more favorable compared to defense. You may also be concerned that in periods of rapid technological change, there are greater odds of misperception or miscalculation as countries struggle to figure out how to use the technology appropriately or interpret the actions of their adversaries. Or you could be concerned that certain applications of AI will in some sense damage domestic institutions in a way that also increases instability: rising unemployment or inequality might be quite damaging, for example. And lastly, you might be concerned about risks from terrorism, that certain applications might make it quite easy for small actors to cause large amounts of harm.

In general, I think that many of these concerns are plausible and very clearly important. Most of them have not received much research attention at all, and I believe they warrant much, much more. At the same time, though, if you're looking at things from a long-termist perspective, there are at least two reservations you could continue to have. The first is that we just don't really know how worried to be. These risks really haven't been researched much, and we shouldn't take it for granted that AI will be destabilizing. It might be, or it might not be. We basically have not done enough research to feel very confident one way or the other.

You may also be concerned, if you're really focused on the long term, that lots of instability may not be sufficient to actually have a lasting impact that carries forward through generations. This is a somewhat callous perspective, but if you really are focused on the long term, it's not clear, for example, that a mid-sized war by historical standards would be sufficient to have a big long-term impact. So it may actually be a quite high bar to achieve a level of instability that a long-termist would really be focused on.

Lock-in

I'll talk about the case from lock-in a bit more briefly. Some of the intuition here is that certain decisions made in the past, for instance about the design of political institutions or software standards, or certain outcomes of military or economic competitions, seem to produce outcomes that carry forward into the future for centuries. Some examples would be the design of the US Constitution, or the outcome of the Second World War. You might have the intuition that certain decisions about the governance or design of AI systems, or certain outcomes of strategic competitions, might carry forward into the future, perhaps for even longer periods of time. For this reason, you might try to focus on making sure that whatever locks in is something that we actually want.

I think this is a somewhat difficult argument to make, or at least a fairly non-obvious one. The standard skeptical reply is that, with very few exceptions, we don't really see many instances of long-term lock-in, especially lock-in where people really could have predicted what would be good and what would be bad. Probably the most prominent examples of lock-in are choices around major religions that have carried forward for thousands of years. But beyond those, it's quite hard to find examples that last even for hundreds of years; those seem quite few. It's also generally hard to judge what you would want to lock in. If you imagine fixing some aspect of the world while the rest of the world changes dramatically, it's really hard to guess what would actually be good under quite different circumstances in the future. My general feeling on this line of argument is that it's probably not that likely that any truly irreversible decisions around AI will be made anytime soon, even if progress is quite rapid, although other people certainly might disagree.

Accidents

Last, we have the case from accidents. The idea here is that we know there are certain safety engineering challenges around AI systems: it's actually quite difficult to design systems that you can feel confident will behave the way you want them to in all circumstances. This has been laid out most clearly in the paper 'Concrete Problems in AI Safety,' from a couple of years ago, by Dario Amodei and others. I'd recommend that anyone interested in safety issues take a look at that paper. Then, given the existence of these safety challenges, and given the belief or expectation that AI systems will become much more powerful in the future or be given much more responsibility, we might expect these safety concerns to become more serious as time goes on.

At the limit, you might worry that these safety failures could become so extreme that they could perhaps derail civilization as a whole. In fact, there is a bit of writing arguing that we should be worried about these sorts of existential safety failures. The main work arguing for this is still the book 'Superintelligence' by Nick Bostrom, published in 2014. Before this, essays by Eliezer Yudkowsky were the main source of arguments along these lines. And a number of other writers, such as Stuart Russell or, long before, I. J. Good and David Chalmers, have also expressed similar concerns, albeit more briefly. The writing on existential safety accidents definitely isn't homogeneous, but there's often a similar narrative that appears in these essays: a basic, standard disaster scenario with a few common elements.

First, the author imagines that a single AI system experiences a massive jump in capabilities. Over some short period of time, a single system becomes much more general or much more capable than any other system in existence, and in fact than any human in existence. Then, given the system, researchers specify a goal for it: they give it some input which is meant to communicate what behavior it should engage in. The goal ends up being something quite simple, and the system goes off and single-handedly pursues this very simple goal in a way that violates the full nuances of what its designers intended.

There’s a clas­sic sort of toy ex­am­ple, which is of­ten used to illus­trate this con­cern. We imag­ine that some poor pa­per­clip fac­tory owner re­ceives a gen­eral su­per-in­tel­li­gent AI on his doorstep. There’s a slot that’s to stick in a goal. He writes down the goal “max­i­mize pa­per­clip pro­duc­tion,” puts it in the AI sys­tem, and then lets it go off and do that. The sys­tem figures out the best way to max­i­mize pa­per­clip pro­duc­tion is to take over all the world’s re­sources, just to plow them all into pa­per­clips. And the sys­tem is so ca­pa­ble that de­sign­ers can do noth­ing to stop it, even though it’s do­ing some­thing that they ac­tu­ally re­ally do not in­tend.

I have some general concerns about the existing writing on existential accidents. First, there's still very little of it. It really is mostly Superintelligence and essays by Eliezer Yudkowsky, plus a handful of shorter essays and talks that express very similar concerns. There's also been very little substantive written criticism of it. Many people have expressed doubts or been dismissive, but there's very little in the way of skeptical experts sitting down and fully engaging with it, writing down point by point where they disagree or where they think the mistakes are. Most of the work on existential accidents was also written before large changes in the field of AI, especially before the recent rise of deep learning, and also before work like 'Concrete Problems in AI Safety,' which laid out safety concerns in a way that is more recognizable to AI researchers today.

Most of the arguments for existential accidents rely on fuzzy, abstract concepts like optimization power, general intelligence, or goals, and on toy thought experiments like the paperclipper example. Certainly thought experiments and abstract concepts have some force, but it's not clear exactly how strong a source of evidence we should take them to be. Lastly, although many AI researchers have expressed concern about existential accidents, for example Stuart Russell, it does seem that many, and perhaps most, AI researchers who encounter at least abridged or summarized versions of these concerns tend to bounce off them, or just find them not very plausible. I think we should take that seriously.

I also have some more concrete concerns about the writing on existential accidents. You should certainly take these with a grain of salt, because I am not a technical researcher, although I have talked to technical researchers who have essentially similar or even the same concerns. The general concern I have is that these toy scenarios are quite difficult to map onto something that looks more recognizably plausible. The scenarios often involve, again, massive jumps in the capabilities of a single system, but it's really not clear that we should expect such jumps or find them plausible. This is a woolly issue; I would recommend checking out writing by Katja Grace or Paul Christiano online, which lays out some concerns about the plausibility of massive jumps.

Another element of these narratives is that they often imagine a system which becomes quite generally capable and is then given a goal. In some sense, this is the reverse of the way machine learning research tends to work today. At least very loosely speaking, you tend to specify a goal or some means of providing feedback, directing the behavior of a system, and then allow it to become more capable over time, as opposed to the reverse. It's also the case that these toy examples stress the nuances of human preferences, with the idea being that because human preferences are so nuanced and so hard to state precisely, it should be quite difficult to get a machine to understand how to obey them. But it's also the case in machine learning that we can train lots of systems to engage in behaviors that are actually quite nuanced and that we can't specify precisely. Recognizing faces from images is an example of this. So is flying a helicopter.

It’s re­ally not clear ex­actly why hu­man prefer­ences would be so fatal to un­der­stand. So it’s quite difficult to figure out how to map the toy ex­am­ples onto some­thing which looks more re­al­is­tic.

Caveats

Some general caveats on the concerns I've expressed: none of them are meant to be decisive. I've found, for example, that many people working in the field of AI safety in fact list somewhat different concerns as explanations for why they believe the area is very important. There are many more arguments that are shared only informally, or exist inside people's heads, and are currently unpublished. I really can't speak to exactly how compelling these are. The main point I want to stress is that when it comes to the writing which has actually been published, and which is out there for analysis, I don't think it's necessarily that forceful, and at the very least it's not decisive.

So now I have some brief practical implications, or thoughts on prioritization. You may think, from all the stuff I've just said, that I'm quite skeptical about AI safety or governance as areas to work in. In fact, I'm actually fairly optimistic. My reasoning here is that I really don't think there are any slam-dunks for improving the future. I'm not aware of any single cause area that seems very, very promising from the perspective of offering high assurance of long-term impact. I think the fact that there are at least plausible pathways for impact by working on AI safety and AI governance puts them head and shoulders above most areas you might choose to work in. And AI safety and AI governance also stand out for being pretty extraordinarily neglected.

Depending on how you count, there are probably fewer than a hundred people in the world working on technical safety issues or governance challenges with an eye towards very long-term impacts. And that's just truly, very surprisingly small. The overall point, though, is that the exact size of the bet that EA should make on artificial intelligence, the size of the portfolio that AI should take up, will depend on the strength of the arguments for focusing on AI. And most of those arguments still just aren't very fleshed out yet.

I also have some broader epistemological concerns which connect to the concerns I've expressed. I think it's also possible that there are social factors relating to EA communities that might bias us to take an especially large interest in AI.

One thing is just that AI is especially interesting or fun to talk about, especially compared to other cause areas. It's an interesting, kind of contrarian answer to the question of what is most important to work on. It's surprising in certain ways. And it's also now the case that interest in AI is to some extent an element of community culture. People have an interest in it that goes beyond just the belief that it's an important area to work in. It definitely has a certain role in the conversations that people have casually, and in what people like to talk about. These things wouldn't necessarily be that concerning, except that we also can't really count on external feedback to push us back if we drift a bit.

So first, it just seems to be empirically the case that skeptical AI researchers generally will not take the time to sit down, engage with all of the writing, and then explain carefully why they disagree with our concerns. So we can't really expect much external feedback of that form. People who are skeptical or confused, but who are not AI researchers or experts, may be concerned about sounding ignorant or dumb if they push back, and they also won't be inclined to become experts. We should also expect generally very weak feedback loops. If you're trying to influence the very long-run future, it's hard to tell how well you're doing, just because the long-run future hasn't happened yet and won't happen for a while.

Generally, I think one thing to watch out for is justification drift. If we notice that the community's interest in AI stays constant, but the reasons given for focusing on it change over time, then this would be a sort of check-engine light, or at least a trigger to be especially self-conscious or self-critical, because it may be an indication of motivated reasoning.

Conclusion

I have just a handful of short takeaways. First, I think that not enough work has gone into analyzing the case for prioritizing AI. Existing published arguments are not decisive. There may be many other possible arguments which could be much more convincing or much more decisive, but those just aren't out there yet, and there hasn't been much written criticizing the stuff that is.

For this reason, thinking about the case for prioritizing AI may be an especially high-impact thing to do, because it may shape the EA portfolio for years into the future. And we need to be quite conscious of possible community biases. It's possible that certain social factors, which we really should not be allowing to influence us, will lead us to drift in what we prioritize. In general, if we're going to be putting substantial resources into anything as a community, we need to be especially certain that we understand why we're doing it, and we need to stay conscious that our reasons for getting interested in the first place continue to be good reasons. Thank you.

Questions

Question: What advice would you give to someone who wants to do the kind of research that you're doing here, on the case for AI, as opposed to working on AI itself?

Ben: Something that I believe would be extremely valuable is basically talking to lots of people who are concerned about AI and asking them precisely what reasons they find compelling. I've started to do this a little bit recently, and it's actually been quite interesting: people seem to have pretty diverse reasons, and many of them are things that people want to write blog posts on but just haven't. So I think this is low-hanging fruit that would be quite valuable: talking to people who are concerned about AI, trying to understand exactly why they're concerned, and either writing up their ideas or helping them to do that. I think that would be very valuable, and probably not that time-intensive either.

Question: Have you seen any of the justification drift that you alluded to? Can you pinpoint that happening in the community?

Ben: Yeah, I think that's certainly happening to some extent. Even for myself, I believe that's happened to some extent. When I initially became interested in AI, I was especially concerned about these existential accidents. I think I now place relatively greater prominence on the case from instability, as I described it. And that's certainly one possible example of justification drift. It may be that this was actually a sensible way to shift emphasis, but it would be something of a warning sign. I've also spoken to technical researchers who used to be especially concerned about this idea of an intelligence explosion or recursive self-improvement: these very large jumps. I've now spoken to a number of people who are still quite concerned about existential accidents, but make arguments that don't hinge on there being one single massive jump in a single system.

Question: You made the analogy to the industrial revolution, and the 1780 Englishman who doesn't really have much ability to shape how the steam engine is going to be used. It seems intuitively quite right. The obvious counterpoint would be: well, AI is a problem-solving machine; there's something kind of different about it. Does that not feel compelling to you, the sort of inherent differentness of AI?

Ben: So I think probably the strongest intuition is that there will eventually be a point where we start turning more and more responsibility over to automated systems or machines, and that there might eventually come a point where humans have almost no control over what's happening whatsoever: we keep turning over more and more responsibility, and there's a point where machines are in some sense in control and you can't back out. And you might have some sort of irreversible juncture there. I definitely, to some extent, share the intuition that if you're looking over a very long time span, that is fairly plausible. The intuition I don't necessarily have is that, unless things go quite wrong or happen in somewhat surprising ways, there will be this really irreversible juncture coming anytime soon. If, let's say, it takes a thousand years for control to be handed off, then I am not that optimistic about people having much control over what that handoff looks like by working on things today. But I certainly am not very confident.

Question: Are there any policies that you think a government should implement at this stage of the game, in light of the concerns around AI safety? And how would you allocate resources between existing issues and possible future risks?

Ben: Yeah, I am still quite hesitant to recommend very substantive policies that I think governments should be implementing today. I currently have a lot of agnosticism about what would be useful, and I think that most current existing issues that governments are making decisions on aren't necessarily that critical. I think there's lots of stuff that can be done that would be very valuable, like building stronger expertise or stronger lines of dialogue between the public and private sectors, and things like that. But I would be hesitant at this point to recommend a very concrete policy that I'm confident would be good to implement right now.

Question: You mentioned the concept of a concrete, decisive argument. Do you see concrete, decisive arguments for other cause areas that are somehow more concrete and decisive than for AI, and what is the difference?

Ben: Yeah. So I tried to allude to this a little bit, but I don't think that any cause area really has an especially decisive argument for being a great way to influence the future. There are some where you can put a somewhat clear lower bound on how likely they are to be useful. For example, risk from nuclear war: it's fairly clear that it's at least plausible this could happen over the next century. Nuclear war has almost happened in the past; the climate effects are speculative, but at least somewhat well understood. And then there's the question: if there were a nuclear war, how damaging would it be? Would people eventually come back from it? That's quite uncertain, but I think it would be difficult to put above a 99% chance that people would come back from a nuclear war.

So, in that case you might have some sort of clean lower bound on, let's say, working on nuclear risk. Or, quite similarly, working on pandemics. And I think for AI it's difficult to have that sort of confident lower bound. I actually tend to think, as I alluded to, that AI is probably or possibly still the most promising area, based on my current credences and its extreme neglectedness. But I don't think any cause area stands out as a decisively great place to work.

Question: I'm currently a machine learning PhD student, and I'm skeptical about the risk of AGI. How would you suggest that I contribute to the process of providing this feedback that you're identifying as a need?

Ben: Yeah, I think just a combination of in-person conversations, and then even simple blog posts, can be quite helpful. There's still been surprisingly little in the way of, let's say, something written online that I would point someone to who wants the skeptical case. This is actually a big part of the reason I gave this talk, even though I consider myself not extremely well placed to give it, given that I am not a technical person. There's so little out there along these lines that it's low-hanging fruit, essentially.

Ques­tion: Promi­nent deep learn­ing ex­perts such as Yann Le­cun and An­drew Ng do not seem to be wor­ried about risks from su­per­in­tel­li­gence. Do you think that they have es­sen­tially the same view that you have or are they com­ing at it from a differ­ent an­gle?

Ben: I’m not sure of their spe­cific con­cerns. I know this clas­sic thing that An­drew Ng always says is he com­pares it to wor­ry­ing about over­pop­u­la­tion on Mars, where the sug­ges­tion is that these risks, if they ma­te­ri­al­ize, are just so far away that it’s re­ally pre­ma­ture to worry about them. So it seems to be sort of an ar­gu­ment from timeline con­sid­er­a­tions. I’m ac­tu­ally not quite sure what his view is in terms of, if we were like, let’s say 50 years in the fu­ture, would he think that this is a re­ally great area to work on? I’m re­ally not quite sure.

I ac­tu­ally tend to think that the line of think­ing that says, “Oh, this is so far away so we shouldn’t work on it” just re­ally isn’t that com­pel­ling. It seems like we have a load of un­cer­tainty about AI timelines. It seems like no one can be very con­fi­dent about that. So yeah, it’d be hard to get un­der, let’s say one per­cent that in­ter­est­ing things won’t hap­pen in the next 30 years or so. So I’m not quite sure about the ex­tent of his con­cerns, but if they’re based on timelines, I ac­tu­ally don’t find them that com­pel­ling.

I was confused by the headline. "Ben Garfinkel: How Sure are we about this AI Stuff?" would make it clear that it is not some kind of official statement from the CEA. Changing the author to EA Global, or even to the co-authorship of EA Global and Ben Garfinkel, would help as well.

Thanks for this suggestion, Misha. I've changed the headline to include Ben's name, and I'm reviewing our transcript-publishing process to see how we can be more clear in the future (e.g. by posting under authors' names if they have an EA Forum account, as we do when we crosspost from a user's blog).

An update: The previous name on this account was "Centre for Effective Altruism". Since the account was originally made for the purpose of posting transcripts from EA Global, I've renamed it to "EA Global Transcripts" to avert further confusion.

Currently, co-authorship only produces karma for the "lead author". The same is true on LessWrong, where most of the Forum's code comes from, and they're interested in changing that at some point (I submitted a GitHub request here), but it would require a more-than-trivial infrastructure change, so I don't know how highly they'll prioritize it.

I think the big disanalogy between AI and the Industrial and Agricultural revolutions is that there seems to be a serious chance that an AI accident will kill us all. (And moreover this isn't guaranteed; it's something we have leverage over, by doing safety research and influencing policy to discourage arms races and encourage more safety research.) I can't think of anything comparable for the IR or AR. Indeed, there are only two other cases in history of risk on that scale: nuclear war and pandemics.

Thanks for this talk/post—it's a good example of the sort of self-skepticism that I think we should encourage.

FWIW, I think it's a mistake to construe the classic model of AI accident catastrophe as capability gain first, then goal acquisition. I say this because (a) I never interpreted it that way when reading the classic texts, and (b) it doesn't really make sense—the original texts are very clear that the massive jump in AI capability is supposed to come from recursive self-improvement, i.e. the AI helping to do AI research. So already we have some sort of goal-directed behavior (bracketing CAIS/Tool AI objections!) leading up to and including the point of arrival at superintelligence.

I would construe the little sci-fi stories about putting goals into goal slots not as predictions about the architecture of AI, but rather as illustrations of completely different points, e.g. the orthogonality of value or the dangers of unaligned superintelligences.

At any rate, though, what does it matter whether the goal is put in after the capability growth, or before/during? Obviously, it matters, but it doesn't matter for purposes of evaluating the priority of AI safety work, since in both cases the potential for accidental catastrophe exists.

the original texts are very clear that the massive jump in AI capability is supposed to come from recursive self-improvement, i.e. the AI helping to do AI research

...because that AI research is useful for some other goal the AI has, such as maximizing paperclips. See the instrumental convergence thesis.

At any rate, though, what does it matter whether the goal is put in after the capability growth, or before/during? Obviously, it matters, but it doesn't matter for purposes of evaluating the priority of AI safety work, since in both cases the potential for accidental catastrophe exists.

The argument for doom by default seems to rest on a default misunderstanding of human values as the programmer attempts to communicate them to the AI. If capability growth comes before a goal is granted, it seems less likely that misunderstanding will occur.

Presumably the programmer will make some effort to embed the right set of values in the AI. If this is an easy task, doom is probably not the default outcome.

AI pessimists have argued that human values will be difficult to communicate due to their complexity. But as AI capabilities improve, AI systems get better at learning complex things.

Both the instrumental convergence thesis and the complexity of value thesis are key parts of the argument for AI pessimism as it's commonly presented. Are you claiming that they aren't actually necessary for the argument to be compelling? (If so, why were they included in the first place? This sounds a bit like justification drift.)

...because that AI research is useful for some other goal the AI has, such as maximizing paperclips. See the instrumental convergence thesis.

Yes, exactly.

The argument for doom by default seems to rest on a default misunderstanding of human values as the programmer attempts to communicate them to the AI. If capability growth comes before a goal is granted, it seems less likely that misunderstanding will occur.

Eh, I could see arguments that it would be less likely and arguments that it would be more likely. Argument that it is less likely: we can use the capabilities to do something like "do what we mean," allowing us to state our goals imprecisely and survive. Argument that it is more likely: if we mess up, we immediately have an unaligned superintelligence on our hands. At least if the goals come before the capability growth, there is a period where we might be able to contain it and test it, since it isn't capable of escaping or concealing its intentions.

we don’t re­ally know how wor­ried to be [about in­sta­bil­ity risk from AI]. Th­ese risks re­ally haven’t been re­searched much, and we shouldn’t re­ally take it for granted that AI will be desta­bi­liz­ing. It could be or it couldn’t be. We just ba­si­cally have not done enough re­search to feel very con­fi­dent one way or the other.

This makes me worry about tractabil­ity. The prob­lem of in­sta­bil­ity has been known for at least five years now, and we haven’t made any progress?

A note on lev­er­age: One clear differ­ence be­tween AI and non-lev­er­age-able things like wheels or steam en­g­ines is that there are many differ­ent ways to build AI.

Some­one who tried to cre­ate a tri­an­gu­lar wheel wouldn’t have got­ten far, but it seems plau­si­ble that many differ­ent kinds of AI sys­tem could be­come very pow­er­ful, with no par­tic­u­lar kind of sys­tem “guaran­teed” to arise even if it hap­pens to be the most effec­tive kind—there are switch­ing costs and mar­ket fac­tors and brand­ing to con­sider. (I as­sume that switch­ing be­tween AI sys­tems/​paradigms for a pro­ject will be harder than switch­ing be­tween mod­els of steam en­g­ine).

This makes me think that it is pos­si­ble, at least in prin­ci­ple, for our ac­tions now to in­fluence what fu­ture AI sys­tems look like.