“Taking AI Risk Seriously” – Thoughts by Andrew Critch

I wrote this several months ago for LessWrong, but it seemed useful to crosspost it here.

It’s a writeup of several informal conversations I had with Andrew Critch (of the Berkeley Existential Risk Initiative) about what considerations are important for taking AI Risk seriously, based on his understanding of the AI landscape. (The landscape has changed slightly in the past year, but I think most concerns are still relevant.)

[meta: this comment is written in a more argumentative way than my actual position warrants (I have more uncertainty); it seems more useful to state the disagreement than to describe the uncertainties]

If I correctly understand the model for how the AI safety field should grow that is implicitly advocated by this text, it seems to me the model is possibly wrong/harmful.

(?) upload good papers to arXiv, or at least write impressive posts on LW

(?) then they get noticed and start talking to other people in the field

This seems strange.

I. The way this path into the field “filters people” is roughly the following set of filters: “ability to get enough money to create a runway for oneself” AND “really really strong motivation to work on safety” AND “ability to work for long periods of time in isolation” AND “ability to turn insight into papers”.

This seems to be filtering on a somewhat arbitrary set of criteria, likely dropping talented people who, for example:

would have to pay high opportunity costs by working on “creating financial runway”

get depressed working in isolation on x-risk problems

are at least initially more internally motivated by interesting research problems than by existential risk worries

(and many more!)

II. Doing research in this field “without talking to people” is probably quite hard. The situation is improving, but many important ideas and considerations are implicit or not shared publicly.

III. It encourages people to make a kind of moral sacrifice (the opposite of a moral hazard: you take the risk, and mostly others benefit). It would be at least fair to point this out. Also, it seems relatively easy for the community to decrease these personal risks, but this way of thinking does not lead people to actually do so.

IV. The model seems to suggest people should learn and start doing research in a very different way from other intellectual enterprises like physics or machine learning. Why should that be the case?

Ah. So I’m not sure I can represent Critch here off-the-cuff, but my interpretation of this post is a bit different from what you’ve laid out here.

This is not a proposal for how the field overall should grow. There should be infrastructural efforts made to onboard people via mentorship: things like AI Safety Camp, MIRI Fellows, etc.

This post is an on-the-margin recommendation to some subset of people. I think there were a few intents here:

1. If your basic plan is to donate, consider trying to become useful for direct work instead. Becoming useful for direct work probably requires at least some chunk of time for thinking about and understanding the problem, and some chunk of time for learning new skills.

2. The “take time off to think” thing isn’t meant to be “do solo work” (like writing papers). It’s more specifically for learning about the AI Alignment problem and landscape. From there, maybe the thing you do is write papers (solo or at an org), or maybe it’s applying for a managerial or ops position at an org, or maybe it’s founding a new project.

3. I think (personal opinion, although I expect Critch would agree) that when it comes to learning skills, there are probably better ways to go about it than “just study independently.” (Note the sub-sections on taking advantage of being in school.) This will vary from person to person.

4. Not really covered in the post, but I personally think there’s a “mentorship bottleneck”. It’s obviously better to have mentors and companions, and the field should try to flesh that out. The filter for people who can work at least somewhat independently and figure things out for themselves is a filter of necessity, not an ideal situation.

5. I think Critch was specifically trying to fill particular gaps on the margin, namely “people who can be trusted to flesh out the middle-tier hierarchy”: people who can be entrusted to launch and run new projects competently without needing to be constantly double-checked. This is necessary to grow the field for people who do still need mentorship or guidance. (My read from recent 80k posts is that the field is still somewhat “management bottlenecked”.)

Even if you’re not interested in orienting your life around helping with x-risk – if you just want to not be blindsided by radical changes that may be coming

[...]

We don’t know exactly what will happen, but I expect serious changes of some sort over the next 10 years. Even if you aren’t committing to saving the world, I think it’s in your interest just to understand what is happening, so in a decade or two you aren’t completely lost.

And even ‘understanding the situation’ is complicated enough that I think you need to be able to quit your day job and focus full-time, in order to get oriented.

Raymond, do you or Andrew Critch have any concrete possibilities in mind for what “orienting one’s life”/“understanding the situation” might look like from a non-altruistic perspective? I’m interested in hearing concrete ideas for what one might do; the only suggestions I can recall seeing so far were mentioned in the 80,000 Hours podcast episode with Paul Christiano: to save money and invest in certain companies. Is this the sort of thing you had in mind?

The way I am imagining it, a person thinking about this from a non-altruistic perspective would then think about the problem for several years, narrow this list down (or add new things to it), and act on some subset of it (e.g. maybe they would think about which companies to invest in and decide how much money to save, but not implement some other idea). Is this an accurate understanding of your view?

(Off-the-cuff thoughts, which are very low confidence. Not attributed to Critch at all.)

So, this depends quite a bit on how you think the world is shaped (which is a complex enough question that Critch made the recommendation to just think about it for weeks or months). But the classes of answer that I can think of are:

a) in many possible worlds, the selfish and altruistic answers are just the same. The best way to survive a fast or even moderate takeoff is to ensure a positive singularity, and just pouring your efforts and money into maximizing the chance of that is really all you can do.

b) in some possible worlds (perhaps like Robin Hanson’s Age of Em), it might matter that you have skills that can go into shaping the world (such as skilled programming). Though this is realistically only an option for some people.

c) for the purposes of flourishing in the intervening years (if we’re in a slowish takeoff over the next 1–4 decades), owning stock in the right companies or institutions might help. (Although one caution: worrying over this during the actual takeoff period may be more of a distraction than helpful.)

d) relatedly, simply getting yourself psychologically ready for the world to change dramatically may be helpful in and of itself, and/or may make you ready to take on sudden opportunities as they arise.