Rough blog posts to summarize my research as I try to build a model of how things work that's good enough for me to start acting in the world.

Update 7: conceptual vocabulary, writing, feedback, lit review

I’m redefining my current research project: I’m limiting its scope to creating a conceptual framework, not getting empirical results.

I’m dropping my commitment to regular updates.

I got some good feedback on my draft research hierarchy writeup. Core issues I’m going to emphasize are:

Conflict vs cooperation as drivers of AI progress during a takeoff scenario.

Creating disjunctive arguments that put considerations in some relation to each other and make it clear which scenarios are mutually exclusive alternatives to one another, which are elaborations on a basic dynamic, and which might overlap.

I plan to review more of the existing literature on AI risk to see whether anyone else has already made substantial progress, which I don’t know about, on a conceptual framework for AI risk.

Conceptual framework

Previously I framed this research project as one to figure out facts about the world, present and future, relevant to what projects I might work on. I’ve found it hard to stay motivated using this framework, and I think this is partly because I expect it to fail. It’s very difficult to predict how long it will take, what the form of the work will be, and whether I’ll get results at all. A large part of this is because I don’t have a good conceptual framework for thinking about AI risk.

Because of this, I’m no longer thinking of this project in terms of the specific questions I need answered. Enough of them center around AI risk that it makes sense to focus on building my ability to properly frame questions in that domain. My current project is now to build a conceptual framework in which I can place particular empirical or theoretical questions. Once I have enough of a framework, it should be much easier to evaluate the value of attempting to answer particular questions.

I’ve done work creating a conceptual framework before, with some success. Several months ago I realized that my thinking around trust and interpersonal relationships was too binary. I knew I needed a more nuanced view, but knowing I needed it wasn’t enough to enable me to actually have it. So I spent some time thinking about where I was confused, paid attention to my experiences, chased my sense of curiosity, and wrote things up on my personal blog when I felt I had something to say. Now I have much better models in this domain and am able to break down my former monocategory of “trust” into a bunch of tighter clusters, so I no longer reliably get surprised in both directions.

Irregular updates

I’ve already been writing updates irregularly, but now I’m making it official. My commitment is to write things up as I think of them, as part of the process of thinking, without a set schedule. When I dropped my commitment to regular blogging on my personal blog, both the quantity and quality of my output shot way up. I wouldn’t be surprised if that happens here too, though I’m not counting on it.

Feedback on research hierarchy writeup

Conflict vs cooperation

It seems like a major difference between various models of AI takeoff is whether AI progress depends mostly on conflict or cooperation. Economic models of AI takeoff seem to assume that investments in AI will mainly be based on the short-run profitability of improved intelligence. Other models of AI takeoff assume that at some point the most advanced AIs will be able to directly seize resources quickly and cheaply enough that they will simply ignore the broader economy. Figuring out which of these will happen seems very important for modeling an AI takeoff, since safeguards that work in one scenario may not work in another.

Disjunctive arguments

One way of describing my discomfort with the current discourse around AI risk is that most research and argument I’ve read seems to be either hardly contextualized at all, or promoting a single thesis. Since the single thesis promoted in one argument is often neither the same as, nor the negation of, the thesis promoted in another, it’s hard to compare arguments, or to tell whether key considerations are being left out.

By contrast, Nick Bostrom’s formulation of the simulation argument seems easy to evaluate conceptually. An important aspect of this is that it makes the relevant disjunction explicit:

This paper argues that at least one of the following propositions is true:

(1) the human species is very likely to go extinct before reaching a “posthuman” stage;

(2) any posthuman civilization is extremely unlikely to run a significant number of simulations of their evolutionary history (or variations thereof);

(3) we are almost certainly living in a computer simulation.

It follows that the belief that there is a significant chance that we will one day become posthumans who run ancestor-simulations is false, unless we are currently living in a simulation. A number of other consequences of this result are also discussed.

I don’t expect that I can build an AI risk framework out of a disjunction this simple and elegant, but I hope to at least set up a series of disjunctions that are similarly clear.

Literature review

Someone else might have already done the work to create a conceptual framework relevant to my questions, so I plan to review the literature on AI risk to see if this has happened in whole or in part.