Daily summaries vs research content

As you may have noticed, I’ve only posted a couple of updates in the many days since I started this blog. Part of this was due to a bad cold screwing up my tempo, but I did get things done on a fair number of those days. I think I felt implicit pressure to have a good polished summary of a finished thing at the end of the day. I still want my content summaries to be polished, but I don’t want them to delay summaries of what I did, so I’m going to treat the two separately going forward. This post is just a process update and a mention of what I worked on; separately, I’ll be publishing some writeups of things I’ve looked into soon.

URL change

I moved this blog from WordPress.com to my own domain in order to have more control over plugins. In particular, I expect to need a LaTeX plugin for writing equations.

Work style

I started out by trying to block off several hours each day for research and commit to actually researching, on the hypothesis that even if the last hour wasn’t high-output in terms of immediate research produced, it would be valuable the way the last set in a workout is valuable - to load the problems I’m working on more firmly into my mind.

The last few workdays were very different and felt unusually productive. I think that’s because I focused separately on finding an angle that seemed exciting, and then on exploring it. This means that I still need to be disciplined in blocking out time to work, but I need to be willing to take more breaks and let my mind wander when my current path doesn’t look interesting. The most valuable hour I spent on this project over the last few weeks was a lunch that got me interested in mathematizing some intuitions around AI takeoff.

Overall I'd characterize this shift as one from a diligence-based motivational architecture to an obsession-based one.

Research update

Thought-provoking conversations

Paul Christiano

A few weeks ago I overheard my housemate Paul complaining that other people were wrong about AI takeoff speed, and asked him whether he’d be willing to meet and help me understand his thinking on it. We talked about it over lunch, and it helped me flesh out some of the relevant considerations I want to look into. I don’t want to talk about what Paul’s opinions were, because it seems likely that I’ll accidentally misrepresent them, but I can talk about the things I started to consider during our conversation:

Intelligence as economic productivity - One intuitive way to quantify intelligence is as economic productivity. This connects reasoning about AI to existing economic models that have been well studied.

Intelligence as an Elo-like score - Another way to think about the gains from higher intelligence is to ask how likely one AI is to successfully hack another, and how costly the attack would be to the victor.
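To make the Elo intuition concrete, here's a minimal sketch using the standard Elo logistic curve. The mapping from "AI hacking contests" onto ratings, and all the numbers, are my own hypothetical assumptions, not something established:

```python
def elo_win_prob(r_attacker, r_defender, scale=400):
    """Standard Elo logistic: probability that the attacker wins a
    contest, given both ratings. Treating 'hack vs. defend' as an
    Elo-rated game is a hypothetical modeling choice."""
    return 1 / (1 + 10 ** ((r_defender - r_attacker) / scale))

# Under the standard Elo curve, a 200-point rating gap gives the
# stronger agent roughly 76% odds per contest.
print(round(elo_win_prob(1400, 1200), 2))  # 0.76
```

The interesting question this framing raises is how fast win probability saturates: past some rating gap, the stronger party wins essentially every contest, which is where compounding advantages could come from.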

An economic model of compounding gains to intelligence - AIs don’t need steep improvement from internal progress alone to quickly take over almost all of the economy - they just need to outcompete us for economic resources.

This model feels very compatible with outcomes that are at least initially multipolar, but not necessarily human-friendly, depending on how disadvantageous it is to value things other than maximizing economic productivity.

This also suggests that there may be an extended period where AI gets steadily less manageable, rather than a sudden discontinuous takeoff.

If this outcome is likely, then I’m not sure what we should be paying attention to, but I should look more closely into Robin Hanson’s work and find out who else is doing similar work.
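As a toy illustration of the economic model (my own sketch, not anything Paul said), consider two sectors that each reinvest their output at a fixed growth rate. All the rates and the starting share are made-up numbers; the point is only that a persistent growth-rate edge dominates eventually, with no internal self-improvement required:

```python
def economy_shares(ai_growth=0.30, human_growth=0.05,
                   ai_start=0.001, years=40):
    """Toy two-sector model: each sector compounds at its own fixed
    rate (illustrative constants). Returns the AI sector's share of
    total output for each year."""
    ai, human = ai_start, 1.0 - ai_start
    shares = []
    for _ in range(years):
        ai *= 1 + ai_growth
        human *= 1 + human_growth
        shares.append(ai / (ai + human))
    return shares

shares = economy_shares()
# With these illustrative rates, the AI sector passes half of the
# economy within a few decades despite starting at 0.1%.
```

The share curve is smooth and S-shaped rather than discontinuous, which matches the "steadily less manageable" picture above better than a sudden jump.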

Pre-takeoff transition - It’s likely to be profitable for AI teams to use their AIs to make money in the broader economy before, while, and for some time after AIs are around human level. This is likely to be a big change in society that reallocates a lot of resources to AIs and the people who built them even before the development of superintelligence, and what happens during this period is a big part of what determines AI takeoff dynamics. We could plausibly have about a year of this before AIs start massively outperforming humans.

Smooth historical economic and technological progress - There’s a plausible way of looking at world history where apparently big tech jumps such as the industrial revolution are just the continuation of existing trends.

A cybersecurity model of compounding gains to intelligence - If smarter AIs can easily seize all the computational resources of other AIs, then there might be some threshold where the first mover gets a sudden advantage by controlling much of the world’s hardware.

This model feels very compatible with singleton AI scenarios.

If this outcome is likely, then we should pay attention to current economic and political dynamics around cybersecurity.

A self-improvement model of compounding gains to intelligence - this is the intuition I get from reading Eliezer Yudkowsky’s writings on superintelligence, and Nick Bostrom’s Superintelligence: that algorithmic progress enables more algorithmic progress, and this blows up into a superintelligence without necessarily making much of a mark on the broader world first.

This can be thought of as a component of the economic model, where an AI can acquire hardware (and possibly buy off-the-shelf algorithms), or work on its own software (and maybe eventually hardware as well).
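One way to see how the self-improvement intuition differs from ordinary growth is a toy recursion where the rate of progress depends on the current capability level. The functional form and every constant here are illustrative assumptions of mine, not a claim about real AI dynamics:

```python
def improvement_trajectory(exponent, k=0.1, i0=1.0, steps=60, cap=1e12):
    """Toy recursion I(t+1) = I(t) + k * I(t)**exponent.
    exponent > 1: gains compound on themselves (blow-up-like growth);
    exponent == 1: ordinary exponential growth. All constants are
    made up for illustration."""
    levels = [i0]
    for _ in range(steps):
        nxt = levels[-1] + k * levels[-1] ** exponent
        levels.append(min(nxt, cap))  # cap to keep floats finite
    return levels

fast = improvement_trajectory(1.5)    # superlinear returns to capability
steady = improvement_trajectory(1.0)  # plain exponential growth
# The superlinear trajectory hits the cap well before the end, while
# the exponential one is still only a few hundred times its start.
```

In this framing, the disagreement between the models is largely about the exponent: the economic model corresponds to something near 1 (steady compounding through resource acquisition), while the self-improvement intuition corresponds to an exponent meaningfully above 1.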

I decided to follow up this conversation by trying to mathematize my intuitions around the economic model of AI takeoff, to see what evidence it would take to persuade me either that it collapses into the self-improvement model, or that self-improvement is likely to be only a very small part of AI progress.

Andrew Critch

A couple of days later, at a couple of friends’ commitment ceremony, I was talking with Critch, another friend of mine who is thinking about AI risk. I brought up my thoughts around cybersecurity as an important part of takeoff dynamics, and we played around with the idea a bit. I hope to make my intuitions about this more explicit after I’ve put something up around the economic model of AI takeoff dynamics.

After that, I made my first-ever use of my Critch-approved gigantic pad of paper (verdict: unclear. It’s unwieldy to write on and I didn’t have a good surface for it, but it was nice never to need to start a new page) by trying to write down a simplified economic model of AI takeoff dynamics as a system of equations. It took me a couple of frustrating days before I realized I knew enough to actually solve for the things I cared about, and a few more substantially more satisfying days to line everything up nicely, but I think I have a slightly interesting result to share. I intend to post about it soon.