Comments on CAIS

17

Over the last few months I’ve talked with Eric Drexler a number of times about his Comprehensive AI Services (CAIS) model of AI development, and read most of his technical report on the topic. I think these are important ideas which are well worth engaging with, despite personally being skeptical about many of the conclusions. Below I’ve summarised what I see as the core components of Eric’s view, followed by some of own arguments. Note that these are only my personal opinions. I did make some changes to the summary based on Eric’s comments on early drafts, to better reflect his position - however, there are likely still ways I’ve misrepresented him. Also note that this was written before reading Rohin’s summary of the same report, although I do broadly agree with most of Rohin’s points.

One useful piece of context for this model is Eric's background in nanotechnology, and his advocacy for the development of nanotech as "atomically precise manufacturing" rather than self-replicating nanomachines. The relationship between these two frameworks has clear parallels with the relationship between CAIS and a recursively self-improving superintelligence.

The CAIS model:

The standard arguments in AI safety are concerned with the development of a single AGI agent doing open-ended optimisation. Before we build such an entity (if we do so at all), we will build AI services which each perform a bounded task with bounded resources, and which can be combined to achieve superhuman performance on a wide range of tasks.

AI services may or may not be “agents”. However, under CAIS there will be no entity optimising extremely hard towards its goals in the way that most AI safety researchers have been worrying about, because:

Each service will be relatively specialised and myopic (focused on current episodic performance, not maximisation over the whole future). This is true of basically all current AI applications, e.g. image classifiers or Google Translate.

Although rational agents can be proved equivalent to utility-maximisers, the same is not necessarily true of systems of rational agents. Most such systems are fundamentally different in structure from rational agents - for example, individual agents within the system can compete with or criticise each other. And since AI services aren’t “rational agents” in the first place, a system composed of them is even less likely to implement a utility-maximiser.

There won't be very much demand for unified AIs which autonomously carry out large-scale tasks requiring general capabilities, because systems of AI services will be able to perform those tasks just as well or better.

Early AI services could do things like massively disrupt financial markets, increase the rate of scientific discovery, help run companies, etc. Eventually they should be able to do any task that humans can, at our level or higher.

They could also be used to recursively improve AI technologies and to develop AI applications, but usually with humans in the loop - in roughly the same way that science allows us to build better tools with which to do better science.

Our priorities in doing AI safety research can and should be informed by this model:

A main role for technical AI safety researchers should be to look at the emergent properties of systems of AI services, e.g. which combinations of architectures, tasks and selection pressures could lead to risky behaviour, as well as the standard problems of specifying bounded tasks.

AI safety experts can also give ongoing advice and steer the development of AI services. AI safety researchers shouldn't think of safety as a one-shot problem, but rather a series of ongoing adjustments.

AI services will make it much easier to prevent the development of unbounded agent-like AGI through methods like increasing coordination and enabling surveillance, if the political will can be mustered.

I'm broadly sympathetic to the empirical claim that we'll develop AI services which can replace humans at most cognitively difficult jobs significantly before we develop any single superhuman AGI (one unified system that can do nearly all cognitive tasks as well as or better than any human). One plausible mechanism is that deep learning continues to succeed on tasks where there's lots of training data, but doesn't learn how to reason in general ways - e.g. it could learn from court documents how to imitate lawyers well enough to replace them in most cases, without being able to understand law in the way humans do. Self-driving cars are another pertinent example. If that pattern repeats across most human professions, we might see massive societal shifts well before AI becomes dangerous in the adversarial way that’s usually discussed in the context of AI safety.

If I had to sum up my objections to Eric’s framework in one sentence, it would be: “the more powerful each service is, the harder it is to ensure it’s individually safe; the less powerful each service is, the harder it is to combine them in a way that’s competitive with unified agents.” I’ve laid out my arguments in more detail below.

Richard’s view:

Open-ended agentlike AI seems like the most likely candidate for the first strongly superhuman AGI system.

As a basic prior, our only example of general intelligence so far is ourselves - a species composed of agentlike individuals who pursue open-ended goals. So it makes sense to expect AGIs to be similar - especially if you believe that our progress in artificial intelligence is largely driven by semi-random search with lots of compute (like evolution was) rather than principled intelligent design.

In particular, the way we trained on the world - both as a species and as individuals - was by interacting with it in a fairly unconstrained way. Many machine learning researchers believe that we’ll get superhuman AGI via a similar approach, by training RL agents in simulated worlds. Even if we then used such agents as “services”, they wouldn’t be bounded in the way predicted by CAIS.

Many complex tasks don’t easily decompose into separable subtasks. For instance, while writing this post I had to keep my holistic impression of Eric’s ideas in mind most of the time. This impression was formed through having conversations and reading essays, but was updated frequently as I wrote this post, and also draws on a wide range of my background knowledge. I don’t see how CAIS would split the task of understanding a high-level idea between multiple services, or (if it were done by a single service) how that service would interact with an essay-writing service, or an AI-safety-research service.

Note that this isn’t an argument against AGI being modular, but rather an argument that requiring the roles of each module and the ways they interface with each other to be human-specified or even just human-comprehensible will be very uncompetitive compared with learning them in an unconstrained way. Even on today’s relatively simple tasks, we already see end-to-end training outcompeting other approaches, and learned representations outperforming human-made representations. The basic reason is that we aren’t smart enough to understand how the best cognitive structures or representations work. Yet it’s key to CAIS that each service performs a specific known task, rather than just doing useful computation in general - otherwise we could consider each lobe of the human brain to be a “service”, and the combination of them to be unsafe in all the standard ways.

It’s not clear to me whether this is also an argument against IDA. I think that it probably is, but to a lesser extent, because IDA allows multiple layers of task decomposition which are incomprehensible to humans before bottoming out in subtasks which we can perform.

Even if task decomposition can be solved, humans reuse most of the same cognitive faculties for most of the tasks that we can carry out. If many AI services end up requiring similar faculties to each other, it would likely be more efficient to unify them into a single entity. It would also be more efficient if that entity could pick up new tasks in the same rapid way that humans do, because then you wouldn’t need to keep retraining. At that point, it seems like you no longer have an AI service but rather the same sort of AGI that we’re usually worried about. (In other words, meta-learning is very important but doesn’t fit naturally into CAIS).

Humans think in terms of individuals with goals, and so even if there's an equally good approach to AGI which doesn't conceive of it as a single goal-directed agent, researchers will be biased against it.

Even assuming that the first superintelligent AGI is in fact a system of services as described by the CAIS framework, it will be much more like an agent optimising for an open-ended goal than Eric claims.

There'll be significant pressure to reduce the extent to which humans are in the loop of AI services, for efficiency reasons. E.g. when a CEO can't improve on the strategic advice given to it by an AI, or the implementation by another AI, there's no reason to have that CEO any more. Then we’ll see consolidation of narrow AIs into one overall system which makes decisions and takes actions, and may well be given an unbounded goal like "maximise shareholder value". (Eric agrees that this is dangerous, and considers it more relevant than other threat models).

Even if we have lots of individually bounded-yet-efficacious modules, the task of combining them to perform well in new tasks seems like a difficult one which will require a broad understanding of the world. An overseer service which is trained to combine those modules to perform arbitrary tasks may be dangerous because if it is goal-oriented, it can use those modules to fulfil its goals (on the assumption that for most complex tasks, some combination of modules performs well - if not, then we’ll be using a different approach anyway).

While I accept that many services can be trained in a way which makes them naturally bounded and myopic, this is much less clear to me in the case of an overseer which is responsible for large-scale allocation of other services. In addition to superhuman planning capabilities and world-knowledge, it would probably require arbitrarily long episodes so that it can implement and monitor complex plans. My guess is that Eric would argue that this overseer would itself be composed of bounded services, in which case the real disagreement is how competitive that decomposition would be (which relates to point 1.2 above).

Even assuming that the first superintelligent AGI is in fact a system of services as described the CAIS framework, focusing on superintelligent agents which pursue unbounded goals is still more useful for technical researchers. (Note that I’m less confident in this claim than the others).

Eventually we’ll have the technology to build unified agents doing unbounded maximisation. Once built, such agents will eventually overtake CAIS superintelligences because they’ll have more efficient internal structure and will be optimising harder for self-improvement. We shouldn’t rely on global coordination to prevent people from building unbounded optimisers, because it’s hard and humans are generally bad at it.

Conditional on both sorts of superintelligences existing, I think (and I would guess that Eric agrees) that CAIS superintelligences are significantly less likely to cause existential catastrophe. And in general, it’s easier to reduce the absolute likelihood of an event the more likely it is (even a 10% reduction of a 50% risk is more impactful than a 90% reduction of a 5% risk). So unless we think that technical research to reduce the probability of CAIS catastrophes is significantly more tractable than other technical AI safety research, it shouldn’t be our main focus.

As a more general note, I think that one of the main strengths of CAIS is in forcing us to be more specific about what tasks we envisage AGI being used for, rather than picturing it divorced from development and deployment scenarios. However, I worry that the fuzziness of the usual concept of AGI has now been replaced by a fuzzy notion of “service” which makes sense in our current context, but may not in the context of much more powerful AI technology. So while CAIS may be a good model of early steps towards AGI, I think it is a worse model of the period I’m most worried about. I find CAIS most valuable in its role as a research agenda (as opposed to a predictive framework): it seems worth further investigating the properties of AIs composed of modular and bounded subsystems, and the ways in which they might be safer (or more dangerous) than alternatives.

Many thanks to Eric for the time he spent explaining his ideas and commenting on drafts. I also particularly appreciated feedback from Owain Evans, Rohin Shah and Jan Leike.