An Outside View of AI Control

I’ve written much on my skepticism of local AI foom (= intelligence explosion). Recently I said that foom offers the main justification I understand for AI risk efforts now, as well as being the main choice of my Twitter followers in a survey. It was the main argument offered by Eliezer Yudkowsky in our debates here at this blog, by Nick Bostrom in his book Superintelligence, and by Max Tegmark in his recent book Life 3.0 (though he denied this in his reply here).

However, some privately complained to me that I haven’t addressed those with non-foom-based AI concerns. So in this post I’ll consider AI control in the context of a prototypical non-em non-foom mostly-peaceful outside-view AI scenario. In a future post, I’ll try to connect this to specific posts by others on AI risk.

An AI scenario is where software does most all jobs; humans may work for fun, but they add little value. In a non-em scenario, ems are never feasible. As foom scenarios are driven by AI innovations that are very lumpy in time and organization, in non-foom scenarios innovation lumpiness is distributed more like it is in our world. In a mostly-peaceful scenario, peaceful technologies of production matter much more than do technologies of war and theft. And as an outside view guesses that future events are like similar past events, I’ll relate future AI control problems to similar past problems.

Conway’s law of software says that the structure of software tends to reflect the structure of the social organizations that make it. This suggests that the needs of software to have particular modularity structures for particular problems are usually weaker than the needs of organizations to maintain familiar structures of communication and authority. So a world where software does most job tasks could retain a recognizable clumping of tasks into jobs, divisions, firms, professions, industries, cities, and nations.

Today, most lumpiness in firms, industries, cities, etc. is not due to lumpiness in innovation, but instead due to various scale and scope economies, and network effects. Innovation may be modestly lumpy at the scale of particular industries, but not at the level of entire economic sectors. Most innovation comes from many small contributions; big lumps contain only a small fraction of total value.

While innovation is often faster in some areas than in others, most social rates of change tend to be near the doubling time of the economy. An AI world allows for faster growth, as it isn’t held back by slow-growing humans; I’ve estimated that it might double monthly. But our first guess should be that social rates of change speed up together; we’ll need specific reasons to expect specific changes, relative to today, in relative rates of change.

Today humans tend to get job specific training in the firms where they work, and general training elsewhere. Similarly, in an AI world specific software may be written in organizations near where it is used, while more general tools are written in more distant organizations.

Today most tasks are done by human brains, which come in a standard-sized unit capable of both doing specific tasks and also related meta-tasks, such as figuring out how to do tasks better, deciding which specific tasks to do when and how, and regrouping how tasks are clumped within organizations. So we tend to first automate tasks that can be done by much smaller units. And while managers and others often specialize in meta-tasks, line workers also do many meta-tasks themselves. In contrast, in an AI world meta-tasks tend to be separated more from other tasks, and so done by software that is more different.

In such a division of tasks, most tasks have a relatively narrow scope. For narrow tasks, the main risks are of doing tasks badly, and of hostile agents taking control of key resources. So most control efforts focus on such problems. For narrow tasks there is little extra risk from such tasks being done very well, even if one doesn’t understand how that happens. (War tech is of course an exception; victims can suffer more when war is done well.) The control risks on which AI risk folks focus, of very effective efforts misdirected due to unclear goals, are mainly concentrated in tasks with very wide scopes, such as in investment, management, law, and governance. These are mostly very-meta-tasks.

The future problem of keeping control of advanced software is similar to the past problems of keeping control both of physical devices, and of social organizations. As the tasks we assign to physical devices tend to be narrow, we mostly focus there on specific control failure scenarios. The main risks there are losing control to hostile agents, and doing tasks badly, rather than doing them very well. The main people to get hurt when control is lost are those who rely on such devices, or who are closely connected to such people.

Humans started out long ago organized into small informal bands, and later had to learn to deal with the new organizations and institutions of rulers, command hierarchies, markets, family clans, large scale cultures, networks of talking experts, legal systems, firms, guilds, religions, clubs, and government agencies. Such organizations are often given relatively broad tasks. And even if not taken over by hostile agents, they can drift out of control. For example, organizations may on the surface seem responsive and useful, while increasingly functioning mainly to entrench and perpetuate themselves.

When social organizations get out of control in this way, the people who initiated and then participated in them are the main folks to get hurt. Such initiators and participants thus have incentives to figure out how to avoid such control losses, and this has long been a big focus of organization innovation efforts.

Innovation in control mechanisms has long been an important part of innovation in devices and organizations. People sometimes try to develop better control mechanisms in the abstract, before they’ve seen real systems. They also sometimes experiment in the context of small test versions. But most control innovations come in response to seeing real behaviors associated with typical real versions. The main reason that it becomes harder to implement innovations later is that design features often become entrenched. But if control is important enough, it can be worth paying large costs of change to implement better control features.

Humans one hundred thousand years ago might have tried to think carefully about how to control rulers with simple command hierarchies, and people one thousand years ago might have tried to think carefully about how to control complex firms and government agencies. But the value of such early efforts would have been quite limited, and it wasn’t at all too late to work on such problems after such systems appeared. In peacetime, control failures mainly threatened those who initiated and participated in such organizations, not the world as a whole.

In the AI scenario of this post, the vast majority of advanced future software does tasks of such narrow scopes that their control risks are more like those for physical devices, relative to new social organizations. So people deploying new advanced software will know to focus extra control efforts on software doing wider scope meta-tasks. Furthermore, the main people harmed by failures to control AI assigned to meta-tasks will be those associated with the organizations that do such meta-tasks.

For example, customers who let an AI tell them whom to date may suffer from bad dates. Investors in firms that let AI manage key firm decisions might lose their investments. And citizens who let AI tell their police whom to put in jail may suffer in jail, or from undiscouraged crime. But such risks are mostly local, not global risks.

Of course for a long time now, coordination scales have been slowly increasing worldwide. So over time “local” effects become increasingly large-scale effects. This is a modest reason for everyone to slowly get more concerned about “local” problems elsewhere.

Today is a very early date to be working on AI risk; I’ve estimated that without ems it is several centuries away. We are now pretty ignorant about most aspects of how advanced software will be used and arranged. So it is hard to learn much useful today about how to control future advanced software. We can learn to better control the software we have now, and later on we should expect innovations in software control to speed up roughly as fast as do innovations in making software more effective. Even if control innovations by humans don’t speed up as fast, advanced software will itself be made of many parts, and some parts will want to keep control over other parts.

The mechanisms by which humans today maintain control over organizations include law, property rights, constitutions, command hierarchies, and even democracy. Such broad mechanisms are effective, entrenched and robust enough that future advanced software systems and organizations will almost surely continue to use variations within these broad categories to keep control over each other. So humans can reasonably hope to be at least modestly protected in the short run if they can share the use of such systems with advanced software. For example, if law protects software from theft by other software, it may also protect humans from such theft.

Of course humans have long suffered from events like wars and revolutions, events that create risks of harm and loss of control. And the rate of such events can scale with the main rates of change in the economy, which go inversely as the economic doubling time. So a much faster changing future AI economy can have faster rates of such risky events. It seems a robust phenomenon that when the world speeds up, those who do not speed up with it face larger subjective risks if they do not ally with sped-up protectors.
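
The scaling claim above can be sketched with simple arithmetic. This is only an illustration under stated assumptions: it supposes a fixed number of risky events per economic doubling, and the 15-year doubling time for today's economy and the one-event-per-doubling figure are placeholder values, not estimates from this post. The one-month doubling time is the figure mentioned earlier. The function name is hypothetical.

```python
def events_per_calendar_year(doubling_time_years, events_per_doubling=1.0):
    """Risky events per calendar year, assuming a fixed count per economic doubling.

    The event rate goes inversely as the doubling time, as described in the text.
    """
    if doubling_time_years <= 0:
        raise ValueError("doubling time must be positive")
    return events_per_doubling / doubling_time_years


# Assumed placeholder: today's economy doubles roughly every 15 years.
today_rate = events_per_calendar_year(15.0)

# The monthly doubling estimate for an AI economy, expressed in years.
ai_rate = events_per_calendar_year(1.0 / 12.0)

# How many times more risky events an un-sped-up human sees per calendar year.
speedup = ai_rate / today_rate
print(f"today: {today_rate:.3f} events/year")
print(f"AI economy: {ai_rate:.1f} events/year")
print(f"relative rate: about {speedup:.0f}x")
```

Under these placeholder numbers, someone who does not speed up would face roughly 180 times as many such events per calendar year, which is the sense in which subjective risk grows for those left at the old pace.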

Having humans create and become ems is one reasonable approach to creating sped-up allies for humans. Humans will no doubt also try to place advanced software in such an ally role. Once software is powerful, then attempts by humans to control such software are probably mostly based on copying the general lessons and approaches that advanced software discovers for how to maintain control over advanced software. Humans may also learn extra lessons that are specific to the human control problem, and some of those lessons may come from our era, long before any of this plays out.

But in the sort of AI scenario I’ve described in this post, I find it very hard to see such early efforts as the do-or-die priority that some seem to suggest. Outside of a foom scenario, control failures threaten to cause local, not global losses (though on increasingly large scales).

From this view, those tempted to spend resources now on studying AI control should consider two reasonable alternatives. The first alternative is to just save more now to grow resources to be used later, when we understand more. The second alternative is to work to innovate with our general control institutions, to make them more robust, and thus better able to handle larger coordination scales, and whatever other problems the future may hold. (E.g., futarchy.)

Okay, this is how I see the AI control problem in a non-em non-foom mostly-peaceful outside-view AI scenario. But clearly others disagree with me; what am I missing?

Added 4 Oct:

In the context of foom, the usual AI concern is a total loss of control of the one super AI, whose goals quickly drift to a random point in the space of possible goals. Humans are then robustly exterminated. As the AI is so smart and inscrutable, any small loss of control is said to open the door to such extreme failure. I have presumed that those who tell me to look at non-foom AI risk are focused on similar failure scenarios.

Today most social systems suffer from agency costs, and larger costs (in % terms) for larger systems. But these mostly take the form of modestly increasing costs. It isn’t that you can’t reliably use these systems to do the things that you want. You just have to pay more. That extra cost mostly isn’t a transfer accumulating in someone else’s account. Instead there is just waste that goes to no one, and there are more cushy jobs and roles where people can comfortably sit as parasites. Over time, even though agency costs take a bigger cut, total costs get lower and humans get more of what they want.
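
A small numerical sketch may help show how agency costs can take a bigger cut while total costs still fall. All figures here are hypothetical, chosen only to illustrate the arithmetic, and the model (agency costs as a fixed fraction of the total paid) is an assumption, not something claimed in this post.

```python
def total_cost(production_cost, agency_share):
    """Total amount paid when a fraction `agency_share` of it is lost to agency costs.

    Assumed model: total = production_cost / (1 - agency_share).
    """
    if not 0.0 <= agency_share < 1.0:
        raise ValueError("agency_share must be in [0, 1)")
    return production_cost / (1.0 - agency_share)


# Hypothetical earlier, smaller system: higher production cost, 10% agency cut.
earlier = total_cost(production_cost=100.0, agency_share=0.10)

# Hypothetical later, larger system: lower production cost, 30% agency cut.
later = total_cost(production_cost=40.0, agency_share=0.30)

print(f"earlier total cost: {earlier:.1f}")
print(f"later total cost:   {later:.1f}")
print("total cost fell despite a larger agency cut:", later < earlier)
```

In this toy case the agency share triples, yet the total paid drops by nearly half, because the underlying cost of getting the task done fell faster than the agency cut grew.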

When I say that in my prototypical non-foom AI scenario, AI will still pay agency costs but the AI control problem seems mostly manageable, I mean that very competent future social and software systems will suffer from waste and parasites as do current systems, but that humans can still reliably use such systems to get what they want. Not only are humans not exterminated, they get more than before of what they want.